Gene EcSMS35_4043 details

Gene Information

Locus tag: EcSMS35_4043
Symbol:
ID: 6145093
Type: CDS
Is gene spliced: No
Is pseudo gene: No
Organism name: Escherichia coli SMS-3-5
Kingdom: Bacteria
Replicon accession: NC_010498
Strand:
Start bp: 4133168
End bp: 4134661
Gene Length: 1494 bp
Protein Length: 497 aa
Translation table: 11
GC content: 52%
IMG OID: 641618868
Product: sulfatase
Protein accession: YP_001746006
Protein GI: 170681583
COG category: [P] Inorganic ion transport and metabolism
COG ID: [COG3119] Arylsulfatase A and related enzymes
TIGRFAM ID:
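
The coordinates and lengths in this record can be cross-checked with simple arithmetic. The sketch below assumes that the start and end positions are 1-based and inclusive, and that the gene length includes the stop codon; the function names are illustrative and not part of any database API.

```python
def span_length(start_bp: int, end_bp: int) -> int:
    """Length of a 1-based, inclusive coordinate span (assumed convention)."""
    return end_bp - start_bp + 1


def expected_protein_length(gene_length_bp: int) -> int:
    """Number of residues for an unspliced CDS whose length includes the stop codon."""
    if gene_length_bp % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    # One codon per residue, minus the terminal stop codon.
    return gene_length_bp // 3 - 1


gene_length = span_length(4133168, 4134661)
protein_length = expected_protein_length(gene_length)
print(gene_length, protein_length)  # prints: 1494 497
```

Under these assumptions the computed values agree with the Gene Length and Protein Length fields above.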


Plasmid Coverage information

Num covering plasmid clones: 29
Plasmid unclonability p-value:
Plasmid hitchhiking: No
Plasmid clonability: normal
 

Fosmid Coverage information

Num covering fosmid clones: 49
Fosmid unclonability p-value:
Fosmid Hitchhiker: No
Fosmid clonability: normal
 

Sequence

Gene sequence
ATGAAACGCC CCAATTTTCT GTTCATCATG ACCGATACCC AGGCCACCAA TATGGTCGGT 
TGCTATAGCG GTAAGCCGCT GAATACGCAA AATATTGATA GTCTGGCGGC GGAAAGTATT
CGCTTTAATT CCGCCTATAC CTGTTCACCG GTTTGTACAC CAGCGCGCGC CGGGCTGTTT
ACCGGTATCT ACGCTAACCA GTCCGGCCCG TGGACCAACA ACGTCGCGCC GGGCAAAAAC
ATCTCCACTA TGGGACGCTA CTTTAAGGAT GCGGGCTATC ACACCTGTTA CATCGGCAAA
TGGCATCTCG ACGGTCATGA CTATTTCGGC ACTGGCGAGT GTCCGCCGGA GTGGGACGCT
GATTACTGGT TCGATGGGGC GAACTACCTT AGCGAACTGA CGGAAAAAGA GATCAGCCTG
TGGCGCAATG GCCTAAACAG CGTTGAGGAT TTACAGGCGA ACCATATCGA CGAAACCTTC
ACCTGGGCGC ATCGCATCAG CAATCGGGCG GTAGATTTTC TGCAACAGCC CGCGCGCGCC
GAGGAACCCT TCCTGATGGT GGTTTCGTAT GATGAGCCGC ATCACCCGTT CACCTGTCCG
GTGGAGTATT TAGAGAAATA CGCTGATTTT TACTACGAAC TTGGCGAGAA ATCACAGGAT
GACCTGGCGA ACAAACCGGA ACATCACCGC TTATGGGCGC AGGCGATGCC ATCGCCAGTC
GGTGATGACG GGCTTTATCA CCATCCGCTC TATTTTGCCT GCAATGACTT TGTTGATGAC
CAAATCGGAC GGGTCATCAA TGCCTTAACG CCAGAGCAAC GTGAAAATAC GTGGGTCATT
TATACCTCCG ATCACGGTGA AATGATGGGC GCACATAAGT TGATCAGTAA AGGAGCGGCG
ATGTATGACG ACATCACCCG TATTCCGCTG ATCATCCGTT CGCCGCAAGG GGAGCGGCGG
CAGGTCGATA CGCCAGTCAG TCATATCGAT TTACTGCCGA CAATGATGGC GCTGGCAGAT
ATTGAAAAAC CAGAGATTCT GCCGGGGGAA AATATCCTTG CCGTGAAAGA GCCACGCGGC
GTAATGGTGG AATTTAACCG CTACGAGATT GAGCATGACA GCTTTGGCGG TTTTATTCCG
GTGCGTTGCT GGGTGACGGA TGACTTTAAA CTGGTACTCA ACCTCTTCAC CAGTGATGAA
CTTTACGATC GCCGTAATGA CCTAAATGAA ATGCATAATC TGATCGATGA TATCCGTTTT
GCCGACGTTC GCCGCAAAAT GCATGACGCC TTATTGGATT ACATGGATAA AATTCGCGAT
CCGTTCCGCA GTTACCAATG GAGTCTGCGT CCGTGGCGTA AAGATGCACG ACCGCGCTGG
ATGGGGGCGT TTCGTCCACG CCCACAAGAT GGCTATTCGC CAGTGGTACG CGACTATGAC
ACCGGCCTAC CGACACAAGG GGTGAAGGTG GAAGAGAAAA AACAGAAGTT CTGA
 
Protein sequence
MKRPNFLFIM TDTQATNMVG CYSGKPLNTQ NIDSLAAESI RFNSAYTCSP VCTPARAGLF 
TGIYANQSGP WTNNVAPGKN ISTMGRYFKD AGYHTCYIGK WHLDGHDYFG TGECPPEWDA
DYWFDGANYL SELTEKEISL WRNGLNSVED LQANHIDETF TWAHRISNRA VDFLQQPARA
EEPFLMVVSY DEPHHPFTCP VEYLEKYADF YYELGEKSQD DLANKPEHHR LWAQAMPSPV
GDDGLYHHPL YFACNDFVDD QIGRVINALT PEQRENTWVI YTSDHGEMMG AHKLISKGAA
MYDDITRIPL IIRSPQGERR QVDTPVSHID LLPTMMALAD IEKPEILPGE NILAVKEPRG
VMVEFNRYEI EHDSFGGFIP VRCWVTDDFK LVLNLFTSDE LYDRRNDLNE MHNLIDDIRF
ADVRRKMHDA LLDYMDKIRD PFRSYQWSLR PWRKDARPRW MGAFRPRPQD GYSPVVRDYD
TGLPTQGVKV EEKKQKF
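
The gene and protein sequences above can be checked against each other, and against the reported GC content, with a short script. This is a minimal sketch using only the standard library. It assumes the sequences are pasted in as strings with the 10-base spacing shown above, and it uses the standard codon assignments, which translation table 11 shares for all codons (table 11 differs only in the set of permitted start codons). The helper names `clean`, `gc_percent`, and `translate` are illustrative.

```python
BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def clean(seq: str) -> str:
    """Remove the spacing and line breaks used in the listing above."""
    return "".join(seq.split()).upper()


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    s = clean(seq)
    return 100.0 * sum(base in "GC" for base in s) / len(s)


def translate(seq: str) -> str:
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    s = clean(seq)
    protein = []
    for pos in range(0, len(s) - 2, 3):
        amino_acid = CODON_TABLE[s[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First three 10-base blocks of the gene sequence listed above.
first_blocks = "ATGAAACGCC CCAATTTTCT GTTCATCATG"
print(translate(first_blocks))   # prints: MKRPNFLFIM
print(gc_percent("ATGAAACGCC"))  # prints: 50.0
```

Passing the full gene sequence to `translate` and comparing the result with the protein sequence (after `clean`) gives a quick consistency check; `gc_percent` on the full gene sequence can be compared with the GC content field.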