Gene EcSMS35_3939 details


Gene Information

Locus tag: EcSMS35_3939
Symbol:
ID: 6146732
Type: CDS
Is gene spliced: No
Is pseudo gene: No
Organism name: Escherichia coli SMS-3-5
Kingdom: Bacteria
Replicon accession: NC_010498
Strand:
Start bp: 4013819
End bp: 4018468
Gene Length: 4650 bp
Protein Length: 1549 aa
Translation table: 11
GC content: 53%
IMG OID: 641618765
Product: haemagluttinin family protein
Protein accession: YP_001745904
Protein GI: 170684255
COG category: [U] Intracellular trafficking, secretion, and vesicular transport; [W] Extracellular structures
COG ID: [COG5295] Autotransporter adhesin
TIGRFAM ID:
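The length fields follow directly from the coordinates. A minimal Python sketch, assuming the Start/End values are 1-based and inclusive (the usual GenBank convention) and that the last codon is the stop codon, which is not counted in the protein length; `cds_lengths` is a hypothetical helper, not something the database provides:

```python
def cds_lengths(start_bp: int, end_bp: int) -> tuple[int, int]:
    """Return (gene length in bp, protein length in aa) for a CDS.

    Assumes 1-based, inclusive coordinates and a CDS that ends in a
    stop codon (the stop is not translated into an amino acid).
    """
    gene_len = end_bp - start_bp + 1  # both ends are included
    codons, leftover = divmod(gene_len, 3)
    if leftover:
        raise ValueError(f"{gene_len} bp is not a whole number of codons")
    protein_len = codons - 1  # drop the stop codon
    return gene_len, protein_len


gene_len, protein_len = cds_lengths(4013819, 4018468)
print(gene_len, protein_len)  # 4650 1549
```

Both values agree with the Gene Length (4650 bp) and Protein Length (1549 aa) fields, and the length being a multiple of 3 is consistent with a complete, unspliced CDS.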


Plasmid Coverage information

Num covering plasmid clones: 11
Plasmid unclonability p-value:
Plasmid hitchhiking: No
Plasmid clonability: normal

Fosmid Coverage information

Num covering fosmid clones: 42
Fosmid unclonability p-value: 0.78822
Fosmid Hitchhiker: No
Fosmid clonability: normal

Sequence

Gene sequence
ATGAACAAAA TTTTTAAAGT TATCTGGAAT CCAGCGACGG GCAGTTACAG CGTTGCCAGC 
GAAACGGCGA AAAGTCGTGG GAAGAAGAGC GGGCGCAGTA AGCTGTTAAT TTCTGCACTG
GTTGCGGGTG GATTATTGTC GTCGTCTGGG GCATTAGCTC AGGCAGGGTT AGATACAGGT
ACTGGTGTTA CCCCTGCTGG TCATAATAAT GGAACAGGTT GGATAGCTAT TGGTACCGAT
GCTGAAGCAA GCACTCATAC CACAACGAAT GGCGCAGCAA CTGCCGTGGG CTACTACTCC
AAAGCGTTAG GTATGTGGAG TACTGCGTTA GGTGCATATA GCGAATCGAA TGGCAATGCT
TCACTGGCTC TTGGGGTTAA AGCACAATCT AACGGTGACC GCTCTATTTC GATGGGCGCT
TCGTCGAGTG CAAGCATAAA TGCGGGTTAC TCCATCGCGA TGGGGGTATT TGCTTTCACC
GATGCAGAGT ACGCGGTAGC CCTTGGCAAT GAGAGCAAGG CACTTGGTAA ATACAGCCTT
GCATTAGGAA ACGCAAGTCA GGCATCTGGC GAATCCAGTA TTGCATTAGG TAACACAAGT
GAAGCCAGCG AACAAAACGC GATTGCGCTG GGGCAAGGCA GCATTGCGAG CAAAGTGAAC
TCAATCGCGT TGGGCAGTAA CAGTTCGTCT GCAGGAGAGA ATGCCATCGC GCTAGGAGAG
GGCAGTGCCG CGGGAGGCAG CAACAGTCTA GCTTTCGGTA GTCAATCCAG GGCAAGCGGC
AATGATTCTG TCGCCCTCGG TGTAGGGGCT ACAGCAGCGA CCGACAATTC TGTCGCTATC
GGCGCAGGAT CGACCACTGA TGCAAGCAAT ACAGTTTCAG TTGGCAACAG CACAACAAAA
CGCAAAATTG TTAATATGGC TGCCGGTGCC ATAAGCAACA CCAGTACCGA TGCCATCAAC
GGCTCACAGC TTTATACGAT CAGTGATTCA GTCGCCAAAC GGCTCGGGGG AGGCGCTACT
GTAGGCAGCG ACGGCACCGT AACCGCACTA AGCTACGCGT TGAAAAGCGG CACCTATAAT
AACGTGGGTG ATGCTCTGTC AGGAATCGAC AATAATACTC TGCAATGGAA TAGAACCGCG
GGGGCATTCA CCGCTGCACA CGGCTCAAAT ACCACCAGTA AAATCACTAA TGTTGCTAAA
GGTACGGTTT CTGCAACCAG CACCGATGTA GTAAACGGCT CTCAATTGTA CGACCTGCAG
CAGGATGCTC TGTTGTGGAA CGGCACCGCG TTCAGCGCTG CACACGGCAC CGACGTCACC
AGCAAAATCA CCAATGTCAC CGCTGGTGAG CTCTCTGACA CCAGCACCGA CGCCGTCAAC
GGTTCTCAGC TGAAAGCGAC CAAGGACGAT GTGGCGGCAA ACACCACCAA CATCACTAAC
CTGACGAGCG AAGTGGCTGG CAACACCACC AGTATCACTA ACCTGACTGA TACGGTGACT
AACCTCGGTG AAGACGCCCT GAAATGGGAC GACGCCGCAG GCGCATTCAC CGCTGCACAC
GGCACTAACG CCACCAACAA AATCACCAAT GTCACCGCTG GCGAACTCTC TGATACCAGC
ACCGACGCCG TCAACGGTTC TCAGCTGAAA ACCACCAACG ATAACGTGGC GACCAACACC
ACCAATATCG CCACTAACAC CACCAATATC ACCAACCTGA CTAACGCTGT TGACAGTCTC
GGTGATGATT CCCTGCTGTG GAACAAAGCG GCTGGCGCAT TCAGCGCCGC GCACGGCACC
GATGCCACCA GCAAAATCAC CAACGTCACC GCTGGCGACC TGACTGCTGG CAGCACCGAC
GCCGTCAACG GTTCTCAGCT GAAAACCACC AACGATAACG TGGCGACCAA CACCACCAAT
ATCGCCACTA ACACCACCAA TATCACCAAC CTGACTAACG CTGTTGACAG TCTTGGTGAT
GATTCCCTGC TGTGGAACAA AGCGGCTGGC GCATTCAGCG CCGCGCACGG CACCGATGCC
ACCAGCAAAA TCACCAACGT CACCGCTGGC GACCTGACTG CTGGCAGCAC TGACGCGGTT
AACGGCTCCC AGCTGAAAAC CACCAACGAT AACGTGGCGA CCAACACCAC CAATATCGCC
ACTAACACCA CCAATATCAC TAACCTGACT GATACGGTGA ATAATCTCGG TGAAGACGCC
CTGAAATGGG ACGACGCCGC AGGCGCATTC ACCGCTGCAC ACGGTACTAA CGCCACCAAC
AAAATCAGCA ACGTACAAGC TGGCATAGTC TCCTCTGACA GCACTGACGC CATAAATGGC
TCACAACTAT ATGGTTTGGC TGATTCATTC ACGTCCTATC TGGGTGGTGG TGCTGATATT
AGCGATACAG GTGTATTAAC CGGGCCAACC TATAGTATTG GCGGCACTGA CTACACTAAC
GTCGGGGATG CTCTGGCCGC AATTAACACT TCATTTAGTG ATTCTCTCGG TGATGCCCTG
CTCTGGGATG CGACAGCCAA TGACGGTGCT GGTGCATTCA GCGCCGGTCG CGGGGTAGAT
AACACCGCCA GTAAGATTAC TAACGTCGCA AATGGTGCAA TCTCTGCCAC CAGCAGCGAC
GCGATTAACG GCTCACAACT CTATACCACC AATAAGTACA TCGCTGATGC GCTGGGCGGT
AACGCAGAAG TCAACGCTGA CGGCACTATC ACTGCGCCGA CTTACACCAT TGCAAATACC
GATTACAACA ACGTCGGTGA AGCTCTGGAT GCGCTTGATG AGAACGCGTT GCTGTGGGAT
GCGACAGCCA ATAACGGCGA AGGGGCTTAC AACGCCAGTC ATGATGGCAA AGCCAGCATC
ATCACTAATG TCGCTGATGG TAATATCGGG GAAGGCAGCA CCGATGCTAT CAACGGTTCT
CAGCTGTTTA ACACCAATAT GCTGATCCAG CAGAACAGCG AAGTCATTAA TCAGCTTGCT
GGTAACACCA GTGAAACCTA CATCGAAGAA AATGGTGCAG GTATTAACTA TGTGCGTACC
AATGACACCG GTTTAACCTT CACCGATGCC AGCGCACAGG GTGTTGGCGC TACAGCAGTG
GGTTATAACT CTGTTGCTTC CAAAGCCAGC AGCGTAGCCA TTGGTCAGGA CAGCCGCAGC
GAAGTTGAGA CGGGTATCGC CCTGGGTAGC AGTTCCGTTT CCAGCCGTTT AATAGTTAAA
GGTTCTCGTG ACACCAGCGT GTCGGAAGAA GGTGTTGTGA TTGGTTATGA CACAACTGAT
GGTGAACTGC TTGGCGCATT GTCGATCGGT GACGATGGTA AATATCGTCA AATCATCAAC
GTAGCCGATG GTTCCGAAGC CCATGACGCC GTTACGGTTC GCCAGTTGCA AAACGCTATT
GGCGCGGTCG CCACTACGCC AACCAAGTAC TATCACGCCA ACTCAACGGC AGAAGACTCA
CTGGCAGTCG GTGAAGACTC GCTGGCAATG GGCGCAAAAA CCATCGTTAA TGGTAATGCG
GGTATTGGTA TCGGCCTGAA CACTTTAGTT CTGGCTGATG CGATCAATGG TATTGCTATC
GGTAGCAACG CAAGTGCAAA CCATGCAAAC AGCATTGCAA TGGGGAATGG TTCTCAGACT
ACCCGTGGTG CGCAGACCAA CTACAGCGCC TACAACATGG ACGCACCACA GAACTCTGTG
GGTGAGTTCT CTGTCGGCAG TGAAGACGGT CAACGTCAGA TCACCAACGT CGCGGCTGGT
TCAGCGGATA CCGATGCGGT TAACGTGGGT CAGTTGAAAG TAACGGACGC GCAGGTTTCC
CAGAATACCC AGAGCATTAC TAACCTGAAC AATCAGGTGA CGAATCTGGA TACTCGCGTG
ACCAATATCG AAAACGGCAT TGGCGATATC GTAACCACCG GTAGCACCAA ATACTTCAAG
ACCAACACCG ATGGCGTAGA TGCCAACGCG CAGGGTAAAG ACAGTGTTGC AATTGGTTCT
GGTTCCATTG CTGCCGCTGA CAACAGCGTC GCGCTGGGTA CCGGTTCCGT GGCCAACGAA
GAAAACACCA TCTCTGTGGG TTCTTCTACC AACCAGCGCC GTATCACCAA CGTTGCTGCC
GGTGTTAATG CCACCGATGC GGTTAACGTT TCACAACTGA AGTCTTCTGA AGCAGGCGGC
GTTCGCTACG ACACCAAAGC TGATGGCTCT ATCGACTACA GCAACATCAC TCTCGGTGGC
GGAAATGGCG GTACGACTCG CATCAGCAAC GTTTCTGCTG GCGTGAACAA CAACGACGCG
GTGAACTATG CGCAGCTGAA GCAAAGTGTG CAGGAAACGA AGCAATACAC CGATCAGCGC
ATGGTTGAGA TGGATAACAA ACTGTCCAAA ACCGAAAGCA AGTTGAGCGG TGGTATCGCT
TCCGCAATGG CAATGACCGG TCTGCCGCAG GCTTACACGC CGGGAGCCAG CATGGCCTCT
ATTGGTGGTG GTACTTACAA CGGTGAATCG GCTGTTGCTT TAGGTGTGTC GATGGTGAGC
GCCAATGGTC GTTGGGTCTA CAAATTACAA GGTAGTACCA ATAGCCAGGG TGAATACTCC
GCCGCACTCG GTGCCGGTAT TCAGTGGTAA
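The gene sequence is printed in groups of 10 bases, 60 bases per line. A short Python sketch, assuming the block is copied verbatim as a string, strips that spacing and reports length, GC fraction, and the first and last codons. With the full block pasted, the length should come out to 4650 bp, and the GC fraction can be compared with the GC content field. The helper names are illustrative only, and the demo string holds just the first and last lines, so it is not a contiguous stretch of the gene:

```python
def clean_sequence(text: str) -> str:
    """Drop spaces, line breaks, and any position numbers from a formatted sequence."""
    return "".join(ch for ch in text.upper() if ch.isalpha())


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases in seq."""
    acgt = [b for b in seq if b in "ACGT"]
    if not acgt:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(b in "GC" for b in acgt) / len(acgt)


# First and last lines of the gene sequence; paste the whole block for real use.
gene_text = """
ATGAACAAAA TTTTTAAAGT TATCTGGAAT CCAGCGACGG GCAGTTACAG CGTTGCCAGC
GCCGCACTCG GTGCCGGTAT TCAGTGGTAA
"""
seq = clean_sequence(gene_text)
print(len(seq), f"{gc_fraction(seq):.1%}", seq[:3], seq[-3:])
```

The first codon (ATG) and last codon (TAA) are the expected start and stop for a complete CDS.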
 
Protein sequence
MNKIFKVIWN PATGSYSVAS ETAKSRGKKS GRSKLLISAL VAGGLLSSSG ALAQAGLDTG 
TGVTPAGHNN GTGWIAIGTD AEASTHTTTN GAATAVGYYS KALGMWSTAL GAYSESNGNA
SLALGVKAQS NGDRSISMGA SSSASINAGY SIAMGVFAFT DAEYAVALGN ESKALGKYSL
ALGNASQASG ESSIALGNTS EASEQNAIAL GQGSIASKVN SIALGSNSSS AGENAIALGE
GSAAGGSNSL AFGSQSRASG NDSVALGVGA TAATDNSVAI GAGSTTDASN TVSVGNSTTK
RKIVNMAAGA ISNTSTDAIN GSQLYTISDS VAKRLGGGAT VGSDGTVTAL SYALKSGTYN
NVGDALSGID NNTLQWNRTA GAFTAAHGSN TTSKITNVAK GTVSATSTDV VNGSQLYDLQ
QDALLWNGTA FSAAHGTDVT SKITNVTAGE LSDTSTDAVN GSQLKATKDD VAANTTNITN
LTSEVAGNTT SITNLTDTVT NLGEDALKWD DAAGAFTAAH GTNATNKITN VTAGELSDTS
TDAVNGSQLK TTNDNVATNT TNIATNTTNI TNLTNAVDSL GDDSLLWNKA AGAFSAAHGT
DATSKITNVT AGDLTAGSTD AVNGSQLKTT NDNVATNTTN IATNTTNITN LTNAVDSLGD
DSLLWNKAAG AFSAAHGTDA TSKITNVTAG DLTAGSTDAV NGSQLKTTND NVATNTTNIA
TNTTNITNLT DTVNNLGEDA LKWDDAAGAF TAAHGTNATN KISNVQAGIV SSDSTDAING
SQLYGLADSF TSYLGGGADI SDTGVLTGPT YSIGGTDYTN VGDALAAINT SFSDSLGDAL
LWDATANDGA GAFSAGRGVD NTASKITNVA NGAISATSSD AINGSQLYTT NKYIADALGG
NAEVNADGTI TAPTYTIANT DYNNVGEALD ALDENALLWD ATANNGEGAY NASHDGKASI
ITNVADGNIG EGSTDAINGS QLFNTNMLIQ QNSEVINQLA GNTSETYIEE NGAGINYVRT
NDTGLTFTDA SAQGVGATAV GYNSVASKAS SVAIGQDSRS EVETGIALGS SSVSSRLIVK
GSRDTSVSEE GVVIGYDTTD GELLGALSIG DDGKYRQIIN VADGSEAHDA VTVRQLQNAI
GAVATTPTKY YHANSTAEDS LAVGEDSLAM GAKTIVNGNA GIGIGLNTLV LADAINGIAI
GSNASANHAN SIAMGNGSQT TRGAQTNYSA YNMDAPQNSV GEFSVGSEDG QRQITNVAAG
SADTDAVNVG QLKVTDAQVS QNTQSITNLN NQVTNLDTRV TNIENGIGDI VTTGSTKYFK
TNTDGVDANA QGKDSVAIGS GSIAAADNSV ALGTGSVANE ENTISVGSST NQRRITNVAA
GVNATDAVNV SQLKSSEAGG VRYDTKADGS IDYSNITLGG GNGGTTRISN VSAGVNNNDA
VNYAQLKQSV QETKQYTDQR MVEMDNKLSK TESKLSGGIA SAMAMTGLPQ AYTPGASMAS
IGGGTYNGES AVALGVSMVS ANGRWVYKLQ GSTNSQGEYS AALGAGIQW
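Translation table 11 (the bacterial, archaeal, and plant plastid code) assigns the same amino acids as the standard code and differs only in which codons can act as initiators. A minimal Python sketch of translating the CDS under table 11; the amino-acid string and start-codon set are transcribed from NCBI's genetic code tables, and `translate` with its `cds_start` flag is a hypothetical helper. Applied to the first and last lines of the gene sequence (which are in frame, since each full line holds 20 codons), it reproduces the first and last residues of the protein sequence:

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 11: amino acids for codons in TCAG x TCAG x TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Codons that table 11 allows as initiators; an initiator is read as Met.
START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate(dna: str, cds_start: bool = True) -> str:
    """Translate in frame with table 11, stopping at the first stop codon.

    If cds_start is True, the first codon is treated as the initiator.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if i == 0 and cds_start and codon in START_CODONS:
            protein.append("M")  # alternative starts such as GTG still give Met
            continue
        aa = CODON_TABLE.get(codon, "X")  # X for ambiguous bases such as N
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAACAAAA TTTTTAAAGT TATCTGGAAT"))  # MNKIFKVIWN
print(translate("GCCGCACTCG GTGCCGGTAT TCAGTGGTAA", cds_start=False))  # AALGAGIQW
```

The `cds_start` flag matters only for fragments: an internal codon such as ATT or GTG must be read as Ile or Val, not as an initiator Met.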