Gene EcSMS35_4437 details


Gene Information

Locus tag: EcSMS35_4437
Symbol: thiH
ID: 6146882
Type: CDS
Is gene spliced: No
Is pseudo gene: No
Organism name: Escherichia coli SMS-3-5
Kingdom: Bacteria
Replicon accession: NC_010498
Strand:
Start bp: 4535266
End bp: 4536399
Gene Length: 1134 bp
Protein Length: 377 aa
Translation table: 11
GC content: 55%
IMG OID: 641619257
Product: thiamine biosynthesis protein ThiH
Protein accession: YP_001746373
Protein GI: 170681122
COG category: [H] Coenzyme transport and metabolism; [R] General function prediction only
COG ID: [COG1060] Thiamine biosynthesis enzyme ThiH and related uncharacterized enzymes
TIGRFAM ID: [TIGR02351] thiazole biosynthesis protein ThiH
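
The coordinate and length fields above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the start and end positions are 1-based and inclusive (the record does not state this) and that the protein length excludes the stop codon. The function names are illustrative and not part of any database API.

```python
# Values copied from the Gene Information fields above.
start_bp = 4535266
end_bp = 4536399
gene_length_bp = 1134
protein_length_aa = 377


def span_length(start: int, end: int) -> int:
    """Length of a feature, assuming 1-based inclusive coordinates."""
    return end - start + 1


def expected_protein_length(cds_length: int) -> int:
    """Number of codons in the CDS minus the terminal stop codon."""
    if cds_length % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return cds_length // 3 - 1


print(span_length(start_bp, end_bp))            # 1134
print(expected_protein_length(gene_length_bp))  # 377
```

Under these assumptions both derived values agree with the listed gene length (1134 bp) and protein length (377 aa).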


Plasmid Coverage information

Num covering plasmid clones: 30
Plasmid unclonability p-value:
Plasmid hitchhiking: No
Plasmid clonability: normal
 

Fosmid Coverage information

Num covering fosmid clones: 22
Fosmid unclonability p-value: 0.0000208104
Fosmid Hitchhiker: Yes
Fosmid clonability: hitchhiker
 

Sequence

Gene sequence
ATGAAAAACT TCAGCGATCG CTGGCGACAA CTGGACTGGG ACGACATCCG CCTGCGTATC 
AACGGCAAAA CGGCTGCTGA CGTAGAGCGG GCGCTAAATG CCTCGCAACT CACCCGCGAC
GATATGATGG CGCTGTTATC GCCAGCCGCC ATTGGCTATC TGGAACCACT GGCCCAACGG
GCGCAGCGTC TGACCCGTCA ACGATTTGGC AACACTGTTA GCTTCTACGT CCCGCTTTAT
CTTTCCAATC TTTGCGCTAA CGACTGCACG TACTGCGGAT TCTCTATGAG CAACCGCATC
AAGCGCAAAA CGCTGGATGA AGCGGATATT GCCAGGGAAA GCGCCGCTAT ACGGGAGATG
GGCTTTGAAC ATCTGCTTTT AGTCACTGGT GAACATCAGG CGAAAGTGGG GATGGATTAC
TTTCGTCGTC ATCTCCCTGC CCTGCGTGAA CAGTTCTCTT CACTACAGAT GGAAGTGCAA
CCGCTGGCGG AGACGGAATA CGCCGAGTTA AAGCAACTAG GTCTGGATGG CGTGATGGTT
TATCAGGAGA CATATCACGA GGCGACTTAT GCCCGCCATC ATCTGAAAGG TAAAAAACAG
GACTTCTTCT GGCGGCTGGA AACGCCGGAT CGGCTAGGGC ATGCGGGGAT TGATAAGATA
GGCCTCGGCG CGCTAATCGG CCTTTCCGAC AACTGGCGAG TTGACTGCTA TATGGTTGCC
GAACATTTGC TATGGCTGCA ACAACATTAC TGGCAAAGCC GCTACTCTGT CTCCTTCCCA
CGCCTGCGTC CATGTACTGG CGGCATTGAG CCTGCGTCGA TTATGGATGA ACGCCAGTTA
GTGCAAACCA TCTGCGCCTT CCGGCTGCTT GCACCGGAGA TTGAACTGTC ACTCTCCACG
CGGGAATCAC CGTGGTTTCG CGATCGCGTT ATTCCGCTGG CAATTAATAA CGTCAGCGCT
TTTTCGAAAA CGCAGCCAGG TGGCTATGCC GACAATCACC CCGAGCTGGA ACAGTTCTCA
CCACACGACG ATCGCAGGCC GGAAGCGGTT GCTGCCGCGT TAACCGCTCA GGGTTTGCAG
CCGGTATGGA AAGACTGGGA CAGCTATCTG GGACGCGCCT CGCAAAGGCC ATGA
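
The gene sequence is printed in blocks of 10 bases, 60 per line, so it has to be joined before any analysis. As a rough sketch (the helper names `clean_sequence` and `gc_percent` are hypothetical), the blocks can be concatenated and the GC percentage computed as follows. Only the first line of the sequence is used here; pasting in the full sequence should allow a comparison with the GC content reported in the gene information.

```python
def clean_sequence(text: str) -> str:
    """Remove spaces and line breaks from a block-formatted sequence."""
    return "".join(text.split()).upper()


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a cleaned nucleotide sequence."""
    if not seq:
        raise ValueError("empty sequence")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


# First line of the gene sequence, copied from the listing above.
first_line = "ATGAAAAACT TCAGCGATCG CTGGCGACAA CTGGACTGGG ACGACATCCG CCTGCGTATC"
seq = clean_sequence(first_line)

print(len(seq))  # 60
print(round(gc_percent(seq), 1))
```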
 
Protein sequence
MKNFSDRWRQ LDWDDIRLRI NGKTAADVER ALNASQLTRD DMMALLSPAA IGYLEPLAQR 
AQRLTRQRFG NTVSFYVPLY LSNLCANDCT YCGFSMSNRI KRKTLDEADI ARESAAIREM
GFEHLLLVTG EHQAKVGMDY FRRHLPALRE QFSSLQMEVQ PLAETEYAEL KQLGLDGVMV
YQETYHEATY ARHHLKGKKQ DFFWRLETPD RLGHAGIDKI GLGALIGLSD NWRVDCYMVA
EHLLWLQQHY WQSRYSVSFP RLRPCTGGIE PASIMDERQL VQTICAFRLL APEIELSLST
RESPWFRDRV IPLAINNVSA FSKTQPGGYA DNHPELEQFS PHDDRRPEAV AAALTAQGLQ
PVWKDWDSYL GRASQRP
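
The protein sequence corresponds to the gene sequence translated under translation table 11. The sketch below is one way to reproduce that relationship without external libraries. It relies on the fact that table 11 uses the same codon-to-amino-acid assignments as the standard code (the two differ in permitted start codons, which this sketch does not model). The `translate` function is illustrative and raises `KeyError` on ambiguous bases such as N.

```python
from itertools import product

# Codons enumerated in TCAG order, paired with the amino acid string
# for the standard code (shared by translation table 11).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 30 bases of the gene sequence listed above.
print(translate("ATGAAAAACT TCAGCGATCG CTGGCGACAA"))  # MKNFSDRWRQ
```

The output matches the first ten residues of the protein sequence. Passing the complete gene sequence should, under the same assumptions, yield the full 377-residue protein.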