Gene Moth_1664 details


Gene Information

Locus tag: Moth_1664
Symbol: thiH
ID: 3831935
Type: CDS
Is gene spliced: No
Is pseudo gene: No
Organism name: Moorella thermoacetica ATCC 39073
Kingdom: Bacteria
Replicon accession: NC_007644
Strand:
Start bp: 1697298
End bp: 1698428
Gene Length: 1131 bp
Protein Length: 376 aa
Translation table: 11
GC content: 54%
IMG OID: 637829589
Product: thiamine biosynthesis protein ThiH
Protein accession: YP_430509
Protein GI: 83590500
COG category: [H] Coenzyme transport and metabolism; [R] General function prediction only
COG ID: [COG1060] Thiamine biosynthesis enzyme ThiH and related uncharacterized enzymes
TIGRFAM ID: [TIGR02351] thiazole biosynthesis protein ThiH
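
The length fields in this record are consistent with the coordinates if Start bp and End bp are read as 1-based, inclusive positions. That convention is an assumption, since the record does not state it. A minimal Python sketch of the arithmetic, using hypothetical helper names that are not part of any database API:

```python
def gene_length_bp(start_bp: int, end_bp: int) -> int:
    """Feature length, assuming 1-based inclusive coordinates (an assumption)."""
    return end_bp - start_bp + 1


def protein_length_aa(gene_len_bp: int) -> int:
    """Residue count for an unspliced CDS whose final codon is a stop codon."""
    if gene_len_bp % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    # One codon per residue, minus the terminal stop codon.
    return gene_len_bp // 3 - 1


length = gene_length_bp(1697298, 1698428)
print(length, protein_length_aa(length))
```

With the coordinates listed above, this reproduces the reported Gene Length (1131 bp) and Protein Length (376 aa).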


Plasmid Coverage information

Num covering plasmid clones: 20
Plasmid unclonability p-value: 0.198557
Plasmid hitchhiking: No
Plasmid clonability: normal
 

Fosmid Coverage information

Num covering fosmid clones:
Fosmid unclonability p-value: 0.0013298
Fosmid Hitchhiker: Yes
Fosmid clonability: hitchhiker
 

Sequence

Gene sequence
ATGGGTTTTT ACGACGTCTA CAAGCAGTAT GAGGGGTTTG ATTTCGAAGG CTTTTTCCAG 
AGTAGGACCC CTGACGACGT CAGGAAGGCC CTGGCAAAGG AGCACCTCGA GGTAACCGAT
TACCTGACCC TTCTATCGCC CGCGGCAGGA AATTTCCTGG AGGAAATGGC CCAAAAAGCC
CACCGTATAA CCCTGAGGAA TTTCGGCCGG GTCATATTTC TCTTTACACC GTTATACCTG
TCCGACTACT GCGTGAACCA GTGCGCCTAC TGCAGTTTCA ATGCCCGGAA TAAATTTGCC
CGGACCAAGC TCACCTTAGA GCAGGTCGAA GAAGAAGCCA GGGCCATAGC CCAAACAGGA
ATGAAAGATA TCCTCATCCT GACGGGAGAA TCGCGCCAGC ACAATCCGGT GTCGTATATA
AAGGACTGCG TCGGTGTTTT AAAGAAGTAT TTCTGCAGTA TTTGCATAGA AGTCTATCCC
CTGGAAGAAG AGGAGTACCG GGAGCTGGTA GCAGCCGGGG TGGATGGCCT CACCATGTTT
CAGGAAGTCT ATGACCCCGG AGTCTACGCC AGGTACCATA ACGGTCCCAA GAAAAATTAC
CATTACCGGC TGGACGCCCC GGAAAGGAGC TGCCGGGCGG GTATGCGGAC CGTGGGTGTC
GGGGCCCTGC TGGGCCTGGC CGACTGGCGG AAGGAGGCCT TCTTCACCGG ACTGCACGCC
GATTATTTGC AGCAAAAGTT CTGGGATGTG CAGGTCAGTA TCTCTTTGCC CAGATTTCGC
CCTAGTATCG GCGGCTTTCA ACCCGACTAC CCGGTGGACG ACAAGAGCTT CGTCCAGATC
CTCCTGGCCC ACAGGCTGTT TTTACCCCGG GTCGGCATAA CCATTTCCAC CAGGGAAAGC
CCCGAGTTCC GGGACAACAT CCTACCCCTG GGTGTCACGA AAATATCGGC CGGTTCTTCC
GTTACGGTGG GAGGCTATGC CCGTCCTGAC GGCATGGCAC CCCAGTTTGA AATATCCGAC
CCGCGTAGTG TAGCGGAAAT AAAACAAATG CTAATCCAGA AGGGCTACCA GCCGGTTTTC
GAAGACTGGC AGCAGTGGGA TAGCCTGGAG AAACAGCTAT ATAATTTCTA G
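
The gene sequence above is laid out in space-separated blocks of ten bases. The following is a minimal sketch, in plain Python with hypothetical helper names, of how such a layout might be flattened and how a GC fraction (the quantity reported as GC content) could be computed. For brevity it uses only the first line of the sequence, so its result is not the whole-gene figure.

```python
def clean_sequence(text: str) -> str:
    """Remove spaces and line breaks from a block-formatted sequence."""
    return "".join(text.split()).upper()


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


# First line of the gene sequence, copied from the record.
first_line = "ATGGGTTTTT ACGACGTCTA CAAGCAGTAT GAGGGGTTTG ATTTCGAAGG CTTTTTCCAG"
seq = clean_sequence(first_line)
print(len(seq), f"{gc_fraction(seq):.0%}")
```

Passing the full gene sequence to `clean_sequence` and then `gc_fraction` would give a value to compare against the GC content field.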
 
Protein sequence
MGFYDVYKQY EGFDFEGFFQ SRTPDDVRKA LAKEHLEVTD YLTLLSPAAG NFLEEMAQKA 
HRITLRNFGR VIFLFTPLYL SDYCVNQCAY CSFNARNKFA RTKLTLEQVE EEARAIAQTG
MKDILILTGE SRQHNPVSYI KDCVGVLKKY FCSICIEVYP LEEEEYRELV AAGVDGLTMF
QEVYDPGVYA RYHNGPKKNY HYRLDAPERS CRAGMRTVGV GALLGLADWR KEAFFTGLHA
DYLQQKFWDV QVSISLPRFR PSIGGFQPDY PVDDKSFVQI LLAHRLFLPR VGITISTRES
PEFRDNILPL GVTKISAGSS VTVGGYARPD GMAPQFEISD PRSVAEIKQM LIQKGYQPVF
EDWQQWDSLE KQLYNF
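
The protein sequence can be checked against the gene sequence by translating codons. Translation table 11 uses the same codon-to-amino-acid assignments as the standard code and differs in which start codons it permits, so the sketch below uses the standard assignments. The function name `translate` and the table construction are illustrative, not taken from the source database.

```python
from itertools import product

# Standard codon assignments in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = "".join(cds.split()).upper()  # drop block spacing and line breaks
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 30 bases of the gene sequence from the record.
print(translate("ATGGGTTTTT ACGACGTCTA CAAGCAGTAT"))
```

For these 30 bases the result is MGFYDVYKQY, which matches the first ten residues of the protein sequence listed above.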