BGF User's Guide
This is the help document for the BGF (Beijing Gene Finder) Web Service. BGF is an ab initio gene-finding program based on DP (Dynamic Programming) and HSMM (Hidden Semi-Markov Model).
Input
BGF accepts sequences in FASTA format only.
The following is an example:
>MT000349
AGCTATCAGCTTATCACCACACACAGACACAGAAGGAAATGTCGGCTTCTGCGGCGCCCACGCACATCCGCTTCTCCTCCGCCGCACCTCCATCGGCCGCCGCTCTCCGGCGGCCTCGCCGCCGGTGCGCCACGCCCGTCCGATGCTCCCTCGCCGCAGCGCCGGGTCTCCGGGCGCCGCCTGAGCTCATCGACTCCATCCTCTCCAAGGTGATGCTGAAACATCATCTCTCTGCTCTTCTTGGGTGCACTCTTTCCGGATTCCTAGCTTCTGCTCAGTGCTCGGTTTATGCAATCTTGTACTCCACATTAGAACACATTTCTTGACTCAGAAATTAACGAGTTCTGGTGTTCCATAGTACTAATTTTAACCTCAATTAGTTTTTTCTTATCTTACACCAAAAAAATTCGGATTTTTGTATAAATCCTGGAGAATATTCCTTATTATTGGTTTCCAATTACTTACCTTTTTCCTTCTGTCATTTTCAGTTTCTTGTATAT..................................GGATCCGTGTATTCCTGCAACACAGTGCCTTGCGCTATGGTCTTTTTTTCTGAGAGCTTGTACTTGTACCTGTAGAATGTAGTGTATGCATCAAGCTGCTGCTACTGAATAAAAGAAAAAAGAAAAATATATGTTGTGGGTTGGGCTGAATGCCTGTACCCCATGAGCACAGGATGCTCTCGATCATTGAGCGTGCTGTGCACGTCGTGGGCCTCCAACTAAAACTGTAATCATCCTTGGGCAGAAGACGGCAGAAATCTTGAACTTTTTGTTTTGTCTTGTTCTTCGGCTGATAATGCTGCTTCTTCTGATAACAATTGCCCCTGGAAATGCTAATAATGTAAGAAGAGCACTGCTATACGT
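For readers preparing input with a script, the following is a minimal Python sketch of how FASTA text is structured: a header line beginning with `>' that names the sequence, followed by one or more lines of sequence letters. It is an illustration only, not part of BGF; the function name read_fasta and the choice to use the first word of the header as the sequence name are assumptions.

```python
from typing import Dict, Iterable, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence name: sequence} from FASTA-formatted lines.

    Hypothetical helper for illustration; not part of BGF.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first word after '>' as the name.
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before any '>' header line")
        else:
            # Sequence line: append to the current record, upper-cased.
            records[name] += line.upper()
    return records


if __name__ == "__main__":
    example = [">MT000349", "AGCTATCAGC", "TTATCACCAC"]
    print(read_fasta(example))
```

A check like this before submitting can catch a missing header line, which is an easy mistake when pasting a sequence into the Sequence box.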
How to run it
First, paste your sequence into the Sequence box in FASTA format. SNP characters are allowed; each one is translated at random to one of the letters it represents. As a special case, `N' is always translated to `C' to avoid unexpected stop codons. Alternatively, you can upload a sequence file from your computer by clicking the File upload button. Then choose a Species and press the Submit button to run BGF. If your sequence contains runs of `N's that mark an assembly gap, set the Gap(`N's) option to the length of that run of `N's.
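The character handling described above can be sketched in Python as follows. This is only an illustration, not BGF's own code: it assumes that `SNP characters' means the standard IUPAC nucleotide ambiguity codes, and the names IUPAC, resolve_ambiguity and n_run_lengths are hypothetical. The second function lists the lengths of the runs of `N's in a sequence, which may help when choosing a value for the Gap(`N's) option.

```python
import random
import re

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
# ASSUMPTION: these are what the guide calls "SNP characters".
# 'N' is deliberately absent: the guide says it is always translated to 'C'.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def resolve_ambiguity(seq, rng=None):
    """Replace ambiguity codes with a concrete base (illustrative only)."""
    rng = rng or random.Random()
    out = []
    for ch in seq.upper():
        if ch == "N":
            out.append("C")  # special case described in the guide
        elif ch in IUPAC:
            out.append(rng.choice(IUPAC[ch]))  # random pick among candidates
        else:
            out.append(ch)  # A, C, G, T and anything else pass through
    return "".join(out)


def n_run_lengths(seq):
    """Return the length of every run of consecutive N's, in order."""
    return [len(m.group()) for m in re.finditer(r"N+", seq.upper())]


if __name__ == "__main__":
    print(resolve_ambiguity("ACNRT", random.Random(0)))
    print(n_run_lengths("ACNNNNGTNNA"))
```

Passing a seeded random.Random makes the random choice repeatable, which is convenient when comparing results across runs of a script; the web service itself gives no such guarantee in this guide.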
BGF output
Gene# - predicted gene number, counted from the start of the sequence;
S - DNA strand (+ for direct or - for complementary);
Exon# - predicted exon number in current gene;
Type - type of coding sequence or transcription site:
Init - first coding exon (starting with the start codon);
Intr - internal exon;
Term - last coding segment (ending with the stop codon);
Sngl - single-exon gene;
Prom - TSS (TATA-box or cap site);
PolA - PolyA signal site;
Start/End - position of the start or end of the Type;
ORF_S/E - positions where the first complete codon starts and the last codon ends;
Score - exon score for the Type;
Len - length of current exon;
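If the output is to be processed by a script, the fields above map naturally onto a small record type. The Python sketch below is an assumption-laden illustration, not an official parser: the row layout is inferred from the sample output in this guide (exon rows carry twelve whitespace-separated tokens, Prom and PolA rows carry only a position and a score), and the names Feature and parse_row are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

EXON_TYPES = {"Init", "Intr", "Term", "Sngl"}
SIGNAL_TYPES = {"Prom", "PolA"}


@dataclass
class Feature:
    gene: int
    strand: str
    kind: str                        # value of the Type column
    start: int
    score: float
    exon: Optional[int] = None       # the fields below are exon-only
    end: Optional[int] = None
    orf_start: Optional[int] = None
    orf_end: Optional[int] = None
    length: Optional[int] = None


def parse_row(line: str) -> Feature:
    """Parse one table row (layout ASSUMED from the guide's sample output)."""
    tok = line.split()
    if len(tok) == 6 and tok[2] in SIGNAL_TYPES:
        # Signal row: Gene# S Type Start - Score
        return Feature(int(tok[0]), tok[1], tok[2], int(tok[3]), float(tok[5]))
    if len(tok) == 12 and tok[3] in EXON_TYPES:
        # Exon row: Gene# S Exon# Type Start - End ORF_S - ORF_E Score Len
        return Feature(int(tok[0]), tok[1], tok[3], int(tok[4]), float(tok[10]),
                       exon=int(tok[2]), end=int(tok[6]),
                       orf_start=int(tok[7]), orf_end=int(tok[9]),
                       length=int(tok[11]))
    raise ValueError(f"unrecognised row: {line!r}")


if __name__ == "__main__":
    exon = parse_row("1 + 1 Intr 27 - 124 29 - 124 1.04 98")
    # In the sample rows, Len equals End - Start + 1.
    print(exon.kind, exon.end - exon.start + 1 == exon.length)
    print(parse_row("2 - PolA 4797 - -0.27"))
```

Splitting on whitespace rather than on fixed column positions keeps the sketch tolerant of alignment differences, at the cost of rejecting any row that does not have exactly the expected number of tokens.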
For example
Program : BGF
Version :
Time : Sun Jan 15 11:48:07 2006
Parameter : Rice
Sequence : MT000349
Length : 10813
GC% : 43.29
Total Genes: 3 (
Total Exons: 18 (
Gene# S Exon# Type   Start       End   ORF_S     ORF_E   Score    Len
===== = ===== ==== ======= = ======= ======= = ======= ======= ======
    1 +     1 Intr      27 -     124      29 -     124    1.04     98
    1 +     2 Intr     598 -     721     598 -     720    8.08    124
    1 +     3 Intr     907 -    1083     909 -    1082    4.89    177
    1 +     4 Intr    1198 -    1259    1200 -    1259    8.05     62
    1 +     5 Intr    1631 -    2030    1631 -    2029    0.05    400
    1 +     6 Intr    2264 -    2295    2266 -    2295    5.06     32
    1 +     7 Intr    2709 -    2851    2709 -    2849    8.26    143
    1 +     8 Intr    3084 -    3150    3085 -    3150   15.12     67
    1 +     9 Intr    3253 -    3330    3253 -    3330   12.25     78
    1 +    10 Intr    3448 -    3593    3448 -    3591    4.16    146
    1 +    11 Term    3839 -    3878    3840 -    3878    4.32     40
    1 +       PolA    4052 -                             -1.87
    2 -       PolA    4797 -                             -0.27
    2 -     1 Term    4915 -    5117    4915 -    5115    9.50    203
    2 -     2 Intr    5587 -    5729    5588 -    5728    6.02    143
    2 -     3 Intr    5958 -    6044    5960 -    6043    7.20     87
    2 -     4 Intr    6862 -    7037    6864 -    7037    5.55    176
    2 -     5 Init    7454 -    7552    7454 -    7552   14.01     99
    2 -       Prom    7872 -                             -4.24
    3 +       Prom    7922 -                             -5.79
    3 +     1 Init    8043 -    8487    8043 -    8486   12.66    445
    3 +     2 Term    9433 -    9497    9435 -    9497   -0.25     65
    3 +       PolA    9996 -                              0.48
Predicted protein(s):
>BGF: Gene:1 Exon(s):11 AA:454 Chain+ H-T+
TEGNVGFCGAHAHPLLLRRTSIGRRSPAASPPVKGTDRGVLLPKDGHQEVADVALQLAKY
CIDDPVKSPLIFGEWEVVYCSVPTSPGGLYRTPLGRLIFKTDEMAQVVQAPDVVKNKVSF
SVFGFDGAVSLKGKLNVLDGKWIQVIFEPPEVKTNEHGYGFLVNPAMKLLLLVYTVFARR
FQHFCRQLLVTEHFWIYEHRQISIKRSRLFQTSKCISIADMPPPACSNVLYGDRTCTVEK
SPLEKENAFLEKPSCSSPHPRRGGVPSSSRVSRLLDGGVAVELPLWDKRSKYSAQSVRAM
PMRVLTVGKKRSRGAQLIVEEYKEKLGYYCDIEDTLIKSNPKLTSDVKVQVEAEDMAMML
QLKPEDFVVVLDENGKDVTSEQVADLVGDAGNTGSSRLTFCIGGPYGFGLQVRERADATI
RLSSMVLNHQVALIVLMEQLYRAWTIIKGQKYHH
>BGF: Gene:2 Exon(s):5 AA:235 Chain- H+T+
MAEADAQTQSRAHSSTAAPVAGETAGEPVGFPQNGAINGAPLMFPVMYPMLMTGMHPQQS
LDDQAQGPGIYAIQQNQFMGSTLMPLTYRIPTESVGAVAGEEQAQDARQQHGPQRQVVVR
RYQTGAITPLLRWLQRAGGAAARPPQAPARPENRAPLAAQNDGNVQPPGGNLADPANNDQ
AAENQEPGAAAANENQQEVDGEGNRRNWLGGVFKEVQLIVVGFVASLLPGFQHND
>BGF: Gene:3 Exon(s):2 AA:169 Chain+ H+T+
MARLLSRTLALARADSAAVPSYGRLHVRGVSSKVEFIEIDLSSEDAPSSSSSSGVEGGGF
GPREMGMRRLEDAIHGVLVRRAAPEWLPFVPGGSYWVPEMRRGVAADLVGTAVRSAIGAA
WNAEAMTEEEMMCLTTMRGWPSEAYFVEDCLEPAVVGWASCLGSFVYMG
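The predicted proteins are themselves written in FASTA style, so they can be collected with a few lines of Python. The sketch below is illustrative only: the header pattern is inferred from the three headers in the sample above, the name read_proteins is hypothetical, and it assumes that AA is the number of residues in the protein. The trailing H/T flags are left unparsed because this guide does not define them.

```python
import re

# Pattern inferred from headers such as
#   >BGF: Gene:1 Exon(s):11 AA:454 Chain+ H-T+
# Whatever follows "Chain+" or "Chain-" is ignored here.
HEADER = re.compile(r"^>BGF: Gene:(\d+) Exon\(s\):(\d+) AA:(\d+) Chain([+-])")


def read_proteins(lines):
    """Collect predicted proteins as a list of dicts (illustrative only)."""
    proteins = []
    for raw in lines:
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            proteins.append({
                "gene": int(m.group(1)),
                "exons": int(m.group(2)),
                "aa": int(m.group(3)),
                "strand": m.group(4),
                "seq": "",
            })
        elif line and proteins:
            # Sequence line: append to the most recent protein.
            proteins[-1]["seq"] += line
    return proteins


if __name__ == "__main__":
    # Made-up toy record, not real BGF output.
    toy = [">BGF: Gene:1 Exon(s):2 AA:5 Chain+ H+T+", "MAR", "LL"]
    for p in read_proteins(toy):
        # Under the assumption above, the two numbers should agree.
        print(p["gene"], p["aa"], len(p["seq"]))
```

Comparing the AA value with the length of the collected sequence is a cheap way to detect a truncated copy of the output.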
Authors : Jin-song Liu, Zhao Xu
Tutors : Bai-lin Hao, Hui-min Xie, Wei-mou Zheng, Guo-ying Li, Jun Wang
Partners: Lin Fang, Jiao Jin, Lei Gao, Heng Li, Hai-hong Li, Yan Li, Zi-xing Xing, Qi-zhai Li, Shao-gen Gao