Table of Contents
CpGIslandHMM predicts CpG islands with Hidden Markov Models (HMMs), the Viterbi, and the Baum-Welch algorithm.
🌟 Highlights:
- CpG islands, associated with genetic promoter regions, could be reliably predicted.
- Viterbi and forward/backward algorithms are susceptible to single nucleotide changes.
- Introducing the Baum-Welch algorithm improved the prediction stability considerably.
This project consists of two main parts
CpG islands are regions with a high frequency of CpG sites. Many genes in mammalian genomes have CpG islands associated with the promoter (start of the gene). In mammals, 70% to 80% of CpG cytosines are methylated changing the expression of the gene downstream. This field of study is part of epigenetics.
Hidden Markov Model & Viterbi algorithm
We model CpG islands by creating a Hidden Markov Model (HMM) of the 1st order. The model assumes the presence of two “hidden” states: CpG island and nonCpG island. We estimate parameters of the model by calculating transition, emission and initiation probabilities from a set of sequences. Using the Viterbi algorithm we find the most likely assignment of CpG islands in the test sequence. The results show that changing a nucleotide affects neighbors because the forward/backward algorithm is used which is sequential in nature.
We estimate transition and emission probabilities of the model only based on sequences containing CpG islands without known keys. We use the Baum Welch algorithm, which is a special case of the EM (expectation maximization) algorithm. It makes use of the forward-backward algorithm to compute the statistics for our expectation step.
- The dataset was kindly received from department of Computational Biology at the University of Basel
Found a bug? Want more features? Find something missing in the documentation? Let us know! Please don't hesitate to file an issue and make a recommendation.
CpGIslandHMM - CpG island prediction with Hidden Markov Models, Viterbi, and Baum-Welch algorithm.
The MIT License (MIT)
Copyright (c) 2020 Thierry Grimm
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software ("CpGIslandHMM") and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.