Bayesian statistics in phylogenetic inference

Scheepens, J.F. (2006) Bayesian statistics in phylogenetic inference. Master's Thesis / Essay, Biology.

Abstract

The development of traditional approaches to phylogeny inference has been governed by the trade-off between available computing time and the accuracy of the method. The Neighbor-Joining method is the simplest and fastest of the discussed methods, but its lack of an optimality criterion (i.e. a numerical formula which calculates the optimal phylogeny according to the criterion) severely reduces the probability of finding the true phylogenetic tree. The Maximum Parsimony method does include an optimality criterion: it searches for the tree with the fewest nucleotide substitutions. The main problem with this method is that it can thereby underestimate the number of nucleotide changes that actually took place, depending on the evolutionary time involved. The Maximum Likelihood method searches for the tree that fits the data best by calculating likelihood values for the data given each possible tree. Although this approach is very accurate, it is computationally heavy and therefore restricted with regard to the number of taxa and/or the number of informative nucleotide positions used in the phylogenetic inference. Bayesian inference of phylogeny, like Maximum Likelihood, includes a likelihood value, but transforms it into a posterior probability, which indicates the degree of belief in a certain tree given the data. The procedure allows the incorporation of prior knowledge, which is subsequently updated into the posterior probability in light of the new data. Like the Maximum Likelihood method, Bayesian inference is very accurate but slow. The development of the Markov chain Monte Carlo (MCMC) algorithm, which estimates posterior probabilities by considering a subset of all trees while slowly converging to the optimal tree, made the use of Bayesian inference of phylogeny feasible. Because Bayesian inference with MCMC is both accurate and fast, it has gained much attention and is increasingly used in answering phylogenetic questions.
Bayesian inference of phylogeny does face some difficulties. First, the use of prior knowledge is controversial because it is thought to introduce subjectivity into the calculation, which is harmful when the prior exerts much influence on the outcome. Second, it is hard to judge when the MCMC has run long enough to have converged to the optimal tree. Third, Bayesian inference is said to be too liberal, i.e. to present posterior probabilities that are too high. Tricks exist that overcome the first two points of criticism, the use of prior knowledge and the length of the MCMC run. The third point, concerning the liberality of the calculation, is a more difficult problem, which will hopefully be solved in the future. Despite these difficulties, Bayesian inference is still the best method available at the moment since it is both fast and accurate.
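
As a rough illustration of the idea summarised in the abstract (a likelihood combined with a prior to give a posterior probability, which MCMC then approximates by sampling trees), the following minimal Python sketch may help. It is not taken from the thesis: the three tree topologies, their likelihood values, the uniform prior, and the function names `exact_posterior` and `mcmc_posterior` are all hypothetical and chosen only to keep the example small. A simple Metropolis sampler stands in for the more elaborate MCMC schemes used by real phylogenetic software.

```python
import random

# Hypothetical likelihoods P(data | tree) for three candidate topologies.
# The numbers are invented for illustration and come from no real alignment.
LIKELIHOOD = {"((A,B),C)": 0.006, "((A,C),B)": 0.003, "((B,C),A)": 0.001}

# Uniform prior over topologies (an assumption made for this sketch).
PRIOR = {tree: 1.0 / len(LIKELIHOOD) for tree in LIKELIHOOD}


def exact_posterior():
    """Bayes' theorem by enumeration: posterior = prior * likelihood / evidence."""
    unnorm = {t: PRIOR[t] * LIKELIHOOD[t] for t in LIKELIHOOD}
    evidence = sum(unnorm.values())
    return {t: v / evidence for t, v in unnorm.items()}


def mcmc_posterior(n_steps=200_000, burn_in=1_000, seed=1):
    """Estimate posterior probabilities as visit frequencies of a Metropolis chain."""
    rng = random.Random(seed)
    trees = list(LIKELIHOOD)
    current = trees[0]
    counts = {t: 0 for t in trees}
    for step in range(n_steps):
        # Symmetric proposal: pick one of the other topologies uniformly.
        proposal = rng.choice([t for t in trees if t != current])
        # Acceptance ratio uses only prior * likelihood; the evidence cancels.
        ratio = (PRIOR[proposal] * LIKELIHOOD[proposal]) / (
            PRIOR[current] * LIKELIHOOD[current]
        )
        if rng.random() < min(1.0, ratio):
            current = proposal
        # Discard the first burn_in steps, then count where the chain sits.
        if step >= burn_in:
            counts[current] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


if __name__ == "__main__":
    exact = exact_posterior()
    estimate = mcmc_posterior()
    for tree in exact:
        print(tree, round(exact[tree], 3), round(estimate[tree], 3))
```

With only three trees the posterior can be computed exactly, so the sampler is unnecessary here; the comparison simply shows that visit frequencies approach the exact values. In realistic problems the number of possible trees is far too large to enumerate, which is where sampling becomes useful, and where judging whether the chain has run long enough becomes the practical difficulty the abstract mentions.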

Item Type:	Thesis (Master's Thesis / Essay)
Degree programme:	Biology
Thesis type:	Master's Thesis / Essay
Language:	English
Date Deposited:	15 Feb 2018 07:31
Last Modified:	15 Feb 2018 07:31
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/9122
