BADGER

Bayesian Analysis to Describe Genomic Evolution by Rearrangement

Version 1.02 beta, June 11, 2004

Copyright © 2004 by Bret Larget & Don Simon


Model of Genome Rearrangement

Our model of genome rearrangement has the following characteristics.
  1. Tree topology is uniformly distributed on a set of (possibly constrained) unrooted tree topologies.
  2. Each edge length, which represents the expected number of gene inversions on that edge, is drawn independently from a gamma distribution. The two hyper-parameters of this distribution, mu (the mean) and psi (the variance divided by the mean), can either be selected or estimated from the neighbor-joining tree.
  3. Given the edge lengths, the numbers of realized inversions on the edges are mutually independent and have Poisson distributions whose means are the respective edge lengths. The gamma prior is conjugate for the Poisson mean, and the unconditional distribution of the number of inversions per edge is negative binomial.
  4. Given the realized number of inversions on an edge, the times of those inversions are distributed independently and uniformly at random along the edge.
  5. Each inversion is equally likely to be any of the possible inversions, and the inversions are mutually independent.
  6. An arbitrary labeling of genes may be assigned to the genome arrangement at any node of the tree. Arrangements at each other node on the tree are then determined by the complete history of realized inversions.
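
As an illustration of characteristics 2 through 5, the following Python sketch simulates the inversion history on a single edge. It is not BADGER's code. It assumes a linear signed gene order (so an inversion is a pair of cut points and there are n(n+1)/2 possible inversions), times rescaled to the interval [0, 1), and a gamma distribution with shape mu/psi and scale psi, which gives mean mu and variance/mean psi. The function names are hypothetical.

```python
import random


def sample_poisson(mean, rng):
    """Draw a Poisson count by counting unit-rate arrivals in [0, mean]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def apply_inversion(genome, start, stop):
    """Reverse genome[start:stop] and flip the sign of each gene in it."""
    segment = [-gene for gene in reversed(genome[start:stop])]
    return genome[:start] + segment + genome[stop:]


def simulate_edge(genome, mu, psi, rng):
    """Simulate one edge: gamma length, Poisson count, uniform times,
    and uniformly chosen inversions (linear genome assumed)."""
    # Shape mu/psi and scale psi give mean mu and variance mu*psi.
    length = rng.gammavariate(mu / psi, psi)
    count = sample_poisson(length, rng)
    # Times are independent uniforms; sorting gives the order of events.
    times = sorted(rng.random() for _ in range(count))
    history = []
    for time in times:
        # Two distinct cut points among n+1 positions: n(n+1)/2 choices.
        start, stop = sorted(rng.sample(range(len(genome) + 1), 2))
        genome = apply_inversion(genome, start, stop)
        history.append((time, start, stop))
    return length, history, genome


if __name__ == "__main__":
    rng = random.Random(2004)
    length, history, final = simulate_edge(list(range(1, 11)), 3.0, 1.5, rng)
    print("edge length:", length)
    print("inversions:", history)
    print("arrangement at far node:", final)
```

Starting from the labeled arrangement at one node and applying the recorded inversions in time order yields the arrangement at the other end of the edge, as described in characteristic 6.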

The expression for the unnormalized posterior of a complete history includes a discrete component (tree topology, gene inversion counts, and an ordered list of specific gene inversions on each edge) as well as a continuous component (edge lengths and times of inversions). The unnormalized posterior for the discrete component may be computed analytically by integrating out the continuous component. The discrete component is the state space of the Markov chain for MCMC computation.
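
As a worked step, the negative binomial distribution of the inversion count k on a single edge follows from integrating the edge length out of the Poisson probability. The derivation below assumes the gamma prior is written with shape alpha = mu/psi and scale psi, which matches the stated mean mu and variance/mean psi; the symbols alpha, lambda, and k are introduced here for illustration. It covers only the count on one edge, not the full unnormalized posterior.

```latex
\begin{align*}
P(k)
  &= \int_0^\infty \frac{e^{-\lambda}\lambda^{k}}{k!}\,
     \frac{\lambda^{\alpha-1} e^{-\lambda/\psi}}{\Gamma(\alpha)\,\psi^{\alpha}}
     \, d\lambda,
     \qquad \alpha = \mu/\psi \\
  &= \frac{1}{k!\,\Gamma(\alpha)\,\psi^{\alpha}}
     \int_0^\infty \lambda^{k+\alpha-1} e^{-\lambda(1+1/\psi)} \, d\lambda \\
  &= \frac{\Gamma(\alpha+k)}{k!\,\Gamma(\alpha)}
     \left(\frac{1}{1+\psi}\right)^{\alpha}
     \left(\frac{\psi}{1+\psi}\right)^{k},
     \qquad k = 0, 1, 2, \ldots
\end{align*}
```

Under this parameterization the count has mean mu and variance mu(1 + psi).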

A more thorough description of the model is available as a PDF.


This page was most recently updated on May 25, 2004.

badger@badger.duq.edu