Bayesian Analysis to Describe Genomic Evolution by Rearrangement

Version 1.02 beta, June 11, 2004

Copyright © 2004 by Bret Larget & Don Simon

Running the programs

The main program for sampling trees and parameter values by Markov chain Monte Carlo is badger. The program expects input to be directed from a run control file, which specifies the name of a data file, an optional tree file, and other run characteristics. If the run control file is named defaultrc, the program is called by:

badger < defaultrc

The program may also be run in the background:

badger < defaultrc &

Information on the progress of the run is written to standard output (normally the screen, unless redirected). The program badger opens between eight and ten files for output, depending on how the run controls are set.
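Because progress information goes to standard output, a background run is easier to follow if that output is redirected to a file. The following is a minimal sketch using ordinary shell redirection, not a feature of badger itself. The log file name badger.log is hypothetical, and the guard is there only so the snippet does nothing harmful on a machine where badger or defaultrc is missing.

```shell
rc=defaultrc
log=badger.log   # hypothetical log file name; any writable path works

if command -v badger >/dev/null 2>&1 && [ -r "$rc" ]; then
    # Send progress messages (and any error messages) to the log file.
    badger < "$rc" > "$log" 2>&1 &
    echo "started badger (PID $!), progress is written to $log"
else
    echo "badger or $rc not found; would run: badger < $rc > $log 2>&1 &"
fi
```

While the run is in progress, the log can be followed with tail -f badger.log.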

The root of these file names may be changed in the run control file.

To easily do multiple runs with the same run controls, but with different seeds for the random number generator, use the program genrc. For example, the command

genrc -i default.base.rc -r default.rc -n 4

will generate 4 run control files (default.rc.0, default.rc.1, ...) all using the run controls of default.base.rc but using different seeds.
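One way to launch a run for each generated file is a small shell loop, sketched below. It assumes the files are numbered default.rc.0 through default.rc.3, which is inferred from the file names above rather than confirmed. The log file names run0.log, run1.log, and so on are hypothetical. Whether genrc gives each run control file a distinct output root is not stated here, so check this before starting runs concurrently in one directory, since runs sharing a root could overwrite each other's output.

```shell
rc_base=default.rc   # matches the -r argument given to genrc
n_runs=4             # matches the -n argument given to genrc
planned=0
started=0

i=0
while [ "$i" -lt "$n_runs" ]; do
    rc="$rc_base.$i"     # assumed numbering: default.rc.0 ... default.rc.3
    log="run$i.log"      # hypothetical log file name
    planned=$((planned + 1))
    if command -v badger >/dev/null 2>&1 && [ -r "$rc" ]; then
        # Start this run in the background, keeping its progress output.
        badger < "$rc" > "$log" 2>&1 &
        started=$((started + 1))
    else
        echo "skipping: badger < $rc > $log (badger or $rc not available)"
    fi
    i=$((i + 1))
done

# Block until every background run that was started has finished.
wait
echo "$started of $planned runs were started"
```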

After doing a run, the program summarize may be used to tabulate the results. The command

summarize -s 200 > run1.sum

skips the first 200 lines of, counts and tabulates the number of times each tree topology appears, automatically defines clades and tabulates the transitions between clade subtree topologies, reports the posterior probability of each internal node from the most probable tree topology, summarizes the posterior ignoring differences within clades, and finds a list of the common clades.

A chart comparing the frequencies of the common clades from different summary files can be made using the chart program:

chart run1.sum run2.sum run3.sum run4.sum

Be sure you are familiar with methods of assessing MCMC convergence before using summarized run output for inference.

