Introduction

The analysis of large amounts of SNP data makes it difficult to construct haplotypes and to test their association with traits of interest. Fairly simple methods, such as two- or three-SNP sliding windows, are commonly used to create haplotypes across large regions, but these may be of limited value when adjacent SNPs are in strong linkage disequilibrium (LD) and therefore provide redundant information. We have created a novel program, “HaploBuild”, for constructing and testing haplotypes from SNPs that lie in close physical proximity to one another but are not necessarily contiguous. Furthermore, the number of SNPs in a haplotype is not restricted, which permits the evaluation of complex haplotype structures.
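To make the contrast concrete, the short Python sketch below (not part of HaploBuild; the SNP names, positions, and helper names are hypothetical) compares the marker pairs produced by a two-SNP sliding window with all pairs of SNPs lying within a physical distance d of each other, the kind of non-contiguous combination HaploBuild is designed to consider.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def sliding_windows(order: List[str], width: int) -> List[Tuple[str, ...]]:
    """Contiguous windows of `width` consecutive SNPs (the conventional approach)."""
    return [tuple(order[i:i + width]) for i in range(len(order) - width + 1)]


def proximity_pairs(positions: Dict[str, int], d: int) -> List[Tuple[str, str]]:
    """All SNP pairs within d bp of each other, whether or not they are adjacent."""
    order = sorted(positions, key=positions.get)
    return [
        (a, b)
        for a, b in combinations(order, 2)
        if abs(positions[a] - positions[b]) <= d
    ]


# Hypothetical SNPs and base-pair positions, for illustration only.
positions = {"rs1": 1_000, "rs2": 5_000, "rs3": 12_000, "rs4": 90_000}
order = sorted(positions, key=positions.get)

print(sliding_windows(order, 2))           # [('rs1', 'rs2'), ('rs2', 'rs3'), ('rs3', 'rs4')]
print(proximity_pairs(positions, 50_000))  # [('rs1', 'rs2'), ('rs1', 'rs3'), ('rs2', 'rs3')]
```

With these toy positions, the sliding window pairs rs3 with rs4 even though they are 78 kb apart and never pairs rs1 with rs3; the distance-based enumeration does the opposite.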

Algorithm

The HaploBuild algorithm defines a heuristic for choosing markers that are combined into a haplotype and tested for association with a disease phenotype. Given a set of genotyped markers, the algorithm works in three steps. The first step tests for association all two-marker haplotypes whose markers lie within some physical distance d of each other (typically 50 kb). If the P-value for association of a two-marker haplotype is less than a specified alpha level, that pair of markers is saved for step 2. The second step builds a graph from each two-marker haplotype that reached significance in step 1. The goal of the graph construction is to add markers to the haplotype iteratively, one at a time and in a depth-first manner, with each addition increasing the overall significance of the haplotype association. In this context, the source node represents the base haplotype, and each of its children corresponds to a successful addition of a marker that increases the haplotype length from n to n + 1 markers. Consequently, the sink nodes of a completed tree represent full-length haplotypes: no further marker within distance d of the haplotype markers can be added to them that would lower the P-value of the association test.
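The first two steps can be sketched in Python as follows. This is an illustrative reconstruction from the description above, not HaploBuild's Perl source: HaploBuild obtains P-values from FBAT, whereas here `p_value` is a hypothetical user-supplied function, and `toy_p_value`, the marker names, and the positions are invented for the example. Two further assumptions are labeled in the comments: a candidate marker must lie within d of at least one marker already in the haplotype, and a child is kept only if its P-value is strictly lower than its parent's. The third step is not modeled.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List

# Hypothetical interface: maps a set of marker names to an association P-value.
# In HaploBuild this role is played by FBAT; here it is any user-supplied function.
PValueFn = Callable[[FrozenSet[str]], float]


@dataclass
class Node:
    markers: FrozenSet[str]
    p: float
    children: List["Node"] = field(default_factory=list)


def near(pos: Dict[str, int], a: str, b: str, d: int) -> bool:
    """True if markers a and b lie within physical distance d (bp) of each other."""
    return abs(pos[a] - pos[b]) <= d


def step1_pairs(pos: Dict[str, int], p_value: PValueFn, d: int, alpha: float) -> List[Node]:
    """Step 1: test every two-marker haplotype within distance d; keep those with P < alpha."""
    seeds = []
    for a, b in combinations(sorted(pos), 2):
        if near(pos, a, b, d):
            hap = frozenset((a, b))
            p = p_value(hap)
            if p < alpha:
                seeds.append(Node(hap, p))
    return seeds


def grow(node: Node, pos: Dict[str, int], p_value: PValueFn, d: int) -> None:
    """Step 2: depth-first extension, one marker at a time, keeping only additions that lower P."""
    for m in sorted(pos):
        if m in node.markers:
            continue
        # Assumption: the new marker must lie within d of at least one haplotype marker.
        if not any(near(pos, m, h, d) for h in node.markers):
            continue
        hap = node.markers | {m}
        p = p_value(hap)
        if p < node.p:  # assumption: strict improvement required
            child = Node(hap, p)
            node.children.append(child)
            grow(child, pos, p_value, d)  # recurse before trying the next sibling


def sinks(node: Node) -> List[Node]:
    """Leaves of the tree: haplotypes that no nearby marker can improve further."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in sinks(child)]


def build_haplotypes(pos: Dict[str, int], p_value: PValueFn,
                     d: int = 50_000, alpha: float = 0.05) -> Dict[FrozenSet[str], float]:
    """Run steps 1 and 2 and return each distinct full-length haplotype with its P-value."""
    result: Dict[FrozenSet[str], float] = {}
    for seed in step1_pairs(pos, p_value, d, alpha):
        grow(seed, pos, p_value, d)
        for leaf in sinks(seed):
            result[leaf.markers] = leaf.p
    return result


# Toy example (invented): markers s1, s2 and s4 lower P; s3 raises it; s5 is too far away.
positions = {"s1": 0, "s2": 10_000, "s3": 20_000, "s4": 30_000, "s5": 200_000}
informative = {"s1", "s2", "s4"}


def toy_p_value(hap: FrozenSet[str]) -> float:
    return min(1.0, 0.2 ** len(hap & informative) * 1.5 ** len(hap - informative))


for hap, p in build_haplotypes(positions, toy_p_value).items():
    print(sorted(hap), round(p, 4))  # ['s1', 's2', 's4'] 0.008
```

Because every significant pair is grown independently, different seeds can converge on the same full-length haplotype (here all three seeds end at s1, s2, s4); the sketch keeps each distinct haplotype once.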

Software and modules required to run HaploBuild

Perl Modules
. Tree::Nary
. GraphViz
. Getopt::Std
. Chart::Graph::Gnuplot qw(gnuplot)
. Math::Random

Software
. FBAT
. GraphViz (optional for graph drawing)
. R Project for Statistical Computing
. qvalue package in R
. Haploview (optional, for LD calculations; Linux and Mac users must place the Haploview.jar file in the directory from which HaploBuild will be executed)

Download

Please email me for a copy of the HaploBuild software.

User Manual

HaploBuild_user_manual.pdf

Example Files

example_file.zip

Planned Updates

. Use the batch file option of FBAT to greatly increase the speed of HaploBuild
. Include a chromosome number in the marker information file to accommodate multiple analyses with one set of files
. Allow for case-control study designs by using the R package haplo.stats