A linear time algorithm for the PPH problem


In this page we present a new approach to haplotype inference under the perfect phylogeny model. Usually, this problem is known as Perfect Phylogeny Haplotype Problem (pph-problem, in short).

PPH-problem can be formally defined as follows: finding, if possible, a completion H of a given incomplete matrix such that H represents a collection of sets that has a tree representation.

Linear time algorithms that solve  this problem are already known. However, our approach is quite different because it is based on a graph characterization of incomplete matrices. More precisely, we define a partial order among columns of incomplete matrices such that satisfies specific properties. The construction of the poset proceeds incrementally and led, if the input  matrix admits a tree completion, to a graph containing  every possible completion of the matrix. Finally, a  completion can be obtained in linear time from the given graph.

A reference implementation of the algorithm and some examples of input data are given here.