Eva K. Lee, Todd Easton, and Ellis Johnson
July 2001
In this paper we present theoretical results on a problem we refer to as the minimum weight common mutated sequence (MWCMS) problem. Our motivation for first defining this problem arose from the desire to help quantify the concept of ``best'' representative sequence in the evolutionary distance problem. Given a set of input sequences, the MWCMS problem seeks to mutate every input sequence to the same apriori-unknown sequence using the operations of insertion, deletion and substitution. Weights are assigned for each operation, and the total weight associated with all mutations is to be minimized. We construct a multi-layer supergraph which allows us to model the MWCMS problem and perform theoretical investigations into the complexity of this problem. In particular, we prove that MWCMS is NP-complete. Furthermore, a conflict graph is generated based on the multi-layer supergraph such that a solution to the associated node-packing problem of the conflict graph corresponds to a solution of the MWCMS problem. In this case, we prove that when the number of input sequences is a constant, MWCMS is polynomial-time solvable. We also establish that some well-known combinatorial problems can be viewed as special cases of the MWCMS problem. In particular, we present theoretical results implied by the MWCMS theory for the minimum weight supersequence problem, the minimum weight superstring problem, and the longest common subsequence problem.
Key words: Minimum Weight Common Mutated Sequence, Conflict graph, Node-Packing Polytope, Multiple DNA Sequencing problem.