Computational Advances in Maximum Likelihood Methods for Molecular Phylogeny

Abstract

We have developed a generalization of Kimura’s Markov chain model for base substitution at a single nucleotide site. This generalized model incorporates more flexible transition rates and consequently allows irreversible as well as reversible chains. Because the model embodies just the right amount of symmetry, it permits explicit calculation of finite-time transition probabilities and equilibrium distributions. The model also meshes well with maximum likelihood methods for phylogenetic analysis. Quick calculation of likelihoods and their derivatives can be carried out by adapting Baum’s forward and backward algorithms from the theory of hidden Markov chains. Analysis of HIV sequence data illustrates the speed of the algorithms on trees with many contemporary taxa. Analysis of some of Lake’s data on the origin of the eukaryotic nucleus contrasts the reversible and irreversible versions of the model.

The object of phylogenetic analysis is to infer the correct evolutionary relationships among three or more contemporary species from amino acid or nucleotide sequence data on a representative member of each species. As more nucleotide sequence data accumulate from a multitude of contemporary species and as computers become ever faster, phylogenetic methods should evolve accordingly. In our opinion, properly implemented maximum likelihood methods are the best vehicles for statistical inference. First, maximum likelihood methods use all of the available data. Second, they can incorporate realistic evolutionary models. (They can also incorporate naive models.) Third, they permit estimation of parameters and their associated standard errors. Fourth, and finally, they allow comparison of different evolutionary trees through their maximum likelihood statistics. Competing methods of analysis, such as maximum parsimony, involve much less computation but neglect relevant portions of the data. They also ordinarily do not yield reliable parameter estimates.

In this paper we present a generalization of Kimura’s model for base substitution at a single nucleotide site. Our generalization strikes a good compromise between parameter richness and analytic tractability. By including just enough symmetry, we can extend Kimura’s arguments for deducing finite-time transition probabilities from his two-parameter model to our eight-parameter model. We also describe an algorithm that simultaneously calculates the likelihood and its derivatives quickly. These calculations can be implemented on a parallel computer for extremely fast analysis of many evolutionary trees. We demonstrate the feasibility of the model and the new likelihood algorithms by analyzing HIV sequence data and RNA sequence data bearing on the origin of the eukaryotic nucleus.

The model adopted in this paper specifies the expected branch length separating a mother and a daughter node of an evolutionary tree as the average number of substitutions per nucleotide position (site) between the nodes. This expectation is a function of the conditional probability p_ij (t) = Pr (Z_t = j|Z ₀ = i) that the daughter node occupies state j at time tgiven that the mother node occupies state i at time 0. Herei and j are any of the four possible nucleotides, namely, the purines A (adenine) and G (guanine) and the pyrimidines C (cytosine) and T (thymine). The probabilities p_ij (t) are entries of the finite-time transition matrix P(t) characterizing the Markov chain Z_t of successive nucleotides occurring along the branch at the given site. Under the Markov chain model, the probability of the observed nucleotides at the site depends on (1) the rooted tree connecting the contemporary taxa, (2) the distribution of nucleotides at the root, and (3) the matrix functionP(t). Phylogenetic reconstruction using maximum likelihood (Felsenstein 1981) attempts to infer the correct tree by maximizing the likelihood of the observed data with respect to entries of the various matrices P(t) or parameters determining these entries. The tree yielding the largest maximum likelihood is declared to be the best tree.

For the purposes of this paper, we make the simplifying assumption that the nucleotide patterns at different sites are independently and identically distributed. Many models for the 4 × 4 matrixP(t) have been proposed. Barry and Hartigan (1987)place no restrictions on P(t). Because all row sums of P(t) equal 1, this entails 12 parameters per branch. In practice, this many parameters can not be reliably estimated, even though, in some cases, the tree topology and branch lengths can be. Beginning with Kimura (1980) and Felsenstein (1981), various researchers have constructed models that capture the changes occurring at each site as a continuous-time Markov chain involving the four states A, G, C, and T. These time-homogeneous models are nicely summarized by Yang (1994). If in this contextω _ij is the infinitesimal transition rate of moving from state i to state j ≄′i and if ω_ii = −Σ_{j ≄′ i}ω_ij , thenP(t) can be expressed as the matrix exponential

P (t) = e^{t Ω} = \sum_{k = 0}^{\infty} \frac{t^{k}}{k!} Ω^{k}

of the infinitesimal generator (or infinitesimal transition matrix) Ω = (ω_ij ). Formula 1 permits, in principle, estimation of the off-diagonal entries of Ω and a branch timet_b for each branch b of an evolutionary tree. Because the branch times are confounded with the infinitesimal transition rates, it is necessary to set one arbitrary branch time equal to 1 and to determine all other branch times relative to this reference branch time.

In practice, numerical evaluation of matrix exponential 1 is a major computational bottleneck hindering maximum likelihood estimation (Yang 1994). Thus, models such as Kimura’s (1980) that yield analytic expressions for the entries of P(t) are extremely valuable. Kimura’s highly symmetric two-parameter model incorporates a common infinitesimal rate α for all evolutionary transitions (purine to purine or pyrimidine to pyrimidine) and a common infinitesimal rate λ for all evolutionary transversions (purine to pyrimidine or pyrimidine to purine). the underlying Markov chain along any branch is reverisble.

The Generalized Kimura Model

The infinitesimal generator under our generalization of Kimura’s model is

{ label needed for disp-formula[@id='E2'] }

Λ = \begin{matrix} A \\ G \\ C \\ T \end{matrix} (\begin{matrix} A & G & C & T \\ - (α + γ + λ) & α & γ & λ \\ ε & - (ε + γ + λ) & γ & λ \\ δ & κ & - (δ + κ + β) & β \\ δ & κ & ς & - (δ + κ + ς) \end{matrix})

This model obviously makes fewer biological assumptions than Kimura’s model. In essence, we impose the constraint that the infinitesimal rate for a transversion depends only on the destination state and not on the initial state. In contrast, Lake’s model (Lake 1987) of evolutionary parsimony imposes the opposite constraint that the infinitesimal rate for a transversion depends only on the initial state and not on the destination state. Oddly enough, Lake’s model does not lead to analytic expressions for the entries ofP(t), whereas ours does.

Kimura’s two-parameter model is a special case of the model determined by Equation { label needed for disp-formula[@id='E2'] }. In view of Kolmogorov’s circulation criterion (Kelly 1979), it is easy to verify that Λ in Equation { label needed for disp-formula[@id='E2'] } leads to a reversible Markov chain if and only if

{ label needed for disp-formula[@id='E3'] }

β γ = λ ς, α δ = ε κ

When these reversibility conditions hold, Λ parametrizes the special case of Tavare’s (1986) general reversible model discussed by Tamura and Nei (1993).

Other special cases of the infinitesimal generator Λ appear in the dnaml program of Felsenstein’s computer package PHYLIP (Kishino and Hasegawa 1989; Felsenstein and Churchill 1996) and in the nucml program of the MOLPHY package (Hasegawa et al. 1985; Adachi et al. 1996). PHYLIP depends on two parameters k and u and the root probabilities (π_A, π_G, π_C, π_T). These five independent parameters define the entries of Λ as follows:

α = (\frac{k}{π_{A} + π_{G}} + 1) π_{G} u

β = (\frac{k}{π_{C} + π_{T}} + 1) π_{T} u

ε = (\frac{k}{π_{A} + π_{G}} + 1) π_{A} u

ς = (\frac{k}{π_{C} + π_{T}} + 1) π_{C} u

(γ, λ, δ, κ) = u (π_{C}, π_{T}, π_{A}, π_{G})

Equivalently,

u = δ + κ + λ + γ

k = \frac{(α - κ) + (ε - δ) + (β - λ) + (ς - γ)}{2 (δ + κ + γ + λ)}

(π_{A}, π_{G}, π_{C}, π_{t}) = u^{- 1} (δ, κ, γ, λ)

In practice, PHYLIP either requires the user to specify the components of the stationary distribution π or estimates these by the average nucleotide frequencies observed in the existing organisms. The two remaining parameters u and k are subject to the constraints u > 0,k + π_A + π_G > 0, andk + π_C + π_T > 0. Note that the PHYLIP model satisfies the reversibility criteria (Equation { label needed for disp-formula[@id='E3'] }).

If branch lengths are unconstrained and equilibrium has been reached, then phylogenies with underlying reversible Markov chains can not be rooted (Felsenstein 1981). This pulley principle of Felsenstein fails if any of the three necessary conditions mentioned are violated. Thus, in principle, methods based on the irreversible matrix Λ can single out a best rooted tree when sufficient data are available. We will comment further on this point in our application to Lake’s data.

In calculating the finite-time transition matrix P(t) for the infinitesimal generator Λ, we pattern our arguments afterKimura (1980), who exploits symmetry and derives two coupled linear differential equations for the entries of P(t). In our calculations, it is convenient to let Y denote either pyrimidine and R either purine. We defineq_AY (t) to be the probability the chain is in either of the two pyrimidines C or T at timet given it starts in A at time 0. We likewise defineq_YA (t) to be the probability the chain is in A at time t given it starts in either pyrimidine at time 0 and q_YY (t) to be the probability the chain is in either pyrimidine at time t given it started in either pyrimidine at time 0. These probabilities all make sense owing to the symmetry assumptions incorporated in Λ. Similar probabilities can be defined for the purines instead of the pyrimidines.

The probability q_AY (t + h) satisfies the Chapman–Kolmogorov relation

q_{AY} (t + h) = q_{AY} (t) q_{YY} (h) + \sum_{k \notin Y} p_{Ak} (t) q_{kY} (h)

= q_{AY} (t) (1 - q_{YA} (h) - q_{YG} (h))

+ p_{AA} (t) q_{AY} (h) + p_{AG} (t) q_{GY} (h)

Here p_AA (t) andp_AG (t) are entries of the matrixP(t). For small h, equation 6 becomes

q_{AY} (t + h) = q_{AY} (t) (1 - δ h - κ h) + p_{AA} (t) (γ + λ) h + p_{AG} (t) (γ + λ) h + o (h)

= q_{AY} (t) + (λ + γ) h - (δ + κ + λ + γ) q_{AY} (t) h + o (h)

Rearranging terms and taking the limit as h → 0 in the resulting difference quotient yields the forward differential equation

q ‘_{AY} (t) = c_{2} - c_{1} q_{AY} (t),

where

c_{1} = δ + κ + γ + λ

c_{2} = γ + λ

In view of the initial condition q_AY (0) = 0, equation (7) has solution

{ label needed for disp-formula[@id='E8'] }

q_{AY} (t) = \frac{c_{2}}{c_{1}} (1 - e^{- c_{1} t})

We next solve for p_AG (t) by writing the forward approximation

p_{AG} (t + h) = p_{AG} (t) (1 - ε h - γ h - λ h) + q_{AY} (t) κ h

+ [1 - q_{AY} (t) - p_{AG} (t)] α h + o (h)

Repetition of our previous arguments then gives the forward differential equation

{ label needed for disp-formula[@id='E9'] }

p ‘_{AG} (t) = - c_{3} p_{AG} (t) + c_{4} q_{AY} (t) + α

where

c_{3} = α + γ + ε + λ

c_{4} = κ - α

Substituting the solution (Equation { label needed for disp-formula[@id='E8'] }) forq_AY (t) in Equation { label needed for disp-formula[@id='E9'] } and noting the initial condition p_AG (0) = 0, we find after straightforward calculus that

p_{AG} (t) = \frac{c_{2} c_{4} + α c_{1}}{c_{1} c_{3}} - \frac{c_{2} c_{4}}{c_{1} (c_{3} - c_{1})} e^{- c_{1} t}

+ \frac{c_{2} c_{4} - α (c_{3} - c_{1})}{c_{3} (c_{3} - c_{1})} e^{- c_{3} t}

From the identity p_AG (t) +p _AA(t) +q_AY (t) = 1, we finally harvest the entry p_AA (t) ofP(t).

In summary, we have calculated the entriesp_AG (t) andp_AA (t) and the sump_AY (t) =p_AC (t) +p_AT (t) from the top row [p_AA (t),p_AG (t),p_AC (t),p_AT (t)] of P(t). To fill out the top row, we briefly indicate how to calculate similar expressions for q_RR (t) andq_RC(t) =p_AC (t), where R stands for either purine. The forward differential equations forq_RR (t) and forq_RC (t) turn out to be

{ label needed for disp-formula[@id='E10'] }

q ‘_{RR} (t) = - c_{1} q_{RR} (t) + c_{5}

and

{ label needed for disp-formula[@id='E11'] }

q ‘_{RC} (t) = - c_{6} q_{RC} (t) + c_{7} q_{RR} (t) + ς

where c ₁ is defined as above and

c_{5} = δ + κ

c_{6} = δ + κ + ς + β

c_{7} = γ - ς

The solution of Equation { label needed for disp-formula[@id='E10'] } satisfying the initial conditionq_RR (0) = 1 is

q_{RR} (t) = \frac{c_{5} + (c_{1} - c_{5}) e^{- c_{1} t}}{c_{1}}

and the solution of Equation { label needed for disp-formula[@id='E11'] } satisfying the initial conditionq_RC (0) = 0 is

q_{RC} (t) = \frac{c_{5} c_{7} + ς c_{1}}{c_{1} c_{6}} + \frac{(c_{1} - c_{5}) c_{7}}{c_{1} (c_{6} - c_{1})} e^{- c_{1} t}

- \frac{c_{7} (c_{6} - c_{5}) + ς (c_{6} - c_{1})}{c_{6} (c_{6} - c_{1})} e^{- c_{6} t}

(13)

This completes the top row of P(t).

The remaining rows of P(t) can be similarly calculated and are provided in the . In the limit ast → ∞, the top row of P(t), along with all other rows, tends to the equilibrium distribution π with components

π_{A} = \frac{ε (δ + κ) + δ (γ + λ)}{(α + γ + ε + λ) (γ + δ + κ + λ)}

π_{G} = \frac{α (δ + κ) + κ (γ + λ)}{(α + γ + ε + λ) (γ + δ + κ + λ)}

(14)

π_{C} = \frac{γ (δ + κ) + ς (γ + λ)}{(β + δ + κ + ς) (γ + δ + κ + λ)}

π_{T} = \frac{λ (δ + κ) + β (γ + λ)}{(β + δ + κ + ς) (γ + δ + κ + λ)}

It is straightforward to check that π satisfies the balance condition π Λ = 0 characterizing equilibrium. When reversibility holds, the equilibrium distribution (equation 14) reduces to the equilibrium distribution of the PHYLIP model as listed in equation 5.

Likelihood Computation

Maximum likelihood methods are notoriously slow compared to competing methods such as maximum parsimony (Yang 1994). However, every advance in computing speed makes maximum likelihood methods more attractive and the excuses for preferring other methods less compelling. Of course, fast hardware is only one of the reasons for the improved position of maximum likelihood. Better algorithms also play a role. In this section we sketch an algorithm for the fast computation of the derivatives of the likelihood of an evolutionary tree. This algorithm extends the peeling method already in place for likelihood computation (Felsenstein 1981; Waterman 1996) and builds on an extension (Lange et al. 1995) of Baum’s algorithms from the theory of hidden Markov chains (Baum 1972; Devijver 1985). While derivative-free methods of optimization are viable, quick optimization almost always depends on the use of derivatives, computed either exactly or numerically. The decisive advantage of derivative based methods is already clear in timed comparisons of dnaml with its sister program fastDNAml (Olsen 1994); dnaml employs a derivative-free method and fastDNAml Newton’s method.

Before introducing the peeling algorithm and describing how to modify it to compute derivatives, it is helpful to introduce several definitions. Imagine an evolutionary tree drawn with its root at the top and its tips at the bottom. Every internal node n of the tree, including the root, generates three subtrees rooted atn. One of these is simply the subtree descending fromn and ultimately reaching a subset of descendant tips. This subtree can be divided into a left subtree and a right subtree by taking either the left or right branch descending from n.There is also a complementary subtree starting at the original root and containing all of the branches not included in the subtree rooted atn. Figure 1 illustrates the four subtrees at node 5 of a four-taxon tree. Subfigure a is the whole tree, c is the subtree rooted at 5, d is the left subtree, e is the right subtree, and b is the complementary subtree.

Figure 1.

A four-taxon tree with subtrees at node 5.

Open in new tab Download PowerPointLink to figure

In the peeling algorithm, several probabilities enter into the calculation of the likelihood P of the tree at a single site. These are (1) the conditional probabilityT _m → n(i,j) that the daughter node n of a branch m → nis in state j given that the mother node m is in state i; (2) the conditional probabilityD_n (i) of the subtree rooted at noden given node n is in state i; (3) the conditional probability L_n (i) of the left subtree rooted at node n given node n is in statei; (4) the conditional probabilityR_n (i) of the right subtree rooted at noden given node n is in state i; and (5) the joint probability U_n (i) of node nbeing in state i and the complementary subtree showing its observed bases at its tips. In definition 1, note thatT_m → n (i,j) is nothing more than an entry of the finite-time transition matrixP(t) pertaining to the branchm → n. The letters D and U inD_n (i) and U_n (i) of definitions (2) and (5) are chosen to indicate downward and upward directions in the tree.

In the modified peeling algorithm, we make two passes through the tree. In the first pass, we traverse the tree in a postorder (upward) fashion and compute and store the conditional probabilities [L_n (i), R_n (i),D_n (i)] as we visit interior noden. When we visit the root node at the end of the first pass, we recover the likelihood P of the tree. In the second pass, we traverse the tree in a preorder (downward) fashion and compute the derivatives of the likelihood with respect to each of the model parameters using the stored quantities [L_n (i), R_n (i),D_n (i)] as we visit node n.

More specifically in the postorder phase of the algorithm, we employ the recurrence relations

L_{m} (i) = \sum_{j} T_{m \to l} (i, j) D_{l} (j)

R_{m} (i) = \sum_{j} T_{m \to r} (i, j) D_{r} (j)

to compute the conditional probabilitiesL_m (i) andR_m (i), where l is the left daughter of m, and r is the right daughter ofm. When r and l are tips with observed states s_l and s_r, the above recurrences reduce to

L_{m} (i) = T_{m \to l} (i, s_{l})

R_{m} (i) = T_{m \to r} (i, s_{r})

Given the conditional probabilitiesL_m (i) andR_m (i), we compute

D_{m} (i) = L_{m} (i) R_{m} (i)

Derivation of these more or less obvious recurrences is left to the reader. Clearly, the computations on node m can only be carried out after m’s two daughter nodes have been processed. A postorder transversal obeys this convention. Once we reach the root, we recover the likelihood via

{ label needed for disp-formula[@id='E15'] }

P = \sum_{i} π_{i} D_{root} (i)

where π_i is the prior probability of the root being in state i.

In the preorder phase of the algorithm, we exploit the product rule of differentiation. Any given model parameter θ may impinge on the root distribution π_i and one or more of the branch transition probabilities T_{m → n} (i,j). The product rule implies that the partial derivative of P with respect to θ decomposes as

{ label needed for disp-formula[@id='E16'] }

\frac{\partial}{\partial θ} P = \frac{\partial}{\partial θ} P_{root} + \sum_{m \to n} \frac{\partial}{\partial θ} P_{m \to n}

in notation that will be clear momentarily. The branch contribution (∂/∂θ)P _{m → n} can be computed by replacing the function T _{m →n} (i,j) by its partial derivative (∂/∂θ)T _{m → n}(i,j) in the likelihood representations

P = \sum_{i} \sum_{j} U_{m} (i) R_{m} (i) T_{m \to n} (i, j) D_{n} (j)

involving a left branch and

P = \sum_{i} \sum_{j} U_{m} (i) L_{m} (i) T_{m \to n} (i, j) D_{n} (j)

a right branch, respectively. In view of Equation { label needed for disp-formula[@id='E15'] }, we collect the first contribution

\frac{\partial}{\partial θ} P_{root} = \sum_{i} \frac{\partial}{\partial θ} π_{i} D_{root} (i)

when we commence the preorder traversal at the root.

Readers can rightfully object that the recipes for computing (∂/∂θ)P _{m → n} invoke the undefined vectors U_m (i) and, whenn is a tip, D_n (j). The second vector, D_n (j), can be defined asD_n (j) = 1_{{j=s
_n} _}, wheres_n is the base at the tip. Specification ofU_m (i) is more problematic. By definition U_root (i) = π_i. In general, if m → nis a left branch, then

U_{n} (j) = \sum_{i} U_{m} (i) R_{m} (i) T_{m \to n} (i, j)

and if m → n is a right branch, then

U_{n} (j) = \sum_{i} U_{m} (i) L_{m} (i) T_{m \to n} (i, j)

These recurrences require that mother nodes be visited before daughter nodes. A preorder traversal of a phylogenetic tree achieves this node visiting strategy.

As an example of the two-pass algorithm, consider the four-taxon tree shown in Figure 1a. Pass one and two consist of the steps listed in Tables 1 and 2. The partial derivatives (∂/∂θ)π_i and (∂/∂θ)T _{m → n}(i,j) are calculated exactly and stored in a table prior tothe start of pass two. When node n is visited during pass two,U_n (j) is computed first, then (∂/∂θ)P _{m → n} is computed for each parameter θ in the model. Because the (∂/∂θ)T _{m → n}(i,j) are already computed and stored in a table, computing (∂/∂θ)P _{m → n} for a right branch requires only that the termsU_m (i),R_m (i), (∂/∂θ)T _{m→ n}(i,j), andD_n (j) be multiplied and summed overi and j. For a left branch, we substituteL_m (i) forR_m (i) in this calculation. When all of the nodes have been visited, the partial derivatives of Pare recovered via Equation { label needed for disp-formula[@id='E16'] }.

Table 1.

Likelihood Computation for Fig. 1a

Step	Node	Action
1	5	L ₅(i) = T _5→1 (i, s ₁)
2	5	R ₅(i) = R _5→2 (i, s ₂)
3	5	D ₅(i) = L ₅(i) R ₅(i)
4	6	L ₆(i) = T _6→3(i, s ₃)
5	6	R ₆(i) = T _6→4 (i, s ₄)
6	6	D ₆(i) = L ₆(i) R ₆(i)
7	7	L ₇(i) = ∑_j T _7→5(i,j)D ₅(j)
8	7	R ₇(i) = ∑_j T _7→6(i,j)D ₆(j)
9	7	D ₇(i) = L ₇(i)R ₇(i)
10	7	P = ∑_i π_i D ₇(i)

Open in new tabLink to table

Table 2.

Computation of Derivatives for Fig. 1a

Step	Node	Action
1	7	$U_{7} (i) = π_{i}$
2	7	$\frac{\partial}{\partial θ} P_{7} = \sum_{i} \frac{\partial}{\partial θ} π_{i} D_{7} (i)$
3	5	$U_{5} (j) = \sum_{i} U_{7} (i) R_{7} (j) T_{7 \to 5} (i, j)$
4	5	$\frac{\partial}{\partial θ} P_{7 \to 5} = \sum_{i} \sum_{j} U_{7} (i) R_{7} (i) \frac{\partial}{\partial θ} T_{7 \to 5} (i, j) D_{5} (j)$
5	1	$\frac{\partial}{\partial θ} P_{5 \to 1} = \sum_{i} U_{5} (i) R_{5} (i) \frac{\partial}{\partial θ} T_{5 \to 1} (i, s_{1})$
6	2	$\frac{\partial}{\partial θ} P_{5 \to 2} = \sum_{i} U_{5} (i) L_{5} (i) \frac{\partial}{\partial θ} T_{5 \to 2} (i, s_{2})$
7	6	$U_{6} (j) = \sum_{i} U_{7} (i) L_{7} (j) T_{7 \to 6} (i, j)$
8	6	$\frac{\partial}{\partial θ} P_{7 \to 6} = \sum_{i} \sum_{j} U_{7} (i) L_{7} (i) \frac{\partial}{\partial θ} T_{7 \to 6} (i, j) D_{6} (j)$
9	3	$\frac{\partial}{\partial θ} P_{6 \to 3} = \sum_{i} U_{6} (i) R_{6} (i) \frac{\partial}{\partial θ} T_{6 \to 3} (i, s_{3})$
10	4	$\frac{\partial}{\partial θ} P_{6 \to 4} = \sum_{i} U_{6} (i) L_{6} (i) \frac{\partial}{\partial θ} T_{6 \to 4} (i, s_{4})$

Open in new tabLink to table

Likelihood Maximization and Simulated Annealing

We have implemented the extension of Kimura’s model described above and the likelihood algorithms sketched in the previous section in a C++ program named LINNEAUS. Maximum likelihood estimation is performed within LINNEAUS by a C++ version of the optimization program SEARCH (Lange 1997). SEARCH minimizes a nonlinear function by the method of recursive quadratic programming, with quasi-Newton updates of the approximate second differential (Hessian) of the objective function based on successive values of the first differential (gradient) of the objective function (Polak 1997). It is possible to implement Newton’s method using SEARCH, but the second derivatives in our model are cumbersome to compute, and LINNEAUS avoids explicit calculation of them. Exploiting the ability of SEARCH to perform numerical differentiation, we estimate that the modified peeling algorithm described in the previous section approximately doubles the speed of LINNEAUS (data not shown). These timing trials have the beneficial side effect of validating our code for the derivatives ofP(t). Finally, SEARCH permits upper and lower bounds and linear equality constraints to be imposed on parameters. Linear constraints are convenient because the reversibility conditions (Equation { label needed for disp-formula[@id='E3'] }) can be linearized by taking logarithms. Reparameterizing with log parameters has the added virtue of eliminating nonnegativity constraints.

SEARCH needs initial parameter values and an initial approximate Hessian. If maximum likelihood estimation is performed for a sequence of similar trees, then the parameter estimates from one tree can be used as initial parameter values for the next tree. The identity matrix serves as the default initial Hessian in SEARCH. Within LINNEAUS, the subprogram SEARCH minimizes the negative loglikelihood

- L = \sum_{i = 1}^{m} L_{i}

where L_i is the loglikelihood of the data at sitei of m sites. The sum of cross products of scores

H = \sum_{i = 1}^{m} {dL}_{i}^{t} {dL}_{i}

provides an alternative to the identity matrix in approximating the Hessian

- d^{2} L = - \sum_{i = 1}^{m} d^{2} L_{i}

In this regard, note that H is non-negative definite and has the same expectation as −d ² L relative to the data (Rao 1973).

Maximum likelihood estimation for a single tree must be fast enough to accommodate comparison of large numbers of different trees through their maximum likelihoods. With n taxa, there are (2n − 3)!/[2ⁿ⁻²(n − 2)!] rooted trees and (2n − 5)/[2ⁿ⁻³(n − 3)!] unrooted trees. These numbers are manageable for n ≤ 7, but exhaustive consideration of all trees is out of the question forn > 10, and we must turn to heuristic methods of searching tree space. For example, Felsenstein (1981) suggests building an n-taxon tree by starting with any two-taxon subtree and successively adding one taxon at a time. At each stage the new taxon is attached to the existing branch that gives the subtree with largest maximum likelihood. Because the initial pair of taxa is arbitrary, Felsenstein further recommends repeating the construction using different rearrangement orders. Unfortunately, even further refinements of this greedy strategy still examine only a small fraction of tree space (Olsen 1994; Yang 1994).

Because the combinatorial optimization method of simulated annealing permits a wider search, Lundy (1985) suggested its use in phylogenetic reconstruction. For the sake of brevity, we omit a description of simulated annealing and refer readers to the references (Kirkpatrick 1983; Press 1988) for motivation and most details of implementation. The unique elements in the current setting are the energy function, the proposal mechanism, and the initial temperature. It is natural to define the energy of a tree to be its negative maximum loglikelihood. In the proposal stage, we randomly select any node other than the root node, delete the subtree rooted at this node, and reattach the subtree to a randomly selected branch among the remaining branches. The new tree is accepted with the Metropolis probability

min {1, e^{- [f ({tree}_{new}) - f ({tree}_{cur})] ∣ t_{cur}}}

where f is the energy function (the negative maximum loglikelihood) and t_cur is the current temperature. To obtain a good starting temperature, we randomly generate 10 trees and set the initial temperature equal to a constant times the largest absolute difference between the corresponding energies. The constant 2 has proved to be satisfactory for up to 10 taxa. Because simulated annealing gives no guarantee of convergence to the best tree under realistic cooling schedules, we typically conduct several independent runs.

Program Performance

One obvious way of improving program speed is to exploit parallel processing. The fastDNAml program of Olsen et al. (1994) achieves superior performance by splitting likelihood computation for a phylogenetic tree into several processes that take advantage of massively parallel and clustered computing architectures. In writing LINNEAUS in Microsoft Visual C++ 5.0, we chose the most natural synchronous multithreading approach to program parallelization. LINNEAUS simply assigns each available processor to a different site of an aligned sequence. We achieve a further gain in efficiency by amalgamating those sites with a common pattern of observations and multiplying the loglikelihood and score computed for the pattern by the number of participating sites. Note that most sites in an alignment show complete agreement over all taxa; the exceptions (or segregating sites) generate most of the patterns. Under a symmetric multiprocessing operating system such as Windows NT 4.0, our current code detects the number of processors available at runtime and assigns a different observational pattern to each processor. When a processor finishes its thread, that is, computing the corresponding loglikelihood and score, it is reassigned to a new pattern unless none remains. Because most of the time in likelihood maximization is spent in these threads, our computing times per likelihood maximization are nearly inversely proportional to the number of available processors. Linear scalability of this sort is one of the hallmarks of good parallel code.

To test the performance of LINNEAUS, we examined 12 full-length HIV genome sequences available from the Los Alamos HIV web site (Myers 1996). These taxa were obtained from 10 individuals and also include one clade B consensus sequence. The size (∼10,000 nucleotide sites per sequence) and the number of segregating sites (∼15% of the sites per sequence) represent a challenging computational problem. We selected 20 groups of 4, 5, and 6 taxa each and 7 groups of 10 taxa each from the original 12 taxa. Table 3 lists the average time, plus or minus one standard deviation, necessary to maximize the likelihood of one tree for the different size trees. The ALR Revolution 2× desktop computer used in these timing studies has two 300 MHz Intel Pentium II processors, each with a 512-kb L2 cache and 512 Mb of RAM. We have replicated the linear scalability results evident in Table 3 on an older four-processor machine. These four-processor results are comparable to the two-processor results and are omitted. We have also partially ported our code to a 14-processor Sun Ultra Enterprise 5000 with 1GB of RAM running under the Solaris operating system. Our preliminary results on this high-end computer demonstrate that in excess of 100 likelihoods can be maximized per second for 10-taxon trees.

Table 3.

Per Tree Timing Results for Maximum Likelihood

Number of taxa	Number of segregating sites	Single-processor times (sec)	Multiprocessor times (sec)
4	114–180	0.282 ± 0.062	0.149 ± .0336
5	195–641	0.706 ± 0.250	0.365 ± 0.156
6	307–931	1.54 ± 0.45	0.756 ± 0.248
10	902–2163	4.08 ± 0.96	2.02 ± 0.43

Open in new tabLink to table

Data Example

In this section, we apply our generalization of Kimura’s model to a portion of the four-taxon data used by Lake (1988) to infer the closest prokaryotic relative of the eukaryotes. Lake aligned ribosomal RNA from a eukaryote (Aretemia salina) and from three prokaryotes, the eocyte (Desulfurococcus mobilis), the halobacterium (Halococcus morrhuae), and the eubacterium (Bacillus subtilis). Here we examine a subset of 1092 aligned nucleotide quartets from Lake’s data. Adopting his designation, let E designate the topology with A. salina and D. mobilisas sister taxa, let F designate the topology with A. salinaand H. morrhuae as sister taxa, and let G designate the topology with A. salina and B. subtilis as sister taxa (see Fig. 2).

Figure 2.

The E, F, and G topologies represent all possible topologies for the four-taxon case.

Open in new tab Download PowerPointLink to figure

Table 4 gives parameter estimates and maximum loglikelihoods (base e) for the most probable trees under four different models. Model 1 is the equilibrium version of F84 model described in Kishino and Hasegawa (1989). Model 2 is our model (Equation { label needed for disp-formula[@id='E2'] }) with the reversibility restrictions (Equation { label needed for disp-formula[@id='E3'] }) and equilibrium at the root imposed. Both models 1 and 2 satisfy the pulley principle and therefore, cannot be used to root trees. Model 3 coincides with model 2 except that the equilibrium assumption is relaxed. This entails adding three additional parameters to determine the root distribution. Model 4 coincides with model 3 except that the reversibility assumption is relaxed. Simulated data that we omit show that models 3 and 4 can be used to root trees.

Table 4.

Parameter Estimates for the Best Trees Under Four Sets of Modeling Assumptions

Parameter	Model 1	Model 2	Model 3	Model 4
α			0.155 ± 0.016	0.125 ± 0.015	0.051 ± 0.013
β			0.174 ± 0.019	0.200 ± 0.021	0.217 ± 0.023
γ			0.084 ± 0.007	0.071 ± 0.007	0.078 ± 0.009
δ			0.080 ± 0.007	0.092 ± 0.009	0.060 ± 0.009
ε					0.174 ± 0.018
κ			0.107 ± 0.009	0.083 ± 0.008	0.117 ± 0.013
λ			0.067 ± 0.006	0.086 ± 0.008	0.078 ± 0.009
ς					0.134 ± 0.025
u	0.333 ± 0.023
κ	0.464 ± 0.077
π_A	0.233 ± 0.013			0.202 ± 0.012	0.193 ± 0.012
π_G	0.316 ± 0.014			0.362 ± 0.015	0.373 ± 0.015
π_C	0.245 ± 0.013			0.290 ± 0.014	0.297 ± 0.015
π_T	0.206 ± 0.012			0.141 ± 0.011	0.137 ± 0.011
Branch 1	1 ± 0	1 ± 0	1 ± 0	1 ± 0
Branch 2	0.232 ± 0.047	0.234 ± 0.047	.001	0.110 ± 0.044
Branch 3	0.538 ± 0.065	0.539 ± 0.065	0.544 ± 0.065	0.568 ± 0.065
Branch 4	1.650 ± 0.154	1.642 ± 0.153	1.671 ± 0.155	1.690 ± 0.154
Branch 5	0.194 ± 0.052	0.188 ± 0.051	0.196 ± 0.051	0.190 ± 0.050
Branch 6			0.215 ± 0.046	0.082 ± 0.050
Best tree	topology E	topology E	tree 5	tree 5
Iterations	11	11	11	11
Max log likelihood	−4606.6	−4598.2	−4555.7	−4536.2
χ²(df)	140.8 (7)	124.0 (6)	39.0 (2)

Open in new tabLink to table

Under all four models, the best tree has the E topology. For models 1 and 2, the maximum loglikelihoods of the next best topology, the F topology in both cases, are −4614.2 and −4605.2, respectively. Tree 5 in Figure 3 depicts the best rooted tree for both models 3 and 4. Unfortunately, we have little confidence in this rooting. In the case of model 4, the two remaining trees in the E topology have maximum loglikelihoods of −4537.3 and −4537.4, respectively. Similarly inconclusive results hold for model 3. Under models 3 and 4, the best maximum loglikelihoods for trees with the F or G topology (−4564.1 and −4544.2, respectively) are substantially worse than the maximum likelihoods listed in Table 4. LINNEAUS reached the maximum likelihood estimates for all four models listed in Table 4in just 11 iterations.

Figure 3.

The five rooted trees corresponding to the unrooted tree with the E topology. These rooted trees are obtained by placing the root in each of the five branches of the unrooted tree.

Open in new tab Download PowerPointLink to figure

Models 1, 2, and 3 are nested within model 4. Within the context of the best topology, or where applicable the best rooted tree, we can therefore conduct likelihood ratio tests of the three submodels versus model 4. The χ² statistics given at the bottom of Table 4overwhelming reject each of the submodels. Clearly, assuming that the four nucleotides are at equilibrium is a poor idea in these data. This impression is reinforced by the empirical frequencies shown in Table5. Relaxing the reversibility restrictions also substantially improves model fit. Finally, the fact that we estimate the length of branch 2 in model 3 at the somewhat artificial lower bound of 0.001 is another indication of the superiority of model 4.

Table 5.

Observed Base Proportions for the Four Contemporary Taxa

Base	A. salina (eukaryote)	D. mobilis (eocycte)	H. morrhuae (halobacterium)	B. subtilis (eubacterium)
A	0.250	0.232	0.202	0.255
G	0.319	0.328	0.367	0.279
C	0.230	0.263	0.290	0.234
T	0.202	0.178	0.141	0.232

Open in new tabLink to table

Because all four models represent restrictions of the multinomial model with 4⁴ − 1 = 255 degrees of freedom (Navidi et al. 1991), goodness of fit can be roughly assessed by comparing the maximum loglikelihood (−4536.2) under the best tree of model 4 with the maximum loglikelihood (−4361.3) under the unrestricted multinomial model. The resulting χ² test statistic χ² ₂₃₈ = 349.8 has asymptoticP value ≅0.0001. This apparent lack of fit may spring from the failure of any of three different assumptions: (1) the transversion symmetries in our matrix Λ, (2) the applicability of large sample approximations to sparse data, and (3) the assumption that the various sites act independently and identically. Although it is impossible to make a firm judgment in the current case, it is worth noting that the phenomenon of heterogeneity of substitution rates across sites is well-documented (Gojobori 1982; Yang 1993). In particular, Rzhetsky (1995) shows substantial variation in the substitution rates between the stem and loop regions of 16S-like ribosomal RNA genes from higher eukaryotes.

Conclusions

Maximum likelihood is at an obvious computational disadvantage in molecular phylogeny. To a lesser extent, it is also handicapped by the simplicity of the models it traditionally invokes. The thrust of the current paper is that advances in model construction and algorithm design can go hand in hand. Although our generalization of Kimura’s base-substitution model involves a more complicated infinitesimal generator, the new generator Λ retains just enough symmetry to permit exact calculation of the matrix exponential e^tΛ and its corresponding equilibrium distribution π. Furthermore, this more flexible model can be coupled with better algorithms for maximum likelihood estimation and search of tree space.

A key ingredient in the rapid optimization of any function is the ability to eval-

uate it and its derivatives quickly. Our new algorithm for likelihood differentiation should be viewed in this light. Combining this algorithm with a good, general purpose optimization subprogram and parallelizing the resulting code leads to very fast maximum likelihood estimation per tree. In fact, we now consider our likelihood optimization methods fast enough to harness simulated annealing in a search of tree space for the best tree. While simulated annealing is slower than competing methods in phylogenetic inference, it is slower precisely because it analyzes a greater number of trees, and so, tends to the best tree with greater probability.

Our infinitesimal generator Λ, which is irreversible in general, opens up the possibility of rooting trees. As Felsenstein’s celebrated pulley principle shows, rooting is impossible in a reversible model when the root frequencies are at equilibrium and no constraints are imposed on branch lengths. Abandoning any of these three assumptions makes it possible in principle to root a tree. For instance, if constraints are imposed on branch lengths by assuming a molecular clock, it is possible to root a tree. However, the practical difficulty of rooting in the presence of limited data has led many researchers to dismiss the possibility altogether. Yang (1994) raises the additional numerical concern that the eigenvalues of the infinitesimal generator may be complex when reversibility fails. It is interesting to note that all eigenvalues of our matrix Λ are real.

Our experience only partially bears out the pessimism about rooting. Although topologically equivalent trees usually have similar maximum likelihoods and the root position within the best tree is usually poorly determined, we do see substantial increases in loglikelihoods when passing from simple reversible models to our more complex irreversible model. In our opinion, the massive amounts of data coming out of the various genome sequencing projects will fundamentally change the course of phylogenetic inference. Instead of aligned sequences of 10³ sites, we may well have aligned sequences of 10⁶–10⁸ sites. Of course, this quantitative shift merely emphasizes the necessity of further gains in computing efficiency if maximum likelihood is to be the phylogenetic inference engine of choice.

In any event, the results presented in this article are meant to demonstrate the feasibility of major improvements in computing speed in maximum likelihood methods. Researchers interested in a free copy of LINNEAUS should contact one of the authors.

We thank Professor James A. Lake for providing the aligned DNA sequences for our Section 6 example. The authors would also like to thank the reviewers and the editor for their helpful comments and clarifications. Research was supported in part by U.S. Public Health Service grant GM53275 (to K.L.) and CA16042 (to J.S.S.) and National Institute of General Medical Sciences (NIGMS) Systems and Integrative Biology Training grant GM08185-11 (to E.E.S.).

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

If the constants c ₁ throughc ₇ are defined as in the text, and if

c_{8} = δ - ε

c_{9} = λ - β,

then the 16 entries of P(t) are

p_{AA} (t) = π_{A} - \frac{c_{2} c_{8}}{c_{1} (c_{3} - c_{1})} e^{- c_{1} t} + \frac{α (c_{3} - c_{1}) - c_{2} c_{4}}{c_{3} (c_{3} - c_{1})} e^{- c_{3} t}

p_{AG} (t) = π_{G} - \frac{c_{2} c_{4}}{c_{1} (c_{3} - c_{1})} e^{- c_{1} t} + \frac{c_{2} c_{4} - α (c_{3} - c_{1})}{c_{3} (c_{3} - c_{1})} e^{- c_{3} t}

p_{AC} (t) = π_{C} + \frac{c_{2} c_{7}}{c_{1} (c_{6} - c_{1})} e^{- c_{1} t} + \frac{λ ς - γ β}{c_{6} (c_{6} - c_{1})} e^{- c_{6} t}

p_{AT} (t) = π_{T} + \frac{c_{2} c_{9}}{c_{1} (c_{6} - c_{1})} e^{- c_{1} t} + \frac{β γ - λ ς}{c_{6} (c_{6} - c_{1})} e^{- c_{6} t}

p_{GA} (t) = π_{A} - \frac{c_{2} c_{8}}{c_{1} (c_{3} - c_{1})} e^{- c_{1} t} + \frac{c_{2} c_{8} - ε (c_{3} - c_{1})}{c_{3} (c_{3} - c_{1})} e^{- c_{3} t}

p_{GG} (t) = π_{G} - \frac{c_{2} c_{4}}{c_{1} (c_{3} - c_{1})} e^{- c_{1} t} + \frac{ε (c_{3} - c_{1}) - c_{2} c_{8}}{c_{3} (c_{3} - c_{1})} e^{- c_{3} t}

p_{GC} (t) = p_{AC} (t)

p_{GT} (t) = p_{AT} (t)

p_{CA} (t) = π_{A} + \frac{c_{5} c_{8}}{c_{1} (c_{3} - c_{1})} e^{- c_{1} t} + \frac{ε κ - δ α}{c_{3} (c_{3} - c_{1})} e^{- c_{3} t}

p_{CG} (t) = π_{G} + \frac{c_{5} c_{4}}{c_{1} (c_{3} - c_{1})} e^{- c_{1} t} + \frac{δ α - ε κ}{c_{3} (c_{3} - c_{1})} e^{- c_{3} t}

p_{CC} (t) = π_{C} - \frac{c_{5} c_{7}}{c_{1} (c_{6} - c_{1})} e^{- c_{1} t} + \frac{β (c_{6} - c_{1}) - c_{5} c_{9}}{c_{6} (c_{6} - c_{1})} e^{- c_{6} t}

p_{CT} (t) = π_{T} - \frac{c_{5} c_{9}}{c_{1} (c_{6} - c_{1})} e^{- c_{1} t} + \frac{c_{5} c_{9} - β (c_{7} - c_{1})}{c_{6} (c_{6} - c_{1})} e^{- c_{6} t}

p_{TA} (t) = p_{CA} (t)

p_{TG} (t) = p_{CG} (t)

p_{TC} (t) = π_{C} - \frac{c_{5} c_{7}}{c_{1} (c_{6} - c_{1})} e^{- c_{1} t} + \frac{c_{5} c_{7} - ς (c_{6} - c_{1})}{c_{6} (c_{6} - c_{1})} e^{- c_{6} t}

p_{TT} (t) = π_{T} - \frac{c_{5} c_{9}}{c_{1} (c_{6} - c_{1})} e^{- c_{1} t} + \frac{ς (c_{6} - c_{1}) - c_{5} c_{7}}{c_{6} (c_{6} - c_{1})} e^{- c_{6} t}

The infinitesimal generator Λ has eigenvalues 0, −c ₁, −c ₃, and −c ₆ and corresponding right eigenvectors

ν_{1} = (1, 1, 1, 1)^{t}

ν_{2} = {(1, 1, - \frac{c_{5}}{c_{2}}, - \frac{c_{5}}{c_{2}})}^{t}

ν_{3} = {(\frac{α (c_{5} - c_{3}) + κ c_{2}}{δ (c_{3} - c_{2}) - ε c_{5}}, \frac{- ε (c_{5} - c_{3}) - δ c_{2}}{δ (c_{3} - c_{2}) - ε c_{5}}, 1, 1)}^{t}

ν_{4} = {(1, 1, \frac{β (c_{2} - c_{6}) + λ c_{5}}{γ (c_{6} - c_{5}) - ς c_{2}}, \frac{- ς (c_{2} - c_{6}) - γ c_{5}}{γ (c_{6} - c_{5}) - ς c_{2}})}^{t}

These can be checked by straightforward substitution.

Notes

[1] Corresponding author.

Notes

[2] E-MAIL [email protected]; FAX (650) 855-6115.

REFERENCES

↵
J. AdachiM. Hasegawa (1996) MOLPHY, Version 2.3. (The Institute of Statistical Mathematics, Tokyo, Japan).
Google Scholar
↵
D. BarryJ.A. Hartigan (1987) Statistical analysis of hominoid molecular evolution. Stat. Sci. 2:191–210.
Google Scholar
↵
L.E. Baum (1972) An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities 3:1–8.
Google Scholar
↵
P.A. Devijver (1985) Baum’s forward-backward algorithm revisited. Pattern Recognition Lett. 3:369–373.
Google Scholar CrossRef
↵
J. Felsenstein (1981) Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 17:368–376.
Google Scholar CrossRef PubMed Web of Science
↵
J. FelsensteinG.A. Churchill (1996) A Hidden Markov Model approach to variation among sites in rate of evolution. Mol. Biol. Evol. 13:93–104.
Google Scholar CrossRef PubMed Web of Science
↵
T. GojoboriK. IshiiM. Nei (1982) Estimation of average number of nucleotide substitutions when the rate of substitution varies with nucleotides. J. Mol. Evol. 18:414–423.
Google Scholar CrossRef PubMed Web of Science
↵
M. HasegawaH. KishinoT. Yano (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.
Google Scholar CrossRef PubMed Web of Science
↵
F.P. Kelly (1979) Reversibility and stochastic networks. (Wiley, New York, NY).
Google Scholar
↵
M. Kimura (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
Google Scholar CrossRef PubMed Web of Science
↵
S. KirkpatrickC.D. GelattM.P. Vecchi (1983) Optimization by simulated annealing. Science 220:671–680.
Google Scholar CrossRef PubMed Web of Science
↵
H. KishinoM. Hasegawa (1989) Evaluation of the maximum likelihood estimates of the evolutionary tree topologies from DNA sequence data and the branching order in Hominoidea. J. Mol. Evol. 29:170–179.
Google Scholar CrossRef PubMed Web of Science
↵
J.A. Lake (1987) A rate-independent technique for analysis of nucleic acid sequences: Evolutionary parsimony. Mol. Biol. Evol. 4:167–191.
Google Scholar PubMed Web of Science
↵
(1988) Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequences. Nature 331:184–186, ibid.
Google Scholar CrossRef PubMed Web of Science
↵
K. Lange (1997) SEARCH, Version 3.0 (University of Michigan, Ann Arbor, MI).
Google Scholar
↵
K. LangeM. BoehnkeD.R. CoxK.L. Lunetta (1995) Statistical methods for polyploid radiation hybrid mapping. Genome Res. 5:136–150.
Google Scholar CrossRef Abstract/Full Text PubMed Web of Science
↵
M. Lundy (1985) Applications of the annealing algorithm to combinatorial problems in statistics. Biometrika 72:191–198.
Google Scholar CrossRef Web of Science
↵
G. MyersB. KorberS. Wain-HobsonK.T. JeangL.E. HendersonG.N. Pavlakis (1996) Human retroviruses and AIDS. (Los Alamos National Laboratory, Los Alamos, NM).
Google Scholar
↵
W.C. NavidiG.A. ChurchillA. von Haeseler (1991) Methods for inferring phylogenies from nucleic acid sequence data by using maximum likelihood and linear invariants. Mol. Biol. Evol. 8:128–143.
Google Scholar PubMed Web of Science
↵
G.J. OlsenH. MatsudaR. HagstromR. Overbeek (1994) fastDNAml: A tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput. Appl. Biosci. 10:41–48.
Google Scholar CrossRef PubMed
↵
E. Polak (1997) Optimization: Algorithms and consistent approximations. (Springer-Verlag, New York, NY).
Google Scholar
↵
W.H. PressB.P. FlanneryS.A. TeukolskyW.T. Vetterling (1988) Numerical recipes in C. (Cambridge University Press, New York, NY).
Google Scholar
↵
C.R. Rao (1973) Linear statistical inference and its applications (Wiley, New York, NY), 2nd ed.
Google Scholar
↵
A. Rzhetsky (1995) Estimating substitution rates in ribosomal RNA genes. Genetics 141:771–783.
Google Scholar PubMed Web of Science
↵
K. TamuraM. Nei (1993) Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512–526.
Google Scholar PubMed Web of Science
↵
S. Tavare (1986) Some probabilistic and statistical problems in the analysis of DNA sequences. Lect. Math. Sci. 17:57–86.
Google Scholar
↵
M.S. Waterman (1996) Introduction to computational biology: Maps, sequences and genomes. (Chapman & Hall, New York, NY).
Google Scholar
↵
Z. Yang (1993) Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. J. Mol. Evol. 10:1396–1401.
Google Scholar
↵
(1994) Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105–111, ibid.
Google Scholar CrossRef PubMed Web of Science