Syst. Biol. 51(5):786–805, 2002 DOI: 10.1080/10635150290102410
Phylogeny of the Oriental Drosophila melanogaster Species Group: A Multilocus Reconstruction ARTYOM K OPP1 AND J OHN R. TRUE1,2 1
Howard Hughes Medical Institute and Laboratory of Molecular Biology, University of Wisconsin, Madison, Wisconsin 53706, USA; E-mail:
[email protected] 2 Department of Ecology and Evolution, State University of New York, Stony Brook, New York 11794-5245 , USA; E-mail:
[email protected] Abstract.—The melanogaster species group of Drosophila (subgenus Sophophora ) has long been a favored model for evolutionary studies because of its morphological and ecological diversity and wide geographic distribution. However, phylogenetic relationships among species and subgroups within this lineage are not well understood. We reconstructed the phylogeny of 17 species representing 7 “oriental” species subgroups, which are especially closely related to D. melanogaster. We used DNA sequences of four nuclear and two mitochondrial loci in an attempt to obtain the best possible estimate of species phylogeny and to assess the extent and sources of remaining uncertainties. Comparison of trees derived from single-gene data sets allowed us to identify several strongly supported clades, which were also consistently seen in combined analyses. The relationships among these clades are less certain. The combined data set contains data partitions that are incongruent with each other. Trees reconstructed from the combined data set and from internally homogenous data sets consisting of three or four genes each differ at several deep nodes. The total data set tree is fully resolved and strongly supported at most nodes. Statistical tests indicated that this tree is compatible with all individual and combined data sets. Therefore, we accepted this tree as the most likely model of historical relationships. We compared the new molecular phylogeny to earlier estimates based on morphology and chromosome structure and discuss its taxonomic and evolutionary implications. [Diptera; Drosophila melanogaster; evolution; phylogeny.]
The melanogaster species group of Drosophila (subgenus Sophophora) is currently thought to contain at least 174 species, mostly of oriental or Afrotropical distribution (Bock and Wheeler, 1972; Bock, 1980; Ashburner et al., 1984; Lemeunier et al., 1986; Lachaise et al., 1988; Ashburner, 1989; Schawaroch, 2000, 2002). Two factors have combined to make this group a powerful model for evolutionary studies: its morphological and ecological diversity and its inclusion of D. melanogaster, the workhorse of experimental biology and genetics for over a century (Ashburner, 1989; Powell, 1997). In particular, members of the melanogaster species group are attracting increasing attention as models for understanding the genetic and molecular bases of morphological evolution (Stern, 1998; True et al., 1999; Kopp et al., 2000; Sucena and Stern, 2000). The recent sequencing of the D. melanogaster genome (Adams et al., 2000) has further enhanced our experimental and comparative power. Comparative studies must be conducted in a rigorous historical framework, and the need for a reliable phylogeny of the melanogaster species group has become increasingly obvious. Using the special-
ized classication system developed by Drosophila taxonomists, 12 subgroups are currently recognized within this species group (Hsu, 1949; Okada, 1954; Bock and Wheeler, 1972; Bock, 1980; Lemeunier et al., 1986; Toda, 1991; Schawaroch, 2000, 2002). Based on morphological characters and chromosome structure, most of these subgroups fall into three main lineages: montium, ananassae, and the so-called oriental lineage composed of the eugracilis, cusphila, melanogaster, takahashii, suzukii, elegans, and rhopaloa subgroups (Ashburner et al., 1984; Lemeunier et al., 1986; Toda, 1991). The position of three other subgroups (denticulata, avohirta, and longissima) has not been established, and a number of species remain unassigned (Bock, 1980; Schawaroch, 2000). Originally, the term “oriental” was proposed to describe ve subgroups: takahashii, suzukii, elegans, cusphila, and eugracilis (Ashburner et al., 1984). The melanogaster subgroup is actually Afrotropical in distribution but is similar to the oriental subgroups in morphology and chromosome structure (Ashburner et al., 1984; Lemeunier et al., 1986). The recently proposed rhopaloa subgroup is thought to be related to the elegans subgroup (Toda, 1991;
786
2002
KOPP AND TRUE—DROSOPHILA PHYLOGENY
Toda, pers. comm.), a suggestion consistent with our analysis. For brevity, we use the terms “oriental lineage” and “oriental subgroups” to refer collectively to all seven subgroups. The oriental lineage is of particular interest because of its morphological diversity and close proximity to D. melanogaster. However, evolutionary relationships within this lineage are not well resolved. Studies based on morphological (Hsu, 1949; Okada, 1954; Bock and Wheeler, 1972), chromosomal (Ashburner et al., 1984; Lemeunier et al., 1986), and molecular (Pelandakis et al., 1991; Inomata et al., 1997; Goto and Kimura, 2001; Schawaroch, 2002) data have yielded conicting results. In molecular studies, all seven oriental subgroups were never represented at once, leaving many relationships obscure. We used four nuclear and two mitochondrial gene sequences to reconstruct the molecular phylogeny of 17 oriental species representing seven species subgroups. In a separate report, we considered the implications of this phylogeny for understanding morphological evolution in the oriental lineage (Kopp and True, 2002). For comparative analysis, it is important to know which relationships are strongly supported and which are more in doubt. We therefore attempted to obtain the best possible estimate of species phylogeny and to assess the extent and sources of remaining uncertainties. Comparison of trees derived from singlegene data sets has allowed us to identify several strongly supported clades that are also consistently seen in combined analyses. The uncertainty resides in the relationship among these clades. The total data set is heterogeneous, i.e., it contains data partitions that are incongruent with one another. Trees reconstructed from the total data set and from internally homogenous data sets consisting of three or four genes each differ at several deep nodes. However, statistical tests show that the total data set tree is compatible with all individual and combined data sets. Therefore, we accept this tree as the most likely model of historical relationships. M ATERIALS AND M ETHODS DNA Sequences The 28S ribosomal RNA (rRNA) domain D2 sequence (28S; alignment length, 341 bases) corresponds to positions 3,705–
787
4,046 of the 28S rRNA of D. melanogaster (GenBank accession no. X71159; Tautz et al., 1988). PCR amplication and sequencing were performed using primers 28S-F (CCC GAA GTA TCC TGA ATC TTT CGC ATT G) and 28S-R (GCC CGA TGA ACC TGA ATA TCC GTT ATG G). The mitochondrial cytochrome oxidase subunit 1 sequence (CO1; alignment length, 407 bases) corresponds to positions 2,205– 2,611 of the complete mitochondrial genome of D. melanogaster (accession no. U37541). Amplication and sequencing were performed using primers F-CO1 (CCA GCT GGA GGA GGA GAT CC) and R-CO1 (CCA GTA AAT AAT GGG TAT CAG TG) (Gleason et al., 1997). The alpha-amylase gene sequence (Amy; alignment length, 726 bases) corresponds to positions 181–906 of the D. melanogaster cDNA (accession no. L22719). The Amy locus has been tandemly duplicated in the melanogaster species subgroup, with the two duplicates evolving in a concerted fashion (Inomata et al., 1995; Shibata and Yamazaki, 1995). Therefore, for D. melanogaster, D. teissieri, and D. orena, only the proximal duplicates were used in the analysis. PCR amplication and sequencing were performed using primers Amy-F (AGG TCT CCC CTG TGA ACG AGA ACG CCG T) and Amy-R (ACG AAT ACC AGG GAG CGG TCG GAG GCA GC). The glycerol-3-phosphate dehydrogenase sequence (Gpdh; alignment length, 431 bases) corresponds to positions 3,447–3,876 of the D. melanogaster cDNA (accession no. X61223). This region includes 63 bases of exon 3, 307 bases of exon 4, and the intervening 61-base intron that was used in some but not all analyses. The full data set containing the intron is referred to here as Gpdhint, and the exons-only data set is referred to as Gpdh. For most species, PCR amplication was performed using primers GNL-mel (GTG GTG CCC CAC CAG TTC AT) and GNR-mel (GGC TTG AGC TGA TTT GTG CA) (Barrio and Ayala, 1997). Amplied fragments were sequenced using primers GNLmel and Gpdh-R (CCC ATC AAC ACG GCG CAT GG) (Goto et al., 2000). For D. biarmipes, primers biaGpdh-F (AAG GCC GAG GGC GGT GGC ATC GAT CTG AT) and biaGpdhR (GGG TAG AAG ACG TCC ACG AAG CGA ATC AT) were used for both amplication and sequencing.
788
VOL. 51
S YSTEMATIC BIOLOGY
The ° -dynein heavy chain gene sequence (kl3) corresponds to positions 3,674–4,321 of the D. melanogaster cDNA (accession no. AF313480). The intervening intron could not be aligned and was excluded from analysis (remaining alignment length, 661 bases). For most species, amplication and sequencing were performed using primers kl3-F1 (AGC TTT GGT GCC ATG AGT GTA CGA) and kl3-R1 (ATT GTC TTG CGT AGC TGG TCG ACG). For D. eugracilis, reverse primer kl3R4 (GTT AAT GTC ATT TGA AAA AAT TTR TAK CCA G) was used instead of kl3-R1. The fragment of the mitochondrial genome referred to here as ND1 (alignment length, 1,705 bases) spans part of cytochrome b, tRNA-Ser, NADH dehydrogenase subunit 1 (ND1), tRNA-Leu, and part of the 16S rRNA gene. This sequence corresponds to positions 11,549–13,255 of the complete mitochondrial genome of D. melanogaster (accession no. U37541). Sequence accession numbers, literature references, and sources of genomic DNA are listed in Table 1. For combined data sets comprising several gene sequences, the following abbreviations are used throughout this report: AGSC-Amy, Gpdh (without intron), 28S, and CO1; AGSCi-same as previous, but with the Gpdh intron included; AGSCKisame as previous plus kl3; AGSCKNi-same as previous plus ND1; GSCi-Gpdh including intron, 28S, and CO1; GSCKi-same as previous plus kl3; GSCKNi-same as previous plus ND1; AGKi-Amy, Gpdh including intron, and kl3; AGKSi-same as previous plus 28S. Taxon Sampling Our taxon sample included 17 ingroup species representing 7 species subgroups (melanogaster, takahashii, suzukii, elegans, eugracilis, cusphila, and rhopaloa). An eighteenth species, D. erecta, was included in the ND1 data set. We assume that the oriental subgroups form a monophyletic clade to the exclusion of the montium and ananassae subgroups. This assumption is supported by molecular, morphological, and chromosomal data (Bock and Wheeler, 1972; Bock, 1980; Ashburner et al., 1984; Lemeunier et al., 1986; Pelandakis et al., 1991; Inomata et al., 1997; Goto and Kimura, 2001; Schawaroch, 2000, 2002). Drosophila ananassae and D. bipectinata from the ananassae subgroup and D. kikkawai
from the montium subgroup were used as outgroup taxa. The relationship between these lineages and the oriental clade was not investigated. (For species descriptions and discussions of their taxonomic positions, see Bock and Wheeler, 1972; Bock, 1980; Lemeunier et al., 1986; Toda, 1991; Sultana et al., 1999; Schawaroch, 2000). Table 1 lists all Drosophila species considered in this analysis, their subgroup afliations, abbreviations used in this report, sequences available for each species, and GenBank accession numbers for each sequence. Not every sequence was available for every species; in combined data sets, missing data were coded as gaps. When different genes were sequenced from different strains of the same species, we did not consider this an impediment to combining the data. The only exception is the Amy sequence from the elegans 1 strain, which was placed very distantly from the elegans 3 strain by Inomata et al. (1997). The elegans 3 sequence contains several gaps that are not present in any other species. We have sequenced Amy from two additional strains of D. elegans, and these sequences closely matched the elegans 3 allele. We therefore excluded the elegans 1 allele from further analysis. The following abbreviations are used throughout this report to refer to three well-supported monophyletic groups: mel D melanogaster species subgroup; tmsb D (takahashii subgroup C D. mimetica) C (D. suzukii C D. biarmipes); elf D elegans subgroup C D. lucipennis C D. fuyamai. Sequence Alignment Multiple sequence alignment was performed using ClustalW (Thompson et al., 1994), with a gap opening penalty of 15.0, gap extension penalty of 6.66, and transition weight of 0.5 (relative to transversions). All alignments were conrmed and edited manually (where necessary) in MacClade (4.03; Maddison and Maddison, 2000) and imported into PAUP¤ (4.0b4a; Swofford, 2000). Gaps were treated as missing characters. Complete Amy and CO1 sequences and Gpdh and kl3 exon sequences could be aligned entirely without gaps. The 28S sequences differed by a small, precisely identiable indel at position 220 of the alignment. The ND1 alignment contained several small gaps that could not
789
melanogaster teissieric orenac erecta takahashiic lutescens trilutea pseudotakahashii prostipennis suzukiic biarmipes 361.0 biarmipes 363.3c mimeticac lucipennisc elegansc elegans “brown” elegans “black” gunungcola fuyamaic cusphilac eugracilisc ananassaec bipectinatac kikkawai
c
Species/straina
mel tei ore ere tak lut tri pst pro suz bia0 bia3 mim luc ele eleBw eleBl gun fuy c eug ana bip kik
Abbreviation
Amy
L22719 (2) D17736 (3) D21129 (3) NA AB003775 (4) NA NA NA NA AY098448 (9) NA AY098449 (9) AY098447 (9) AY098452 (9) AB003762 (4) AY098450 (9) AY098451 (9) NA AB003767 (4) AB003765 (4) AB003763 (4) AB003755 (4) AB003759 (4) NA
28S
X71159 (1) X71169 (1) X71173 (1) NA X71177 (1) NA NA NA NA AY098443 (9) NA AY098444 (9) X71179 (1) AY098445 (9) X71183 (1) NA NA NA AY098446 (9) X71181 (1) X71175 (1) X71197 (1) NA X71185 (1)
X61223 U47809 NA NA AB027273, AB02784, AB066194 (5) AB027276, AB027281, AB066190 (5) AB027270, AB027286, AB066197 (5) AY098464 (9) AB027275, AB027282, AB066191 (5) AB032136, AB032144, AB066193 (6) AY098466 (9) AY098467 (9) AY098465 (9) AY098470 (9) AB032138, AB032146, AB066186 (6) AY098468 (9) AY098469 (9) AB032137, AB032145, AB066188 (6) AY098471 (9) AB032141, AB032149, AB066187 (6) AY098472 (9) AY098473 (9) AY098474 (9) NA
Gpdh
U37541 U51618 NA NA AB027264 (5) AB027267 (5) AB027261 (5) AY098453 (9) AB027266 (5) AB032128 (6) AY098455 (9) AY098563 (9) AY098454 (9) AY098459 (9) AB032130 (6) AY098457 (9) AY098458 (9) AB032129 (6) AY098460 (9) AB032133 (6) AY098461 (9) AY098462 (9) AY098463 (9) NA
CO1
AF313480 (7) NA NA NA NA NA NA NA NA AY098476 (9) NA AY098477 (9) AY098475 (9) AY098479 (9) NA AY098478 (9) NA NA AY098480 (9) NA AY098481 (9) NA NA NA
kl3
U37541 AF164586 (8) AF164584 (8) AF164585 (8) AF164592 (8) AF164591 (8) NA NA NA NA NA NA AF164593 (8) NA AF164596 (8) NA NA NA NA AF164594 (8) AF164595 (8) AF164578 (8) NA AF164582 (8)
ND1
a Stocks of D. pseudotakahashii, D. biarmipes, D. mimetica, D. lucipennis, D. eugracilis, D. ananassae, and D. bipectinata were obtained from the National Drosophila Species Resource Center at Bowling Green; D. suzukii, strain HJ, was obtained from Dr. Y. Fuyama, Tokyo Metropolitan University; and the “brown” and “black” strains of D. elegans (which differ in cuticle pigmentation) were isolated from the 3D27 strain obtained from Dr. M. Kimura (Hokkaido University). b References nos. in parentheses: 1 D Pelandakis et al., 1991; 2 D Inomata et al., 1995; 3 D Shibata and Yamazaki, 1995; 4 D Inomata et al., 1997; 5 D Goto et al., 2000; 6 D Goto and Kimura, 2001; 7 D Carvalho et al., 2000; 8 D Kastanis et al., unpubl.; 9 D sequenced in this study. NA D sequence was not available. c Taxa represented in combined data sets.
montium
rhopaloa cusphila eugracilis ananassae
elegans
suzukii
takahashii
melanogaster
Subgroup
Accession nos. (referencesb )
TABLE 1. Drosophila species considered in the analysis, their subgroup afliations, abbreviations, and sequence accession numbers.
790
VOL. 51
S YSTEMATIC BIOLOGY
be aligned unambiguously, and the corresponding positions were removed from the data matrix. The kl3 intron could not be aligned and was excluded from analysis. Although the 61-bp intron of Gpdh could be aligned unambiguously at most positions, the alignment of some short stretches was not obvious to the eye. However, the optimal alignment was robust over a wide range of gap penalties and match/mismatch parameters. Inclusion of the intron in the Gpdh data set or in any of the combined data sets did not affect the topology of bootstrap consensus trees but resulted in higher bootstrap and posterior probability values at most nodes. We therefore chose to include this intron in most combined data sets. Phylogeny Reconstruction Maximum parsimony and maximum likelihood analyses were performed with PAUP¤ 4.0b4a. The TBR branch swapping algorithm was used for heuristic searches. For maximum parsimony, all characters were weighted equally. For each data set, maximum parsimony analysis was repeated 1,000 times with random order of sequence addition to explore the tree space for the presence of multiple optima. In all cases except CO1, a single tree island emerged, suggesting that alternative optima were absent. Two tree islands were found for CO1. Node stability was assessed by nonparametric bootstrapping (Felsenstein, 1985) with 1,000 replicates for maximum parsimony and 100–500 replicates for maximum likelihood, using simple order of sequence addition and TBR branch swapping. The resulting majority-rule consensus trees are shown for each data set. For maximum likelihood analysis, empirical base frequencies were used. Other model parameters (substitution rate matrix, shape parameter of the gamma distribution, and the percentage of invariable sites) were estimated in PAUP¤ by progressive simplication of a full GTRC0CI model (Rodriguez et al., 1990) on trees reconstructed by maximum parsimony or maximum likelihood under a simplied model. Likelihood-ratio tests were used to compare the t of each nested model, and the simplest model that produced a good t was used for phylogeny reconstruction. Model parameters for each data set are shown in Table 2. In most cases, greatly simplied models that t data poorly
nevertheless result in the same tree topologies as the more complicated models, albeit with somewhat lower bootstrap values (data not shown). Thus, for our data, the choice of model parameters has little inuence on phylogeny reconstruction. This was particularly true in the analysis of combined data sets. MrBayes (Huelsenbeck, 2000) was used for Bayesian inference of phylogeny. Rather than attempting to reconstruct a single best tree, this method uses a Markov chain Monte Carlo algorithm to sample a large number of phylogenies according to their posterior probabilities (Yang and Rannala, 1997; Larget and Simon, 1999). The percentage of sampled trees that contain a particular split is taken as an estimate of the posterior probability of that split, given the data and prior parameters. All analyses were performed with uninformative priors, and maximum likelihood parameters were estimated as part of the analysis. Three heated chains were used in addition to the single cold chain (Huelsenbeck, 2000). For each data set, analysis was repeated two to four times, and the chain was allowed to run for 1–2 million generations with trees sampled every 500 steps. For each data set, consensus trees with identical topologies and similar posterior probability values were produced every time, suggesting that tree distributions did not contain alternative optima (data not shown). For each data set, the bootstrap consensus trees from parsimony and likelihood analyses and the Bayesian consensus tree had identical or compatible topologies. Therefore, the best resolved of the three consensus trees is shown for each data set (Figs. 1–3). A node was considered unresolved if no method provided >50% support for that node. In combined data set trees (Fig. 2), branch lengths found by Bayesian analysis are shown. Statistical Tests Each gene was tested for nonstationary nucleotide composition by the  2 homogeneity test in PAUP¤ . The test was performed on the total base composition and then separately on variable sites only. Using the latter approach, the Amy gene had signicant variation in nucleotide composition across taxa (P < 0:0001). For all other genes, no deviations from stationarity were detected ( P > 0:396).
791
200 500
500
200 100 500 500 200 500 100 500
ML
1.000 1.000
1.575
0.492 1.13e C 04 1 2.387 1 1 199.287 1.627
AC
2.228 2.450
2.876
1 4.71e C 04 2.010 2.387 2.692 2.238 820.383 3.837
AG
2.228 1.000
2.876
0.492 4.71e C 04 1 2.387 1 1 820.383 3.668
AT
ML modelb
1.000 1.000
1.575
4.67e ¡ 09 5419.52 1 1 1 1 820.383 3.872
CG
6.255 6.593
6.449
1 1.53e C 05 3.187 7.626 7.549 5.580 3474.670 9.005
CT
1 1
1
1 1 1 1 1 1 1 1
GT
1.246 0.299
1.165
0.862 0.679 0.208 0.317 0.857 0.326 0.541 0.752
Shape
0.566 0
0.480
0.760 0.573 0 0 0.507 0 0.647 0.555
% inv
1 £ 1,000,000 1 £ 2,000,000 2 £ 1,000,000 2 £ 1,000,000
2 £ 1,000,000 2 £ 1,000,000 2 £ 1,000,000 2 £ 1,000,000 2 £ 1,000,000 2 £ 1,000,000 2 £ 1,000,000 2 £ 1,000,000 2 £ 2,000,000
No. runs £ generations
Bayesianc
6,000 4,000 4,000
12,000
4,000 4,000 4,000 4,000 4,000 4,000 4,000
No. trees
b Substitution
of bootstrap replicates. MP D maximum parsimony; ML D maximum likelihood. rate matrix. c Shape D shape parameter of the gamma distribution; % inv D percentage of invariable sites; no. runs £ generations D number of independent runs and generations in each run (e.g., two runs of 1,000,000 generations each); no. trees D total number of sampled trees.
a Number
1,000 1,000
1,000
Combined AGKi
GSCKi takahashii (Gpdh C CO1)
1,000 1,000 1,000 1,000 1,000 1,000 1,000 1,000
MP
Single genes 28S CO1 Amy Gpdhint Gpdh kl3 ND1 Total
Data set
Repsa
TABLE 2. Numbers of replicates and likelihood model and Bayesian analysis parameters for the Drosophila data.
792
S YSTEMATIC BIOLOGY
VOL. 51
FIGURE 1. Bootstrap/Bayesian consensus trees reconstructed from single-gene and combined Drosophila data sets. The three numbers shown above each dichotomous node are (top to bottom) maximum parsimony bootstrap value, maximum likelihood bootstrap value (italic), and Bayesian posterior probability of the corresponding split (bold). A node is shown as unresolved if none of the three methods provided >50% support for it. Branch lengths are not shown. Numbers of replicates and likelihood model parameters are given in Table 2. Species subgroups are indicated by symbols. (a) 28S rRNA. (b) CO1. (c) Amy. The numbers below each node are bootstrap values from the minimum evolution (ME) analysis with LogDet distances (1,000 replicates, percentage of invariable sites set to the maximum likelihood estimate of 0.457). The ME/LogDet analysis was repeated several times with different settings for percentage of invariable sites (0–80%). The resulting trees differed only in the placement of D. fuyamai, whereas D. eugracilis and D. cusphila always retained their basal position. (d) Gpdh, including intron. (e) Gpdh, excluding intron. (f) kl3. This tree is not rooted because of the lack of an outgroup. (g) ND1. (h) Combinable-component consensus of the 28S, CO1, Amy, Gpdh, and kl3 trees.
2002
KOPP AND TRUE—DROSOPHILA PHYLOGENY
793
FIGURE 2. Bootstrap/Bayesian consensus trees for combined Drosophila data sets. The three numbers shown at each dichotomous node are (top to bottom) maximum parsimony bootstrap value, maximum likelihood bootstrap value (italic), and Bayesian posterior probability of the corresponding split (bold). Branch lengths are from Bayesian analysis. Numbers of replicates and likelihood model parameters are given in Table 2. The trees differ in the placement of three major clades: “tmsb,” “elf,” and the melanogaster subgroup. (a) Total data set. The combined data sets AGSC, AGSCi, AGSCKi, and AGSCKNi also produced this tree topology. However, the degree of support for each node differred among data sets. At each node, the highest bootstrap and posterior probability values obtained for that node from any data set are shown. (Table 4 lists bootstrap and posterior probability values at each node for each combined data set.) (b) AGKi. (c) GSCKi.
Incongruence among data sets was assessed by the incongruence length difference (ILD) test (Farris et al., 1994, 1995) in PAUP¤ . The ILD test was performed for each pairwise combination of genes and for several combined data sets. Between 100 and 10,000 permutations were used to generate null distributions of tree length differences. Trees were reconstructed with heuristic searches using maximum parsimony. For pairwise comparisons, signicance was assessed using Bonferroni-corrected critical P values. Compatibility among phylogenetic trees derived from different data sets was assessed by the Kishino–Hasegawa (KH) and Shimodaira–Hasegawa (SH) tests (Kishino and Hasegawa, 1989; Shimodaira and Hasegawa, 1999) in PAUP¤ . All possible pair-
wise comparisons among single-locus data sets were performed in the following way. For each locus, the optimal unconstrained tree was reconstructed by maximum likelihood using empirically estimated parameters, and 100–500 bootstrap replicates were performed to assess node stability. Nodes with bootstrap support of