How to Build a Better Tree of Life

New Idea / by Joe Kloc

An unconventional approach to analyzing molecular sequences allows researchers to construct larger evolutionary trees.

Illustration: Tyler Lang

Organizing the world’s species into branches on a phylogenetic tree is a major goal of biologists trying to understand how life evolved. DNA-sequencing technologies are providing them with more information than ever with which to accomplish this goal, but with less than 1 percent of all species currently placed in any kind of phylogeny, there is still much work to be done. In a recent paper in Science, researchers at the University of Texas at Austin introduced new tree-building software that could expand the tree of life and change our understanding of evolution.

One way to construct evolutionary trees is with software that compares and interprets discrepancies between the molecular sequences of different species using various statistical techniques. The robustness of the math driving these techniques largely determines the speed and accuracy of a given tree-building method. But rigor comes at a computational cost, so taking a mathematically well-grounded approach to constructing evolutionary trees can limit a method’s scope. “The statisticians who have been developing these methods have been really trying to get the mathematics right,” explains Tandy Warnow, a phylogeneticist at the University of Texas at Austin. “And getting the mathematics right really does tend to limit you to small datasets.” Many programs are only fast enough to handle about 20 molecular sequences at a time, a paltry number considering that the datasets biologists are trying to analyze usually contain anywhere from a few hundred to a few thousand sequences.
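To make "comparing discrepancies between molecular sequences" concrete, here is a minimal sketch in Python of one of the simplest such comparisons: the fraction of aligned positions at which two sequences differ, often called the p-distance. This is not SATé's method, nor any specific program discussed here; the function names and the toy sequences are hypothetical, and the sketch assumes the sequences have already been aligned to equal length.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of comparable aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where either sequence has a gap character.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)


def distance_matrix(aligned: dict) -> dict:
    """Pairwise p-distances for every pair of named sequences."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(sorted(aligned), 2)
    }


# Hypothetical toy alignment, invented purely for illustration.
toy_alignment = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTTC",
    "species_C": "ACGAACCTTC",
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(toy_alignment).items():
        print(pair, round(dist, 3))
```

Even this crude measure shows the scaling issue: the number of pairs grows with the square of the number of sequences, and the statistical methods described above do far more work per comparison than counting mismatches.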

To find out just how slow these programs were, Warnow attempted to run them on a dataset of 100 sequences. “They looked like they were not going to complete for months and months and months,” she says. Larger datasets, then, could take decades.
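One general reason tree-building gets hard so quickly is a standard result from phylogenetics, not something specific to the programs Warnow tested: the number of distinct unrooted, fully branching trees for n sequences is (2n − 5)!!, the product of the odd numbers from 1 up to 2n − 5. The short Python sketch below, with a hypothetical function name, computes that count. It illustrates the size of the search space only and makes no claim about how any particular program explores it.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Number of distinct unrooted binary trees on n_taxa labeled tips: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (the factor 1 is implicit).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 6, 20, 100):
        trees = count_unrooted_trees(n)
        # Print the number of decimal digits rather than the full integer.
        print(n, "sequences:", len(str(trees)), "digits")
```

For 20 sequences the count is already roughly 2 × 10^20, so no method can check every candidate tree; larger analyses have to rely on heuristics to search that space.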

“There is a clear and desperate need for methods that compute phylogenetic trees much faster,” says Antonis Rokas, a biologist at Vanderbilt University, adding that scientists ultimately hope to build trees containing millions of species.

To address the problem, Warnow and her colleagues developed a tree-building program called SATé that is capable of processing 1,000 sequences in 24 hours. She refers to the statistical method used by SATé as “not completely kosher” in her field, because in order to increase the speed and power of the software, her team used mathematical techniques without solid theoretical grounding. “We’re not following a mathematically rigorous approach,” she says. But the risk paid off: SATé constructed trees with a high degree of accuracy from simulated datasets as well as a real one whose tree structure had already been determined.

SATé solves another problem common among tree-building programs. Some species’ DNA changes so quickly that their molecular sequences from generation to generation can be quite different and thus more difficult for software to compare. But SATé is able to handle many of these rapidly evolving species and by doing so opens previously impenetrable datasets to new types of phylogenetic analysis.

“This is certainly a big step in the right direction,” Rokas says. “And I expect this software to be used more and more.” 

Because Warnow’s team developed SATé using mathematical methods they don’t yet completely understand, it is still a mystery how the program is able to deal with rapidly evolving sequences so successfully. “We have something that works well but doesn’t really yet have an explanation,” Warnow says. According to her, learning how the program works will require the attention of a strong probabilist.

“The difficulty is finding mathematicians interested in the biological problems,” explains Rokas. But if mathematicians can determine why SATé is able to outperform other tree-building methods, Warnow may be able to improve upon the design of SATé to consider even larger sets of data and move closer to the goal of constructing a tree of life containing all species.

Rokas explains that Warnow’s research is so successful because she understands the practical considerations facing biologists. “I want software that is easy and that runs quickly so that I can train my students to use it,” he says. SATé, which works on a laptop computer, was made to do just that. “We designed it so that it was really going to be easy for anyone to use,” Warnow explains.

By taking a step away from the mathematically well-understood approach to molecular phylogenetics, Warnow’s team was able to address the needs of researchers like Rokas. SATé enables scientists to handle real-world databases of a wider range of species, and, according to Warnow, this could lead to new scientific discoveries and have broad implications for evolutionary biology. What remains to be seen is how far Warnow can push mathematicians to solve the problems necessary to move SATé—and our understanding of evolution—forward.

Originally published July 1, 2009

