08/20/2025 | News release | Distributed by Public on 08/20/2025 12:51
"Moving forward in science is as much unwinding the distorted thinking of the past as it is putting a clearer idea on the table." -J. Craig Venter (interview with Richard Dawkins)
Thirty years ago, a typical home computer CPU had around 3 million transistors. Today, a high-end consumer processor has around 184 billion, a staggering 60,000x increase. While this explosion in computing power was racing forward, enabling everything from entertainment to communications, another revolution was underway (also powered by computing advances), allowing scientists to read the genetic material of life with both greater speed and lower costs than ever before.
Reading the complete collection of genetic material of an organism (its DNA) is referred to as whole genome sequencing. In 1993, this had never been done. While portions of organisms and the complete genomes of viruses had previously been sequenced, no whole genome of a free-living organism had ever been read.
The first virus sequenced was bacteriophage MS2 (an RNA virus that infects bacteria) in 1976 by researchers including Walter Fiers at the University of Ghent, Belgium. It consists of 3,569 base pairs. Base pairs are the molecular rungs on the double helix ladder of DNA that are analogous to bits in a computer, storing genetic instructions.
This transmission electron microscopic (TEM) image depicts a single vaccinia virus particle, or virion, which is administered as a vaccination to confer human immunity to smallpox. Image: CDC/Cynthia Goldsmith.
By 1993, Vaccinia virus (a live virus used in the smallpox vaccine because of its close relationship to the virus that causes smallpox) had been sequenced in an effort led by Bernard Moss at the National Institutes of Health (NIH). It was the largest sequence to date, consisting of about 190,000 base pairs.
It wasn't until the fall of 1993 that Nobel laureate Hamilton Smith suggested to J. Craig Venter, at a scientific council meeting at The Institute for Genomic Research (TIGR, founded by Venter and the forerunner to the institute that now bears his name), that they attempt to sequence the genome of the bacterium Haemophilus influenzae.
In an oral history with Cold Spring Harbor Laboratory, Smith recalled saying to Venter, "'You call yourself a genomic institute, let's do a genome.' …I said, how about doing Haemophilus influenzae, it's 1.9 megabases, it has a very favorable base composition for sequencing, similar to human. And I can make the libraries. And [Venter] was extremely enthusiastic."
Smith was unable to secure any interest back at his lab at Johns Hopkins University to help with library preparation, a huge undertaking necessary to sequence the genome using the traditional Sanger clone-by-clone approach. He then recalled that Venter had been using something called random shotgun sequencing, which was far less labor intensive, in his work on expressed sequence tags (ESTs, a method developed by Venter to rapidly identify genes in a DNA sequence).
Smith took this to Venter and the two developed a strategy to employ a shotgun sequencing approach on H. influenzae, which was now possible in part due to advances in computational power. The team got to work, including Granger Sutton, who developed the computer algorithm necessary to assemble the fragments of DNA sequenced using the shotgun method.
As the project started to take shape, Smith joined Venter and Sutton at TIGR, co-leading the H. influenzae sequencing effort with Venter. He later went on to lead the synthetic biology group at JCVI, working alongside Venter for decades, retiring in 2020. He is now an emeritus professor. Sutton also remained a key member of Venter's team for 27 years.
The traditional Sanger clone-by-clone method is painstaking, requiring scientists to determine the precise position of each sequenced DNA fragment on the genome and gradually assemble a complete genome map. In contrast, the shotgun method randomly shears DNA at different points and uses a computational approach to reassemble the sequenced fragments.
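To make the idea of computational reassembly concrete, here is a minimal toy sketch in Python. It is not the assembler Sutton wrote for TIGR, and it ignores sequencing errors, repeats, reverse-complement reads, and circular genomes. It simply merges, over and over, the two fragments with the longest suffix-to-prefix overlap. The function names, the 30-base "genome," and the four reads are all hypothetical and chosen only for illustration.

```python
def overlap(a, b, min_overlap):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_overlap)."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(reads):
    """Remove duplicate reads and reads that lie entirely inside another read."""
    unique = list(dict.fromkeys(reads))
    return [r for r in unique if not any(r != o and r in o for o in unique)]


def greedy_assemble(reads, min_overlap=4):
    """Greedy toy assembly: repeatedly merge the pair of reads with the largest overlap."""
    reads = drop_contained(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join; the remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        rest = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads = drop_contained(rest + [merged])
    return reads


# Hypothetical 30-base "genome" and four overlapping reads taken from it.
genome = "ATGCGTACCTGAAGTCCGATAGGCTTAACG"
reads = [
    "ATGCGTACCTGA",  # positions 0-11
    "CCTGAAGTCCGA",  # positions 7-18
    "TCCGATAGGCT",   # positions 14-24
    "AGGCTTAACG",    # positions 20-29
]

contigs = greedy_assemble(reads, min_overlap=4)
print(contigs)
```

In this toy case every stretch of four bases is unique, so each overlap points to exactly one place in the genome and the reads collapse into a single contig. Real genomes contain repeats and sequencing errors, which is part of why early skeptics questioned whether computational assembly would be accurate.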
Detail from the cover of the journal Science, announcing the publication of the H. influenzae genome, depicted as a circular genome map. H. influenzae's genome is shaped like a closed loop, rather than a straight line with ends, like human chromosomes.
Less than two years later, in 1995, the team published their work in the journal Science, making H. influenzae Rd the first free-living organism to have its genome sequenced, a seminal event in the biological sciences. While the work graced the cover of Science at the time, it was not without its detractors, who doubted that shotgun sequencing would scale to larger genomes (like the human genome) and raised concerns over the accuracy of computational assembly.
Following this success, Venter and his team at TIGR proceeded to publish the complete sequence of Mycoplasma genitalium, the smallest known genome of a self-replicating organism at the time, also in 1995. Right after sequencing M. genitalium, they sequenced the genome of Methanococcus jannaschii, the first archaeal genome, publishing their results in 1996 (archaea are a branch of life made up of single-celled microorganisms that are distinct from both bacteria and eukaryotes).
Having established the whole-genome shotgun sequencing approach, Venter set his sights on something much bigger: the human genome. The human genome is about 3.2 billion base pairs, with around 20,000 protein-coding genes. For context, it is around 1,749 times larger than H. influenzae.
In May 1998, Venter founded Celera Genomics in partnership with Applera Corporation with the goal of accelerating the sequencing of the human genome using whole-genome shotgun sequencing. The Human Genome Project (HGP), publicly funded and led by the NIH and the Department of Energy, began in October 1990 with an initial goal of finishing the human genome by 2005. The HGP used the older Sanger clone-by-clone sequencing method. Despite advances in shotgun sequencing developed and championed by Venter, the HGP continued with its approach.
The stage was set with dueling approaches. The rapid progress by Venter and the Celera team spurred the government-led HGP, and while the HGP never fully adopted whole-genome shotgun sequencing, it did use shotgun sequencing late in the project to sequence individual chunks (called BAC clones).
Covers from the competing human genome efforts. The teams simultaneously published their findings, Venter in Science and the HGP in Nature, in February 2001.
Venter and his team raced ahead, completing a draft human genome in three years. On June 26, 2000, Venter and HGP leaders jointly announced the completion of the first draft of the human genome at a press conference held at the White House and hosted by President Bill Clinton, with UK Prime Minister Tony Blair participating via video link. This was well ahead of the planned 2005 HGP completion date. The two teams simultaneously published their findings, Venter in Science and the HGP in Nature, in February 2001.
Thirty years ago, the once-derided whole-genome shotgun sequencing method was proven with the sequencing of H. influenzae. Five years later, the same method transformed the race to sequence the human genome, proving it could also scale at speed for a fraction of the cost of traditional methods. Today, whole-genome shotgun sequencing still underpins the most advanced sequencing technologies.
Read Venter's first-hand account of these events, the current state of DNA sequencing, its application to human health, and where we're headed in his GEN editorial.