DNA sequencing is a technique for determining the order of the four nucleotides (adenine, guanine, cytosine, and thymine) in a strand of DNA (deoxyribonucleic acid). The nucleotide sequence is the blueprint that contains the instructions for building an organism. Human chromosomes range in size from about 50,000,000 to 300,000,000 base pairs. Each human being has 46 of these chromosomes (23 pairs), and one complete set of 23 contains approximately 3.2 billion base pairs of DNA.
Knowledge of DNA sequences has become essential to biological research and many applied fields, such as medical diagnosis, biotechnology, and forensics. Advancements in sequencing technology, including DNA sequencers, continually improve accessibility to this genetic information.
The first DNA sequencing technologies emerged in the 1970s and comprised two methods: the Maxam-Gilbert method and the Sanger method. American molecular biologists Allan M. Maxam and Walter Gilbert developed the Maxam-Gilbert method, and English biochemist Frederick Sanger developed the Sanger method (also known as the dideoxy method). In 1980, both Walter Gilbert and Frederick Sanger were awarded the Nobel Prize in Chemistry for "their contributions concerning the determination of base sequences in nucleic acids." The Sanger method is still widely used, whereas the Maxam-Gilbert method has largely fallen out of use.
In the original Sanger method, DNA chains are synthesized on a template strand, but chain growth stops whenever a dideoxynucleotide is incorporated: because it lacks a 3' hydroxyl group, no further nucleotide can be added. Four reactions are run, one for each of the four dideoxynucleotides, and each produces a population of nested, truncated DNA molecules whose lengths mark every site of that particular nucleotide in the template DNA. The molecules are separated by size in a procedure called electrophoresis, and the nucleotide sequence is deduced from the order of the fragments. Later, the method was automated: in sequencing machines, the truncated DNA molecules, labeled with fluorescent tags, are separated by size within thin glass capillaries and detected by laser excitation, and a computer deduces the sequence.
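The logic of chain termination can be illustrated with a small simulation. The following Python sketch is only a toy model under simplifying assumptions: every possible termination site yields exactly one fragment, there is no primer, and the function names (`synthesized_strand`, `terminated_fragment_lengths`, `sanger_read`) are hypothetical names chosen for this illustration, not part of any sequencing software.

```python
# Toy model of Sanger chain-termination sequencing (illustrative only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template):
    """Return the strand synthesized on the template, written 5'->3'
    (the reverse complement of the template)."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def terminated_fragment_lengths(new_strand, base):
    """Lengths of the truncated fragments produced when the dideoxy
    analogue of `base` stops synthesis at each site of that base."""
    return [i + 1 for i, b in enumerate(new_strand) if b == base]


def sanger_read(template):
    """Run the four termination 'reactions', then order all fragments by
    length (the role of electrophoresis) to read off the sequence."""
    new_strand = synthesized_strand(template)
    lanes = {b: terminated_fragment_lengths(new_strand, b) for b in "ACGT"}
    by_length = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    read = "".join(base for _, base in by_length)
    return lanes, read


if __name__ == "__main__":
    lanes, read = sanger_read("GATTACA")
    print(lanes)  # fragment lengths per dideoxynucleotide reaction
    print(read)   # sequence of the newly synthesized strand
```

Sorting the fragments by length and noting which reaction each one came from recovers the synthesized strand one base at a time, which is the principle behind both the gel-based and the capillary-based versions of the method.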
Next-generation sequencing (NGS) technologies have largely supplanted first-generation technologies. These newer approaches enable many DNA fragments to be sequenced at one time and are faster and more cost-efficient than first-generation technologies. The utility of next-generation technologies improved significantly with advances in bioinformatics that allow for increased data storage and facilitate the analysis and manipulation of very large data sets, often in the gigabase range (1 gigabase = 1,000,000,000 base pairs of DNA).
Knowledge of the sequence of a DNA segment has many uses:
- DNA sequencing can be used to find genes, which are segments of DNA that code for a specific protein or phenotype. If a region of DNA has been sequenced, it can be screened for characteristic features of genes. For example, open reading frames (ORFs) suggest a protein-coding region. An ORF is a long sequence that begins with a start codon and is uninterrupted by stop codons, except for one at its end; a codon is a group of three adjacent nucleotides that specifies an amino acid. Also, many human genes lie next to so-called CpG islands, which are regions rich in cytosine followed by guanine, two of the nucleotides that make up DNA. If a gene with a known phenotype (such as a disease gene in humans) is known to be in the chromosomal region sequenced, then unassigned genes in the region become candidates for that function.
- Homologous DNA sequences of different organisms can be compared in order to plot evolutionary relationships both within and between species.
- A gene sequence can be screened for functional regions. To help determine the function of a gene, its sequence can be searched for domains that are common to proteins with similar functions. For example, certain amino acid sequences are characteristically found in proteins that span a cell membrane; such amino acid stretches are called transmembrane domains. If a gene of unknown function encodes a transmembrane domain, this suggests that the encoded protein is located in the cell membrane. Other domains characterize DNA-binding proteins. Several public databases of DNA sequences are available for analysis by any interested individual.
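The gene-finding screens described in the first item of the list can be sketched in a few lines of Python. This is a minimal illustration, not a production gene finder: it assumes ATG as the only start codon and TAA, TAG, and TGA as stop codons, scans only the given strand, and uses hypothetical function names (`find_orfs`, `cpg_obs_exp`). The CpG measure shown is the commonly used observed-to-expected CpG ratio, which is an assumption added here and is not described in the text above.

```python
START = "ATG"                  # assumed start codon
STOPS = {"TAA", "TAG", "TGA"}  # assumed stop codons


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for open reading frames on the
    given strand. `end` is exclusive and includes the stop codon;
    `min_codons` counts codons from the start codon up to, but not
    including, the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close it and keep scanning
    return orfs


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).
    Returns 0.0 when the sequence has no C or no G."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


if __name__ == "__main__":
    dna = "CCATGAAATTTGGGTAACC"
    for start, end, frame in find_orfs(dna):
        print(frame, dna[start:end])
    print(cpg_obs_exp("CGCGCG"))
```

In practice, a screen of this kind would also scan the reverse-complement strand and use a much larger minimum ORF length; the small threshold here only keeps the example short.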
Using these technologies, scientists have been able to rapidly sequence entire genomes (whole genome sequencing) of organisms, to discover genes involved in disease, and to better understand genomic structure and diversity among species generally.
- Biotechnology: Biotechnology in a broad sense includes the use of living systems and organisms, as well as their parts, for the development or production of products.
- Synthetic biology: Interdisciplinary branch of biology and engineering, applying multiple disciplines to build artificial biological systems for research, engineering, and medical applications.