DNA sequencing

The process of determining the precise order of nucleotides within a DNA molecule.

DNA sequencing is any method or technology used to determine the order of the four nucleotides (adenine, guanine, cytosine, and thymine) in a strand of DNA (deoxyribonucleic acid). The nucleotide sequence is the blueprint that contains the instructions for building an organism. Human chromosomes range in size from roughly 50 million to 250 million base pairs. Most human cells contain 46 chromosomes (23 pairs), and a single set of 23 chromosomes comprises approximately 3.2 billion base pairs of DNA.

Knowledge of DNA sequences has become essential to biological research and to applied fields such as medical diagnosis, biotechnology, and forensics. Advances in sequencing technology, including automated DNA sequencers, continue to make this genetic information faster and cheaper to obtain.

History

First-generation

First-generation sequencing technologies, which emerged in the 1970s, included the Maxam-Gilbert method, developed by and named for American molecular biologists Allan M. Maxam and Walter Gilbert, and the Sanger method (or dideoxy method), developed by English biochemist Frederick Sanger. In the Sanger method, DNA chains were synthesized on a template strand, but chain growth stopped whenever one of four possible dideoxy nucleotides, which lack a 3' hydroxyl group, was incorporated, because no further nucleotide could then be added. The result was a population of nested, truncated DNA molecules ending at each position where that particular dideoxy nucleotide had been incorporated opposite the template. The molecules were separated according to size by electrophoresis, and the nucleotide sequence was deduced from the resulting pattern of fragment sizes, a task that came to be performed by computer. In later automated sequencing machines, the truncated DNA molecules were labeled with fluorescent tags, separated by size within thin glass capillaries, and detected by laser excitation.
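The chain-termination logic described above can be illustrated with a small simulation. The following Python sketch is a toy model, not a description of any real instrument: it assumes error-free synthesis, assumes that a terminated fragment exists for every position, and labels each fragment with its terminal dideoxy nucleotide, as four-colour automated sequencers do. The function names are hypothetical.

```python
# Toy simulation of Sanger (chain-termination) sequencing.
# Assumptions: error-free synthesis, one terminated fragment per position,
# and each fragment labelled with the identity of its terminal ddNTP.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def terminated_fragments(template: str) -> list[tuple[int, str]]:
    """Generate the nested set of chain-terminated fragments.

    The new strand is synthesized complementary to the template, so, read
    5'->3', it equals the template's reverse complement. Each fragment is
    recorded as (length, terminal base) and stops where a dideoxy
    nucleotide was incorporated.
    """
    new_strand = reverse_complement(template)
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]


def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Mimic electrophoresis: order fragments by size, read terminal bases."""
    ordered = sorted(fragments)  # shortest fragments migrate furthest
    return "".join(base for _, base in ordered)


if __name__ == "__main__":
    template = "GATTACACGT"
    frags = terminated_fragments(template)
    synthesized = read_sequence(frags)
    print(synthesized)                      # sequence of the new strand
    print(reverse_complement(synthesized))  # recovers the template
```

Sorting by length plays the role of electrophoresis; reading the terminal labels from shortest to longest fragment yields the new strand 5'->3', and its reverse complement recovers the template.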

Second-generation

Next-generation (massively parallel, or second-generation) sequencing technologies, often referred to as NGS, have largely supplanted first-generation technologies. These newer approaches enable many DNA fragments to be sequenced at one time and are more cost-efficient and much faster than first-generation technologies. The utility of next-generation technologies was improved significantly by advances in bioinformatics that allowed for increased data storage and facilitated the analysis and manipulation of very large data sets, often in the gigabase range (1 gigabase = 1,000,000,000 base pairs of DNA).
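To give a sense of why next-generation data sets reach the gigabase range, the Python sketch below applies the standard expected-coverage relation C = N * L / G (Lander-Waterman), where N is the number of reads, L the read length, and G the genome size, together with the Poisson estimate exp(-C) for the fraction of bases never sequenced. The 150-base read length and 30-fold target depth are illustrative assumptions rather than figures from this article, and real coverage is less uniform than the model assumes.

```python
import math

GIGABASE = 1_000_000_000  # 1 gigabase = 10^9 base pairs


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean sequencing depth C = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target mean depth."""
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_uncovered(coverage: float) -> float:
    """Poisson approximation of the fraction of bases never sequenced: exp(-C)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    genome = 3_200_000_000  # approximate size of one set of human chromosomes (bp)
    read_len = 150          # assumed short-read length
    depth = 30              # assumed target mean coverage

    n = reads_needed(depth, read_len, genome)
    total_gb = n * read_len / GIGABASE
    print(f"{n:,} reads, {total_gb:.0f} Gb of sequence")
    print(f"mean coverage: {expected_coverage(n, read_len, genome):.1f}x")
    print(f"fraction uncovered: {fraction_uncovered(depth):.2e}")
```

Under these assumptions, about 640 million reads, or 96 gigabases of raw sequence, are needed for a single human genome.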

Usage

Knowledge of the sequence of a DNA segment has many uses. First, it can be used to find genes, segments of DNA that code for a specific protein or phenotype. If a region of DNA has been sequenced, it can be screened for characteristic features of genes. For example, open reading frames (ORFs), long sequences that begin with a start codon and contain no in-frame stop codon except the one at their end, suggest a protein-coding region. (A codon is a group of three adjacent nucleotides that specifies a particular amino acid or signals where translation starts or stops.) Also, many human genes lie adjacent to so-called CpG islands, stretches of DNA unusually rich in CpG dinucleotides, in which a cytosine is directly followed by a guanine. If a gene with a known phenotype (such as a disease gene in humans) is known to lie in the sequenced chromosomal region, then unassigned genes in that region become candidates for that function.

Second, homologous DNA sequences from different organisms can be compared in order to infer evolutionary relationships both within and between species.

Third, a gene sequence can be screened for functional regions. To help determine the function of a gene, researchers look for domains that are common to proteins of similar function. For example, proteins that span a cell membrane typically contain stretches of roughly 20 mostly hydrophobic amino acids, called transmembrane domains, and because a gene encodes its protein's amino acid sequence, these stretches can be recognized from the gene sequence. If a transmembrane domain is found in a gene of unknown function, it suggests that the encoded protein is located in a cell membrane. Other domains characterize DNA-binding proteins. Several public databases of DNA sequences are available for analysis by any interested individual.
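The gene-finding features mentioned above, open reading frames and CpG islands, can be screened for computationally. The following Python sketch is a simplified illustration, not a production gene finder: it scans only the forward strand, uses the standard start codon ATG and stop codons TAA, TAG and TGA, and applies a hypothetical minimum ORF length. For CpG islands it uses the classic thresholds often attributed to Gardiner-Garden and Frommer (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6); these are heuristics, and other definitions exist. All function names are illustrative.

```python
# Simplified gene-feature screening: forward-strand ORFs and CpG-island stats.
# Assumptions: standard genetic code; no reverse strand, introns or windows.

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Return (start, end) positions of forward-strand ORFs.

    Positions are 0-based with `end` exclusive, and the stop codon is
    included. Each ORF runs from an ATG to the first in-frame stop codon;
    ORFs with no stop codon before the end of `seq` are not reported.
    `min_codons` (hypothetical default) counts codons including the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a candidate ORF at the first ATG in frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and look for the next ATG
    return sorted(orfs)


def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for `seq`."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")  # CpG dinucleotides (matches cannot overlap)
    expected = c * g / n        # expected CpG count given base composition
    ratio = observed / expected if expected else 0.0
    return (c + g) / n, ratio


def looks_like_cpg_island(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply classic length, GC-content and CpG obs/exp thresholds."""
    gc, ratio = cpg_stats(seq)
    return len(seq) >= min_length and gc > min_gc and ratio > min_ratio
```

For example, `find_orfs("CCATGAAATTTGGGTAGCC", min_codons=2)` reports a single ORF spanning positions 2 to 17 (ATG through the TAG stop codon).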

Using these technologies, scientists have been able to rapidly sequence entire genomes (whole genome sequencing) of organisms, to discover genes involved in disease, and to better understand genomic structure and diversity among species generally.

DNA sequencing companies

Oxford Nanopore Technologies developed a pocket-sized real-time DNA/RNA sequencer called MinION.

10X Genomics developed a new version of its Chromium de novo assembly solution, including Supernova 2.0, software for sequence assembly. The company also offers a technology called Linked-Reads, which is designed to provide long-range information from short-read sequencing data. Short reads that derive from the same long DNA molecule are grouped into “read clouds”, which helps map them onto the larger picture of the genome.

Genewiz, which normally specializes in R&D genomics services, branched out in 2017 into clinical genomics testing with its CLIA Sanger Sequencing service. It also launched Amplicon-EZ, a service for sequencing mixed PCR products.

Pacific Biosciences of California (PacBio) produces Sequel system sequencers, which have been purchased by two Chinese customers, Annoroad Gene Technology and BGI Genomics.

Macrogen is a South Korean sequencing-services company that maintains a U.S. branch in Rockville, MD, and has opened another branch in Madrid.

Qiagen sells NGS products and services including the GeneReader NGS system.

Agilent Technologies has a diagnostics and genomics group that works on NGS target enrichment. Agilent acquired molecular and sample barcoding patent information from Population Genetics Technologies to improve the accuracy and sensitivity of its NGS detection applications.

BGI Genomics, formerly known as Beijing Genomics Institute, launched the Life Periodic Plan, a sequencing service based on SMRT (single-molecule real-time) technology that focuses on conservation biology. The plan aims to use sequencing to data-mine species and to generate data on all animals and plants on Earth.

Thermo Fisher Scientific plans to offer its Ion AmpliSeq technology to researchers using Illumina’s NGS platforms, under the name AmpliSeq for Illumina. NGS accounts for under 2% of the company’s revenue.

Illumina’s products include the MiniSeq and NovaSeq sequencers. In 2018 the company launched the iSeq 100, a smaller sequencer about one cubic foot in size, priced at $19,900 so as to be affordable for many labs. Illumina also launched the NextSeq 550Dx, its second FDA-regulated, CE-IVD-marked NGS system, for clinical labs, and is moving into oncology and hereditary disease testing applications.
