Bioinformatics

An interdisciplinary field that applies computational techniques to solve biological problems at the molecular level.

Bioinformatics is an interdisciplinary field that draws on molecular biology, genetics, computer science, mathematics and statistics. It applies computational techniques to understand and organize information and to address biological problems at the molecular level. Data-intensive, large-scale biological problems are approached from a computational perspective.

Bioinformatics arose from the need of biologists to use and interpret the vast amounts of data gathered in genomic research. Bioinformatics approaches are motivated by the evolution of organisms and by the complexity of working with incomplete and noisy data.

A bioinformatics solution commonly involves the following steps:

1. Collect statistics from biological data.

2. Build a computational model.

3. Solve a computational modeling problem.

4. Test and evaluate a computational algorithm.
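
The four steps above can be sketched on a toy problem: telling GC-rich DNA apart from AT-rich background. This is only an illustrative sketch, not a method described in this article. The training sequences, the function names, and the choice of a simple per-base log-odds model are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy training data (invented for illustration only).
GC_RICH = ["GCGCGGCCGCGA", "CCGGGCGCTGCC"]
BACKGROUND = ["ATTATAAGCTTA", "TTAAATGCATAT"]


def base_frequencies(seqs, pseudocount=1.0):
    """Step 1: collect statistics (smoothed base frequencies)."""
    counts = Counter("".join(seqs))
    total = sum(counts[b] + pseudocount for b in "ACGT")
    return {b: (counts[b] + pseudocount) / total for b in "ACGT"}


def build_model(positive, negative):
    """Step 2: build a model (per-base log-odds of positive vs. negative)."""
    fp, fn = base_frequencies(positive), base_frequencies(negative)
    return {b: math.log2(fp[b] / fn[b]) for b in "ACGT"}


def classify(model, seq):
    """Step 3: solve the modeling problem by scoring a new sequence.

    Assumes seq contains only the letters A, C, G and T.
    """
    return sum(model[b] for b in seq) > 0


def accuracy(model, labelled):
    """Step 4: test and evaluate against sequences with known labels."""
    correct = sum(classify(model, s) == label for s, label in labelled)
    return correct / len(labelled)


model = build_model(GC_RICH, BACKGROUND)
test_set = [("GGCCGCGG", True), ("ATATTTAA", False)]
print(accuracy(model, test_set))
```

Real analyses use far larger datasets and richer models, but the shape of the workflow (statistics, model, inference, evaluation) is the same.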

The best-known application of bioinformatics is in the analysis of DNA sequences. The field now has more recent counterparts, proteomics and functional genomics.
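
As a small illustration of what DNA sequence analysis involves, the sketch below computes three basic quantities for a short sequence: its reverse complement, its GC content, and its k-mer counts. The example sequence and the function names are assumptions chosen for illustration; the code only handles the unambiguous bases A, C, G and T.

```python
from collections import Counter

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq, k):
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


seq = "ATGCGCGATA"  # hypothetical example sequence
print(reverse_complement(seq))
print(gc_content(seq))
print(kmer_counts(seq, 2))
```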

Further reading

Title: What is bioinformatics? An introduction and overview
Author: N.M. Luscombe, D. Greenbaum and M. Gerstein
Type: Academic paper

Companies

Company: Antiverse
CEO: Murat Tunaboylu
Location: Cambridge, UK

News

Author: Ebert, P., Audano, P. A., Zhu, Q., Rodriguez-Martin, B., Porubsky, D., Bonder, M. J., Sulovari, A., Ebler, J., Zhou, W., Serra Mari, R., Yilmaz, F., Zhao, X., Hsieh, P., Lee, J., Kumar, S., Lin, J., Rausch, T., Chen, Y., Ren, J., Santamarina, M., Höps, W., Ashraf, H., Chuang, N. T., Yang, X., Munson, K. M., Lewis, A. P., Fairley, S., Tallon, L. J., Clarke, W. E., Basile, A. O., Byrska-Bishop, M., Corvelo, A., Evani, U. S., Lu, T.-Y., Chaisson, M. J. P., Chen, J., Li, C., Brand, H., Wenger, A. M., Ghareghani, M., Harvey, W. T., Raeder, B., Hasenfeld, P., Regier, A. A., Abel, H. J., Hall, I. M., Flicek, P., Stegle, O., Gerstein, M. B., Tubio, J. M. C., Mu, Z., Li, Y. I., Shi, X., Hastie, A. R., Ye, K., Chong, Z., Sanders, A. D., Zody, M. C., Talkowski, M. E., Mills, R. E., Devine, S. E., Lee, C., Korbel, J. O., Marschall, T., Eichler, E. E.
Date: April 2, 2021
Publisher: Science
Description:

Many human genomes have been reported using short-read technology, but it is difficult to resolve structural variants (SVs) using these data. These genomes thus lack comprehensive comparisons among individuals and populations. Ebert et al. used long-read structural variation calling across 64 human genomes representing diverse populations and developed new methods for variant discovery. This approach allowed the authors to increase the number of confirmed SVs and to describe the patterns of variation across populations. From this dataset, they identified quantitative trait loci affected by these SVs and determined how they may affect gene expression and potentially explain genome-wide association study hits. This information provides insights into patterns of normal human genetic variation and generates reference genomes that better represent the diversity of our species. Science, this issue p. eabf7117 (DOI: 10.1126/science.abf7117).

INTRODUCTION: The characterization of the full spectrum of genetic variation is critical to understanding human health and disease. Recent technological advances have made it possible to survey genetic variants on the level of fully reconstructed haplotypes, leading to substantially improved sensitivity in detecting and characterizing large structural variants (SVs), including complex classes.

RATIONALE: We focused on comprehensive genetic variant discovery from a human diversity panel representing 25 human populations. We leveraged a recently developed computational pipeline that combines long-read technology and single-cell template strand sequencing (Strand-seq) to generate fully phased diploid genome assemblies without guidance of a reference genome or use of parent-child trio information. Variant discovery from high-quality haplotype assemblies increases sensitivity and yields variants that are not only sequence resolved but also embedded in their genomic context, substantially improving genotyping in short-read sequenced cohorts and providing an assessment of their potential functional relevance.

RESULTS: We generated fully phased genome assemblies for 35 individuals (32 unrelated and three children from parent-child trios). Genomes are highly contiguous [average minimum contig length needed to cover 50% of the genome: 26 million base pairs (Mbp)], accurate at the base-pair level (quality value > 40), correctly phased (average switch error rate 0.18%), and nearly complete compared with GRCh38 (median aligned contig coverage >95%). From the set of 64 unrelated haplotype assemblies, we identified 15.8 million single-nucleotide variants (SNVs), 2.3 million insertions/deletions (indels; 1 to 49 bp in length), 107,590 SVs (≥50 bp), 316 inversions, and 9453 nonreference mobile elements. The large fraction of African individuals in our study (11 of 35) enhances the discovery of previously unidentified variation (approximately twofold increase in discovery rate compared with non-Africans). Overall, ~42% of SVs are previously unidentified compared with recent long-read-based studies. Using orthogonal technologies, we validated most events and discovered ~35 structurally divergent regions per human genome (>50 kbp) not yet fully resolved with long-read genome assembly. We found that homology-mediated mechanisms of SV formation are twice as common as expected from previous reports that used short-read sequencing. We constructed a phylogeny of active L1 source elements and observed a correlation between evolutionary age and features such as the activity level, suggesting that younger elements contribute disproportionately to disease-causing variation. Transduction tracing allowed the identification of 54 active SVA retrotransposon source elements, which mobilize nonrepetitive sequences at their 5′ and 3′ ends. We genotyped up to 50,340 SVs into Illumina short-read data from the 1000 Genomes Project and identified variants associated with changes in gene expression, such as a 1069-bp SV near the gene LIPI, a locus that is associated with cardiac failure. We further identified 117 loci that show evidence for population stratification. These are candidates for local adaptation, such as a 4.0-kbp deletion of regulatory DNA LCT (lactase gene) among Europeans.

CONCLUSION: Fully reconstructed haplotype assemblies triple SV discovery when compared with short-read data and improve genotyping, leading to insights into SV mechanism of origin, evolutionary history, and disease association.

Figure: Discovery and analysis of global human genetic diversity. Starting from a global panel of human diversity (top), we discovered structural variation from fully phased diploid genome assemblies (middle), resulting in a comprehensive catalog of sequence- and context-resolved variants. This facilitates integrative analysis and identification of new associations between variants and molecular phenotypes (bottom). SAS, South Asian; AMR, Admixed American; AFR, African; EUR, European; EAS, East Asian; INV, inversion; INS, insertion; DEL, deletion; MEI, mobile element insertion.

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.

Author: Science and Medicine Group Inc
Date: January 5, 2021
Publisher: www.prnewswire.com:443
Description: /PRNewswire/ -- The unpredictable events of 2020 might cause one to forswear all prediction-making. Yet for Science and Medicine Group's publishing brands...