In contrast to the human genome, which consists of about 3 billion base pairs built from four nucleotide bases arranged in a linear, one-dimensional sequence, the human phenome contains an unknown number of elements whose variation and dimensionality are only partly understood. Scientific understanding of genes and genomic variation is limited by the narrow range of methods available to assess phenotypes, which allow only certain anatomical and behavioral traits to be recorded, often manually. Phenomics aims to provide a more in-depth and unbiased assessment of phenotypic profiles at the whole-organism level.
The field of phenomics recognizes a need for consensus on a human- and machine-interpretable language to describe phenotypes. A lack of standardization and computability across phenotype data makes sharing difficult and may result in missed opportunities for discovery; in addition, a large amount of phenotype data is not publicly available. A computable format may involve the use of appropriate ontology terms to represent phenotypic descriptions in text or data sources.
In human phenomics, two of the aims are to understand how the environment makes people more or less susceptible to disease and to understand individual reactions to therapies. In 2003, when the Human Genome Project was completed, it was projected that more precisely defined phenotypes and high-throughput systems would be needed to take full advantage of genotyping studies, and the creation of an international Human Phenome Project was proposed. In 2012, the Human Variome Project held a satellite meeting at the Annual Meeting of the American Society of Human Genetics in San Francisco with the theme “Getting Ready for the Human Phenome Project”. The meeting was cosponsored by the Human Genome Variation Society. At that meeting it was noted that phenome projects in model organisms such as mouse, rat and zebrafish had compiled phenotypic data on the consequences of genetic mutations, while efforts of a similar scale for humans lagged behind.
In humans, data on genotype-phenotype associations have been generated through genome-wide association studies (GWAS), which link single nucleotide polymorphisms (SNPs) with disease phenotypes. To reduce the disease-centric bias in these approaches, phenome-wide association studies (PheWAS) were developed to look for associations between complete medical records and complete genome sequences. However, human phenotype data still have a strong clinical bias.
The introduction of genetic changes in animal model systems allows for unbiased interrogation of genotype-phenotype interactions. The Mouse Phenome Project was the first major effort in a vertebrate model to catalog baseline phenotypic data; the data are housed in the Mouse Phenome Database at the Jackson Laboratory, Bar Harbor, ME. The Knockout Mouse Project is a National Institutes of Health (NIH) initiative that aims to generate a resource containing loss-of-function mutations for every gene in the mouse genome, correlated with phenotypic data.
About 70% of human genes have a counterpart (ortholog) in zebrafish. Combined with the short generation time of zebrafish, standard practices for their genetic manipulation and their suitability for live imaging, this makes them cost-effective in biomedical research. The Zebrafish Phenome Project is underway and contributes to knowledge about phenotype-genotype associations and the genetic diagnosis of human disease. The Chemical Phenomics Initiative, based on chemical genetics, is a high-throughput chemical screen for small molecules that modulate early embryonic development in zebrafish, carried out by the Hong lab at Vanderbilt University. Pharmacological targets for these small-molecule developmental modulators are identified and made accessible to the scientific community through the chemotype-phenotype database on the Chemical Phenomics interactive web portal. In zebrafish, whole-body micro-CT scanning has been used for skeletal phenomics studies.
As DNA sequencing became faster and cheaper, new knowledge about normal genetic variation in the human population allowed genetic variants once thought to cause disease to be reclassified as benign. A better understanding of the range and commonality of variation in human phenotypes could improve diagnostic accuracy and the treatment of disease in genetic medicine. The Human Phenotype Ontology helps the sharing of phenotype data by standardising the vocabulary for phenotypes. For some types of phenotypic abnormalities, standardized measurements can be used to define the phenotype. To show a causal link between a genetic variant and an abnormal phenotype, it needs to be shown that the two are found together more often than expected by chance. Improvements in data about baseline population frequencies of phenotypes are needed for these calculations.
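The "more often than expected by chance" criterion can be illustrated with a one-sided Fisher's exact test on a 2×2 table of variant carriers against people showing the phenotype. The sketch below is only one possible way to frame such a calculation, not a method prescribed by any of the projects described here. The function name `enrichment_p_value` and all counts are hypothetical and chosen purely for illustration.

```python
from math import comb


def enrichment_p_value(both, variant_only, phenotype_only, neither):
    """One-sided Fisher's exact test.

    Returns the probability of seeing at least `both` individuals who carry
    the variant AND show the phenotype, assuming the two are independent.
    """
    carriers = both + variant_only        # everyone with the variant
    affected = both + phenotype_only      # everyone with the phenotype
    total = carriers + phenotype_only + neither

    # Hypergeometric tail: sum over all overlaps at least as large as observed.
    max_overlap = min(carriers, affected)
    tail = sum(
        comb(affected, k) * comb(total - affected, carriers - k)
        for k in range(both, max_overlap + 1)
    )
    return tail / comb(total, carriers)


# Hypothetical cohort counts (illustrative only, not real data).
p = enrichment_p_value(both=12, variant_only=8, phenotype_only=30, neither=950)
print(f"one-sided p-value: {p:.3g}")
```

The `phenotype_only` and `neither` counts play the role of the baseline population frequency of the phenotype, which is why the result is only as reliable as those baseline data. A small p-value indicates co-occurrence beyond chance; on its own it does not establish causation.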
The International Human Phenome Project (Phase I) was launched in Shanghai in March 2018. The project will be led by Fudan University in collaboration with Shanghai Jiao Tong University, the Shanghai Institute of Measurement and Testing Technology and the Shanghai Institutes for Biological Sciences.
The following projects promote standardization and sharing of phenotypic data related to humans and model organisms:
- The Human Phenotype Ontology (HPO)
- The Human Variome Project
- PhenX Toolkit
- International Mouse Phenotyping Consortium (IMPC)
- National Phenome Centre
- International Phenome Centre Network
- UK Biobank
- The Personal Genome Project
- National Bio Resource Project (NBRP) Rat Phenome
- Inborn Errors of Metabolism Knowledgebase (IEMbase)
- Chemical Phenomics Initiative
- The Phenomics Discovery Initiative (PDi)
- Consortium for Neuropsychiatric Phenomics (CNP)
- Mouse Phenome Database
- The Knockout Mouse Project
- Phenome Wide Association Studies (PheWAS)
- Definiens (Tissue Phenomics Company)
- Plant Ontology
- Gene Ontology Consortium
Plant phenomics is used both to understand how crops respond to environmental changes and for crop improvement. Connections between plant genotype and phenotype were historically investigated by identifying a trait of interest and then using DNA markers and breeding to locate the gene responsible for the trait. Seeking the gene responsible for a phenotype is called a forward genetics approach. Reverse genetics approaches, which mutate or alter genes first in order to find the phenotypic consequences of specific genetic changes, became more commonly used with the development of mutagens, molecular genetics and bioinformatics. As the price of image data collection has gone down and the capability for computational image processing has increased, plant phenomics researchers are investigating relationships between genotype, phenotype and environment with satellite and drone images. One hurdle is developing computational methods to extract useful information from these images. Researchers at Iowa State University are using crowdsourcing to label images that are then used to train machine learning algorithms. The team used students and Amazon Mechanical Turk workers for image labeling.
Researchers at the University of Saskatchewan developed the open-source software platform Deep Plant Phenomics, which uses deep convolutional neural networks for phenotyping plants. The platform was shown to be effective at leaf counting, mutant classification and age regression in top-down images of plant rosettes.
- National Plant Phenomics Centre (IBERS Gogerddan, Wales, UK)
- PHENOME, the French plant phenomic Infrastructure
- Australian Plant Phenomics Facility
- The European Infrastructure for Multi-scale Plant Phenomics and Simulation (EMPHASIS)
- International Plant Phenotyping Network
- North American Plant Phenotyping Network
- Qubit Phenomics
- Zegami Ltd
Computational approaches are being developed to gather, compare and process phenomics data. Machine learning methods are used to analyse images, such as satellite images of plants and medical histology images, as well as text describing medical conditions. For comparison of phenotypes across different organisms, formal ontologies are implemented that are accessible to automated reasoning. Phenotype ontologies are hierarchically related phenotypic descriptions that use a controlled vocabulary, which allows computation on individuals, on populations and across multiple species. Ontologies are being developed in Web Ontology Language (OWL) and OBO Flatfile Format.
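A toy example can show how a hierarchy of controlled terms makes phenotype descriptions computable. The sketch below is a minimal illustration under stated assumptions: the `TOY:` identifiers, the term labels and the `IS_A` table are all hypothetical and are not taken from the Human Phenotype Ontology or any other real ontology. A real ontology would be loaded from its OWL or OBO file with a dedicated parser rather than written by hand. Two phenotype profiles are compared by the overlap of their ancestor terms (a Jaccard index), which is one simple similarity measure among many.

```python
# Hypothetical toy ontology: each term maps to its parent terms (is-a links).
IS_A = {
    "TOY:0000001": [],               # phenotypic abnormality (root)
    "TOY:0000002": ["TOY:0000001"],  # abnormality of the skeleton
    "TOY:0000003": ["TOY:0000002"],  # abnormal vertebral morphology
    "TOY:0000004": ["TOY:0000002"],  # abnormal limb bone morphology
    "TOY:0000005": ["TOY:0000001"],  # abnormality of the eye
}


def ancestors(term, is_a=IS_A):
    """Return the term itself plus every term reachable through is-a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a[current])
    return seen


def profile_similarity(terms_a, terms_b, is_a=IS_A):
    """Jaccard index of the ancestor closures of two phenotype profiles."""
    closure_a = set().union(*(ancestors(t, is_a) for t in terms_a))
    closure_b = set().union(*(ancestors(t, is_a) for t in terms_b))
    union = closure_a | closure_b
    if not union:
        return 0.0
    return len(closure_a & closure_b) / len(union)


# Two skeletal phenotypes share more ancestors than a skeletal and an eye phenotype.
print(profile_similarity({"TOY:0000003"}, {"TOY:0000004"}))  # 0.5
print(profile_similarity({"TOY:0000003"}, {"TOY:0000005"}))  # 0.25
```

Because the comparison works on shared ancestor terms rather than exact matches, the same approach can relate annotations made at different levels of detail, or in different species once their ontologies are mapped to common terms.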
Documentaries, videos and podcasts
- Recursion Pharmaceuticals: a biotechnology and data science company based in Salt Lake City, Utah, founded in 2013, that combines biology with artificial intelligence for drug discovery. Using human cell models of diseases, Recursion captures microscopic images to build biological datasets and applies computational techniques to identify disease-associated biological changes.