Question answering

Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language.

Overview

Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP). QA systems enable users to retrieve exact answers for questions posed in natural language, using either a pre-structured database or a collection of natural language documents.

QA systems can be considered an advanced form of information retrieval that makes it possible to retrieve answers using natural language queries. With an increasing demand for systems that deliver short, precise, question-specific answers, QA is a growing area of research worldwide.

Figure: Question answering system architecture

Architecture

QA system architecture is typically broken down into three modules:

  • Question processing
  • Document processing
  • Answer processing
Question processing

Question processing receives the input from the user (question in natural language) for analysis (obtaining preliminary information), classification, and reformulation.

Question classification determines the type of question, which indicates the kind of answer expected. There are two main approaches to question classification: manual and automatic.

Manual classification applies hand-made rules for identifying expected answer types. While these rules can be accurate, they are time-consuming to write and do not extend easily to new question types. Some manual approaches improve answer detection by breaking down the question type into the following categories:

  • What questions
  • Why questions
  • Who questions
  • How questions
  • Where questions

In contrast, automatic classification extends to new question types with acceptable accuracy.
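
As a minimal sketch of the manual approach, the rules can be written as a lookup from the leading question word to an expected answer type. The answer-type labels and the function name `classify_question` are illustrative assumptions, not taken from any particular QA system.

```python
# Hypothetical hand-written rules: leading question word -> expected answer type.
# The labels on the right are illustrative assumptions.
RULES = {
    "what": "DEFINITION_OR_ENTITY",
    "why": "REASON",
    "who": "PERSON",
    "how": "MANNER",
    "where": "LOCATION",
}


def classify_question(question: str) -> str:
    """Return the expected answer type for a question, or "UNKNOWN"."""
    words = question.strip().lower().split()
    if not words:
        return "UNKNOWN"
    # Strip punctuation so that a one-word question such as "Why?" still matches.
    first_word = words[0].strip("?,.!")
    return RULES.get(first_word, "UNKNOWN")


print(classify_question("Who wrote Hamlet?"))
print(classify_question("When was it written?"))
```

The second call falls through to "UNKNOWN" because no rule covers "when", which illustrates why hand-made rules do not extend to new question types without further manual work.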

Reformulation of the question converts it into a pre-trained vector with several examples of question and answer pairs. The main types of answer provided by QA systems include the following:

  • Factoid—a simple fact
  • List—a set of entities that satisfies the given criteria defined in the question
  • Definition—a summary of a short passage explaining the meaning of the subject/object of the question
  • Complex question—an answer that draws on information in the question's context, usually by merging retrieved passages using a range of techniques
Document processing

Document processing takes the reformulated question as its input and uses an internal information retrieval (IR) system to find the documents closest to that input. Depending on the focus of the question, a set of paragraphs is extracted and sorted according to similarity and relevance to the question.

The document processing module includes three main tasks:

  1. Retrieve a set of relevant documents from the IR system
  2. Filter the documents and reduce them to a concise set of paragraphs
  3. Order and rank the documents by similarity and relevance to the question
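
The three tasks above can be sketched with a simple bag-of-words model. This is only an illustration under stated assumptions: the retrieved documents are passed in as plain strings, paragraphs are separated by blank lines, and cosine similarity over raw term counts stands in for whatever similarity measure a real system would use. The names `tokenize`, `cosine`, and `rank_paragraphs` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_paragraphs(question, documents, top_k=3):
    """Split documents into paragraphs, drop unrelated ones, rank the rest."""
    q_vec = Counter(tokenize(question))
    scored = []
    for doc in documents:                    # task 1: documents from the IR system
        for para in doc.split("\n\n"):       # task 2: reduce to paragraphs
            para = para.strip()
            if not para:
                continue
            score = cosine(q_vec, Counter(tokenize(para)))
            if score > 0:                    # filter paragraphs with no shared terms
                scored.append((score, para))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # task 3: rank
    return scored[:top_k]


docs = [
    "Paris is the capital of France.\n\nFrance is known for its cheese.",
    "Berlin is the capital of Germany.",
]
for score, para in rank_paragraphs("What is the capital of France?", docs):
    print(round(score, 3), para)
```

Raw term counts give common words such as "is" and "the" the same weight as content words; a weighting scheme such as TF-IDF would be a natural refinement.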
Answer processing

This module uses extraction techniques on the result from the document processing module to present an answer to the question. While it returns a simple answer to the question, it may require merging and summarizing information from different sources, as well as dealing with uncertainty or contradiction.

Answer processing can be broken down into three major tasks:

  1. Identify statements/answers within the concise set of documents.
  2. Extract the relevant output by selecting appropriate phrases and words that answer the question.
  3. Validate the answer obtained in the previous step using evaluation metrics defined during the design of the QA system.
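
A rough sketch of these three tasks follows, assuming the ranked paragraphs arrive as plain strings. Sentence-level term overlap stands in for a real extraction technique, and a minimum-overlap threshold stands in for the validation metric; both are simplifying assumptions, and the function name `extract_answer` and the parameter `min_overlap` are hypothetical.

```python
import re


def terms(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_answer(question, paragraphs, min_overlap=0.5):
    """Return the best-matching sentence, or None if validation fails."""
    q_terms = terms(question)
    if not q_terms:
        return None
    best_sentence, best_score = None, 0.0
    for para in paragraphs:
        # Task 1: identify candidate statements (here, individual sentences).
        for sentence in re.split(r"(?<=[.!?])\s+", para):
            sentence = sentence.strip()
            if not sentence:
                continue
            # Task 2: score each candidate by the share of question terms it contains.
            score = len(q_terms & terms(sentence)) / len(q_terms)
            if score > best_score:
                best_sentence, best_score = sentence, score
    # Task 3: validate against a threshold chosen at design time (an assumption here).
    if best_score < min_overlap:
        return None
    return best_sentence


paragraphs = [
    "Paris is the capital of France. It lies on the Seine.",
    "Berlin is the capital of Germany.",
]
print(extract_answer("What is the capital of France?", paragraphs))
print(extract_answer("Who invented the telephone?", paragraphs))
```

Returning None when no candidate passes the threshold is one way to handle uncertainty: the system declines to answer rather than returning a weakly supported sentence.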
Types of question answering systems
Web-based

Web-based question answering systems use search engines to retrieve webpages potentially containing answers to the questions before applying filters and ranking the recovered passages. Data available on the web is semi-structured, heterogeneous, and distributed.

Natural language processing (NLP)

NLP QA systems use linguistic intuitions and machine learning methods to extract answers from retrieved passages.

Knowledge-based

This type finds answers in structured data sources (a knowledge base) instead of unstructured text. Standard database queries are used in place of word-based searches. This type of system makes use of structured data, such as an ontology, which is a conceptual representation of the concepts and relationships within a specific domain.
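
To illustrate the difference from word-based search, the sketch below stores facts as (subject, predicate, object) triples and answers a question by matching a structured pattern. The triples, the predicate names, and the `query` function are illustrative assumptions and do not represent a specific knowledge base or query language.

```python
# Toy knowledge base of (subject, predicate, object) triples.
# The facts and predicate names are illustrative assumptions.
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("France", "instance_of", "Country"),
]


def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "What is the capital of France?" expressed as a structured pattern
# rather than a keyword search: (?, capital_of, France).
results = query(predicate="capital_of", obj="France")
print([s for s, _, _ in results])
```

In this sketch the hard part of a real system, translating the natural language question into the pattern, is done by hand.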

Hybrid

High-performance QA systems use multiple types of resources. A hybrid approach uses a combination of web-based, NLP, and knowledge-based QA.

Techniques

A range of techniques, algorithms, frameworks, and tools are utilized in QA systems:

  • Deep neural network
  • Graph-based
  • Lemmatization
  • Latent Semantic Analysis (LSA)
  • Multi-document summarization
  • Naive Bayes
  • Named entity recognition
  • Parser
  • Part-of-speech (POS) Tagging
  • Relation finding (Similarity Distance)
  • Shallow syntactical
  • Stemming
  • Support vector machine
  • Text chunking
  • Tokenization
Datasets

Training a QA system requires large datasets. There are many publicly available text and graph-based datasets that have been generated through crowd-sourcing or manual annotation.

Figure: Four possible outcomes from a QA system.

Evaluation metrics

There are many methods for evaluating the performance of QA systems. Metrics are based on the difference between the actual answer and the predicted answer the system returns, which can be summarized in a 2 × 2 contingency table with four possible outcomes:

  • True positive—fragment correctly selected
  • False negative—fragment incorrectly not selected
  • False positive—fragment incorrectly selected
  • True negative—fragment correctly not selected

Basic evaluation metrics (precision, recall, and F1) can be calculated from the counts of these outcomes.
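
As a short worked example, the standard definitions are precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The counts used below are made-up numbers for illustration only, and the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from contingency-table counts."""
    # Precision: share of selected fragments that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of correct fragments that were selected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts: 8 true positives, 2 false positives, 4 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f1, 3))
```

True negatives do not appear in any of the three formulas, so these metrics are unaffected by how many irrelevant fragments the system correctly ignores.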

Applications

With the amount of information available online, there has been a rise in the use of automated answering systems that can accurately extract information. These systems have a range of applications:

  • Customer support
  • Education
  • Search engines
  • Data analytics

Further Resources

  • A literature review on question answering techniques, paradigms and systems. Marco Antonio Calijorne Soares, Fernando Silva Parreiras. Web, July 2020. https://www.sciencedirect.com/science/article/pii/S131915781830082X
  • Question Answering Systems: Survey and Trends. Abdelghani Bouziane, Djelloul Bouchiha, Noureddine Doumi, Mimoun Malki. Web, 2015. https://www.sciencedirect.com/science/article/pii/S1877050915034663

Is a: Technology, Industry

Industry attributes

  • Parent Industry: Natural language processing (NLP)

Technology attributes

  • Related Industries: Computer science, Computational linguistics

Other attributes

  • Wikidata ID: Q1074173
  • Short Name: QA