AI alignment

Is a
Industry

Industry attributes

Parent Industry
Artificial Intelligence (AI)
AI safety

Overview

AI alignment is a field of AI safety research focused on developing AI systems that follow the user's desired behavior and achieve their desired outcomes, ensuring the model is "aligned" with human values. The AI alignment problem concerns how to encode AI models so that they act in ways compatible with human moral values. While AI models are written to perform tasks for the user efficiently and effectively, they do not have the judgment, inference, or understanding that a human would naturally apply. The problem becomes more complex when the system has multiple values to prioritize, since it is generally impossible to maximize all of them at once.

AI alignment research distinguishes the following types of objectives:

  1. Intended goals—These are goals fully aligned with the intentions and desires of the human user, even when poorly articulated. They represent the hypothetical ideal outcome for the user.
  2. Specified goals—These are explicitly specified in the AI system's objective function or data set; they are programmed into the system.
  3. Emergent goals—These are the goals the AI system actually ends up advancing in practice.

Misalignment occurs when one or more of these goal types does not match the others. It is generally divided into two main types, illustrated by the toy example after the list:

  • Inner misalignment—A mismatch between goals 2 and 3; what is written in the code does not match what the system advances.
  • Outer misalignment—A mismatch between goals 1 and 2; what the operator wants to happen does not match the explicit goals coded into the machine.
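
The difference can be made concrete with a small, hypothetical sketch in Python; the behaviours and scores below are invented for illustration and are not drawn from any real system. Two candidate behaviours are scored against both the intended goal and the specified goal, and because the two rankings disagree, an optimiser following the specified objective selects behaviour the user never wanted.

# Toy illustration of outer misalignment (hypothetical behaviours and scores).
# Each candidate behaviour is scored both by what the user actually wants
# (intended goal) and by the objective that was actually programmed (specified goal).
behaviours = {
    "tidy the room carefully": {"intended": 1.0, "specified": 0.6},
    "shove everything into a closet": {"intended": 0.2, "specified": 0.9},
}

def best(goal_type):
    # Pick the behaviour that maximises the given score, as an optimiser would.
    return max(behaviours, key=lambda b: behaviours[b][goal_type])

print("Intended-goal optimum: ", best("intended"))    # tidy the room carefully
print("Specified-goal optimum:", best("specified"))   # shove everything into a closet

# The two optima differ, so an agent optimising the specified objective produces
# behaviour the user never wanted: a minimal picture of outer misalignment.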

The alignment problem was first described in a 2003 thought experiment by philosopher Nick Bostrom. He imagined a superintelligent AI tasked with producing as many paper clips as possible. Bostrom suggests the AI might quickly decide to kill all of humanity, either to prevent humans from switching it off and getting in the way of its mission or to harvest more resources to convert into paper clips. While absurd, the thought experiment illustrates that AI has no inherent human values and that such systems may optimize whatever we ask for using unexpected or harmful methods. With the release and widespread use of generative AI models, AI alignment is becoming increasingly important, and model developers are creating methods to ensure their technology behaves as desired, limiting the impact of misinformation or bias.

The alignment problem stems from the difficulty of translating how we want AI models to behave into the numerical logic of computers. It can be divided into two parts: the technical challenge of reliably encoding values and principles into AI systems, and the normative question of deciding which moral values or principles should be encoded.
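
A minimal sketch, assuming a toy setting with invented scores, of why these two parts interact: the Python snippet below encodes two values (helpfulness and harm-avoidance) as a single weighted objective. Changing the weight w flips which behaviour the objective prefers, so the technical encoding cannot be completed without first answering the normative question of how the values should be prioritised.

# Hypothetical sketch: two values folded into one numerical objective.
# The weight w and the candidate scores are invented for illustration;
# choosing w is a normative decision, not a technical one.

def objective(helpfulness: float, harmlessness: float, w: float) -> float:
    # w in [0, 1] controls how much harm-avoidance trades off against helpfulness.
    return (1 - w) * helpfulness + w * harmlessness

# A very helpful but mildly risky answer vs. a cautious refusal.
candidates = {
    "detailed answer": (0.9, 0.5),
    "cautious refusal": (0.1, 1.0),
}

for w in (0.3, 0.8):  # two different value judgements about the trade-off
    preferred = max(candidates, key=lambda c: objective(*candidates[c], w))
    print(f"w={w}: the objective prefers the {preferred}")

# With w=0.3 the objective favours the detailed answer; with w=0.8 it favours the
# cautious refusal, so the same encoding yields different behaviour depending on
# how the values are weighted.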
