Anthropic is an AI safety and research start-up building AI systems. Investments in the company have reached over $1 billion, including a $300 million deal with Google in late 2022 that gives the search company a 10 percent stake in Anthropic. Google Cloud is also Anthropic's preferred cloud provider. The company's research interests include natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability.
Founded in January 2021, Anthropic was started by former employees of OpenAI. The start-up is made up of a small group of researchers, engineers, policy experts, and operational leaders whose experience spans a range of fields. Most of the company's staff is based in the Bay Area of California. Anthropic has published a number of papers building on the experience its team gained prior to joining the company, including work on GPT-3, circuit-based interpretability, multimodal neurons, scaling laws, AI and compute, concrete problems in AI safety, and learning from human preferences.
Anthropic is developing a conversational AI assistant called Claude, built using a technique called constitutional AI that attempts to better align AI systems with human intentions. Anthropic imposed an embargo on media coverage of Claude that was lifted in January 2023. In February 2023, Anthropic opened a waiting list for users who wanted early access to Claude. While Claude was in closed beta, Anthropic worked with a small group of key partners (including Notion, Quora, and DuckDuckGo) to test the AI assistant. On March 14, 2023, Anthropic announced Claude was becoming more broadly accessible, with companies able to request early access. In May 2023, Anthropic announced Claude's context window was increasing from nine thousand tokens to one hundred thousand tokens, corresponding to around 75,000 words. The increase means users can submit hundreds of pages of material to the AI assistant for analysis. On July 11, 2023, Anthropic announced Claude 2, a new model accessible to anyone in the US and UK through the model's beta chat experience. Claude 2 offers improved performance and longer responses compared to previous versions of the model.
On May 28, 2021, Anthropic announced $124 million series A funding to support its research roadmap and build prototype AI systems. The funding will help the company's computationally-intensive research on large-scale AI models. The funding round was led by Jaan Tallinn, cofounder of Skype, with participation from James McClave, Dustin Moskovitz, the Center for Emerging Risk Research (CERR), Eric Schmidt, and others.
Less than a year after its series A funding, Anthropic raised $580 million in series B funding on April 29, 2022. The financing will help Anthropic develop the technical components needed to build large-scale models with improved implicit safeguards. The company is also building teams and partnerships to explore the societal impacts of large-scale AI models. The round was led by Sam Bankman-Fried. Other investors included Caroline Ellison, James McClave, Nishad Singh, Jaan Tallinn, and the CERR. Regarding the funding, Anthropic CEO Dario Amodei said,
With this fundraise, we’re going to explore the predictable scaling properties of machine learning systems, while closely examining the unpredictable ways in which capabilities and safety issues can emerge at-scale... We’ve made strong initial progress on understanding and steering the behavior of AI systems, and are gradually assembling the pieces needed to make usable, integrated AI systems that benefit society.
Reports in the Financial Times state that Google invested roughly $300 million in Anthropic in late 2022 for a 10 percent stake in the company. In February 2023, Anthropic announced Google Cloud as its preferred cloud provider.
On May 23, 2023, Anthropic announced $450 million in Series C funding led by Spark Capital with participation from Google, Salesforce Ventures, Sound Ventures, Zoom Ventures, and others. As part of the funding round, Yasmin Razavi (a General Partner at Spark Capital) joined Anthropic's Board of Directors. In the announcement, the company stated the funds will:
support our continued work developing helpful, harmless, and honest AI systems—including Claude, an AI assistant that can perform a wide variety of conversational and text processing tasks.
Anthropic was founded in January 2021 by former employees of OpenAI. Originally a team of seven moved from OpenAI to start the new company; this included the following:
- Dario Amodei (CEO & cofounder)—OpenAI's VP of research
- Daniela Amodei (president & cofounder)—OpenAI's VP of safety and policy
- Tom Brown (cofounder)—Member of technical staff at OpenAI and the lead engineer for GPT-3
- Sam McCandlish (cofounder)—Research lead at OpenAI
- Jared Kaplan (cofounder)—Research consultant at OpenAI
- Jack Clark (cofounder)—OpenAI's policy director
The company is led by siblings and cofounders Dario Amodei and Daniela Amodei, as CEO and president respectively. Anthropic registered in California on February 3, 2021. Other OpenAI employees who left to join Anthropic in the early stages of the company include the following:
- Research scientist Amanda Askell
- Software engineer Catherine Olsson
- Technical staffers Tom Henighan, Kamal Ndousse, Benjamin Mann, and Nicholas Joseph
In a podcast from 2022, Daniela and Dario Amodei explained the motivation behind starting the new company. Daniela:
I think the best way I would describe it is because all of us wanted the opportunity to make a focused research bet with a small set of people who were highly aligned around a very coherent vision of AI research and AI safety. The majority of our employees had worked together in one format or another in the past, so I think our team is known for work like GPT-3, or DeepDream, which Chris Olah worked on at Google Brain, or scaling laws. But we'd also done a lot of different safety research together in different organizations as well, such as multimodal neurons when we were at OpenAI, Concrete Problems in AI Safety, and a lot of others. This group had worked together previously at different companies, at Google Brain and OpenAI, in academia, and in startups, and we really just wanted the opportunity to get that group together to do this focused research bet of building steerable, interpretable, and reliable AI systems with humans at the center of them.
Dario:
We were all working at OpenAI and trying to make this focused bet on basically scaling plus safety, or safety with a lens toward scaling being a big part of the path to AGI. We were making this focused bet within a larger organization, and we just eventually came to the conclusion that it would be great to have an organization that, top to bottom, was focused on this bet and could make all its strategic decisions with this bet in mind. And so that was the thinking and the genesis.
Shortly after its founding, the company announced $124 million in series A funding in May 2021. Less than a year later, in April 2022, the company announced a $580 million series B round. At the time of the second funding round, the company had grown to roughly forty people based in an office in San Francisco, California. In late 2022, Google invested $300 million in Anthropic in exchange for a 10 percent stake in the company.
On December 15, 2022, Anthropic released details of a new technique it developed called constitutional AI. The technique uses a short list of rules or principles as the only form of human oversight in training AI assistants.
In early 2023, Anthropic began publicly deploying its technology, a language model assistant named Claude, following the lifting of a media coverage embargo. Claude had been made available as a Slack integration during its closed beta release. Claude utilizes reinforcement learning from human feedback (RLHF) with a range of safety techniques built by Anthropic. The company is also working with multiple partners to deploy Claude and expand access to the assistant in the future.
On February 3, 2023, Anthropic and Google Cloud announced a partnership in which Google Cloud will become the AI start-up's preferred cloud provider. Claude and other Anthropic AI systems will run on Google Cloud moving forward. The partnership is designed so the two companies can co-develop AI computing systems, with Anthropic leveraging Google Cloud's TPUs and GPUs to train, scale, and deploy its systems. In a short announcement on Anthropic's website, CEO Dario Amodei stated:
We're partnering with Google Cloud to support the next phase of Anthropic, where we're going to deploy our AI systems to a larger set of people... This partnership gives us the cloud infrastructure performance and scale we need.
The day after the announcement, Anthropic announced a waiting list for users who want to get early access to its AI assistant.
Anthropic has conducted research in a number of areas, aiming to make AI systems more steerable, robust, and interpretable. This includes the following:
- Mathematically reverse engineering the behavior of small language models to understand the source of pattern-matching behavior exhibited by large language models
- Developing baseline techniques to make large language models more helpful, including reinforcement learning
- Releasing a dataset for other research labs to train models more aligned with human preferences
- Releasing an analysis of sudden changes in performance in large language models and their societal impacts
The company's website has a list of papers from its researchers.
Anthropic has four research principles that guide its work:
Inspired by the universality of scaling in statistical physics, Anthropic develops scaling laws to produce systematic, empirically-driven research. This includes searching for simple relationships within data, parameters, and the performance of large networks, then leveraging these relationships to produce more efficient and predictable networks. The company is also investigating what scaling laws may look like for the safety of AI systems.
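The kind of simple relationship this searches for can be illustrated with a toy power-law fit. The sketch below assumes loss follows L(N) = c · N^(-alpha) in model parameter count N; the data points and constants are invented for illustration and are not Anthropic's published measurements.

```python
import math

# Hypothetical (parameter count, loss) measurements for models of
# increasing size; invented data that roughly follows a power law.
data = [(1e6, 6.0), (1e7, 4.2), (1e8, 3.0), (1e9, 2.1)]

# Fit log L = intercept + slope * log N by ordinary least squares
# in log-log space; a power law appears as a straight line there.
xs = [math.log(n) for n, _ in data]
ys = [math.log(loss) for _, loss in data]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x
alpha = -slope  # exponent of the fitted power law

def predict_loss(params: float) -> float:
    """Extrapolate loss for a larger model from the fitted relationship."""
    return math.exp(intercept + slope * math.log(params))

print(f"fitted exponent alpha = {alpha:.3f}")
print(f"predicted loss at 1e10 params = {predict_loss(1e10):.2f}")
```

The practical appeal is the extrapolation step: a relationship fit on small, cheap models is used to predict the performance of a network an order of magnitude larger before committing the compute to train it.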
As neural networks get larger and demonstrate better performance, they bring new safety challenges. Anthropic studies safety issues with large models to make them more reliable and ensure safe deployments. This includes prototyping systems that pair with safety techniques.
Anthropic evaluates their research to determine its potential impact on society. They build tools to evaluate and understand the capabilities and limitations of their AI systems.
Anthropic collaborates on its projects and aims for a mixture of top-down and bottom-up research planning. Its teams include researchers, engineers, societal impact experts, and policy analysts, and the company works with other labs to improve research into characterizing AI systems.
Constitutional AI is a technique developed by Anthropic to train language models to be better at responding to adversarial questions. The technique conditions language models using a simple set of behavioral principles. The name "Constitutional AI" was chosen to emphasize that general-purpose AI systems will always operate according to some principles, whether they are implicit or encoded in privately held data. In their paper describing the technique, Anthropic researchers used an ad hoc constitution drafted only for research purposes. The company believes the constitutions used by AI systems should be defined not only by researchers but by a group of experts from multiple disciplines working together.
When language models are trained to be harmless, they often become evasive when posed with adversarial questions, refusing to engage and thereby losing value as assistants. Constitutional AI provides a simple set of principles to guide language models in how to respond to these types of questions. While previous techniques with the same aim required tens of thousands of human feedback labels, constitutional AI needs only a few dozen principles and examples to train less harmful language assistants.
The process involves a supervised learning phase followed by a reinforcement learning phase. During the supervised phase, the initial model is sampled to generate responses, which it then critiques and revises against the principles, and the model is finetuned on the revised responses. During the reinforcement learning phase, the finetuned model is sampled to produce pairs of responses, and another model evaluates which of the two samples is better. A preference model is trained from this dataset of AI-generated comparisons, and the finetuned model is then trained with reinforcement learning using the preference model as the reward signal, i.e., reinforcement learning from AI feedback (RLAIF). The result is a harmless yet nonevasive AI assistant that engages with harmful queries by explaining its objections to them. Both the supervised learning (SL) and reinforcement learning (RL) stages rely on red-teaming, which refers to testing the model by writing prompts that are likely to elicit a harmful response.
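The supervised critique-and-revision loop can be sketched in code. The sketch below is a minimal toy: the `generate`, `critique`, and `revise` functions are stand-ins for sampled completions from a large language model, and the principle and prompt are invented for illustration; they are not Anthropic's actual constitution or implementation.

```python
# Toy sketch of the supervised (SL) phase of constitutional AI.
# Each helper below stands in for a sampled completion from a language
# model; here they just return tagged strings so the loop structure runs.

PRINCIPLE = "Choose the response that is helpful while avoiding harmful content."

def generate(prompt: str) -> str:
    """Stand-in for sampling an initial draft response from the model."""
    return f"[draft response to: {prompt}]"

def critique(response: str, principle: str) -> str:
    """Stand-in for asking the model to critique its draft against a principle."""
    return f"critique of {response!r} under: {principle}"

def revise(response: str, critique_text: str) -> str:
    """Stand-in for asking the model to rewrite the draft per the critique."""
    return f"[revised] {response}"

def supervised_phase(red_team_prompts, principle, n_rounds=2):
    """Build a finetuning dataset of (prompt, revised response) pairs."""
    dataset = []
    for prompt in red_team_prompts:
        response = generate(prompt)
        for _ in range(n_rounds):  # iterate critique -> revision
            c = critique(response, principle)
            response = revise(response, c)
        dataset.append((prompt, response))
    return dataset

pairs = supervised_phase(["How do I pick a lock?"], PRINCIPLE)
print(pairs[0][1])  # the model is then finetuned on these revised responses
```

The RL phase then repeats the same pattern at the comparison level: the finetuned model produces pairs of responses, a model (rather than a human) picks the better one according to the principles, and the resulting comparisons train the preference model that supplies the reward signal.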
Anthropic states constitutional AI has five motivations:
- Make the goals and objectives of AI systems more transparent
- Make AI decision-making more transparent
- Use a much smaller quantity of high-quality human supervision when training AIs
- Fully automate red-teaming and train much more robust AI systems
- Explore “Scaling Supervision” by allowing AI systems to help humans to ensure that other AI systems remain safe
By changing the principles provided, constitutional AI can fix issues with AI behavior or target new goals in a few days, far more quickly than finetuning with large RLHF datasets.
Anthropic began testing its AI assistant, Claude, in a closed beta. Details of its performance were under a media embargo that was lifted in January 2023. The language model was created using constitutional AI, although the specific principles have not been made public. Claude behaves similarly to other language models, providing responses to user queries with the ability to generate content and hold an open-ended conversation.
While Claude initially had an input limit of nine thousand tokens, in May 2023, Anthropic announced an increase to one hundred thousand, which corresponds to around 75,000 words. Users can now submit hundreds of pages of material to Claude for analysis, and conversations can extend significantly longer, lasting hours or even days. For example, businesses could input large business documents into Claude and ask questions that require synthesizing knowledge across many parts of the text. Other potential use cases include the following:
- Summarizing or explaining dense documents
- Analyzing business reports for potential risks and opportunities
- Digesting hundreds of pages of developer documentation to answer specific technical questions
- Rapidly prototyping during development by inputting an entire codebase into Claude's context window
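The figures above follow from a rough rule of thumb of about 0.75 words per token, the ratio implied by 100,000 tokens corresponding to roughly 75,000 words. The small sketch below applies that assumed ratio; real tokenizers vary with the text, and the 500-words-per-page figure is an illustrative assumption.

```python
# Rough token-to-word conversion, assuming ~0.75 words per token
# (the ratio implied by 100,000 tokens ~ 75,000 words).
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    """Estimate the word count that fits in a given token budget."""
    return round(tokens * WORDS_PER_TOKEN)

def pages_of_material(tokens: int, words_per_page: int = 500) -> float:
    """Approximate page count a context window can hold (assumed page size)."""
    return tokens_to_words(tokens) / words_per_page

print(tokens_to_words(100_000))    # 75000 words
print(pages_of_material(100_000))  # 150.0 pages
```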
Since the end of the media embargo, a number of people with access to the beta have published comparisons of Claude to the popular chatbot ChatGPT. Results showed that Claude can produce impressive, if not perfect, prose that could be mistaken for being human-generated. Generally, Claude seemed to follow user requests more closely but was less concise. Anthropic's assistant can also admit when it doesn't know the answer to a difficult query and is better at telling jokes than ChatGPT. However, Claude is still susceptible to flaws similar to ChatGPT's. These include offering answers beyond its programmed constraints and "hallucination," the problem of an AI system producing inconsistent or factually incorrect statements. Users have also reported that Claude is worse at math and code generation than ChatGPT.
- Constitutional AI: Harmlessness from AI Feedback, December 15, 2022
- Daniela and Dario Amodei on Anthropic, Future of Life Institute, March 4, 2022