GPTBot is a web crawler from OpenAI that gathers data that may be used to improve the company's future AI models. Crawled web pages are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or contain text that violates OpenAI's policies. The stated aim of the data collection is to improve AI models' accuracy, general capabilities, and safety. OpenAI allows website owners to block GPTBot from accessing their sites.
According to OpenAI's documentation, GPTBot can be identified by the following user agent token and full user-agent string:
User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
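A server that wants to log or handle GPTBot requests separately could look for the documented token in the incoming User-Agent header. The following is a minimal sketch, assuming the token keeps the "GPTBot/<version>" form shown above; the function names are hypothetical. Any client can send this header, so a match is only a heuristic, not proof that the request came from OpenAI.

```python
import re
from typing import Optional

# Documented token, matched case-sensitively as a whole word and optionally
# followed by "/<version>" (assumed format, based on "GPTBot/1.0" above).
_GPTBOT = re.compile(r"\bGPTBot\b(?:/([\d.]+))?")


def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the GPTBot token."""
    return _GPTBOT.search(user_agent) is not None


def gptbot_version(user_agent: str) -> Optional[str]:
    """Return the version after "GPTBot/", or None if absent."""
    match = _GPTBOT.search(user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
          "GPTBot/1.0; +https://openai.com/gptbot)")
    print(is_gptbot(ua), gptbot_version(ua))
```

Because the match is case-sensitive, the lowercase "gptbot" in the trailing URL is not matched, so the version comes from the "GPTBot/1.0" token itself.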
Website owners can block GPTBot from their entire site by adding the following rule to the site's robots.txt file:
User-agent: GPTBot
Disallow: /
Access can also be limited to only parts of a site by combining the GPTBot token with Allow and Disallow rules in the site's robots.txt, like this:
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
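To check how the robots.txt rules above apply before deploying them, they can be fed to Python's standard-library parser, urllib.robotparser. This is a minimal sketch: the helper name, the example.com URLs, and the "/directory-1/" and "/directory-2/" paths are placeholders, and the result reflects Python's parser rather than GPTBot's own implementation.

```python
from urllib.robotparser import RobotFileParser

# Rule that blocks GPTBot from the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Rules that allow one directory and block another (placeholder paths).
PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def can_fetch(robots_txt: str, url: str, agent: str = "GPTBot") -> bool:
    """Return whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the bare token: robotparser compares only the part of the agent
    # name before the first "/" against each User-agent line.
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/directory-1/page.html",
                "https://example.com/directory-2/page.html",
                "https://example.com/other/page.html"):
        print(url, can_fetch(PARTIAL, url))
```

Paths that match no rule, such as "/other/page.html" under the partial rules, are allowed, and other crawlers are unaffected because only the GPTBot group is defined. Python's parser applies the first matching rule in file order, which can differ from crawlers that pick the most specific match when Allow and Disallow paths overlap; the two directories in this example do not overlap.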
GPTBot was launched on August 7, 2023.

