Understanding OpenAI's Web Crawlers: GPTBot, OAI-SearchBot, and ChatGPT-User

Manuel Yang | December 22, 2024 | 6 min read

TL;DR: OpenAI runs three crawlers: OAI-SearchBot (for ChatGPT search results), GPTBot (for training data), and ChatGPT-User (user-triggered browsing). Block OAI-SearchBot and you won't appear in ChatGPT search answers. Block GPTBot to opt out of training without affecting search visibility. You can allow one while blocking another.

OpenAI operates three distinct web crawlers, each serving a different purpose. Understanding the difference between them matters for managing your AI search visibility and deciding what gets indexed, what gets used for training, and what appears in ChatGPT's search results.

The Three OpenAI Crawlers at a Glance

Crawler        | Purpose                  | robots.txt Token
OAI-SearchBot  | ChatGPT search results   | OAI-SearchBot
GPTBot         | AI model training        | GPTBot
ChatGPT-User   | User-triggered browsing  | ChatGPT-User

Each crawler operates independently, so you can allow one while blocking another. For example, you can appear in search results without contributing to training data.

OAI-SearchBot: The Search Crawler

OAI-SearchBot is responsible for surfacing websites in ChatGPT's search features. When someone asks ChatGPT a question and it searches the web, OAI-SearchBot is what indexed that content beforehand.

Key points:

  • Sites that block OAI-SearchBot won't appear in ChatGPT search answers
  • Blocked sites can still appear as navigational links, but not as cited sources
  • Changes to your robots.txt take about 24 hours to take effect

Full user-agent string:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot

OpenAI publishes the IP addresses OAI-SearchBot uses at openai.com/searchbot.json, which you can use for firewall allowlisting.
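
As a rough sketch of how an allowlist check could work, the helper below tests whether an IPv4 address falls inside a list of CIDR ranges. The exact layout of searchbot.json is not described here, so the function takes CIDR strings you have already extracted from that file. The function names are hypothetical, and the ranges in the usage example are reserved documentation addresses, not OpenAI's real ranges.

```typescript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  const parts = ip.split('.');
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  const [a, b, c, d] = parts.map(Number);
  // ">>> 0" reinterprets the signed 32-bit result as unsigned.
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

// Return true when `ip` lies inside `cidr` (for example "192.0.2.0/24").
function ipv4InCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split('/');
  const bits = bitsText === undefined ? 32 : Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`Invalid prefix length in: ${cidr}`);
  }
  // A shift by 32 wraps to 0 in JavaScript, so /0 is handled explicitly.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Hypothetical helper: check an address against ranges taken from the published file.
function isInPublishedRanges(ip: string, cidrs: string[]): boolean {
  return cidrs.some((cidr) => ipv4InCidr(ip, cidr));
}

// Placeholder ranges (RFC 5737 documentation blocks), NOT OpenAI's actual ranges.
const exampleRanges = ['192.0.2.0/24', '198.51.100.0/24'];
console.log(isInPublishedRanges('192.0.2.17', exampleRanges));
```

This sketch covers IPv4 only. If the published list also contains IPv6 prefixes, they would need separate handling.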

GPTBot: The Training Crawler

GPTBot crawls content that may be used for training OpenAI's generative AI foundation models. This is separate from search. Blocking GPTBot doesn't affect whether your content appears in ChatGPT search results.

Key points:

  • Disallowing GPTBot signals your content shouldn't be used for AI training
  • This is independent from search visibility
  • If you allow both bots, OpenAI may use a single crawl for both purposes to reduce server load

Full user-agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot

IP addresses are published at openai.com/gptbot.json.

ChatGPT-User: User-Triggered Browsing

ChatGPT-User is different from the other two. It's not an automatic crawler. This user-agent appears when a ChatGPT user explicitly asks ChatGPT to visit a webpage, or when Custom GPTs use GPT Actions to fetch external content.

Key points:

  • Triggered by user actions, not automated crawling
  • robots.txt rules may not apply since the request is user-initiated
  • Does NOT determine search visibility (use OAI-SearchBot for that)
  • Commonly seen when users ask "read this article" or "summarize this page"

Full user-agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

How Datagum Tracks These Crawlers

At Datagum, we track all three OpenAI crawlers to help you understand how AI systems interact with your content. Our middleware detects these bots via the User-Agent header:

const AI_BOTS: Record<string, string> = {
  // OpenAI
  'GPTBot': 'openai-gptbot',
  'ChatGPT-User': 'openai-chatgpt-user',
  'OAI-SearchBot': 'openai-searchbot',
  // ... other AI bots
}

When we detect an OpenAI crawler, we log:

  • Which crawler visited (search, training, or user-triggered)
  • What page was accessed
  • When the crawl occurred
  • Full headers for detailed analysis
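
To make the detection step concrete, here is a minimal sketch of matching the User-Agent header against the OpenAI tokens and building a log record with the fields listed above. It is an illustration, not Datagum's actual middleware: the `CrawlerVisit` shape and the function names are assumptions.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface CrawlerVisit {
  bot: string;                     // internal label, e.g. 'openai-gptbot'
  path: string;                    // page that was accessed
  timestamp: string;               // ISO 8601 time of the request
  headers: Record<string, string>; // full request headers
}

// Token-to-label map limited to the three OpenAI crawlers.
const OPENAI_BOTS: Record<string, string> = {
  'GPTBot': 'openai-gptbot',
  'ChatGPT-User': 'openai-chatgpt-user',
  'OAI-SearchBot': 'openai-searchbot',
};

// Return the internal label for the first token found in the User-Agent, or null.
function detectOpenAiBot(userAgent: string): string | null {
  const ua = userAgent.toLowerCase();
  for (const [token, label] of Object.entries(OPENAI_BOTS)) {
    if (ua.includes(token.toLowerCase())) {
      return label;
    }
  }
  return null;
}

// Build a log record for a request, or null when it is not an OpenAI crawler.
// Header names are assumed to be lowercased already.
function buildVisitRecord(
  path: string,
  headers: Record<string, string>,
  now: Date = new Date(),
): CrawlerVisit | null {
  const bot = detectOpenAiBot(headers['user-agent'] || '');
  if (bot === null) {
    return null;
  }
  return { bot, path, timestamp: now.toISOString(), headers };
}

const sample = buildVisitRecord('/blog/openai-crawlers', {
  'user-agent':
    'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot',
});
console.log(sample);
```

A User-Agent header can be spoofed, so a match is a hint rather than proof that the request came from OpenAI.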

This tracking reveals patterns you can't see otherwise:

  • Is OAI-SearchBot indexing your key pages?
  • How often does GPTBot visit your site?
  • Are users asking ChatGPT to read your content?

We also track when users click through from ChatGPT to your site, measuring not just crawl activity, but actual traffic from AI interfaces.

Configuring Your robots.txt

Here's how to control each crawler:

Allow all OpenAI crawlers (recommended for maximum visibility)

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

Allow search but block training

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

Block all OpenAI crawlers

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

Note: because ChatGPT-User requests are user-initiated, the ChatGPT-User rule may not be honored, as mentioned in the ChatGPT-User section above.

Recommendation: If you want your content to appear in ChatGPT search results, allow OAI-SearchBot. The decision on GPTBot (training) is separate and depends on your stance on AI training data.
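
If you manage robots.txt from code, the configurations above can be generated from a small policy object. The sketch below is one way to do that. The `CrawlerPolicy` type and `buildRobotsTxt` function are hypothetical names, and it only emits whole-site rules (`/`) like the examples in this section.

```typescript
type Rule = 'allow' | 'disallow';
type OpenAiToken = 'GPTBot' | 'OAI-SearchBot' | 'ChatGPT-User';

// Hypothetical policy shape: one whole-site rule per crawler you want to mention.
type CrawlerPolicy = Partial<Record<OpenAiToken, Rule>>;

// Emit one "User-agent" group per crawler, separated by blank lines.
function buildRobotsTxt(policy: CrawlerPolicy): string {
  const groups: string[] = [];
  for (const [token, rule] of Object.entries(policy)) {
    if (rule === undefined) {
      continue; // skip crawlers explicitly left unset
    }
    const directive = rule === 'allow' ? 'Allow' : 'Disallow';
    groups.push(`User-agent: ${token}\n${directive}: /`);
  }
  return groups.join('\n\n') + '\n';
}

// "Allow search but block training" from the section above.
console.log(buildRobotsTxt({ 'GPTBot': 'disallow', 'OAI-SearchBot': 'allow' }));
```

Keeping the policy in one object makes it harder to block OAI-SearchBot by accident when the intent is only to opt out of training.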

What This Means for AI SEO

Understanding these crawlers changes how you think about AI visibility:

  1. Search visibility requires OAI-SearchBot access. Block it and you won't appear in ChatGPT's search answers, only as navigational links.

  2. Training opt-out doesn't hurt search. You can block GPTBot while allowing OAI-SearchBot. Many publishers choose this approach.

  3. User-triggered requests are different. When someone asks ChatGPT to read your page directly, that's ChatGPT-User, and it's harder to block since it's user-initiated.

  4. Tracking reveals the full picture. Without monitoring, you don't know which bots are actually visiting your site, how often, or what they're accessing.

Datagum helps you see this activity. Our Citation Analyzer tests whether your content gets cited in ChatGPT responses, and our crawler tracking shows you the bot activity happening on your site.

Frequently Asked Questions

How do I block GPTBot but allow OAI-SearchBot?

Add this to your robots.txt:

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

This opts out of AI training while maintaining search visibility in ChatGPT.

How long do robots.txt changes take to apply?

OpenAI says about 24 hours for changes to propagate. If you recently updated your robots.txt, give it a day before testing.

Will blocking GPTBot hurt my ChatGPT visibility?

No. GPTBot is for training data collection. OAI-SearchBot handles search. Blocking GPTBot only affects whether your content is used for training future models—it doesn't affect whether ChatGPT can cite you in search results.

Can I see which OpenAI bots are visiting my site?

Yes. Check your server logs for the user-agent strings listed above. Or use our tracking middleware to automatically detect and log OpenAI crawler visits with full details.
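
For the server-log route, a tally like the following can give a first impression of which crawlers are visiting. It is a sketch that assumes each log line contains the raw User-Agent string, which depends on your server's log format. The sample lines are made up for illustration.

```typescript
const OPENAI_TOKENS = ['GPTBot', 'OAI-SearchBot', 'ChatGPT-User'] as const;
type OpenAiCrawler = typeof OPENAI_TOKENS[number];

// Count how many log lines mention each OpenAI crawler token (case-insensitive).
function countCrawlerHits(logLines: string[]): Record<OpenAiCrawler, number> {
  const counts: Record<OpenAiCrawler, number> = {
    'GPTBot': 0,
    'OAI-SearchBot': 0,
    'ChatGPT-User': 0,
  };
  for (const line of logLines) {
    const lower = line.toLowerCase();
    const hit = OPENAI_TOKENS.find((token) => lower.includes(token.toLowerCase()));
    if (hit !== undefined) {
      counts[hit] += 1;
    }
  }
  return counts;
}

// Made-up access-log lines; the addresses are reserved documentation IPs.
const sampleLines = [
  '192.0.2.10 - - [22/Dec/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot"',
  '192.0.2.11 - - [22/Dec/2024:10:05:00 +0000] "GET /blog HTTP/1.1" 200 900 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot"',
  '192.0.2.12 - - [22/Dec/2024:10:06:00 +0000] "GET /blog HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/131.0.0.0"',
];
console.log(countCrawlerHits(sampleLines));
```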

What about other AI crawlers like ClaudeBot or PerplexityBot?

Each AI company has its own crawlers. Claude uses ClaudeBot (with Brave Search as its search backend). Perplexity runs PerplexityBot. The concepts are similar, so check each provider's documentation for its specific robots.txt tokens.

Is ChatGPT-User the same as OAI-SearchBot?

No. ChatGPT-User appears when a user explicitly asks ChatGPT to visit a webpage (like "read this article"). It's user-initiated, not automated crawling. OAI-SearchBot is what indexes content for ChatGPT's search feature before any user query.


Want to see how your content performs in AI search? Try our Citation Analyzer to test your URLs and discover whether ChatGPT is citing your content.
