We Now Track 25+ AI Crawlers (Not Just OpenAI)
TL;DR: We expanded our AI crawler detection from 13 patterns to 25+. Now tracking Microsoft (Copilot, Bing), Meta AI, Anthropic (Claude), Apple, Brave, Cohere, Mistral, and more. All matching is case-insensitive to catch variations.
Everyone talks about OpenAI's crawlers. GPTBot this, OAI-SearchBot that. Makes sense—ChatGPT dominates the conversation.
But OpenAI isn't the only game in town. Microsoft Copilot is baked into Windows and Office. Perplexity is positioning itself as an alternative to Google search. Claude has a growing user base. Meta AI is built into the Facebook and Instagram apps.
If you're only tracking OpenAI, you're missing most of the picture.
What We Added
Here's the full breakdown of AI systems we now detect:
Major AI Assistants
| Company | Crawlers/Agents | What They Do |
|---|---|---|
| OpenAI | GPTBot, OAI-SearchBot, ChatGPT-User | Training, search indexing, user browsing |
| Microsoft | BingBot, BingPreview | Copilot search, preview generation |
| Anthropic | ClaudeBot, Claude-Web, Claude-SearchBot | Training, web browsing, search |
| Google | Google-Extended | Gemini/Bard training data |
| Perplexity | PerplexityBot, Perplexity-User | Search indexing, user queries |
| Meta | FacebookBot, Meta-ExternalAgent | Meta AI training and retrieval |
Other AI Systems
We also track:
- Apple — Applebot (Siri and Apple Intelligence)
- Brave — Brave Search crawler
- Cohere — Enterprise AI crawler
- Mistral — European AI lab crawler
- ByteDance — TikTok's AI systems
- Amazon — Alexa and AWS AI features
Plus general crawlers like Common Crawl and Internet Archive that feed into AI training datasets.
Click-Through Tracking Too
Crawlers are half the story. We also detect when users click through from AI chat interfaces to your site.
Referrer domains we track:
- chatgpt.com / chat.openai.com
- claude.ai
- perplexity.ai
- gemini.google.com / bard.google.com
- copilot.microsoft.com
- bing.com/chat
- meta.ai
- you.com, phind.com, poe.com
- huggingface.co
When someone clicks a link in ChatGPT or Perplexity and lands on your site, we log it. Unlike a crawler visit, this is a real person arriving from an AI interface.
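Here is a minimal sketch of how referrer-based click detection could work. It is not our production code: the `AI_REFERRERS` map, the source labels, and the `detectAiClick` function name are all assumptions made for illustration. Only the domains come from the list above.

```javascript
// Hypothetical map of referrer hostnames to a source label.
// Domains are from the list above; labels are illustrative.
const AI_REFERRERS = {
  'chatgpt.com': 'openai',
  'chat.openai.com': 'openai',
  'claude.ai': 'anthropic',
  'perplexity.ai': 'perplexity',
  'gemini.google.com': 'google',
  'bard.google.com': 'google',
  'copilot.microsoft.com': 'microsoft',
  'meta.ai': 'meta',
  'you.com': 'you',
  'phind.com': 'phind',
  'poe.com': 'poe',
  'huggingface.co': 'huggingface',
}

// Returns { source, type: 'click' } for a known AI referrer, else null.
function detectAiClick(referrer) {
  if (!referrer) return null
  let url
  try {
    url = new URL(referrer)
  } catch {
    return null // not a parseable absolute URL
  }
  const host = url.hostname.toLowerCase().replace(/^www\./, '')

  // bing.com only counts when the path points at the chat experience.
  if (host === 'bing.com' && url.pathname.startsWith('/chat')) {
    return { source: 'microsoft', type: 'click' }
  }

  // Match the exact host or a subdomain, never a loose substring.
  for (const [domain, source] of Object.entries(AI_REFERRERS)) {
    if (host === domain || host.endsWith('.' + domain)) {
      return { source, type: 'click' }
    }
  }
  return null
}

console.log(detectAiClick('https://www.perplexity.ai/search?q=example'))
```

Matching on the parsed hostname rather than on a raw substring keeps a domain like notchatgpt.com from counting as ChatGPT. One caveat: browsers commonly send only the origin on cross-site navigations, so a path-based rule like the bing.com/chat check may fire less often than you'd expect.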
Three Event Types
Every detection gets categorized:
- Crawler — Automated bot indexing or training
- Click — User clicked through from an AI chat
- Agent — Custom GPT or AI agent fetching your content (detected via Signature-Agent header)
The distinction matters. A crawler visit means you're being indexed. A click means someone found you through AI and visited. An agent means a Custom GPT or automation is pulling your content.
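To make the three categories concrete, here is a rough sketch of a classifier working from request headers. Treat everything in it as an assumption: the `classifyEvent` and `getHeader` names, the short pattern lists, and especially the order in which the checks run. It also treats the mere presence of a Signature-Agent header as an agent signal and does not verify any signature, which a real deployment would likely want to do.

```javascript
// Illustrative subsets only; not the full detection lists.
const CRAWLER_PATTERNS = { gptbot: 'openai', claudebot: 'anthropic', perplexitybot: 'perplexity' }
const REFERRER_HOSTS = { 'chatgpt.com': 'openai', 'claude.ai': 'anthropic', 'perplexity.ai': 'perplexity' }

// Header names are case-insensitive, so look them up that way.
function getHeader(headers, name) {
  const wanted = name.toLowerCase()
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === wanted) return value
  }
  return undefined
}

// Hypothetical classifier. Assumed precedence: agent, then crawler, then click.
function classifyEvent(headers) {
  // Agent: the request carries a Signature-Agent header (presence only, unverified).
  const signatureAgent = getHeader(headers, 'signature-agent')
  if (signatureAgent) return { type: 'agent', source: signatureAgent }

  // Crawler: a known bot token appears in the user-agent string.
  const userAgent = (getHeader(headers, 'user-agent') || '').toLowerCase()
  for (const [pattern, source] of Object.entries(CRAWLER_PATTERNS)) {
    if (userAgent.includes(pattern)) return { type: 'crawler', source }
  }

  // Click: the referrer hostname belongs to an AI chat interface.
  let host = ''
  try {
    host = new URL(getHeader(headers, 'referer') || '').hostname.replace(/^www\./, '')
  } catch {
    host = '' // missing or unparseable referrer
  }
  if (REFERRER_HOSTS[host]) return { type: 'click', source: REFERRER_HOSTS[host] }

  return null // ordinary traffic
}

console.log(classifyEvent({ 'User-Agent': 'Mozilla/5.0 (compatible; GPTBot/1.0)' }))
```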
Case-Insensitive Matching
Small detail, big impact. User-agent strings aren't consistent. Some bots report as GPTBot, others as gptbot. We now match case-insensitively to catch all variations.
const userAgentLower = userAgent.toLowerCase()
for (const [pattern, source] of Object.entries(AI_BOTS)) {
  // Lowercase the pattern too, so keys like 'GPTBot' still match
  if (userAgentLower.includes(pattern.toLowerCase())) {
    return { source, type: 'crawler' }
  }
}
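For anyone who wants to try this out, here is a self-contained version of the same loop. The contents of `AI_BOTS` are an assumption: a handful of crawler names from the table above, mapped to made-up source labels. The `detectCrawler` wrapper is also hypothetical. Both sides of the comparison are lowercased, so the map keys can keep their documented casing.

```javascript
// Hypothetical pattern map: user-agent token -> source label.
const AI_BOTS = {
  GPTBot: 'openai',
  'OAI-SearchBot': 'openai',
  'ChatGPT-User': 'openai',
  ClaudeBot: 'anthropic',
  PerplexityBot: 'perplexity',
  'Meta-ExternalAgent': 'meta',
}

// Returns { source, type: 'crawler' } on the first matching pattern, else null.
function detectCrawler(userAgent) {
  const userAgentLower = (userAgent || '').toLowerCase()
  for (const [pattern, source] of Object.entries(AI_BOTS)) {
    // Compare lowercase to lowercase so casing never matters.
    if (userAgentLower.includes(pattern.toLowerCase())) {
      return { source, type: 'crawler' }
    }
  }
  return null
}

console.log(detectCrawler('Mozilla/5.0 (compatible; gptbot/1.2)'))
console.log(detectCrawler('Mozilla/5.0 (Windows NT 10.0) Chrome/120.0'))
```

Substring matching is simple and forgiving, but very short patterns can over-match, so it pays to keep the tokens specific.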
Why Track All This?
Two reasons:
1. Know who's indexing you. If ClaudeBot visits daily but OAI-SearchBot never shows up, you know where to focus. Maybe your robots.txt is blocking one but not the other. Maybe one AI system likes your content more than another.
2. Measure AI traffic, not just crawls. Crawler visits don't mean citations. Click-throughs do. If Perplexity sends you 100 visitors but ChatGPT sends zero, that tells you something about where your content actually appears.
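If you export raw events, a comparison like the one above takes only a few lines. This is a sketch under assumptions: the event shape (`source` and `type` fields) and the `summarizeBySource` helper are hypothetical, not a documented export format.

```javascript
const EVENT_TYPES = ['crawler', 'click', 'agent']

// Counts events per source, split by event type.
function summarizeBySource(events) {
  const summary = new Map()
  for (const { source, type } of events) {
    if (!EVENT_TYPES.includes(type)) continue // skip unknown types
    if (!summary.has(source)) {
      summary.set(source, { crawler: 0, click: 0, agent: 0 })
    }
    summary.get(source)[type] += 1
  }
  return summary
}

// Made-up sample data for illustration.
const sampleEvents = [
  { source: 'openai', type: 'crawler' },
  { source: 'openai', type: 'crawler' },
  { source: 'perplexity', type: 'crawler' },
  { source: 'perplexity', type: 'click' },
  { source: 'perplexity', type: 'click' },
]

for (const [source, counts] of summarizeBySource(sampleEvents)) {
  console.log(source, counts)
}
```

In this made-up sample, OpenAI crawls but sends no visitors, while Perplexity does both. That gap between crawls and clicks is the signal worth watching.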
The Events Dashboard
All this data shows up in your Events dashboard. Color-coded by source:
- Green — OpenAI (ChatGPT, GPTBot)
- Orange — Anthropic (Claude)
- Blue — Microsoft (Copilot, Bing)
- Indigo — Perplexity
- Cyan — Meta AI
- Sky — Google (Gemini, Bard)
Each event includes the path visited, location data, and timestamp. Hover for details on what each crawler does.
What's Next
We're working on aggregated analytics—daily/weekly trends, top pages by AI source, geographic breakdown. The raw events are useful, but patterns over time tell the real story.
For now, the foundation is there. Every major AI system hitting your site gets logged.
Want to see which AI systems are visiting your content? Check your Events dashboard or run a URL through our Citation Analyzer to test if you're being cited.