We Now Track 25+ AI Crawlers (Not Just OpenAI)

Manuel Yang · December 25, 2024 · 4 min read

TL;DR: We expanded our AI crawler detection from 13 patterns to 25+. Now tracking Microsoft (Copilot, Bing), Meta AI, Anthropic (Claude), Apple, Brave, Cohere, Mistral, and more. All matching is case-insensitive to catch variations.

Everyone talks about OpenAI's crawlers. GPTBot this, OAI-SearchBot that. Makes sense—ChatGPT dominates the conversation.

But OpenAI isn't the only game in town. Microsoft Copilot is baked into Windows and Office. Perplexity is eating into Google's search share. Claude has a growing user base. Meta AI ships on every Facebook and Instagram app.

If you're only tracking OpenAI, you're missing most of the picture.

What We Added

Here's the full breakdown of AI systems we now detect:

Major AI Assistants

Company    | Crawlers/Agents                          | What They Do
OpenAI     | GPTBot, OAI-SearchBot, ChatGPT-User      | Training, search indexing, user browsing
Microsoft  | BingBot, BingPreview                     | Copilot search, preview generation
Anthropic  | ClaudeBot, Claude-Web, Claude-SearchBot  | Training, web browsing, search
Google     | Google-Extended                          | Gemini/Bard training data
Perplexity | PerplexityBot, Perplexity-User           | Search indexing, user queries
Meta       | FacebookBot, Meta-ExternalAgent          | Meta AI training and retrieval

Other AI Systems

We also track:

  • Apple — Applebot (Siri and Apple Intelligence)
  • Brave — Brave Search crawler
  • Cohere — Enterprise AI crawler
  • Mistral — European AI lab crawler
  • ByteDance — TikTok's AI systems
  • Amazon — Alexa and AWS AI features

Plus general crawlers like Common Crawl and Internet Archive that feed into AI training datasets.

Click-Through Tracking Too

Crawlers are half the story. We also detect when users click through from AI chat interfaces to your site.

Referrers we track:

  • chatgpt.com / chat.openai.com
  • claude.ai
  • perplexity.ai
  • gemini.google.com / bard.google.com
  • copilot.microsoft.com
  • bing.com/chat
  • meta.ai
  • you.com, phind.com, poe.com
  • huggingface.co

When someone clicks a link in ChatGPT or Perplexity and lands on your site, we log it. That is different from a crawler visit: it is a real person arriving from an AI interface.
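As a rough sketch of how referrer-based click detection like this could work, the JavaScript below parses the Referer value and looks up its hostname in a table. The names AI_REFERRERS and detectClick, the source labels, and the "www." normalization are illustrative assumptions rather than the production implementation, and the table holds only a subset of the referrers listed above.

```javascript
// Hypothetical lookup table: referrer hostname -> source label.
// Hostnames are taken from the list above; the labels are made up for illustration.
const AI_REFERRERS = new Map([
  ['chatgpt.com', 'openai'],
  ['chat.openai.com', 'openai'],
  ['claude.ai', 'anthropic'],
  ['perplexity.ai', 'perplexity'],
  ['gemini.google.com', 'google'],
  ['bard.google.com', 'google'],
  ['copilot.microsoft.com', 'microsoft'],
  ['meta.ai', 'meta'],
])

// Returns { source, type: 'click' } for a known AI referrer, otherwise null.
function detectClick(referrer) {
  if (!referrer) return null
  let url
  try {
    url = new URL(referrer) // throws on malformed values
  } catch {
    return null
  }
  // Normalize the host: lowercase and drop a leading "www."
  const host = url.hostname.toLowerCase().replace(/^www\./, '')
  // bing.com/chat is a path rather than a host, so it gets its own check
  if (host === 'bing.com' && url.pathname.startsWith('/chat')) {
    return { source: 'microsoft', type: 'click' }
  }
  const source = AI_REFERRERS.get(host)
  return source ? { source, type: 'click' } : null
}
```

One caveat with this approach: under the default browser referrer policy (strict-origin-when-cross-origin), cross-origin navigations usually send only the origin, so a path such as /chat will often be missing from the header.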

Three Event Types

Every detection gets categorized:

  1. Crawler — Automated bot indexing or training
  2. Click — User clicked through from an AI chat
  3. Agent — Custom GPT or AI agent fetching your content (detected via Signature-Agent header)

The distinction matters. A crawler visit means you're being indexed. A click means someone found you through AI and visited. An agent means a Custom GPT or automation is pulling your content.
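To make the three categories concrete, here is a minimal sketch of how a single request might be sorted into one of them. It assumes a headers object with lowercase names (as Node's http module provides), checks the Signature-Agent header first, then the user agent, then the referrer; that ordering, the abbreviated tables, and the names classifyRequest, CRAWLERS, and CHAT_HOSTS are assumptions for illustration only.

```javascript
// Hypothetical, abbreviated tables; a real deployment would carry many more entries.
const CRAWLERS = [
  ['gptbot', 'openai'],
  ['claudebot', 'anthropic'],
  ['perplexitybot', 'perplexity'],
]
const CHAT_HOSTS = [
  ['chatgpt.com', 'openai'],
  ['claude.ai', 'anthropic'],
  ['perplexity.ai', 'perplexity'],
]

// Extract the hostname from a Referer value, or '' if it cannot be parsed.
function refererHost(value) {
  try {
    return new URL(value).hostname.replace(/^www\./, '')
  } catch {
    return ''
  }
}

// headers: plain object keyed by lowercase header name.
function classifyRequest(headers) {
  // 1. Agent: the request carries a Signature-Agent header (raw value kept as source)
  if (headers['signature-agent']) {
    return { type: 'agent', source: headers['signature-agent'] }
  }
  // 2. Crawler: the user agent contains a known bot pattern (case-insensitive)
  const ua = (headers['user-agent'] || '').toLowerCase()
  const crawler = CRAWLERS.find(([pattern]) => ua.includes(pattern))
  if (crawler) return { type: 'crawler', source: crawler[1] }
  // 3. Click: the referrer host is a known AI chat interface
  const host = refererHost(headers['referer'] || '')
  const chat = CHAT_HOSTS.find(([h]) => h === host)
  if (chat) return { type: 'click', source: chat[1] }
  return null // not AI-related as far as these tables can tell
}
```

Checking the agent header first means a request that identifies itself as an agent is never counted as an ordinary crawler or click, which keeps the three counts from overlapping.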

Case-Insensitive Matching

Small detail, big impact. User-agent strings aren't consistent. Some bots report as GPTBot, others as gptbot. We now match case-insensitively to catch all variations.

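// AI_BOTS maps a user-agent substring to a source name, e.g. { gptbot: 'openai' }
function detectCrawler(userAgent, AI_BOTS) {
  const userAgentLower = userAgent.toLowerCase()
  for (const [pattern, source] of Object.entries(AI_BOTS)) {
    // Lowercase the pattern as well, so a mixed-case key like 'GPTBot' still matches
    if (userAgentLower.includes(pattern.toLowerCase())) {
      return { source, type: 'crawler' }
    }
  }
  return null // no known AI crawler matched
}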

Why Track All This?

Two reasons:

1. Know who's indexing you. If ClaudeBot visits daily but OAI-SearchBot never shows up, you know where to focus. Maybe your robots.txt is blocking one but not the other, or one system simply crawls your site more often than the other.

2. Measure AI traffic, not just crawls. Crawler visits don't mean citations. Click-throughs do. If Perplexity sends you 100 visitors but ChatGPT sends zero, that tells you something about where your content actually appears.
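A comparison like this comes down to counting events per source and type. The sketch below shows one way to do that, assuming each logged event has at least a source and a type field; the countBySource name and the sample events are invented for illustration and are not real data.

```javascript
// Count events per source, split by event type.
// Returns a Map: source -> { crawler, click, agent }.
function countBySource(events) {
  const counts = new Map()
  for (const { source, type } of events) {
    if (!counts.has(source)) {
      counts.set(source, { crawler: 0, click: 0, agent: 0 })
    }
    const entry = counts.get(source)
    // Tolerate event types outside the three known ones
    entry[type] = (entry[type] || 0) + 1
  }
  return counts
}

// Made-up sample events, for illustration only
const sampleEvents = [
  { source: 'anthropic', type: 'crawler' },
  { source: 'anthropic', type: 'crawler' },
  { source: 'perplexity', type: 'click' },
]

const sampleCounts = countBySource(sampleEvents)
console.log(sampleCounts.get('anthropic'))  // { crawler: 2, click: 0, agent: 0 }
console.log(sampleCounts.get('perplexity')) // { crawler: 0, click: 1, agent: 0 }
```

A source with many crawler events and no clicks is being read but not sending visitors, which is the gap the second point describes.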

The Events Dashboard

All of this data shows up in your Events dashboard, color-coded by source:

  • Green — OpenAI (ChatGPT, GPTBot)
  • Orange — Anthropic (Claude)
  • Blue — Microsoft (Copilot, Bing)
  • Indigo — Perplexity
  • Cyan — Meta AI
  • Sky — Google (Gemini, Bard)

Each event includes the path visited, location data, and timestamp. Hover for details on what each crawler does.

What's Next

We're working on aggregated analytics—daily/weekly trends, top pages by AI source, geographic breakdown. The raw events are useful, but patterns over time tell the real story.

For now, the foundation is there. Every major AI system hitting your site gets logged.


Want to see which AI systems are visiting your content? Check your Events dashboard or run a URL through our Citation Analyzer to test if you're being cited.