How We Track AI Bot Events (And What We Learned)
TL;DR: We track 25+ AI crawlers plus click-throughs from ChatGPT, Perplexity, and Claude. Building this revealed some surprises: Gemini doesn't crawl in real-time like ChatGPT does, internal API calls can pollute your click data, and fire-and-forget tracking doesn't work on edge runtimes. Here's how we solved each problem.
If you're optimizing for AI search visibility, you need to know what's actually happening. Which AI bots are crawling your site? Are users clicking through from ChatGPT? Is Perplexity indexing your content?
Google Analytics won't show you crawler visits, because most bots never run its JavaScript tag. Traditional SEO tools don't track AI crawlers either. So we built it ourselves.
The Three Types of AI Traffic
Not all AI traffic is the same. We categorize it into three types:
1. Crawler Events
These are AI bots automatically indexing your content. They visit your site on their own schedule to build their search indexes or training datasets.
- GPTBot — OpenAI's training data crawler
- OAI-SearchBot — Powers ChatGPT's search feature
- PerplexityBot — Perplexity's search index
- ClaudeBot — Anthropic's crawler (Claude's web search reportedly draws on Brave Search results)
- Googlebot — Feeds both regular search and AI Overviews
When these bots visit, you want to know. It means your content is being considered for AI responses.
2. Click Events
These happen when a user sees your site cited in an AI response and clicks through. The browser sends a referrer header from the AI platform:
- chatgpt.com — User clicked a link in ChatGPT
- perplexity.ai — User clicked from Perplexity
- claude.ai — User clicked from Claude
- gemini.google.com — User clicked from Gemini
Click events are the money metric. They mean AI is not just finding your content—it's sending you traffic.
3. Agent Events
A newer category. When someone builds a Custom GPT or uses ChatGPT's browsing feature, it fetches your content with a special Signature-Agent header. This is different from both crawlers (automated indexing) and clicks (user navigation).
What We Track: 25+ AI Systems
Here's the full list we detect:
| Company | Crawlers |
|---|---|
| OpenAI | GPTBot, OAI-SearchBot, ChatGPT-User |
| Perplexity | PerplexityBot, Perplexity-User |
| Anthropic | ClaudeBot, Claude-Web, Claude-SearchBot |
| Google | Googlebot, Google-Extended |
| Microsoft | BingBot, BingPreview |
| Meta | FacebookBot, Meta-ExternalAgent |
| Apple | Applebot |
| Others | Brave, Cohere, Mistral, ByteDance, Amazon |
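A list like this can be turned into a simple lookup. The following is a minimal sketch, assuming case-insensitive substring matching on the User-Agent header. The `CRAWLER_TOKENS` table and `matchCrawler` helper are hypothetical names, and only the tokens spelled out in the table are included (the "Others" row names companies rather than tokens, so it is left out). The real detector may be structured differently.

```javascript
// Hypothetical lookup table built from the tokens named in the table above.
const CRAWLER_TOKENS = [
  { company: 'OpenAI', token: 'GPTBot' },
  { company: 'OpenAI', token: 'OAI-SearchBot' },
  { company: 'OpenAI', token: 'ChatGPT-User' },
  { company: 'Perplexity', token: 'PerplexityBot' },
  { company: 'Perplexity', token: 'Perplexity-User' },
  { company: 'Anthropic', token: 'ClaudeBot' },
  { company: 'Anthropic', token: 'Claude-Web' },
  { company: 'Anthropic', token: 'Claude-SearchBot' },
  { company: 'Google', token: 'Googlebot' },
  { company: 'Google', token: 'Google-Extended' },
  { company: 'Microsoft', token: 'BingBot' },
  { company: 'Microsoft', token: 'BingPreview' },
  { company: 'Meta', token: 'FacebookBot' },
  { company: 'Meta', token: 'Meta-ExternalAgent' },
  { company: 'Apple', token: 'Applebot' },
];

// Returns the first matching entry, or null when no token matches.
function matchCrawler(userAgent) {
  if (!userAgent) return null;
  const ua = userAgent.toLowerCase();
  const hit = CRAWLER_TOKENS.find((entry) => ua.includes(entry.token.toLowerCase()));
  return hit ?? null;
}
```

Calling `matchCrawler` with a user agent that contains `bingbot/2.0` returns the Microsoft entry, while an ordinary browser user agent returns `null`.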
We also track click-throughs from:
- ChatGPT, Perplexity, Claude, Gemini
- Microsoft Copilot, Bing Chat
- Meta AI, You.com, Phind, Poe
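One way to map a referrer to a click source is to parse it as a URL and compare hostnames instead of searching the raw string. This is a sketch under assumptions: the `CLICK_SOURCES` map and `clickSourceFromReferer` helper are hypothetical, and only the four hosts named earlier in this article are included, since the referrer hosts for the other platforms aren't given here.

```javascript
// Hypothetical host-to-source map; only hosts named in this article.
const CLICK_SOURCES = {
  'chatgpt.com': 'chatgpt',
  'perplexity.ai': 'perplexity',
  'claude.ai': 'claude',
  'gemini.google.com': 'gemini',
};

// Returns a source label such as 'chatgpt', or null when the referrer is
// missing, malformed, or not from a known AI platform.
function clickSourceFromReferer(referer) {
  if (!referer) return null;
  let host;
  try {
    host = new URL(referer).hostname.toLowerCase();
  } catch {
    return null; // Not a valid absolute URL
  }
  // Accept the bare host and its subdomains (for example www.perplexity.ai).
  for (const [domain, source] of Object.entries(CLICK_SOURCES)) {
    if (host === domain || host.endsWith('.' + domain)) return source;
  }
  return null;
}
```

Comparing hostnames means a same-site URL such as `https://example.com/pricing?utm_source=chatgpt.com` is not mistaken for a ChatGPT referrer, which a plain substring check on the header would match.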
The Gotchas We Discovered
Building this wasn't straightforward. Here's what surprised us.
Gotcha 1: Gemini Doesn't Crawl in Real-Time
When you ask ChatGPT a question, it might trigger a fetch for fresh content, either from OAI-SearchBot or from the user-initiated ChatGPT-User agent. Perplexity behaves similarly: PerplexityBot or Perplexity-User can fetch in real time when users ask questions.
Gemini is different. It doesn't have its own crawler. Instead, it uses Google's existing search index that Googlebot already built. When you ask Gemini something, there's no new request to your site.
This means:
- You can't trigger Gemini visits by asking it about your site
- There's no "GeminiBot" to track
- You can't distinguish AI Overview traffic from regular Google search traffic
The best you can do is track Googlebot and know that some of those crawls feed into AI Overviews. But you won't see Gemini-specific activity.
Gotcha 2: Internal API Calls Pollute Click Data
This one was subtle. When a user clicks from ChatGPT to your /pricing page, your analytics should show one click event for /pricing. Simple.
But here's what actually happens: the browser loads /pricing, which then makes internal API calls to /api/social-proof, /api/auth/get-session, etc. These fetch requests inherit the chatgpt.com referrer from the original navigation.
Suddenly your events table shows:
- chatgpt → Click → /pricing ✓
- chatgpt → Click → /api/social-proof ✗
- chatgpt → Click → /api/auth/get-session ✗
Those API "clicks" aren't real user navigation. They're internal fetches that happened to keep the referrer.
The fix: Use the Sec-Fetch-Mode header. Browsers send sec-fetch-mode: navigate for actual page navigation, but cors or same-origin for internal fetches. We only track click events where sec-fetch-mode is navigate.
This approach is generic: it works for any site without knowing its URL structure, which is critical if you're building tracking as a service.
Gotcha 3: Fire-and-Forget Doesn't Work on Edge
Our initial implementation was simple: detect an AI bot, fire off a tracking request, don't wait for it. Classic fire-and-forget pattern.
// This doesn't work reliably on Cloudflare Workers
fetch('/api/track', { method: 'POST', body: data })
// Worker terminates before fetch completes
On edge runtimes like Cloudflare Workers, the worker can terminate before your fire-and-forget request completes. The tracking just... disappears.
The fix: Either await the request (adds slight latency) or use the platform's waitUntil() API. We chose to await because it only affects requests identified as AI traffic. Visitors who don't match an AI crawler, agent, or referrer never trigger tracking, so they see no added latency.
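For comparison, here is a rough sketch of the waitUntil() alternative. On Cloudflare Workers the fetch handler receives an execution context whose `waitUntil(promise)` method keeps the worker alive until the promise settles. Everything else below is hypothetical: `sendTrackingEvent` stands in for the real tracking call, and `createFakeContext` exists only so the sketch runs outside an edge runtime.

```javascript
// Hypothetical tracking call: waits briefly (standing in for network
// latency) and then records the event in `sink`.
async function sendTrackingEvent(event, sink) {
  await new Promise((resolve) => setTimeout(resolve, 10));
  sink.push(event);
}

// `ctx` mimics the execution context an edge runtime passes to a handler;
// only its waitUntil(promise) method is used.
function handleRequest(userAgent, ctx, sink) {
  if (userAgent.toLowerCase().includes('gptbot')) {
    // Not awaited, so the response is not delayed. Registering the promise
    // asks the runtime to keep the worker alive until tracking settles.
    ctx.waitUntil(sendTrackingEvent({ source: 'openai-gptbot', type: 'crawler' }, sink));
  }
  return 'page content';
}

// Minimal fake context so the sketch can run outside an edge runtime.
function createFakeContext() {
  const pending = [];
  return {
    pending,
    waitUntil(promise) {
      pending.push(promise);
    },
  };
}
```

With the fake context, `handleRequest` returns immediately and leaves one promise in `pending`. Awaiting `Promise.all(pending)` afterwards shows the event arriving in the sink, which is the work a real runtime would wait for.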
The Technical Implementation
Here's how the detection works at a high level:
function detectAiSource(referer, userAgent, signatureAgent) {
  // Check for ChatGPT agent mode
  if (signatureAgent?.includes('chatgpt.com')) {
    return { source: 'chatgpt-agent', type: 'agent' }
  }
  // Check referrer for click-throughs
  if (referer?.includes('chatgpt.com')) {
    return { source: 'chatgpt', type: 'click' }
  }
  // Check user-agent for crawlers (case-insensitive)
  if (userAgent?.toLowerCase().includes('gptbot')) {
    return { source: 'openai-gptbot', type: 'crawler' }
  }
  return null
}
For click events, we add the sec-fetch-mode check:
if (aiSource.type === 'click') {
  const isNavigation = request.headers.get('sec-fetch-mode') === 'navigate'
  if (!isNavigation) return // Skip internal fetches
}
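To make the effect of this check concrete, here is a small self-contained sketch that applies it to the three requests from the /pricing example. The `shouldTrackClick` helper and the request objects are hypothetical. Header values are read through a `get` method, so a `Map` with lowercase keys stands in for a real `Headers` object here.

```javascript
// Hypothetical helper: a click counts only when the browser marks the
// request as a top-level navigation.
function shouldTrackClick(headers) {
  return headers.get('sec-fetch-mode') === 'navigate';
}

// The three requests from the /pricing example, with the Sec-Fetch-Mode
// value a browser would typically send for each.
const requests = [
  { path: '/pricing', headers: new Map([['sec-fetch-mode', 'navigate']]) },
  { path: '/api/social-proof', headers: new Map([['sec-fetch-mode', 'cors']]) },
  { path: '/api/auth/get-session', headers: new Map([['sec-fetch-mode', 'cors']]) },
];

const trackedPaths = requests
  .filter((request) => shouldTrackClick(request.headers))
  .map((request) => request.path);
// trackedPaths is ['/pricing']
```

A request that lacks the header entirely is also skipped by this check, which is worth keeping in mind when comparing click counts against other analytics.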
All detection runs in middleware, so every request gets checked before hitting your application code.
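As an illustration of that flow, a middleware wrapper might look roughly like the sketch below. It is not the actual implementation: `withAiTracking`, the `detect`, `track`, and `next` callbacks, and the `{ path, userAgent }` request shape are all hypothetical, chosen to show detection running first and tracking being awaited before the application handler is called.

```javascript
// Hypothetical middleware wrapper. `detect` returns an AI source or null,
// `track` returns a promise, and `next` produces the normal response.
function withAiTracking({ detect, track, next }) {
  return async function middleware(request) {
    const aiSource = detect(request);
    if (aiSource) {
      try {
        // Awaited so the runtime cannot terminate before tracking finishes.
        await track({ ...aiSource, path: request.path });
      } catch (error) {
        // A tracking failure should not break the page itself.
        console.error('tracking failed', error);
      }
    }
    // Requests with no AI source skip tracking and go straight through.
    return next(request);
  };
}
```

Wrapping the tracking call in try/catch is a design choice in this sketch: since the request is awaited, an unhandled tracking error would otherwise turn into a failed page load.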
What This Data Tells You
Once you're tracking AI bot events, you can answer questions like:
- Which AI systems are indexing my content? If PerplexityBot visits daily but OAI-SearchBot never shows up, you know where to focus.
- Am I getting traffic from AI citations? Crawler visits are nice, but click events mean actual users finding you through AI.
- Which pages do AI bots prefer? Maybe they're hitting your blog but ignoring your product pages. That's actionable.
- Is my robots.txt blocking AI crawlers? If you've allowed OAI-SearchBot but never see it, something's wrong.
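Questions like these reduce to simple aggregations over the recorded events. As a rough sketch, assuming each event is stored with `source`, `type`, and `path` fields (a hypothetical shape based on the detection output shown earlier, with `path` added for illustration), counting crawler hits per page might look like this. The sample data is made up.

```javascript
// Counts events of one type per key (for example, crawler hits per path).
// The event shape { source, type, path } is assumed for illustration.
function countBy(events, type, key) {
  const counts = new Map();
  for (const event of events) {
    if (event.type !== type) continue;
    counts.set(event[key], (counts.get(event[key]) ?? 0) + 1);
  }
  // Sort descending so the most-visited entries come first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample data, not real measurements.
const sampleEvents = [
  { source: 'openai-gptbot', type: 'crawler', path: '/blog/ai-tracking' },
  { source: 'openai-gptbot', type: 'crawler', path: '/blog/ai-tracking' },
  { source: 'openai-gptbot', type: 'crawler', path: '/pricing' },
  { source: 'chatgpt', type: 'click', path: '/pricing' },
];

const crawlerHitsByPath = countBy(sampleEvents, 'crawler', 'path');
// [['/blog/ai-tracking', 2], ['/pricing', 1]]
```

The same helper answers the citation question by switching the arguments, for example `countBy(sampleEvents, 'click', 'source')` to see which AI platforms send visitors.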
The Bigger Picture
AI search is still early. The tracking tools, the optimization strategies, and the best practices are all being figured out in real time.
What we know: AI systems are crawling the web, building indexes, and citing sources in their responses. If your content isn't being crawled, it can't be cited. If it's being crawled but not cited, something about your content or structure isn't working.
Tracking is the first step to understanding what's actually happening. You can't optimize what you can't measure.
Want to see which AI systems are visiting your site? Check out our Events dashboard to track crawler activity and click-throughs from AI platforms.