How We Track AI Bot Events (And What We Learned)

TL;DR: We track 25+ AI crawlers plus click-throughs from ChatGPT, Perplexity, Claude, and other AI platforms. Building this revealed some surprises: Gemini doesn't crawl in real time like ChatGPT does, internal API calls can pollute your click data, and fire-and-forget tracking isn't reliable on edge runtimes. Here's how we solved each problem.

If you're optimizing for AI search visibility, you need to know what's actually happening. Which AI bots are crawling your site? Are users clicking through from ChatGPT? Is Perplexity indexing your content?

Google Analytics won't tell you. Traditional SEO tools don't track this. So we built it ourselves.

The Three Types of AI Traffic

Not all AI traffic is the same. We categorize it into three types:

1. Crawler Events

These are AI bots automatically indexing your content. They visit your site on their own schedule to build their search indexes or training datasets.

GPTBot — OpenAI's training data crawler
OAI-SearchBot — Powers ChatGPT's search feature
PerplexityBot — Perplexity's search index
ClaudeBot — Anthropic's crawler (Claude's web search itself reportedly relies on Brave Search results)
Googlebot — Feeds both regular search and AI Overviews

When these bots visit, you want to know. It means your content is being considered for AI responses.

2. Click Events

These happen when a user sees your site cited in an AI response and clicks through. The browser sends a Referer header that names the AI platform:

chatgpt.com — User clicked a link in ChatGPT
perplexity.ai — User clicked from Perplexity
claude.ai — User clicked from Claude
gemini.google.com — User clicked from Gemini

Click events are the money metric. They mean AI is not just finding your content—it's sending you traffic.
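As a rough sketch, the referrer check can parse the Referer value and compare hostnames instead of searching the whole string. The function name and the source labels below are assumptions made for illustration; only the four hostnames come from the list above.

```javascript
// Hypothetical mapping from referrer hostnames to source labels.
// The hostnames are from the list above; the labels are illustrative.
const CLICK_SOURCES = new Map([
  ['chatgpt.com', 'chatgpt'],
  ['perplexity.ai', 'perplexity'],
  ['claude.ai', 'claude'],
  ['gemini.google.com', 'gemini'],
])

function clickSourceFromReferer(referer) {
  if (!referer) return null
  let host
  try {
    host = new URL(referer).hostname.toLowerCase()
  } catch {
    return null // Malformed or relative Referer value
  }
  for (const [domain, source] of CLICK_SOURCES) {
    // Match the domain itself or any subdomain (e.g. www.perplexity.ai)
    if (host === domain || host.endsWith('.' + domain)) return source
  }
  return null
}
```

Comparing hostnames avoids counting a URL like https://example.com/?ref=chatgpt.com as a ChatGPT click, which a plain substring match would do.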

3. Agent Events

A newer category. When ChatGPT fetches your content on a user's behalf (in agent mode, for example, or for a Custom GPT or browsing request), the request can carry a special Signature-Agent header. This is different from both crawlers (automated indexing) and clicks (user navigation).

What We Track: 25+ AI Systems

Here's the full list we detect:

Company	Crawlers
OpenAI	GPTBot, OAI-SearchBot, ChatGPT-User
Perplexity	PerplexityBot, Perplexity-User
Anthropic	ClaudeBot, Claude-Web, Claude-SearchBot
Google	Googlebot, Google-Extended
Microsoft	BingBot, BingPreview
Meta	FacebookBot, Meta-ExternalAgent
Apple	Applebot
Others	Brave, Cohere, Mistral, ByteDance, Amazon
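One way to turn a table like this into detection logic is a list of lowercase user-agent tokens checked in order. This is a minimal sketch covering only part of the table; the source labels and the function name are hypothetical. Google-Extended is left out on purpose: as far as Google's crawler documentation describes it, it is a robots.txt control token rather than a separate user-agent string, so there is nothing to match on.

```javascript
// Illustrative subset of the table above. The tokens are the crawler names
// from the table, lowercased; the source labels are made up for this sketch.
const CRAWLER_TOKENS = [
  ['gptbot', 'openai-gptbot'],
  ['oai-searchbot', 'openai-searchbot'],
  ['chatgpt-user', 'openai-chatgpt-user'],
  ['perplexitybot', 'perplexity-bot'],
  ['perplexity-user', 'perplexity-user'],
  ['claudebot', 'anthropic-claudebot'],
  ['googlebot', 'google-googlebot'],
  ['bingbot', 'microsoft-bingbot'],
  ['applebot', 'apple-applebot'],
]

function crawlerFromUserAgent(userAgent) {
  if (!userAgent) return null
  const ua = userAgent.toLowerCase()
  // First token found anywhere in the user-agent string wins
  const match = CRAWLER_TOKENS.find(([token]) => ua.includes(token))
  return match ? { source: match[1], type: 'crawler' } : null
}
```

User-agent strings are easy to spoof, so a match here is a hint about who is visiting, not proof.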

We also track click-throughs from:

ChatGPT, Perplexity, Claude, Gemini
Microsoft Copilot, Bing Chat
Meta AI, You.com, Phind, Poe

The Gotchas We Discovered

Building this wasn't straightforward. Here's what surprised us.

Gotcha 1: Gemini Doesn't Crawl in Real-Time

When you ask ChatGPT a question, it might trigger OAI-SearchBot to fetch fresh content. Same with Perplexity: it fetches in real time when users ask questions, through PerplexityBot or its user-triggered agent, Perplexity-User.

Gemini is different. It doesn't have its own crawler. Instead, it uses Google's existing search index that Googlebot already built. When you ask Gemini something, there's no new request to your site.

This means:

You can't trigger Gemini visits by asking it about your site
There's no "GeminiBot" to track
You can't distinguish AI Overview traffic from regular Google search traffic

The best you can do is track Googlebot and know that some of those crawls feed into AI Overviews. But you won't see Gemini-specific activity.

Gotcha 2: Internal API Calls Pollute Click Data

This one was subtle. When a user clicks from ChatGPT to your /pricing page, your analytics should show one click event for /pricing. Simple.

But here's what actually happens: the browser loads /pricing, which then makes internal API calls to /api/social-proof, /api/auth/get-session, etc. In our tracking, these fetch requests were attributed to the same chatgpt.com referrer as the original navigation.

Suddenly your events table shows:

chatgpt → Click → /pricing ✓
chatgpt → Click → /api/social-proof ✗
chatgpt → Click → /api/auth/get-session ✗

Those API "clicks" aren't real user navigation. They're internal fetches that happened to keep the referrer.

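The fix: Use the Sec-Fetch-Mode header. Browsers send sec-fetch-mode: navigate for actual page navigation, and values such as cors or same-origin for fetches made by the page. We only track click events where sec-fetch-mode is navigate. One caveat: a browser that doesn't send Fetch Metadata headers at all will never register a click under this rule.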

This approach is generic—it works for any site without knowing their specific URL structure. Critical if you're building tracking as a service.
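To make the filter concrete, here is a small sketch that applies it to three requests like the ones in the events table above. The function name and the sample data are assumptions for illustration.

```javascript
// `headers` is anything with a get(name) method, such as the Headers object on
// a fetch-style Request. The function name is an assumption for this sketch.
function isUserNavigation(headers) {
  return headers.get('sec-fetch-mode') === 'navigate'
}

// Sample requests mirroring the events table above. A Map stands in for real
// Headers here (note that a Map, unlike Headers, is case-sensitive).
const sampleRequests = [
  { path: '/pricing', headers: new Map([['sec-fetch-mode', 'navigate']]) },
  { path: '/api/social-proof', headers: new Map([['sec-fetch-mode', 'cors']]) },
  { path: '/api/auth/get-session', headers: new Map([['sec-fetch-mode', 'cors']]) },
]

const trackedPaths = sampleRequests
  .filter((req) => isUserNavigation(req.headers))
  .map((req) => req.path)
// trackedPaths is ['/pricing']
```

Only the real page load survives the filter, and nothing in the check depends on the site's URL structure.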

Gotcha 3: Fire-and-Forget Doesn't Work on Edge

Our initial implementation was simple: detect an AI bot, fire off a tracking request, don't wait for it. Classic fire-and-forget pattern.

// This doesn't work reliably on Cloudflare Workers
fetch('/api/track', { method: 'POST', body: data })
// Worker terminates before fetch completes

On edge runtimes like Cloudflare Workers, the worker can terminate before your fire-and-forget request completes. The tracking just... disappears.

The fix: Either await the request (adds slight latency) or use the platform's waitUntil() API. We chose to await since it only affects requests we actually track: bot hits and AI click-throughs. Ordinary visitors never trigger tracking, so there's no latency impact for them.
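For comparison, the waitUntil() route might look roughly like this on Cloudflare Workers, where the module-style fetch handler receives an execution context with a waitUntil() method. The helper below is a sketch: trackEvent and respond are hypothetical callbacks standing in for real tracking and routing code, and trackEvent is assumed to return a promise.

```javascript
// `ctx` is the execution context a Cloudflare Workers fetch handler receives.
// `trackEvent` and `respond` are hypothetical stand-ins for real code.
function handleWithWaitUntil(request, ctx, trackEvent, respond) {
  // Start tracking without awaiting it, but register the promise so the
  // runtime keeps the worker alive until the promise settles.
  ctx.waitUntil(
    trackEvent(request).catch((err) => console.error('tracking failed', err))
  )
  // The response is not delayed by the tracking call
  return respond(request)
}
```

The trade-off versus awaiting is that a tracking failure only shows up in logs, since the response has already been sent.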

The Technical Implementation

Here's how the detection works at a high level:

function detectAiSource(referer, userAgent, signatureAgent) {
  // Check for ChatGPT agent mode
  if (signatureAgent?.includes('chatgpt.com')) {
    return { source: 'chatgpt-agent', type: 'agent' }
  }

  // Check referrer for click-throughs
  if (referer?.includes('chatgpt.com')) {
    return { source: 'chatgpt', type: 'click' }
  }

  // Check user-agent for crawlers (case-insensitive)
  if (userAgent?.toLowerCase().includes('gptbot')) {
    return { source: 'openai-gptbot', type: 'crawler' }
  }

  return null
}

For click events, we add the sec-fetch-mode check:

if (aiSource.type === 'click') {
  const isNavigation = request.headers.get('sec-fetch-mode') === 'navigate'
  if (!isNavigation) return // Skip internal fetches
}

All detection runs in middleware, so every request gets checked before hitting your application code.
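Put together, the middleware step could be sketched as a single function. This is an illustration under assumptions, not a drop-in implementation: track is a hypothetical async function that stores one event, request is any fetch-style Request, and the detection function is a compact copy of the one above so the block runs on its own.

```javascript
// Compact stand-in for the detection function shown above
function detectAiSource(referer, userAgent, signatureAgent) {
  if (signatureAgent?.includes('chatgpt.com')) return { source: 'chatgpt-agent', type: 'agent' }
  if (referer?.includes('chatgpt.com')) return { source: 'chatgpt', type: 'click' }
  if (userAgent?.toLowerCase().includes('gptbot')) return { source: 'openai-gptbot', type: 'crawler' }
  return null
}

// Returns the tracked event, or null if nothing was tracked.
// `track` is a hypothetical async function that persists one event.
async function trackAiRequest(request, track) {
  const h = request.headers
  const aiSource = detectAiSource(h.get('referer'), h.get('user-agent'), h.get('signature-agent'))
  if (!aiSource) return null // Ordinary traffic: no tracking, no added latency

  // Count a click only when the browser reports a real page navigation
  if (aiSource.type === 'click' && h.get('sec-fetch-mode') !== 'navigate') return null

  const event = { ...aiSource, path: new URL(request.url).pathname }
  await track(event) // Awaited so an edge runtime cannot drop it
  return event
}
```

Because the early returns happen before any await, requests that aren't AI traffic pay only for a few header lookups.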

What This Data Tells You

Once you're tracking AI bot events, you can answer questions like:

Which AI systems are indexing my content? If PerplexityBot visits daily but OAI-SearchBot never shows up, you know where to focus.
Am I getting traffic from AI citations? Crawler visits are nice, but click events mean actual users finding you through AI.
Which pages do AI bots prefer? Maybe they're hitting your blog but ignoring your product pages. That's actionable.
Is my robots.txt blocking AI crawlers? If you've allowed OAI-SearchBot but never see it, something's wrong.
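Answering these questions mostly comes down to grouping and counting stored events. A minimal sketch, assuming each event is stored as { source, type, path }; both that shape and the sample rows are made up for illustration:

```javascript
// Sample rows in a hypothetical { source, type, path } event shape
const events = [
  { source: 'perplexity-bot', type: 'crawler', path: '/blog/ai-seo' },
  { source: 'perplexity-bot', type: 'crawler', path: '/blog/ai-seo' },
  { source: 'openai-gptbot', type: 'crawler', path: '/pricing' },
  { source: 'chatgpt', type: 'click', path: '/pricing' },
]

// Count rows per key, where keyFn picks the grouping key for each row
function countBy(rows, keyFn) {
  const counts = {}
  for (const row of rows) {
    const key = keyFn(row)
    counts[key] = (counts[key] || 0) + 1
  }
  return counts
}

const crawls = events.filter((e) => e.type === 'crawler')
const clicks = events.filter((e) => e.type === 'click')

const crawlsBySource = countBy(crawls, (e) => e.source) // which AI systems index you
const crawlsByPath = countBy(crawls, (e) => e.path)     // which pages bots prefer
const clicksBySource = countBy(clicks, (e) => e.source) // where AI traffic comes from
```

With the sample rows, crawlsBySource is { 'perplexity-bot': 2, 'openai-gptbot': 1 }, crawlsByPath is { '/blog/ai-seo': 2, '/pricing': 1 }, and clicksBySource is { chatgpt: 1 }.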

The Bigger Picture

AI search is still early. The tracking tools, the optimization strategies, and the best practices are all being figured out in real time.

What we know: AI systems are crawling the web, building indexes, and citing sources in their responses. If your content isn't being crawled, it can't be cited. If it's being crawled but not cited, something about your content or structure isn't working.

Tracking is the first step to understanding what's actually happening. You can't optimize what you can't measure.

Want to see which AI systems are visiting your site? Sign up for free to access your Events dashboard and track crawler activity and click-throughs from AI platforms.