Robots.txt for AI Crawlers: How to Configure GPTBot, ClaudeBot, and PerplexityBot Access

Configure robots.txt for AI crawlers like GPTBot, ClaudeBot, and PerplexityBot. Copy-paste configurations for 4 common scenarios plus common mistakes to avoid.

According to Gartner, traditional search volume will drop 25% by the end of 2026 as AI search platforms absorb query traffic. Whether your website is ready for that shift depends, in part, on one small file sitting in your root directory: robots.txt.

Most websites have a robots.txt that was configured years ago for Google and Bing. It doesn't account for GPTBot, ClaudeBot, PerplexityBot, or the dozen other AI crawlers now hitting your server logs. The result? Many businesses are either accidentally blocking AI crawlers they want indexing their content, or accidentally allowing crawlers they'd prefer to block.

This guide covers every major AI crawler, what each one does, and gives you copy-paste configurations for the most common scenarios.

The AI Crawlers You Need to Know

AI platforms use specific crawlers to access web content. Each crawler serves a different purpose, and understanding the distinction matters for your configuration.

| Crawler | Platform | Purpose | What Happens If Blocked |
| --- | --- | --- | --- |
| GPTBot | OpenAI | Collects training data for future ChatGPT models | Your content won't be in future training data |
| ChatGPT-User | OpenAI | Real-time web browsing when users ask ChatGPT to search | ChatGPT can't cite your pages in live search results |
| OAI-SearchBot | OpenAI | Powers ChatGPT's SearchGPT and web search features | Your pages won't appear in ChatGPT search results |
| ClaudeBot | Anthropic | Crawls for Claude's web access and training | Claude can't access or cite your content |
| PerplexityBot | Perplexity AI | Real-time web search for Perplexity answers | Your content won't appear in Perplexity results |
| Google-Extended | Google | Training data for Gemini and AI features | Content excluded from Gemini training, but Google Search still works |
| Bingbot | Microsoft | Web indexing for Bing and Copilot | Invisible to both Bing search and Microsoft Copilot |
| Bytespider | ByteDance | Crawls for TikTok AI features | Excluded from ByteDance AI products |

The critical distinction: some crawlers gather training data (GPTBot, Google-Extended) while others power live search features (ChatGPT-User, OAI-SearchBot, PerplexityBot). Blocking training crawlers means your content won't be in future model updates. Blocking search crawlers means your content won't be cited in real-time AI answers.

For more on how these crawlers fit into the broader AI search picture, see our AI crawlers glossary entry.

Configuration Scenarios: Copy-Paste Ready

Here are the most common robots.txt configurations for AI crawlers. Pick the scenario that matches your business goals.

Scenario 1: Allow Everything (Maximum AI Visibility)

Best for: Companies that want maximum AI visibility and are actively pursuing GEO strategies.

```
# AI Crawlers - Allow All

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```

This is the right choice for most businesses pursuing AI visibility. If you want AI platforms to recommend your brand, they need access to your content. ChatGPT processes 4.5 billion monthly visits, and Perplexity AI handles over 500 million monthly searches. Blocking these crawlers means opting out of those audiences.

Scenario 2: Allow Search, Block Training

Best for: Publishers, media companies, and businesses with proprietary content who want live AI citations but don't want their content used for model training.

```
# Allow real-time AI search crawlers

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Block AI training crawlers

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /
```

This configuration lets AI search features cite your content in real time while preventing that content from being absorbed into training data. It's a reasonable compromise for content-heavy businesses concerned about intellectual property.

One caveat: the line between "search" and "training" crawlers isn't always clean. OpenAI has separate bots for each purpose, but not every platform makes that distinction. Perplexity's bot serves both functions.
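Before deploying a split like this, you can sanity-check it offline with Python's standard-library `urllib.robotparser`. This is a quick local sketch, not a substitute for testing against your live site; the rules below mirror Scenario 2.

```python
from urllib.robotparser import RobotFileParser

# The Scenario 2 configuration: live-search crawlers in, training crawlers out.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

def can_crawl(agent: str, path: str = "/blog/post") -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)

# Search crawlers get through; training crawlers do not.
for agent in ("ChatGPT-User", "OAI-SearchBot", "PerplexityBot"):
    assert can_crawl(agent)
for agent in ("GPTBot", "Google-Extended", "ClaudeBot"):
    assert not can_crawl(agent)
```

Note that a crawler with no matching group (and no `*` group) defaults to allowed, which is why Scenario 2 doesn't need an explicit rule for every bot on the web.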

Scenario 3: Selective Access (Protect Premium Content)

Best for: Companies with both public and gated content, like SaaS companies with a public blog and premium documentation.

```
# Allow AI crawlers on public content

User-agent: GPTBot
Allow: /blog/
Allow: /resources/
Disallow: /docs/
Disallow: /app/
Disallow: /api/

User-agent: ChatGPT-User
Allow: /blog/
Allow: /resources/
Disallow: /docs/
Disallow: /app/

User-agent: PerplexityBot
Allow: /blog/
Allow: /resources/
Disallow: /docs/
Disallow: /app/
```

This gives AI systems access to your marketing content (which you want cited) while keeping proprietary documentation and application pages private.
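One behavior worth knowing before relying on path-scoped rules: any path not matched by a rule defaults to allowed, so your Disallow lines must cover every private section. The sketch below checks the GPTBot rules from this scenario with Python's standard-library `urllib.robotparser` (be aware its first-match ordering is simpler than Google's longest-match precedence, though the two agree for non-overlapping prefixes like these):

```python
from urllib.robotparser import RobotFileParser

# The Scenario 3 rules for GPTBot: marketing paths open, product surfaces closed.
SELECTIVE_RULES = """\
User-agent: GPTBot
Allow: /blog/
Allow: /resources/
Disallow: /docs/
Disallow: /app/
Disallow: /api/
"""

parser = RobotFileParser()
parser.parse(SELECTIVE_RULES.splitlines())

def gptbot_allowed(path: str) -> bool:
    return parser.can_fetch("GPTBot", path)

assert gptbot_allowed("/blog/ai-visibility-guide")
assert not gptbot_allowed("/docs/internal-setup")
# Unlisted paths default to allowed -- /pricing is crawlable even though
# no rule mentions it. Disallow everything you actually want private.
assert gptbot_allowed("/pricing")
```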

Scenario 4: Block All AI Crawlers

Best for: Companies that sell content directly, have strict IP concerns, or have decided against AI visibility for strategic reasons.

```
# Block all known AI crawlers

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /
```

I'd caution against this unless you have a specific reason. Blocking AI crawlers is choosing to be invisible in a channel that 50% of B2B buyers use for research, according to G2. And as Gartner's data suggests, this channel is only growing.

The Strategic Trade-Off: Visibility vs. Control

Every robots.txt decision for AI crawlers comes down to a trade-off between visibility and control.

On one side: allowing AI crawlers means your content can be cited, recommended, and used to answer questions about your brand. The SE Ranking 2025 study found that articles over 2,900 words are 59% more likely to be chosen as a ChatGPT citation. But you can only benefit from that if ChatGPT can actually read your content.

On the other side: your content has value, and AI companies are using it to build products. Some publishers argue they should be compensated for this use. Major news organizations have negotiated licensing deals with OpenAI for exactly this reason.

For most B2B companies, the calculus is straightforward. Your content exists to attract customers. AI platforms are sending you customers. Blocking them hurts you more than it hurts them.

But for media companies, research firms, and content creators whose product IS the content, the decision is harder. In those cases, the "allow search, block training" approach from Scenario 2 offers a middle path.

Here's my take: if you're reading this guide and thinking about how to get cited by AI, block nothing. Open the gates. The brands that win in AI search are the ones that make it easy for AI to understand and cite their content.

Common Mistakes That Break AI Visibility

I've audited hundreds of robots.txt files for AI visibility issues. Here are the mistakes I see most often.

Mistake 1: Using wildcard blocks that catch AI crawlers.
A blanket `User-agent: *` group with `Disallow` rules applies to every crawler without its own group, AI crawlers included. If a disallowed path like `/private/` holds customer case studies or pricing pages you'd want cited, AI crawlers can't reach them. Review every Disallow rule to make sure you're not blocking content you actually want AI to see.
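The flip side of this mistake is worth testing too: under the robots.txt standard, a crawler that has its own user-agent group ignores the `*` group entirely, so adding a GPTBot group silently cancels your wildcard rules for GPTBot. A sketch with Python's standard-library `urllib.robotparser`:

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, agent: str, path: str) -> bool:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# With only a wildcard group, the Disallow applies to AI crawlers too.
WILDCARD_ONLY = """\
User-agent: *
Disallow: /private/
"""

# Once GPTBot has its own group, it ignores the * group entirely --
# this group must restate any Disallow rules you still want enforced.
WITH_GPTBOT_GROUP = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Allow: /
"""

assert not allowed(WILDCARD_ONLY, "GPTBot", "/private/case-study")
assert allowed(WITH_GPTBOT_GROUP, "GPTBot", "/private/case-study")
```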

Mistake 2: Forgetting ChatGPT-User and OAI-SearchBot.
Many guides from 2024 only mention GPTBot, but OpenAI now uses three separate crawlers. If you block only GPTBot, you've stopped training data collection while leaving live browsing and search citations untouched; if that split isn't what you intended, the configuration is wrong. Make sure your rules address all three OpenAI bots explicitly.

Mistake 3: Not checking server-side rendering.
Most AI crawlers cannot execute JavaScript. If your site relies on client-side rendering, the crawler may see an empty page even if your robots.txt allows access. Serve critical content as static HTML or use server-side rendering to ensure AI crawlers can actually read what you're allowing them to access.
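If you want a first-pass signal before setting up a full audit, a rough heuristic is to fetch the raw HTML (as a crawler would, without executing JavaScript) and check whether any meaningful text survives. The function and the 20-word threshold below are illustrative assumptions, not a standard check:

```python
import re

def looks_client_side_rendered(html: str) -> bool:
    """Rough heuristic: flag pages whose <body> holds almost no visible
    text, e.g. an empty JS mount point. A real audit should compare the
    raw HTML against what a browser renders."""
    body_match = re.search(r"<body[^>]*>(.*)</body>", html, re.S | re.I)
    body = body_match.group(1) if body_match else html
    # Drop scripts, then strip remaining tags and count visible words.
    body = re.sub(r"<script.*?</script>", "", body, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", body)
    return len(text.split()) < 20  # threshold is arbitrary

# A typical SPA shell: almost no crawlable text in the raw HTML.
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
# A server-rendered page: the content is right there for the crawler.
ssr_page = "<html><body><article>" + "useful words " * 50 + "</article></body></html>"
```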

Mistake 4: Setting it and forgetting it.
New AI crawlers appear regularly. When a new platform launches or an existing platform adds a new crawler, your robots.txt needs updating. Review it quarterly at minimum.

Mistake 5: Not pairing with llms.txt.
robots.txt controls access. llms.txt provides context. Using both gives AI systems the complete picture: what they can access AND what your site is about. Together they form a more effective AI crawler strategy.

How to Verify Your Configuration

After updating robots.txt, verify that your changes are working correctly.

Test with Google's robots.txt Tester. Available in Google Search Console, this tool lets you check if specific URLs are blocked or allowed for any user agent. Test each AI crawler against your important pages.

Check server logs for AI crawler activity. Look for GPTBot, ChatGPT-User, PerplexityBot, and ClaudeBot in your access logs. If you've allowed them but don't see activity, your content may not be getting crawled for other reasons (slow page speed, JavaScript-only rendering, or low authority signals).
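A minimal way to run this check is to scan the access log for the crawler names as user-agent substrings. The log lines below are hypothetical samples; point the function at your real log file instead:

```python
from collections import Counter

AI_CRAWLERS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot",
               "ClaudeBot", "PerplexityBot")

def crawler_hits(log_lines):
    """Count access-log lines per AI crawler, matched by user-agent substring."""
    counts = Counter()
    for line in log_lines:
        for bot in AI_CRAWLERS:
            if bot in line:
                counts[bot] += 1
                break
    return counts

# Hypothetical excerpt in combined log format (user agent quoted at the end).
sample_log = [
    '1.2.3.4 - - [10/May/2025] "GET /blog/post HTTP/1.1" 200 "-" "Mozilla/5.0; GPTBot/1.1"',
    '5.6.7.8 - - [10/May/2025] "GET /pricing HTTP/1.1" 200 "-" "Mozilla/5.0; PerplexityBot"',
    '9.9.9.9 - - [10/May/2025] "GET / HTTP/1.1" 200 "-" "Mozilla/5.0 (regular browser)"',
]
```

Zero hits from a crawler you've explicitly allowed is the signal to dig into the other causes listed above.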

Monitor your AI visibility before and after changes. If you're loosening restrictions, give it 2-4 weeks and then check whether your brand starts appearing in more AI answers. ChatGPT's OAI-SearchBot crawls sites every few days to weeks, according to Profound, so changes aren't instant.

Cross-reference with schema markup. Verify that pages you've opened to AI crawlers also have proper structured data. Schema markup helps AI systems understand the content they're accessing, which increases the likelihood of accurate citation.

Monitor how AI crawlers interact with your brand using AI Radar →

Frequently Asked Questions

Does blocking GPTBot also block ChatGPT search?

No. OpenAI uses separate crawlers: GPTBot for training data collection, ChatGPT-User for live browsing, and OAI-SearchBot for search features. Blocking GPTBot only prevents training data use. To block ChatGPT search, you must also block ChatGPT-User and OAI-SearchBot.

Will blocking AI crawlers hurt my Google rankings?

No. Google-Extended is separate from Googlebot. Blocking Google-Extended prevents your content from being used in Gemini training but does not affect your Google Search rankings. Your traditional SEO is unaffected.

How quickly do AI crawlers respect robots.txt changes?

Most AI crawlers check robots.txt regularly, typically within 24-48 hours. However, it may take longer for changes to affect AI responses. Content already in training data stays there regardless of future robots.txt changes.

Should I block Bingbot to prevent Microsoft Copilot access?

Blocking Bingbot prevents both Bing search indexing AND Microsoft Copilot from citing your content. If you want Bing search visibility but not Copilot access, there's currently no way to separate the two. Most businesses should allow Bingbot.

Can robots.txt fully prevent AI from using my content?

robots.txt is a request, not an enforcement mechanism. Well-behaved crawlers from major platforms (OpenAI, Anthropic, Google) respect it. But it can't prevent indirect use: if a third-party site quotes your content, AI systems can still access that quote. robots.txt protects direct crawling, not all references to your content.

What's the difference between robots.txt and llms.txt for AI?

robots.txt controls which pages AI crawlers can access. llms.txt provides a structured summary of your site's content and purpose. robots.txt is a gatekeeper; llms.txt is a tour guide. Most sites pursuing AI visibility should implement both.
