
How to Prepare Your robots.txt for AI Search Crawlers

AI readiness Feb 18, 2026 3 min read

What these checks test

SiteCurl’s AI readiness checks examine two things. First, whether you have an llms.txt file at your site root that helps AI models understand your content. Second, whether your robots.txt blocks known AI crawlers such as GPTBot (OpenAI), ClaudeBot (Anthropic), or PerplexityBot (Perplexity AI).

Why it matters

AI search is growing. Google AI Overviews, ChatGPT with browsing, Perplexity, and other tools pull content from the web and cite sources in their answers. If your site blocks these crawlers, your content will not appear in AI-generated answers.

This is a trade-off. Some site owners block AI crawlers to prevent their content from being used as training data. Others want maximum visibility and allow all AI crawlers. The right choice depends on your goals.

For most businesses that rely on organic traffic, blocking AI crawlers means missing a growing distribution channel. AI search is sending referral traffic to cited sources, and that traffic is increasing.

How to manage AI crawler access

If your robots.txt does not mention AI crawlers by name, most of them will crawl your site by default under your User-agent: * rules. If you previously added Disallow rules for specific bots and now want to allow them, remove those rules:

# Remove these to allow AI crawlers:
# User-agent: GPTBot
# Disallow: /
# User-agent: ClaudeBot
# Disallow: /

Blocking specific AI crawlers

If you want to block specific AI crawlers (for example, to prevent training data use while allowing search bots):

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Known AI crawler user agents

  • GPTBot: OpenAI (ChatGPT browsing, training)
  • ClaudeBot: Anthropic
  • PerplexityBot: Perplexity AI
  • CCBot: Common Crawl (used by many AI companies for training)
  • Google-Extended: Google AI training (separate from Googlebot search indexing)

Adding an llms.txt file

The llms.txt standard (https://llmstxt.org) is an emerging way to help AI models understand your site. Create a Markdown file at /llms.txt with:

  • A brief description of your site or product
  • Key pages and their purpose
  • Any specific instructions for AI models

This is optional but helps AI tools cite your content more accurately.
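Following the llmstxt.org format, a minimal llms.txt might look like the sketch below. The site name, description, and URLs are placeholders; adapt them to your own pages:

```markdown
# Example Widgets

> Example Widgets sells modular widgets for small manufacturers.

## Key pages

- [Product catalog](https://example.com/products): full widget lineup with specs
- [Documentation](https://example.com/docs): setup and integration guides

## Notes for AI models

- Pricing changes frequently; link to the catalog rather than quoting prices.
```

The format is an H1 title, a one-line blockquote summary, then H2 sections listing key links with short descriptions.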

How to verify the fix

Visit https://yoursite.com/robots.txt and check which user agents are listed and what their rules are. Run a SiteCurl scan to see the AI readiness section, which flags blocked AI crawlers and missing llms.txt.
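You can also check crawler access programmatically. This is a sketch using Python's standard-library robots.txt parser against a hypothetical robots.txt that blocks GPTBot while allowing everything else; swap in your own site's rules or fetch them with parser.set_url(...) and parser.read():

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot, allows all other crawlers
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch(user_agent, url) reports whether that crawler may fetch the URL
for bot in ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"):
    allowed = parser.can_fetch(bot, "https://example.com/")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

Here GPTBot is reported as blocked and the other bots as allowed, since they fall through to the User-agent: * group.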

AI crawler access connects to your broader robots.txt configuration and overall SEO setup.

Start a free trial to check your AI readiness score.
