Part of the AI Readiness audit

Check if AI crawlers can reach your site

Some robots.txt files block AI crawlers by default. SiteCurl checks whether ChatGPT, Perplexity, and other AI bots can access your content or are being turned away.

What this check does

SiteCurl reads your robots.txt file and checks for rules that block known AI crawlers: GPTBot (OpenAI), ChatGPT-User, PerplexityBot, ClaudeBot (Anthropic), Bytespider (ByteDance), and others. If your robots.txt blocks these bots, SiteCurl flags which AI crawlers are blocked and which are allowed.

The check also looks for broad Disallow rules that block every bot by mistake. For example, a Disallow: / rule under User-agent: * applies to any crawler that has no more specific group of its own, AI crawlers included.

SiteCurl names the exact crawlers that are blocked so you can make a clear choice about which ones to let in and which to keep out.
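
As a rough illustration of this kind of check (not SiteCurl's actual implementation), the sketch below uses Python's standard urllib.robotparser module to test a robots.txt body against the crawler names listed above. The function name blocked_ai_crawlers and the two sample files are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the list above; extend as needed.
AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot", "Bytespider"]


def blocked_ai_crawlers(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Map each crawler name to True if robots_txt blocks it from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: not parser.can_fetch(bot, path) for bot in AI_CRAWLERS}


# Hypothetical sample: a broad rule that blocks every bot, AI crawlers included.
BROAD_BLOCK = """\
User-agent: *
Disallow: /
"""

# Hypothetical sample: a targeted rule that blocks one crawler only.
TARGETED_BLOCK = """\
User-agent: GPTBot
Disallow: /
"""

if __name__ == "__main__":
    print(blocked_ai_crawlers(BROAD_BLOCK))
    print(blocked_ai_crawlers(TARGETED_BLOCK))
```

With the broad sample every crawler in the list is reported as blocked. With the targeted sample only GPTBot is. Note that urllib.robotparser applies its own matching rules, so an individual crawler may read an unusual file slightly differently.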

How this shows up in the real world

AI firms send crawlers to read web content. OpenAI uses GPTBot to build training data and ChatGPT-User to fetch pages during chats. Anthropic uses ClaudeBot. Perplexity uses PerplexityBot. Google honors the Google-Extended token in robots.txt to control whether content is used for Gemini training. Each one is addressed by its own user agent name in robots.txt.

Many sites block these crawlers, some on purpose and some by mistake. A host or CMS may include default robots.txt rules that block AI bots. A security plugin may add broad blocking rules. The site owner may not know their content is hidden from AI tools.

Blocking AI crawlers has trade-offs. If you block GPTBot, your content will not be used in OpenAI's training data. But if you also block ChatGPT-User, your site will not be cited when ChatGPT users ask questions your content could answer. The training crawler and the lookup crawler serve different roles.

The choice is yours. Some sites block AI crawlers to guard their content. Others allow them to boost reach. SiteCurl gives you the data to make a clear choice rather than blocking crawlers by default without knowing it.

Why it matters

AI search is a growing traffic source. When AI tools cite your content, users can click through to your site. If your robots.txt blocks the AI crawler, your content is hidden from that tool. You miss citations, traffic, and the trust that comes with being named.

Many site owners do not know they are blocking AI crawlers. Default robots.txt files, security plugins, and hosting setups can add blocking rules without anyone noticing. A check shows what your rules actually do versus what you intend.

The cost of accidental blocking grows as AI search use rises. Sites that are visible to AI tools now are building a presence that will grow as more people use AI-powered search tools.

Who this impacts most

Content sites that want AI citations need to allow lookup crawlers (ChatGPT-User, PerplexityBot). Blocking them means your articles will not be cited in AI answers, even if they are the best source on the topic.

Online stores can gain from AI product recommendations. When users ask AI for product picks, sites that allow AI crawlers can be named in the response. Blocked sites are left out.

Sites that block AI crawlers on purpose should check that the blocking works as planned. If you want to block training but allow lookups, you need separate rules for each crawler user agent.

How to fix it

Step 1: Review your robots.txt file. Visit https://yoursite.com/robots.txt and read the rules. Look for entries that name GPTBot, ChatGPT-User, PerplexityBot, ClaudeBot, or broad blocks that hit all bots.

Step 2: Decide which crawlers to allow. You can allow all AI crawlers, block all of them, or pick and choose. If you want AI citations, allow ChatGPT-User and PerplexityBot. If you want to block training data, block GPTBot and Google-Extended while letting lookup crawlers through.

Step 3: Update your robots.txt. Add or change rules for each user agent. For instance, to allow ChatGPT lookups but block training, add a line reading User-agent: GPTBot followed by a line reading Disallow: /, and leave ChatGPT-User unblocked.
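
One way to sanity-check a draft before publishing it is sketched below, assuming you want to block the training crawlers and allow the lookup crawlers named in Step 2. The robots.txt text in SPLIT_RULES, the INTENT table, and the mismatches helper are all illustrative assumptions. The check relies on Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft robots.txt: block training, allow lookups.
SPLIT_RULES = """\
# Block training crawlers
User-agent: GPTBot
User-agent: Google-Extended
Disallow: /

# Allow lookup crawlers explicitly
User-agent: ChatGPT-User
User-agent: PerplexityBot
Allow: /
"""

# What you intend: True means the crawler should be allowed.
INTENT = {
    "GPTBot": False,
    "Google-Extended": False,
    "ChatGPT-User": True,
    "PerplexityBot": True,
}


def mismatches(robots_txt: str, intent: dict[str, bool]) -> list[str]:
    """Return crawler names whose access to "/" differs from the intent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        bot
        for bot, want_allowed in intent.items()
        if parser.can_fetch(bot, "/") != want_allowed
    ]


if __name__ == "__main__":
    # An empty list means the draft matches the intent table.
    print(mismatches(SPLIT_RULES, INTENT))
```

The explicit Allow group is optional when no broader rule blocks those crawlers, but it protects the lookup crawlers if a User-agent: * block is added later by a plugin or host.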

Step 4: Check your CMS and hosting setup. Some WordPress plugins (like Yoast or security plugins) change robots.txt. Some hosts add default rules. Check these settings to make sure they match your intent.

Common mistakes when fixing this

Blocking all AI crawlers when you only want to block training. GPTBot collects training data. ChatGPT-User fetches pages during live chats. Blocking both means your content cannot be cited in AI answers. If you want citations, allow lookup crawlers while blocking training ones.

Not checking robots.txt after plugin or hosting changes. Security plugins, SEO plugins, and hosting updates can change your robots.txt without notice. Check the file after any change to your CMS or hosting setup.

Thinking robots.txt blocks all access. Robots.txt is a request, not a wall. Well-behaved crawlers respect it, but it does not technically prevent access. For content you want to guard, use authentication or access controls, not just robots.txt rules.

How to verify the fix

After updating your robots.txt, run another SiteCurl scan. The AI crawler permissions check should reflect your updated rules. You can also test specific crawlers by reading your robots.txt and checking each user agent rule manually.

For a quick check, visit https://yoursite.com/robots.txt in your browser and search for 'GPTBot,' 'ChatGPT-User,' and 'PerplexityBot' to see whether any of them is named in a User-agent line that is followed by Disallow rules.
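
The manual search can also be scripted. The sketch below is a minimal example that takes robots.txt text you have pasted in and lists each rule group naming one of the crawlers you care about. The function name groups_naming and the SAMPLE file are hypothetical.

```python
def groups_naming(robots_txt: str, names: list[str]) -> list[tuple[list[str], list[str]]]:
    """Return (user_agents, rules) for each group that names one of `names`."""
    groups = []
    agents, rules = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule line closed the previous group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow") and agents:
            rules.append(f"{field.capitalize()}: {value}")
    if agents and rules:
        groups.append((agents, rules))
    wanted = {name.lower() for name in names}
    return [(a, r) for a, r in groups if wanted & {x.lower() for x in a}]


# Hypothetical robots.txt content pasted in for inspection.
SAMPLE = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

if __name__ == "__main__":
    for agents, rules in groups_naming(SAMPLE, ["GPTBot", "ChatGPT-User", "PerplexityBot"]):
        print(agents, rules)
```

A crawler that is not named in any group falls back to the User-agent: * rules, so read that group as well before concluding a crawler is allowed.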

The bottom line

Your robots.txt controls whether AI systems can read your content. Check it to make sure the rules match your intent. If you want AI citations and traffic, allow retrieval crawlers. If you want to block training data collection, target specific crawlers rather than blocking all AI access broadly.

Example findings from a scan

All major AI crawlers are allowed

GPTBot is blocked by robots.txt

Broad Disallow rule blocks all crawlers including AI bots

Frequently asked questions

Which AI crawlers should I allow?

It depends on your goals. If you want your content cited in AI answers, allow ChatGPT-User and PerplexityBot. If you want to prevent your content from being used in AI training data, block GPTBot and Google-Extended. You can allow retrieval while blocking training.

Does blocking AI crawlers affect regular search rankings?

No. AI crawlers are separate from Googlebot and Bingbot. Blocking GPTBot does not affect your Google or Bing search rankings. Traditional search crawlers have their own user agent strings.
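
To see the separation at the robots.txt level, here is a small sketch using Python's standard urllib.robotparser module. The RULES text is a hypothetical file that blocks only GPTBot. It shows crawl access only and says nothing about rankings themselves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only OpenAI's training crawler.
RULES = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Traditional search crawlers are matched by their own user agent names.
search_access = {bot: parser.can_fetch(bot, "/") for bot in ("Googlebot", "Bingbot")}
ai_access = parser.can_fetch("GPTBot", "/")

print(search_access)  # both search crawlers remain allowed
print(ai_access)      # GPTBot is blocked
```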

Can I check AI crawler access without signing up?

Yes. The free audit checks your robots.txt for AI crawler rules as part of a full seven-category scan. No signup needed.

What is the difference between GPTBot and ChatGPT-User?

GPTBot crawls the web to collect data for training OpenAI's models. ChatGPT-User fetches pages in real time when a ChatGPT user asks a question. Blocking GPTBot stops training data collection. Blocking ChatGPT-User stops your content from being cited in live conversations.
