Part of the AI Readiness audit
Check if AI crawlers can reach your site
Some robots.txt files block AI crawlers by default. SiteCurl checks whether ChatGPT, Perplexity, and other AI bots can access your content or are being turned away.
No signup required. Results in under 60 seconds.
What this check does
SiteCurl reads your robots.txt file and checks for rules that block known AI crawlers: GPTBot (OpenAI), ChatGPT-User, PerplexityBot, ClaudeBot (Anthropic), Bytespider (ByteDance), and others. If your robots.txt blocks these user agents, SiteCurl flags which AI crawlers are blocked and which are allowed.
The check also looks for broad Disallow rules that unintentionally block all bots. A Disallow: / rule under User-agent: * turns away every well-behaved crawler, AI systems included.
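A robots.txt that blocks everything looks like this (a generic example, not taken from any specific site):

```text
# Blocks every well-behaved crawler, AI bots included
User-agent: *
Disallow: /
```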
SiteCurl reports the specific crawlers that are blocked so you can make an informed decision about which ones to allow and which to keep out.
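The same kind of check can be sketched in a few lines of Python using the standard library's urllib.robotparser. This is a simplified illustration, not SiteCurl's actual implementation; the robots.txt content and example.com URL are placeholders, and the crawler list mirrors the user agents named above:

```python
from urllib.robotparser import RobotFileParser

# User agents of known AI crawlers, as named in this article
AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot", "Bytespider"]

# Example robots.txt that blocks OpenAI's training crawler only
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Report which AI crawlers may fetch the homepage.
# With the example rules above, GPTBot is blocked and the rest are allowed.
for agent in AI_CRAWLERS:
    allowed = parser.can_fetch(agent, "https://example.com/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

In a real check you would fetch https://yoursite.com/robots.txt and feed its lines to parse() instead of the inline string.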
How this shows up in the real world
AI companies send crawlers to read web content. OpenAI uses GPTBot to build training data and ChatGPT-User to fetch pages during conversations. Anthropic uses ClaudeBot. Perplexity uses PerplexityBot. Google honors the Google-Extended token, which is not a separate crawler but a robots.txt control over whether Googlebot-crawled content is used for Gemini training. Each crawler identifies itself with a unique user agent string.
Many sites block these crawlers, sometimes intentionally and sometimes by accident. A hosting provider or CMS may include default robots.txt rules that block AI bots. A security plugin may add broad blocking rules. The site owner may not know their content is invisible to AI systems.
Blocking AI crawlers has trade-offs. If you block GPTBot, your content will not be used in OpenAI's training data. But if you also block ChatGPT-User, your site will not be cited when ChatGPT users ask questions that your content could answer. The training crawler and the retrieval crawler serve different purposes.
The decision is yours. Some publishers block AI crawlers to protect their content. Others allow them to maximize visibility. SiteCurl gives you the information to make that choice deliberately, rather than discovering later that a default configuration blocked AI crawlers without your knowledge.
Why it matters
AI search is a growing traffic source. When AI systems cite your content, visitors click through to your site. If your robots.txt blocks the AI crawler, your content is invisible to that system. You miss citations, traffic, and the trust that comes with being referenced.
Many site owners do not know they are blocking AI crawlers. Default robots.txt files, security plugins, and hosting setups can add blocking rules without the site owner's awareness. A check reveals what is actually happening versus what you intend.
The cost of unintentional blocking grows as AI search adoption increases. Sites that are visible to AI systems now are building a presence that will compound as more people use AI-powered search tools.
Who this impacts most
Content publishers who want AI citations need to allow retrieval crawlers (ChatGPT-User, PerplexityBot). Blocking them means your articles will not be cited in AI-generated answers, even if they are the best source on the topic.
E-commerce sites can benefit from AI product recommendations. When users ask AI for product suggestions, sites that allow AI crawlers can be included in the response. Blocked sites are excluded.
Sites that intentionally block AI crawlers should verify that the blocking is working as intended. If you want to block training but allow retrieval, you need separate rules for different crawler user agents.
How to fix it
Step 1: Review your robots.txt file. Visit https://yoursite.com/robots.txt and read the rules. Look for entries that mention GPTBot, ChatGPT-User, PerplexityBot, ClaudeBot, or broad wildcard blocks that affect all bots.
Step 2: Decide which crawlers to allow. You can allow all AI crawlers, block all of them, or selectively allow some. If you want AI citations, allow ChatGPT-User and PerplexityBot. If you want to block training data collection, block GPTBot and Google-Extended while allowing retrieval crawlers.
Step 3: Update your robots.txt. Add or modify rules for specific user agents. For example, to block training while still allowing ChatGPT retrieval, add a User-agent: GPTBot group with Disallow: / and leave ChatGPT-User without any Disallow rule.
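Put together, a robots.txt that blocks training crawlers but leaves retrieval open might look like this (the final wildcard group is illustrative; adapt it to your existing rules):

```text
# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Block Google's AI training token
User-agent: Google-Extended
Disallow: /

# ChatGPT-User and PerplexityBot are not listed above,
# so they fall through to this default group and stay allowed
User-agent: *
Allow: /
```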
Step 4: Check your CMS and hosting settings. Some WordPress plugins (like Yoast or security plugins) modify robots.txt. Some hosting providers add default rules. Check these settings to make sure they match your intent.
Common mistakes when fixing this
Blocking all AI crawlers when you only want to block training. GPTBot collects training data. ChatGPT-User fetches pages during live conversations. Blocking both means your content cannot be cited in AI answers. If you want citations, allow retrieval crawlers while blocking training ones.
Not checking robots.txt after plugin or hosting changes. Security plugins, SEO plugins, and hosting updates can modify your robots.txt without notification. Check the file after any change to your CMS or hosting setup.
Assuming robots.txt controls everything. Robots.txt is a request, not a firewall. Well-behaved crawlers respect it, but it does not technically prevent access. For content you want to protect, use authentication or access controls, not just robots.txt rules.
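If you need enforcement rather than a request, blocking can be done at the web server level. As one sketch, an nginx server block can match crawler user agents and refuse them; the directive syntax is standard nginx, but the crawler list and pattern are illustrative and should be adapted to your setup:

```nginx
# Inside the server block: return 403 to selected AI crawlers.
# ~* makes the regex match case-insensitively.
if ($http_user_agent ~* (GPTBot|ClaudeBot|Bytespider)) {
    return 403;
}
```

Note that user agent strings can be spoofed, so even this is not airtight; authentication remains the only reliable protection.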
How to verify the fix
After updating your robots.txt, run another SiteCurl scan. The AI crawler permissions check should reflect your updated rules. You can also test specific crawlers by reading your robots.txt and checking each user agent rule manually.
For a quick check, visit https://yoursite.com/robots.txt in your browser and search for 'GPTBot,' 'ChatGPT-User,' and 'PerplexityBot' to see if they are mentioned in Disallow rules.
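The same quick check works from the command line. This sketch assumes curl and grep are available, and yoursite.com is a placeholder for your own domain:

```shell
# Fetch robots.txt and show any lines mentioning known AI crawlers
curl -s https://yoursite.com/robots.txt \
  | grep -iE 'gptbot|chatgpt-user|perplexitybot|claudebot' \
  || echo "no AI crawler rules found"
```

Matching lines tell you which crawlers are named; you still need to read the surrounding User-agent group to see whether each one is allowed or disallowed.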
The bottom line
Your robots.txt controls whether AI systems can read your content. Check it to make sure the rules match your intent. If you want AI citations and traffic, allow retrieval crawlers. If you want to block training data collection, target specific crawlers rather than blocking all AI access broadly.
Example findings from a scan
All major AI crawlers are allowed
GPTBot is blocked by robots.txt
Broad Disallow rule blocks all crawlers including AI bots
Related checks
Frequently asked questions
Which AI crawlers should I allow?
It depends on your goals. If you want your content cited in AI answers, allow ChatGPT-User and PerplexityBot. If you want to prevent your content from being used in AI training data, block GPTBot and Google-Extended. You can allow retrieval while blocking training.
Does blocking AI crawlers affect regular search rankings?
No. AI crawlers are separate from Googlebot and Bingbot. Blocking GPTBot does not affect your Google or Bing search rankings. Traditional search crawlers have their own user agent strings.
Can I check AI crawler access without signing up?
Yes. The free audit checks your robots.txt for AI crawler rules as part of a full seven-category scan. No signup needed.
What is the difference between GPTBot and ChatGPT-User?
GPTBot crawls the web to collect data for training OpenAI's models. ChatGPT-User fetches pages in real time when a ChatGPT user asks a question. Blocking GPTBot stops training data collection. Blocking ChatGPT-User stops your content from being cited in live conversations.
Check your AI crawler access now