How to Configure Your robots.txt to Allow AI Crawlers — Step by Step

Your robots.txt file is the first thing AI crawlers check before attempting to index your site. A misconfigured robots.txt — even one that was correctly set up for traditional SEO — can silently block every major AI crawler, making your site invisible to AI systems regardless of how well your content is structured.

The direct answer

To allow AI crawlers in your robots.txt, open the file via Rank Math’s robots.txt editor, add an explicit User-agent line for each major AI crawler followed by Allow: /, and save. Do this explicitly: a crawler with no group of its own falls back to the User-agent: * rules, which are often more restrictive than intended. Security plugins can also block crawlers through separate firewall rules that robots.txt does not control.

Step 1 — Access your robots.txt in Rank Math

Go to Rank Math > General Settings > Edit robots.txt. This opens a text editor showing your current robots.txt content. Do not edit robots.txt by uploading a file to your server — use Rank Math’s editor so it stays in sync with WordPress and does not get overwritten by plugin updates.

Step 2 — Check what is currently in your robots.txt

Before adding anything, read what is already there. Look for:

  • Any User-agent: * with Disallow: / — this blocks all crawlers and needs to be removed or modified
  • Any Disallow rules with broad patterns that could catch the content pages you want AI crawlers to reach
  • Any existing AI crawler rules that may be outdated or missing new crawlers
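
The first item on this checklist can be automated. The following is a minimal Python sketch, not a full robots.txt parser: it groups consecutive User-agent lines and reports which agents sit in a group containing a bare Disallow: /. The function name and the sample file are illustrative assumptions.

```python
def blanket_disallow_agents(robots_text):
    """Return the user agents whose group contains a bare 'Disallow: /'."""
    flagged = []
    current_agents = []      # agents named by the group being read
    reading_agents = False   # True while consecutive User-agent lines continue

    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if not reading_agents:            # a new group starts here
                current_agents = []
            current_agents.append(value)
            reading_agents = True
        else:
            reading_agents = False
            if field == "disallow" and value == "/":
                flagged.extend(a for a in current_agents if a not in flagged)
    return flagged


# Hypothetical robots.txt content, for illustration only.
sample = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /wp-admin/
"""
print(blanket_disallow_agents(sample))  # prints ['*']
```

Paste the content of your own robots.txt in place of the sample. Any agent the function returns is blocked from the whole site by its group.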

Step 3 — Add the AI crawler allow rules

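Add the following block to your robots.txt. Each crawler gets its own group. A crawler follows the group that names it and ignores the User-agent: * group, wherever the groups sit in the file, so placing the AI crawler groups above the wildcard group is not required but keeps the file easy to review: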

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: DuckAssistBot
Allow: /
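
If you want to sanity-check the block before pasting it into Rank Math, a short Python sketch with the standard library's urllib.robotparser can help. It assumes a hypothetical existing file that blocks everyone by default, and "SomeOtherBot" is a made-up name standing in for any crawler without its own group. Real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended",
             "OAI-SearchBot", "CCBot", "DuckAssistBot"]


def build_allow_block(agents):
    """Render one 'User-agent / Allow: /' group per agent, blank-line separated."""
    return "\n\n".join(f"User-agent: {agent}\nAllow: /" for agent in agents) + "\n"


def allowed_agents(robots_text, agents, path="/"):
    """Map each agent to whether the parsed rules let it fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


# Hypothetical existing rules that block every crawler by default.
existing = "User-agent: *\nDisallow: /\n"
combined = build_allow_block(AI_AGENTS) + "\n" + existing

result = allowed_agents(combined, AI_AGENTS + ["SomeOtherBot"])
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With these inputs the seven listed agents come back allowed and the unlisted bot comes back blocked, which is the fallback to the wildcard group described above.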

Step 4 — Save and verify

Click Save in Rank Math. Then open a new browser tab and go to yoursite.com/robots.txt. Confirm the AI crawler rules you just added are visible. If they are not visible, a caching plugin may be serving a cached version — go to LiteSpeed Cache > Manage > Purge All and check again.
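
The same check can be scripted. This is a rough sketch under stated assumptions: the "robots-check/0.1" User-Agent string is a made-up label, and the Cache-Control request header is only a hint that a caching layer may or may not honour. The comparison function works on plain text, so it can also be used on content copied from the browser.

```python
import urllib.request

EXPECTED_AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended",
                   "OAI-SearchBot", "CCBot", "DuckAssistBot"]


def fetch_robots(site_url):
    """Download robots.txt, asking intermediaries not to serve a cached copy."""
    request = urllib.request.Request(
        site_url.rstrip("/") + "/robots.txt",
        headers={"Cache-Control": "no-cache", "User-Agent": "robots-check/0.1"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def agents_without_group(robots_text, agents):
    """Return the agents that have no 'User-agent:' line of their own."""
    declared = set()
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.lower().startswith("user-agent:"):
            declared.add(line.split(":", 1)[1].strip().lower())
    return [agent for agent in agents if agent.lower() not in declared]


# Offline example with a partially updated file (illustrative content).
served = "User-agent: GPTBot\nAllow: /\n\nUser-agent: *\nDisallow: /wp-admin/\n"
print(agents_without_group(served, EXPECTED_AGENTS))

# Against a live site (replace the placeholder domain with your own):
# print(agents_without_group(fetch_robots("https://yoursite.com"), EXPECTED_AGENTS))
```

An empty list means every expected crawler has its own group in the file that is actually being served.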

Step 5 — Check your security plugin separately

robots.txt is a set of instructions that well-behaved crawlers choose to follow, while security plugins act on the requests themselves: they can reject a crawler's request before robots.txt is ever read. After updating robots.txt, verify your security plugin settings:

  • Wordfence: Go to Firewall > Blocking. Check that no IP ranges associated with OpenAI, Anthropic, or Perplexity are blocked. Also check Rate Limiting settings — aggressive rate limiting can block crawlers that visit frequently.
  • Sucuri: Go to Firewall settings and check the whitelist/blacklist. Add GPTBot and PerplexityBot to the allowed list if there is a bot management section.
  • iThemes Security / Solid Security: Check Bot Protection settings and ensure “Block Bad Bots” is not catching AI crawlers in its definition of bad bots.
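
One partial way to test for this kind of blocking is to request a page while presenting a crawler-like User-Agent and look at the status code. The sketch below is a heuristic only. The User-Agent strings are simplified stand-ins (assumptions), not the exact strings vendors send, and the set of "blocked" status codes is a guess at common firewall responses. Because the request comes from your own IP address, it can reveal User-Agent based rules but not IP based ones.

```python
import urllib.error
import urllib.request

# Simplified stand-in User-Agent strings (assumptions, not the vendors' real ones).
TEST_USER_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
}

# Status codes treated as a sign of firewall or rate-limit blocking (heuristic).
BLOCK_STATUSES = {403, 429, 503}


def looks_blocked(status):
    """Return True when the status code suggests the request was refused."""
    return status in BLOCK_STATUSES


def probe(url, user_agent):
    """Return the HTTP status the server gives to this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code   # 4xx and 5xx responses arrive as exceptions


def report(url):
    """Map each test crawler name to whether its request looked blocked."""
    return {name: looks_blocked(probe(url, ua))
            for name, ua in TEST_USER_AGENTS.items()}


# Example (replace the placeholder domain with your own):
# print(report("https://yoursite.com/"))
```

A blocked result here is a prompt to review the plugin settings listed above, not proof that the real crawler is blocked, since some plugins deliberately reject requests that imitate known bots.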

Pages you may want to exclude from AI indexing

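You can selectively block AI crawlers from specific pages by adding Disallow rules for specific paths to each crawler's block, alongside its Allow: / rule. Standards-compliant crawlers apply the most specific (longest) matching path regardless of order, but listing the Disallow lines before Allow: / is safer for simpler parsers that stop at the first match. Common pages to exclude: checkout pages, confirmation pages, password-protected pages, admin areas, and any pages with duplicate content. Use specific path Disallow rules rather than broad patterns to avoid accidentally blocking content pages.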

Advanced robots.txt configuration for specific use cases

Most sites should allow all major AI crawlers access to all content pages. However, there are specific cases where selective blocking makes sense. If you have a paid members-only content area, you may want to block AI crawlers from that area while allowing them on your public content. Use path-specific Disallow rules for the protected directory rather than blocking the user agent entirely. For example, to block GPTBot from your members, checkout, and account areas while allowing access to the rest of your site:

User-agent: GPTBot
Disallow: /members/
Disallow: /checkout/
Disallow: /account/
Allow: /
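
To see how these rules resolve for individual URLs, here is a small Python sketch using the standard library's urllib.robotparser. The test paths are hypothetical examples. Note that this particular parser applies the first rule that matches, which is one reason the Disallow lines are listed before Allow: /.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Disallow: /members/
Disallow: /checkout/
Disallow: /account/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical paths chosen to exercise each rule.
test_paths = ["/", "/blog/post", "/members/lesson-1", "/checkout/", "/account/settings"]
checks = {path: parser.can_fetch("GPTBot", path) for path in test_paths}

for path, allowed in checks.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

In this run the home page and the blog path are allowed, and the three protected areas are blocked.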

What happens after you add the allow rules

After updating your robots.txt, AI crawlers will not visit immediately; they follow their own crawl schedules. GPTBot usually rechecks robots.txt within a few days to a couple of weeks, and PerplexityBot may visit sooner given its higher crawl frequency. The first visits after the update show whether the rules are being respected. Check your server logs 2 to 4 weeks after updating: if blocking was previously the issue, you should see the first AI crawler entries there.
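
The log check can be scripted. The sketch below assumes the common "combined" access log format, in which the User-Agent is the last quoted field on each line, and the sample lines are made up for illustration. Google-Extended is left out because it is a robots.txt control token rather than a User-Agent that appears in logs.

```python
import re
from collections import Counter

AI_CRAWLER_TOKENS = ["GPTBot", "OAI-SearchBot", "PerplexityBot",
                     "ClaudeBot", "CCBot", "DuckAssistBot"]

# In the combined log format the final quoted field is the User-Agent.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_ai_hits(log_lines):
    """Count requests per AI crawler token found in the User-Agent field."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_FIELD.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break
    return hits


# Made-up log lines for illustration; real User-Agent strings are longer.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2025:10:05:00 +0000] "GET /blog/ HTTP/1.1" 200 9001 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:10:06:00 +0000] "GET / HTTP/1.1" 200 4200 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"',
]

print(dict(count_ai_hits(SAMPLE_LOG)))

# On a real server, pass an open file instead (the path is an assumption):
# with open("/var/log/nginx/access.log") as log_file:
#     print(dict(count_ai_hits(log_file)))
```

A User-Agent string can be spoofed, so treat the counts as an indication of crawler activity rather than confirmation.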

Keeping your robots.txt updated as the AI landscape evolves

New AI systems launch regularly and existing systems add new crawler user agents as they expand their capabilities. Review your robots.txt quarterly as part of your ANI audit. When new AI systems emerge with significant user bases, add their crawler user agents to your allow list proactively rather than waiting until you notice they are not visiting. The quarterly review takes 10 minutes and keeps your site accessible to AI systems as they evolve.

Common mistakes to avoid

A common robots.txt mistake is adding Allow lines for AI crawlers inside an existing User-agent: * block that contains Disallow: /, instead of giving each crawler a block of its own. A block that names a crawler takes precedence over the wildcard block for that crawler, regardless of which comes first in the file, but only if the rules sit under their own User-agent line. Ensure each AI crawler user agent has its own separate block with its own Allow: / line, completely separate from the User-agent: * block.
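
The precedence behaviour can be checked with Python's standard urllib.robotparser. This is an illustration with one parser, not a guarantee about every crawler, and "SomeOtherBot" is a made-up name for a crawler with no block of its own.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_text, agent, path="/"):
    """Parse the given robots.txt text and ask whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, path)


WILDCARD = "User-agent: *\nDisallow: /\n"
SPECIFIC = "User-agent: GPTBot\nAllow: /\n"

specific_first = SPECIFIC + "\n" + WILDCARD
wildcard_first = WILDCARD + "\n" + SPECIFIC

for label, text in [("specific first", specific_first),
                    ("wildcard first", wildcard_first)]:
    print(label,
          "| GPTBot:", can_fetch(text, "GPTBot"),
          "| SomeOtherBot:", can_fetch(text, "SomeOtherBot"))
```

Both orderings give the same answer: GPTBot is allowed by its own block, and the unlisted bot is blocked by the wildcard block.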

Quick implementation checklist

  • Open robots.txt via Rank Math > General Settings > Edit robots.txt
  • Check for any existing Disallow: / rules before adding new ones
  • Add separate user-agent blocks for each of the 7 major AI crawlers
  • Save and verify at yoursite.com/robots.txt in a fresh browser tab
  • Clear LiteSpeed Cache after saving to ensure the updated file is served
  • Check security plugin separately — robots.txt alone may not be sufficient

How this connects to the full ANI system

Correct robots.txt configuration is the first and most critical ANI action. Without it, no other ANI improvements can take effect because AI crawlers will never reach the pages where your improvements are implemented. For the complete ANI implementation guide covering all 24 topics in sequence, see the full ANI guide at teachmeoptimization.com/ani.

Measuring improvement

After implementing the steps in this guide, revisit your server access logs in 2 to 4 weeks to confirm AI crawler visits. Run your site through the free TeachMeOptimization scanner to check your ANI score before and after. Track your AI citation rate monthly using the manual Perplexity and ChatGPT audit process described in the ANI audit guide — citation rate improvement is the ultimate measure of whether your ANI implementation is working.
