Robots.txt Validator

Validate syntax and simulate bot access rules

Robots.txt Input

Test URLs (Optional)

Test Settings

Quick Tips

  • User-agent: Specify which bot the rules target
  • Disallow: Block paths from crawling
  • Allow: Exception to Disallow rules
  • Sitemap: Point to XML sitemap

Warning

Blocking too aggressively can hide important pages from search engines. Always test with URLs before deployment.

Robots.txt Validator Tutorial

What is robots.txt?

Robots.txt is a text file placed at your website's root directory (e.g., example.com/robots.txt) that tells search engine crawlers which pages they can and cannot access. It's part of the Robots Exclusion Protocol (REP).
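
Because the file location is fixed, the robots.txt URL can be derived from any page URL on the site. A minimal Python sketch (example.com is a placeholder):

```python
from urllib.parse import urlsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # robots.txt lives at the root of the host, never in a subdirectory
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

print(robots_txt_url("https://example.com/blog/post-1"))
# https://example.com/robots.txt
```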

How to Use This Tool:

  1. Choose input method:
    • Paste: Manually paste your robots.txt content
    • Fetch: Automatically fetch from a live website
  2. Optionally test specific URLs (single or bulk)
  3. Select which user agents (bots) to simulate
  4. Click "Validate & Simulate" to see results

Robots.txt Syntax:

Directive    Purpose                                Example
User-agent:  Specify which bot the rules apply to   User-agent: * (all bots); User-agent: Googlebot
Disallow:    Block access to specific paths         Disallow: /admin/; Disallow: / (block all)
Allow:       Exception to Disallow rules            Allow: /admin/public/
Sitemap:     Location of the XML sitemap            Sitemap: https://example.com/sitemap.xml
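
As a rough illustration of the syntax above, a line-by-line check can flag anything that isn't a known "Directive: value" pair. This is a simplified sketch, not the tool's actual validator; real parsers also handle wildcards, BOMs, and other edge cases:

```python
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint_robots_txt(text: str) -> list[str]:
    """Return warnings for lines that don't parse as a known 'Directive: value' pair."""
    warnings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # blank lines only separate rule groups; nothing to check
        directive, sep, _value = line.partition(":")
        if not sep:
            warnings.append(f"line {lineno}: missing ':' separator")
        elif directive.strip().lower() not in KNOWN_DIRECTIVES:
            warnings.append(f"line {lineno}: unknown directive '{directive.strip()}'")
    return warnings

print(lint_robots_txt("User-agent: *\nDisalow: /admin/\nSitemap: https://example.com/sitemap.xml"))
# ["line 2: unknown directive 'Disalow'"]
```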

Common Use Cases:

Scenario: Block AI Crawlers
  User-agent: GPTBot
  Disallow: /

  User-agent: Claude-Web
  Disallow: /

Scenario: Protect Admin Areas
  User-agent: *
  Disallow: /admin/
  Disallow: /wp-admin/
  Disallow: /private/

Scenario: Block with Exceptions
  User-agent: *
  Disallow: /admin/
  Allow: /admin/public/

Scenario: Staging Site (Block All)
  User-agent: *
  Disallow: /
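
The "Block with Exceptions" case works because of rule precedence: under the Robots Exclusion Protocol (RFC 9309), the most specific (longest) matching rule wins, and Allow wins a tie. Here is a simplified matcher that ignores "*" wildcards and "$" anchors, shown only to make that precedence concrete:

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules is a list of (directive, path_prefix) pairs for one user agent.
    Longest matching prefix wins; Allow wins ties; no match means allowed."""
    best_len, best_allow = -1, True
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            allow = directive.lower() == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and allow):
                best_len, best_allow = len(prefix), allow
    return best_allow

rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
print(is_allowed("/admin/settings.php", rules))   # False: blocked
print(is_allowed("/admin/public/help", rules))    # True: the Allow exception applies
print(is_allowed("/blog/post", rules))            # True: no rule matches
```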

Common Mistakes:

  • Missing User-agent: Every rule block must start with User-agent:
  • Wrong file location: Must be at root (example.com/robots.txt), not in subdirectories
  • Case sensitivity: Paths are case-sensitive (/Admin/ ≠ /admin/)
  • Blocking CSS/JS: Don't block stylesheet/script files; Google needs them for rendering (see the check sketched after this list)
  • Using Allow alone: Allow: only works as an exception to a Disallow: rule
  • Blocking too much: Test before deployment to avoid hiding important pages
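
One way to catch the "blocking CSS/JS" mistake before deployment is to run representative asset URLs through a parser. A sketch using Python's urllib.robotparser with a pasted rule set; the rules and asset paths are placeholders:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /assets/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Representative asset URLs from the site (placeholders)
assets = [
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
]
for url in assets:
    if not parser.can_fetch("Googlebot", url):
        print(f"warning: {url} is blocked; Google needs CSS/JS to render pages")
```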

How Do Search Engines Use Robots.txt?

When a crawler visits your site:

  1. First request: Fetch example.com/robots.txt
  2. Parse rules for its user-agent (or User-agent: * as fallback)
  3. Before crawling each URL, check if it's blocked by Disallow rules
  4. If allowed, proceed with crawl; if blocked, skip URL
  5. Note that compliance is voluntary: well-behaved crawlers respect robots.txt, but malicious bots may ignore it
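
A well-behaved crawler following these steps might look roughly like this; the site, seed URLs, and bot name are illustrative:

```python
from urllib.robotparser import RobotFileParser
from urllib.request import urlopen

USER_AGENT = "ExampleBot"   # illustrative bot name
SITE = "https://example.com"

# Steps 1-2: fetch and parse robots.txt before crawling anything
robots = RobotFileParser(f"{SITE}/robots.txt")
robots.read()

# Steps 3-4: check every candidate URL against the rules before requesting it
for url in [f"{SITE}/", f"{SITE}/admin/"]:
    if robots.can_fetch(USER_AGENT, url):
        with urlopen(url) as response:  # a real crawler would also send its User-Agent header
            print(url, response.status)
    else:
        print(url, "skipped (blocked by robots.txt)")
```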