Robots.txt Validator
Validate syntax and simulate bot access rules
Robots.txt Validator Tutorial
What is robots.txt?
Robots.txt is a text file placed at your website's root directory (e.g., example.com/robots.txt) that tells search engine crawlers which pages they can and cannot access. It's part of the Robots Exclusion Protocol (REP).
How to Use This Tool:
- Choose an input method:
  - Paste: Manually paste your robots.txt content
  - Fetch: Automatically fetch it from a live website
- Optionally test specific URLs (single or bulk)
- Select which user agents (bots) to simulate
- Click "Validate & Simulate" to see results (a rough equivalent of this check is sketched below)
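Under the hood, both input methods feed the same check: parse the rules, then ask whether a given user agent may fetch a given URL. A minimal sketch of that simulation using Python's standard-library urllib.robotparser (the site, test URLs, and user agents below are placeholders, not part of the tool):

```python
from urllib.robotparser import RobotFileParser

# "Fetch" mode: load the live file from a site (placeholder URL).
rp = RobotFileParser("https://example.com/robots.txt")
rp.read()
# "Paste" mode would instead call rp.parse(pasted_text.splitlines()).

# Bulk-test a few URLs against two simulated user agents.
test_urls = ["https://example.com/", "https://example.com/admin/"]
for agent in ("Googlebot", "Bingbot"):
    for url in test_urls:
        verdict = "allowed" if rp.can_fetch(agent, url) else "blocked"
        print(f"{agent}: {url} -> {verdict}")
```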
Robots.txt Syntax:
| Directive | Purpose | Example |
|---|---|---|
| `User-agent:` | Specify which bot the rules apply to | `User-agent: *` (all bots), `User-agent: Googlebot` |
| `Disallow:` | Block access to specific paths | `Disallow: /admin/`, `Disallow: /` (block all) |
| `Allow:` | Exception to Disallow rules | `Allow: /admin/public/` |
| `Sitemap:` | Location of the XML sitemap | `Sitemap: https://example.com/sitemap.xml` |
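Putting the directives together, here is an illustrative (not prescriptive) robots.txt checked with Python's urllib.robotparser. The Allow rule is listed before the Disallow it carves an exception from, because Python's parser applies the first matching rule, while Google's crawler uses the most specific (longest) matching path.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file combining the four directives from the table above.
robots_txt = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://example.com/admin/"))            # False: blocked
print(rp.can_fetch("*", "https://example.com/admin/public/faq"))  # True: Allow exception
print(rp.site_maps())  # ['https://example.com/sitemap.xml'] (Python 3.8+)
```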
Common Use Cases:
| Scenario | robots.txt Rules |
|---|---|
| Block AI Crawlers | `User-agent: GPTBot`<br>`Disallow: /`<br>`User-agent: Claude-Web`<br>`Disallow: /` |
| Protect Admin Areas | `User-agent: *`<br>`Disallow: /admin/`<br>`Disallow: /wp-admin/`<br>`Disallow: /private/` |
| Block with Exceptions | `User-agent: *`<br>`Disallow: /admin/`<br>`Allow: /admin/public/` |
| Staging Site (Block All) | `User-agent: *`<br>`Disallow: /` |
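As a quick sanity check of the first scenario, the following sketch (placeholder URL, crawler names taken from the table) confirms that the AI crawler groups block those bots while leaving unnamed bots unaffected:

```python
from urllib.robotparser import RobotFileParser

# "Block AI Crawlers" rules copied from the table above.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: Claude-Web
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/article"))     # False: fully blocked
# A bot with no matching group and no "User-agent: *" group is allowed by default.
print(rp.can_fetch("Googlebot", "https://example.com/article"))  # True
```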
Common Mistakes:
- Missing User-agent: Every rule block must start with a `User-agent:` line
- Wrong file location: Must be at the root (example.com/robots.txt), not in a subdirectory
- Case sensitivity: Paths are case-sensitive (`/Admin/` ≠ `/admin/`)
- Blocking CSS/JS: Don't block stylesheet/script files - Google needs them for rendering
- Using Allow alone: `Allow:` only works as an exception to `Disallow:`
- Blocking too much: Test before deployment to avoid hiding important pages
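The case-sensitivity pitfall is easy to verify programmatically. A small sketch with urllib.robotparser (the paths are made-up examples):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("User-agent: *\nDisallow: /admin/".splitlines())

# Paths are case-sensitive: /Admin/ is NOT covered by "Disallow: /admin/".
print(rp.can_fetch("*", "https://example.com/admin/settings"))  # False: blocked
print(rp.can_fetch("*", "https://example.com/Admin/settings"))  # True: not blocked
```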
How Do Search Engines Use Robots.txt?
When a crawler visits your site:
- First request: fetch example.com/robots.txt
- Parse the rules for its own user-agent (or the `User-agent: *` group as a fallback)
- Before crawling each URL, check whether it is blocked by a Disallow rule
- If allowed, proceed with the crawl; if blocked, skip the URL (this flow is sketched below)
- Respecting robots.txt is voluntary - malicious bots may ignore it
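That fetch-parse-check loop can be approximated in a few lines of Python. This is only a sketch of the flow described above; a real crawler would also handle fetch errors, caching, and crawl delays (the site, paths, and user agent are placeholders):

```python
from urllib.robotparser import RobotFileParser

def polite_crawl(site: str, paths: list[str], user_agent: str = "MyCrawler") -> None:
    """Sketch of the crawler flow: fetch robots.txt once, then check every URL."""
    rp = RobotFileParser(f"{site}/robots.txt")
    rp.read()  # steps 1-2: fetch and parse the rules

    for path in paths:
        url = f"{site}{path}"
        if rp.can_fetch(user_agent, url):  # step 3: check before each request
            print("crawl:", url)           # step 4: proceed if allowed
        else:
            print("skip: ", url)           # ...otherwise skip the URL

polite_crawl("https://example.com", ["/", "/admin/", "/blog/post-1"])
```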