Robots.txt Validator
Validate syntax and simulate bot access rules
Robots.txt Validator Tutorial
What is robots.txt?
Robots.txt is a text file placed at your website's root directory (e.g., example.com/robots.txt) that tells search engine crawlers which pages they can and cannot access. It's part of the Robots Exclusion Protocol (REP).
How to Use This Tool:
- Choose an input method:
  - Paste: Manually paste your robots.txt content
  - Fetch: Automatically fetch it from a live website
- Optionally test specific URLs (single or bulk)
- Select which user agents (bots) to simulate
- Click "Validate & Simulate" to see results (a rough equivalent of this check is sketched below)
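Under the hood, both input methods feed the same check: parse the rules, then ask whether a given user agent may fetch a given URL. A minimal sketch of that simulation using Python's standard-library urllib.robotparser (the site, test URLs, and user agents below are placeholders, not part of the tool):

```python
from urllib.robotparser import RobotFileParser

# "Fetch" mode: load the live file from a site (placeholder URL).
rp = RobotFileParser("https://example.com/robots.txt")
rp.read()
# "Paste" mode would instead call rp.parse(pasted_text.splitlines()).

# Bulk-test a few URLs against two simulated user agents.
test_urls = ["https://example.com/", "https://example.com/admin/"]
for agent in ("Googlebot", "Bingbot"):
    for url in test_urls:
        verdict = "allowed" if rp.can_fetch(agent, url) else "blocked"
        print(f"{agent}: {url} -> {verdict}")
```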
Robots.txt Syntax:
| Directive | Purpose | Example |
|---|---|---|
| `User-agent:` | Specify which bot the rules apply to | `User-agent: *` (all bots), `User-agent: Googlebot` |
| `Disallow:` | Block access to specific paths | `Disallow: /admin/`, `Disallow: /` (block all) |
| `Allow:` | Exception to Disallow rules | `Allow: /admin/public/` |
| `Sitemap:` | Location of the XML sitemap | `Sitemap: https://example.com/sitemap.xml` |
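Putting the directives together, here is an illustrative (not prescriptive) robots.txt checked with Python's urllib.robotparser. The Allow rule is listed before the Disallow it carves an exception from, because Python's parser applies the first matching rule, while Google's crawler uses the most specific (longest) matching path.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file combining the four directives from the table above.
robots_txt = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://example.com/admin/"))            # False: blocked
print(rp.can_fetch("*", "https://example.com/admin/public/faq"))  # True: Allow exception
print(rp.site_maps())  # ['https://example.com/sitemap.xml'] (Python 3.8+)
```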
Common Use Cases:
| Scenario | robots.txt Rules |
|---|---|
| Block AI Crawlers | `User-agent: GPTBot`<br>`Disallow: /`<br>`User-agent: Claude-Web`<br>`Disallow: /` |
| Protect Admin Areas | `User-agent: *`<br>`Disallow: /admin/`<br>`Disallow: /wp-admin/`<br>`Disallow: /private/` |
| Block with Exceptions | `User-agent: *`<br>`Disallow: /admin/`<br>`Allow: /admin/public/` |
| Staging Site (Block All) | `User-agent: *`<br>`Disallow: /` |
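As a quick sanity check of the first scenario, the following sketch (placeholder URL, crawler names taken from the table) confirms that the AI crawler groups block those bots while leaving unnamed bots unaffected:

```python
from urllib.robotparser import RobotFileParser

# "Block AI Crawlers" rules copied from the table above.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: Claude-Web
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/article"))     # False: fully blocked
# A bot with no matching group and no "User-agent: *" group is allowed by default.
print(rp.can_fetch("Googlebot", "https://example.com/article"))  # True
```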
Common Mistakes:
- Missing User-agent: Every rule block must start with a `User-agent:` line
- Wrong file location: Must be at the root (example.com/robots.txt), not in a subdirectory
- Case sensitivity: Paths are case-sensitive (`/Admin/` ≠ `/admin/`)
- Blocking CSS/JS: Don't block stylesheet/script files - Google needs them for rendering
- Using Allow alone: `Allow:` only works as an exception to `Disallow:`
- Blocking too much: Test before deployment to avoid hiding important pages
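The case-sensitivity pitfall is easy to verify programmatically. A small sketch with urllib.robotparser (the paths are made-up examples):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("User-agent: *\nDisallow: /admin/".splitlines())

# Paths are case-sensitive: /Admin/ is NOT covered by "Disallow: /admin/".
print(rp.can_fetch("*", "https://example.com/admin/settings"))  # False: blocked
print(rp.can_fetch("*", "https://example.com/Admin/settings"))  # True: not blocked
```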
How Do Search Engines Use Robots.txt?
When a crawler visits your site:
- First request: fetch example.com/robots.txt
- Parse the rules for its own user-agent (or the `User-agent: *` group as a fallback)
- Before crawling each URL, check whether it is blocked by a Disallow rule
- If allowed, proceed with the crawl; if blocked, skip the URL (this flow is sketched below)
- Respecting robots.txt is voluntary - malicious bots may ignore it
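That fetch-parse-check loop can be approximated in a few lines of Python. This is only a sketch of the flow described above; a real crawler would also handle fetch errors, caching, and crawl delays (the site, paths, and user agent are placeholders):

```python
from urllib.robotparser import RobotFileParser

def polite_crawl(site: str, paths: list[str], user_agent: str = "MyCrawler") -> None:
    """Sketch of the crawler flow: fetch robots.txt once, then check every URL."""
    rp = RobotFileParser(f"{site}/robots.txt")
    rp.read()  # steps 1-2: fetch and parse the rules

    for path in paths:
        url = f"{site}{path}"
        if rp.can_fetch(user_agent, url):  # step 3: check before each request
            print("crawl:", url)           # step 4: proceed if allowed
        else:
            print("skip: ", url)           # ...otherwise skip the URL

polite_crawl("https://example.com", ["/", "/admin/", "/blog/post-1"])
```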