What is robots.txt?
robots.txt is a plain text file placed at the root of a website that tells web crawlers (such as Googlebot) which pages or sections they may or may not crawl.

Basic Syntax
```plaintext
User-agent: *
Allow: /
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
```

Key Directives
| Directive | Purpose |
|-----------|---------|
| User-agent | Specifies which crawler the rules apply to |
| Allow | Permits crawling of specified paths |
| Disallow | Blocks crawling of specified paths |
| Sitemap | Points to your XML sitemap |
| Crawl-delay | Suggests a delay between requests (not universally supported; Googlebot ignores it) |
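These directives can be checked programmatically. Below is a minimal sketch using Python's standard-library urllib.robotparser to fetch a site's robots.txt and query its rules; the example.com URL and user-agent strings are placeholders, and the results depend on what the live file actually contains.

```python
from urllib.robotparser import RobotFileParser

# Point the parser at the site's robots.txt (example.com is a placeholder).
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the live file

# Ask whether a given crawler may fetch a given URL.
print(rp.can_fetch("Googlebot", "https://example.com/private/page.html"))
print(rp.can_fetch("*", "https://example.com/index.html"))

# Crawl-delay and Sitemap entries, where present (site_maps() needs Python 3.8+).
print(rp.crawl_delay("*"))  # None if no Crawl-delay directive applies
print(rp.site_maps())       # list of Sitemap URLs, or None
```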
Important Notes
• robots.txt is a suggestion, not enforcement: well-behaved crawlers follow it, but malicious ones can ignore it
• It does not prevent indexing: use a noindex meta tag for that
• Keep it simple: overly complex rules can accidentally block important pages
• Test it: use Google Search Console's robots.txt tester, or validate a draft locally as in the sketch below
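A draft robots.txt can also be validated locally before it goes live. The sketch below feeds a candidate file to urllib.robotparser and asserts the intended behavior; the rules and paths are illustrative. Note that Python's parser applies rules in file order (first match wins) rather than Google's longest-match precedence, which is why the Disallow line comes before the blanket Allow here.

```python
from urllib.robotparser import RobotFileParser

# A candidate robots.txt to validate before deploying (contents are illustrative).
# urllib.robotparser uses first-match-wins ordering, so the more specific
# Disallow rule is listed before the blanket Allow.
draft = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(draft.splitlines())  # parse() accepts the file as a list of lines

# Verify the rules behave as intended before publishing the file.
assert rp.can_fetch("*", "/index.html")
assert not rp.can_fetch("*", "/private/secret.html")
print("Draft rules behave as expected.")
```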