What is Robots.txt?
**Robots.txt** is a plain text file placed at the root of your website (e.g., yourdomain.com/robots.txt) that provides instructions to web crawlers about which pages or sections they should or shouldn't access.

Basic Syntax
```plaintext
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Sitemap: https://yourdomain.com/sitemap.xml
```

Key Directives
| Directive | Purpose | Example |
|-----------|---------|---------|
| User-agent | Which crawler the rules apply to | `User-agent: Googlebot` |
| Allow | Pages the crawler CAN access | `Allow: /public/` |
| Disallow | Pages the crawler should NOT access | `Disallow: /admin/` |
| Sitemap | Location of your sitemap | `Sitemap: https://...` |
| Crawl-delay | Time between requests (ignored by Google) | `Crawl-delay: 10` |

SEO Impact
1. **Crawl Budget**: Prevents wasting crawl budget on unimportant pages
2. **Duplicate Content**: Block parameter URLs, print versions, etc.
3. **Staging/Dev**: Prevent indexing of test environments
4. **Private Content**: Keep admin areas out of search results
Common Mistakes
• ❌ Using robots.txt to hide sensitive content (it's publicly readable!)
• ❌ Accidentally blocking important pages
• ❌ Forgetting to update after site restructure
• ❌ Blocking CSS/JS files needed for rendering
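The last two mistakes can be caught with a small pre-deploy check. A minimal sketch, assuming a hypothetical rules file and URL list (both invented for illustration), again using `urllib.robotparser`:

```python
from urllib import robotparser

# Hypothetical robots.txt that accidentally blocks an assets directory.
rules = """
User-agent: *
Disallow: /assets/
Disallow: /admin/
"""

# Hypothetical list of URLs that must stay crawlable.
important = [
    "https://yourdomain.com/products/",
    "https://yourdomain.com/assets/site.css",  # CSS needed for rendering
]

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Flag any important URL a Googlebot-like crawler could not fetch.
for url in important:
    if not parser.can_fetch("Googlebot", url):
        print(f"BLOCKED: {url}")
```

Running a check like this after every site restructure makes an accidental `Disallow` on a key page a build failure instead of a traffic drop.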