What is Web Crawling?
Web crawling (also called spidering) is the automated process of systematically browsing websites to discover, fetch, and index their content. Crawlers follow links from page to page, building a map of a site's content.

How Web Crawlers Work
1. Start with seed URLs — Begin with a list of starting pages
2. Fetch content — Download the HTML of each page
3. Extract links — Find all links on the page
4. Follow links — Add discovered URLs to the crawl queue
5. Store content — Save the page content for processing
6. Respect rules — Follow robots.txt and rate limits (both appear in the sketch after this list)
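Put together, these steps form a loop: pull a URL off the queue, fetch it, harvest its links, and repeat. The Python sketch below shows one minimal way to implement that loop. It assumes the third-party requests and beautifulsoup4 packages; the crawl function, its max_pages and delay parameters, and the same-domain restriction are illustrative choices, not how any particular crawler is built.

```python
import time
from collections import deque
from urllib import robotparser
from urllib.parse import urljoin, urlparse

import requests                  # assumed third-party dependency
from bs4 import BeautifulSoup    # assumed third-party dependency (beautifulsoup4)

def crawl(seed_url, max_pages=50, delay=1.0):
    """Breadth-first crawl of a single site, starting from seed_url."""
    domain = urlparse(seed_url).netloc

    # Step 6: load robots.txt once, then consult it before every fetch
    robots = robotparser.RobotFileParser()
    robots.set_url(urljoin(seed_url, "/robots.txt"))
    robots.read()

    queue = deque([seed_url])    # Step 1: the crawl queue, seeded
    seen = {seed_url}
    pages = {}                   # Step 5: fetched content, keyed by URL

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch("*", url):
            continue

        try:
            resp = requests.get(url, timeout=10)   # Step 2: fetch the HTML
        except requests.RequestException:
            continue
        if resp.status_code != 200:
            continue
        pages[url] = resp.text

        # Step 3: extract every link on the page
        soup = BeautifulSoup(resp.text, "html.parser")
        for anchor in soup.find_all("a", href=True):
            # Drop fragments so #section links don't create duplicates
            link = urljoin(url, anchor["href"]).split("#")[0]
            # Step 4: queue unseen links, staying on the same domain
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)

        time.sleep(delay)        # Step 6: a simple fixed rate limit

    return pages
```

Production crawlers add more on top of this loop, such as persistent queues, URL normalization, retry and error handling, and per-site politeness settings, but the core cycle of fetch, extract, and follow is the same.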
Types of Crawlers
• Search engine crawlers — Googlebot, Bingbot (index the web)
• Content crawlers — Extract specific data from websites
• AI training crawlers — Gather content for AI knowledge bases
Web Crawling in SiteSupport
SiteSupport uses web crawling to:
• Automatically discover all pages on your website
• Extract and process content from each page (see the sketch after this list)
• Build a knowledge base for your AI chatbot
• Re-crawl periodically to keep content fresh
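As an illustration of the extraction step, a pipeline like this might reduce each fetched page to plain text before indexing it into a knowledge base. This is a hypothetical sketch, not SiteSupport's actual implementation; the extract_text helper and the list of tags to discard are illustrative, and beautifulsoup4 is an assumed dependency.

```python
from bs4 import BeautifulSoup    # assumed third-party dependency (beautifulsoup4)

def extract_text(html):
    """Strip markup and boilerplate tags, returning the page's readable text."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove elements that rarely contain useful knowledge-base content
    for tag in soup(["script", "style", "nav", "header", "footer"]):
        tag.decompose()

    # Collapse what remains into clean, newline-separated lines of text
    return soup.get_text(separator="\n", strip=True)
```

The cleaned text can then be chunked and indexed so the chatbot can retrieve it when answering questions, and re-running the crawl simply repeats this extraction over any pages that have changed.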