What is Crawl Budget? And Why Does It Matter for SEO?
What is the Google Crawl Budget?
Google gives each website a crawl budget (a kind of credit): the maximum number of pages the Googlebot will crawl there. This budget is not the same for all websites but depends on their PageRank. The higher the PageRank, the more pages the crawler covers. Since Google's resources are not infinite either, it has an interest in using its server farms, and the electricity they consume, as efficiently as possible.
Why is the crawl budget important?
Optimizing the internal link structure is important to ensure that the Google crawler finds and indexes as many pages as possible. Especially on large portals or online shops, a poor structure can cause the bot to get stuck in unimportant subareas, so that important articles or high-quality content are never recorded because the crawler stops once its budget is exhausted. Small and medium-sized websites with a few thousand URLs generally don't have to worry about this.
How do I see the indexing status?
In Google Search Console, you can see the indexing status under the “Crawling Statistics” menu item. All Googlebot activity of the last 90 days is listed there and displayed graphically. A declining chart may indicate a problem crawling your site.
Difference between Crawl Rate and Crawl Demand
- Crawl Rate refers to the number of simultaneous connections Googlebot may use to crawl a website, as well as the time between individual fetches. A site's loading time is therefore important for crawling.
- Crawl Demand depends on your Domain Authority, i.e. how important your website is to Google and how often the search engine wants to crawl it.
Which factors influence the crawl budget?
- Faceted navigation and session IDs: When filtering via navigation, e.g. in online shops (by color, size, or gender), different URLs are created for the same content, and these must be handled unambiguously for SEO. Session IDs likewise lead to duplicates or endless URLs for the same page and eat into the Google crawl budget.
- Duplicate content: Beyond the two points just mentioned, avoiding duplicate content in general is an important factor for the crawl budget.
- Soft error pages: Pages that do not display the desired content but return a 200 status code instead of a proper 404 error.
- Hacked sites: Websites that have been compromised and may therefore also harm users.
- Infinite spaces: Faulty plugin behavior, e.g. as in the event calendar of the Deutsches Museum, where the scroll function can generate an infinite number of URLs.
- Thin or spam content: Low-quality or spam content on a website leads to reduced crawling, up to and including its complete cessation.
- Numerous redirects: Chains of 301 or 302 redirects force Googlebot through several "hops", each of which consumes budget.
- Missing redirects: A document should ideally be accessible under only one URL. Use redirects, canonical tags, sensible internal links, or pagination to help search engines understand your website better (see the sketch after this list). A website can theoretically be reachable under several URLs:
http://www.yoursite.com
https://www.yoursite.com
http://yoursite.com
https://yoursite.com
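As a minimal sketch of how the four variants above can be consolidated, assuming an Apache server with mod_rewrite enabled and https://www.yoursite.com as the preferred variant (both are assumptions; adapt them to your setup):

```apache
# .htaccess: send every HTTP and non-www request to
# https://www.yoursite.com in a single 301 hop
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.yoursite.com/$1 [R=301,L]
```

For duplicates that cannot be redirected, such as faceted-navigation or session-ID URLs, a canonical tag in the page head points the crawler to the preferred URL (/page is a placeholder path):

```html
<!-- placed in the <head> of every duplicate variant -->
<link rel="canonical" href="https://www.yoursite.com/page">
```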
SEO measures to optimize the crawl budget
- Implement a flat page structure that allows the user to reach their goal in just a few clicks.
- Optimize your website's internal linking and link more prominently to pages or products that should be crawled more frequently.
- Exclude unimportant pages from crawling using robots.txt (see the example after this list).
- Make sensible use of the noindex and nofollow meta tags.
- Provide an XML sitemap and submit it to Google Search Console (a minimal example follows below).
- Improve website loading time.
- Avoid server errors and set up redirects sensibly.
- Build backlinks in a topic-relevant environment.
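As a rough sketch of the robots.txt and noindex points above, assuming internal search pages live under /search and session IDs appear as a sessionid parameter (both placeholders), an exclusion could look like this:

```
# robots.txt: keep crawlers out of areas that only burn budget
User-agent: *
Disallow: /search         # internal search results (placeholder path)
Disallow: /*?sessionid=   # session-ID duplicates (placeholder parameter)

# advertise the sitemap to crawlers
Sitemap: https://www.yoursite.com/sitemap.xml
```

For pages that may be crawled but should stay out of the index, a robots meta tag does the job:

```html
<!-- page can be crawled and its links followed, but is not indexed -->
<meta name="robots" content="noindex, follow">
```

Note that the two mechanisms are mutually exclusive per page: a URL blocked in robots.txt is never fetched, so a noindex tag on it will never be seen.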
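For the sitemap point, a minimal XML sitemap might look like the following (the URLs and date are placeholders); submit it in Google Search Console or reference it in robots.txt as shown above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- list the canonical URL of every page that should be crawled -->
  <url>
    <loc>https://www.yoursite.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.yoursite.com/important-page</loc>
  </url>
</urlset>
```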
Conclusion
The Google crawl budget is an important factor in search engine optimization (SEO). Google needs to understand your website and be able to index it in a meaningful way. If your valuable content lies dormant in corners the bot can barely reach, it will not be able to build up good rankings either.