What is a crawler? Everything You Need to Know
What is a crawler?
A crawler, also known as a robot, bot, spider, search bot, or web crawler, is a program that independently searches the World Wide Web and reads and indexes content and information. The name derives from the search engine "WebCrawler", which launched in 1994 as the first public search engine with full-text search.
How does a crawler work?
Starting from a hyperlink on a website, the crawler scours the Internet and thus moves from website to website; the data it collects along the way is stored in a database. Algorithms determine how often a website is crawled: the better known the site, the more frequently it is visited.
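The core mechanism described above, reading a page and collecting its hyperlinks so they can be visited next, can be sketched in a few lines of Python. This is a minimal illustration, not a production crawler: it parses a hard-coded sample page instead of downloading one over HTTP, and the URLs in it are placeholders.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href targets of all <a> tags: the hyperlinks a crawler would follow next."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A sample page; a real crawler would fetch this over HTTP and then
# repeat the process for every link it discovers.
html = '<p><a href="/about">About</a> and <a href="https://example.com/news">News</a></p>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # the URLs queued for the next crawl step
```

A real crawler would add the extracted links to a queue, deduplicate already-visited URLs, and store the page content in a database, as described above.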
What information a crawler records depends on its task:
- Price comparison portals search for products, their availability, and prices
- In data mining, crawlers are used to collect addresses
- News aggregators crawl articles from news portals
- Plagiarism checkers search for copyrighted material on the net
Google uses various bots for these tasks, for example for AdSense, mobile sites, image search, and news.
How can crawlers be blocked or controlled?
You can prevent crawling using a robots.txt file. Example:

User-agent: Googlebot
Disallow: /

In this example, Googlebot is not allowed to visit any page of the site.

User-agent: Googlebot
Disallow: /reports

This example disallows Googlebot from crawling the /reports directory. Note that robots.txt controls crawling, not indexing: a blocked URL can still appear in the index if other pages link to it.
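How a well-behaved crawler interprets such rules can be checked with Python's standard library. The sketch below feeds the second example's rules directly into `urllib.robotparser` (its `parse()` method accepts the file's lines, so no network request is needed); the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Rules matching the /reports example above.
rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow: /reports",
])

# A path under /reports is blocked for Googlebot; other paths are allowed.
print(rp.can_fetch("Googlebot", "https://example.com/reports/q1.html"))
print(rp.can_fetch("Googlebot", "https://example.com/index.html"))
```

This is the same check a polite crawler performs before every request: fetch the site's robots.txt once, then ask `can_fetch()` for each URL in its queue.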
With the help of the robots meta tag values "nofollow" and "noindex" it is also possible to tell the crawler which pages it should not follow or index. You can use the canonical tag to point the crawler to the original version of a page, or describe the site structure using a sitemap.xml.
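As a sketch, both the robots meta tag and the canonical link belong in a page's `<head>`; the URL here is a placeholder:

```html
<head>
  <!-- Tell crawlers not to index this page and not to follow its links -->
  <meta name="robots" content="noindex, nofollow">
  <!-- Point crawlers to the original version of this page (example URL) -->
  <link rel="canonical" href="https://example.com/original-page">
</head>
```

Unlike robots.txt, the noindex directive only works if the crawler is allowed to fetch the page and see the tag.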
Crawlers and search engine optimization
In SEO, it pays to deliberately steer crawlers on your own website. Every website has a crawl budget, and it should be spent as effectively as possible through targeted control, locking crawlers out of unimportant areas so the important pages get visited. Pay attention to fast loading times, small file sizes, and a lean website architecture.