Written by: Ryan Walsh

11th November 2023

 

A Robots.txt file is very simple to create, yet it's important that every website has one, whether it's an e-commerce website or a brochure website.

The reason is that there will be pages, such as blog posts, the about us page and the homepage, that you will want Googlebot and other search engines to crawl.

Yet there's no point in Googlebot crawling, say, the WordPress login page every day; it's only used to log in, it's not for shoppers.

 

So, what exactly is a Robots.txt file?

You can create a Robots.txt file in something as simple as Microsoft's Notepad, and it simply sets out a set of instructions, for say Googlebot, on which pages to crawl and index and which ones not to.
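As a minimal sketch, a Robots.txt file for a WordPress site might look like the example below (the wp-login.php path is the standard WordPress login page mentioned above; swap in whichever pages apply to your own site):

  # Apply these rules to every crawler
  User-agent: *
  # Don't crawl the WordPress login page, it's not for shoppers
  Disallow: /wp-login.php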

 

Now you might be thinking: well, what's the point in that?

 

Well, here's the thing: each website has an allocated “crawl budget”. If your website is super important, let's say the BBC, it's likely to have a high crawl budget and get crawled and indexed every single day, perhaps even multiple times a day.

Now, let's say there's a website that doesn't get updated that much; say the last blog post went live back in 2017.

Well, Googlebot is not going to crawl that website so often, as it doesn't change that often; there's no new work being added. So, this simply means it has less of a crawl budget.

So a Robots.txt file helps to save time, that's the search engine's time, by sparing it from repeatedly crawling unimportant pages.

 

You can set different commands for different user agents

 

Different search engines use different user agents, as the example after this list shows:

  • Google uses Googlebot
  • Bing uses Bingbot
  • And Baidu, mainly used in China, uses Baiduspider
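
So, as a sketch, a Robots.txt file that gives different commands to different user agents might look like this (the /uk-only-offers/ folder is purely hypothetical, just to show the idea):

  # Commands just for Google's crawler
  User-agent: Googlebot
  Disallow: /wp-login.php

  # Commands just for Baidu's crawler
  User-agent: Baiduspider
  Disallow: /uk-only-offers/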

 

Disallow

 

Disallow can be used to tell the crawler not to crawl a page or blog post, yet you do need to keep checking whether you want this command left in place over the long term. For example, you might put it in place because a product is out of stock, so you don't want the page to appear in Google.

However, you do have to remember to delete that rule if you wish to restock and resell the product, as otherwise that page will never appear in Google's results.
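As a sketch of that scenario (the product URL here is purely hypothetical), this is the rule you would add while the product is out of stock, and remove again once it's back on sale:

  User-agent: *
  # Product out of stock, delete this line when it's restocked
  Disallow: /shop/blue-widget/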

It's most definitely worth having a sitemap, as it gives the crawler a list of URLs to crawl and points it towards specific pages.
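You can point crawlers at your sitemap straight from the Robots.txt file using the Sitemap directive, for example (with www.example.com standing in for your own domain):

  Sitemap: https://www.example.com/sitemap.xml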

 

Make sure you keep your Robots.txt file up to date

 

It's important that as you add new blog posts and pages, you go back to the Robots.txt file and keep it up to date. So, for example, when you add a new main page, say a new service that you're offering, do make sure you update the file.

This takes seconds for you (or your web developer) to do, yet it makes sure that Googlebot and the other crawlers have an up-to-date list of URLs on your website, so they can crawl and index them and spot any changes that have been made to those pages or blog posts.