Everything about ROBOTS.TXT and its syntax

Basic syntax

User-agent: the robot to which the following rules apply (for example, "Googlebot")

Disallow: the pages you want to block access to (you can list a large number of such directives, each on a new line)

Each User-agent / Disallow group must be separated from the next by a blank line, but there should be no blank lines inside a group (between the User-agent line and its last Disallow directive).
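
For example, two separate groups for two different robots, divided by a blank line (the robots and paths here are purely illustrative):

User-agent: Googlebot
Disallow: /tmp/

User-agent: Yandex
Disallow: /tmp/
Disallow: /logs/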

The hash symbol (#) is used for comments in robots.txt: everything after # on the current line is ignored. A comment can occupy an entire line or be added at the end of a line after a directive.
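
For example (an illustrative snippet):

# block the service folder for all robots
User-agent: *
Disallow: /cgi-bin/ # a comment after a directive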

Directory and file names are case-sensitive: "catalog", "Catalog" and "CATALOG" are all different paths as far as search engines are concerned.
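
For example, the following rule blocks /Catalog/ but does not affect /catalog/ (the directory name is illustrative):

User-agent: *
Disallow: /Catalog/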

Host: indicates to Yandex the main mirror of the site. Therefore, if you are merging two sites with a page-by-page 301 redirect, the robots.txt file on the duplicate site should NOT be redirected, so that Yandex can still see this directive on the site that is being merged.
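
For example, to tell Yandex that www.example.com is the main mirror (the domain is a placeholder):

User-agent: Yandex
Host: www.example.com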

Crawl-delay: lets you limit the speed at which your site is crawled; on a heavily loaded site, the extra load created by various search robots can cause additional problems for the server.
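
For example, asking robots to wait at least 5 seconds between requests (the value is purely illustrative):

User-agent: *
Crawl-delay: 5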

Regular expressions: for more flexible configuration of your directives, you can use two special characters:

* (asterisk) – means any sequence of characters

$ (dollar sign) – means end of line
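
For example, this rule blocks every URL that contains a query string ("?") for all robots (a purely illustrative pattern):

User-agent: *
Disallow: /*?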

Basic examples of robots.txt usage

Ban on indexing the entire site

User-agent: *

Disallow: /

It is important to use this instruction when you are developing a new site and opening access to it, for example, through a subdomain.

Developers very often forget to block the site from indexing in this way and immediately end up with a full copy of the site in the search engines' index. If this has already happened, set up page-by-page 301 redirects to your main domain.

And this construction ALLOWS the entire site to be indexed:

User-agent: *

Disallow:

Prevent indexing of a specific folder

User-agent: Googlebot

Disallow: /no-index/

Block a specific robot from visiting a specific page

User-agent: Googlebot

Disallow: /no-index/this-page.html

Prevent indexing of certain file types

User-agent: *

Disallow: /*.pdf$

Allow a specific crawler to visit a specific page

User-agent: *

Disallow: /no-bots/block-all-bots-except-Yandex-page.html

User-agent: Yandex

Allow: /no-bots/block-all-bots-except-Yandex-page.html

Link to Sitemap

User-agent: *

Disallow:

Sitemap: http://www.example.com/none-standard-location/sitemap.xml

Nuances of using this directive: if unique content is constantly being added to your site, then

  • it is better NOT to add a link to your sitemap in robots.txt,

  • give the sitemap itself a NON-STANDARD name instead of sitemap.xml (for example, my-new-sitemap.xml) and submit that link through the search engines' webmaster tools,

since many unscrupulous webmasters scrape content from other sites and use it for their own projects.

Template for WordPress

Allow: /wp-content/themes/*.js 

Allow: /wp-content/themes/*.css 

Allow: /wp-includes/js/*.css

Allow: /wp-includes/js/*.js

Allow: /wp-content/plugins/*.js 

Allow: /wp-content/plugins/*.css 
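
This template contains only Allow directives (the same applies to the templates below); in a working robots.txt they belong inside a User-agent group whose Disallow rules would otherwise block the CSS and JS files. A minimal sketch for WordPress, assuming commonly blocked directories (the Disallow paths here are an assumption, not part of the original template):

User-agent: *
Disallow: /wp-includes/        # assumed blocked directory
Disallow: /wp-content/plugins/ # assumed blocked directory
Allow: /wp-includes/js/*.js
Allow: /wp-includes/js/*.css
Allow: /wp-content/plugins/*.js
Allow: /wp-content/plugins/*.css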

Template for Joomla

Allow: /templates/*.css 

Allow: /templates/*.js 

Allow: /templates/*.png 

Allow: /templates/*.gif 

Allow: /templates/*.ttf 

Allow: /templates/*.svg 

Allow: /templates/*.woff 

Allow: /components/*.css 

Allow: /components/*.js 

Allow: /media/*.js

Allow: /media/*.css

Allow: /plugins/*.css

Allow: /plugins/*.js

Template for Bitrix

Allow: /bitrix/templates/*.js 

Allow: /bitrix/templates/*.png 

Allow: /bitrix/templates/*.jpg 

Allow: /bitrix/templates/*.gif 

Allow: /bitrix/cache/css/*.css 

Allow: /bitrix/cache/js/s1/*.js 

Allow: /upload/iblock/*.jpg 

Allow: /upload/iblock/*.png 

Allow: /upload/iblock/*.gif 

Template for DLE

Allow: /engine/classes/*.css 

Allow: /engine/classes/*.js 

Allow: /templates/ 

Having dealt with the simple syntax of commands for robots, it is also important to take into account the values of the meta robots tag.

This meta tag can be assigned four values.

The content attribute can contain the following values:

index, noindex, follow, nofollow

If there are multiple values, they are separated by commas.

Currently, only the following values are important:

The INDEX directive tells the robot that this page can be indexed.

The FOLLOW directive tells the robot that it may follow the links on this page. Some authors claim that, when these values are absent, search engines by default behave as if the INDEX and FOLLOW directives were set.

So, the global directives look like this:

Index everything = INDEX, FOLLOW

Index nothing = NOINDEX, NOFOLLOW

Examples of the meta robots tag:

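In HTML, the tag is placed in the <head> section of the page, for example:

<meta name="robots" content="index, follow">
<meta name="robots" content="noindex, nofollow">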



