User-Agent: the robot to which the following rules apply (for example, "Googlebot")
Disallow: the pages you want to block access to (you can list many such directives, one per line)
Each User-Agent / Disallow group must be separated by a blank line, but there must be no blank lines within a group (between the User-Agent line and the last Disallow directive).
The hash symbol (#) marks comments in robots.txt: on the current line, anything after the # is ignored. A comment can occupy an entire line or follow a directive at the end of a line.
Directory and file names are case sensitive: "catalog", "Catalog" and "CATALOG" are all different directories to search engines.
Host: indicates to Yandex the main mirror of the site. So if you want to merge two sites with a page-by-page 301 redirect, the robots.txt file on the duplicate site must NOT be redirected, so that Yandex can still see this directive on the site being merged.
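A minimal sketch of a robots.txt with the Host directive (the domain example.com is a placeholder; note that Host is a Yandex-specific directive):

```
User-agent: Yandex
Disallow:
Host: example.com
```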
Crawl-delay: lets you limit the crawl speed of your site; if your site is under heavy load, requests from the various search robots can cause additional problems.
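For example, a sketch of this directive (the value of 5 seconds is an arbitrary illustration; not all search engines honor Crawl-delay):

```
User-agent: *
Crawl-delay: 5   # ask robots to wait 5 seconds between requests
```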
Regular expressions: for more flexible configuration of your directives, you can use two characters:
* (asterisk) – means any sequence of characters
$ (dollar sign) – means end of line
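The way crawlers interpret these two characters can be sketched in Python (rule_matches is a hypothetical helper written for illustration; real crawlers have their own implementations):

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Check whether a robots.txt rule pattern matches a URL path.

    '*' matches any sequence of characters; a trailing '$' anchors
    the pattern to the end of the path. Without '$', the rule is a
    prefix match, as in standard robots.txt semantics.
    """
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"   # '$' means end of line
    return re.match(pattern, path) is not None

# A rule like 'Disallow: /*.pdf$' would block any URL ending in .pdf:
print(rule_matches("/*.pdf$", "/files/report.pdf"))       # True
print(rule_matches("/*.pdf$", "/files/report.pdf?x=1"))   # False
```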
It is important to use this instruction when you are developing a new site and provide access to it, for example, through a subdomain.
Developers very often forget to close the site from indexing this way and immediately get a full copy of the site in the search engines' index. If that does happen, you need to set up a page-by-page 301 redirect to your main domain.
And this construction allows the entire site to be indexed:
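A sketch of both constructions in standard robots.txt syntax: the first blocks the entire site from all robots, the second (an empty Disallow) allows everything:

```
# Block the entire site from indexing
User-agent: *
Disallow: /

# Allow the entire site to be indexed
User-agent: *
Disallow:
```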
Nuances of using this directive: if unique content is constantly being added to your site, then:
it is better NOT to add a link to your sitemap in robots.txt;
give the sitemap itself a NON-STANDARD name instead of sitemap.xml (for example, my-new-sitemap.xml) and then submit this link through the search engines' webmaster tools;
the reason is that many unscrupulous webmasters scrape content from other sites and use it for their own projects.
Allow: /wp-includes/js/*.css
Allow: /wp-includes/js/*.js
Allow: /media/*.js
Allow: /media/*.css
Allow: /plugins/*.css
Allow: /plugins/*.js
Having figured out the simple syntax of commands for the robot, it is also important to take into account the values of the robots meta tag.
This meta tag can be assigned four values.
The content attribute can contain the following values:
index, noindex, follow, nofollow
If there are multiple values, they are separated by commas.
Currently, only the following values are important:
The INDEX directive tells the robot that this page can be indexed.
The FOLLOW directive informs the robot that it is allowed to follow the links on this page. Some authors have noted that, in the absence of these values, search engines by default act as if both the INDEX and FOLLOW directives were given.
So, the global directives look like this:
Index everything = INDEX, FOLLOW
Index nothing = NOINDEX, NOFOLLOW
Examples of meta robots tag:
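A sketch of the two global forms described above (the tag belongs inside the page's <head> section):

```html
<!-- Index everything: allow indexing and link following (also the default) -->
<meta name="robots" content="index, follow">

<!-- Index nothing: block indexing of the page and following of its links -->
<meta name="robots" content="noindex, nofollow">
```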