The robots.txt file is needed for most sites.

Every SEO specialist should understand the purpose of this file and be able to write the most commonly used directives.

A properly composed robots.txt improves the site's position in search results and, alongside other promotion methods, is an effective SEO tool.

To understand what robots.txt is and how it works, let's recall how search engines operate.

To check whether your site has this file, enter the root domain in the address bar, then add /robots.txt to the end of the URL.

For example, Moz's robots file is located at moz.com/robots.txt. Open that address and you will see the contents of the file.

Instructions for the "robot"

How to create a robots.txt file?

3 types of instructions for robots.txt.

If you find that the robots.txt file is missing, creating one is easy.

As already mentioned at the beginning of the article, this is a regular text file in the root directory of the site.

You can create it through the admin panel or through the file manager the developer uses to work with the site's files.

We will figure out what to write in it over the course of the article.

Search engines receive three types of instructions from this file:

  • scan everything, that is, full access (Allow);
  • nothing can be scanned - a complete ban (Disallow);
  • individual elements cannot be scanned (which ones are specified) - partial access.

In practice, it looks like this:
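As a rough sketch (the /admin/ folder is just a placeholder), the three variants could be written like this:

# Variant 1: full access - scan everything
User-agent: *
Disallow:

# Variant 2: complete ban - scan nothing
User-agent: *
Disallow: /

# Variant 3: partial access - scan everything except the /admin/ folder
User-agent: *
Disallow: /admin/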

Please note that a page can still end up in the SERP if it is linked to from this site or from an external one.

To better understand this, let's study the syntax of this file.

Robots.Txt Syntax

Robots.txt: what does it look like?

Important points: what you should always remember about robots.

Seven common terms that are often found on websites.

In its simplest form, robots.txt looks like this:

User-agent: [name of the bot for which we write directives]
Disallow:
Sitemap: [indicate where we have the sitemap]

# Rule 1
User-agent: Googlebot
Disallow: /prim1/
Sitemap: http://www.nashsite.com/sitemap.xml

Together, these three lines are considered the simplest robots.txt.

Here we prevented the bot from indexing the URL: http://www.nashsite.com/prim1/ and indicated where the sitemap is located.

Please note: in the robots file, the set of directives for one user agent (search engine) is separated from the set of directives for another by a line break.

In a file with several search engine directives, each prohibition or permission applies only to the search engine specified in that particular block of lines.

This is an important point that must not be forgotten.

If the file contains rules addressed to several user agents, each crawler will give priority to the directives written specifically for it.

Here is an example:
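The example can be sketched roughly as follows (the blocked paths are placeholders):

User-agent: msnbot
Disallow: /private/

User-agent: discobot
Disallow: /

User-agent: slurp
Disallow: /archive/

User-agent: *
Disallow: /tmp/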

In the illustration above, MSNbot, discobot and Slurp have individual rules that will work only for these search engines.

All other user agents follow the general directives in the user-agent: * group.

The robots.txt syntax is absolutely straightforward.

There are seven common terms that you will often come across in robots files.

  • User-agent: the specific search robot (search engine bot) to which you are giving crawl instructions. A list of most user agents is published online; in total it contains 302 bots, of which the two most relevant for us are Google and Yandex.
  • Disallow: a disallow command that tells the agent not to visit the URL. Only one "Disallow" line is allowed per URL.
  • Allow: the command tells the bot that it can access a page or subfolder even though its parent page or subfolder is disallowed (supported by Googlebot and Yandex, among others).
  • Crawl-delay: how many seconds the search engine should wait before loading and crawling the page content.

Please note - Googlebot does not support this command, but the crawl rate can be manually set in Google Search Console.

  • Sitemap: used to indicate the location of any XML sitemaps associated with this URL. This command is supported by Google, Ask, Bing and Yahoo.
  • Host: this directive specifies the main mirror of the site, which should be taken into account during indexing. It can only be written once.
  • Clean-param: this command is used to deal with duplicate content under dynamic addressing.
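To tie these terms together, here is a rough sketch of a file that uses all seven of them (the domain, the paths and the ref parameter are placeholders):

# rules for the Yandex bot
User-agent: Yandex
Disallow: /admin/                 # do not crawl the admin section
Allow: /admin/public-page.html    # but this page may be crawled
Crawl-delay: 2                    # wait 2 seconds between requests
Clean-param: ref /catalog/        # ignore the ref parameter on catalog pages
Host: www.nashsite.com            # main mirror of the site

Sitemap: http://www.nashsite.com/sitemap.xml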

Regular Expressions

Regular expressions: what they look like and what they mean.

How to enable and disable crawling in robots.txt.

In practice, robots.txt files can grow and become quite complex and unwieldy.

The system makes it possible to use regular expressions to provide the required functionality of the file, that is, to work flexibly with pages and subfolders.

  • * is a wildcard that matches any sequence of characters (in the User-agent line it means the directive applies to all search bots);
  • $ matches the end of the URL or string;
  • # used for developer and optimizer comments.

Here are some examples of robots.txt for http://www.nashsite.com

Robots.txt URL: www.nashsite.com/robots.txt

User-agent: *    # i.e. for all search engines
Disallow: /      # the slash denotes the site's root directory

We have just banned all search engines from crawling and indexing the entire site.

How often is this action required?

Infrequently, but there are cases when a resource should not appear in search results and should only be reached via special links or corporate authorization.

This is how the internal sites of some firms work.

In addition, such a directive is prescribed if the site is under development or modernization.

If you need to allow the search engine to crawl everything on the site, then you need to write the following commands in robots.txt:

User-agent: *
Disallow:

Nothing is listed after Disallow, which means everything is allowed.

Using this syntax in the robots.txt file allows crawlers to crawl all pages on http://www.nashsite.com, including the homepage, the admin section and the contacts page.

Blocking specific search bots and individual folders

Syntax for Google search engine (Googlebot).

Syntax for other search agents.

User-agent: Googlebot
Disallow: /example-subfolder/

This syntax tells only Google's crawler (Googlebot) not to crawl the address www.nashsite.com/example-subfolder/.

Blocking individual pages for the specified bots:

User-agent: Bingbot
Disallow: /example-subfolder/blocked-page.html

This syntax says that only Bingbot (the name of Bing's crawler) should not visit the page at www.nashsite.com/example-subfolder/blocked-page.html.

In fact, that's all.

If you master these seven commands and three symbols and understand the application logic, you can write the correct robots.txt.

Why it doesn't work and what to do

Main action algorithm.

Other methods.

A misbehaving robots.txt is a problem: it takes time to find the error and then to sort it out.

Re-read the file, make sure you haven't blocked anything extra.

If after a while it turns out that the page is still hanging in the search results, check in Google Webmaster whether the search engine has re-indexed the site, and check whether there are external links to the closed page.

If there are, it will be harder to hide the page from the search results, and other methods will be required.

And before using the file, check it with Google's free robots.txt tester.

Timely analysis helps to avoid troubles and saves time.


Robots.txt is a text file that contains instructions for the crawlers that index the portal's pages.



Imagine that you are on a treasure hunt on an island. You have a map. The route is indicated there: “Approach a large stump. From it, take 10 steps to the east, then reach the cliff. Turn right, find the cave."

These are directions. Following them, you follow the route and find the treasure. The search bot also works approximately the same way when it starts indexing a site or page. It finds the robots.txt file. It reads which pages should be indexed and which should not. And by following these commands, it bypasses the portal and adds its pages to the index.

What is robots.txt for?

Search robots start visiting sites and indexing pages once the site is uploaded to hosting and its DNS is registered. They do their job regardless of whether you have any technical files or not. Robots.txt indicates to search engines that, when crawling the website, they need to take into account the parameters it contains.

The absence of a robots.txt file can lead to problems with the speed of crawling the site and the presence of garbage in the index. Incorrect file configuration is fraught with the exclusion of important parts of the resource from the index and the presence of unnecessary pages in the search results.

All this, as a result, leads to problems with promotion.

Let's take a closer look at what instructions are contained in this file, and how they affect the behavior of the bot on your site.

How to make robots.txt

First, check if you have this file.

Type the site address into the browser's address bar, followed by a slash and the file name, for example https://www.xxxxx.ru/robots.txt

If the file is present, a list of its parameters will appear on the screen.

If the file does not exist:

  1. The file is created in a plain text editor such as Notepad or Notepad++.
  2. Name it robots with the .txt extension, and enter the data according to accepted formatting standards (a minimal example is given after this list).
  3. You can check it for errors using services such as Yandex Webmaster: select "Analyze robots.txt" in the "Tools" section and follow the prompts.
  4. When the file is ready, upload it to the root directory of the site.
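A minimal starting file (assuming you want to allow full crawling and already have a sitemap at a placeholder address) could contain:

User-agent: *
Disallow:

Sitemap: https://www.xxxxx.ru/sitemap.xml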

Customization Rules

Search engines have more than one robot. Some bots index only text content, some only graphics. And search engines themselves may differ in how their crawlers work; this must be taken into account when compiling the file.

Some of them may ignore certain rules: for example, GoogleBot does not respond to the information about which site mirror is considered the main one. But in general, bots read the file and are guided by it.

File syntax

The parameters of the document are: the name of the robot (bot) in the User-agent directive, the allowing directive Allow, and the prohibiting directive Disallow.

There are now two key search engines, Yandex and Google, so it is important to take the requirements of both into account when compiling the file.

The format for creating entries is as follows; note the required spaces and empty lines.
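As a sketch, an entry for one bot is a block of contiguous lines, and blocks for different bots are separated by an empty line (the paths are placeholders):

User-agent: Yandex
Disallow: /admin/
Allow: /admin/images/

User-agent: Googlebot
Disallow: /tmp/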

User-agent directive

The robot looks for records that begin with User-agent; these must contain the name of a search robot. If no name is specified, bot access is considered unrestricted.

Disallow and Allow Directives

If you need to disable indexing in robots.txt, use Disallow. With its help, you restrict the bot's access to the site or to certain sections.

If robots.txt does not contain a single Disallow directive, indexing of the entire site is considered allowed. Bans are usually written out separately for each bot.

All information after the # sign is commentary and is not machine readable.

Allow is used to allow access.

The asterisk symbol indicates that it applies to all: User-agent: *.

This option, on the contrary, means a complete ban on indexing for everyone.
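A minimal sketch of such a complete ban:

User-agent: *
Disallow: /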

Prevent viewing the entire contents of a specific directory folder
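For example, for a hypothetical /folder/ directory:

User-agent: *
Disallow: /folder/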

To block a single file, you need to specify its absolute path.
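For example, for a hypothetical file /folder/page.html:

User-agent: *
Disallow: /folder/page.html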


Directives Sitemap, Host

For Yandex, it is customary to indicate in the Host directive which mirror you want to designate as the main one. Google, as we remember, ignores this directive. If there are no mirrors, simply record which form of your site name you consider correct, with or without www.
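A rough sketch of these two directives (site.ru is a placeholder domain):

Host: site.ru
Sitemap: https://site.ru/sitemap.xml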

Clean-param Directive

It can be used if the URLs of the site's pages contain variable parameters that do not affect their content (for example, user IDs or referrers).

For example, in a page address, "ref" may define the traffic source, i.e. indicate where the visitor came to the site from. The page itself will be the same for all users.

The robot can be pointed to this, and then it will not repeatedly download duplicated information. This will reduce the server load.
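For the "ref" example above, a rough sketch (the /catalog/ path is a placeholder) would be:

User-agent: Yandex
Clean-param: ref /catalog/    # treat /catalog/ pages that differ only in ref as one page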

Crawl-delay directive

With its help, you can set how frequently the bot should load pages for analysis. The command is used when the server is overloaded and the crawl needs to be slowed down.
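A rough sketch (the value of 2 seconds is arbitrary):

User-agent: *
Crawl-delay: 2    # pause of 2 seconds between page loads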

robots.txt errors

  1. The file is not in the root directory. The robot will not look for it any deeper and will not take it into account.
  2. The letters in the file name must be lowercase Latin. A common error is to drop the letter s at the end and write robot.txt.
  3. You cannot use Cyrillic characters in the robots.txt file. If you need to specify a domain in Russian, use the special Punycode encoding - a method for converting domain names into a sequence of ASCII characters; special converters can be used for this.

This encoding looks like this:
сайт.рф = xn--80aswg.xn--p1ai

Additional information on what to close in robots.txt and how to configure it in accordance with the requirements of the Google and Yandex search engines can be found in their reference documents. Different CMSs may also have their own particulars, and this should be taken into account.

Robots.txt is a text file located in the root of the site - http://site.ru/robots.txt. Its main purpose is to give search engines certain directives: what to do on the site and when.

The simplest Robots.txt

The simplest robots.txt, which allows all search engines to index everything, looks like this:

User-agent: *
Disallow:

If the Disallow directive is left empty (no slash after it), all pages are allowed to be indexed.

This directive completely prohibits the site from being indexed:

User-agent: *
Disallow: /

User-agent indicates who the directives are intended for; an asterisk means all search engines. For Yandex, specify User-agent: Yandex.

Yandex help says that its crawlers process User-agent: *, but if a User-agent: Yandex block is present, User-agent: * is ignored.

Disallow and Allow Directives

There are two main directives:

Disallow - prohibit

Allow - allow

Example: on the blog, we forbade indexing of the /wp-content/ folder, where plugin files, the template, etc. are located. But it also contains images that must be indexed by search engines in order to participate in image search. To do this, use the following scheme:

User-agent: *
Allow: /wp-content/uploads/ # allow images in the uploads folder to be indexed
Disallow: /wp-content/

The order in which directives are used is important for Yandex if they apply to the same pages or folders. If you specify like this:

User-agent: *
Disallow: /wp-content/
Allow: /wp-content/uploads/

Images will not be loaded by the Yandex robot from the /uploads/ directory, because the first directive is being executed, which denies all access to the wp-content folder.

Google treats this more simply: it takes into account all the directives in robots.txt regardless of their order (when Allow and Disallow conflict, the more specific rule wins).

Also, do not forget that directives with and without a slash perform a different role:

Disallow: /about denies access to the entire site.ru/about/ directory, and pages whose URLs contain about (site.ru/about.html, site.ru/aboutlive.html, etc.) will not be indexed either.

Disallow: /about/ prohibits robots from indexing pages in the site.ru/about/ directory, while pages like site.ru/about.html will remain available for indexing.

Regular expressions in robots.txt

Two characters are supported, these are:

* - implies any sequence of characters.

Example:

Disallow: /about* will deny access to all pages that contain about; in principle, such a directive works without the asterisk as well. But in some cases this expression is irreplaceable. For example, if one category has pages both with .html at the end and without it, then to close from indexing all pages that contain html, we write the following directive:

Disallow: /about/*.html

Now the site.ru/about/live.html page is closed from indexing, and the site.ru/about/live page is open.

Another similar example:

User-agent: Yandex
Allow: /about/*.html # allow indexing
Disallow: /about/

All pages will be closed, except for pages that end in .html

$ - cuts off the rest and marks the end of the line.

Example:

Disallow: /about - this robots.txt directive prohibits indexing of all pages that start with about, including pages in the /about/ directory.

By adding a dollar sign at the end, Disallow: /about$, we tell the robots that only the /about page cannot be indexed, while the /about/ directory, /aboutlive pages, etc. can be indexed.

Sitemap directive

This directive specifies the path to the Sitemap, as follows:

Sitemap: http://site.ru/sitemap.xml

Host Directive

Specified in this form:

Host: site.ru

Without http://, slashes and the like. If your site's main mirror is with www, then write:
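Host: www.site.ru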

Robots.txt example for Bitrix

User-agent: *
Disallow: /*index.php$
Disallow: /bitrix/
Disallow: /auth/
Disallow: /personal/
Disallow: /upload/
Disallow: /search/
Disallow: /*/search/
Disallow: /*/slide_show/
Disallow: /*/gallery/*order=*
Disallow: /*?*
Disallow: /*&print=
Disallow: /*register=
Disallow: /*forgot_password=
Disallow: /*change_password=
Disallow: /*login=
Disallow: /*logout=
Disallow: /*auth=
Disallow: /*action=*
Disallow: /*bitrix_*=
Disallow: /*backurl=*
Disallow: /*BACKURL=*
Disallow: /*back_url=*
Disallow: /*BACK_URL=*
Disallow: /*back_url_admin=*
Disallow: /*print_course=Y
Disallow: /*COURSE_ID=
Disallow: /*PAGEN_*
Disallow: /*PAGE_*
Disallow: /*SHOWALL
Disallow: /*show_all=
Host: sitename.com
Sitemap: https://www.sitename.ru/sitemap.xml

WordPress robots.txt example

After all the necessary directives described above have been added, you should end up with a robots.txt file like this:
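A rough sketch of such a basic variant (site.ru is a placeholder domain):

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-content/uploads/

User-agent: Yandex
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-content/uploads/
Host: site.ru

Sitemap: http://site.ru/sitemap.xml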

This is, so to speak, the basic version of robots.txt for WordPress. There are two User-agent blocks here, one for everyone and a second for Yandex, where the Host directive is specified.

robots meta tags

You can close a page or site from indexing not only with the robots.txt file; this can also be done using a meta tag.

<meta name="robots" content="noindex,nofollow">

You need to place it inside the <head> tag of the page, and this meta tag will prohibit its indexing. There are plugins for WordPress that allow you to set such meta tags, for example Platinum SEO Pack. With it, you can close any page from indexing; it works through meta tags.

Crawl-delay directive

With this directive, you can set the pause that the search bot should take between downloading site pages.

User-agent: *
Crawl-delay: 5

The timeout between two page loads will be 5 seconds. To reduce the load on the server, it is usually set to 15-20 seconds. This directive is needed for large, frequently updated sites where search bots practically "live".

For regular sites and blogs this directive is not needed, but you can use it to limit the behavior of other, less relevant search robots (Rambler, Yahoo, Bing, etc.). After all, they also visit the site and index it, creating load on the server.

Hi all! Today I would like to tell you about the robots.txt file. Yes, a lot has been written about it on the Internet, but, to be honest, for a very long time I myself could not figure out how to create a correct robots.txt. I ended up making one, and it is installed on all my blogs. I have not noticed any problems with it; robots.txt works just fine.

Robots.txt for WordPress

And why, in fact, do we need robots.txt? The answer is still the same: search engine promotion. That is, compiling robots.txt is one part of the search engine optimization of a site (by the way, very soon there will be a lesson devoted to all the internal optimization of a WordPress site, so do not forget to subscribe to the RSS feed so as not to miss interesting material).

One of the functions of this file is to prohibit the indexing of unnecessary pages of the site. It also specifies the sitemap address and the main mirror of the site (the site with www or without www).

Note: for search engines, the same site with www and without www is two completely different sites. But, realizing that the content of these sites is the same, search engines "glue" them together. Therefore, it is important to register the main mirror of the site in robots.txt. To find out which one is the main one (with www or without), simply type your site's address in the browser, for example with www; if you are automatically redirected to the same site without www, then the main mirror of your site is without www. I hope I explained that correctly.

So, this cherished and, in my opinion, correct robots.txt for WordPress you can see below.

Correct Robots.txt for WordPress

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: */trackback
Disallow: */*/trackback
Disallow: */*/feed/*/
Disallow: */feed
Disallow: /*?*
Disallow: /tag

User-agent: Yandex
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: */trackback
Disallow: */*/trackback
Disallow: */*/feed/*/
Disallow: */feed
Disallow: /*?*
Disallow: /tag
Host: site
Sitemap: https://site/sitemap.xml.gz
Sitemap: https://site/sitemap.xml

Everything given above needs to be copied into a text document with the .txt extension; that is, the file name should be robots.txt. You can create this text document, for example, in an ordinary text editor such as Notepad. Just please don't forget to change the address in the last three lines to your own site's address. The robots.txt file must be located in the root of the blog, that is, in the same folder as the wp-content, wp-admin, etc. folders.

If you are too lazy to create this text file, you can simply download a ready-made robots.txt and correct the same three lines in it.

I want to note that you do not need to burden yourself too heavily with the technical parts discussed below. I cite them for "knowledge", so to speak for general awareness, so that you know what is needed and why.

So the line:

user-agent

sets the rules for a particular search engine: for example, the asterisk "*" indicates that the rules are for all search engines, while the line below

User-agent: Yandex

means that these rules are for Yandex only.

Disallow
Here you list the sections that do NOT need to be indexed by search engines. For example, on the https://site/tag/seo page I have articles that duplicate the regular articles, and duplicate pages negatively affect search promotion. Therefore it is highly desirable to close such sections from indexing, which we do with this rule:

Disallow: /tag

So, in the robots.txt given above, almost all unnecessary sections of a WordPress site are closed from indexing, that is, you can just leave everything as it is.

Host

Here we set the main mirror of the site, which I talked about a little higher.

Sitemap

In the last two lines, we specify the addresses of up to two sitemaps created with a sitemap plugin.

Possible problems

But if a site does not use human-readable URLs (CNC), WordPress post addresses contain a question mark (for example, ?p=123), and because of this line in robots.txt my site's posts were no longer indexed:

Disallow: /*?*

As you can see, this very line in robots.txt prohibits the indexing of articles, which of course we don't need at all. To fix this, you just need to remove these two lines (in the rules for all search engines and for Yandex), and the final correct robots.txt for a WordPress site without CNC will look like this:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: */trackback
Disallow: */*/trackback
Disallow: */*/feed/*/
Disallow: */feed
Disallow: /tag

User-agent: Yandex
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: */trackback
Disallow: */*/trackback
Disallow: */*/feed/*/
Disallow: */feed
Disallow: /tag
Host: site
Sitemap: https://site/sitemap.xml

To check whether we have compiled the robots.txt file correctly, I recommend using the Yandex Webmaster service (I have already described how to register in this service).

We go to the section Indexing settings –> Robots.txt analysis:

There, click on the "Download robots.txt from the site" button, and then on the "Check" button:

If you see something like the following message, then you have the correct robots.txt for Yandex:

First, I'll tell you what robots.txt is.

Robots.txt is a file located in the root folder of the site that contains special instructions for search robots. These instructions are needed so that, when entering the site, the robot does not take certain pages or sections into account; in other words, we close those pages from indexing.

Why robots.txt is needed

The robots.txt file is considered a key requirement for the SEO optimization of absolutely any site. The absence of this file can negatively affect the load created by robots and slow down indexing; moreover, the site will not be indexed completely. Accordingly, users will not be able to reach some pages through Yandex and Google.

How does robots.txt affect search engines?

Search engines (especially Google) will index the site even without a robots.txt file, but, as I said, not all pages. If such a file exists, the robots are guided by the rules specified in it. Moreover, there are several types of search robots: some may take a rule into account, while others ignore it. In particular, the GoogleBot robot does not take the Host and Crawl-Delay directives into account, the YandexNews robot has recently stopped taking the Crawl-Delay directive into account, and the YandexDirect and YandexVideoParser robots ignore the generally accepted directives in robots.txt (but take into account those written specifically for them).

The heaviest load on a site is created by robots that download content from it. Accordingly, by telling the robot which pages to index and which to ignore, as well as at what time intervals to load content (this matters more for large sites with more than 100,000 pages in the search index), we make it much easier for the robot to index and load content from the site.


Files related to the CMS (for example, /wp-admin/ in WordPress) can be classified as unnecessary for search engines, as can the AJAX and JSON scripts responsible for pop-up forms, banners, captcha output and so on.

For most robots, I also recommend closing all JavaScript and CSS files from indexing. But for GoogleBot and Yandex it is better to allow such files to be indexed, since search engines use them to analyze the site's usability and its ranking.

What are robots.txt directives?

Directives are rules for search robots. The first standard for writing robots.txt, and with it the directives themselves, appeared in 1994, and an extended standard in 1996. However, as you already know, not all robots support all directives. Therefore, below I describe what the main robots are guided by when indexing site pages.

What does user-agent mean?

This is the most important directive; it determines which search robots the further rules will apply to.

For all robots:
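User-agent: *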

For a specific bot:

User-agent: Googlebot

Case does not matter in robots.txt: you can write either Googlebot or googlebot.

Google crawlers

The main Google crawlers are Googlebot (the main indexing robot), Googlebot-Image (images), Googlebot-News (news), Googlebot-Video (video), Mediapartners-Google (AdSense) and AdsBot-Google (landing page quality checks).
Yandex search robots

YandexBot - Yandex's main indexing robot

YandexImages - used in the Yandex.Images service

YandexVideo - used in the Yandex.Video service

YandexMedia - multimedia data

YandexBlogs - blog search

YandexAddurl - a crawler accessing a page when it is added via the "Add URL" form

YandexFavicons - the robot that indexes site icons (favicons)

YandexDirect - Yandex.Direct

YandexMetrika - Yandex.Metrica

YandexCatalog - used in the Yandex.Catalog service

YandexNews - used in the Yandex.News service

YandexImageResizer - the search robot of mobile services

Search robots of Bing, Yahoo, Mail.ru and Rambler are, respectively, Bingbot, Slurp, Mail.Ru and StackRambler.

Disallow and Allow Directives

Disallow closes sections and pages of your site from indexing. Accordingly, Allow, on the contrary, opens them.

There are some features.

First, the additional operators are *, $, and #. What are they used for?

“*” is any number of characters and their absence. By default, it is already at the end of the line, so there is no point in putting it again.

“$” - indicates that the character before it must come last.

“#” - comment, everything that comes after this character is ignored by the robot.

Examples of using Disallow:

Disallow: *?s=

Disallow: /category/

Accordingly, the search robot will close from indexing pages like site.ru/?s=query and site.ru/category/news/.

But pages that do not match these patterns, for example site.ru/blog/ or site.ru/page.html, will remain open for indexing.

Now you need to understand how nested rules are executed. The order in which directives are written is very important. Rule inheritance is determined by which directories are specified: if we want to close a page or document from indexing, it is enough to write the appropriate directive. Let's look at an example.

Suppose this is our robots.txt file:

User-agent: *
Disallow: /template/
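To illustrate how a more specific rule reopens part of a closed section, here is a rough sketch (style.css is a placeholder file):

User-agent: *
Allow: /template/style.css    # this particular file stays open for indexing
Disallow: /template/          # everything else in /template/ is closed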

Sitemap directive in robots.txt

The Sitemap directive can be specified anywhere in the file, and you can register several sitemap files.

Host directive in robots.txt

This directive is required to specify the main mirror of the site (often with or without www). Note that the Host directive is specified without the http:// protocol, but with the https:// protocol (if the site runs on it). The directive is taken into account only by the Yandex and Mail.ru search robots; other robots, including GoogleBot, will not take the rule into account. Host should be registered only once in the robots.txt file.

Example with http://

Host: www.website.ru

Example with https://
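Host: https://www.website.ru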

Crawl-delay directive

Sets the time interval for the search robot between requests when indexing site pages. The value is specified in seconds; fractional values are allowed.

Example:
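A rough sketch (the half-second value is arbitrary):

User-agent: Yandex
Crawl-delay: 0.5    # pause of half a second between page loads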

It is mainly used on large online stores, information sites and portals with traffic of 5,000 visits per day or more. It asks the search robot to make indexing requests within a certain period of time. If you do not specify this directive, robots can create a serious load on the server.

The optimal crawl-delay value is different for each site. For the Mail, Bing and Yahoo search engines, the value can be set to a minimum of 0.25-0.3, since these robots may crawl your site once a month or once every two months (that is, very rarely). For Yandex, it is better to set a larger value.


If the load of your site is minimal, then there is no point in specifying this directive.

Clean-param Directive

The rule is interesting because it tells the crawler that pages with certain parameters do not need to be indexed. Two arguments are written: the parameter and the page URL prefix. This directive is supported by the Yandex search engine.

Example of a combined file (rules for three different bots):

User-agent: *

Disallow: /admin/

Disallow: /plugins/

Disallow: /search/

Disallow: /cart/

Disallow: *sort=

Disallow: *view=

User-agent: GoogleBot

Disallow: /admin/

Disallow: /plugins/

Disallow: /search/

Disallow: /cart/

Disallow: *sort=

Disallow: *view=

Allow: /plugins/*.css

Allow: /plugins/*.js

Allow: /plugins/*.png

Allow: /plugins/*.jpg

Allow: /plugins/*.gif

User-agent: Yandex

Disallow: /admin/

Disallow: /plugins/

Disallow: /search/

Disallow: /cart/

Disallow: *sort=

Disallow: *view=

Allow: /plugins/*.css

Allow: /plugins/*.js

Allow: /plugins/*.png

Allow: /plugins/*.jpg

Allow: /plugins/*.gif

Clean-Param: utm_source&utm_medium&utm_campaign

In the example, we have written rules for 3 different bots.

Where to add robots.txt?

The file is added to the root folder of the site so that it can be opened via a link like http://site.ru/robots.txt.

How to check robots.txt?

Yandex Webmaster

On the Tools tab, select Analyze robots.txt and then click Check

Google Search Console

On the Crawl tab, choose the robots.txt Tester tool and then click Check.

Conclusion:

The robots.txt file must be present on every promoted site, and only its correct configuration will give you the indexing you need.

And finally, if you have any questions, ask them in the comments under the article. I am also curious: how do you write your robots.txt?