The 404 page tells the user that the URL (page address) they requested does not exist.
Such incorrect URLs are also called "broken links".
Many sites design custom 404 pages for their users' convenience. These are often attractive, entertaining pages that make the user smile instead of feeling disappointed that the address is wrong.
Creating a 404 page also has an important technical side: if it is set up incorrectly, it can seriously hurt the site's ranking in search engines.

If you are setting up a 404 page, you need to take care of three things:
1) A redirect from all incorrectly entered URLs to the 404 page, configured in .htaccess.
2) The correct server response after the redirect (the HTTP code of the page should be 404, not 200).
3) Blocking the 404 page from indexing in robots.txt.

Note that everything here applies to custom-built sites, mainly in PHP. For WordPress there are plugins that configure the same things. In this article, though, we will look at how it all works under the hood.

Redirecting incorrect URLs to a 404 page

First, create the 404 page itself so that there is somewhere to send visitors.
The redirect is configured in the .htaccess file.
Just add the line:
ErrorDocument 404 http://mysite.com/404.php
Here "mysite.com" is your domain and http://mysite.com/404.php is the address of the actual page. If your site is plain HTML, the line will look like this:
ErrorDocument 404 http://mysite.com/404.html
Checking it is simple. After uploading the .htaccess file with the line above to your hosting, request a deliberately non-existent URL (a broken link), for example: http://mysite.com/$%$%
If you are redirected to the page you created, everything is working.
So the complete .htaccess file, where ONLY the redirect to the 404 page is configured, will look like this:
____________________________
RewriteEngine On
ErrorDocument 404 http://mysite.com/404.html
____________________________
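
A variant worth knowing, offered as a sketch rather than as the method of this article: Apache's ErrorDocument directive also accepts a local path. When it is given a full URL, Apache sends the client a redirect to that URL; when it is given a local path starting with a slash, Apache serves the error page itself and the response keeps the original 404 status. The path /404.php below is an assumption that the page sits in the site root.

```apache
# Local path instead of a full URL: no external redirect is issued,
# and the client still receives the 404 status code.
ErrorDocument 404 /404.php
```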

Correct server response (page http code)

It is very important that the server returns the correct response after the redirect, namely 404 Not Found.
This needs a separate explanation.

Every requested URL is answered with a status (the HTTP code of the page).
For all existing pages, this is: HTTP/1.1 200 OK
For redirected pages: HTTP/1.1 302 Found
If the page does not exist, it should be HTTP/1.1 404 Not Found

That is, whatever URL is requested, the server answers with a status, a specific response code.
You can check the server response with a service such as bertal.ru or in Google Search Console (Crawl / Fetch as Google).
Before you added the .htaccess redirect to a 404 page, any non-existent URL entered by a user, and any broken link, received the response "HTTP/1.1 404 Not Found".
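
The same check can be scripted. The following is a minimal Python sketch, not a replacement for the services named above: it requests a URL, follows redirects by hand, and records every status code on the way. The function names fetch_status_chain and diagnose, the five-hop limit, and the result labels are all assumptions made for this illustration.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_status_chain(url, max_hops=5):
    """Request url and follow redirects by hand, returning every status code seen."""
    chain = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        try:
            target = parts.path or "/"
            if parts.query:
                target += "?" + parts.query
            conn.request("GET", target)
            response = conn.getresponse()
            status = response.status
            location = response.getheader("Location")
        finally:
            conn.close()
        chain.append(status)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # the Location header may be relative
        else:
            break
    return chain


def diagnose(chain):
    """Classify the status chain returned for a URL that should NOT exist."""
    if not chain:
        return "no response"
    if chain[-1] == 404:
        return "ok" if len(chain) == 1 else "ok after redirect"
    if chain[-1] == 200:
        return "soft 404"  # a missing page is being reported as existing
    return "unexpected"
```

For example, diagnose(fetch_status_chain("http://mysite.com/no-such-page")) returns "ok" for a plain 404, and "soft 404" when a redirect ends in 200 OK.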

After you set up the redirect to your custom 404 page via .htaccess as described above, requesting a broken link (a URL that clearly does not exist), such as http://mysite.com/$%$% , produces the following server responses:
- first HTTP/1.1 302 Found (redirect),
- followed by HTTP/1.1 200 OK (page exists).

Check this with bertal.ru.
Why is this a problem? Google may add every broken link to its database (index) as an existing page with the content of the 404 page. In effect, these are duplicate pages, and that is very harmful for search engine optimization.

In this case, you need to do two things:
1) Set up the correct server response on the 404 page.
2) Block the 404 page from indexing. This is done in the robots.txt file.

Configure HTTP/1.1 404 Not Found server response for non-existent pages

The server response is set with the PHP header() function at the very top of the page:
<?php header("HTTP/1.1 404 Not Found"); ?>
Put it at the very beginning of the 404 file, before any output.
As a result, the final response to a broken link should be HTTP/1.1 404 Not Found.

Blocking the 404 page from indexing

You can block the page from indexing in the robots.txt file. Be careful with this tool, because this file is how your site communicates with search robots!
The full text of a robots.txt file where ONLY indexing of the 404 page is blocked looks like this:
____________________________
User-agent: *
Disallow: /404.php
____________________________

Note on the code: "/404.php" is the path to the page. If the 404.php (or 404.html) page on your site is located in a folder, the path will look like this:
/holder/404.php
where "holder" is the name of the folder.
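
A rule like this can be checked before uploading the file. The sketch below uses Python's standard urllib.robotparser module; the helper name is_blocked and the sample file content are assumptions for the illustration, and real search robots may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /404.php
"""


def is_blocked(robots_txt, path, user_agent="*"):
    """Return True if robots_txt forbids user_agent from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)
```

With this content, is_blocked(ROBOTS_TXT, "/404.php") is True, while "/index.php" and "/holder/404.php" are not blocked. Rules match by path prefix, which is why a page inside a folder needs its full path in the Disallow line.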

That is really all there is to the 404 page. Check that the page works, that broken links are redirected, and that the server responses are correct.
To repeat: all of the above is for custom-built sites. If you use WordPress, you can look for a good 404 error plugin.

The 404 error is the most recognizable and most common error on the web. It reports that no page exists at the given address. In practice, the file for the requested document is missing, so the site returns an error.

To understand the issue in more detail, along with a number of service files that every site has, you need to look at how pages are represented as hypertext in HTML (HyperText Markup Language) and at the HTTP protocol used to access them. Although this means dealing with a markup language, its form is simple enough that anyone can understand it.


Hypertext pages and their features

The World Wide Web was born when the English engineer Tim Berners-Lee devised a hypertext representation of text pages on the network and described how to access them over the HTTP application protocol. The general idea is that the user's device, specifically the browser, sends a network request for a particular resource. A connection is opened to the server being accessed, and an HTML page is returned in response.

Of course, today more complex mechanisms are used for accessing and loading large pages, but the general principle remains the same. To reach a resource, you need its domain name and IP address. Only if these are in place and the resource is up will a "404 Not Found" error be returned for a missing document.

What the default 404 page looks like

A site may or may not have a styled "HTTP 404 Not Found" error page. An inexperienced user is usually rather unsettled on seeing it and assumes it is their own fault. In fact, everything is much simpler, and the answer follows from the above.

A 404 error shown as a separate page in the site's own design is returned only if the site has a 404.html file. It is usually located in the root directory. Otherwise, the browser shows the bare error with a message that the document could not be accessed. It usually looks like a white page with an error message.

Websites are usually built on content management systems. These include a 404 page, and the path to it is specified in the system files. Typically, such a page contains a message about the non-existent address and a link to the site's home page. You can change the 404 page template however you like, since it is a page of your own site.

To rework the template, you will need enough HTML to mark up the file. Note that in some cases the page file has a different name, such as err404.html or 404.php. The difference from the standard name usually comes from broader functionality or from the specifics of the system. For example, in WordPress the template is the 404.php file in the theme directory. In the address bar, the "error 404 page not found" page will appear as something like domain.ru/404/.

Adapting a Standard 404 Page to Your Purposes

To improve the site's usability (its convenience for users), you need a page that keeps visitors on your site and helps them continue browsing. Things to keep in mind when writing the code:

  • A significant share of visitors who hit non-existent pages come from search engines or from links on forums, other websites, and social networks, that is, from places where old links to long-gone pages may survive.
  • Users are not looking for your site but for information on a particular keyword, so when the desired page is missing, the visitor usually leaves the site and rarely browses it further.

That is, you must understand that it will not be so easy to keep such visitors, but it is possible!

Take a standard template available on the Internet, or make your own, taking into account the above features:

  1. Briefly explain to the visitor what happened and why they do not see what they were looking for. Offer options for what to do next to help them find it.
  2. Put a search box on the 404 page so that the visitor can immediately search for what they want.
  3. Be sure to show your site menu here, so that the visitor can see where to go.
  4. Make the page appealing, so that the user wants to look for information on your site. Use colorful and interesting text and visuals.

To make the 404 error page attractive, it is enough to make the user smile or spark their interest. So put some effort into an original idea for this part of your site.

Editing the 404 Page

You can edit the file directly from the content management system by adding the markup and images you need.
When creating it, keep in mind that the page should open quickly and without delay. It should be "light" (small in size), useful, and offer alternatives to the document that was not found:

  • a link to the home page;
  • a list of the site's most popular pages;
  • a link to the sitemap;
  • a button for reporting the "broken" link on a specific source to the site administration.

Beyond that, your best helpers are imagination, corporate standards, and the designer's original idea.

Conclusion

The 404 Not Found page is a service file that can be modified and extended to keep more visitors on the site. You need this file because otherwise the browser will display a bare error message, after which the chance of keeping the visitor drops to zero. Try filling it with colorful images and even some light humor.

None of the errors listed here is critical for Yandex: it will index the pages anyway. But they can make it harder for potential visitors to find your page among millions of others.

Incorrect handling of the "404 Not Found" error
Check what your scripts do when an error occurs. If a script reports an error but returns the normal status code 200, the error message will be indexed. If your script returns HTTP code 404, the error message will not be indexed.
The same applies to ordinary documents. Some servers are configured to send the normal status code 200 when an error occurs. This prevents the robot from removing the link to the page from its database. Any modern web server lets you customize the standard error messages and still send them with the correct error code.
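
As an illustration of a script that reports a missing page with the correct code, here is a minimal sketch of a Python WSGI application. Python and WSGI are used here only for the example and are not what the sites discussed in this text run on; the PAGES dictionary and its content are hypothetical.

```python
# Hypothetical site content for the sketch.
PAGES = {"/": b"<h1>Home</h1>"}


def app(environ, start_response):
    """Minimal WSGI app: known paths get 200, everything else a real 404."""
    body = PAGES.get(environ.get("PATH_INFO", "/"))
    if body is None:
        # Custom error text, but sent with the 404 status, not 200.
        status = "404 Not Found"
        body = b"<h1>Page not found</h1>"
    else:
        status = "200 OK"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

To try it locally, the standard library can serve it: wsgiref.simple_server.make_server("", 8000, app).serve_forever(). The point of the design is that the friendly error text and the status line are set together, so the page can look however you like while robots still see a 404.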

Spam, or do not deceive the user
Spam means titles and keywords stuffed with words from the most popular queries, large blocks of text "written" on the page in the background color or in very small print, and many other tricks for luring users to a page by deception.
They should not be used, for two reasons. First, they do no credit to the page's author and naturally annoy users. Second, Yandex detects such anomalies and lowers the document's position on the results page. In addition, spam increases the size of the document and therefore reduces the relative weight ("contrast") of the words in it, which also affects the document's position in the list of results. In cases of malicious spam, the Yandex administration may exclude such pages and sites from the database.
Pages that redirect to other pages with a zero delay are also excluded from indexing.

Wrong dates
Yandex supports searching and sorting by date, but in 20% of cases servers do not report the correct file modification date. Configure your server correctly. Do not deprive users of this additional information, and make sure your pages are shown correctly in date-based searches.
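
Servers report the modification date in the Last-Modified response header, in a fixed GMT format. The Python sketch below shows how that value can be built from a file's modification time; the helper names http_date and last_modified_header are assumptions for the illustration.

```python
import os
from email.utils import formatdate


def http_date(timestamp):
    """Format a Unix timestamp as an HTTP date (always in GMT)."""
    return formatdate(timestamp, usegmt=True)


def last_modified_header(path):
    """Build a Last-Modified header from the file's modification time."""
    return ("Last-Modified", http_date(os.path.getmtime(path)))
```

For instance, http_date(0) gives "Thu, 01 Jan 1970 00:00:00 GMT".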

Indexing identical documents in different encodings
A lot of resources are wasted on indexing identical documents that web servers serve in different encodings, even though Russian search engines store each document in only one encoding anyway. It is recommended to block all encodings except one from indexing. If encodings are selected by server port, you need to serve a different robots.txt on each port (server). This means that on every port/server except the main one, the file should contain:
User-agent: *
Disallow: /

If encodings are selected, for example, by directory, you need a single robots.txt file containing:

User-agent: *
Disallow: /alt
Disallow: /mac
Disallow: /koi

Indexing the same site on different servers
This problem occurs when the server has mirrors and/or the encoding is selected by a hostname prefix, e.g. for the host www.chto-to.ru:
win.chto-to.ru, koi-www.chto-to.ru, wwwmac.chto-to.ru etc.
The robot cannot determine the "main" address on its own. The only thing it can do (and does) is detect that two documents are identical apart from encoding and index only one of them.
As a result, different parts of your site may end up indexed on different hosts. If you want your entire site to be indexed under one address (host), block indexing of the others with an appropriate robots.txt file. After some time (as the robot crawls), all indexed documents will refer to this host.
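
If the mirrors are served by a script, the choice of robots.txt can be made per host. The Python sketch below is one hypothetical way to do that: the function name robots_txt_for and the idea of generating the file dynamically are assumptions for the illustration, not something prescribed by the text.

```python
def robots_txt_for(host, main_host):
    """Return robots.txt text: open on the main host, closed on every mirror."""
    if host.lower() == main_host.lower():
        rule = "Disallow:"    # empty value: nothing is forbidden
    else:
        rule = "Disallow: /"  # forbid indexing of the whole mirror
    return "User-agent: *\n" + rule + "\n"
```

Here robots_txt_for("win.chto-to.ru", "www.chto-to.ru") yields a file that blocks the whole mirror, while the main host www.chto-to.ru stays fully open.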