Searching for information on the Internet

The Internet is growing very fast, so finding the information you need among hundreds of billions of Web pages and hundreds of millions of files is becoming more and more difficult. Special search engines are used to find information; they hold constantly updated data on the location of Web pages and files on hundreds of millions of Internet servers.

Search engines store thematically grouped information about the resources of the World Wide Web in databases. Special programs called robots periodically crawl Internet Web servers, read every document they encounter, extract keywords from it and record the document's Internet address in the database.
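The crawling-and-indexing step described above can be sketched as an inverted index: a table from each keyword to the addresses of the pages that contain it. This is a minimal Python sketch, assuming the robot has already downloaded the pages; the addresses and texts are hypothetical, and real engines also normalize, rank and filter words.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every keyword to the set of addresses of the pages that contain it."""
    index = defaultdict(set)
    for address, text in pages.items():
        for word in tokenize(text):
            index[word].add(address)
    return dict(index)


# Hypothetical pages standing in for documents a robot has already downloaded.
pages = {
    "http://example.org/a": "Search engines index Web pages",
    "http://example.org/b": "Robots read Web documents",
}
index = build_index(pages)
print(sorted(index["web"]))  # ['http://example.org/a', 'http://example.org/b']
```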

Most search engines allow the author of a Web site to add it to the database by filling out a registration form. In the form, the site developer enters the site's address, its name, a short description of its content, and the keywords by which the site will be easiest to find.

Keyword search. To find a document in the search engine's database, you enter a query in the search box.

The query must contain one or more keywords that are central to the document you want. For example, to find the Internet search engines themselves, you can enter the keywords "Russian system for searching for information on the Internet" (Fig. 6.21).

Shortly after the query is sent, the search engine returns a list of Internet addresses of documents in which the specified keywords were found. To view a document in the browser, simply activate the link that points to it (Fig. 6.22).

If the keywords were poorly chosen, the list of addresses may be too long (tens or even hundreds of thousands of links). To narrow the list, you can enter additional keywords in the search field or use the search engine's directory.
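Adding keywords narrows the list because a document must now contain every keyword. A minimal sketch, assuming a query is treated as a plain AND of its words (the documents below are hypothetical):

```python
import re


def words(text: str) -> set[str]:
    """Return the set of lower-case words in a text."""
    return set(re.findall(r"\w+", text.lower()))


def search(query: str, documents: dict[str, str]) -> list[str]:
    """Return addresses of documents that contain every keyword of the query."""
    keywords = words(query)
    return sorted(addr for addr, text in documents.items() if keywords <= words(text))


# Hypothetical documents keyed by address.
documents = {
    "http://example.org/1": "Russian search engines on the Internet",
    "http://example.org/2": "Search engines for files",
    "http://example.org/3": "Russian literature on the Internet",
}
print(search("search engines", documents))          # ['http://example.org/1', 'http://example.org/2']
print(search("Russian search engines", documents))  # ['http://example.org/1']
```

Each extra keyword can only remove documents from the result, which is why a longer, well-chosen query gives a shorter list.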

One of the most complete and powerful search engines is Google, which stores 8 billion Web pages in its database; its robots add 5 million new pages every month. In the Runet (the Russian part of the Internet), the Yandex and Rambler search engines each have extensive databases of 200 million documents.

Search in a hierarchical directory system. In a search engine database, Web sites are grouped into hierarchical subject directories, which are analogous to a subject catalog in a library.

Top-level thematic sections, for example Internet, Computers, Science and Education, and so on, contain nested directories. For example, the Internet directory may contain the subdirectories Search, Mail and others (Fig. 6.23).

Searching for information in a directory comes down to selecting a specific directory, after which the user is shown a list of links to the most visited and meaningful Web sites. Each link is usually annotated, that is, it comes with a short comment on the content of the document.
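A hierarchical directory behaves like a tree that the user walks from a top-level section down to a list of annotated links. The sketch below models it with nested Python dictionaries; the section names follow the text, while the links and comments are hypothetical.

```python
# Hypothetical catalog: a dict is a section with nested sections,
# a list holds the annotated links (address, short comment) of a leaf section.
catalog = {
    "Internet": {
        "Search": [("http://example.org/search", "An example search engine")],
        "Mail": [("http://example.org/mail", "An example mail service")],
    },
    "Computers": {},
    "Science and Education": {},
}


def browse(catalog, path):
    """Walk down the tree one section name at a time and return what is found there."""
    node = catalog
    for section in path:
        node = node[section]  # raises KeyError if the section does not exist
    return node


print(list(browse(catalog, ["Internet"])))  # ['Search', 'Mail']
for address, comment in browse(catalog, ["Internet", "Search"]):
    print(address, "-", comment)
```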

The Aport search engine has the most complete multi-level hierarchical thematic catalog of Russian-language Internet resources. The catalog contains detailed annotations of the content of the Web sites and indicates their geographic location.

File search. Specialized search engines, including FileSearch, are used to find files on file archive servers. To search for a file, enter its name in the search field; the search engine returns the Internet addresses of the file archive servers that store a file with that name.
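A file search engine can be thought of as an index from file names to the archive servers that hold them. This sketch assumes the server listings are already collected and that names are compared case-insensitively; both the servers and the file names are hypothetical.

```python
from collections import defaultdict

# Hypothetical listings: file archive server -> names of the files stored on it.
listings = {
    "ftp://ftp.example.org": ["winamp.exe", "readme.txt"],
    "ftp://mirror.example.net": ["WinAmp.exe", "player.zip"],
}


def build_file_index(listings):
    """Map each lower-cased file name to the servers that store it."""
    index = defaultdict(list)
    for server, names in listings.items():
        for name in names:
            index[name.lower()].append(server)
    return index


def find_file(index, name):
    """Return the servers holding a file with exactly this name (case-insensitive)."""
    return index.get(name.lower(), [])


index = build_file_index(listings)
print(find_file(index, "WINAMP.EXE"))  # ['ftp://ftp.example.org', 'ftp://mirror.example.net']
```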

Information in the Russian-speaking part of the Internet can be searched with the most popular search engines (Google, Rambler, Aport, Applex) and the FileSearch file search engine through an integrated search engine (Fig. 6.24). To do this, simply enter keywords in the search bar, use the switches to set the type of information required, and click the button with the name of the search engine.

Fig. 6.24. Integrated search engine

Internet search methods

Three Ways to Search the Internet

The Internet in general, and the World Wide Web in particular, give users access to thousands of servers and millions of Web pages that store an unimaginable amount of information. How do you avoid getting lost in this "information ocean"? You need to learn how to search for and find the necessary information on the network.

As already mentioned, there are three main ways to find information on the Internet.

1. Specifying the page address. This is the fastest way to search, but it can only be used if the exact address of the document is known.

2. Navigating through hyperlinks. This is the least convenient method, since it only leads to documents close in subject to the current one. If the current document is about music, for example, its hyperlinks will hardly lead you to a site about sports.

3. Using a search server (search engine). Search engines are the most convenient way to find information. The following search servers are currently popular in the Russian-speaking part of the Internet:


There are other search engines as well. For example, an efficient search system is implemented on the server of a mail service.

Search servers

The most accessible and convenient way to search for information on the World Wide Web is to use search engines. Information can be searched for through catalogs or by a set of keywords that characterize the text document being sought.

Let us consider search servers in more detail. A search server contains a large number of links to a wide variety of documents, and all these links are organized into thematic directories, for example sports, cinema, cars, games, science, etc. The server collects these links itself, automatically, by regularly viewing all the Web pages that appear on the World Wide Web. In addition, search servers let the user search for information by keywords. After the keywords are entered, the search server looks through the documents it has collected from other Web servers and displays links to those in which the specified words are found. Search results are usually sorted in descending order of a special document rating that indicates how well the document matches the search criteria or how often it is requested on the Web.
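Sorting by a document rating can be sketched as computing a score for each matching document and sorting in descending order. The formula below (keyword matches plus a small bonus for the number of requests) and its weight are assumptions for illustration only; real engines do not publish their rating formulas.

```python
import re

# Assumed weight: how much one user request counts relative to one keyword match.
POPULARITY_WEIGHT = 0.01


def rank(documents, query):
    """Return addresses of matching documents, best rating first.

    documents: list of (address, text, number_of_requests) tuples.
    The rating formula is a toy assumption, not any real engine's formula.
    """
    keywords = query.lower().split()
    rated = []
    for address, text, requests in documents:
        words = re.findall(r"\w+", text.lower())
        matches = sum(words.count(k) for k in keywords)
        if matches:  # skip documents that contain none of the keywords
            rating = matches + POPULARITY_WEIGHT * requests
            rated.append((rating, address))
    rated.sort(reverse=True)  # descending order of rating
    return [address for _, address in rated]


documents = [
    ("http://example.org/1", "Cars and more cars", 100),
    ("http://example.org/2", "Cars", 500),
    ("http://example.org/3", "Games", 1000),
]
print(rank(documents, "cars"))  # ['http://example.org/2', 'http://example.org/1']
```

Here the second page wins despite fewer matches because it is requested more often, which is the trade-off the text describes between matching the criteria and popularity.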

Search engine query language

A group of keywords formed according to certain rules, using the query language, is called a query to the search server. The query languages of different search engines are very similar; you can learn more in the "Help" section of the search server you want. Let us consider the rules for composing queries using the Yandex search engine as an example.

space or &: logical AND (within a sentence). Example: therapeutic exercise
&&: logical AND (within the document). Example: recipes && (processed cheese)
|: logical OR. Example: photo | photography | snapshot | photographic image
+: the word must be present in the found document. Example: +to be or +not to be
(): grouping of words. Example: (technology | production) (cheese | cottage cheese)
~: binary AND NOT (within a sentence). Example: banks ~ law
~~: binary AND NOT (within the document). Example: Paris travel guide ~~ (agency | tour)
/(n m): distance in words (minus (-) means backward, plus (+) forward). Examples: suppliers /2 coffee; music /(-2 4) education; vacancies ~ /+1 students
" ": phrase search. Example: "riding hood", equivalent to riding /+1 hood
&&/(n m): distance in sentences (minus (-) means backward, plus (+) forward). Example: bank && /1 taxes
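The within-sentence, within-document and distance operators listed above can be illustrated on single-word operands. This is a hedged sketch, not Yandex's implementation: it does not parse query strings, splits sentences naively at . ! ?, assumes the /(n m) distance is counted inside one sentence, and models suppliers /2 coffee as /(-2 2).

```python
import re


def sentences(document):
    """Split a document into sentences (naively, at . ! ?), each a list of lower-case words."""
    parts = re.split(r"[.!?]+", document)
    return [re.findall(r"\w+", part.lower()) for part in parts if part.strip()]


def all_words(document):
    return {word for sentence in sentences(document) for word in sentence}


def and_sentence(document, a, b):  # a & b: both words in one sentence
    return any(a in s and b in s for s in sentences(document))


def and_document(document, a, b):  # a && b: both words anywhere in the document
    words = all_words(document)
    return a in words and b in words


def and_not_sentence(document, a, b):  # a ~ b: some sentence has a but not b
    return any(a in s and b not in s for s in sentences(document))


def and_not_document(document, a, b):  # a ~~ b: a in the document, b nowhere in it
    words = all_words(document)
    return a in words and b not in words


def within(document, a, b, low, high):  # a /(low high) b
    """True if b stands low..high words after a (negative = before a) in one sentence."""
    for s in sentences(document):
        positions_a = [i for i, w in enumerate(s) if w == a]
        positions_b = [j for j, w in enumerate(s) if w == b]
        if any(low <= j - i <= high for i in positions_a for j in positions_b):
            return True
    return False


doc = "Coffee suppliers in Europe. Banks pay taxes. The law applies."
print(and_sentence(doc, "banks", "taxes"))        # True
print(and_sentence(doc, "banks", "law"))          # False
print(and_document(doc, "banks", "law"))          # True
print(within(doc, "suppliers", "coffee", -2, 2))  # True
```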

To get the best search results, remember a few simple rules:

    1. Do not search for information on only one keyword.

    2. It is best not to type keywords in capital letters, as this may prevent the same words written in lower case from being found.

    3. If your search returns no results, check the keywords for spelling errors.

Modern search engines let you attach a semantic (morphological) analyzer to a query. With it, you enter a word and get documents that contain its derived forms in various cases, tenses, etc.
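The effect of a morphological analyzer can be approximated, very crudely, by reducing each word to a stem before comparing. The sketch strips a few English endings from a hypothetical list; real analyzers for Russian and English rely on dictionaries and grammar rules rather than a suffix list.

```python
import re

# Hypothetical list of English endings, longest first so "ings" is tried before "s".
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")


def stem(word):
    """Crudely reduce a word to a base form by stripping one known ending."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query_word, text):
    """True if any word of the text has the same stem as the query word."""
    return stem(query_word) in {stem(w) for w in re.findall(r"\w+", text)}


print(stem("searching"), stem("searched"), stem("searches"))  # search search search
print(matches("search", "He searched the Web"))  # True
```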

Test questions

1. How are documents searched for by keywords? In a directory system?

Tasks for independent work

6.8 Practical task. Compare search results for documents by keyword using different search engines (use the integrated Google search engine).

6.9 Practical task. Search the file archive servers for the WinAmp media player file.

1. DuckDuckGo

What is it

DuckDuckGo is a fairly well-known open-source search engine. Its servers are located in the USA. In addition to its own robot, the search engine uses results from other sources: Yahoo, Bing, Wikipedia.

What makes it better

DuckDuckGo positions itself as a search engine offering maximum privacy and confidentiality. The system does not collect any data about the user, does not store logs (no search history), and keeps its use of cookies to a minimum.

DuckDuckGo does not collect or share personal information from users. This is our privacy policy.

Gabriel Weinberg, founder of DuckDuckGo

Why do you need this

All major search engines try to personalize search results based on data about the person in front of the monitor. This phenomenon is called the "filter bubble": the user sees only those results that are consistent with their preferences or that the system considers to be so.

DuckDuckGo forms an objective picture that does not depend on your past behavior on the Web, and it spares you targeted Google and Yandex ads based on your queries. With DuckDuckGo it is also easy to search for information in foreign languages, whereas Google and Yandex favor Russian-language sites by default, even if the query is entered in another language.

2. not Evil

What is it

not Evil is a system that searches the anonymous Tor network. To use it, you need to connect to that network, for example by launching a specialized browser.

not Evil is not the only search engine of its kind. There are also LOOK (the default search in the Tor browser, also available from the regular Internet), TORCH (one of the oldest search engines on the Tor network) and others. We settled on not Evil because of its unmistakable allusion to Google (just look at the start page).

What makes it better

It searches where Google, Yandex and other search engines have no access at all.

Why do you need this

There are many resources on the Tor network that cannot be found on the law-abiding Internet. And their number will grow as the control of the authorities over the contents of the Web tightens. Tor is a kind of network within the Web with its social networks, torrent trackers, media, trading platforms, blogs, libraries and so on.

3. YaCy

What is it

YaCy is a decentralized search engine that works on the principle of P2P networks. Each computer running the main software module crawls the Internet on its own, that is, it acts as a search robot. The results are collected in a common database used by all YaCy participants.
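The idea of combining what independent peers have indexed can be sketched as merging their keyword indexes. This is an illustration under simplifying assumptions: it builds one combined dictionary, whereas a real peer-to-peer engine typically spreads the shared index across the peers themselves; the peer data is hypothetical.

```python
from collections import defaultdict


def merge_indexes(node_indexes):
    """Combine keyword -> addresses indexes built independently by several peers."""
    shared = defaultdict(set)
    for node_index in node_indexes:
        for keyword, addresses in node_index.items():
            shared[keyword] |= set(addresses)
    return dict(shared)


# Hypothetical indexes produced by two peers that crawled different sites.
peer_a = {"yacy": {"http://a.example/1"}, "p2p": {"http://a.example/1"}}
peer_b = {"yacy": {"http://b.example/7"}, "search": {"http://b.example/7"}}

shared = merge_indexes([peer_a, peer_b])
print(sorted(shared["yacy"]))  # ['http://a.example/1', 'http://b.example/7']
```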

What makes it better

It is hard to say whether this is better or worse, since YaCy takes a completely different approach to organizing search. With no single server and no owning company, the results are completely independent of anyone's preferences. The autonomy of each node rules out censorship. YaCy can search the deep web and non-indexed public networks.

Why do you need this

If you support open source and a free Internet not influenced by government agencies and large corporations, YaCy is your choice. It can also be used to organize search within a corporate or other autonomous network. And although YaCy is not very useful in everyday life, it is a worthy alternative to Google in terms of how the search process is organized.

4. Pipl

What is it

Pipl is a system designed to search for information about a specific person.

What makes it better

The authors of Pipl claim that their specialized algorithms search more efficiently than "regular" search engines. In particular, priority is given to social network profiles, comments, membership lists, and various databases where information about people is published, such as databases of court decisions. Pipl's leadership in this area is confirmed by TechCrunch and other publications.

Why do you need this

If you need to find information about a person living in the US, Pipl will be much more effective than Google. Databases of Russian courts are apparently inaccessible to the search engine, so it does not cope as well with Russian citizens.

5. FindSounds

What is it

FindSounds is another specialized search engine. It searches open sources for various sounds: household, nature, cars, people and so on. The service does not support queries in Russian, but it has an impressive list of Russian-language tags that you can search by.

What makes it better

The results contain only sounds and nothing else. In the settings, you can set the desired format and sound quality. All the sounds found are available for download. Search by sample (pattern) is also available.

Why do you need this

If you need to quickly find the sound of a musket shot, the tapping of a sapsucker woodpecker or Homer Simpson's cry, this service is for you. And we picked these only from the available Russian-language queries; in English, the range is even wider.

Seriously, a specialized service implies a specialized audience. But will it come in handy for you too?

6. Wolfram|Alpha

What is it

Wolfram|Alpha is a computational search engine. Instead of links to articles containing keywords, it gives a ready-made answer to the user's query. For example, if you enter in the search form "compare the population of New York and San Francisco" in English, then Wolfram|Alpha will immediately display tables and graphs with a comparison.

What makes it better

This service is better than others for finding facts and calculating data. Wolfram|Alpha accumulates and systematizes knowledge available on the Web from various areas, including science, culture and entertainment. If this database contains a ready answer to the query, the system shows it; if not, it calculates and displays the result. Either way, the user sees only the answer and nothing more.

Why do you need this

If you are, for example, a student, analyst, journalist, or researcher, you can use Wolfram|Alpha to find and calculate data related to your activities. The service does not understand all requests, but is constantly evolving and becoming smarter.

7. Dogpile

What is it

The Dogpile metasearch engine displays a combined list of results from Google, Yahoo and other popular search engines.

What makes it better

First, Dogpile displays fewer ads. Second, the service uses a special algorithm to find and display the best results from different search engines. According to Dogpile's developers, their system produces the most complete results on the entire Internet.
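Dogpile's own merging algorithm is not described here, so the sketch below uses a different, well-known technique, reciprocal rank fusion, to show how ranked lists from several engines can be combined into one. The engine results are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of URLs into one list.

    Each URL scores 1 / (k + rank) in every list where it appears; the scores are
    summed, so URLs ranked highly by several engines rise to the top.
    k=60 is a commonly used default for this technique.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical answers from two engines to the same query.
engine_1 = ["http://example.org/a", "http://example.org/b", "http://example.org/c"]
engine_2 = ["http://example.org/b", "http://example.org/d"]
print(reciprocal_rank_fusion([engine_1, engine_2]))
```

Page b comes first because both engines return it, even though only one engine ranks it at the top.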

Why do you need this

If you can't find information on Google or another standard search engine, look it up in several search engines at once using Dogpile.

8. BoardReader

What is it

BoardReader is a text search system for forums, Q&A services and other communities.

What makes it better

The service lets you narrow the search to social sites. Special filters help you quickly find posts and comments that match your criteria: language, publication date and site name.

Why do you need this

BoardReader can be useful for PR specialists and other media professionals interested in public opinion on certain issues.


The life of alternative search engines is often short. Lifehacker asked Sergey Petrenko, former CEO of the Ukrainian branch of Yandex, about the long-term prospects of such projects.

Sergey Petrenko

Former CEO of Yandex.Ukraine.

As for the fate of alternative search engines, it is simple: they are bound to remain very niche projects with a small audience, and therefore either without clear commercial prospects or, conversely, with complete clarity that there are none.

If you look at the examples in the article, you can see that such search engines either specialize in a narrow but in-demand niche that, perhaps only so far, has not grown enough to show up on Google's or Yandex's radar, or test an original ranking hypothesis that is not yet applicable in conventional search.

For example, if search over Tor suddenly turns out to be in demand, that is, if even a percent of Google's audience needs results from there, then ordinary search engines will of course start working on how to find them and show them to the user. If audience behavior shows that a significant share of users, for a significant number of queries, find results more relevant when they are produced without factors that depend on the user, then Yandex or Google will start returning such results.

"To be better" in the context of this article does not mean "to be better at everything". Yes, in many aspects our heroes are far from Yandex (even far from Bing). But each of these services gives the user something that the giants of the search industry cannot offer. Surely you also know similar projects. Share with us - let's discuss.

Searching for information on the Internet

Three methods are commonly used to search for information (see Fig. 1). The first is search by address. It is used when the user knows the address of an information resource that contains the needed information. To search by address (whether the address is given as an IP address, a domain name or a URL does not matter here), the user simply enters the address of the resource in the appropriate field of the browser, a program designed to provide access to network resources.

Fig. 1. Ways to search for information in hypertext databases

The second method is search by navigating hyperlinks. With this type of search, the user first accesses the server associated with the corresponding database and then finds the document by following hyperlinks. Obviously, this method is convenient when the user does not know the address of the resource. Web portals are intended to serve as starting points for this method: servers that provide direct access to a certain set of servers, including the information resources installed on them, as well as Web applications implementing Web services that match the purpose of the portal. The servers accessible through a portal may belong to a specific (for example, corporate) system or to various systems, and may be specially selected according to the subject or other features of the documents and data on their sites. Portals typically combine a variety of functions in order to keep the client as long as possible. Reference services dominate on portals: search, rubricators, financial indices, weather information, etc. While Web sites are mostly collections of static Web pages, portals are collections of software tools and initially unstructured information that these tools turn into structured data at the request of specific users.

The third method involves Internet search servers. Search servers are dedicated hosts, computers that hold databases of Internet resources. The user interface of such a server has a field for entering keywords that describe the topic the user is interested in (see Fig. 2).

Fig.2. View of the Yandex search server window

The server treats these words as an information request, searches for resources accordingly and presents the user with a list of the documents found. Obviously, this method is prone to errors of both the 1st kind (missing the target) and the 2nd kind (information noise).

Two groups of search servers are distinguished: search engines and subject directories. They differ in how the database of Internet resources, in which the server performs information retrieval, is created and then replenished. Search engines have a special program, a search robot, that constantly monitors the network, collects information from Web pages, indexes them and records their search image in its database. In subject directories, the database of Internet documents is formed "manually" by specialist editors.

Since the Internet has no single administration, its information resources are constantly changing: new documents appear and existing ones disappear. How often documents are updated varies from site to site: several times an hour for some, once a day, a week or a month for others. It is therefore important to understand that when information retrieval systems are used to find information on the Internet, the search is carried out not in the real space of Web documents but in a model of it, whose content may differ significantly from the actual content of the Internet at the time of the search.

By the coverage of indexed resources, search engines can be divided into two groups: international and Russian-language. The former index all documents published on the Internet indiscriminately; the latter index resources in domain zones where the Russian language predominates. The most popular systems are listed in Table 1.
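The two kinds of errors can be quantified with the standard information-retrieval measures precision and recall (terms not used in the text above). A minimal sketch, assuming the set of truly relevant documents is known, which in practice it rarely is; the document names are hypothetical.

```python
def search_quality(found, relevant):
    """Compare the documents a search returned with the documents that were actually relevant."""
    found, relevant = set(found), set(relevant)
    hits = found & relevant
    missed = relevant - found  # errors of the 1st kind: relevant documents not returned
    noise = found - relevant   # errors of the 2nd kind: irrelevant documents returned
    precision = len(hits) / len(found) if found else 0.0     # share of results that are useful
    recall = len(hits) / len(relevant) if relevant else 0.0  # share of useful documents found
    return precision, recall, missed, noise


precision, recall, missed, noise = search_quality(
    found=["d1", "d2", "d3", "d4"], relevant=["d1", "d2", "d5"]
)
print(precision, round(recall, 2), sorted(missed), sorted(noise))
# 0.5 0.67 ['d5'] ['d3', 'd4']
```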

Table 1. Most popular search engines

International    Russian-language
Google           Yandex (44.4% of Runet)
Yahoo!           Rambler (10.6% of Runet)
Bing             (7.3% of Runet)
MSN              Nigma (0.5% of Runet)
AltaVista        (0.3% of Runet)
Ask              Aport (0.2% of Runet)

Note: Runet is the Russian-speaking part of the Internet, made up of the .ru and .рф domains.

There is also a special category of search engines: metasearch engines. They differ fundamentally from search engines and subject directories in that they have no index database of their own; instead, on receiving a user's query, they forward it to several search servers at once (see Fig. 3).

Fig. 3. Scheme of a metasearch system

The obvious advantage of metasearch engines is the ability to use several search engines for a single query at once. The system whose interface is shown in Fig. 4 is currently widely used; it lets you search for resources with both international and Russian-language search servers.
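Forwarding one query to several search servers at once can be sketched with a thread pool from Python's standard library. The engines here are hypothetical local functions standing in for network requests; a real metasearch engine would also have to handle timeouts, errors and the merging of the answers.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search servers; a real metasearch engine
# would send an HTTP request to each server and parse its result page.
def engine_a(query):
    return [f"http://a.example/{query}/1", f"http://a.example/{query}/2"]


def engine_b(query):
    return [f"http://b.example/{query}/1"]


ENGINES = {"engine_a": engine_a, "engine_b": engine_b}


def metasearch(query, engines=ENGINES):
    """Send the same query to every engine concurrently and collect the answers by engine name."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(search, query) for name, search in engines.items()}
        return {name: future.result() for name, future in futures.items()}


print(metasearch("weather"))
```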

General information.

Currently, the Internet unites hundreds of millions of servers that host billions of different sites and individual files containing various kinds of information. It's a giant repository of information. There are various methods of searching for information on the Internet.

Search by a known address. The required addresses are taken from directories. If you know the address, simply enter it in the browser's address bar. - server of Russian state authorities.

Address construction by the user. Knowing the Internet address generation system, you can construct addresses when searching for Web sites.

Add a thematic or geographical domain to a keyword (the name of a company, enterprise or organization, or a simple English noun), and use your intuition.

Commercial Web site addresses: (CNN World News), (SONY), (MTV music news).

Addresses of educational institutions: (US National University).

Regional server addresses: (Poland), (Israel).
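Constructing addresses from a keyword and a domain can be automated as a list of guesses to try. A minimal sketch; the www.<keyword>.<zone> pattern and the default zones are assumptions, and a generated address may not exist at all.

```python
def candidate_addresses(keyword, zones=("com", "org", "edu", "ru")):
    """Build guesses of the form http://www.<keyword>.<zone> for a name or noun.

    The default zones are an assumption; add geographic zones such as "pl" or "il" as needed.
    The guesses are not checked: each one still has to be tried in a browser.
    """
    name = keyword.strip().lower().replace(" ", "")
    return [f"http://www.{name}.{zone}" for zone in zones]


print(candidate_addresses("Sony"))
# ['http://www.sony.com', 'http://www.sony.org', 'http://www.sony.edu', 'http://www.sony.ru']
print(candidate_addresses("MTV", zones=("com",)))  # ['http://www.mtv.com']
```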

Internet search engines

To search for information on the Internet, special information retrieval systems have been developed. Search engines have a regular address and are displayed as a Web page containing special tools for organizing search (search string, subject catalog, links). To call a search engine, just enter its address in the address bar of the Browser.

According to the method of organizing information, information retrieval systems are divided into two types: classification (rubricators) and dictionary.

Rubricators (classifiers) are search engines that use a hierarchical (tree-like) organization of information. When searching, the user looks through thematic headings, gradually narrowing the search field (for example, to find the meaning of a word, you first find a dictionary in the classifier and then look up the word in it).

Dictionary search systems are powerful automatic hardware and software systems that view (scan) information on the Internet. Data on where particular information is located is entered into special index reference books. In response to a request, a search is performed according to the query string, and the user is offered the addresses (URLs) where the searched word or group of words was found at the time of scanning. By selecting any of the proposed links, you can go to the found document. Most modern search engines are mixed.

The most famous and popular search engines:

There are systems that specialize in searching for information resources in particular areas.

Search for people on the Internet: www.

Search by newsgroups (Usenet):

Subject search engines:

Search software:

Search in file archives:, http://ftpsearch.

Catalogs (thematic collections of links with annotations):

Often, an effective search for information can be carried out using regional catalogs - specialized servers containing data about enterprises or Web resources of a city or region. For example, for St. Petersburg, such a catalog is located at

A list of IPS can be found at www.monk.

A more detailed list of search engines and directories is presented in Table. 3.2.

Query Execution Rules

In each search engine, in the Help section, you can get information on how to search, how to compose a query string. Below is information about a typical, "average" query language.

Simple request.

Enter one word that defines the search topic. For example, in the search engine, it is enough to enter: automation.

The search finds documents that contain the words specified in the query. All forms of Russian words are recognized, and as a rule the case of letters is ignored.

You can use the characters "*" and "?" in a query. In a keyword, "?" replaces exactly one character (any letter may stand in its place), and "*" replaces a sequence of characters.

For example, the query automat* will find documents that include the words automatic, automation, and so on.
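The * and ? wildcards behave like shell-style patterns, so Python's standard fnmatch module can illustrate them. This is an analogy, not how any particular search engine implements wildcards; the word list is hypothetical.

```python
import fnmatch


def wildcard_search(pattern, words):
    """Return the words that match a pattern in which ? is one character and * any sequence.

    Note: fnmatch also treats [...] as a character class, which the query
    language described above does not mention.
    """
    pattern = pattern.lower()
    return [w for w in words if fnmatch.fnmatchcase(w.lower(), pattern)]


vocabulary = ["automatic", "automation", "automaton", "autumn"]
print(wildcard_search("automat*", vocabulary))         # ['automatic', 'automation', 'automaton']
print(wildcard_search("c?t", ["cat", "cut", "cart"]))  # ['cat', 'cut']
```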

Complex request.

Often there is a need to combine keywords to get more specific information. In this case, additional linking words, functions, operators, symbols, combinations of operators separated by brackets are used.

For example, the query music & (битлз | beatles) means that the user is looking for documents containing the word music together with either the Russian spelling битлз or the English spelling beatles.

Table 3.1 shows the query rules adopted in the Aport system.

Table 3.1

Operators for Forming Requests

and (synonyms: AND, &): the query finds documents containing both keywords. The operator may be omitted; for example, the query computer science and textbook is equivalent to computer science textbook.
or (synonyms: OR, |): finds documents that contain either of the specified words or both.
not (synonyms: NOT, -, ~): limits the search to documents that do not contain the word following the operator.
" ": double or single quotes let you search for an exact phrase.
date= (synonym: date:): limits the search to documents dated within the specified interval. Example 1: currency date=01/02/2002-01/03/2002 returns documents that contain the word "currency" and are dated between February 1, 2002 and March 1, 2002. Example 2: date=01/03/2002 currency. Example 3: date:<02/03/2002 currency.
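The date= operator can be sketched as a filter over document dates. This assumes, following Example 1, that dates are written day/month/year, and that date:< means strictly before the given day; the documents are hypothetical.

```python
from datetime import date, datetime, timedelta


def parse_day(text):
    """Parse a date written day/month/year (01/02/2002 = 1 February 2002)."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def date_bounds(spec):
    """Turn 'D1-D2', '<D' or 'D' into an inclusive (start, end) pair of dates."""
    if "-" in spec:
        first, last = spec.split("-")
        return parse_day(first), parse_day(last)
    if spec.startswith("<"):
        # Assumption: '<' means strictly before the given day.
        return date.min, parse_day(spec[1:]) - timedelta(days=1)
    day = parse_day(spec)
    return day, day


def filter_by_date(documents, spec):
    """Keep the addresses of (address, date) pairs that fall within the interval."""
    start, end = date_bounds(spec)
    return [address for address, published in documents if start <= published <= end]


documents = [
    ("http://example.org/a", date(2002, 2, 15)),
    ("http://example.org/b", date(2002, 3, 5)),
]
print(filter_by_date(documents, "01/02/2002-01/03/2002"))  # ['http://example.org/a']
print(filter_by_date(documents, "<02/03/2002"))            # ['http://example.org/a']
```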

Table 3.2

List of search servers and directories

Search engine with site reviews and guides
Search server with advanced search capabilities
Search server
Regional search servers of Poland and Israel
Search server (easy to use)
Internet Public Library, a public library operating as part of the World Village project
WiseWire: search organized using artificial intelligence
WebCrawler: search server, easy to use
Web catalog and interface for full-text search on the AltaVista server
Aport: Russian-language search server
Yandex: Russian-language search server
Rambler: Russian-language search server
Internet help resources
Internet Yellow Pages
monk. Search engines of various profiles
Top 200 Web sites
Catalog of Russian Internet resources
htm Educational resources
Russian student server
asp Distance Learning Center
ac. UK Open University (UK)
US National University
Electronic text translator
library.html List of links to online libraries
Scientific electronic library
E-library
Psychological tests
Internet Education Federation website
www.method. Educational resources
www.spb. Distance learning on the Internet
Exams and tests
Computer science textbook
Mega. Encyclopedias and dictionaries

Searching for information on the Internet: pitfalls

Problems that do not lie on the surface often make themselves felt only in retrospect, after a certain stage of search work has been completed and, perhaps, a decision has already been made based on its results. What prevents the situation from being transparent from the very start of working with a particular information retrieval system (IPS)? The answer is quite simple: developers do not provide comprehensive information of this kind. The direct consequence is unreliable results and uncontrolled loss of data. It is rare to find a search engine on the Web that has no "undocumented" features. And yet the user does not need that much information, namely:

how the IPS database is populated and how large it is;

the full range of capabilities of the system's query language;

the main features of how search results are presented, primarily the algorithm for ranking the records in the list of responses to a query.

Alas, such information usually comes not from a document linked from the search server's main page, but from publications by individual authors scattered over the Web, books and computer magazines. The reasons for this state of affairs apparently include not only developer negligence but also a factor called marketing policy: simply put, publishing the most complete information about a search engine does not always improve its rating. Nevertheless, in some cases the user is quite capable of taking control of the situation. The features of a chosen search service can often be found out by testing, although building special test queries that quickly clarify exactly the aspect of the system's operation that matters most for the current task turns out to be non-trivial in many cases. The discussion below is devoted to avoiding some of the troubles of working with an IPS, using widely known Internet search engines as examples.