Search Engine Anatomy: Google Infrastructure

The design is based on a description provided by Google founders Lawrence Page and Sergey Brin.

This diagram is a simplification. The system has evolved since 1998, and in particular the PageRank module must now be designed as several units that take various criteria into account, including user behavior.

Search Engine Architecture

Anatomy of the Google search engine - Reuse prohibited
(c) 2010 .com/.fr - No reuse on web page

Components

Crawlers	There are several of them, working in a distributed fashion. They fetch pages and find the links and keywords in them.
URL Server	Contains a list of URLs to scan.
Store server	The crawlers send the data they collect to the store server. It compresses the pages and places them, in that form, in the repository. Each stored page has an identifier, its docID.
Repository	Contains a copy of the pages and images, which allows comparisons and caching.
Indexer	It indexes the pages so that they can appear in the SERPs (search engine results pages). It decompresses the documents and converts them into sets of word occurrences called "hits". It distributes the hits among a set of barrels, which gives a partially sorted index. It also creates a list of the URLs found on each page. A hit contains the following information: the word, its position in the document, its font size, and its capitalization.
Barrels	These figurative barrels are databases that classify documents by docID. They are created by the indexer and used by the sorter.
Anchors	The anchors file, created by the indexer, contains the links found in the pages and the text associated with each link.
URL resolver	It reads the contents of the anchors file, converts relative URLs into absolute addresses, and finds or creates a docID for each of them. It builds the document index and the database of links.
Document Index	It contains the text associated with each URL.
Links	The database of links associates each link with a docID (and therefore with a document on the Web).
PageRank	This program uses the database of links to determine the PageRank of each page.
Sorter	It works on the barrels. It takes the documents classified by docID and creates an inverted list classified by wordID.
Lexicon	A program called DumpLexicon takes the list provided by the sorter (classified by wordID) together with the lexicon created by the indexer (the list of keywords on each page) and produces a new lexicon for the searcher.
Searcher	It runs on a web server in a data center. It uses the lexicon produced by DumpLexicon together with the index classified by wordID, takes PageRank into account, and produces a results page.
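
The path from documents to results described in the table can be illustrated with a small Python sketch. It is only a toy model under stated assumptions, not Google's implementation: the names Hit, index_document, build_forward_index, sort_barrels and search are hypothetical, documents are plain strings, the font size is a fixed placeholder, and the PageRank values are simply passed in as a dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One word occurrence, with the fields the table lists for a hit."""
    word: str          # the word, lowercased
    position: int      # token offset within the document
    font_size: int     # placeholder value (plain text carries no font data)
    capitalized: bool  # True if the occurrence starts with a capital letter


def index_document(text, font_size=1):
    """Indexer step: convert one document into a list of hits."""
    hits = []
    for position, token in enumerate(text.split()):
        word = token.strip(".,;:!?\"'()")
        if word:
            hits.append(Hit(word.lower(), position, font_size, word[0].isupper()))
    return hits


def build_forward_index(documents):
    """Build a docID-ordered index (the barrels) and a lexicon word -> wordID."""
    forward, lexicon = {}, {}
    for doc_id in sorted(documents):
        hits = index_document(documents[doc_id])
        for hit in hits:
            lexicon.setdefault(hit.word, len(lexicon))
        forward[doc_id] = hits
    return forward, lexicon


def sort_barrels(forward, lexicon):
    """Sorter step: invert the index, wordID -> list of (docID, position)."""
    inverted = defaultdict(list)
    for doc_id, hits in forward.items():
        for hit in hits:
            inverted[lexicon[hit.word]].append((doc_id, hit.position))
    return dict(sorted(inverted.items()))


def search(query, lexicon, inverted, page_rank):
    """Search step: docIDs containing every query word, highest rank first."""
    doc_sets = []
    for word in query.lower().split():
        word_id = lexicon.get(word)
        if word_id is None:
            return []  # an unknown word cannot match any document
        doc_sets.append({doc_id for doc_id, _ in inverted[word_id]})
    if not doc_sets:
        return []
    matches = set.intersection(*doc_sets)
    return sorted(matches, key=lambda d: (-page_rank.get(d, 0.0), d))


documents = {
    1: "Google indexes the Web",
    2: "The Web is large",
    3: "Crawlers fetch pages",
}
forward, lexicon = build_forward_index(documents)
inverted = sort_barrels(forward, lexicon)
print(search("the web", lexicon, inverted, {1: 0.2, 2: 0.5, 3: 0.3}))  # → [2, 1]
```

The sketch keeps the same division of labor as the table: the forward index is ordered by docID, the inverted index by wordID, and ranking is applied only at query time. A real engine would also use the position, font size and capitalization of each hit when scoring, which this example ignores.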

It goes without saying that this scheme is simplified. The system tends to grow more complex over time as features are added.
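
To make the link-handling components listed above more concrete, here is a minimal Python sketch of URL resolution, the database of links, and a PageRank computation. Several things are assumptions rather than facts from the source: the layout of the anchors file as (source URL, href, anchor text) tuples, the function names resolve_anchors and page_rank, the example.com URLs, and the choice of a normalized textbook variant of PageRank with a damping factor of 0.85 in which pages without outgoing links spread their rank evenly.

```python
from urllib.parse import urljoin


def resolve_anchors(anchors, doc_ids):
    """Make each link absolute and express it as a pair of docIDs.

    `anchors` holds (source_url, href, anchor_text) tuples, a hypothetical
    layout for the anchors file. `doc_ids` maps URL -> docID and is extended
    in place when a URL is seen for the first time.
    """
    links = []
    for source_url, href, _text in anchors:
        target_url = urljoin(source_url, href)  # relative -> absolute
        source_id = doc_ids.setdefault(source_url, len(doc_ids))
        target_id = doc_ids.setdefault(target_url, len(doc_ids))
        links.append((source_id, target_id))
    return links


def page_rank(links, damping=0.85, iterations=50):
    """Iteratively compute a normalized PageRank from (source, target) pairs."""
    nodes = sorted({node for link in links for node in link})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for source, target in links:
        out_links[source].append(target)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared by all pages.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank


anchors = [
    ("http://example.com/a/index.html", "page2.html", "next page"),
    ("http://example.com/a/index.html", "/b/", "section b"),
    ("http://example.com/a/page2.html", "index.html", "home"),
    ("http://example.com/b/", "../a/index.html", "back"),
]
doc_ids = {}
links = resolve_anchors(anchors, doc_ids)
print(links)  # → [(0, 1), (0, 2), (1, 0), (2, 0)]
ranks = page_rank(links)
print(max(ranks, key=ranks.get))  # → 0
```

In this example the page with docID 0 receives links from both other pages, so it ends up with the highest rank. The anchor text is ignored here, although the table notes that it is stored alongside each link.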

Links

The Anatomy of a Large-Scale Hypertextual Web Search Engine. Sergey Brin and Lawrence Page.
This paper, in English and hosted by Stanford, dates from 1998 and describes the operation of the search engine in detail. The present page is a summary of it and contains the new .com/.fr.

How search works
A very detailed explanation of how the Google search engine works (in English).

Icons are either created for Script or taken from a bank of free-use icons.