How Google works on its algorithm

Google, the world's most visited website, continually fine-tunes the algorithm that ranks pages in its search results. An article by Saul Hansell in the New York Times brings out two interesting points. First, PageRank, the well-known popularity index, is just one of many evaluation criteria. Second, the algorithm itself is not fixed: Google teams constantly analyze the results in order to correct it and control how pages are rated.
This explains to webmasters why their site sometimes jumps ahead in the list of results and sometimes sinks into the depths of the rankings for no apparent reason, independently of the so-called sandbox penalty.
The journalist had the opportunity to spend a day with the Google engineers directly involved in developing the algorithm, in their working environment, and to sit in on one of their working meetings.
We follow the indicators
The team's work is driven by complaints from companies whose sites are ranked poorly for no apparent reason, and by the team's own analysis of the results. It should be noted that each of the 10,000 employees has access to "Buganizer," a tool for reporting problems encountered during a search, and that all of these reports are sent to the algorithm team.
For example, the team noticed that a search for "French Revolution" returned election articles, because the candidates talked about "revolution"! The fix in this case was simply to give more weight to pages where the two words are used together as a phrase.
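The idea of favoring a phrase match over scattered terms can be illustrated with a toy scorer. This is only a sketch under stated assumptions, not Google's method: the function name `phrase_score` and the `phrase_boost` value are hypothetical.

```python
def phrase_score(query: str, text: str, phrase_boost: float = 3.0) -> float:
    """Toy relevance score (hypothetical rule, for illustration only).

    One point per query term found in the text, plus a bonus for every
    place where the query terms appear consecutively and in order.
    """
    terms = query.lower().split()
    words = text.lower().split()
    if not terms:
        return 0.0
    # Base score: each query term that appears anywhere in the text.
    score = float(sum(1 for t in terms if t in words))
    # Phrase hits: positions where the whole query appears as a phrase.
    n = len(terms)
    phrase_hits = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == terms
    )
    return score + phrase_boost * phrase_hits


history = "the french revolution began in 1789"
election = "the french candidate promised a revolution in policy"

print(phrase_score("French Revolution", history))   # 5.0
print(phrase_score("French Revolution", election))  # 2.0
```

Both texts contain both words, but only the first contains them as a phrase, so it scores higher.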
The tools we use
The team has a special tool called Debug that shows how the computers rate each query and each web page. It lets engineers see what importance the algorithm assigns to the links on a page and correct that weighting if necessary.
Once the problem is identified, a new mathematical formula is developed to handle that specific case and is incorporated into the algorithm.
Betting on models
In addition to PageRank and other signals, the algorithm uses several models.
- Language models: the ability to understand phrases, synonyms, accents, spelling errors, etc.
- Query models: it is not only about the language, but also about how people use it today.
- Time models: some queries are best answered by a page that is 30 minutes old, and others by a page that has stood the test of time.
- Personalized models: not all people are looking for the same thing (with the same words, Ed.).
(See links below).
Novelty is a dilemma
The most important question for the development team is the question of freshness. Should we prioritize newer pages that probably better reflect current events, or, conversely, older ones that have already demonstrated their quality, especially through the number of backlinks?
Google has always preferred the latter, but it recently realized that this is not always the right choice. It therefore had to develop a new algorithm that determines when a user needs fresh information and when, on the contrary, stable results are better. This is called the QDF formula, for "Query Deserves Freshness."
We can determine that a topic is hot when blogs start talking about it, or when there is a sudden influx of queries on a topic.
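The reasoning can be sketched in a few lines: detect a sudden influx of queries, then weight freshness more heavily only for hot topics. This is a minimal illustration, not the actual QDF formula, which is not public. The function names, the spike threshold, and the blending weights are all assumptions.

```python
from statistics import mean


def is_hot(hourly_counts: list[int], recent_hours: int = 3,
           spike_ratio: float = 3.0) -> bool:
    """Flag a topic as hot when recent query volume far exceeds its baseline.

    The window size and ratio are arbitrary illustrative values.
    """
    if len(hourly_counts) <= recent_hours:
        return False  # not enough history to establish a baseline
    baseline = mean(hourly_counts[:-recent_hours])
    recent = mean(hourly_counts[-recent_hours:])
    return recent >= spike_ratio * max(baseline, 1.0)


def blended_score(authority: float, freshness: float, hot: bool) -> float:
    """Mix authority and freshness; freshness dominates only for hot topics."""
    weight = 0.7 if hot else 0.1  # hypothetical weights
    return (1 - weight) * authority + weight * freshness


old_page = {"authority": 0.9, "freshness": 0.1}   # many backlinks, old
new_page = {"authority": 0.3, "freshness": 0.95}  # few backlinks, recent

for counts in ([10, 12, 9, 11, 10, 12, 11], [10, 12, 9, 11, 40, 55, 60]):
    hot = is_hot(counts)
    old = blended_score(old_page["authority"], old_page["freshness"], hot)
    new = blended_score(new_page["authority"], new_page["freshness"], hot)
    print(hot, "new page first" if new > old else "old page first")
```

With a flat query history the established page wins; when the last few hours show a spike, the recent page moves ahead.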
Snippets must be created
One group works on snippets. The aim is to improve the presentation of results by extracting information from a site and displaying it, so that users learn about the site before they click on the link.
Maintaining a giant index
Google has hundreds of thousands of computers to index billions of pages from websites all over the world. The goal, despite the constant addition of new pages, is to be able to update the entire index within a few days!
It is important to know that data centers store copies of all web pages so that they can be accessed more quickly.
Adding new signals to PageRank
PageRank, developed in the early days of the company by Larry Page and Sergey Brin, is an indicator based on the number of links pointing to a page, which is taken as a guarantee of its quality. But it is outdated in many ways. Google now uses 200 criteria, which it calls "signals." These depend on the content of the page and its evolution, on queries, on the behavior of visitors... all of which is described in detail in the PageRank and Sandbox patent.
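For readers who want to see the link-based idea in concrete form, here is a minimal sketch of the PageRank computation as it is described in the published literature (power iteration with a damping factor). It is not Google's production code; the toy graph, the function name, and the parameter values are assumptions chosen for illustration.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration.

    links maps each page to the pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank everywhere.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Toy graph: "a" and "b" both link to "c"; "c" links back to "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, "c" ends up with the highest rank because two pages link to it, and "b" with the lowest because nothing links to it.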
Along with signals about pages and their history, Google uses classifiers on queries, whose purpose is to reconstruct the context of the search, the frame in which it is placed. For example, is the user looking for an item to buy, or trying to find something out?
"The most famous element of our ranking is PageRank, an algorithm developed by Larry Page and Sergey Brin, who founded Google. PageRank is still in use today, but is now part of a larger system."
The post that is the source of this quote (see below) tells us that PageRank was changed in January 2008, so it is not immutable!
The drive for diversity in results
Once the pages are selected and ranked, some must occupy the top ten positions, the most valuable ones, but that is not all. Google also wants variety in the results. To represent other kinds of sites, for example blogs and commercial sites, pages with a lower rating are also brought up to the top of the ranking, with the first in each category being promoted.
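One simple way to picture this promotion step: reserve a slot on the first page for the best result of each category, then fill the remaining slots by score. This is a hypothetical sketch, since the article does not describe the actual mechanism; the function name `diversify` and the category labels are assumptions.

```python
def diversify(results: list[tuple[str, str, float]],
              top_n: int = 10) -> list[str]:
    """Return the URLs for the first page of results.

    results holds (url, category, score) tuples. The best-scoring result
    of each category is guaranteed a slot (up to top_n categories); the
    remaining slots go to the highest-scoring results left over.
    """
    ranked = sorted(results, key=lambda r: r[2], reverse=True)
    leaders = []
    seen = set()
    for item in ranked:
        if item[1] not in seen:
            seen.add(item[1])
            leaders.append(item)
    leaders = leaders[:top_n]
    chosen = {url for url, _, _ in leaders}
    fill = [r for r in ranked if r[0] not in chosen][:top_n - len(leaders)]
    page = sorted(leaders + fill, key=lambda r: r[2], reverse=True)
    return [url for url, _, _ in page]


candidates = [
    ("shop1", "commercial", 0.9),
    ("shop2", "commercial", 0.8),
    ("shop3", "commercial", 0.7),
    ("blog1", "blog", 0.4),
]
print(diversify(candidates, top_n=3))  # ['shop1', 'shop2', 'blog1']
```

By score alone, the three commercial pages would fill the page; with the promotion step, the best blog page displaces the third commercial page.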
We always improve the algorithm
Some groups work on improving the algorithm, while others work on evaluating the results. The quality of the algorithm's responses is evaluated in real time to test their relevance, in particular to check improvements once they have been made. The task of the statisticians is to measure the quality of the results.
One group is devoted to spam and all types of abuse, such as hidden text. This "webspam" group, we learn, works in conjunction with the Google Webmaster Central group, which provides help and tools to webmasters.
And it doesn't tell us everything yet...
Google's methods seem quite academic, with its signals and classifiers, compared to competitors like Microsoft, which uses neural networks. But we don't know everything. Google still keeps a lot of secrets, not wanting to reveal all its techniques to competitors.
References: Official Google Blog (in English). Some of the information above comes from a New York Times article by Saul Hansell.