FreshRank - Google Page Freshness Rating

In patent 7,346,839, granted on March 18, 2008, Google defines the principles by which a page is considered stale, and conversely when it is treated as a reference. The concept of a FreshRank is thus implicitly described.

The distinction matters because in the first case the page drops out of the top results in favor of pages considered more relevant, while in the second it keeps its position, backed by its track record and unaffected by the many blog posts on the same topic.

The patent is titled Information Retrieval Based on Historical Data.

According to the patent, the following factors are taken into account in this decision:

  1. Date the document was created.
    Or rather, since Google only knows the indexing date, the date on which the crawler first sees the page.
  2. Updates.
    The frequency and extent of updates determine whether a document, although old, is still considered current.
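As a rough illustration of how the frequency and extent of updates might be combined into one score, here is a minimal sketch; the exponential decay, half-life, and combining function are assumptions for illustration, not taken from the patent:

```python
def update_score(updates, half_life_days=180.0):
    """Score a document's update history.

    updates: list of (days_ago, fraction_changed) pairs, one per update.
    Recent, substantial updates count the most; old minor edits decay away.
    """
    return sum(frac * 0.5 ** (days_ago / half_life_days)
               for days_ago, frac in updates)
```

Under this sketch, a page rewritten last week scores higher than one last touched a year ago, even if both changed the same amount of content.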
  3. Query-based criteria.
    If a page is often selected from the results displayed for a query, its score increases. If it is considered stale but is nonetheless chosen by users, that assessment is revised.
    If a page matches a growing number of different queries, it is relevant. The opposite suggests that its content is becoming less relevant.
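A crude way to detect a falling selection rate, sketched under the assumption that weekly click counts are available; the split-in-half comparison is a hypothetical simplification:

```python
def selection_trend(weekly_clicks):
    """weekly_clicks: chronological counts of how often the page was
    picked from the results each week. Returns the ratio of the recent
    average to the older average: < 1 means the page is chosen less."""
    half = len(weekly_clicks) // 2
    older, recent = weekly_clicks[:half], weekly_clicks[half:]
    older_avg = sum(older) / len(older)
    recent_avg = sum(recent) / len(recent)
    return recent_avg / older_avg if older_avg else float("inf")
```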
  4. Link-based criteria.
    The timing of the appearance of new links and the disappearance of existing ones is taken into account. If new links appear less often, the page is considered stale. The same conclusion is drawn if the total number of backlinks gradually decreases.
    The algorithm weights backlinks by the freshness of the pages that carry them. That freshness depends on the same criteria as for the page being evaluated, so FreshRank is, in principle, recursive in the same way as PageRank.
    Other weighting criteria apply to links:
    - TrustRank
    - A large and sudden number of backlinks suggests spam: self-created links or a link ring set up to promote a document.
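The recursion can be sketched with a PageRank-style power iteration in which each page's freshness is fed by the freshness of the pages linking to it. The damping factor and iteration scheme below are borrowed from PageRank for illustration; the patent does not spell out the computation:

```python
def fresh_rank(links, base_freshness, damping=0.85, iters=20):
    """links: dict mapping each page to the list of pages it links to.
    base_freshness: dict mapping each page to an intrinsic freshness
    score in [0, 1] (e.g. derived from its update history).
    A backlink passes on weight scaled by the linking page's own
    freshness, so links from fresh pages are worth more."""
    rank = dict(base_freshness)
    for _ in range(iters):
        rank = {
            p: (1 - damping) * base_freshness[p]
               + damping * sum(rank[q] / len(outs)
                               for q, outs in links.items() if p in outs)
            for p in base_freshness
        }
    return rank
```

With this scheme, a page linked only from a fresh hub ends up ranked above an otherwise identical page linked only from a stale one.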
  5. Anchor text.
    If the anchor text of links pointing to a page changes, the page is likely being refreshed and kept current. Conversely, if the anchors do not change while the pages they point to do, the document is considered not updated.
  6. Traffic.
    Declining traffic to a web page suggests that it is out of date. The algorithm accounts for seasonal fluctuations. It also considers the ads on the page:
    - whether the advertising changes over time.
    - the importance of the site on which the advertising is placed.
    - the number of clicks on these ads.
    (Note: the patent does not say how this data is collected, but AdSense seems the most likely vector.)
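One simple way to separate a real decline from seasonal swings is to compare each month with the same month a year earlier. This year-over-year ratio is a hypothetical simplification of what the patent describes:

```python
def traffic_trend(monthly_visits):
    """monthly_visits: at least 24 chronological monthly totals.
    Returns the average year-over-year ratio of the last 12 months:
    values below 1.0 indicate a genuine, non-seasonal decline."""
    year_ago = monthly_visits[-24:-12]
    recent = monthly_visits[-12:]
    ratios = [r / y for r, y in zip(recent, year_ago) if y]
    return sum(ratios) / len(ratios)
```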
  7. User behavior.
    As mentioned above, this is mainly the number of times a page is selected in the results, but also the time visitors spend on it. If visitors spend less and less time on the page over time, its freshness score decreases.
    The same applies if they spend less time on it than on other pages on the same topic.
  8. Domain name.
    To counter spammers who create throwaway domains to host their pages, Google considers the legitimacy of the domain. Domains paid for several years in advance are seen as more "legitimate," so the expiration date is taken into account in the evaluation.
    Frequent changes of host (DNS) or of contact details make a document look illegitimate. Being hosted by a provider that manages many domains across different registrars increases a domain's legitimacy.
  9. Position history.
    The successive positions of a page in the rankings are taken into account; a sharp change in position for a given query suggests spam.
    If the total number of results for a query increases sharply, the topic is current, and the corresponding pages receive a higher rating going forward.
    If the increase concerns a single document, the algorithm must decide between spam and a genuinely hot topic. To do so, it considers links to the document from new articles and from discussion groups where spam is unlikely.
    In all of this, the exception is reference material that has long held a good position.
  10. Bookmarks.
    User-generated data is taken into account. Bookmarks are treated like backlinks; their number and evolution help judge the freshness of a page.
  11. Unique words, bigrams and phrases in anchors.
    A significant number of identical anchors across documents, or conversely entirely different anchors across many documents, indicates spam. A sudden explosion of unique words, bigrams, and phrases in anchors is a sign of coordination, and thus of spam.
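Detecting such an explosion might look like the sketch below, which flags an anchor phrase whose count in one period suddenly dwarfs its historical per-period average; the threshold is an arbitrary assumption:

```python
from collections import Counter

def anchor_spikes(anchor_batches, spike_ratio=5.0):
    """anchor_batches: chronological lists of anchor phrases observed
    in each period. Returns phrases whose count in some period exceeds
    spike_ratio times their historical per-period average."""
    history = Counter()
    flagged = set()
    for period, batch in enumerate(anchor_batches):
        for phrase, count in Counter(batch).items():
            if period > 0:
                baseline = history[phrase] / period
                if count > spike_ratio * max(baseline, 1.0):
                    flagged.add(phrase)
        history.update(batch)  # counts phrase occurrences in the list
    return flagged
```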
  12. Unrelated links.
    A sharp increase in the number of links between pages with unrelated content indicates spam. This is confirmed if it is accompanied by an increase in anchors pointing to unrelated or contradictory content.
  13. Document topics.
    The subject of a document can be known by:
    - Categorization.
    - URL analysis.
    - Content analysis.
    - Clustering.
    - Summarization.
    - Unique domain-specific keywords.
    - And others...
    If the topics change, the pages need to be re-evaluated. A spike in the number of different topics indicates an intent to spam.

Conclusion

Google's definition of stale pages fits in a single sentence:

Stale content refers to documents that have not been updated for a period of time and, thus, contain stale data.

As we have seen, the concrete application of that definition is somewhat more complicated.

However, the basic idea remains simple: a document such as the statement of June 18 will never become outdated, whereas comments on, say, coverage of the Olympics will eventually lose interest.
Google's algorithm is responsible for telling the difference.

Further information