Translation of the patent describing the Panda algorithm
The details of this algorithm and how it operates to select and punish pages are fully described in patent 8,682,892.
This document is a double translation of the text filed with the USPTO and approved: on the one hand into French, on the other hand into plain language. The original is written in English, in a legal dialect that avoids pronouns and therefore repeats the same noun phrases over and over.

The method described in the patent applies only to pages already selected in response to a search query. It modifies their initial ranking by a change factor that is assigned to the whole site and determined by a ratio.
To compute this ratio, the method counts, on the one hand, the independent links pointing to the documents of the site (more precisely, of a group of documents) and, on the other hand, the search queries leading to documents in this group. From these two numbers it derives the change factor to apply to the initial score of each document, a score that itself depends on the other ranking criteria.
The method tends to penalize sites that get the bulk of their traffic from the search engine because their webmasters are better at SEO. It also favors newer pages, which are more likely to receive new links.
Google's patent 8,682,892, whose inventors are Navneet Panda and Vladimir Ofitserov, was filed on September 28, 2012 and issued on March 25, 2014.
Definition of terms
Resource: Web page, image, text document, multimedia content presented in search results.
Resource group: A set of related resources. Resources are grouped so that the same factor can be assigned to all of them and so that the independence of incoming links can be assessed. A group can be a domain or subdomain, or a set of domains belonging to the same owner.
Multiple resource groups: The collection of groups involved in the search (roughly speaking, sites), considered as separate and independent of each other.
Independent links: Links from a page to a resource are used to assess the usefulness of that resource. Links that come from a resource group other than the target's own group are defined as independent.
Change factor: A group is assigned a factor that changes the position of the group's results relative to a ranking score based on relevance and other signals. The same factor applies to all resources of the group. Thus, all pages of a site are penalized by the same factor before they are displayed.
Normalized change factor: After the change factor has been calculated for a group (possibly a site), it is adjusted against the factors of all groups belonging to the same partition.
Reference query: A query that refers to a resource in a group and is made by a single user. To count as a reference query, it must be made by a user who has not previously searched for other resources from the same group. The user is identified by a login or a cookie.
These queries enter into the calculation of the change factor, and therefore of the Panda penalty.
Navigational query: Some search queries are made by users looking for a specific site or page. They are recognized from data stored by the search engine that identifies this type of query.
Claims
The following is claimed:
- A method implemented by one or more computers, which includes:
determining
- for each of a plurality of resource groups, a corresponding count of independent incoming links to resources in the group;
- for each of the plurality of resource groups, a corresponding count of reference queries;
determining
- for each of the plurality of resource groups, a change factor specific to the group,
- where the factor is based on the count of independent links and the count of reference queries for the group;
and associating
- with each of the resource groups, the change factor for the group,
- where the group-specific change factor modifies the initial score generated for resources in the group in response to received search queries.
- The method also includes receiving a query from a user; obtaining data identifying multiple resources with an initial ranking score for each; and identifying the resource group of each resource. The initial score of each resource is adjusted by the change factor that applies to the group it belongs to. This produces a new score.
- The new score is used to obtain a ranking score for each resource. The results are presented to the user in the order given by the new scores.
- Before the results are presented, further adjustments are made to the ranking scores.
- The resource group of each resource is identified from the resource's URL in the index.
- To adjust an initial score with a change factor, a resource-specific change factor is generated for each resource from the change factor of its group.
- To adjust the initial score of each resource, the score is multiplied by the resource-specific change factor.
- When generating the resource-specific change factor for a first search result resource, it is determined that the query is a navigational query. In that case, the initial score is left unchanged.
- When generating the resource-specific change factor for a first search result resource, it is determined that the query is not a navigational query.
- It is determined that the initial score of the first search result resource does not exceed a first threshold. In that case, a change factor is generated for this resource that leaves the initial score unchanged.
- When generating a resource-specific change factor for a second search result resource, it is determined that the initial score exceeds the first threshold but does not exceed a second, higher threshold. If $T_1$ is the first threshold, $\mathrm{IS}$ the initial score and $M$ the group change factor, the change factor $f_1$ for the resource is:
$$f_1 = \frac{T_1 + (\mathrm{IS} - T_1)\,M}{\mathrm{IS}}$$
- When generating a resource-specific change factor for a third search result resource, it is determined that the initial score exceeds the second threshold, and a change factor $f_2$ is generated such that:
$$f_2 = \frac{f_3}{\log_{T_2}(\mathrm{IS})\; g(f_3)}$$
where $T_2$ is the second threshold,
$f_3$ is an initial resource-specific change factor,
and $g(f_3)$ is a mitigation function that reduces the effect of the change factor for specific ranges of values of the initial change factor.
- $f_3$ is evaluated according to the following formula:
$$f_3 = \frac{T_1 + (\mathrm{IS} - T_1)\,M}{\mathrm{IS}}$$
where $T_1$ is the first threshold, $\mathrm{IS}$ is the initial score and $M$ is the change factor for the group.
- The mitigation function is defined as:
$$g(f_3) = \begin{cases} 1 & \text{if } f_3 \le Q \\ \dfrac{1 - f_3}{1 - P} & \text{if } f_3 > Q \end{cases}$$
where Q is a threshold value.
- In the method of 1, an independent link for a resource group is a link from a source resource to a target resource, where the target is in that group and the source and target have been determined to be independent.
- In the method of 15, determining that the source and target are independent includes determining that they belong to different resource groups.
- The method of 15 includes determining that the source group and the target group are unlikely to be related.
- The method of 15 includes determining that the source resource is unlikely to be a duplicate of the target resource.
- In the method of 1, a reference query for a particular resource group is a previously submitted search query that has been classified as referring to a resource in that group.
- The method of 19 includes determining that the previously submitted query includes one or more terms that have been determined to refer to a resource in the group.
- In the method of 1, determining the change factor for the group includes determining an initial change factor for the group, which is the ratio of the number of independent links counted for the group to the number of reference queries counted for the group.
- The method of 21, for a particular resource group, includes: dividing the plurality of resource groups into partitions based on their reference query counts; and determining a normalized change factor for the particular group by normalizing its initial change factor against the initial change factors of the groups in the same partition.
- A system that performs the following: determining, for each of a plurality of resource groups, a count of independent incoming links to the group's resources; determining, for each group, a count of reference queries; determining, for each group, a change factor based on the independent link count and the reference query count; and associating each resource group with its group-specific change factor, so that this factor modifies the initial score of the group's resources.
- The system of 23 also receives a first search query from a user, obtains data identifying a plurality of search result resources with an initial score for each, determines the resource group to which each belongs, and adjusts each initial score by the group's change factor to produce a new score.
- The system of 24 uses the new scores to rank the resources that satisfy the query and presents the results according to that ranking.
- The system of 25 makes further adjustments to the ranking scores before presenting the results.
- In the system of 23, an independent link is a link from a source resource to a target resource, where the source and target have been determined to be independent.
- In the system of 27, determining that the source and target are independent includes determining that they belong to different resource groups.
- The system of 27 determines that the source group and the target group are unlikely to be related.
- The system of 27 determines that the source resource is unlikely to be a duplicate of the target resource.
- In the system of 23, a reference query for a resource group is a previously submitted query that has been classified as referring to a resource in that group.
- In the system of 31, the reference query is identified by its containing one or more terms that refer to the resource.
- In the system of 23, determining the group change factor includes determining an initial change factor, which is the ratio of the number of independent links counted for the group to the number of reference queries counted for the group.
- The system of 33 divides the resource groups into partitions based on their reference query counts, and determines a normalized change factor for a group by normalizing its initial change factor against the factors of the resource groups in its partition.
- (Claims 35 to 46 specify that all the steps described above are performed by a computer. Since legal language avoids pronouns, each claim is repeated in full, stating that it is carried out by the machine...)
Description
BACKGROUND
This specification concerns the ranking of search results for queries submitted to an Internet search engine.
The goal of search engines is to identify resources, that is, web pages, images, text documents, multimedia materials that are relevant to the needs of the user, and present information about resources in the most useful way for the user. Internet search engines typically return a set of search results, each identifying a resource, in response to a query sent by the user.
SUMMARY
(The summary repeats the points already made in the claims.)
The subject matter described in the specification can be implemented to achieve at least one of the following advantages:
- Search results that identify low-quality resources can be demoted in the order in which the search results are presented.
- The user experience can thus be improved, because the highest-ranked results better meet the user's information needs.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 shows an example of a search engine.
FIG. 2 is a flowchart of an example process for adjusting the initial score of a resource identified by the search engine for a received search query.
FIG. 3 is a flowchart of an example process for determining a change factor for a resource group.
FIG. 4 is a flowchart of an example process for determining normalized change factors for groups of resources.
FIG. 5 is a flowchart of an example process for generating a resource-specific change factor.
Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
(The detailed description goes back over the implementation of the method stated above, in terms that will be of interest only to lawyers in litigation. Those points are not reproduced in this translation, which is intended for webmasters. But the description also gives additional details, which we translate below.)
Figure 1 shows an example of a search engine, numbered 114. It is an example of an information retrieval system implemented as computer programs on one or more computers in one or more locations.
In some cases, the search engine 114 may be implemented on the user device 104, for example if the user installs an application that performs searches on the user device.
A resource group is a subset of the resources on the Internet. Groups can be defined in various ways. An address-based resource group is defined by the URLs of the resources in the group.
Resources are grouped so that none of them belong to more than one group. For example, a group can include all resources that can be accessed using a domain name. Thus, the group may include http://www.domaine.com/ressource1, http://www.domaine.com/ressource2, etc. regardless of when the resources become available to the search engine for their indexing.
Alternatively, the resource group can include every resource that can be accessed through a specific host name, of the form http://hôte.example.com/ressourceX. (Translator's note: a subdomain.)
Other address-based groupings are possible. A group can consist of only a portion of the resources of a domain or subdomain.
Alternatively, the resource group may span multiple domains or subdomains.
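The address-based groupings described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `group_by_host` and `group_by_domain` are hypothetical, and the domain rule (keeping the last two labels of the host name) is a simplifying assumption that gives wrong results for suffixes such as co.uk.

```python
from urllib.parse import urlsplit

def group_by_host(url: str) -> str:
    """Group key = full host name, so each subdomain is its own group."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host name in URL: {url!r}")
    return host

def group_by_domain(url: str) -> str:
    """Group key = last two labels of the host name (simplifying assumption)."""
    labels = group_by_host(url).split(".")
    return ".".join(labels[-2:])
```

Because each URL maps to exactly one key, no resource can end up in more than one group, which is the property the text requires.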
Fig. 2 is a flowchart of an example process (200) for adjusting the initial score of a resource.
The system receives data identifying the resource and its initial score. The initial score of the resource (before Panda; translator's note) may be a measure of the relevance of the resource to the query, a measure of the quality of the resource, or both.
The system identifies the address-based resource group to which the resource belongs (204), using its URL. The resources of a group can share the same domain or the same host name.
It accesses the change factor data (206). The database stores the change factors for all groups.
The system generates a resource-specific change factor (208). Typically, the system adjusts the group change factor according to one or more parameters of the query to produce it.
The system applies the resource-specific change factor to the initial score (210). The resource-specific change factor may be a multiplier applied to the initial score to obtain a modified score. (Translator's note: no other method is specified, so the "may" seems purely rhetorical.)
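The steps of this process can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `rerank` is hypothetical, the group change factor is used directly as the resource-specific factor (the query-dependent adjustment of step 208 is left out), and a group with no stored factor gets a neutral factor of 1.0, which the text does not specify.

```python
def rerank(results, group_of, group_factor):
    """Adjust initial scores by group factor and sort by the new score.

    results      -- list of (url, initial_score) pairs
    group_of     -- function mapping a URL to its resource group
    group_factor -- dict mapping a group to its change factor
    """
    adjusted = []
    for url, initial_score in results:
        # Assumption: groups without a stored factor are left untouched.
        factor = group_factor.get(group_of(url), 1.0)
        adjusted.append((url, initial_score * factor))
    # Highest modified score first.
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)
```

With a factor of 0.5 on its group, a page that initially scored 10 drops below a page scoring 8 from an unpenalized group.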
Fig. 3 is a flowchart of an example process (300) for determining a change factor for a resource group. It is performed for each group as a whole.
The system counts the independent links for the group (302). A link to a resource group is an incoming link whose target is a resource of the group. Links can be hyperlinks or implicit links. An implicit link is a reference to a target resource that is not a hyperlink, so the user cannot follow it.
A link is considered independent if the source and the target belong to different groups.
The system may have access to data indicating that two resource groups are related to each other, because they belong to the same entity, are hosted by the same entity, or were created by the same entity. In that case, the system considers that the resources of the two groups are not independent.
Another example: the system may have access to data indicating how similar two resources are in one or more respects, for example identical or similar content, images, formatting, style sheets, etc. If the data indicates that the two resources are sufficiently similar, the system concludes that they are not independent.
The system may also compute an independence score from the attribute values considered for the pair of resources, and classify the pair as independent if the score satisfies a criterion.
In some implementations, the system counts at most one independent link from each source group pointing to the target group. In others, if more than one link is found from the resources of one source group to the resources of the target group, the number of independent links counted for the target group may be a function of the total number of independent links found. It can be that total itself, its logarithm, or another function of it.
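The counting options described above can be compared with a small sketch. Everything here is illustrative: the function name `count_independent_links`, the representation of links as (source group, target group) pairs, and the `related` set of group pairs known to be connected are assumptions, as is the use of log(1 + n) in place of a bare logarithm so that a group with no links gets zero.

```python
import math
from collections import Counter

def count_independent_links(links, target_group, related=frozenset(),
                            mode="one_per_group"):
    """Count independent links pointing to target_group.

    links   -- iterable of (source_group, target_group) pairs
    related -- set of frozenset({group_a, group_b}) pairs treated as
               not independent (same owner, same host, ...)
    """
    per_source = Counter(
        src for src, dst in links
        if dst == target_group
        and src != target_group                   # different groups only
        and frozenset((src, dst)) not in related  # unrelated groups only
    )
    total = sum(per_source.values())
    if mode == "one_per_group":
        return float(len(per_source))   # at most one link per source group
    if mode == "total":
        return float(total)
    if mode == "log":
        return math.log(1 + total)      # assumption: log(1 + n), not log(n)
    raise ValueError(f"unknown mode: {mode!r}")
```

The "one_per_group" mode makes the count insensitive to sitewide links repeated on every page of a single source site.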
The system also counts the reference queries for the group. A query is considered to refer to a resource when it contains a term recognized as referring to that resource. This term can be a URL or part of a URL, for example "example.com." As another example, if the data indicates that "example sf" and "esf" are typically used by Internet users to refer to a resource whose URL is "http://www.example.com," then queries containing these terms, such as "example sf news" and "esf reviews," are considered reference queries for the group that includes the resource whose URL is "http://www.example.com."
A navigational query can also be counted as a reference query for a resource. It is a query made to reach a specific site or page (rather than a list of results; translator's note). The system checks it against a database that records queries of this kind.
In some implementations, the system counts as reference queries for a group only queries sent by unique users, that is, users who have not already submitted queries referring to resources of the same group. The system identifies users by conventional means, such as a cookie or a login. The uniqueness rule may apply over a limited period or without limit.
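A rough sketch of this counting, with the unique-user rule, is given below. The function name `count_reference_queries`, the query log format and the plain substring test are all assumptions made for the example; a real system would match terms far more carefully (the substring test would, for instance, also match "esf" inside a longer word).

```python
def count_reference_queries(query_log, group_terms):
    """Count reference queries per group, one per (user, group) pair.

    query_log   -- iterable of (user_id, query_text) pairs
    group_terms -- dict mapping a group to the set of terms known to
                   refer to its resources
    """
    seen = set()  # (user_id, group) pairs already counted
    counts = {group: 0 for group in group_terms}
    for user_id, query in query_log:
        text = query.lower()
        for group, terms in group_terms.items():
            refers = any(term in text for term in terms)
            if refers and (user_id, group) not in seen:
                seen.add((user_id, group))
                counts[group] += 1
    return counts
```

In this sketch a user who searches "example sf news" and later "esf reviews" contributes a single reference query to the group.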
The change factor can be the ratio of the number of independent links to the group to the number of reference queries for the group, according to the formula:
M = IL / RQ
where M is the change factor, IL is the number of independent links, and RQ is the number of reference queries.
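As a numeric illustration of this ratio, here is a minimal sketch. The function name `group_change_factor` is hypothetical, and returning a neutral factor of 1.0 when a group has no reference queries is an assumption, since the text does not say what happens in that case.

```python
def group_change_factor(independent_links: float,
                        reference_queries: float) -> float:
    """M = IL / RQ for one resource group."""
    if reference_queries <= 0:
        return 1.0  # assumption: no reference queries -> neutral factor
    return independent_links / reference_queries
```

A group with 50 independent links and 100 reference queries gets M = 0.5, while a group with 400 links for the same 100 queries gets M = 4: the fewer links a site has relative to the searches that name it, the lower its factor.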
In some implementations, instead of storing the coefficient of change for the group, this factor is normalized and the normalized factor is stored.
Fig. 4 is a flowchart of an example process (400) for determining normalized change factors for resource groups.
The system may partition the resource groups (402) by reference query count, so that each partition contains the resource groups whose counts fall within a given range.
This way, the system compares change factors only between groups that have a similar number of reference queries.
For each group in a partition, the system normalizes the change factor (404) against the factors of the other groups in the partition. For example, it can compute a statistical measure of those factors. This may be a measure of central tendency, such as the arithmetic, geometric or harmonic mean, the median, the mode, etc.
Or the statistical measure may be the minimum or maximum change factor.
The normalized change factor NM for a group in the partition can be expressed as:
$$NM = \frac{M - m}{m}$$
where M is the change factor for the group and m is the statistical measure.
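The partitioning and normalization steps can be sketched as follows. The choices made here are assumptions for the sake of the example: the function name `normalize_factors`, partitions defined by the number of decimal digits of the reference query count, the median as the statistical measure m, and strictly positive change factors so that m is never zero.

```python
from collections import defaultdict
from statistics import median

def normalize_factors(groups, stat=median):
    """Return NM = (M - m) / m for every group.

    groups -- dict mapping a group name to (reference_query_count, M)
    stat   -- statistical measure computed over each partition
    """
    # Assumption: one partition per order of magnitude of the count.
    partitions = defaultdict(list)
    for name, (rq_count, _m) in groups.items():
        partitions[len(str(int(rq_count)))].append(name)

    normalized = {}
    for names in partitions.values():
        m_stat = stat([groups[n][1] for n in names])
        for n in names:
            normalized[n] = (groups[n][1] - m_stat) / m_stat
    return normalized
```

A group whose factor equals the measure of its partition gets NM = 0; groups below it get a negative value and groups above it a positive one.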
Fig. 5 is a flowchart of an example process (500) for generating a resource-specific change factor. Process 500 may be performed for each of the resources returned in response to a query received from a user.
When the query received is a navigational query for a resource, the system sets the change factor to a value that leaves the initial score of that resource unchanged.
Otherwise, the system determines whether the initial score is below a first threshold. If so, the change factor is set so that the initial score does not change.
If the initial score is not below the first threshold, the system determines whether it is below a second threshold. If so, the system generates a first change factor to apply to the initial score.
For example, if the change factor is multiplicative, the first factor $f_1$ can be expressed as:
$$f_1 = \frac{T_1 + (\mathrm{IS} - T_1)\,M}{\mathrm{IS}}$$
where $T_1$ is the first threshold, $\mathrm{IS}$ is the initial score, and $M$ is the change factor for the resource group. The factor thus moves from 1 toward $M$ as the initial score rises: when $M$ is below 1, it falls further the higher the initial score.
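To see what this factor does, it helps to multiply it by the initial score. The short derivation below assumes, as the example does, that the factor is multiplicative, and reads the formula with the whole numerator over IS. The numeric values are illustrative only and do not come from the patent.

```latex
f_1 \cdot \mathrm{IS} = T_1 + (\mathrm{IS} - T_1)\,M
% The part of the score up to T_1 is kept; only the excess over T_1 is scaled by M.
% Illustrative values: T_1 = 10, IS = 20, M = 0.5
f_1 \cdot \mathrm{IS} = 10 + (20 - 10) \times 0.5 = 15,
\qquad f_1 = 15 / 20 = 0.75
```

At IS equal to the first threshold the factor is exactly 1, so the adjustment joins up smoothly with the unchanged scores below the threshold.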
If the initial score is not below the second threshold, the system generates a second change factor to apply to it. This factor may be based on the first.
For example, if the change factor is multiplicative, the second factor $f_2$ can be expressed as:
$$f_2 = \frac{f_1}{\log_{T_2}(\mathrm{IS})\; g(f_1)}$$
where $T_2$ is the second threshold and $g(f_1)$ is a mitigation function that reduces the effect of the second change factor on the initial score for specific ranges of the first change factor.
For example, the mitigation function may be defined so that, when the first change factor exceeds a threshold, the second change factor has a reduced effect on the initial score, or none at all.
In some implementations, the mitigation function is defined as a piecewise function, such as:
$$g(f_1) = \begin{cases} 1 & \text{if } f_1 \le Q \\ \dfrac{1 - f_1}{1 - P} & \text{if } f_1 > Q \end{cases}$$
where Q is a threshold value. In these implementations, if the value of $\log_{T_2}(\mathrm{IS})\, g(f_1)$ is less than 1, for example if $f_1$ equals 1 and the product is therefore zero, the system can set $f_2$ equal to $f_1$, so that $f_2$ is neither greater than $f_1$ nor undefined.
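The whole of process 500 can be summarized in a short sketch. It is a reading of the description rather than a reference implementation: the function name `resource_change_factor` and its parameter names are hypothetical, the first formula is read with the whole numerator over IS, the factor is assumed to be multiplicative, and P is treated as a free parameter because the text does not define it.

```python
import math

def resource_change_factor(initial_score, m, t1, t2, q, p,
                           navigational=False):
    """Resource-specific change factor.

    Assumes initial_score > 0, t2 > 1 and p != 1.
    m is the group change factor, t1 and t2 the two score thresholds,
    q and p the parameters of the mitigation function g.
    """
    # Navigational query or low initial score: leave the score unchanged.
    if navigational or initial_score < t1:
        return 1.0
    # First factor: only the part of the score above t1 is scaled by m.
    f1 = (t1 + (initial_score - t1) * m) / initial_score
    if initial_score < t2:
        return f1
    # Second factor: f1 divided by log_t2(IS) * g(f1).
    g = 1.0 if f1 <= q else (1.0 - f1) / (1.0 - p)
    denominator = math.log(initial_score, t2) * g
    if denominator < 1.0:
        return f1  # fallback keeps f2 from exceeding f1 or being undefined
    return f1 / denominator
```

For example, with t1 = 10, t2 = 100 and m = 0.5, a page scoring 20 gets a factor of 0.75, and the factor keeps shrinking for scores above the second threshold because of the logarithm in the denominator.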
(The following paragraphs are intended to include the legal terms necessary to apply the patent, specifying the material on which the method is implemented. Therefore, they are ignored.)
Commentary
All this can be seen as a refinement of PageRank, and we are quite far from the quality criteria that Google publishes. Only backlinks from distinct sources and searches by distinct users are counted, and their ratio is taken to determine the quality of a site! It all boils down to a mathematical formula that leaves a lot of room for approximation and that, in particular, demotes pages containing very accurate answers when they sit on little-known sites. The popularity of the site is assumed to be a measure of quality, and that measure is spread over all the pages and "resources" of the site...
Translated and commented by Denis Sureau on March 31, 2014. Reproduction on another site is prohibited, but reproduction in print is allowed.