The training engine trains the ranking machine learning model on training data that includes multiple training examples. As described in this patent, a selection of a result document in response to a search query means selecting, from a result list of search results presented in response to the query, the search result that identifies that document. The result documents are the documents identified by the search results in the result list. To improve the quality of the ranking scores the model produces once trained, the training engine trains in a manner that accounts for the position, in the result list, of the search result that the searcher selected.
The training engine determines a respective importance value for each training example based on the position in the result list of the search result that the searcher selected in response to the search query in the training example. By training the machine learning model this way, the training engine reduces or eliminates the impact of position bias on ranking scores generated by the ranking machine learning model once the model has been trained.
The system receives position data, which identifies, for each training example, the position in the result list for the search query of the result document that the searcher selected. The system determines a respective selection bias value, which represents the degree to which the position of the selected result document in the result list for the search query in the training example impacted the selection of the result document.
The system determines, for each training example in the training data, a respective importance value. The importance value for a given training example defines how important the training example is in training the ranking machine learning model.
The respective importance value for each training example in the training data can be determined based on the respective selection bias value for the training example. For example, the importance value for a particular training example can be the inverse of the selection bias value for the training example or, more generally, be inversely proportional to the selection bias value for the training example.
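As a rough sketch of what this weighting could look like in practice, here is a minimal Python illustration; the TrainingExample structure and the cap on extreme weights are illustrative assumptions, not details from the patent.

```python
from dataclasses import dataclass

@dataclass
class TrainingExample:
    query: str
    selected_position: int   # position of the selected result in the result list
    selection_bias: float    # degree to which position drove the selection, in (0, 1]

def importance_value(example: TrainingExample, max_weight: float = 100.0) -> float:
    """Weight a training example inversely to its selection bias.

    A selection at a heavily biased position (e.g. position 1) gets a small
    weight; a selection at a rarely clicked position gets a large weight.
    The cap guards against exploding weights for tiny bias values.
    """
    return min(1.0 / example.selection_bias, max_weight)

print(importance_value(TrainingExample("best laptops", 1, 0.7)))   # ~1.43
print(importance_value(TrainingExample("best laptops", 5, 0.02)))  # 50.0
```

The cap is a practical guard: without it, a single selection at a position with a very small bias value could dominate the training loss.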
The system receives experiment data identifying experiment search queries and, for each experiment search query, the position, in an experiment result list of experiment result documents for that query, of the experiment result document that a searcher selected. In the experiment, the result documents in each experiment result list were ordered randomly.
Thus, the experiment result document that the searcher selected from a given result list was equally likely to have been assigned to any of the positions in the experiment result list. For each of the positions in the experiment result lists, the system determines a respective count of selections of experiment result documents at the position by searchers in response to the experiment search queries in the experiment data.
For example, the system can determine a respective count of selections for the top N positions in the experiment result lists, where N is an integer greater than 1. Suppose the system receives experiment data that includes 10 experiment result lists, and searchers selected the first position in 7 of the lists, the second position in 2 of the lists, and the third position in 1 of the lists. Then the count of selections is 7 for the first position, 2 for the second position, and 1 for the third position.
For each of the positions, the system determines a respective position bias value for the position based on the respective count of selections for the position. The position bias value represents the degree to which an experiment result document's position in the experiment result list for an experiment search query impacted its selection.
In some implementations, the respective position bias value for each position is computed by dividing the count of selections at that position by the total count of selections across all of the positions in the experiment result lists. For each training example in the training data, the system then assigns, as the selection bias value for the training example, the position bias value corresponding to the position of the selected result document in the result list of result documents for the training example.
For example, suppose the system determines a position bias value b_1 for the first position using the count of selections of experiment result documents at that position. If the first position is the position of the result document that the searcher selected in the result list for a training example, the system uses b_1 as the selection bias value for that training example.
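Continuing the 7/2/1 example above, here is a short sketch of how the counts could be turned into position bias values and then looked up as selection bias values; the data structures are assumptions for illustration.

```python
from collections import Counter

# Selected position per experiment result list (the lists were ordered randomly).
selected_positions = [1] * 7 + [2] * 2 + [3] * 1

counts = Counter(selected_positions)  # {1: 7, 2: 2, 3: 1}
total = sum(counts.values())

# Position bias: the share of all selections that landed on each position.
position_bias = {pos: c / total for pos, c in counts.items()}
# {1: 0.7, 2: 0.2, 3: 0.1}

def selection_bias(selected_position: int) -> float:
    """Selection bias of a training example = bias of its selected position."""
    return position_bias[selected_position]
```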
In some implementations, each experiment search query has been classified as belonging to a respective query class from a predetermined set of query classes. The system then performs the following process for each class in the predetermined set. For a given query class, the system receives experiment data identifying the experiment search queries that were classified as belonging to that class and, for each of these queries, the position in the experiment result list of the experiment result document that a searcher selected.
For the given query class, the system determines, for each of some or all of the positions in the experiment result lists, a respective count of selections of experiment result documents at the position by searchers in response to the experiment search queries belonging to the query class in the experiment data. For example, the system can determine a respective count of selections for the top N positions in the experiment result lists.
For the given query class, the system determines, for each of the positions, a respective class-specific position bias value based on the respective count of selections for the position. For each training example in the training data, the system obtains data identifying the query class to which the search query for the training example belongs. It then assigns, as the selection bias value for the training example, the class-specific position bias value that is specific to that query class and that corresponds to the position of the selected result document in the result list of result documents for the training example.
For example, suppose a search query Q belongs to a query class t, and the system determines a class-specific position bias value b_1^t for the first position using the count of selections of experiment result documents at the first position. If the first position is the position of the result document that the searcher selected in the result list for a training example, the system uses b_1^t as the selection bias value for that training example.
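The class-specific variant only changes the lookup: one bias table per query class. A hypothetical sketch follows, in which the class names and values are invented for illustration.

```python
# One position-bias table per query class, estimated from the experiment
# data restricted to that class (the numbers here are made up).
class_position_bias = {
    "navigational": {1: 0.85, 2: 0.10, 3: 0.05},
    "informational": {1: 0.55, 2: 0.30, 3: 0.15},
}

def class_specific_selection_bias(query_class: str, selected_position: int) -> float:
    """Selection bias for a training example whose query belongs to query_class."""
    return class_position_bias[query_class][selected_position]
```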
In other implementations, the system receives experiment data identifying experiment search queries and, for each experiment search query, the position in the experiment result list of result documents for that query of the experiment result document that a searcher selected. The system obtains a respective feature vector for each of the experiment search queries.
The feature vectors can be query-specific or searcher-specific. For example, the query features may include the number of words in the query, the class of the query, or the preferred language of the searcher.
The system generates training data for training a classifier. The classifier is trained to receive a feature vector for an input search query and to output a respective query-specific position bias value for each of the positions in the result list for the input search query. The training data can include positive examples and negative examples drawn from the experiment search queries. For example, the system can label an experiment search query as a positive example for the position, in the experiment result list of result documents for that query, of the experiment result document that the searcher selected, and as a negative example for each of the non-selected positions in that experiment result list.
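A minimal sketch of this labeling scheme, assuming the selected position and the result-list length are known for each experiment search query:

```python
def label_experiment_query(selected_position: int, num_positions: int) -> dict:
    """Per-position labels for one experiment search query:
    1 for the position the searcher selected, 0 for every other position."""
    return {pos: int(pos == selected_position)
            for pos in range(1, num_positions + 1)}

print(label_experiment_query(selected_position=2, num_positions=3))
# {1: 0, 2: 1, 3: 0}
```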
The system trains the classifier on the training data. Training the classifier can be a machine learning process that learns respective weights to apply to each input feature vector. In particular, a classifier is trained using a conventional iterative machine learning training process that determines trained weights for each result list position.
Based on initial weights assigned to each result list position, the iterative process attempts to find optimal weights. The classifier may be a logistic regression model; in that case, the query-specific position bias value b_i^Q for a given position i and a given search query Q takes the standard logistic form b_i^Q = 1 / (1 + exp(-w_i · x_Q)), where x_Q is the feature vector for query Q and w_i is the trained weight vector for position i.
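Under the logistic-regression reading above, one plausible way to realize "weights for each result list position" is to train one binary model per position. This is an interpretation for illustration, using scikit-learn rather than anything the patent specifies.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_position_classifiers(examples, num_positions):
    """Train one logistic-regression model per result-list position.

    examples: list of (feature_vector, selected_position) pairs.
    Model i learns a weight vector w_i so that its predicted probability
    is b_i^Q = 1 / (1 + exp(-(w_i . x_Q + c_i))).
    Assumes every position was selected at least once in the data,
    so each binary problem has both classes.
    """
    X = np.array([features for features, _ in examples])
    models = {}
    for pos in range(1, num_positions + 1):
        y = np.array([1 if sel == pos else 0 for _, sel in examples])
        models[pos] = LogisticRegression().fit(X, y)
    return models
```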
For each training example in the training data, the system obtains a feature vector for the search query in the training example.
For each training example in the training data, the system processes the feature vector using the trained classifier: the classifier receives the feature vector as input and outputs a respective query-specific position bias value for each position in the result list for the search query in the training example. The system then assigns, as the selection bias value for the training example, the query-specific position bias value corresponding to the position of the selected result document in the result list of result documents for the search query.
For example, suppose the system determines a query-specific position bias value b_1^Q for the first position using the trained classifier. If the first position is the position of the result document that the searcher selected in the result list for a training example, the system uses b_1^Q as the selection bias value for that training example.
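And a matching sketch of the lookup at training-data-preparation time, reusing the per-position models from the previous sketch:

```python
def query_specific_selection_bias(models, feature_vector, selected_position):
    """Selection bias for a training example: the trained classifier's
    b_i^Q for the position i of the selected result document."""
    proba = models[selected_position].predict_proba([feature_vector])
    return proba[0][1]  # probability of the "selected" (label = 1) class
```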
The information is obtained and parsed from the rank detail. Just like with the rank detail, to view the ExplainRank page you need to be the administrator of the Search service application (SSA). Rank features work like tuning dials for a ranking model. The following sections describe the rank features that are available in the default SharePoint ranking model and how they contribute to the relevance rank calculation. The BM25 rank feature ranks items based on the appearance of query terms in the full-text index.
The input to BM25 can be any of the managed properties in the full-text index. You must map the managed properties used for the BM25 rank feature to the default full-text index in the Choose advanced searchable settings UI.
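For orientation, here is a minimal sketch of the textbook BM25 formula this rank feature builds on; SharePoint's implementation adds per-property weighting and its own parameter values, which are not shown here, and the defaults below are the textbook ones.

```python
import math

def bm25_score(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """Classic BM25 contribution of a single query term.

    tf: term frequency in the document field
    df: number of documents containing the term
    doc_len / avg_doc_len: field length and average field length
    """
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm
```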
Within a user query, query terms that appear under certain operators, such as NOT, are excluded from relevance rank calculations. In addition, query terms that are under scope, for example title:apple AND body:orange, are excluded from relevance rank calculations. In a custom ranking model, you can have two or more managed properties that are mapped to the same weight group in the search schema. In such cases, the content of these managed properties is combined in the full-text index and can't be ranked separately in the BM25 calculation.
To prevent this, map managed properties to one of the 16 different weight groups available in the search schema. Weight groups are also known as context. See Influencing the ranking of search results by using the search schema on TechNet for more information about the relationship between a managed property and its context.
The static rank feature ranks items based on numeric managed properties that are stored in the search index. The numeric managed properties used for relevance rank calculation in static rank features must be of type Integer and set to Refinable or Sortable in the search schema.
You can't use multivalued managed properties in combination with the static rank feature. Before the static rank feature can be aggregated with other rank features, each static rank feature is preprocessed via a single transformation. Table 1 lists all supported transform functions.
Table 1. Supported transform functions for the static and proximity rank features.
The bucketed static rank feature ranks documents based on their file type and language. The definition of a bucketed static rank feature within a ranking model depends on whether the rank feature is part of a linear model or a neural network; the examples in this section apply only to linear models. The managed properties used for relevance rank calculation in bucketed static rank features must be of type Integer and set to Refinable or Sortable in the search schema.
You can't use multivalued managed properties in combination with the bucketed static rank feature. Every document has an associated file type that the content processing component detects and stores in the search index as a zero-based integer value.
When you use the bucketed static rank feature to rank documents based on their file types, each document type is associated with a specific relevance rank score. For example, a definition could map bucket 2 to a particular file type and give that bucket its own rank score contribution.
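Conceptually, the file-type contribution is a table lookup from the zero-based file-type bucket to a rank score; the buckets and scores below are invented for illustration.

```python
# Hypothetical bucket -> relevance rank score mapping for file types.
file_type_scores = {
    0: 0.0,   # e.g. unknown file type
    1: 0.3,
    2: 0.5,   # the bucket mentioned above
}

def file_type_rank_contribution(bucket: int) -> float:
    """Score contributed by a document's file-type bucket."""
    return file_type_scores.get(bucket, 0.0)
```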
The content processing component detects the language for each document automatically before it's added to the search index. When you use the bucketed static rank feature to rank documents based on their language, you can define how to calculate the rank score based on whether the document's language that was automatically detected matches the query's language.
At query time, information about the user's language is written to the search engine as a query property. The proximity rank feature ranks items depending on the distance between query terms inside the full-text index.
The rank score is boosted if two query terms appear in the same managed property within the full-text index. Proximity calculations are expensive in terms of disk activity and CPU consumption; as a result, proximity boost is carried out only during the second stage of the default SharePoint rank model, if available. You can evaluate the proximity rank feature by using several different options, controlled by the attributes described in Table 2.
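As a rough illustration of why proximity is expensive, even a simplified proximity measure has to walk token-position lists for each pair of query terms; this sketch is the underlying idea, not SharePoint's algorithm.

```python
def min_pair_distance(positions_a, positions_b):
    """Smallest absolute distance between any occurrence of term A and term B.

    positions_a / positions_b: sorted token offsets of each term within a
    managed property. Two-pointer walk, O(n + m).
    """
    best, i, j = float("inf"), 0, 0
    while i < len(positions_a) and j < len(positions_b):
        best = min(best, abs(positions_a[i] - positions_b[j]))
        if positions_a[i] < positions_b[j]:
            i += 1
        else:
            j += 1
    return best
```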
In the default SharePoint rank model, the proximity feature is only part of the second-stage calculation, which involves a neural network. You must map the managed properties used in proximity rank features to the default full-text index in the search schema. The dynamic rank feature ranks an item depending on whether the query property matches a given managed property. If there is a match, the item's rank score is multiplied by a specific value to distinguish that particular item.
The weight attribute is used to control how much this feature affects the overall rank score. The dynamic rank feature is not customizable; it's for internal use only. However, if you install the SharePoint cumulative update of August, the AnchortextComplete rank feature is a customizable dynamic rank feature that is part of the default ranking model. The default SharePoint ranking model doesn't boost the rank of search results based on their freshness. To add such a boost, you can create a new static rank feature that combines information from the LastModifiedTime managed property with the DateTimeUtcNow query property, using the freshness transform function.
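Here is a sketch of the idea behind such a freshness boost; the decay shape and half-life constant are invented for illustration, whereas the actual behavior is defined by the freshness transform function described next.

```python
from datetime import datetime, timezone

def freshness_boost(last_modified: datetime, half_life_days: float = 30.0) -> float:
    """Boost in (0, 1]: 1.0 for a just-modified item, decaying with age in days.

    last_modified must be timezone-aware (like a LastModifiedTime value).
    """
    age_days = (datetime.now(timezone.utc) - last_modified).total_seconds() / 86400
    return half_life_days / (half_life_days + max(age_days, 0.0))
```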
The freshness transform function is the only transform that you can use for this freshness rank feature, because it converts the age of the item from an internal representation into days. A ranking model consists of various rank features that are considered together to calculate a rank score. A ranking model can have two rank stages.
In the first stage, the ranking model applies relatively inexpensive rank features to get a gross ranking of the results. In the second stage, the ranking model applies additional and more expensive rank features to the items with the highest rank scores.
The SharePoint default ranking model is an example of two-stage ranking model. In this model, the second stage works with the top items with the highest rank score that result from the first stage. When the ranking process in the first stage is complete, the search engine re-sorts all of the items, including the items that were excluded from the second stage.
This usually results in items from the second stage having a lower rank score when compared to items in the first stage. However, to ensure that the search engine re-sorts the items accurately, items from the second stage must have a higher rank score than items from the first stage.
To solve this dilemma, the rank scores of the second stage are boosted. The search engine performs this calculation automatically, based on a combination of rank features. If you install the SharePoint cumulative update of August, the default ranking model uses a linear first stage and a neural network second stage.
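Schematically, the two-stage process with boosting could look like the following; the top_k and boost values are placeholders for what the search engine computes automatically.

```python
def two_stage_rank(items, stage1, stage2, top_k=100, boost=1000.0):
    """Score everything cheaply, rescore the top_k expensively, re-sort all.

    stage1 / stage2: functions mapping an item to a rank score.
    The boost keeps rescored items above items that only went through stage 1.
    """
    scored = [(stage1(item), item) for item in items]
    scored.sort(key=lambda s: s[0], reverse=True)
    head = [(stage2(item) + boost, item) for _, item in scored[:top_k]]
    tail = scored[top_k:]
    return sorted(head + tail, key=lambda s: s[0], reverse=True)
```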
We recommend using this model as the base model for your custom ranking model because it is easier to tune a linear model than a model containing a neural network. The neural network defines a nonlinear combination of rank scores from rank features. Currently, SharePoint supports neural networks that are limited to one hidden layer with up to eight neurons.
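A minimal sketch of such a second-stage score, assuming a tanh hidden layer; the actual activation and wiring are defined by the ranking model.

```python
import numpy as np

def neural_rank_score(features, W_hidden, b_hidden, w_out, b_out):
    """Nonlinear combination of rank-feature scores: one hidden layer
    (at most eight neurons in SharePoint), then a linear output node.

    features: vector of transformed, normalized rank-feature scores.
    """
    h = np.tanh(W_hidden @ np.asarray(features) + b_hidden)
    return float(w_out @ h + b_out)
```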
In the overall schema of rank score computation with a two-layer neural network, each rank feature's score is transformed and normalized before being combined in the hidden layer. The exception is the bucketed static rank feature, which contributes to neural networks by adding custom values directly into hidden nodes, without any transformation or normalization. In a ranking model, BM25 and static rank features can benefit from precalculations to improve query latency for query terms that frequently occur in items.
This query latency improvement is achieved with the cost of additional indexing, both in terms of disk space used by the search index and CPU consumption.
You should use precalculation only in the first stage of a ranking model. Consequently, if precalculation is enabled, the rank detail of the first stage will not be complete.
To enable precalculation, set the precalcEnabled attribute to 1 in the rank stage definition. You can use precalculation only once in a ranking model. Query properties are a ranking mechanism that populates additional information useful for rank score calculation. For example, a query property can be the time and date when the query was run, which the freshness rank feature can use.