Related pins at pinterest the evolution of a real-world recommender system

7 minute read


This is summary and what I learned from the paper “Related pins at pinterest the evolution of a real-world recommender system”

0. Introduction

Pinterest is a visual discovery tool for saving and discovering content. Users save content they find on the Web as pins and create collections of these pins on boards. Related Pins leverages this human-curated content to provide personalized recommendations of pins based on a given query pin. In this paper, will describe how “Related pins system” has evolved from original recommendation system which was consisted of a simple candidate generator from many heuristic rules. How Memboost and Ranking component has added and how they evolved the performance.

1. System Overview

As described in introduction, the system comprises 3 major components : Candidate generation, Memboost and Ranking. And each components has evolved separately over time. Candidate generation : Narrow the candidate set from billions to roughly 1,000 pins that are likely related to the query pin. Memboost : Memorize the past engagement on specific query and result fairs Ranking : Maximize target engagement with machine learning algorithm. It uses a combination of features like query, candidate pins, user profile, session context and Memboost signals and train the system with past user engagement.

From now, will describe how each layer has evovled.

1-1) Candidates

The original system was consisted of just one candidate generation model so the output of candidates were shown directly to the users. As Memboost and Ranking layer has added, presision and recall problem has arised from candidate layer, so it makes the system to be added new candidate sources.

Candidate generator is based on the user-curated graph of boards and pins and the method has evolved to produce more relevant results and to cover more query pins. Let’s look over how the method evolved.

Heuristic to Onlince Ramdom Walk

At first time, Related Pins were mapped over the set of pairs of pins that occurred on the same board in offline. Because of the huge number of pairs, the pairs were randomly sampled, then added a heuristic socre based on category matching.

With higher co-occurrence, the relevance of candidates qualitatively increased. However, because candidate was based on offline heuristic score, it was hard to maximize board co-occurrence. To improve these weakness, moved to generate at serving time through an online traveral of the board-to-pin graph. So now it is generated by a random walk service called Pixie, which conducts random walks on the graph starting from the query pin with a reset probability at each step of walk. So it can leverage board co-occurrence because highly connected pins are more likely to be visited by random walk and also increases candidate coverage for rare pins, since it can retrieve pins that are several hops away from the query.

Session Co-occurrence

Even though board co-occurrence offers good recall when generating candidates, it has too broad / narrow shortcomings so tries to solve by incorporating the temporal dimension of user behavior like session co-occurrence. The method is Pin2Vec which is a learned embedding of N most popular pins in d-demensional space to minimize the distance between same session saved pins as the picture below. It captures a large amount of user behavior in a compact vector representation.

Search based candidates & Visually similar candidates

To prevent cold start problem (rare pins do not have a lot of candidates) and to expand candidate sets for diversity of results, started to leverage with search based and visually similar candidates. For search based candidates, used query pin’s annotations with text-based search. Those search based candidates tend to be less relevant but offer a nice trade-off from an exploration perspective, in terms of more divesity in set of pins. For visually similar candidates, if a query image is a near-duplicate adds the related pins recommendations, if no near-duplicate is identified, uses the visual search backend to return visually similar images based on a nearest-neighbor lookup of the query.

Segmented Candidates

To solve the content activation problem (rare pins do not show up as candidates), especially for international contents, generated additional candidate sets segmented by locale. And this kind of method can be expanded to other category like gender-specific category, etc.

1-2) Memboost

Memboost model is built to memorize the best result pins for each query. At first, used historical CTR, but as log data is subject to a stroing position bias (shown in earlier positions are more likely to be clicked on), instead chose to compute COEC (Clicks Over Expected Clicks). Expected clicks can have below formula \(Eclicks(q,r)={\sum_{p}\sum_{k}i_{p,k}(q,r)Clickrate_{p,k}}\)

while \(clicks(q,r)\) be the total number of clicks received by result pin, \($r\)$ on query pin \(q\), and \(i_{p,k}(q,r)\) is the number of impressions it received on platform \(p\) and rank \(k\).

Simply, expected clicks of query pin \(q\) and result pin \(r\) is the summation of multiply value of impressions and global click rate from each platform and rank.

Extend this definition to other actions like longclick, closeup and save with weighting, then \(actions(q,r)/Eactions(q,r)\) is simmilar to COEC with below equation.

\(actions(q,r)=\beta_{1}clicks(q,r)+\beta_{2}longclicks(q,r)+\beta_{3}closeups(q,r)+\beta_{4}saves(q,r)\) \(Eactions(q,r)=\beta_{1}Eclicks(q,r)+\beta_{2}Elongclicks(q,r)+\beta_{3}Ecloseups(q,r)+\beta_{4}Esaves(q,r)\)

And by using logarithm to get a zero-centered score and smoothing \($\alpha\)$, Memboost score is as below.


With continuous adjustments of existing scores, the final results are output with below score. \(MemboostedScore(q,r)=Score(q,r)+\gamma MB(q,r)\)

Until recently weighed with hand-tuned variables \(\beta_1,...,\beta_4,\gamma\), but it produces an undesirable coupling to the scoring, so to remove this coupling, now retrain Memboost parameters when changing the model.

And in case of changes in candidate generator and ranking model, candidate is no longer present in the results. To handle this, there is Memboost insertion algorithm that re-inserts the top n results if they are not in the result set.

1-3) Ranking

It has evolved from just using the pins’ raw data to using Memboost and user dataand personalized user activities. In building the ranking system, there are 3 large decision as below and have experimented to improve ranking.

  • Training Data Collection : Memboost scores or individual related pins
  • Model Objective : Classification (pointwise) or Ranking (pairwise)
  • Model formulation : Used 2 types, Linear and Gradient-Boosted decision tree

During improving the ranking system, one of major challenge is that as time flies, the training data only reflected the pins that were highly ranked by existing model. To prevent this so called “previous-model bias”, allocate a small percentage of traffic for unbiased collection. Although the volume of training data becomes limited to this small percentage of overall traffic, the resulting models perform better than models trained with biased data.

2. Challenges and Conclusion


  1. Improving some component may result in worse overall performance, the general soultion is to jointly train/automate as much as possible for each experiment.
  2. Balancing fresh and dark (high quality / rarely surfaced) content with well-established and high quality content (Classic explore vs. exploit problem).
  3. Recommendation of local pins for global users is hard because of lack of local contents and historical engagement.

3. Conclusion

  1. Organic system growth leads to complexity and change of part can mess the everything.
  2. To solve “cold-start” and “rich get richer” problem, exploit engagement could be good option.
  3. Rather than offline calculation, real-time system can increase velocity of experimentation and responsiveness of the results.