
[Task] Simplify Top-k recommender API #622

Closed
5 tasks
sararb opened this issue Aug 4, 2022 · 2 comments
Assignees
Labels
enhancement New feature or request P0

Comments

@sararb
Contributor

sararb commented Aug 4, 2022

Problem

Goals

  • Decouple top-k local prediction and top-k evaluation from the retrieval contrastive-learning task.
  • Convert retrieval models (Matrix Factorization, Two-Tower, and YoutubeDNN) to a top-k recommender model.
  • Follow the Keras analogy for the top-k recommender: the user can call .predict, .evaluate, and .save, and load the model.
  • ItemRecommender should support different top-k strategies: Brute-force, Streaming, ANN, or any user-specific top-k strategy.
  • Ensure retrieval experiments and CI performance tests return the same level of performance with the new Retrieval API.

Starting Point:

  • Definition: the top-k recommender is a model composed of a query encoder + a top-k layer

  • Prerequisite of the top-k recommender:

    • Predict method: returns the top-k items (scores and ids) for a given query (user)
    • Evaluate method: computes ranking metrics for a dataset of users/queries
    • batch_predict: returns a dataset with the top-k items for a dataset of users/queries
    • save: the top-k model is the 'useful' part of the retrieval pipeline, as it is the one that generates predictions for the external endpoint. The user needs to save this model and reload it later for evaluation or local prediction
    • Support for different top-k strategies
  • Arguments of the top-k layer:

    • A cut-off k
    • The dataset of candidates: pre-trained item embeddings
    • The method index_from_dataset: to set the index for the top-k search
    • The method score: the distance metric used to compute the score between the query and the item embeddings (default: dot product)
    • Call method:
      • Takes the query embeddings as input
      • Defines the logic for retrieving the top-k items (the scope of this first work is "Brute-Force")
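The brute-force layer described above could be sketched as follows. This is a minimal, framework-agnostic sketch using numpy rather than a real Keras layer; the class name `BruteForceTopK` and its method arguments are illustrative assumptions, not the final Merlin Models API.

```python
import numpy as np

class BruteForceTopK:
    """Hypothetical brute-force top-k layer (numpy sketch, not the real API).

    Holds an index of candidate item ids and pre-trained item embeddings,
    scores every candidate against the query, and keeps the k best.
    """

    def __init__(self, k=10):
        self.k = k  # cut-off k
        self._ids = None
        self._embeddings = None

    def index_from_dataset(self, ids, embeddings):
        # Set the index for the top-k search: candidate ids + embeddings.
        self._ids = np.asarray(ids)
        self._embeddings = np.asarray(embeddings, dtype=np.float64)
        return self

    def score(self, queries):
        # Default distance metric: dot product between queries and items.
        return queries @ self._embeddings.T

    def __call__(self, queries):
        # Brute-force retrieval: score all candidates, then rank.
        scores = self.score(np.atleast_2d(queries))
        top_idx = np.argsort(-scores, axis=1)[:, : self.k]
        top_scores = np.take_along_axis(scores, top_idx, axis=1)
        return top_scores, self._ids[top_idx]
```

Because brute force scores every candidate, swapping in a streaming or ANN strategy would only change the internals of `index_from_dataset`/`__call__`, which is what makes a pluggable top-k layer attractive.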

Open questions:

  • Do we need to re-train the new recommender model with the pre-trained item embeddings (e.g., to convert a two-tower model into a YoutubeDNN-like model)? This can be done outside the top-k recommender class; the top-k recommender itself should stay simple so it can support different top-k strategies.

  • Should we define the Top-k layer as a sub-class of the CategoricalPrediction block?

  • Implementation starting points:
    --> main...tf/retrieval-models
    --> https://github.com/NVIDIA-Merlin/models/pull/663/files
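Following the Keras analogy from the goals, the composition of a query encoder with a pluggable top-k layer could be sketched as below. The `TopKRecommender` name and method signatures are hypothetical; `.evaluate` and `.save` are omitted for brevity.

```python
import numpy as np

class TopKRecommender:
    """Hypothetical top-k recommender: query encoder + top-k layer (sketch)."""

    def __init__(self, query_encoder, topk_layer):
        # query_encoder: maps raw queries (user features) to query embeddings.
        # topk_layer: any top-k strategy (brute-force, streaming, ANN, ...)
        #             returning (scores, ids) for a batch of query embeddings.
        self.query_encoder = query_encoder
        self.topk_layer = topk_layer

    def predict(self, queries):
        # Encode the queries, then delegate retrieval to the top-k strategy.
        return self.topk_layer(self.query_encoder(queries))

    def batch_predict(self, query_dataset, batch_size=1024):
        # Yield top-k (scores, ids) for each batch of a dataset of queries.
        for start in range(0, len(query_dataset), batch_size):
            yield self.predict(query_dataset[start:start + batch_size])
```

Keeping the strategy behind a single `topk_layer` attribute is one way to satisfy the goal that ItemRecommender support brute-force, streaming, ANN, or user-specific strategies without changing the recommender itself.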

@sararb sararb changed the title [FEA] Simplify Top-k recommender API [Task] Simplify Top-k recommender API Aug 10, 2022
@sararb sararb self-assigned this Aug 10, 2022
@gabrielspmoreira
Member

When this refactoring is done, we should retest #339 to check whether the slowness in building the top-k index still persists.

@sararb
Contributor Author

sararb commented Sep 6, 2022

Closing this issue as it is a duplicate of a new task tracked in the session-based roadmap ticket.
