MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation
This repository provides the code for our paper at ACM MM 2023:
MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation. Jinpeng Wang, Ziyun Zeng, Yunxiao Wang, Yuting Wang, Xingyu Lu, Tianxiang Li, Jun Yuan, Rui Zhang, Hai-Tao Zheng, Shu-Tao Xia. 📝[Paper]. 🖼️[Poster]. 📺[2-min Video]. 🇨🇳[中文解读 (PaperWeekly)].
We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation. On the user side, we first design a clustering-based interest discovery algorithm to mine users' interests from their multi-modal behaviors. Then, we build a Transformer-based encoder-decoder model, where the encoder learns to capture personalization cues from interest tokens while the decoder is developed to grasp item-modality-interest relations for better sequence representation. On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation. We pre-train the model with contrastive learning objectives and fine-tune it in an efficient manner. Experiments demonstrate the effectiveness and flexibility of MISSRec, indicating a practical solution for real-world recommendation scenarios.
The following sections walk through how to use this repository step by step. 🤗
git clone https://github.com/gimpong/MM23-MISSRec.git
cd MM23-MISSRec/
- cuda 11.7
- python 3.7.8
- pytorch 1.13.1
- numpy 1.21.6
- cupy 11.6.0
- tqdm 4.64.1
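As a quick sanity check of the environment, the sketch below compares installed package versions against the list above. It assumes that each package exposes a `__version__` attribute and that PyTorch is imported as `torch`; the helper name `check_versions` is hypothetical and not part of this repository.

```python
import importlib

# Expected versions copied from the requirements list above.
# Keys are import names ("pytorch" is imported as "torch").
EXPECTED = {
    "torch": "1.13.1",
    "numpy": "1.21.6",
    "cupy": "11.6.0",
    "tqdm": "4.64.1",
}


def check_versions(expected=EXPECTED):
    """Return {module: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
            found = getattr(module, "__version__", None)
        except Exception:
            # Missing package, or an import that fails (e.g. no CUDA runtime).
            found = None
        # torch builds often carry a local suffix such as "+cu117"; ignore it.
        matches = found is not None and str(found).split("+")[0] == wanted
        report[name] = (found, matches)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:8s} expected={EXPECTED[name]:8s} found={found} [{status}]")
```

A mismatch is not necessarily fatal; the listed versions are simply the ones the code was developed with.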
Before running the code, we need to make sure that everything needed is ready. The working directory is expected to be organized as below:
MM23-MISSRec/
- misc/
- data/
- reference_log/
- props/
- recbole/
- torchpq/
- saved/
  - MISSRec-FHCKM_mm_full-10.pth
  - MISSRec-FHCKM_mm_full-20.pth
  - ...
  - MISSRec-FHCKM_mm_full-100.pth
- datasets/
  - pretrain/
    - FHCKM_mm_full/
  - downstream/
    - Scientific_mm_subset/
    - Scientific_mm_full/
    - Pantry_mm_subset/
    - Pantry_mm_full/
    - Office_mm_subset/
    - Office_mm_full/
    - Instruments_mm_subset/
    - Instruments_mm_full/
    - Arts_mm_subset/
    - Arts_mm_full/
      - Arts_mm_full.feat1CLS
      - Arts_mm_full.feat3CLS
      - Arts_mm_full.text
      - Arts_mm_full.item2index
      - Arts_mm_full.user2index
      - Arts_mm_full.test.inter
      - Arts_mm_full.train.inter
      - Arts_mm_full.valid.inter
- scripts/
  - run01.sh
  - run02.sh
  - ...
- cluster_utils.py
- config.py
- ddp_finetune.py
- ddp_pretrain.py
- finetune.py
- missrec.py
- model_utils.py
- trainer.py
- utils.py
- The pre-processed dataset with extracted features can be downloaded from Google Drive. For each sub-dataset (e.g., Arts_mm_full), text and image features are saved in files named with the suffixes ".feat1CLS" and ".feat3CLS", respectively, e.g., Arts_mm_full.feat1CLS and Arts_mm_full.feat3CLS. "subset" means the filtered subset of "full" that removes the items with incomplete modalities and retains only the full-modality items.
- Customized feature extraction: We use the pre-trained CLIP-ViT-B/32 as the feature extractor for texts and images. You may want to use other feature extractors for the raw data. The raw text information can be obtained from the review data of the Amazon dataset. For the raw images, you can either crawl them according to URLs or download the version we crawled via Baidu Cloud (password: 791e).
- Customized datasets: First, pre-process the user-item interaction data according to the instructions. Then you may use the pre-trained CLIP-ViT-B/32 to extract multi-modal item features.
- saved/MISSRec-FHCKM_mm_full-*0.pth are checkpoint files, which are generated during pre-training (see below).
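To verify that a downstream dataset is in place before running anything, a small check along the following lines may help. It is a sketch that assumes every dataset folder sits under datasets/downstream/ and contains the same eight files listed for Arts_mm_full above; the helper names `expected_files` and `missing_files` are hypothetical.

```python
import os

# File suffixes taken from the Arts_mm_full listing above
# (assumed to be the same for every downstream dataset).
SUFFIXES = [
    "feat1CLS",    # text features
    "feat3CLS",    # image features
    "text",
    "item2index",
    "user2index",
    "train.inter",
    "valid.inter",
    "test.inter",
]


def expected_files(dataset, root="datasets/downstream"):
    """List the file paths expected for one dataset, e.g. 'Arts_mm_full'."""
    folder = os.path.join(root, dataset)
    return [os.path.join(folder, f"{dataset}.{suffix}") for suffix in SUFFIXES]


def missing_files(dataset, root="datasets/downstream"):
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_files(dataset, root) if not os.path.isfile(p)]


if __name__ == "__main__":
    for path in missing_files("Arts_mm_full"):
        print("missing:", path)
```

Run it from the repository root so that the relative `root` path resolves correctly.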
To pre-train the model for 100 epochs, run the following command in a multi-GPU environment:
# an example: pre-training on 4 GPUs
CUDA_VISIBLE_DEVICES="0,1,2,3" python ddp_pretrain.py
We have provided pre-trained checkpoints on Google Drive.
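Whether the checkpoints were produced locally or downloaded, it can be handy to locate the most recent one in saved/. The sketch below assumes the file naming shown in the directory layout (MISSRec-FHCKM_mm_full-<epoch>.pth); `latest_checkpoint` is a hypothetical helper, not a function of this repository.

```python
import glob
import os
import re


def latest_checkpoint(saved_dir="saved", prefix="MISSRec-FHCKM_mm_full"):
    """Return (epoch, path) of the highest-epoch checkpoint, or None if none exist."""
    pattern = os.path.join(saved_dir, f"{prefix}-*.pth")
    best = None
    for path in glob.glob(pattern):
        match = re.search(r"-(\d+)\.pth$", path)
        if match is None:
            continue  # skip files that do not end in "-<number>.pth"
        epoch = int(match.group(1))
        if best is None or epoch > best[0]:
            best = (epoch, path)
    return best


if __name__ == "__main__":
    print(latest_checkpoint())
```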
For ease of use, we provide scripts with the configuration for each experiment. These scripts can be found under the scripts/ folder. For example, to fine-tune the pre-trained checkpoint on the Scientific dataset, run:
cd scripts/
# '0' is the id of GPU
bash run01.sh 0
The script run01.sh contains the following commands:
#!/bin/bash
cd ..
CUDA_VISIBLE_DEVICES=$1 python finetune.py \
-d Scientific_mm_full \
-mode transductive
cd -
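For launching runs from Python instead of a shell script, the command in run01.sh can be assembled as shown below. This is a minimal sketch that uses only the two options appearing in the script above (`-d` and `-mode`); `build_finetune_command` is a hypothetical helper and the actual launch is left commented out.

```python
import os


def build_finetune_command(dataset, mode, gpu_id):
    """Build the argument list and environment equivalent to run01.sh."""
    cmd = ["python", "finetune.py", "-d", dataset, "-mode", mode]
    # Restrict the run to one GPU, as the script does with $1.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return cmd, env


if __name__ == "__main__":
    cmd, env = build_finetune_command("Scientific_mm_full", "transductive", 0)
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(cmd))
    # To launch from the repository root, uncomment:
    # import subprocess
    # subprocess.run(cmd, env=env, check=True)
```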
Script | Dataset | With ID? | Pre-trained? | Log | R@10 | N@10 | R@50 | N@50 |
---|---|---|---|---|---|---|---|---|
run01.sh | Scientific | ✓ | ✗ | log01 | 0.1282 | 0.0711 | 0.2376 | 0.0946 |
run02.sh | Scientific | ✓ | ✓ | log02 | 0.136 | 0.0753 | 0.2431 | 0.0983 |
run03.sh | Scientific | ✗ | ✗ | log03 | 0.1269 | 0.0659 | 0.2354 | 0.0891 |
run04.sh | Scientific | ✗ | ✓ | log04 | 0.1278 | 0.0658 | 0.2375 | 0.0893 |
run05.sh | Pantry | ✓ | ✗ | log05 | 0.0771 | 0.0363 | 0.1804 | 0.0583 |
run06.sh | Pantry | ✓ | ✓ | log06 | 0.0779 | 0.0365 | 0.1875 | 0.0598 |
run07.sh | Pantry | ✗ | ✗ | log07 | 0.0715 | 0.0337 | 0.1801 | 0.0569 |
run08.sh | Pantry | ✗ | ✓ | log08 | 0.0771 | 0.0345 | 0.1833 | 0.0571 |
run09.sh | Instruments | ✓ | ✗ | log09 | 0.1292 | 0.0842 | 0.2369 | 0.1072 |
run10.sh | Instruments | ✓ | ✓ | log10 | 0.13 | 0.0843 | 0.237 | 0.1071 |
run11.sh | Instruments | ✗ | ✗ | log11 | 0.1207 | 0.0771 | 0.2191 | 0.0981 |
run12.sh | Instruments | ✗ | ✓ | log12 | 0.1201 | 0.0771 | 0.2218 | 0.0988 |
run13.sh | Arts | ✓ | ✗ | log13 | 0.1279 | 0.0744 | 0.2387 | 0.0982 |
run14.sh | Arts | ✓ | ✓ | log14 | 0.1314 | 0.0767 | 0.241 | 0.1002 |
run15.sh | Arts | ✗ | ✗ | log15 | 0.1107 | 0.0641 | 0.2093 | 0.0853 |
run16.sh | Arts | ✗ | ✓ | log16 | 0.1119 | 0.0625 | 0.21 | 0.0836 |
run17.sh | Office | ✓ | ✗ | log17 | 0.1269 | 0.0848 | 0.2001 | 0.1005 |
run18.sh | Office | ✓ | ✓ | log18 | 0.1275 | 0.0856 | 0.2005 | 0.1012 |
run19.sh | Office | ✗ | ✗ | log19 | 0.1072 | 0.0694 | 0.1726 | 0.0834 |
run20.sh | Office | ✗ | ✓ | log20 | 0.1038 | 0.0666 | 0.1701 | 0.0808 |
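To read the effect of pre-training off the table, one can compute the relative change between each pair of runs that differ only in the "Pre-trained?" column. The sketch below does this for the four Scientific rows, with the numbers copied from the table. It assumes that rows with an empty Dataset or With ID? cell inherit the value from the row above, and that R@K and N@K stand for Recall@K and NDCG@K; `relative_gains` is a hypothetical helper.

```python
# (script, dataset, with_id, pretrained, R@10, N@10, R@50, N@50)
# Scientific rows copied from the table above.
ROWS = [
    ("run01.sh", "Scientific", True, False, 0.1282, 0.0711, 0.2376, 0.0946),
    ("run02.sh", "Scientific", True, True, 0.136, 0.0753, 0.2431, 0.0983),
    ("run03.sh", "Scientific", False, False, 0.1269, 0.0659, 0.2354, 0.0891),
    ("run04.sh", "Scientific", False, True, 0.1278, 0.0658, 0.2375, 0.0893),
]
METRICS = ["R@10", "N@10", "R@50", "N@50"]


def relative_gains(rows):
    """Map (dataset, with_id) -> {metric: % change from no pre-training to pre-training}."""
    by_key = {}
    for _script, dataset, with_id, pretrained, *values in rows:
        by_key.setdefault((dataset, with_id), {})[pretrained] = values
    gains = {}
    for key, pair in by_key.items():
        if True in pair and False in pair:
            gains[key] = {
                metric: 100.0 * (new - old) / old
                for metric, old, new in zip(METRICS, pair[False], pair[True])
            }
    return gains


if __name__ == "__main__":
    for (dataset, with_id), g in relative_gains(ROWS).items():
        setting = "with ID" if with_id else "without ID"
        summary = ", ".join(f"{m}: {v:+.2f}%" for m, v in g.items())
        print(f"{dataset} ({setting}): {summary}")
```

The same function works for the other datasets once their rows are added to `ROWS`.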
If you find this code useful or use the toolkit in your work, please consider citing:
@inproceedings{wang23missrec,
title={MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation},
author={Jinpeng Wang and Ziyun Zeng and Yunxiao Wang and Yuting Wang and Xingyu Lu and Tianxiang Li and Jun Yuan and Rui Zhang and Haitao Zheng and Shu-Tao Xia},
booktitle = {Proceedings of the 31st ACM International Conference on Multimedia},
year={2023}
}
Our code is based on the implementation of UniSRec and TorchPQ.
If you have any questions, please raise an issue or email Jinpeng Wang (wjp20@mails.tsinghua.edu.cn). We will reply as soon as possible.