Confusion about item_id_list read order and padding #1326

gcbrown · 2022-06-22T22:00:26Z


Hello,

I'm still working with FPMC as a way to learn the library, and I'm having trouble understanding how the item_id_list values are generated and used. As I understand it, FPMC uses only the single most recent item, so either the first or the last item in the list should be the one that matters.

As a simple example, with items [1,2,3,4,5] purchased in that order, we have:
dataset['item_id'][3] = 5
dataset['item_id_list'][3] = tensor([1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
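To make the example concrete, here is a minimal sketch of how prefix-style rows like the two above could be produced from one user's ordered history. It assumes each interaction after the first becomes one row whose item_id_list holds all earlier items, right-padded with 0 to a fixed length (`max_len=50` is an assumption standing in for RecBole's `MAX_ITEM_LIST_LENGTH` setting). The helper `build_rows` is hypothetical and is not RecBole's actual implementation.

```python
from typing import List, Tuple

PAD_ID = 0  # assumption: item id 0 is reserved for padding, as in the tensor above


def build_rows(history: List[int], max_len: int = 50) -> List[Tuple[int, List[int], int]]:
    """Turn one user's chronologically ordered history into
    (item_id, item_id_list, item_length) rows (hypothetical helper)."""
    rows = []
    for t in range(1, len(history)):
        # Keep at most the max_len most recent items before position t.
        prefix = history[max(0, t - max_len):t]
        # Right-pad: real items first (oldest at index 0), zeros after.
        padded = prefix + [PAD_ID] * (max_len - len(prefix))
        rows.append((history[t], padded, len(prefix)))
    return rows


rows = build_rows([1, 2, 3, 4, 5])
item_id, item_id_list, item_length = rows[3]
print(item_id)           # 5
print(item_id_list[:6])  # [1, 2, 3, 4, 0, 0]
print(item_length)       # 4
```

Under this assumption, row 3 pairs target item 5 with the history [1, 2, 3, 4], which lines up with the dataset values quoted above.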

In reality, item #4 was purchased directly before item #5, but the padding confuses me. How can the model figure out that #4 is the last item purchased without some specific logic for the padding? If it treated 1 or 0 as the most recent purchase, that would be wrong. It seems to me that the list should be either 0, ..., 0, 1, 2, 3, 4 or the reverse, 4, 3, 2, 1, 0, ..., 0, because 4 was the most recent item before 5.

Edit: based on the code in forward() and full_sort_predict(), it looks like the model uses item_seq_len to grab the last item in the sequence, i.e. the dataset['item_length'] field selects the last real item in the item_id_list above. So now I'm just unsure why the padding is done this way if item_length is stored anyway. If 1, 2, 3, 4 were placed at the end with the padding before them, you could always count on the end of the sequence being the true end.
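The Edit above describes selecting the last item by its length. A minimal sketch of that lookup on a small batch of right-padded rows, using plain Python lists instead of tensors (in PyTorch the same lookup is usually written with `torch.gather` or advanced indexing). It also shows the left-padded alternative from the question, where the last column is always the most recent item. All function names are hypothetical.

```python
from typing import List

PAD_ID = 0  # assumption: 0 marks padding


def last_items_right_padded(item_id_list: List[List[int]], item_length: List[int]) -> List[int]:
    # Right padding: the most recent item sits at index item_length - 1.
    return [row[n - 1] for row, n in zip(item_id_list, item_length)]


def left_pad(row: List[int], n: int) -> List[int]:
    # Move the n real items to the end: [0, ..., 0, 1, 2, 3, 4].
    return [PAD_ID] * (len(row) - n) + row[:n]


def last_items_left_padded(item_id_list: List[List[int]]) -> List[int]:
    # Left padding: the most recent item is always the last column.
    return [row[-1] for row in item_id_list]


batch = [
    [1, 2, 3, 4, 0, 0],
    [7, 8, 0, 0, 0, 0],
]
lengths = [4, 2]

print(last_items_right_padded(batch, lengths))  # [4, 8]
left = [left_pad(row, n) for row, n in zip(batch, lengths)]
print(left)                                     # [[0, 0, 1, 2, 3, 4], [0, 0, 0, 0, 7, 8]]
print(last_items_left_padded(left))             # [4, 8]
```

Both layouts recover the same last items; the right-padded one simply needs the length to do it.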

Answered by Wicknight

Wicknight · 2022-06-24T04:49:47Z · Collaborator

@gcbrown
First, every sequential recommendation model needs some way to represent sequence data. Just as in NLP, padding makes it convenient to process variable-length sequences, because sequences of different lengths must be brought to one fixed length before they can be batched into a tensor. Therefore, padding is necessary.
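As a rough illustration of the point about variable-length sequences: histories of different lengths can only be stacked into one rectangular array (and hence one tensor) after padding, and the padded positions are then typically ignored through a mask derived from the lengths. A minimal sketch in plain Python; `pad_batch` and `length_mask` are hypothetical helpers, not RecBole functions.

```python
from typing import List, Tuple

PAD_ID = 0  # assumption: 0 is the padding id


def pad_batch(seqs: List[List[int]], max_len: int) -> Tuple[List[List[int]], List[int]]:
    """Right-pad (truncating to the most recent max_len items) so every row has equal length."""
    padded, lengths = [], []
    for seq in seqs:
        seq = seq[max(0, len(seq) - max_len):]  # keep only the most recent items if too long
        padded.append(seq + [PAD_ID] * (max_len - len(seq)))
        lengths.append(len(seq))
    return padded, lengths


def length_mask(lengths: List[int], max_len: int) -> List[List[bool]]:
    # True for real items, False for padding positions.
    return [[i < n for i in range(max_len)] for n in lengths]


padded, lengths = pad_batch([[1, 2, 3, 4], [7, 8], [9]], max_len=5)
print(padded)                      # [[1, 2, 3, 4, 0], [7, 8, 0, 0, 0], [9, 0, 0, 0, 0]]
print(lengths)                     # [4, 2, 1]
print(length_mask(lengths, 5)[1])  # [True, True, False, False, False]
```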

Second, regarding the padding position, I think our padding method is fairly general and conventional. After padding, the earliest interacted item is located at index 0, while the most recently interacted item is located at index item_seq_len - 1. This matches the chronological order of the interactions and is convenient for other models to use.

Third, you can modify the code if you want to change the padding logic. You are welcome to share the results with us if you conduct further experiments.
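If you do want to experiment with a different layout, one low-risk option is to convert the right-padded rows at the model input rather than changing how the dataset is built. A minimal sketch that produces the reversed layout from the question (most recent item first) from an item_id_list row and its length; the function name is hypothetical, and where to hook it into a model is left open.

```python
from typing import List

PAD_ID = 0  # assumption: 0 is the padding id


def reverse_recent_first(row: List[int], n: int) -> List[int]:
    # Reverse only the n real items and keep padding at the end: [4, 3, 2, 1, 0, ...].
    return row[:n][::-1] + row[n:]


row = [1, 2, 3, 4] + [PAD_ID] * 46
out = reverse_recent_first(row, 4)
print(out[:6])  # [4, 3, 2, 1, 0, 0]
print(out[0])   # 4
```

With this layout index 0 always holds the most recent item, so a first-order model like FPMC could read it without the length, although models that attend over the whole sequence would still need a padding mask.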
