Paged attention #1460

Closed
wants to merge 42 commits into from

@Bob-Chen222 commented Aug 8, 2024

Description of changes:
This PR adds a page manager to the specscheduler branch.

Specifically, the page manager is responsible for allocating and deallocating the physical pages (GPU memory) used to store the tokens' key-value (KV) cache.
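The allocation scheme described above can be pictured with a minimal Python sketch. It assumes a fixed pool of equally sized pages and a per-request table that maps each request's logical pages to physical page indices, similar in spirit to the block tables used by vLLM's PagedAttention. The class and method names (`PageManager`, `append_tokens`, `physical_slot`, `free_request`) are hypothetical and not taken from this PR; in a real system each page index would refer to a region of GPU memory rather than being a plain integer.

```python
from collections import deque


class PageManager:
    """Hypothetical page manager (illustration only, not this PR's code):
    maps each request's logical KV-cache pages to physical page indices
    drawn from a fixed-size pool."""

    def __init__(self, num_pages: int, tokens_per_page: int) -> None:
        self.tokens_per_page = tokens_per_page
        # Physical pages not currently owned by any request.
        self.free_pages = deque(range(num_pages))
        # request_id -> physical page indices, in logical order.
        self.block_tables: dict[int, list[int]] = {}
        # request_id -> number of tokens whose KV entries are stored.
        self.num_tokens: dict[int, int] = {}

    def append_tokens(self, request_id: int, count: int) -> list[int]:
        """Reserve KV-cache slots for `count` new tokens and return the
        request's page table. Raises MemoryError (allocating nothing)
        if the pool cannot supply enough pages."""
        table = self.block_tables.get(request_id, [])
        used = self.num_tokens.get(request_id, 0)
        # Ceiling division: pages needed to hold all tokens so far.
        needed_pages = -(-(used + count) // self.tokens_per_page)
        extra = needed_pages - len(table)
        if extra > len(self.free_pages):
            raise MemoryError("not enough free KV-cache pages")
        for _ in range(extra):
            table.append(self.free_pages.popleft())
        self.block_tables[request_id] = table
        self.num_tokens[request_id] = used + count
        return table

    def physical_slot(self, request_id: int, token_index: int) -> tuple[int, int]:
        """Translate a token position into (physical page, offset in page)."""
        page = self.block_tables[request_id][token_index // self.tokens_per_page]
        return page, token_index % self.tokens_per_page

    def free_request(self, request_id: int) -> None:
        """Return every page owned by a finished request to the free pool."""
        self.free_pages.extend(self.block_tables.pop(request_id, []))
        self.num_tokens.pop(request_id, None)
```

For example, with `PageManager(num_pages=4, tokens_per_page=16)`, a request that stores 20 tokens needs two pages, and token 17 of that request lands at offset 1 of its second page. Keeping pages fixed-size means a request's cache never needs to be contiguous in memory; only the last page of each request can be partially empty.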

This version of the page manager is many commits behind the latest main branch (i.e., the specscheduler branch). I will try to merge main into this branch over the next few days and do further testing. It would be great if you could take a look and let me know if you have any questions about the current design and implementation.

@Bob-Chen222 self-assigned this Aug 8, 2024
@Bob-Chen222 marked this pull request as draft August 8, 2024 00:42
@jiazhihao added the inference label (Features and fixes related to the inference project.) Aug 22, 2024
@Bob-Chen222 marked this pull request as ready for review September 25, 2024 04:57
@Bob-Chen222 deleted the paged_attention branch October 31, 2024 19:07
Labels: inference (Features and fixes related to the inference project.)
Projects: Status: Done
2 participants