Roadmap #3

ctlllll · 2023-09-12T01:49:06Z

JianbangZ · 2023-09-12T18:52:53Z

Looks like a promising roadmap. I think llama.cpp support should be given higher priority.

Kimiko-AI · 2023-09-13T15:45:32Z

Agreed, faster t/s (tokens per second) is really important for llama.cpp users.

yhyu13 · 2023-09-13T16:40:08Z

Would love to see Medusa available as a plugin for ooba's textgen webui, for models that have Medusa heads.

yhyu13 · 2023-09-13T16:51:30Z

Would Medusa be compatible with GPTQ-quantized models?

Specifically, would two Medusa heads, one fine-tuned on the unquantized model and one on the quantized model, be the same? Or could they be swapped?

ctlllll · 2023-09-13T18:04:27Z

> Would Medusa be compatible with GPTQ-quantized models?
>
> Specifically, would two Medusa heads, one fine-tuned on the unquantized model and one on the quantized model, be the same? Or could they be swapped?

We haven't tried this, but as an analogy, we trained the 33B model on a base model quantized to 8 bits with bitsandbytes, and the difference seems to be minor. More investigation is needed, though :)

AIApprentice101 · 2023-09-18T12:32:14Z

Please consider supporting quantized models, such as GPTQ, AWQ, etc.

ctlllll · 2023-09-18T15:28:09Z

> Please consider supporting quantized models, such as GPTQ, AWQ, etc.

Thanks for the suggestion. Those models should be easy to integrate just by loading the base model in those formats. We are trying to integrate Medusa into frameworks where speed actually benefits from quantization, e.g., mlc-llm and llama.cpp.

JianbangZ · 2023-09-18T15:34:15Z

> > Please consider supporting quantized models, such as GPTQ, AWQ, etc.
>
> Thanks for the suggestion. Those models should be easy to integrate just by loading the base model in those formats. We are trying to integrate Medusa into frameworks where speed actually benefits from quantization, e.g., mlc-llm and llama.cpp.

Exciting. Is there a timeline for llama.cpp support? What's your best guess?

ctlllll · 2023-09-18T16:17:15Z

> > > Please consider supporting quantized models, such as GPTQ, AWQ, etc.
> >
> > Thanks for the suggestion. Those models should be easy to integrate just by loading the base model in those formats. We are trying to integrate Medusa into frameworks where speed actually benefits from quantization, e.g., mlc-llm and llama.cpp.
>
> Exciting. Is there a timeline for llama.cpp support? What's your best guess?

We'll start with MLC-LLM, as it's more user-friendly for integration. For llama.cpp, we currently don't have the bandwidth, and we would greatly appreciate volunteers who could help us with it :)

ctlllll · 2023-09-18T23:15:20Z

🎉 Exciting News! 🎉

We are thrilled to announce that we have received an award from Chai Research! While the monetary value may not be substantial, we are dedicating it as a token of our appreciation for the invaluable contributions made by our community. The funds will be allocated as development bounties to incentivize the achievement of key milestones.

🏆 First Bounty: Porting Medusa to Llama.cpp #35 🏆
Bounty Amount: $100

feifeibear · 2023-09-20T09:20:02Z

Hello @ctlllll, thanks for providing such a wonderful project. I am interested in the fine-grained KV cache management item. Could you offer me more guidance on this?

I have been working on a demo of speculative sampling for a while.

https://github.com/feifeibear/LLMSpeculativeSampling

ctlllll · 2023-09-20T17:16:46Z

> Hello @ctlllll, thanks for providing such a wonderful project. I am interested in the fine-grained KV cache management item. Could you offer me more guidance on this?
>
> I have been working on a demo of speculative sampling for a while.
>
> https://github.com/feifeibear/LLMSpeculativeSampling

Hi @feifeibear, thanks for your interest! In the current version, we implemented a pre-allocated KV cache. The philosophy was to keep the original HF APIs and only reduce the memory movement cost of updating the KV cache. For something more dynamic, I think the PagedAttention mechanism in vLLM might be a better reference :)
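To make the pre-allocation idea concrete, here is a minimal sketch in plain Python. It is not Medusa's actual implementation: the class name `PreallocatedKVCache` and its methods are hypothetical, and a real version would hold per-layer, per-head tensors on the GPU. The sketch only shows the core idea, assuming buffers are allocated once, new entries are written in place, and rejected draft tokens are dropped by moving a length pointer instead of copying memory.

```python
from typing import List, Tuple

Row = List[float]


class PreallocatedKVCache:
    """Toy single-layer KV cache backed by fixed-size buffers (hypothetical)."""

    def __init__(self, max_len: int, head_dim: int) -> None:
        # Allocate the full buffers once; later writes reuse these rows.
        self.keys: List[Row] = [[0.0] * head_dim for _ in range(max_len)]
        self.values: List[Row] = [[0.0] * head_dim for _ in range(max_len)]
        self.max_len = max_len
        self.head_dim = head_dim
        self.length = 0  # number of valid positions currently stored

    def append(self, new_keys: List[Row], new_values: List[Row]) -> None:
        """Copy new entries into the next free rows without reallocating."""
        n = len(new_keys)
        if len(new_values) != n:
            raise ValueError("keys and values must have the same length")
        if self.length + n > self.max_len:
            raise ValueError("KV cache capacity exceeded")
        for i in range(n):
            if len(new_keys[i]) != self.head_dim or len(new_values[i]) != self.head_dim:
                raise ValueError("row size must equal head_dim")
            # Slice assignment overwrites the existing row in place.
            self.keys[self.length + i][:] = new_keys[i]
            self.values[self.length + i][:] = new_values[i]
        self.length += n

    def truncate(self, new_length: int) -> None:
        """Discard trailing entries (e.g. rejected draft tokens) by moving the pointer."""
        if not 0 <= new_length <= self.length:
            raise ValueError("new_length must be between 0 and the current length")
        self.length = new_length

    def view(self) -> Tuple[List[Row], List[Row]]:
        """Return the valid prefix. A tensor library would return a view here;
        this toy version returns a shallow list copy of the prefix."""
        return self.keys[: self.length], self.values[: self.length]


# Usage: write three speculative positions, then keep only the first two.
cache = PreallocatedKVCache(max_len=8, head_dim=2)
buffer_before = cache.keys
cache.append(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
)
cache.truncate(2)
k, v = cache.view()
```

The stale third row stays in the buffer after `truncate` and is simply overwritten by the next `append`, which is what keeps the update cheap. A paged design such as PagedAttention instead allocates the cache in blocks on demand, trading this simplicity for flexibility.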

nikshepsvn · 2023-11-21T21:39:05Z

Hey all, any updates on this?

ctlllll · 2023-11-21T23:13:41Z

> Hey all, any updates on this?

We have some exciting stuff baking now. Let's wait and see :p

nivibilla · 2024-01-22T07:55:07Z

Hi, could SGLang be placed on the roadmap too? It's a recent release, also from LMSYS, who made vLLM, and it's faster.

https://github.com/sgl-project/sglang

ctlllll added the documentation label Sep 12, 2023

ctlllll pinned this issue Sep 12, 2023

yhyu13 mentioned this issue Sep 13, 2023

[Feat Request] Integrate Medusa Head for faster inference for models with available Medusa Head oobabooga/text-generation-webui#3906

Closed

ctlllll mentioned this issue Sep 13, 2023

Add support for LoRA finetuning for LLaMA-65B #7

Closed

ctlllll mentioned this issue Sep 17, 2023

gguf #24

Closed

Ryu1845 mentioned this issue Oct 8, 2023

Draft & Verify lucidrains/speculative-decoding#2

Closed


Roadmap

Functionality

Integration

Local Deployment

Serving

Research
