@NickyDark1, I ran that model in Colab and it works.

Without quantizing:
```python
# Load the model directly
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("h2oai/h2o-danube-1.8b-chat")
model = AutoModelForCausalLM.from_pretrained("h2oai/h2o-danube-1.8b-chat")

# "text-generation" is the right pipeline task for a causal LM
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
pipe("Hello, How")
```
Output:

```
[{'generated_text': 'Hello, How are you?\n\n"I\'m doing well, thank you. How about'}]
```
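For reference, a rough way to time the run before and after the swap (the prompt and `max_new_tokens` here are illustrative choices, not exactly what I ran):

```python
import time

start = time.time()
pipe("Hello, How", max_new_tokens=16)  # cap generation length so runs are comparable
print(f"latency: {time.time() - start:.1f}s")
```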
After replacing the Linear layers with bitnet:
```python
from bitnet import replace_linears_in_hf

replace_linears_in_hf(model)

# move the model back to CUDA
model.to("cuda")

pipe_1_bit = pipeline("text-generation", model=model, tokenizer=tokenizer)
pipe_1_bit("Hello, How")
```
The output is:

```
[{'generated_text': 'Hello, How島 waters everyoneürgen Mess till revel馬 Vitt officials ambos">< czł plusieurs ap riv居'}]
```
But it takes ages to produce this answer (8 minutes in my case on free Colab).
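I suspect this is expected behavior rather than a bug. As a sketch of my understanding (this is an assumption about what `replace_linears_in_hf` does, not taken from the bitnet source; `NaiveBitLinear` and `replace_linears` are hypothetical names), the swap recursively replaces every `nn.Linear` with a layer that ternarizes the pretrained weights on each forward pass. The fp16 weights were never trained to survive rounding to {-1, 0, +1}, which would explain the garbage text, and the per-call quantization runs in Python for every linear layer, which would explain the latency:

```python
import torch
import torch.nn as nn

class NaiveBitLinear(nn.Module):
    """Hypothetical 1.58-bit linear: ternary weights scaled by their mean absolute value."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.weight = linear.weight
        self.bias = linear.bias

    def forward(self, x):
        # Quantize on every forward call: round weights to {-1, 0, +1}
        # and rescale. This extra work happens for every token generated.
        scale = self.weight.abs().mean()
        w_q = torch.clamp(torch.round(self.weight / (scale + 1e-8)), -1, 1) * scale
        return nn.functional.linear(x, w_q, self.bias)

def replace_linears(module: nn.Module):
    # Recursively swap every nn.Linear in the module tree.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, NaiveBitLinear(child))
        else:
            replace_linears(child)
```

If that picture is right, getting decent output would require training (or at least fine-tuning) the model with the BitLinear layers in place, the way the BitNet paper trains its models from scratch, rather than swapping layers into an already-trained fp16 checkpoint.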
Upvote & Fund
The text was updated successfully, but these errors were encountered: