aihub models run failure #15

Open
1826133674 opened this issue Nov 13, 2024 · 0 comments
Labels
question Further information is requested

Comments

@1826133674

Hello, regarding the compiled LLM binaries that can be downloaded directly from AI Hub: which target device are these files built for? Can they run on an 8 Gen 3 chipset?

When I attempted to run inference with them on an 8 Gen 3 smartphone, I encountered the following error:

./genie-t2t-run -c qwen2_7b_instruct_quantized.json -p "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"

[INFO]  "Using create From Binary"
[INFO]  "Allocated total size = 174998016 across 8 buffers"
[ERROR] "Could not create context from binary for context index = 0 : err 5005"
[ERROR] "Create From Binary FAILED!"

qwen2_7b_instruct_quantized.json:

{
    "dialog": {
        "version": 1,
        "type": "basic",
        "context": {
            "version": 1,
            "size": 4096,
            "n-vocab": 152064,
            "bos-token": -1,
            "eos-token": 151645
        },
        "sampler": {
            "version": 1,
            "seed": 42,
            "temp": 0.8,
            "top-k": 40,
            "top-p": 0.95
        },
        "tokenizer": {
            "version": 1,
            "path": "tokenizer.json"
        },
        "engine": {
            "version": 1,
            "n-threads": 3,
            "backend": {
                "version": 1,
                "type": "QnnHtp",
                "QnnHtp": {
                    "version": 1,
                    "use-mmap": true,
                    "spill-fill-bufsize": 0,
                    "mmap-budget": 0,
                    "poll": false,
                    "pos-id-dim": 64,
                    "cpu-mask": "0xe0",
                    "kv-dim": 128,
                    "rope-theta": 1000000,
                    "allow-async-init": false
                },
                "extensions": "htp_backend_ext_config.json"
            },
            "model": {
                "version": 1,
                "type": "binary",
                "binary": {
                    "version": 1,
                    "ctx-bins": [
                        "weight_sharing_model_1_of_4.serialized.bin",
                        "weight_sharing_model_2_of_4.serialized.bin",
                        "weight_sharing_model_3_of_4.serialized.bin",
                        "weight_sharing_model_4_of_4.serialized.bin"
                    ]
                }
            }
        }
    }
}
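
Assuming the four ctx-bins listed above sit next to the config, this is a minimal sketch (my own, not from the Genie documentation) to confirm they are present and non-empty before launching genie-t2t-run:

# check_ctx_bins.py - minimal sketch: verify the context binaries referenced by
# the Genie config exist next to it and are non-empty.
import json
import os
import sys

config_path = sys.argv[1] if len(sys.argv) > 1 else "qwen2_7b_instruct_quantized.json"
base_dir = os.path.dirname(os.path.abspath(config_path))

with open(config_path) as f:
    cfg = json.load(f)

for name in cfg["dialog"]["engine"]["model"]["binary"]["ctx-bins"]:
    path = os.path.join(base_dir, name)
    if not os.path.isfile(path):
        print("MISSING:", name)
    else:
        print(f"{name}: {os.path.getsize(path) / (1024 ** 2):.1f} MiB")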

htp_backend_ext_config.json:

{
    "devices": [
        {
            "dsp_arch": "v75",
            "cores": [
                {
                    "core_id": 0,
                    "perf_profile": "burst",
                    "rpc_control_latency": 100
                }
            ]
        }
    ],
    "memory": {
        "mem_type": "shared_buffer"
    },
    "context": {
        "weight_sharing_enabled": true
    }
}
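
Since the error happens while creating the context from binary, a related check (again just a sketch; the libQnnHtpV75Skel.so naming and the /data/local/tmp/genie path are my assumptions about the usual QNN/QAIRT deployment, so adjust to whatever was actually pushed) is whether the dsp_arch requested here matches the HTP skel library deployed on the device:

# check_htp_arch.py - sketch: compare the dsp_arch in htp_backend_ext_config.json
# with the HTP skel library present in the on-device deployment directory.
# Library naming and device path are assumptions, not taken from the docs.
import json
import subprocess

with open("htp_backend_ext_config.json") as f:
    ext = json.load(f)

arch = ext["devices"][0]["dsp_arch"]                    # "v75" in the config above
expected_skel = "libQnnHtp" + arch.upper() + "Skel.so"  # -> libQnnHtpV75Skel.so

device_dir = "/data/local/tmp/genie"  # hypothetical push directory
listing = subprocess.run(["adb", "shell", "ls", device_dir],
                         capture_output=True, text=True, check=True).stdout

print("config dsp_arch:", arch, "-> expecting", expected_skel)
print("present on device" if expected_skel in listing else "NOT found in " + device_dir)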
mestrona-3 added the question (Further information is requested) label on Nov 14, 2024