aihub models run failure #15

Open
1826133674 opened this issue Nov 13, 2024 · 0 comments
Labels
question Further information is requested

Comments

@1826133674

Hello, regarding the compiled LLM binaries that can be downloaded directly from AI Hub: which target device are these files built for? Can they run on an 8 Gen 3 chipset?

When I attempted to run inference with them on an 8 Gen 3 smartphone, I encountered the following error:

./genie-t2t-run -c qwen2_7b_instruct_quantized.json -p "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"

[INFO]  "Using create From Binary"
[INFO]  "Allocated total size = 174998016 across 8 buffers"
[ERROR] "Could not create context from binary for context index = 0 : err 5005"
[ERROR] "Create From Binary FAILED!"

qwen2_7b_instruct_quantized.json:

{
    "dialog": {
        "version": 1,
        "type": "basic",
        "context": {
            "version": 1,
            "size": 4096,
            "n-vocab": 152064,
            "bos-token": -1,
            "eos-token": 151645
        },
        "sampler": {
            "version": 1,
            "seed": 42,
            "temp": 0.8,
            "top-k": 40,
            "top-p": 0.95
        },
        "tokenizer": {
            "version": 1,
            "path": "tokenizer.json"
        },
        "engine": {
            "version": 1,
            "n-threads": 3,
            "backend": {
                "version": 1,
                "type": "QnnHtp",
                "QnnHtp": {
                    "version": 1,
                    "use-mmap": true,
                    "spill-fill-bufsize": 0,
                    "mmap-budget": 0,
                    "poll": false,
                    "pos-id-dim": 64,
                    "cpu-mask": "0xe0",
                    "kv-dim": 128,
                    "rope-theta": 1000000,
                    "allow-async-init": false
                },
                "extensions": "htp_backend_ext_config.json"
            },
            "model": {
                "version": 1,
                "type": "binary",
                "binary": {
                    "version": 1,
                    "ctx-bins": [
                        "weight_sharing_model_1_of_4.serialized.bin",
                        "weight_sharing_model_2_of_4.serialized.bin",
                        "weight_sharing_model_3_of_4.serialized.bin",
                        "weight_sharing_model_4_of_4.serialized.bin"
                    ]
                }
            }
        }
    }
}
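
Assuming the four ctx-bins listed above sit next to the config, this is a minimal sketch (my own, not from the Genie documentation) to confirm they are present and non-empty before launching genie-t2t-run:

# check_ctx_bins.py - minimal sketch: verify the context binaries referenced by
# the Genie config exist next to it and are non-empty.
import json
import os
import sys

config_path = sys.argv[1] if len(sys.argv) > 1 else "qwen2_7b_instruct_quantized.json"
base_dir = os.path.dirname(os.path.abspath(config_path))

with open(config_path) as f:
    cfg = json.load(f)

for name in cfg["dialog"]["engine"]["model"]["binary"]["ctx-bins"]:
    path = os.path.join(base_dir, name)
    if not os.path.isfile(path):
        print("MISSING:", name)
    else:
        print(f"{name}: {os.path.getsize(path) / (1024 ** 2):.1f} MiB")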

htp_backend_ext_config.json:

{
    "devices": [
        {
            "dsp_arch": "v75",
            "cores": [
                {
                    "core_id": 0,
                    "perf_profile": "burst",
                    "rpc_control_latency": 100
                }
            ]
        }
    ],
    "memory": {
        "mem_type": "shared_buffer"
    },
    "context": {
        "weight_sharing_enabled": true
    }
}
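
Since the error happens while creating the context from binary, a related check (again just a sketch; the libQnnHtpV75Skel.so naming and the /data/local/tmp/genie path are my assumptions about the usual QNN/QAIRT deployment, so adjust to whatever was actually pushed) is whether the dsp_arch requested here matches the HTP skel library deployed on the device:

# check_htp_arch.py - sketch: compare the dsp_arch in htp_backend_ext_config.json
# with the HTP skel library present in the on-device deployment directory.
# Library naming and device path are assumptions, not taken from the docs.
import json
import subprocess

with open("htp_backend_ext_config.json") as f:
    ext = json.load(f)

arch = ext["devices"][0]["dsp_arch"]                    # "v75" in the config above
expected_skel = "libQnnHtp" + arch.upper() + "Skel.so"  # -> libQnnHtpV75Skel.so

device_dir = "/data/local/tmp/genie"  # hypothetical push directory
listing = subprocess.run(["adb", "shell", "ls", device_dir],
                         capture_output=True, text=True, check=True).stdout

print("config dsp_arch:", arch, "-> expecting", expected_skel)
print("present on device" if expected_skel in listing else "NOT found in " + device_dir)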
mestrona-3 added the question (Further information is requested) label on Nov 14, 2024