[Feature] update index automatically #2

Open
zhyncs opened this issue Dec 1, 2024 · 22 comments · May be fixed by #5
@zhyncs
Member

zhyncs commented Dec 1, 2024

ref flashinfer-ai/whl#1 (review)
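For illustration, an automatic index update could be as small as a script that CI reruns whenever new wheels are uploaded, regenerating a PEP 503-style index.html from the wheel files. This is only a sketch; the directory layout and the rebuild_index helper below are assumptions, not this repository's actual tooling.

from pathlib import Path
import html

def rebuild_index(wheel_dir: str, base_url: str) -> None:
    # Hypothetical helper: list every wheel in wheel_dir and write an index.html
    # of links so pip --find-links / --extra-index-url can discover them.
    wheels = sorted(Path(wheel_dir).glob("*.whl"))
    links = "\n".join(
        f'<a href="{base_url}/{html.escape(w.name)}">{html.escape(w.name)}</a><br/>'
        for w in wheels
    )
    (Path(wheel_dir) / "index.html").write_text(
        "<!DOCTYPE html>\n<html><body>\n" + links + "\n</body></html>\n"
    )

if __name__ == "__main__":
    rebuild_index(
        "whl/cu121/torch2.4/flashinfer",
        "https://flashinfer.ai/whl/cu121/torch2.4/flashinfer",
    )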

@zhyncs zhyncs self-assigned this Dec 1, 2024
@zhyncs
Member Author

zhyncs commented Dec 12, 2024

fix the latest main build

@zhyncs zhyncs linked a pull request Dec 14, 2024 that will close this issue
@Iven2132

Hi @zhyncs, I'm getting this error:

from flashinfer.decode import _grouped_size_compiled_for_decode_kernels
ImportError: cannot import name '_grouped_size_compiled_for_decode_kernels' from 'flashinfer.decode' (/usr/local/lib/python3.10/site-packages/flashinfer/decode.py)

even though I'm using this nightly version. Is there a fix? I'm trying to run Llama 3.2 with SGLang.

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

@Iven2132 Please use the latest SGLang release

@Iven2132

@zhyncs I tried using 0.4.0.post2 and now I'm getting 'ModuleNotFoundError: No module named 'zmq''; if I install it, I get another error.

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu121/torch2.4/flashinfer/ --force-reinstall

@Iven2132

Getting 'ModuleNotFoundError: No module named 'torch'', but I do have torch installed. Which torch and CUDA versions would you recommend for running Llama 3.2 Vision with SGLang?

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

torch 2.5.1 with CUDA 12.4 works well for me

@Iven2132

Iven2132 commented Dec 22, 2024

What would the find-links URL look like for that? Something like "https://flashinfer.ai/whl/cu121/torch2.5/flashinfer/"?

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

You don’t need to specify torch 2.5 for flashinfer; torch 2.4 and torch 2.5 are ABI compatible.
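In practice that means you can keep torch==2.5.1 installed and still point --find-links at the torch2.4 index, for example pip install torch==2.5.1 followed by pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/ (cu124 here assumes a CUDA 12.4 setup; use cu121 for CUDA 12.1).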

@Iven2132

I'm confused. Is this image config right?

image = (
    modal.Image.from_registry("nvidia/cuda:12.4.0-runtime-ubuntu22.04", add_python="3.10")
    .apt_install("git")
    .pip_install(
        "transformers",
        "numpy<2",
        "fastapi[standard]==0.115.4",
        "huggingface_hub",
        "torch==2.5.1",
    )
    .run_commands('pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer/ --force-reinstall')
)

Also, can I use Llama 3.2 Vision directly, like I'm doing in my code?

PATH = "meta-llama/Llama-3.2-11B-Vision-Instruct"

runtime = sgl.Runtime(
    model_path=PATH,
    tokenizer_path=PATH,
    tp_size=1,
)


@Iven2132

Now getting 'ModuleNotFoundError: No module named 'flashinfer''

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

You shouldn’t use torch2.5 in the URL.

@Iven2132

I removed it and used pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/ --force-reinstall and got ModuleNotFoundError: No module named 'torch'.

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

You can install torch separately

@Iven2132

What would that look like?

I tried .run_commands('pip install torch && pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/ --force-reinstall') but got the same error. I also tried pip_install("torch") before installing sglang and flashinfer and hit the same issue.
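For reference, one possible layout (a sketch pieced together from the messages above, not a confirmed fix) is to let Modal's pip_install put torch in place first and then install sglang without --force-reinstall, so the already-installed torch is reused:

import modal

image = (
    modal.Image.from_registry("nvidia/cuda:12.4.0-runtime-ubuntu22.04", add_python="3.10")
    .apt_install("git")
    # torch goes in its own layer so it is importable before sglang/flashinfer are installed
    .pip_install("torch==2.5.1", "transformers", "numpy<2", "huggingface_hub")
    # note: torch2.4 in the index URL is intentional (ABI-compatible with torch 2.5, per above)
    .run_commands(
        'pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/'
    )
)

Whether dropping --force-reinstall also avoids the later build errors depends on the environment.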

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

Why do you install with code? Can you use the command line instead?

@Iven2132

I get

Collecting torchao>=0.7.0 (from sglang[all])
  Downloading torchao-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.metadata (13 kB)
Collecting gemlite (from sglang[all])
  Downloading gemlite-0.4.1.tar.gz (26 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-8ecbq_jv/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 334, in get_requireserror: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

@Iven2132

@zhyncs Getting AttributeError: module 'sglang' has no attribute 'Runtime'

    import sglang
    import os
    import requests
    from uuid import uuid4
    from huggingface_hub import login

    image = "https://modal-public-assets.s3.amazonaws.com/golden-gate-bridge.jpg"
    login(os.environ["HF_TOKEN"])

    runtime = sglang.Runtime(
        model_path="meta-llama/Llama-3.2-11B-Vision-Instruct",
        tokenizer_path="meta-llama/Llama-3.2-11B-Vision-Instruct",
        tp_size=2,
    )
    runtime.endpoint.chat_template = sglang.lang.chat_template.get_chat_template(
        "llama_3_vision"
    )
    sglang.set_default_backend(runtime)

    async def process_image(image_url: str) -> str:
        response = requests.get(image_url)
        response.raise_for_status()

        image_filename = f"/tmp/{uuid4()}-{image_url.split('/')[-1]}"
        with open(image_filename, "wb") as file:
            file.write(response.content)
        return image_filename

    @sglang.function
    def image_qa(s, image_path: str, question: str):
        s += sglang.user(sglang.image(image_path) + question)
        s += sglang.assistant(sglang.gen("answer"))

    image_path = process_image(image)
    state = image_qa.run(
        image_path=image_path,
        question="What is in this image?",
        max_new_tokens=128,
    )
    return {"answer": state["answer"]}
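If sglang.Runtime isn't exposed by the installed sglang version, one alternative (a sketch only, not a confirmed fix for this issue) is to start the OpenAI-compatible server via sglang.launch_server in a separate process and call it over HTTP:

import subprocess
import requests

# Launch the OpenAI-compatible server instead of using sglang.Runtime.
server = subprocess.Popen([
    "python", "-m", "sglang.launch_server",
    "--model-path", "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "--chat-template", "llama_3_vision",
    "--port", "30000",
])

# ... wait for the server to report readiness, then send a vision chat request.
payload = {
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://modal-public-assets.s3.amazonaws.com/golden-gate-bridge.jpg"}},
            {"type": "text", "text": "What is in this image?"},
        ],
    }],
    "max_tokens": 128,
}
print(requests.post("http://127.0.0.1:30000/v1/chat/completions", json=payload).json())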

@Iven2132

here is the error. "ValueError: bad value(s) in fds_to_keep"

@Iven2132

@zhyncs here is the full log:

[2024-12-26 10:19:06] Setting Triton cache manager to: sglang.srt.utils:CustomCacheManager
[2024-12-26 10:19:06] server_args=ServerArgs(model_path='Qwen/Qwen2-VL-7B-Instruct', tokenizer_path='Qwen/Qwen2-VL-7B-Instruct', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=True, dtype='auto', 
kv_cache_dtype='auto', quantization=None, context_length=None, device='cuda', served_model_name='Qwen/Qwen2-VL-7B-Instruct', chat_template=None, is_embedding=False, revision=None, host='127.0.0.1', port=10000, mem_fraction_static=0.87, 
max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, tp_size=2, stream_interval=1, random_seed=831226175, 
constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='debug', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, 
file_storage_pth='SGLang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', enable_double_sparsity=False, ds_channel_config_path=None, 
ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', 
disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, 
enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=8, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, 
triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False)
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.11/dist-packages/sglang/srt/server.py", line 527, in launch_server
    launch_engine(server_args=server_args)
  File "/usr/local/lib/python3.11/dist-packages/sglang/srt/server.py", line 459, in launch_engine
    proc.start()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 40, in _launch
    tracker_fd = resource_tracker.getfd()
                 ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/resource_tracker.py", line 91, in getfd
    self.ensure_running()
  File "/usr/lib/python3.11/multiprocessing/resource_tracker.py", line 148, in ensure_running
    pid = util.spawnv_passfds(exe, args, fds_to_pass)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/util.py", line 456, in spawnv_passfds
    return _posixsubprocess.fork_exec(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: bad value(s) in fds_to_keep
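For what it's worth, this traceback comes out of Python's multiprocessing layer while sglang spawns its scheduler processes. One thing sometimes worth trying (an assumption on my part, not a confirmed fix for this case) is to make sure the launch runs under a __main__ guard with an explicit spawn start method, and to drop to tp_size=1 to narrow the problem down:

import multiprocessing as mp
import sglang

if __name__ == "__main__":
    # Make the start method explicit before sglang creates its worker processes.
    mp.set_start_method("spawn", force=True)
    runtime = sglang.Runtime(
        model_path="Qwen/Qwen2-VL-7B-Instruct",
        tp_size=1,  # reduced from 2 purely to isolate the multiprocessing issue
    )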
