[Feature] update index automatically #2

Open
zhyncs opened this issue Dec 1, 2024 · 22 comments · May be fixed by #5
@zhyncs
Member

zhyncs commented Dec 1, 2024

ref flashinfer-ai/whl#1 (review)
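For illustration, an automatic index update could be as small as a script that CI reruns whenever new wheels are uploaded, regenerating a PEP 503-style index.html from the wheel files. This is only a sketch; the directory layout and the rebuild_index helper below are assumptions, not this repository's actual tooling.

from pathlib import Path
import html

def rebuild_index(wheel_dir: str, base_url: str) -> None:
    # Hypothetical helper: list every wheel in wheel_dir and write an index.html
    # of links so pip --find-links / --extra-index-url can discover them.
    wheels = sorted(Path(wheel_dir).glob("*.whl"))
    links = "\n".join(
        f'<a href="{base_url}/{html.escape(w.name)}">{html.escape(w.name)}</a><br/>'
        for w in wheels
    )
    (Path(wheel_dir) / "index.html").write_text(
        "<!DOCTYPE html>\n<html><body>\n" + links + "\n</body></html>\n"
    )

if __name__ == "__main__":
    rebuild_index(
        "whl/cu121/torch2.4/flashinfer",
        "https://flashinfer.ai/whl/cu121/torch2.4/flashinfer",
    )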

@zhyncs zhyncs self-assigned this Dec 1, 2024
@zhyncs
Member Author

zhyncs commented Dec 12, 2024

fix the latest main build

@zhyncs zhyncs linked a pull request Dec 14, 2024 that will close this issue
@Iven2132

Hi @zhyncs, I'm getting this error:

from flashinfer.decode import _grouped_size_compiled_for_decode_kernels
ImportError: cannot import name '_grouped_size_compiled_for_decode_kernels' from 'flashinfer.decode' (/usr/local/lib/python3.10/site-packages/flashinfer/decode.py)

even though I'm using this nightly version. Is there a fix? I'm trying to run Llama 3.2 with SGLang.

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

@Iven2132 Please use the latest SGLang release

@Iven2132

@zhyncs I tried using 0.4.0.post2 and now I'm getting 'ModuleNotFoundError: No module named 'zmq''; if I install it, I get another error.

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu121/torch2.4/flashinfer/ --force-reinstall

@Iven2132

Getting 'ModuleNotFoundError: No module named 'torch'', but I do have torch installed. Which torch and CUDA versions would you recommend for running Llama 3.2 Vision with SGLang?

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

torch 2.5.1 with CUDA 12.4 works well for me

@Iven2132

Iven2132 commented Dec 22, 2024

What would the find-links URL look like for that? Something like "https://flashinfer.ai/whl/cu121/torch2.5/flashinfer/"?

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

You don’t need to specify torch 2.5 for flashinfer; torch 2.4 and torch 2.5 are ABI compatible.
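In practice that means you can keep torch==2.5.1 installed and still point --find-links at the torch2.4 index, for example pip install torch==2.5.1 followed by pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/ (cu124 here assumes a CUDA 12.4 setup; use cu121 for CUDA 12.1).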

@Iven2132

I'm confused. Is this image config right?

image = (
    modal.Image.from_registry("nvidia/cuda:12.4.0-runtime-ubuntu22.04", add_python="3.10")
    .apt_install("git")
    .pip_install(
        "transformers",
        "numpy<2",
        "fastapi[standard]==0.115.4",
        "huggingface_hub",
        "torch==2.5.1",
    )
    .run_commands('pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer/ --force-reinstall')
)

Also, can I use Llama 3.2 Vision directly, like I'm doing in my code?

PATH = "meta-llama/Llama-3.2-11B-Vision-Instruct"

runtime = sgl.Runtime(
    model_path=PATH,
    tokenizer_path=PATH,
    tp_size=1,
)


@Iven2132

Now getting 'ModuleNotFoundError: No module named 'flashinfer''

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

You shouldn’t use torch2.5 in the URL.

@Iven2132

I removed it and used pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/ --force-reinstall and got ModuleNotFoundError: No module named 'torch'.

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

You can install torch separately

@Iven2132

What would that look like?

I tried .run_commands('pip install torch && pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/ --force-reinstall') but got the same error. I also tried pip_install("torch") before installing sglang and flashinfer and hit the same issue.
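For reference, one possible layout (a sketch pieced together from the messages above, not a confirmed fix) is to let Modal's pip_install put torch in place first and then install sglang without --force-reinstall, so the already-installed torch is reused:

import modal

image = (
    modal.Image.from_registry("nvidia/cuda:12.4.0-runtime-ubuntu22.04", add_python="3.10")
    .apt_install("git")
    # torch goes in its own layer so it is importable before sglang/flashinfer are installed
    .pip_install("torch==2.5.1", "transformers", "numpy<2", "huggingface_hub")
    # note: torch2.4 in the index URL is intentional (ABI-compatible with torch 2.5, per above)
    .run_commands(
        'pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/'
    )
)

Whether dropping --force-reinstall also avoids the later build errors depends on the environment.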

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

Why do you install with code? Can you use the command line instead?

@Iven2132

I get

Collecting torchao>=0.7.0 (from sglang[all])
  Downloading torchao-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.metadata (13 kB)
Collecting gemlite (from sglang[all])
  Downloading gemlite-0.4.1.tar.gz (26 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-8ecbq_jv/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 334, in get_requireserror: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

@zhyncs
Member Author

zhyncs commented Dec 22, 2024

@Iven2132

@zhyncs Getting AttributeError: module 'sglang' has no attribute 'Runtime'

    import sglang
    import os
    import requests
    from uuid import uuid4
    from huggingface_hub import login

    image = "https://modal-public-assets.s3.amazonaws.com/golden-gate-bridge.jpg"
    login(os.environ["HF_TOKEN"])

    runtime = sglang.Runtime(
        model_path="meta-llama/Llama-3.2-11B-Vision-Instruct",
        tokenizer_path="meta-llama/Llama-3.2-11B-Vision-Instruct",
        tp_size=2,
    )
    runtime.endpoint.chat_template = sglang.lang.chat_template.get_chat_template(
        "llama_3_vision"
    )
    sglang.set_default_backend(runtime)

    async def process_image(image_url: str) -> str:
        response = requests.get(image_url)
        response.raise_for_status()

        image_filename = f"/tmp/{uuid4()}-{image_url.split('/')[-1]}"
        with open(image_filename, "wb") as file:
            file.write(response.content)
        return image_filename

    @sglang.function
    def image_qa(s, image_path: str, question: str):
        s += sglang.user(sglang.image(image_path) + question)
        s += sglang.assistant(sglang.gen("answer"))

    image_path = process_image(image)
    state = image_qa.run(
        image_path=image_path,
        question="What is in this image?",
        max_new_tokens=128,
    )
    return {"answer": state["answer"]}
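If sglang.Runtime isn't exposed by the installed sglang version, one alternative (a sketch only, not a confirmed fix for this issue) is to start the OpenAI-compatible server via sglang.launch_server in a separate process and call it over HTTP:

import subprocess
import requests

# Launch the OpenAI-compatible server instead of using sglang.Runtime.
server = subprocess.Popen([
    "python", "-m", "sglang.launch_server",
    "--model-path", "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "--chat-template", "llama_3_vision",
    "--port", "30000",
])

# ... wait for the server to report readiness, then send a vision chat request.
payload = {
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://modal-public-assets.s3.amazonaws.com/golden-gate-bridge.jpg"}},
            {"type": "text", "text": "What is in this image?"},
        ],
    }],
    "max_tokens": 128,
}
print(requests.post("http://127.0.0.1:30000/v1/chat/completions", json=payload).json())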

@Iven2132

here is the error. "ValueError: bad value(s) in fds_to_keep"

@Iven2132

@zhyncs here is the full log:

[2024-12-26 10:19:06] Setting Triton cache manager to: sglang.srt.utils:CustomCacheManager
[2024-12-26 10:19:06] server_args=ServerArgs(model_path='Qwen/Qwen2-VL-7B-Instruct', tokenizer_path='Qwen/Qwen2-VL-7B-Instruct', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=True, dtype='auto', 
kv_cache_dtype='auto', quantization=None, context_length=None, device='cuda', served_model_name='Qwen/Qwen2-VL-7B-Instruct', chat_template=None, is_embedding=False, revision=None, host='127.0.0.1', port=10000, mem_fraction_static=0.87, 
max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, tp_size=2, stream_interval=1, random_seed=831226175, 
constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='debug', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, 
file_storage_pth='SGLang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', enable_double_sparsity=False, ds_channel_config_path=None, 
ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', 
disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, 
enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=8, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, 
triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False)
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.11/dist-packages/sglang/srt/server.py", line 527, in launch_server
    launch_engine(server_args=server_args)
  File "/usr/local/lib/python3.11/dist-packages/sglang/srt/server.py", line 459, in launch_engine
    proc.start()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 40, in _launch
    tracker_fd = resource_tracker.getfd()
                 ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/resource_tracker.py", line 91, in getfd
    self.ensure_running()
  File "/usr/lib/python3.11/multiprocessing/resource_tracker.py", line 148, in ensure_running
    pid = util.spawnv_passfds(exe, args, fds_to_pass)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/util.py", line 456, in spawnv_passfds
    return _posixsubprocess.fork_exec(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: bad value(s) in fds_to_keep
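For what it's worth, this traceback comes out of Python's multiprocessing layer while sglang spawns its scheduler processes. One thing sometimes worth trying (an assumption on my part, not a confirmed fix for this case) is to make sure the launch runs under a __main__ guard with an explicit spawn start method, and to drop to tp_size=1 to narrow the problem down:

import multiprocessing as mp
import sglang

if __name__ == "__main__":
    # Make the start method explicit before sglang creates its worker processes.
    mp.set_start_method("spawn", force=True)
    runtime = sglang.Runtime(
        model_path="Qwen/Qwen2-VL-7B-Instruct",
        tp_size=1,  # reduced from 2 purely to isolate the multiprocessing issue
    )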
