
# vLLM

Open-source library for LLM inference and serving.

1. TL;DR
2. Further readings
   1. Sources

## TL;DR

Setup:

```sh
# install into the current Python environment
pip install 'vllm'

# or install into an isolated environment
pipx install 'vllm'
```
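The installation can be sanity-checked by importing the package and printing its version; a minimal check, assuming a regular `pip` install:

```sh
# confirm the package imports and report its version
python -c 'import vllm; print(vllm.__version__)'
```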
Usage:

```sh
# serve a model over an OpenAI-compatible HTTP API,
# capping GPU memory usage at 90%
vllm serve 'meta-llama/Llama-2-7b-hf' --port '8000' --gpu-memory-utilization '0.9'

# shard a larger model across 2 GPUs via tensor parallelism
vllm serve 'meta-llama/Llama-2-70b-hf' --tensor-parallel-size '2' --port '8000'
```
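The server exposes OpenAI-compatible endpoints, so it can be queried with plain HTTP or any OpenAI client. A minimal sketch, assuming the first `vllm serve` command above is running locally (the prompt and `max_tokens` value are illustrative):

```sh
# send a completion request to the OpenAI-compatible endpoint
curl 'http://localhost:8000/v1/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "meta-llama/Llama-2-7b-hf",
    "prompt": "San Francisco is a",
    "max_tokens": 32
  }'
```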

## Further readings

### Sources