# vLLM

Open source library for LLM inference and serving.
## TL;DR

### Setup

```sh
# Either of the following.
pip install 'vllm'
pipx install 'vllm'
```
### Usage

```sh
# Serve a model with an OpenAI-compatible API on port 8000,
# letting vLLM use up to 90% of the GPU's memory.
vllm serve 'meta-llama/Llama-2-7b-hf' --port '8000' --gpu-memory-utilization '0.9'

# Serve a larger model sharded across 2 GPUs via tensor parallelism.
vllm serve 'meta-llama/Llama-2-70b-hf' --tensor-parallel-size '2' --port '8000'
```
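Once running, `vllm serve` exposes an OpenAI-compatible REST API. A minimal sketch of a request body for its `/v1/completions` endpoint follows; the prompt and sampling parameters are illustrative, and the model name must match the one being served:

```python
import json

# Request body for the OpenAI-compatible /v1/completions endpoint
# exposed by `vllm serve` (assumed to listen on localhost:8000).
payload = {
    "model": "meta-llama/Llama-2-7b-hf",  # must match the served model
    "prompt": "San Francisco is a",       # example prompt
    "max_tokens": 32,                     # example sampling parameters
    "temperature": 0.7,
}
body = json.dumps(payload)

# Send it with any HTTP client, e.g.:
#   curl 'http://localhost:8000/v1/completions' \
#     -H 'Content-Type: application/json' -d "${body}"
print(body)
```

For chat-tuned models, the `/v1/chat/completions` endpoint takes a `messages` list instead of a `prompt`.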