# vLLM

Open source library for LLM inference and serving.
## TL;DR

### Setup

```sh
# Either of the following.
pip install 'vllm'
pipx install 'vllm'
```
### Usage

```sh
# Serve a model with an OpenAI-compatible API on port 8000,
# letting vLLM use up to 90% of the GPU's memory.
vllm serve 'meta-llama/Llama-2-7b-hf' --port '8000' --gpu-memory-utilization '0.9'

# Serve a larger model sharded across 2 GPUs via tensor parallelism.
vllm serve 'meta-llama/Llama-2-70b-hf' --tensor-parallel-size '2' --port '8000'
```
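Once running, `vllm serve` exposes an OpenAI-compatible REST API. A minimal sketch of a request body for its `/v1/completions` endpoint follows; the prompt and sampling parameters are illustrative, and the model name must match the one being served:

```python
import json

# Request body for the OpenAI-compatible /v1/completions endpoint
# exposed by `vllm serve` (assumed to listen on localhost:8000).
payload = {
    "model": "meta-llama/Llama-2-7b-hf",  # must match the served model
    "prompt": "San Francisco is a",       # example prompt
    "max_tokens": 32,                     # example sampling parameters
    "temperature": 0.7,
}
body = json.dumps(payload)

# Send it with any HTTP client, e.g.:
#   curl 'http://localhost:8000/v1/completions' \
#     -H 'Content-Type: application/json' -d "${body}"
print(body)
```

For chat-tuned models, the `/v1/chat/completions` endpoint takes a `messages` list instead of a `prompt`.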