SystemPanic/vllm-windows (active, significant divergence)
Choose this fork if Windows support is the deciding constraint and you can tolerate substantial upstream drift (a way to quantify drift is sketched after this list). Choose upstream if you value fastest access to new fixes, broadest model support, and lower maintenance risk.
chu-tianxiang/vllm-gptq (stale, significant divergence)
Choose this fork only if you specifically need its GPTQ-centered customizations or other fork-specific workflows. If you want current vLLM behavior, active bug/security fixes, and lower operational risk, upstream is the safer default.
ROCm/vllm (active, significant divergence)
Choose this fork if ROCm is your deployment target and you want an AMD-focused vLLM branch with active downstream enablement. Choose upstream if you need the broadest, freshest mainline feature set or want to minimize divergence risk.
(fork name not preserved; status and divergence fields missing)
Choose this fork if you want vLLM with essentially no fork-specific behavior and can accept being slightly behind upstream; choose upstream if you need the latest fixes, security updates, or hardware/model support.
MooreThreads/vllm-musa (active, significant divergence)
Choose this fork if you need Moore Threads/MUSA-targeted vLLM support and are prepared to own upstream drift. Choose upstream if you want the broadest feature set, fastest security/bugfix cadence, and lowest maintenance burden.
HabanaAI/vllm-fork (active, significant divergence)
Choose this fork if Habana/HPU support is the priority and you want a branch shaped around that deployment model. Choose upstream if you want the broadest hardware coverage, fastest access to recent fixes, and lower long-term merge risk.
Said-Akbar/vllm-rocm (stale, significant divergence)
Choose this fork if AMD ROCm support is the priority and you need a fork tailored to MI25/MI50/MI60 GPUs. Choose upstream if you want the newest fixes, broader hardware support, and lower maintenance risk.
fyabc/vllm (stale, significant divergence)
Choose this fork only if you specifically need its extra platform, benchmarking, or serving changes; otherwise upstream is the safer default, because this branch looks materially older, more divergent, and harder to maintain.
stepfun-ai/vllm (active, significant divergence)
Choose this fork if you need its hardware-, kernel-, or model-specific changes and can tolerate divergence from upstream. Avoid it if you want the broadest upstream compatibility, easiest upgrades, or the least integration risk.
QwenLM/vllm (stale, significant divergence)
Prefer this fork only if you specifically need its downstream customizations and can accept being far behind upstream. For most adopters, upstream vLLM is the safer default because it is much more active, current, and likely to receive fixes and support sooner.
wangshuai09/vllm (stale, significant divergence)
Prefer this fork if you specifically need its LoRA/model-serving customizations and can tolerate being materially behind upstream. Prefer upstream if you value freshness, broad hardware coverage, and lower maintenance risk.
SakanaAI/vllm (stale, significant divergence)
Prefer this fork only if you need its custom hardware/benchmarking work and are prepared to own a large maintenance burden. For most adopters, upstream is the better choice because it is far more current and actively maintained.
mesolitica/vllm-whisper (stale, significant divergence)
Choose this fork if you specifically need Whisper-serving behavior and are prepared to own a stale, heavily diverged codebase. Prefer upstream vLLM if you want current general-purpose LLM serving, active maintenance, and broad model/hardware support.
cduk/vllm-pascal (stale, significant divergence)
Choose this fork only if Pascal GPU support is the requirement. If you want current vLLM features, fixes, and lower maintenance burden, upstream is the better default.
XiaomiMiMo/vllm (slowing, significant divergence)
Prefer this fork only if you need its XiaomiMiMo/MiMo-specific behavior or its custom runtime/workflow changes. For general-purpose vLLM serving, upstream is the safer choice because it is far more active, broader in hardware/model support, and materially ahead on ongoing fixes.
tenstorrent/vllm (active, significant divergence)
Prefer this fork if you need Tenstorrent-specific integration and are comfortable owning a large downstream divergence. Prefer upstream if you want the broadest model compatibility, fastest access to fixes, and the least operational friction.
aws-neuron/upstreaming-to-vllm (slowing, significant divergence)
Choose this fork if your primary goal is AWS Neuron support and you want a codebase shaped around that deployment path. Choose upstream vLLM if you want the broadest feature freshness, fastest access to new fixes, and the least integration risk.
ai-infos/vllm-gfx906-mobydick (active, significant divergence)
Choose this fork if gfx906 AMD GPU support is the main requirement and you want ROCm-specific tuning. Choose upstream if you need the newest features, widest model coverage, or lower maintenance risk.
yanxiyue/vllm (stale, significant divergence)
Prefer this fork only if you need its specific customizations and can accept substantial maintenance burden. For most adopters, upstream vLLM is the safer default because this fork is materially stale and heavily diverged.
(fork name not preserved; status and divergence fields missing)
Prefer this fork only if you specifically want a near-upstream mirror to own internally; otherwise upstream is the better choice because it is ahead and actively receiving fixes and feature updates.
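Every entry above is judged on the same two axes: activity (stale vs. active) and divergence from vllm-project/vllm. The snippet below is a minimal sketch of how you might quantify both for a candidate fork yourself, using GitHub's public compare API; the fork owner, branch names, and printed fields are illustrative assumptions, not data taken from the list above.

```python
# Minimal sketch: measure a fork's drift from upstream vllm-project/vllm via
# GitHub's compare API (GET /repos/{owner}/{repo}/compare/BASE...HEAD).
# Assumptions: both repos use a "main" branch, and the fork kept the repo
# name "vllm" so the OWNER:BRANCH head syntax resolves within the fork
# network. Unauthenticated requests are rate-limited.
import json
import urllib.request

UPSTREAM = "vllm-project/vllm"

def drift(fork_owner: str, fork_branch: str = "main") -> dict:
    # Cross-repo compare: base = upstream main, head = the fork's branch.
    url = (f"https://api.github.com/repos/{UPSTREAM}/compare/"
           f"main...{fork_owner}:{fork_branch}")
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # behind_by: upstream commits the fork lacks (a staleness signal);
    # ahead_by: fork-only commits (a divergence signal).
    return {"behind_by": data["behind_by"], "ahead_by": data["ahead_by"]}

if __name__ == "__main__":
    print(drift("ROCm"))  # e.g. {'behind_by': ..., 'ahead_by': ...}
```

For forks that renamed the repository (HabanaAI/vllm-fork, for example), the same two numbers can be read locally by adding the fork and upstream as git remotes, fetching both, and running `git rev-list --left-right --count upstream/main...fork/main`.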