tloen/llama-int8
Status: stale, significant_divergence
Choose this fork if your priority is int8/quantized local inference and lower RAM use. Avoid it if you want an actively maintained base or compatibility with the current upstream Llama stack, because it is old, narrow, and materially behind upstream.
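For orientation, here is a minimal sketch of the int8 technique this fork applies. It is not the fork's own code (the fork patches Meta's reference implementation directly with bitsandbytes); this stand-in uses the Hugging Face transformers API, and the model id is a placeholder:

```python
# Sketch: loading a Llama-family model with int8 weights via bitsandbytes.
# Stand-in for the fork's patched loader, not its actual code path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any Llama-family checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # int8 weights
    device_map="auto",  # place/offload layers across available devices
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```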
Prefer this fork only if CPU-only execution is the main requirement and you accept a stale legacy codebase. For most adopters, upstream or the newer Llama repos are the safer default.
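If CPU-only execution is the deciding factor, the core setup is small. A hedged sketch using plain PyTorch/transformers as a stand-in for the fork's own loader (model id is a placeholder):

```python
# Sketch: forcing CPU execution for a causal LM. Illustrates the technique,
# not this fork's actual entry point.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_num_threads(8)  # tune to your physical core count

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model.to("cpu").eval()

with torch.no_grad():
    ids = tokenizer("Hello", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=16)
print(tokenizer.decode(out[0]))
```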
Choose this fork if your goal is to experiment with or extend Mistral/Mixtral inside the original Llama inference code. Avoid it if you want current upstream maintenance, polished docs, or a stable baseline for Llama 2 itself.
soulteary/llama-docker-playground
Choose this fork if you value a packaged, interactive LLaMA 2 playground and are willing to accept stale lineage and more moving parts. Stick with upstream if you want the smallest reference implementation or alignment with Meta's newer Llama Stack direction.
OpenLMLab/OpenChineseLLaMA
Status: stale, significant_divergence
Choose this fork only if you specifically need its Chinese-model workflow, conversion tools, or ColossalAI/offload support. For new work, the fork is stale and materially diverged from an already-deprecated upstream, so it is better as a legacy reference than as a foundation for a fresh deployment.
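For comparison, the same offload idea this fork gets from ColossalAI is available today via Hugging Face accelerate's device_map. The sketch below is a stand-in under that assumption, not the fork's ColossalAI code:

```python
# Sketch: CPU/disk offload with transformers + accelerate (requires the
# `accelerate` package). Fills the GPU first, spills the rest to CPU and disk.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder checkpoint
    torch_dtype=torch.float16,
    device_map="auto",            # auto-place layers across GPU/CPU
    offload_folder="offload",     # spill remaining weights to disk
)
```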
Prefer this fork if you want a lightweight, more controllable legacy inference example. Prefer upstream or the newer Llama Stack repos if you need current guidance, maintained workflows, or broader ecosystem support.
galatolofederico/vanilla-llama
Choose this fork only if you specifically want its extra server and decoding-workflow features and are comfortable owning a stale legacy codebase. For most new adopters, upstream’s newer ecosystem is the better default.
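To give a sense of what "decoding-workflow features" means in practice, here is a self-contained top-p (nucleus) sampling step in plain PyTorch, the kind of helper such forks layer on the reference greedy loop; it is an illustration, not this fork's code:

```python
# Sketch: one top-p (nucleus) sampling step over a logits vector.
import torch

def sample_top_p(logits: torch.Tensor, p: float = 0.9, temperature: float = 0.8) -> int:
    """Sample one token id from logits (shape [vocab]) with nucleus filtering."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep the smallest prefix of tokens whose mass reaches p; zero out the rest.
    mask = cumulative - sorted_probs > p
    sorted_probs[mask] = 0.0
    sorted_probs /= sorted_probs.sum()
    next_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[next_sorted].item()
```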
Prefer this fork if your goal is legacy LLaMA inference on CPU and you want ready-made CPU/bfloat16 examples plus weight-merging helpers. Prefer upstream or newer Meta Llama repos if you need current support, newer workflows, or an actively maintained entry point.
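As a hedged sketch of what a weight-merging helper does: it concatenates Meta's model-parallel shards back into a single state dict. The dim map below is an assumption covering common parameter names in the original Llama layout (column-parallel layers split on dim 0, row-parallel on dim 1), not the fork's actual helper:

```python
# Sketch: merging two Llama model-parallel shards into one checkpoint.
import torch

CAT_DIM = {  # assumed split axes for representative parameter-name suffixes
    "wq.weight": 0, "wk.weight": 0, "wv.weight": 0, "w1.weight": 0, "w3.weight": 0,
    "wo.weight": 1, "w2.weight": 1,
    "tok_embeddings.weight": 1, "output.weight": 0,
}

def merge_shards(paths):
    shards = [torch.load(p, map_location="cpu") for p in paths]
    merged = {}
    for key in shards[0]:
        dim = next((d for s, d in CAT_DIM.items() if key.endswith(s)), None)
        if dim is None:
            merged[key] = shards[0][key]  # replicated (norms etc.): take one copy
        else:
            merged[key] = torch.cat([s[key] for s in shards], dim=dim)
    return merged

# merged = merge_shards(["consolidated.00.pth", "consolidated.01.pth"])
```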
modular-ml/wrapyfi-examples_llama
Choose this fork if Wrapyfi integration and deployment convenience matter more than staying current. Avoid it if you want the maintained upstream path or the newer Llama Stack ecosystem.
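Wrapyfi's draw here is streaming activations between processes or machines so one model spans several hosts. A minimal single-process PyTorch sketch of the underlying idea (partitioning layers across two devices) follows; Wrapyfi replaces the in-process handoff with middleware transport, and this is not its API:

```python
# Sketch: splitting a layer stack across two devices and passing activations.
import torch
import torch.nn as nn

class SplitStack(nn.Module):
    def __init__(self, layers: nn.ModuleList, split: int, dev_a: str, dev_b: str):
        super().__init__()
        self.first = nn.ModuleList(layers[:split]).to(dev_a)
        self.second = nn.ModuleList(layers[split:]).to(dev_b)
        self.dev_a, self.dev_b = dev_a, dev_b

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.to(self.dev_a)
        for layer in self.first:
            x = layer(x)
        x = x.to(self.dev_b)  # the hop Wrapyfi would make over the network
        for layer in self.second:
            x = layer(x)
        return x

# layers = nn.ModuleList(nn.Linear(64, 64) for _ in range(8))
# model = SplitStack(layers, split=4, dev_a="cpu", dev_b="cpu")
```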
Choose this fork only if you need a legacy, minimally changed Llama inference baseline. If you want the current recommended Meta path or the latest fixes, upstream is the better default.