Repository brief

deepspeedai/DeepSpeed

Cached analysis

cached 2026-03-31T09:41:55.663Z

DeepSpeed is an actively maintained, Apache-2.0-licensed deep learning optimization library for distributed training and inference. It has a large upstream footprint (41,948 stars, 4,770 forks) and very recent activity (last push on 2026-03-30), which makes it a high-interest upstream for anyone working on large-scale model training systems.

Stars: 41,948

Forks: 4,770

Default branch: master

Last pushed: 2026-03-30T20:07:25Z

Forks

10 of 10 fork briefs

Prefer upstream if you want current DeepSpeed capabilities and active feature development. Prefer this fork only if you specifically need its downstream patches and are prepared to own the divergence and porting burden.

Choose this fork if your priority is Habana/Gaudi support and you want a DeepSpeed variant already adapted for that platform. Choose upstream if you need the newest DeepSpeed features, fastest bugfix flow, or the least-divergent codebase.

Prefer this fork only if you need its older, customized behavior and are prepared to own maintenance. If you want current DeepSpeed capabilities, active fixes, and modern distributed-training features, upstream is the better choice.

Choose this fork if your priority is accelerator-specific compatibility and you can tolerate lagging upstream features. Choose upstream if you want the latest DeepSpeed capabilities, active maintenance, and lower integration risk.

Choose this fork only if you need its specific older/custom DeepSpeed behavior and are prepared to own major divergence. For most adopters, upstream DeepSpeed is the safer choice because it is active, much newer, and far richer in maintained features.

Choose upstream unless you specifically need this older, unchanged snapshot. This fork does not add capabilities and lags substantially behind current DeepSpeed.

Prefer upstream unless you specifically need this fork's older snapshot or legacy chat/CPU/AMD changes; for new adoption, the fork is too stale and too divergent to be a safe default.

Prefer this fork only if you need Snowflake-specific maintenance or the narrowed codebase it represents. If you want current DeepSpeed features, active upstream alignment, or broad model-system support, upstream is the better choice.

Prefer this fork only if you need its specific 2023-era customizations and can accept major divergence from upstream. For most adopters, upstream DeepSpeed is the safer choice because this fork is stale, heavily rewritten, and likely missing newer features and fixes.

Prefer this fork only if you explicitly want an older, heavily pruned DeepSpeed baseline and are prepared to own maintenance yourself. For most adopters, upstream is the safer choice because this fork is stale and materially diverged.

Fork comparison

EleutherAI/DeeperSpeed

38/100 · stale · significant divergence

Likely purpose

A downstream-maintained DeepSpeed variant for users who need a customized, older, or simplified training stack with local patches and compatibility fixes instead of upstream’s latest feature cadence.

Best for

Teams already dependent on this fork’s specific behavior and willing to stay on a custom maintenance branch.
Users running older PyTorch versions or legacy DeepSpeed integrations that need compatibility patches more than new upstream features.
Operators who value the stability of an internalized fork over adopting upstream’s newest research features.

Additional features

ZeRO-3 partitioned parameter get/set APIs
Slurm launcher support
gradient accumulation memory leak fix
pipeline engine logging fix
BF16 optimizer synchronization fixes
Compatibility fixes for older PyTorch versions

Missing features

Upstream’s newer feature work, such as SuperOffload, ZenFlow, DeepCompile, DeepNVMe, Arctic long-sequence training, and extended Muon support, does not appear in this fork’s visible history.
Large parts of the upstream runtime appear to be removed, disabled, or rewritten in this fork, including Ulysses sequence parallelism, the ZenFlow runtime/optimizer code, and compile-related components.
The fork is about 200 commits behind upstream, so adopters likely miss recent bugfixes, tests, and release updates from current DeepSpeed.

Strengths

Can be attractive if you need a fork pinned to a known downstream behavior set.
Includes practical operational fixes and compatibility patches that may matter in older training environments.
The history suggests focused maintenance around real training workflows rather than broad feature chasing.

Risks

Materially behind upstream, so new upstream fixes and features will not be available without manual porting.
The large, deletion-heavy diff suggests API and behavior drift that can complicate upgrades or third-party integrations.
Core subsystems were touched heavily, so regression risk is higher than for a small patch fork.

deepspeedai/DeepSpeed · Discofork