Repository brief

apache/spark

Cached analysis: 2026-03-30T15:56:02.063Z

Apache Spark is a mature, actively developed Apache project for large-scale data processing and unified analytics. It supports Scala, Java, Python, and R (deprecated) and includes Spark SQL, pandas API on Spark, MLlib, GraphX, and Structured Streaming. The repository is heavily forked, with 29,139 forks and 43,059 stars, and was last pushed on 2026-03-30.

GitHub
Stars: 43,059
Forks: 29,139
Default branch: master
Last pushed: 2026-03-30T14:00:27Z

Forks

10 of 10 fork briefs

Choose this fork if you want Palantir-specific Spark behavior and can live with an older, highly diverged codebase. Choose upstream Spark if you want current features, easier upgrades, and the broadest community support.

Prefer upstream Spark unless you specifically need this fork's legacy StreamSQL/Kafka streaming extensions and are willing to maintain a heavily outdated, highly divergent codebase yourself.

Choose this fork only if you need its legacy 1.1.x behavior or custom integrations. For most adopters, upstream Apache Spark is the better choice because this fork is stale, highly divergent, and missing modern Spark capabilities.

Choose this fork only if GPU acceleration is the primary requirement and you can absorb the maintenance burden. For most users, upstream Spark is the safer default because this fork is stale and materially behind.

Prefer this fork only if you need its older Hive/Spark compatibility and are willing to maintain a heavily lagging Spark branch. For most adopters, upstream Apache Spark is the safer choice because this fork is stale and likely missing many newer APIs, fixes, and usability improvements.

Choose this fork only if you need an old, historical Spark baseline. For active development, production use, or modern Spark features, upstream is the better choice by a wide margin.

Prefer this fork only if you need an old, frozen Spark baseline. If you want current Spark features, compatibility, or ongoing maintenance, upstream is the better choice by a wide margin.

Choose this fork only if you need legacy MapR integration and can accept an old Spark baseline. For anyone starting fresh or wanting current Spark features, upstream Apache Spark is the better fit.

Prefer this fork only if AWS Fargate serverless deployment is the primary requirement and you can accept a frozen, highly divergent Spark codebase. If you need current Spark features, compatibility, or active upstream support, upstream Apache Spark is the safer choice.

Prefer this fork only if you need its legacy compatibility and custom patches and can accept a large gap from active Apache Spark development. If you want current Spark features, fixes, and ecosystem compatibility, upstream is the better choice.