apache/spark
Apache Spark is an active Apache project for large-scale data processing and unified analytics. It supports Scala, Java, Python, and R (deprecated) and includes Spark SQL, pandas API on Spark, MLlib, GraphX, and Structured Streaming. The repository is mature and heavily forked, with 29,139 forks and 43,059 stars, and it was last updated and pushed on 2026-03-30.
Choose a fork to inspect
Prefer upstream Spark unless you specifically need this fork's legacy StreamSQL/Kafka streaming extensions and are willing to maintain a heavily outdated, highly divergent codebase yourself.
Choose this fork only if you need its legacy 1.1.x behavior or custom integrations. For most adopters, upstream Apache Spark is the better choice because this fork is stale, highly divergent, and missing modern Spark capabilities.
Choose this fork only if GPU acceleration is the primary requirement and you can absorb the maintenance burden. For most users, upstream Spark is the safer default because this fork is stale and materially behind.
Prefer this fork only if you need its older Hive/Spark compatibility and are willing to maintain a heavily lagging Spark branch. For most adopters, upstream Apache Spark is the safer choice because this fork is stale and likely missing many newer APIs, fixes, and usability improvements.
Choose this fork only if you need an old, historical Spark baseline. For active development, production use, or modern Spark features, upstream is the better choice by a wide margin.
Prefer this fork only if you need an old, frozen Spark baseline. If you want current Spark features, compatibility, or ongoing maintenance, upstream is the better choice by a wide margin.
Choose this fork only if you need legacy MapR integration and can accept an old Spark baseline. For anyone starting fresh or wanting current Spark features, upstream Apache Spark is the better fit.
Prefer this fork only if AWS Fargate serverless deployment is the primary requirement and you can accept a frozen, highly divergent Spark codebase. If you need current Spark features, compatibility, or active upstream support, upstream Apache Spark is the safer choice.
Prefer this fork only if you need its legacy compatibility and custom patches and can accept a large gap from active Apache Spark development. If you want current Spark features, fixes, and ecosystem compatibility, upstream is the better choice.