Repository brief

apache/spark


Cached analysis
cached 2026-03-30T15:56:02.063Z


Apache Spark is an active Apache project for large-scale data processing and unified analytics. It supports Scala, Java, Python, and R (deprecated), and includes Spark SQL, pandas API on Spark, MLlib, GraphX, and Structured Streaming. The repository is mature and heavily forked, with 29,139 forks and 43,059 stars, and was last pushed on 2026-03-30.

GitHub
Stars: 43,059
Forks: 29,139
Default branch: master
Last pushed: 2026-03-30T14:00:27Z
Best maintained: None
Closest to upstream: thunderain-project/StreamSQL
Most feature-rich: stratiocommit/spark
Most opinionated: stratiocommit/spark

Forks


10 cached fork briefs