Integration into Spark


James Willis

Apr 2, 2025, 5:15:45 PM
to graph...@googlegroups.com
Hi,

I recently read on the documentation website that there are plans to merge GraphFrames with Apache Spark, effectively making it a core Spark component.

I'm wondering if this is the best approach. I believe it might be more beneficial for GraphFrames to remain an independent project. This would allow it to maintain its own release cycle and specific practices for making changes and conducting reviews.

Spark is a large, well-established project with a broad range of use cases and a large user base. In contrast, GraphFrames is a more specialized and, in some ways, less mature project with a smaller user base.

Keeping GraphFrames separate from the Spark project would allow it the freedom to grow and adapt independently, guided by a smaller group of owners with specific motivation and domain expertise.

I would hate to see GraphFrames suffer the same fate as GraphX.

Thanks,
James

Russell Jurney

Apr 2, 2025, 10:30:10 PM
to James Willis, graph...@googlegroups.com
Our first goal is to make the GraphFrames project healthy again, and that work is underway. So far we haven't taken any steps toward Spark inclusion, save for some conventions we've adopted. There are definitely downsides to being part of Spark, but the upside is increased exposure through the documentation.

Thanks for raising this for discussion.

Russell



To view this discussion visit https://groups.google.com/d/msgid/graphframes/CAG%2B_WJKxA%2BKSNeVzATFp1cqLo5-viWRnmCSPrXVvOPdi3xFoaw%40mail.gmail.com.

Ángel Álvarez Pascua

Apr 2, 2025, 10:47:05 PM
to Russell Jurney, James Willis, graph...@googlegroups.com

Hi James,

I totally agree with you. In fact, just last week, I asked Russell and Sem the same question: "Do we really want to merge GraphFrames with Spark? Spark is getting out of hand..."

Spark is a massive project with numerous issues, PRs, dependencies, and challenges. Do we really want to complicate things further? Graphs are a niche topic, and GraphFrames is already well-established in this domain. I don't think merging it with Spark will attract thousands of new users. If that’s our goal, we should focus more on promoting the library on LinkedIn, giving talks, fixing issues, publishing new releases, and adding exciting new features and integrations with other systems—not just Spark-based ones, but also Polars, Pandas, Snowpark, and more.

Regards,
Ángel

