Dear GraphFrames users,
GraphFrames 0.9.2 is out on PyPI as graphframes-py and as io.graphframes on Maven Sonatype Central! Documentation is now available on graphframes.io… and we even have a new logo!
You can see below that GraphFrames is back! It has seen contributions every week for most of the year — we have half a dozen active contributors now. This release is due to the efforts of many people but I need to express our deep gratitude to Sem Sinchenko, who drove this release.
The project has gone from effectively dead to vibrant in the six months since GraphX was deprecated from Spark, which prompted us to get to work on an all-DataFrame replacement. You can see in the chart below that contributions are now more frequent than at any time since the project’s inception!
It was necessary for GraphFrames to support both Spark 4 and Spark Connect to remain integral to the Spark community. There were many issues resolved in the release, but the core of it was:
io.graphframes
graphframes-py
The GraphFrames community has achieved our first goal: make the project viable again! So what is still in the future?
Sem has started implementing property graphs for GraphFrames, which currently has relationship for edges but no type for nodes. In current practice, this means property graph processing requires you to merge all your node schemas together into a kitchen sink schema before using GraphFrames’ algorithms. It is a real drag… property graphs will be a huge improvement! Sem recently outlined a beautiful vision for property graphs as part of the Open Lakehouse. Check it out!
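To make the kitchen sink problem concrete, here is a minimal plain-Python sketch of what merging node schemas amounts to. It does not use the GraphFrames API; the function name merge_node_schemas, the type column, and the sample records are all hypothetical and chosen only for illustration.

```python
from typing import Any


def merge_node_schemas(
    node_sets: dict[str, list[dict[str, Any]]],
) -> list[dict[str, Any]]:
    """Flatten differently shaped node records into one wide schema."""
    # Collect the union of every property name across all node types,
    # preserving first-seen order.
    columns: list[str] = []
    for rows in node_sets.values():
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)

    merged: list[dict[str, Any]] = []
    for node_type, rows in node_sets.items():
        for row in rows:
            # Properties that a node type does not have are filled with None.
            wide = {col: row.get(col) for col in columns}
            # Hypothetical column recording which node type the row came from.
            wide["type"] = node_type
            merged.append(wide)
    return merged


# Hypothetical sample data: two node types with different properties.
people = [{"id": "p1", "name": "Alice", "age": 34}]
companies = [{"id": "c1", "name": "Acme", "founded": 1999}]

nodes = merge_node_schemas({"Person": people, "Company": companies})
for node in nodes:
    print(node)
```

Every row ends up carrying every column, mostly empty, which is why this gets unwieldy as the number of node types grows. On Spark DataFrames a similar effect is commonly achieved with unionByName and allowMissingColumns=True.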
Whether GraphFrames should ship as part of Spark is actively debated: it would be a lot of trouble to release with Spark, but based on the number of search hits for GraphX versus GraphFrames, it would get us 10x as many users. When I put it that way, GraphFrames in Spark sounds pretty good!
Spark deprecating GraphX was the call to action that led us to revive GraphFrames, and we heard it well. We’re building DataFrame implementations of all GraphX components. GraphX has already been removed from ShortestPaths and from LabelPropagation. The rest of the work is being tracked here and is underway. GraphX will be deprecated from GraphFrames as of 1.0. GraphFrames 2.0 will remove GraphX completely. Soon GraphFrames will be entirely built on DataFrames!
Developers from Apache Sedona joined the development of GraphFrames 0.9. Sedona 1.8.0 will depend on the new version. They’ve been a huge help! James Willis, Adam Binford and the Apache Sedona team gave us new configurations, helped us fix our CI to enable the 0.9 release and drove Spark 4 support. James Willis became an official maintainer of GraphFrames to coordinate efforts between these projects.
We have a lot of new contributors for this release!
We are building a list of dependent projects, so if you use GraphFrames, please let us know! We want your help testing new versions before the release.
Got questions or concerns? Let us know what you think! Find us on Discord in #graphframes on GraphGeeks, or join the GraphFrames Google Group.
Note: this update originally appeared at https://blog.graphlet.ai/graphframes-is-back-with-v0-9-2-5773d55d3291
Russell Jurney | rju...@graphlet.ai | graphlet.ai | Graphlet AI Blog | LinkedIn | BlueSky