Is GraphFrames motif finding typically I/O or compute bound?

Russell Jurney

Mar 1, 2025, 8:44:35 AM
to GraphFrames
As obsessed with property graph motif finding as I am, I realized that I have never checked: for billions of nodes and edges, is network motif finding with GraphFrames typically I/O bound or compute bound? I’m thinking of playing with FPGAs for Spark to assist in motif finding, but I need to understand the compute workload better. I’ve used them as a user, but I focused on feasibility and efficiency rather than on characterizing the workloads.

Any help greatly appreciated!

Thanks,

Sem

Mar 1, 2025, 9:21:30 AM
to graph...@googlegroups.com
In my understanding, it is more I/O bound. It is based on nested joins
that generate a lot of shuffles and a lot of serialization/deserialization (SerDe).
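To make the nested-join point concrete, here is a minimal pure-Python sketch (not GraphFrames code) that matches a linear path motif, the kind of pattern written as "(a)-[e1]->(b); (b)-[e2]->(c)" in g.find(...), by repeated equi-joins against the edge list. The function name find_paths and the toy graphs are hypothetical. The point is that every extra hop is another join, and in Spark a join whose sides are not broadcast typically means another shuffle whose volume tracks the intermediate match count.

```python
from collections import defaultdict


def find_paths(edges, hops):
    """Match a linear motif (v0)-[e1]->(v1)-...-[ek]->(vk) by repeated joins.

    Hypothetical sketch: each extra hop joins the partial matches (keyed on
    their last vertex) with the edge list (keyed on src), the same shape as a
    DataFrame self-join. Returns the matches and the size after every join.
    """
    # Index the edge "table" on src: the build side of a hash join.
    out_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append(dst)

    # Hop 1: every edge matches (a)-[e1]->(b).
    matches = [(src, dst) for src, dst in edges]
    sizes = [len(matches)]
    for _ in range(hops - 1):
        # Join: extend each partial path by every edge leaving its last vertex.
        matches = [path + (dst,) for path in matches for dst in out_edges[path[-1]]]
        sizes.append(len(matches))
    return matches, sizes


if __name__ == "__main__":
    # Toy hub: 100 vertices point into vertex 0, which points out to 100 others.
    hub_edges = [(i, 0) for i in range(1, 101)] + [(0, j) for j in range(101, 201)]
    _, sizes = find_paths(hub_edges, 2)
    print(sizes)  # [200, 10000]
```

A single hub turns 200 edges into 10,000 two-hop matches, which is the intermediate data a distributed version would have to shuffle and serialize before the next join.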

Ángel

Mar 2, 2025, 12:53:24 AM
to Sem, graph...@googlegroups.com
Yes, I got the same impression. However, ConnectedComponents tries to use a broadcast join when the hubs it detects stay below a threshold limit (look at the GraphFrame.skewedJoin method), so ... no shuffle there.
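The broadcast idea can be illustrated with a toy, single-process model of a hub-aware ("skewed") join: high-degree keys take a broadcast path, where the small lookup table is copied to every worker so the hub-heavy side is never repartitioned, while all other rows go through an ordinary hash-partitioned join. This is a sketch of the general technique under stated assumptions; the skewed_join function, its threshold semantics, and num_partitions are hypothetical and may not match what GraphFrame.skewedJoin actually does.

```python
from collections import Counter, defaultdict


def skewed_join(edges, labels, threshold, num_partitions=4):
    """Join (src, dst) edges with a {vertex: label} table on src.

    Hypothetical toy model: sources with out-degree >= threshold are "hubs".
    Hub rows use a broadcast-style lookup; the rest are hash-partitioned on
    the key and joined partition by partition. Returns the joined rows and
    how many edge rows took each path.
    """
    degree = Counter(src for src, _ in edges)
    hubs = {v for v, d in degree.items() if d >= threshold}

    # Broadcast path: the hub slice of `labels` is small, so copy it to every
    # worker and join hub edges where they already live (no repartitioning).
    hub_labels = {v: labels[v] for v in hubs if v in labels}
    hub_edges = [(s, d) for s, d in edges if s in hubs]
    joined = [(s, d, hub_labels[s]) for s, d in hub_edges if s in hub_labels]

    # Shuffle path: hash-partition both sides on the key, then join locally.
    # In a cluster, these are the rows that would cross the network.
    edge_parts = defaultdict(list)
    label_parts = defaultdict(dict)
    for s, d in edges:
        if s not in hubs:
            edge_parts[hash(s) % num_partitions].append((s, d))
    for v, lab in labels.items():
        if v not in hubs:
            label_parts[hash(v) % num_partitions][v] = lab
    shuffled = 0
    for part, rows in edge_parts.items():
        shuffled += len(rows)
        local = label_parts[part]
        joined.extend((s, d, local[s]) for s, d in rows if s in local)
    return joined, len(hub_edges), shuffled


if __name__ == "__main__":
    # Star around vertex 0 plus two ordinary edges.
    edges = [(0, i) for i in range(1, 11)] + [(1, 2), (2, 3)]
    labels = {v: f"c{v}" for v in range(12)}
    _, broadcast_rows, shuffled_rows = skewed_join(edges, labels, threshold=5)
    print(broadcast_rows, shuffled_rows)  # 10 2
```

The design choice is to spend memory (a copy of the hub lookup per worker) to avoid moving the skewed rows, which are exactly the ones that would otherwise overload a single shuffle partition.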

Sounds like an interesting, and maybe complex, topic to investigate. Why not open an issue?

Anything related to performance ... count me in, please!

Russell Jurney

Mar 2, 2025, 2:39:32 AM
to Ángel, Sem, graph...@googlegroups.com
I wonder whether eight FPGA boards with 100 Gbps connections and loads of RAM could handle any I/O throughput efficiently, given their pipelining and direct connectivity?
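For the throughput question, a back-of-envelope bound helps: in an even all-to-all shuffle over n nodes, each node holds S/n bytes and sends (n-1)/n of that share, so the minimum wire time is roughly (S/n)(n-1)/n divided by the per-link bandwidth B. The sketch below assumes a full-bisection, contention-free network and ignores serialization and disk spill (the SerDe cost mentioned earlier in the thread), so real shuffles will be slower. The function name shuffle_seconds and the 1 TB input are hypothetical, not measurements.

```python
def shuffle_seconds(shuffle_bytes, nodes, link_gbps, efficiency=1.0):
    """Lower bound on wall-clock time for one even all-to-all shuffle.

    Assumptions: data is spread evenly over `nodes`; each node keeps the
    1/nodes slice that hashes to itself and sends the rest; every node can
    transmit at `link_gbps` simultaneously (full bisection, no contention).
    `efficiency` scales the usable fraction of line rate.
    """
    per_node = shuffle_bytes / nodes                  # bytes each node holds
    sent_per_node = per_node * (nodes - 1) / nodes    # bytes leaving each node
    bytes_per_sec = link_gbps * 1e9 / 8 * efficiency  # Gbit/s -> bytes/s
    return sent_per_node / bytes_per_sec


if __name__ == "__main__":
    # Hypothetical workload: 1 TB shuffled across 8 boards with 100 Gbps links.
    print(shuffle_seconds(1e12, nodes=8, link_gbps=100))  # 8.75
```

Under these assumptions, moving 1 TB across eight 100 Gbps links takes at least about 8.75 seconds; a multi-hop motif pays a cost like this for every join, scaled by that join's intermediate size.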

Thanks,

