How do I use GraphFrames in a Jupyter Notebook?


Russell Jurney

Mar 25, 2025, 7:59:15 AM
to GraphFrames
Can someone help me out real fast with a recipe? I get errors.

Thanks,

Erik Eklund

Mar 25, 2025, 8:00:54 AM
to Russell Jurney, GraphFrames
What errors do you get?

--
You received this message because you are subscribed to the Google Groups "GraphFrames" group.
To unsubscribe from this group and stop receiving emails from it, send an email to graphframes...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/graphframes/CANSvDjrt3V9vrhX8n0m2%2B9UwAgZhwnMSWJzvZe%2BMAeCUrP2HpA%40mail.gmail.com.

Sem

Mar 25, 2025, 8:04:16 AM
to graph...@googlegroups.com
What exactly are you trying to do? If it is about local development on a
single node, you can just start the Spark Connect server in the
background. It would be the simplest way, imo.


Russell Jurney

Mar 25, 2025, 8:09:56 AM
to Sem, graph...@googlegroups.com
I am just demoing from my machine. I cracked under the pressure of my 9 AM demo and reached out; however, I just got it working again :)



Sem

Mar 25, 2025, 8:13:49 AM
to Russell Jurney, graph...@googlegroups.com
1. Locate the script "python/dev/run_connect.py".

2. Run it somewhere in the background.

3. Use spark = SparkSession.builder.remote("sc://localhost:15002")... in your
notebook to connect.

4. It should work...

Otherwise, you need to manually add the GraphFrames assembly JAR to spark.jars
to work with it the classic way.
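
A minimal sketch of the two options above, assuming PySpark 3.4+ with Spark Connect support is installed and that the Connect server is already listening on localhost:15002. The helper names (`connect_url`, `connect_session`, `classic_session`) and the `graphframes_jar` path argument are hypothetical, made up for illustration; they are not part of GraphFrames or PySpark.

```python
# Sketch only: helper names and the JAR path argument are illustrative assumptions.


def connect_url(host: str = "localhost", port: int = 15002) -> str:
    """Build a Spark Connect URL such as sc://localhost:15002."""
    return f"sc://{host}:{port}"


def connect_session(url: str):
    """Open a session against an already running Spark Connect server."""
    # Imported lazily so connect_url() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()


def classic_session(graphframes_jar: str, app_name: str = "SparkGraphFrames"):
    """Start a classic (non-Connect) session with the GraphFrames JAR on spark.jars.

    graphframes_jar is a hypothetical local path to the assembly JAR.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName(app_name)
        .config("spark.jars", graphframes_jar)
        .getOrCreate()
    )
```

In a notebook cell this would be used as `spark = connect_session(connect_url())`, or `spark = classic_session("<path to the assembly JAR>")` for the classic route.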


Russell Jurney

Mar 25, 2025, 9:44:46 AM
to Sem, graph...@googlegroups.com
How do I configure Delta tables and a schema registry? The recipes I find don't seem to work. I want to be able to store Delta tables in specified locations...

```python
from graphframes import GraphFrame
from pyspark.sql import SparkSession

spark: SparkSession = (
    SparkSession.builder.appName("SparkGraphFrames")
    .config("spark.jars.packages", "io.delta:delta-core_2.12:2.4.0,graphframes:graphframes:0.8.2-spark3.2-s_2.12")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Drop existing tables first
spark.sql("DROP TABLE IF EXISTS vertices")
spark.sql("DROP TABLE IF EXISTS edges")

# Save as Delta tables
g.vertices.write.format("delta").saveAsTable("vertices")
g.edges.write.format("delta").saveAsTable("edges")

# Load back
g = GraphFrame(
    spark.read.table("vertices"),
    spark.read.table("edges")
)
```

That fails. This also fails; the LLMs are telling me to do this stuff, but it fails.

```python
# Create external tables
spark.sql("CREATE OR REPLACE TABLE vertices USING parquet LOCATION '/path/to/vertices'")
spark.sql("CREATE OR REPLACE TABLE edges USING parquet LOCATION '/path/to/edges'")
```

Sem

Mar 25, 2025, 9:50:18 AM
to Russell Jurney, graph...@googlegroups.com
I think you can avoid using the catalog and work with the file system directly.

Something like:

```python
g.vertices.write.format("delta").mode("overwrite").save("path-to-vertices-delta-root-directory")
g.edges.write.format("delta").mode("overwrite").save("path-to-edges-delta-root-directory")
```

and

```python
spark.read.format("delta").load("path-to-vertices-delta-root-directory")
spark.read.format("delta").load("path-to-edges-delta-root-directory")
```
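
Put together, the path-based round trip could look roughly like the sketch below. It assumes a session with the Delta and GraphFrames packages already configured, and that both tables live under one root directory; `graph_paths`, `save_graph`, `load_graph`, and the `vertices`/`edges` subdirectory names are hypothetical choices for illustration.

```python
# Sketch only: helper names and the directory layout are illustrative assumptions.


def graph_paths(root: str) -> dict:
    """Return the vertices and edges Delta directories under one root."""
    base = root.rstrip("/")
    return {"vertices": f"{base}/vertices", "edges": f"{base}/edges"}


def save_graph(g, root: str) -> None:
    """Write both DataFrames of a GraphFrame as Delta tables, bypassing the catalog."""
    paths = graph_paths(root)
    g.vertices.write.format("delta").mode("overwrite").save(paths["vertices"])
    g.edges.write.format("delta").mode("overwrite").save(paths["edges"])


def load_graph(spark, root: str):
    """Rebuild a GraphFrame from the two Delta directories."""
    # Imported lazily so graph_paths() can be used without GraphFrames installed.
    from graphframes import GraphFrame

    paths = graph_paths(root)
    return GraphFrame(
        spark.read.format("delta").load(paths["vertices"]),
        spark.read.format("delta").load(paths["edges"]),
    )
```

Usage would be `save_graph(g, "/some/root")` followed by `g = load_graph(spark, "/some/root")`, with no `saveAsTable` or catalog lookup involved.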

Are you 100% sure you want to use DeltaCatalog?

Russell Jurney

Mar 25, 2025, 10:13:57 AM
to Sem, graph...@googlegroups.com
Well, it's just a demo of Delta Tables so it doesn't really matter I suppose.


Russell Jurney

Mar 25, 2025, 1:55:42 PM
to Russell Jurney, Sem, graph...@googlegroups.com
Can you easily run Unity Catalog, or is that a pain? Appreciate the help.

Thanks,


Ángel Álvarez Pascua

Mar 25, 2025, 2:13:51 PM
to Russell Jurney, Russell Jurney, Sem, GraphFrames
Do you mean standalone or on Databricks? If the latter, I have a lab with UC.

Russell Jurney

Mar 25, 2025, 2:28:36 PM
to Ángel Álvarez Pascua, Russell Jurney, Sem, GraphFrames
So it sounds like a pain in the ass, not something you can apt install on Ubuntu? :) I looked it up; it is really neat on Databricks, but no, I don't get it.

Russell

Ángel Álvarez Pascua

Mar 25, 2025, 2:42:12 PM
to Russell Jurney, Russell Jurney, Sem, GraphFrames
It shouldn't be, but I haven't tried it yet. I actually had in mind writing another article about UC, debugging it outside its "natural habitat" ;)