Save and Restore State of 3rd party data


Learned Byerror

Jun 21, 2025, 7:21:52 PM
to golang-nuts

All,

I have an application that uses gonum/vptree and gonum/graph/simple.  Currently every run of the application instantiates the vptree from a database, queries the vptree to find nearest neighbors, writes the data to database table A, creates a weighted undirected graph from table A, extracts k-clique communities from the graph and then generates SVGs for each community.

All of this works as expected and runs in about 7 minutes.  The initial version of the program ran in about 2 hours.  I heavily used pprof to identify optimization opportunities.  I think the code is optimized as much as is possible using the gonum packages.

This program is run against an ever-growing set of data.  The current data set has almost 3 million entities and grows by approximately 25K with each new data set.  As such, I expect the run time to continue to increase.  Two steps take over 80% of the run time: the vptree nearest-neighbor query and the generation of the weighted undirected graph. I would like to be able to save the state of the vptree and graph at the end of each run and use that as input in the next run.

Both gonum/vptree and gonum/graph/simple contain private fields. Neither implements the gob.GobEncoder or gob.GobDecoder interface. Consequently, I cannot use encoding/gob.

Please note, this is about saving and retrieving state in the same version of the same program.  It is not about transferring data from one program to a different program. Neither of these packages has a dependency on external state at run time.

Are there alternative methods that I should consider?  Approaches using unsafe are acceptable to me. If something fails, I can always recover by making a full run as I am currently doing.

The only alternative that I have come up with at this point is to make a pull request and add the GobEncode/GobDecode functionality myself. While this is an option, the effort required to do so is likely significant.

Thank you in advance for your guidance!

lbe 

Jason E. Aten

Jun 22, 2025, 12:02:25 PM
to golang-nuts
Hi Ibe,

gob is unsupported and not optimal in many ways.

I would invite you to try my serialization package. https://github.com/glycerine/greenpack
You can serialize unexported fields if you wish to with the -unexported flag, though I generally don't.

For the slowdown, you might look at non-quadratic approximate nearest-neighbor methods
like HNSW or Faiss from Facebook.

Best,
Jason

William Gilmore

Jun 22, 2025, 12:32:27 PM
to golang-nuts
Jason,

Thank you for your response.  I will take a look at greenpack and the other algorithms that you mention.

Thank you!

lbe

Jason E. Aten

Jun 22, 2025, 1:19:59 PM
to golang-nuts
You can quickly play with HNSW in R to see if it does anything close to what you want.


Note that you might consider examining the small-world links that it constructs,
as they might approximate or be a close-enough proxy for your cliques in the first place,
although the results are unlikely to match your current process exactly.

The HNSW theory paper is https://arxiv.org/abs/1603.09320

Faiss links:

Note that it has GPU support, which might give you a speedup too.

Dan Kortschak

Jun 22, 2025, 4:32:58 PM
to golan...@googlegroups.com
On Sat, 2025-06-21 at 09:03 -0700, Learned Byerror wrote:
> Both gonum/vptree and gonum/graph/simple contain private fields.

vptree is entirely exported, being a tree of vptree.Node which has no
unexported field, rooted in vptree.Tree, also with no unexported field.
You can implement a serialiser for this by walking the tree and
deserialise by reversing this.
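
A minimal sketch of such a walk, using only the standard library. The `node` type here is a hypothetical stand-in that assumes the tree node exposes a point, a radius, and closer/further children; it is not the gonum type, and `record`, `saveTree`, `loadTree` and `roundTrip` are illustrative names. The real node likely holds its point as an interface value, in which case the concrete point type would have to be written out explicitly or registered with gob.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
	"reflect"
)

// node is a hypothetical stand-in with the same shape as a tree node.
type node struct {
	Point           []float64
	Radius          float64
	Closer, Further *node
}

// record is the flat form of one node as written to the stream.
type record struct {
	Point                 []float64
	Radius                float64
	HasCloser, HasFurther bool
}

// writeNode emits n, then its closer subtree, then its further subtree.
func writeNode(enc *gob.Encoder, n *node) error {
	if n == nil {
		return nil
	}
	rec := record{Point: n.Point, Radius: n.Radius,
		HasCloser: n.Closer != nil, HasFurther: n.Further != nil}
	if err := enc.Encode(rec); err != nil {
		return err
	}
	if err := writeNode(enc, n.Closer); err != nil {
		return err
	}
	return writeNode(enc, n.Further)
}

// readNode reverses writeNode, rebuilding children in the same order.
func readNode(dec *gob.Decoder) (*node, error) {
	var rec record
	if err := dec.Decode(&rec); err != nil {
		return nil, err
	}
	n := &node{Point: rec.Point, Radius: rec.Radius}
	var err error
	if rec.HasCloser {
		if n.Closer, err = readNode(dec); err != nil {
			return nil, err
		}
	}
	if rec.HasFurther {
		if n.Further, err = readNode(dec); err != nil {
			return nil, err
		}
	}
	return n, nil
}

// saveTree writes a presence flag (false for an empty tree), then the nodes.
func saveTree(w io.Writer, root *node) error {
	enc := gob.NewEncoder(w)
	if err := enc.Encode(root != nil); err != nil {
		return err
	}
	return writeNode(enc, root)
}

// loadTree reads what saveTree wrote.
func loadTree(r io.Reader) (*node, error) {
	dec := gob.NewDecoder(r)
	var present bool
	if err := dec.Decode(&present); err != nil || !present {
		return nil, err
	}
	return readNode(dec)
}

// roundTrip saves root to memory and loads it back.
func roundTrip(root *node) (*node, error) {
	var buf bytes.Buffer
	if err := saveTree(&buf, root); err != nil {
		return nil, err
	}
	return loadTree(&buf)
}

func main() {
	root := &node{Point: []float64{1, 2}, Radius: 1.5, Closer: &node{Point: []float64{0, 0}}}
	got, err := roundTrip(root)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println("trees match:", reflect.DeepEqual(root, got))
}
```

Writing each node with explicit child flags keeps the stream flat and lets the reader rebuild the tree in one pass. In an application the `io.Writer` would be a file rather than an in-memory buffer.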

graph/simple graphs can be serialised via a variety of packages in the
graph hierarchy (these are in graph/encoding and graph/formats).
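
To illustrate the general idea of saving and reloading a weighted undirected graph, here is a standard-library sketch that stores one edge per line and rebuilds an adjacency map. It does not use the gonum encoders mentioned above, which are the more likely choice in practice; the `edge` type, the text format, and all function names are assumptions made for the example.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// edge is a hypothetical weighted, undirected edge between two node IDs.
type edge struct {
	From, To int64
	Weight   float64
}

// writeEdges stores one "from to weight" line per edge.
func writeEdges(w io.Writer, edges []edge) error {
	bw := bufio.NewWriter(w)
	for _, e := range edges {
		// 'g' with precision -1 emits the shortest text that parses back exactly.
		wt := strconv.FormatFloat(e.Weight, 'g', -1, 64)
		if _, err := fmt.Fprintf(bw, "%d %d %s\n", e.From, e.To, wt); err != nil {
			return err
		}
	}
	return bw.Flush()
}

// readEdges parses the format produced by writeEdges, skipping blank lines.
func readEdges(r io.Reader) ([]edge, error) {
	var edges []edge
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) == 0 {
			continue
		}
		if len(f) != 3 {
			return nil, fmt.Errorf("malformed edge line %q", sc.Text())
		}
		from, err := strconv.ParseInt(f[0], 10, 64)
		if err != nil {
			return nil, err
		}
		to, err := strconv.ParseInt(f[1], 10, 64)
		if err != nil {
			return nil, err
		}
		wt, err := strconv.ParseFloat(f[2], 64)
		if err != nil {
			return nil, err
		}
		edges = append(edges, edge{From: from, To: to, Weight: wt})
	}
	return edges, sc.Err()
}

// adjacency rebuilds an undirected weighted graph as nested maps.
func adjacency(edges []edge) map[int64]map[int64]float64 {
	adj := make(map[int64]map[int64]float64)
	link := func(a, b int64, w float64) {
		if adj[a] == nil {
			adj[a] = make(map[int64]float64)
		}
		adj[a][b] = w
	}
	for _, e := range edges {
		link(e.From, e.To, e.Weight)
		link(e.To, e.From, e.Weight)
	}
	return adj
}

// roundTripEdges writes edges to memory and reads them back.
func roundTripEdges(edges []edge) ([]edge, error) {
	var buf bytes.Buffer
	if err := writeEdges(&buf, edges); err != nil {
		return nil, err
	}
	return readEdges(&buf)
}

func main() {
	in := []edge{{From: 1, To: 2, Weight: 0.25}, {From: 2, To: 3, Weight: 4}}
	out, err := roundTripEdges(in)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println("edges restored:", len(out), "nodes:", len(adjacency(out)))
}
```

An edge list is enough to reconstruct the graph because nodes without edges play no part in clique detection; isolated nodes would need a separate node list.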

Note also that the graph implementations in simple and multi may not be
what you want. This is discussed in www.gonum.org/post/word_ladder/;
they are provided for OOTB work and for our testing needs. They may
also be used as a starting point for a copy/paste implementation where
you can add or adjust behaviour.

Dan

Learned Byerror

Jun 23, 2025, 4:36:02 PM
to golang-nuts
Dan,

Thank you very, very much!!!  I don't know what I was looking at when I thought that I saw private fields in vptree. WRT graph, I knew about graph/encoding; however, for some reason I remembered it as only being able to convert to DOT, for example, not to convert DOT back to gonum/graph.  I still have some work to do before I can implement this in my app, but I think the light you shined into the dark portion(s) of my brain lit the path for me.

Thank you again!

lbe 