[Infra] Add generic CSV/coord graph loaders + Andorra OSM template for JMH benches

1 view

Skip to first unread message

Shai Eilat

unread,

May 19, 2026, 2:58:41 PM (2 days ago) May 19

to jgrapht-dev

Hi all,

Heads-up before opening a test-infrastructure PR. Test-scope only — no main-source change, no new public API in jgrapht-core, no behavioural difference to any algorithm.

The gap

Recent perf PRs have benefited a lot from real-graph benches (m2m PR #1340 and AllDirectedPaths PR #1341 both shipped JMH cells against the Andorra OSM road network), but every contributor has been rolling their own CSV loader and A* heuristic from scratch. As I'm queueing more perf work — batch point-to-point shortest paths, parallel coreness, incremental MST — that pattern needs to stop being one-off.

What's in the PR

Three commits in seilat/jgrapht:chore/perf-bench-graph-loaders, all under jgrapht-core/src/test/ (plus one test-scope dep in the parent pom.xml).

Commit 1 — generic utilities (org.jgrapht.perf.util)

WeightedEdgeListCsvReader<V, E> — reads src,dst,weight CSV (gzip or plain) into a provided Graph. Fluent options for vertex parser, header, delimiter, gzip-wrapping, and an addMissingVertices(boolean) toggle for callers that pre-add the vertex set in a specific order. Entry points: read(InputStream), readResource(Class, String), readFile(Path).
CoordinatesCsvReader<V> — same shape, reads id,lat,lon CSV into Map<V, double[]>.
HaversineHeuristic<V> — generic AStarAdmissibleHeuristic over a Function<V, double[]> coordinate lookup, default Earth radius with an override for non-Earth bodies.
JmhBenchRunner — small helper that runs a JMH bench class with forks=0 so it inherits the surefire JVM's argLine, and writes the text summary to a caller-supplied path. Useful when running benches via -Dtest=MyBenchSuite#runFoo rather than the command-line org.openjdk.jmh.Main entry point.

24 JUnit 5 tests cover the readers and heuristic (in-memory streams, gzip handling, line-numbered parse errors, custom parsers, missing-resource behaviour, etc.).

Commit 2 — Andorra OSM template (org.jgrapht.perf.shortestpath.osm)

AndorraGraphLoader — thin wrapper using the generic readers; returns the loaded graph plus a Map<Integer, double[]> of node coordinates that pairs with HaversineHeuristic. Adds isFixtureAvailable() and a helpful IllegalStateException pointing at the README when the CSV fixtures aren't on the classpath.
AndorraGraphLoaderSmokeTest — skips via Assumptions.assumeTrue when the fixture is absent, so the suite stays green on clean checkouts.
AndorraDijkstraManyToManyShortestPathsBench and AndorraAllDirectedPathsNonSimpleBench — AverageTime benchmark templates demonstrating the loader + JMH setup on the 36,618-vertex / 67,354-edge Andorra strongly-connected component.
AndorraBenchSuite — surefire @Test entry points dispatched through JmhBenchRunner.
src/test/resources/perf/osm/README.md — download URL, preprocessor invocation, and the "fixtures absent → tests skip" behaviour.

The raw 700 KB of Andorra CSV.gz fixtures are not committed: the data is freely re-derivable from Geofabrik under the Open Database License, changes upstream daily, and shipping a snapshot was the wrong default for a library repo. Contributors run the preprocessor once and drop the CSVs into src/test/resources/perf/osm/; the existing .gitignore *.gz rule keeps them from accidentally being committed.

Commit 3 — GPKG → CSV preprocessor (org.jgrapht.perf.util.GpkgRoadGraphPreprocessor)

The preprocessor that produces the CSVs is itself a Java class on the test classpath, not a Python or shell script:

Reads the Geofabrik GPKG via org.xerial:sqlite-jdbc (test scope, added to parent dependencyManagement).
Parses GPKG-wrapped WKB LINESTRING blobs by hand (the only geometry type the road layer uses — about 30 lines of ByteBuffer).
Snaps coordinates at 1e-7 deg to dedupe shared endpoints between adjacent line-strings.
Runs KosarajuStrongConnectivityInspector to find the largest Strongly Connected Component (SCC).
Dedupes parallel edges keeping the shortest Haversine weight.
Writes both gzipped CSVs in the schema the loader expects.

java.sql is a JDK module not currently required by org.jgrapht.core. Rather than polluting the production module-info.java for a test-only need, the additions live in <testCompilerArgs> and the surefire argLine (--add-modules java.sql --add-reads org.jgrapht.core=java.sql). The same class works against any Geofabrik free-tier region; I verified the byte stream produces a graph identical (to the level of vertex / edge counts and Dijkstra-reachability) to the smoke-test expectations on Andorra.

Why bother

The same utilities scale beyond Andorra. I've validated the readFile(Path) path locally against a Geofabrik Israel extract (2,338,873 vertices / 4,342,430 edges in the largest SCC, ~64× Andorra). Load time on this hardware: coords 0.7 s, vertex pre-add 0.3 s, edges 5.7 s, single full-target Dijkstra 1.9 s. The same scripts/andorra_to_csv.py works on any Geofabrik free-tier extract; the docstring spells out the schema assumptions.

Tests

24 new tests in org.jgrapht.perf.util (in-memory parsing, gzip, resource lookup, file I/O, error reporting).
AndorraGraphLoaderSmokeTest validates the worked example end-to-end when the fixture is present and skips cleanly when absent.
Full mvn -pl jgrapht-core test is green (6926 pass, 15 unrelated upstream skips).

Questions for the dev list

Package placement. org.jgrapht.perf.util for the generic readers / heuristic / runner / preprocessor; org.jgrapht.perf.shortestpath.osm for the Andorra worked example. Is that the right home, or do you want me to put the readers under a different package — org.jgrapht.test.util, org.jgrapht.bench.util, etc.?

New test dep + JPMS plumbing. sqlite-jdbc (test scope) is what the GPKG preprocessor needs; java.sql is reached via <testCompilerArgs> + surefire argLine so the production module-info.java stays unchanged. Comfortable with that pattern, or would you rather see requires static java.sql; (or requires java.sql;) declared in module-info.java?

Sample data policy. I deliberately ship zero raw data; the README points at Geofabrik. If the project would rather vendor a tiny sample fixture (~50-vertex hand-built graph, say) so the bench templates can run on a clean checkout without external setup, I'm happy to add one — but I didn't want to assume that on my own.

Anything else worth folding in before opening the PR — extra reader options (readDirected vs readUndirected toggles?), additional sample bench templates, naming nits?

I'll open the PR shortly unless someone wants me to hold for discussion.

Thanks, Shai

P.S. When I ran the new Java preprocessor on a Geofabrik Israel extract to validate it at scale, vertex 0 of the resulting graph landed at (29.5512804, 34.9536767) — the southernmost city in Israel, Eilat. Geofabrik exports gis_osm_roads_free in OSM way_id ascending, so vertex 0 is always the first node of the lowest-numbered way in the cut; in Israel that happens to be an early-digitized road in Eilat. The surname Eilat is held by ~835 people in Israel (and not many more globally), so the odds of this preprocessor landing your namesake at vertex 0 — given an Israeli developer wrote it — are around 1 in a million. I'll take it as the universe signing off on the PR.

John Sichi

unread,

May 20, 2026, 1:47:43 AM (2 days ago) May 20

to Shai Eilat, jgrapht-dev

Before answering your questions, it feels to me like the OSM portion wants to become its own new top-level jgrapht-osm module. Whether via preprocessing or otherwise, this would allow an application (not just a test) to import OSM data into a JGraphT graph. We've generally tried to keep those kinds of integrations out of jgrapht-core itself. The downside is that your new performance tests would live in the new module, rather than in core, but I don't think that's the end of the world.

That approach would also allow you to create a dependency on the existing CSV reader in the jgrapht-io module.

What do you think?

(A search led to https://github.com/NGXAUT/OSMToJGraphT but it's very old.)

--
You received this message because you are subscribed to the Google Groups "jgrapht-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jgrapht-dev...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/jgrapht-dev/b72dc594-6a5c-4661-aba1-ee8a934a6b82n%40googlegroups.com.

Reply all

Reply to author

Forward

0 new messages