Hi all,
Heads-up before opening a test-infrastructure PR. Test-scope only — no main-source change, no new public API in jgrapht-core, no behavioural difference to any algorithm.
Recent perf PRs have benefited a lot from real-graph benches (m2m PR #1340 and AllDirectedPaths PR #1341 both shipped JMH cells against the Andorra OSM road network), but every contributor has been rolling their own CSV loader and A* heuristic from scratch. As I'm queueing more perf work — batch point-to-point shortest paths, parallel coreness, incremental MST — that pattern needs to stop being one-off.
Three commits in seilat/jgrapht:chore/perf-bench-graph-loaders, all under jgrapht-core/src/test/ (plus one test-scope dep in the parent pom.xml).
Commit 1 — generic utilities (org.jgrapht.perf.util)
WeightedEdgeListCsvReader<V, E> — reads src,dst,weight CSV (gzip or plain) into a provided Graph. Fluent options for vertex parser, header, delimiter, gzip-wrapping, and an addMissingVertices(boolean) toggle for callers that pre-add the vertex set in a specific order. Entry points: read(InputStream), readResource(Class, String), readFile(Path).
CoordinatesCsvReader<V> — same shape, reads id,lat,lon CSV into Map<V, double[]>.
HaversineHeuristic<V> — generic AStarAdmissibleHeuristic over a Function<V, double[]> coordinate lookup, default Earth radius with an override for non-Earth bodies.
JmhBenchRunner — small helper that runs a JMH bench class with forks=0 so it inherits the surefire JVM's argLine, and writes the text summary to a caller-supplied path. Useful when running benches via -Dtest=MyBenchSuite#runFoo rather than the command-line org.openjdk.jmh.Main entry point.
24 JUnit 5 tests cover the readers and heuristic (in-memory streams, gzip handling, line-numbered parse errors, custom parsers, missing-resource behaviour, etc.).
Commit 2 — Andorra OSM template (org.jgrapht.perf.shortestpath.osm)
AndorraGraphLoader — thin wrapper using the generic readers; returns the loaded graph plus a Map<Integer, double[]> of node coordinates that pairs with HaversineHeuristic. Adds isFixtureAvailable() and a helpful IllegalStateException pointing at the README when the CSV fixtures aren't on the classpath.
AndorraGraphLoaderSmokeTest — skips via Assumptions.assumeTrue when the fixture is absent, so the suite stays green on clean checkouts.
AndorraDijkstraManyToManyShortestPathsBench and AndorraAllDirectedPathsNonSimpleBench — AverageTime benchmark templates demonstrating the loader + JMH setup on the 36,618-vertex / 67,354-edge Andorra strongly-connected component.
AndorraBenchSuite — surefire @Test entry points dispatched through JmhBenchRunner.
src/test/resources/perf/osm/README.md — download URL, preprocessor invocation, and the "fixtures absent → tests skip" behaviour.
The raw 700 KB of Andorra CSV.gz fixtures are not committed: the data is freely re-derivable from Geofabrik under the Open Database License, changes upstream daily, and shipping a snapshot was the wrong default for a library repo. Contributors run the preprocessor once and drop the CSVs into src/test/resources/perf/osm/; the existing .gitignore *.gz rule keeps them from accidentally being committed.
Commit 3 — GPKG → CSV preprocessor (org.jgrapht.perf.util.GpkgRoadGraphPreprocessor)
The preprocessor that produces the CSVs is itself a Java class on the test classpath, not a Python or shell script:
Reads the Geofabrik GPKG via org.xerial:sqlite-jdbc (test scope, added to parent dependencyManagement).
Parses GPKG-wrapped WKB LINESTRING blobs by hand (the only geometry type the road layer uses — about 30 lines of ByteBuffer).
Snaps coordinates at 1e-7 deg to dedupe shared endpoints between adjacent line-strings.
Runs KosarajuStrongConnectivityInspector to find the largest Strongly Connected Component (SCC).
Dedupes parallel edges keeping the shortest Haversine weight.
Writes both gzipped CSVs in the schema the loader expects.
java.sql is a JDK module not currently required by org.jgrapht.core. Rather than polluting the production module-info.java for a test-only need, the additions live in <testCompilerArgs> and the surefire argLine (--add-modules java.sql --add-reads org.jgrapht.core=java.sql). The same class works against any Geofabrik free-tier region; I verified the byte stream produces a graph identical (to the level of vertex / edge counts and Dijkstra-reachability) to the smoke-test expectations on Andorra.
The same utilities scale beyond Andorra. I've validated the readFile(Path) path locally against a Geofabrik Israel extract (2,338,873 vertices / 4,342,430 edges in the largest SCC, ~64× Andorra). Load time on this hardware: coords 0.7 s, vertex pre-add 0.3 s, edges 5.7 s, single full-target Dijkstra 1.9 s. The same scripts/andorra_to_csv.py works on any Geofabrik free-tier extract; the docstring spells out the schema assumptions.
24 new tests in org.jgrapht.perf.util (in-memory parsing, gzip, resource lookup, file I/O, error reporting).
AndorraGraphLoaderSmokeTest validates the worked example end-to-end when the fixture is present and skips cleanly when absent.
Full mvn -pl jgrapht-core test is green (6926 pass, 15 unrelated upstream skips).
Package placement. org.jgrapht.perf.util for the generic readers / heuristic / runner / preprocessor; org.jgrapht.perf.shortestpath.osm for the Andorra worked example. Is that the right home, or do you want me to put the readers under a different package — org.jgrapht.test.util, org.jgrapht.bench.util, etc.?
New test dep + JPMS plumbing. sqlite-jdbc (test scope) is what the GPKG preprocessor needs; java.sql is reached via <testCompilerArgs> + surefire argLine so the production module-info.java stays unchanged. Comfortable with that pattern, or would you rather see requires static java.sql; (or requires java.sql;) declared in module-info.java?
Sample data policy. I deliberately ship zero raw data; the README points at Geofabrik. If the project would rather vendor a tiny sample fixture (~50-vertex hand-built graph, say) so the bench templates can run on a clean checkout without external setup, I'm happy to add one — but I didn't want to assume that on my own.
Anything else worth folding in before opening the PR — extra reader options (readDirected vs readUndirected toggles?), additional sample bench templates, naming nits?
I'll open the PR shortly unless someone wants me to hold for discussion.
Thanks, Shai
--
You received this message because you are subscribed to the Google Groups "jgrapht-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jgrapht-dev...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/jgrapht-dev/b72dc594-6a5c-4661-aba1-ee8a934a6b82n%40googlegroups.com.