Hi all,
Heads-up before opening a test-infrastructure PR. Test-scope only — no main-source change, no new public API in jgrapht-core, no behavioural difference to any algorithm.
Recent perf PRs have benefited a lot from real-graph benches (m2m PR #1340 and AllDirectedPaths PR #1341 both shipped JMH cells against the Andorra OSM road network), but every contributor has been rolling their own CSV loader and A* heuristic from scratch. As I'm queueing more perf work — batch point-to-point shortest paths, parallel coreness, incremental MST — that pattern needs to stop being one-off.
Three commits in seilat/jgrapht:chore/perf-bench-graph-loaders, all under jgrapht-core/src/test/ (plus one test-scope dep in the parent pom.xml).
Commit 1 — generic utilities (org.jgrapht.perf.util)
WeightedEdgeListCsvReader<V, E> — reads src,dst,weight CSV (gzip or plain) into a provided Graph. Fluent options for vertex parser, header, delimiter, gzip-wrapping, and an addMissingVertices(boolean) toggle for callers that pre-add the vertex set in a specific order. Entry points: read(InputStream), readResource(Class, String), readFile(Path).
CoordinatesCsvReader<V> — same shape, reads id,lat,lon CSV into Map<V, double[]>.
HaversineHeuristic<V> — generic AStarAdmissibleHeuristic over a Function<V, double[]> coordinate lookup, default Earth radius with an override for non-Earth bodies.
JmhBenchRunner — small helper that runs a JMH bench class with forks=0 so it inherits the surefire JVM's argLine, and writes the text summary to a caller-supplied path. Useful when running benches via -Dtest=MyBenchSuite#runFoo rather than the command-line org.openjdk.jmh.Main entry point.
24 JUnit 5 tests cover the readers and heuristic (in-memory streams, gzip handling, line-numbered parse errors, custom parsers, missing-resource behaviour, etc.).
Commit 2 — Andorra OSM template (org.jgrapht.perf.shortestpath.osm)
AndorraGraphLoader — thin wrapper using the generic readers; returns the loaded graph plus a Map<Integer, double[]> of node coordinates that pairs with HaversineHeuristic. Adds isFixtureAvailable() and a helpful IllegalStateException pointing at the README when the CSV fixtures aren't on the classpath.
AndorraGraphLoaderSmokeTest — skips via Assumptions.assumeTrue when the fixture is absent, so the suite stays green on clean checkouts.
AndorraDijkstraManyToManyShortestPathsBench and AndorraAllDirectedPathsNonSimpleBench — AverageTime benchmark templates demonstrating the loader + JMH setup on the 36,618-vertex / 67,354-edge Andorra strongly-connected component.
AndorraBenchSuite — surefire @Test entry points dispatched through JmhBenchRunner.
src/test/resources/perf/osm/README.md — download URL, preprocessor invocation, and the "fixtures absent → tests skip" behaviour.
The raw 700 KB of Andorra CSV.gz fixtures are not committed: the data is freely re-derivable from Geofabrik under the Open Database License, changes upstream daily, and shipping a snapshot was the wrong default for a library repo. Contributors run the preprocessor once and drop the CSVs into src/test/resources/perf/osm/; the existing .gitignore *.gz rule keeps them from accidentally being committed.
Commit 3 — GPKG → CSV preprocessor (org.jgrapht.perf.util.GpkgRoadGraphPreprocessor)
The preprocessor that produces the CSVs is itself a Java class on the test classpath, not a Python or shell script:
Reads the Geofabrik GPKG via org.xerial:sqlite-jdbc (test scope, added to parent dependencyManagement).
Parses GPKG-wrapped WKB LINESTRING blobs by hand (the only geometry type the road layer uses — about 30 lines of ByteBuffer).
Snaps coordinates at 1e-7 deg to dedupe shared endpoints between adjacent line-strings.
Runs KosarajuStrongConnectivityInspector to find the largest Strongly Connected Component (SCC).
Dedupes parallel edges keeping the shortest Haversine weight.
Writes both gzipped CSVs in the schema the loader expects.
java.sql is a JDK module not currently required by org.jgrapht.core. Rather than polluting the production module-info.java for a test-only need, the additions live in <testCompilerArgs> and the surefire argLine (--add-modules java.sql --add-reads org.jgrapht.core=java.sql). The same class works against any Geofabrik free-tier region; I verified the byte stream produces a graph identical (to the level of vertex / edge counts and Dijkstra-reachability) to the smoke-test expectations on Andorra.
The same utilities scale beyond Andorra. I've validated the readFile(Path) path locally against a Geofabrik Israel extract (2,338,873 vertices / 4,342,430 edges in the largest SCC, ~64× Andorra). Load time on this hardware: coords 0.7 s, vertex pre-add 0.3 s, edges 5.7 s, single full-target Dijkstra 1.9 s. The same scripts/andorra_to_csv.py works on any Geofabrik free-tier extract; the docstring spells out the schema assumptions.
24 new tests in org.jgrapht.perf.util (in-memory parsing, gzip, resource lookup, file I/O, error reporting).
AndorraGraphLoaderSmokeTest validates the worked example end-to-end when the fixture is present and skips cleanly when absent.
Full mvn -pl jgrapht-core test is green (6926 pass, 15 unrelated upstream skips).
Package placement. org.jgrapht.perf.util for the generic readers / heuristic / runner / preprocessor; org.jgrapht.perf.shortestpath.osm for the Andorra worked example. Is that the right home, or do you want me to put the readers under a different package — org.jgrapht.test.util, org.jgrapht.bench.util, etc.?
New test dep + JPMS plumbing. sqlite-jdbc (test scope) is what the GPKG preprocessor needs; java.sql is reached via <testCompilerArgs> + surefire argLine so the production module-info.java stays unchanged. Comfortable with that pattern, or would you rather see requires static java.sql; (or requires java.sql;) declared in module-info.java?
Sample data policy. I deliberately ship zero raw data; the README points at Geofabrik. If the project would rather vendor a tiny sample fixture (~50-vertex hand-built graph, say) so the bench templates can run on a clean checkout without external setup, I'm happy to add one — but I didn't want to assume that on my own.
Anything else worth folding in before opening the PR — extra reader options (readDirected vs readUndirected toggles?), additional sample bench templates, naming nits?
I'll open the PR shortly unless someone wants me to hold for discussion.
Thanks, Shai
--
You received this message because you are subscribed to the Google Groups "jgrapht-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jgrapht-dev...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/jgrapht-dev/b72dc594-6a5c-4661-aba1-ee8a934a6b82n%40googlegroups.com.
Hi John,
Thanks — that framing is a real improvement and I went with it. Pushed the rework to the same branch (seilat/jgrapht:chore/perf-bench-graph-loaders); the head is now 97a73df4cd chore(osm): refactor OSM integration into new jgrapht-osm module.
There is now a top-level jgrapht-osm module:
jgrapht-osm/
├── pom.xml # jgrapht-core (compile), jgrapht-io (compile), sqlite-jdbc (compile), JMH (test)
├── src/main/java/module-info.java # requires java.sql properly -- no --add-reads gymnastics
├── src/main/java/org/jgrapht/osm/
│ ├── GpkgRoadGraphPreprocessor.java # GPKG -> headerless CSVs (production, callable as main() or library)
│ ├── OsmCsvGraphLoader.java # CSV -> Graph<Integer, DefaultWeightedEdge>, wraps jgrapht-io CSVImporter
│ ├── OsmCoordinatesReader.java # CSV -> Map<Integer, double[]>
│ └── HaversineHeuristic.java # generic AStarAdmissibleHeuristic
└── src/test/java/org/jgrapht/osm/
├── AndorraGraphLoader.java # convenience fixture helper
├── AndorraGraphLoaderSmokeTest.java
├── OsmCsvGraphLoaderTest.java
├── OsmCoordinatesReaderTest.java
├── HaversineHeuristicTest.java
└── perf/ # JMH templates (m2m, ADP) + AndorraBenchSuite + JmhBenchRunner
Concretely:
The edge-list reader is now org.jgrapht.nio.csv.CSVImporter from jgrapht-io. The preprocessor writes headerless CSVs (src,dst,weight) so the importer consumes them with CSVFormat.EDGE_LIST + EDGE_WEIGHTS=true and vertexFactory=Integer::valueOf. My old WeightedEdgeListCsvReader is gone.
OsmCoordinatesReader is the only ad-hoc reader I kept: the coordinate CSV is a flat key-value table, not graph-shaped, so the existing importers do not apply. Twenty lines of BufferedReader plus parse-error line tagging.
HaversineHeuristic is now production code in jgrapht-osm (was test-scope perf-util in core). The module-info declares requires java.sql; so the <testCompilerArgs> + surefire argLine hack is gone from jgrapht-core/pom.xml.
sqlite-jdbc is now a compile dep of jgrapht-osm only. Version still managed in the parent dependencyManagement. Nothing in jgrapht-core depends on it.
Perf benches (Andorra m2m, AllDirectedPaths walk-enumeration, the batch shortest-paths bench queued for the follow-up PR) live in jgrapht-osm/src/test/java/org/jgrapht/osm/perf/. Surefire's <exclude>**/perf/**</exclude> keeps them out of the default test run, same as the existing org.jgrapht.perf.* packages in jgrapht-core.
mvn -pl jgrapht-osm test with locally-produced Andorra fixtures: 16 tests pass.
mvn -pl jgrapht-osm test without fixtures: 15 pass, 1 skip on the Andorra smoke test via Assumptions.assumeTrue.
mvn -pl jgrapht-core test: 6914 pass, 15 unrelated upstream skips. Net delta from master is the deletion of the moved perf-util / osm test classes.
Andorra round-trip end-to-end through the new module: GpkgRoadGraphPreprocessor → fixtures → AndorraGraphLoader.load() → DijkstraShortestPath on 36,618 V / 67,354 E.
Smoke of AndorraDijkstraManyToManyShortestPathsBench and the queued AndorraBatchShortestPathsBench via org.openjdk.jmh.Main from the new module's test classpath: both run end-to-end.
GpkgRoadGraphPreprocessor is the only piece where I want explicit sign-off, because it is what brings sqlite-jdbc into the module's compile classpath. Right now it lives in src/main so users can wire it into their own apps, not only into JMH benches — that matches the "import OSM data into JGraphT graphs" framing from your earlier note.
If the project would rather not have a sqlite-jdbc compile dependency on a JGraphT module, the easy fallback is to move the preprocessor into src/test or an examples source root, and ship jgrapht-osm proper as CSV-loader + coordinates + Haversine only. That keeps the module dependency-light at the cost of forcing users to either fetch pre-produced CSVs or run the preprocessor as a one-shot tool outside the main module API.
My preference is to keep the current shape, because without the preprocessor the module is mostly a CSV loader for a format users must produce externally — but this is the part I am least attached to.
module-info has requires transitive org.jgrapht.core and requires org.jgrapht.io. jgrapht.io is non-transitive because the loader wraps CSVImporter and consumers do not need to see the importer directly. Happy to flip if there is a project convention I am missing.
AndorraBenchSuite (@Test methods that dispatch JMH via JmhBenchRunner) is convenience only — a way to run a bounded smoke cell from Maven without remembering a long JMH command. The canonical benchmark path stays org.openjdk.jmh.Main. If the project prefers not to mix JUnit dispatch with JMH at all, dropping the suite and the runner is a one-class delete and does not touch the module's public API.
I will wait on this thread before opening the PR. I will keep the follow-up BatchShortestPathAlgorithm PR separate so this PR stays focused on OSM/test-infra plumbing.
Thanks again for the steer.
Shai
To view this discussion visit https://groups.google.com/d/msgid/jgrapht-dev/a65c57ba-4184-45c0-92bc-dc10876e2c3dn%40googlegroups.com.