Hi,
First, thanks. It is nice to hear stories like this.
Would you be willing to share your code? Even confidentially would do. Or perhaps describe your configuration and algorithms.
I just started optimizing last week, and there is still a lot of room for improvement. It would be nice to have a real-world use case. I think an in-memory import buffer and an indexless store could boost performance a bit more.
We already talked about your use case a bit, in Czech, a while ago.
Thanks
Jan
Interesting thread. I stumbled across MapDB just the other day for the sole purpose of processing OSM data. I am blown away by the results that Martin and Andrew are reporting.
Last night I crammed all the nodes from the OSM planet file into a simple TreeMap. It took 18 hours to generate a map of 2.7 billion nodes on my Windows 7 box (6 cores, 8GB RAM, x64, 7200 RPM SATA drive).
Here's the code I used to create the node map:
nodeDB = DBMaker.newFileDB(file)
        .transactionDisable()
        .mmapFileEnableIfSupported()
        .closeOnJvmShutdown()
        .make();
java.util.Map<Long, Double[]> nodes = nodeDB.getTreeMap("nodes");
Here's how I'm populating the map:
nodes.put(node.getID(), new Double[]{node.getLat(), node.getLon()});
I'm totally new to MapDB but my first impression was that 18 hours was not bad. I'm sure that if I threw in an SSD, added more RAM, and ran Linux I would get better results. Barring that, perhaps there's something I should do in my code to make things faster? For example, use a different map? Or store strings instead of longs and doubles? Or maybe a different configuration setting?
Regardless of whether the write performance can be improved, my real concern is retrieving values from the map.
It takes my computer approximately 30 minutes to parse 274 million ways. For each of these ways, I need to get a coordinate from the node map (MapDB).
Unfortunately, looking up specific nodes in the map is painfully slow. The lookup is simple:
for (long nodeID : way.getNodes()){
    Double[] coord = nodes.get(nodeID);
}
I have tried to find nodes using a single thread and multiple threads. I have also experimented with various cache settings to no avail:
DBMaker maker = DBMaker.newFileDB(file);
maker.transactionDisable();
maker.mmapFileEnableIfSupported();
maker.cacheLRUEnable();
maker.cacheSize(1000000);
if (readOnly) maker.readOnly();
maker.closeOnJvmShutdown();
After running for 6 hours my app processed less than 1% of the 274 million ways.
Any suggestions on how to perform these lookups faster?
Hi Peter,
MapDB 1.0 has a read-amplification performance bug that is probably making your code slow. It is fixed (or at least improved a lot) in 2.0.
> new Double[]{
Use a primitive array; that makes a lot of difference. Construct the BTreeMap with something like:
nodeDB.createTreeMap("nodes")
        .keySerializer(BTreeKeySerializer.POSITIVE_LONG)
        .valueSerializer(Serializer.LONG_ARRAY)
        .makeOrGet()
Also use the data pump to create the BTreeMap. The resulting BTree is already compacted and probably has better performance. Another option is to call db.compact() before running reads.
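Roughly something like this (untested sketch from memory; it assumes the 1.0 pump API, where pumpSource() takes the entries already sorted in reverse key order, and the sample node IDs and fixed-point coordinates below are made up just to show the shape):

// entries would normally stream straight from the planet parser;
// the 1.0 pump expects them sorted by descending key, hence 3, 2, 1 here
Iterator<Fun.Tuple2<Long, long[]>> source = Arrays.asList(
        Fun.t2(3L, new long[]{515074000L, -1278000L}),
        Fun.t2(2L, new long[]{485856000L, 23524000L}),
        Fun.t2(1L, new long[]{407128000L, -740060000L})
).iterator();

BTreeMap<Long, long[]> nodes = nodeDB.createTreeMap("nodes")
        .keySerializer(BTreeKeySerializer.POSITIVE_LONG)
        .valueSerializer(Serializer.LONG_ARRAY)
        .pumpSource(source)   // bulk-load instead of individual put() calls
        .make();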
Also drop the LRU cache. It is not designed for this kind of workload and probably slows things down.
Jan
Thanks to Eric's suggestion, I was able to create an HTreeMap with 2.8 billion nodes in under 7 hours. That's 2x faster than my first attempt using a simple TreeMap. Also, the overall file size is almost 2x smaller than the original TreeMap. Very impressive!
Unfortunately, retrieving nodes from the map is still too slow. With 2.8 billion nodes in one HTreeMap, I estimate that it's going to take weeks to look up coordinates for all the ways.
I suspect several smaller maps might be better than one large HTreeMap.
To test this theory, I created 28 HTreeMaps with 100 million nodes per map. It took almost 10 hours to create 28 HTreeMaps with 2.8 billion nodes. That's 3 hours more than creating a single HTreeMap. The reason is that it takes approximately 10 minutes to close each 100-million-node HTreeMap. 10 minutes x 28 maps = 3 extra hours.
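For what it's worth, the split itself is simple. A rough sketch (I'm routing by nodeID % 28 here purely for illustration; any split that keeps each map around 100 million entries would do, and nodeID/lat/lon come straight from the parser):

int SHARDS = 28;
HTreeMap<Long, double[]>[] nodeShards = new HTreeMap[SHARDS];
for (int s = 0; s < SHARDS; s++) {
    DB db = DBMaker.newFileDB(new File("nodes_" + s + ".mapdb"))
            .transactionDisable()
            .mmapFileEnableIfSupported()
            .make();
    nodeShards[s] = db.createHashMap("nodes")
            .keySerializer(Serializer.LONG)
            .valueSerializer(Serializer.DOUBLE_ARRAY)
            .makeOrGet();
}

// writes and reads both pick the map from the node ID
nodeShards[(int) (nodeID % SHARDS)].put(nodeID, new double[]{lat, lon});
double[] coord = nodeShards[(int) (nodeID % SHARDS)].get(nodeID);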
Next, I created an HTreeMap for all the way members.
HTreeMap<Long, double[]> wayMembers = wayMembersDB.createHashMap("way_members")
.keySerializer(Serializer.LONG).valueSerializer(Serializer.DOUBLE_ARRAY).makeOrGet();
The keys in the map represent Way IDs and values represent nodes. Each node is represented with 3 doubles (Node ID, Lat, Lon).
Way way = (Way) element;
long wayID = way.getID();
Long[] nodeIDs = way.getNodes();
double[] arr = new double[nodeIDs.length * 3];
for (int i = 0; i < nodeIDs.length; i++) {
    arr[i * 3]     = nodeIDs[i]; // node ID
    arr[i * 3 + 1] = nullVal;    // lat placeholder, filled in later
    arr[i * 3 + 2] = nullVal;    // lon placeholder, filled in later
}
wayMembers.put(wayID, arr);
Generating the Way Members map with null coordinates took 2 hours for 274 million ways.
The final step is to populate the coordinates in the Way Members map. I kicked off that process last night but I had to kill it this morning because I have to do some work today :-)
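The fill pass itself is nothing fancy; roughly this (sketch only, nodesFor() is a made-up stand-in for whatever picks the right node map for a given ID):

for (Map.Entry<Long, double[]> entry : wayMembers.entrySet()) {
    double[] arr = entry.getValue();
    for (int i = 0; i < arr.length; i += 3) {
        long nodeID = (long) arr[i];
        double[] coord = nodesFor(nodeID).get(nodeID); // look up lat/lon in the node map
        if (coord != null) {
            arr[i + 1] = coord[0]; // lat
            arr[i + 2] = coord[1]; // lon
        }
    }
    wayMembers.put(entry.getKey(), arr); // write the filled-in array back
}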
Anyway, at the rate it was going last night, I estimate that it's going to take approximately 5 days to look up all the coordinates for all the ways. I'm going to run a full test later this week on a dedicated production server and find out for sure. My goal is to get this process down to less than 30 hours.
Why 30 hours? Well, for smaller OSM extracts the process I outlined above works great. I was able to ingest 100 million nodes and look up coordinates for 10 million ways in <1 hour on my machine using the Africa OSM dataset (africa-latest.osm.pbf) from geofabrik. The entire global dataset is 30x larger so I'm hoping my workflow will scale linearly with the data. The initial test I ran last night isn't promising but the only way to know for sure is to do a full run.
In summary, HTreeMaps are great for inserting/storing large data. Inserts are fast. Look-ups are painfully slow but manageable with smaller tables (e.g. 100 million records/map).
I'll post more updates when I can.
Thanks,
Peter
> Do you think BTreeMap look-ups will be faster than HTreeMap?
BTreeMap is faster in 2.0, not sure about 1.0.
jan
Hi,
Just a small note: one of my customers is using MapDB for graphs and map data, so I expect there will be some improvements from that direction coming into MapDB soon.
Jan