We have a ~20TB (and growing) installation of cdx-server here at Stanford Library. We're running into some scaling problems that we'd like some feedback on.
What is the best configuration for large (>100GB) CDX files? We're currently using a single CDX file for our instance and each time we want to add more content, we have to sort/merge the whole thing again. Is there another configuration that supports incremental indexing, like WatchedCDXSource?
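To make the sort/merge step concrete, here is a minimal sketch of a streaming merge. It assumes every input CDX file is already sorted bytewise (as with `LC_ALL=C sort`), and `merge_sorted_cdx` is a hypothetical helper written for illustration, not part of OpenWayback.

```python
import heapq
from contextlib import ExitStack


def _normalized(lines):
    """Yield lines that always end in a newline, so a file whose last
    line lacks one still merges and deduplicates cleanly."""
    for line in lines:
        yield line if line.endswith(b"\n") else line + b"\n"


def merge_sorted_cdx(input_paths, output_path):
    """Stream-merge already-sorted CDX files into one sorted file.

    Assumption: each input is sorted bytewise. Nothing here re-sorts an
    input; an unsorted input produces unsorted output.
    """
    with ExitStack() as stack:
        # Binary mode so lines compare as bytes, matching C-locale sort order.
        inputs = [
            _normalized(stack.enter_context(open(path, "rb")))
            for path in input_paths
        ]
        out = stack.enter_context(open(output_path, "wb"))
        previous = None
        # heapq.merge reads lazily, holding one line per input in memory.
        for line in heapq.merge(*inputs):
            # Skip exact duplicates, which also collapses identical
            # header lines repeated across the inputs.
            if line != previous:
                out.write(line)
            previous = line
```

This is roughly what `LC_ALL=C sort -m -u` does: only the new, small CDX file needs a full sort, but the merge still rewrites the entire output, so its cost grows with the total index size rather than with the size of the addition.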
Does anyone have some rough performance characteristics for the CDX generation code (bin/cdx-indexer)? Is it CPU or IO intensive?
What are other institutions using for their filesystem storage of WARC files, and how are you able to grow that over time? We are limited in our options since our NetApp storage is shared by many stakeholders here, so we're looking at having to deal with multiple NFS mounts.
--
You received this message because you are subscribed to the Google Groups "openwayback-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openwayback-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.