Thanks Ian! I tried setting up parallel job chains (one without multiplex), since I don't have a quick way to pull files out of muxy directories. However, now I can't reproduce the issue in either the multiplexed or non-multiplexed scenario. I'll see if I can re-create it & report back.

To your point about the task.queue.depth/buffers/workers tweaks, we are actively trying to avoid OOM errors that occur in this archiving job. I'm not sure what the definition of "super giant bundles" is, though? Our bundles range from 1K to a few MB (gzipped size), at least at this archive step, where each input file to the multiplexing job contains a single-line JSON payload that we haven't parsed yet.

So far, the only way I've found to prevent OOM errors during this job step is to effectively disable buffering & the output bundle cache (the params I mentioned previously), plus set maxBundles: 1 and bufferSizeRatio: 1 in the output section. I've also enabled waitForDiskFlushThread. Are there any other levers to pull to control job memory usage?
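For reference, the relevant slice of my output section currently looks roughly like this (everything else elided, so treat it as a sketch of the settings above rather than the full config):

    output: {
        // ... type / path / writer settings elided ...
        // buffering & output bundle cache disabled via the params mentioned previously
        maxBundles: 1,
        bufferSizeRatio: 1,
        waitForDiskFlushThread: true,  // enabled as well; placement here is approximate
    }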
--Rory Douglas
Senior Software Developer | NewBrand
o: 619.630.9179
Connect with me on LinkedIn
Kui Zhang
Senior Software Developer | NewBrand
703.867.4229
It might not make sense to have a 20MB block size and a 5MB total buffer allowance. You also need to be careful when lowering these values, as doing so can dramatically increase stream fragmentation on disk (which hurts later read efficiency).
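To put very rough numbers on it (assuming the 5MB allowance is shared across however many streams are open and a stream gets written out whenever the allowance fills up):

    20MB block target vs. 5MB total buffer  ->  a block can never sit in memory whole
    with ~10 open streams:  5MB / 10        =  ~0.5MB written out per stream per flush
    20MB / 0.5MB                            =  ~40 on-disk fragments per logical block

So the further you push the allowance below the block size, the more pieces each stream ends up in.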
In theory, a 150MB max cache size should be well under the memory involved in your OOM example, so there is probably something else going on. That said, if your read speeds seem fine for your needs, then go nuts, I guess.
Well, it depends how many files you're writing simultaneously per directory and what your hard drive setup is. If the former is single digits or the latter is "all SSDs", then you're probably fine. If not, I guess it scales along those lines, up to the point where "100 files per directory at once" probably means not fine.
Having to stop at 5MB makes me think your writes are some strange case, like a bunch of directories per task plus bursty writes or something. Newer versions of muxy / netty might handle those better. On the other hand, if you're writing to a lot of directories, maybe you aren't writing a lot of files per directory -- if your shards each have their own directory or something, that could easily be the case.