Hi all,
I'm having trouble scaling the Universal Recommender to a dataset of 250M events (purchase, view, add-to-basket). Training completes fine on a few million events, but on the full dataset the training time becomes very long (>48h).
Hardware specs:
Input data size and format:
Train command:
pio train -- --driver-memory 64G --executor-memory 8G --executor-cores 2
I have tried several combinations of driver memory, executor memory, and core counts, but none of them seem to affect the training time.
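For example, the variations were along these lines (illustrative, not my exact invocations; --num-executors applies under YARN):

pio train -- --driver-memory 64G --executor-memory 16G --executor-cores 4 --num-executors 16 --conf spark.default.parallelism=400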
The Spark UI shows that most of the time is spent in the save method in URModel.scala, in a collect followed by ++ (displayed as $plus$plus in the UI). See the attached Spark UI dumps for details.
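To illustrate what I mean, here is a minimal, self-contained sketch (my reading of the pattern from the stack trace, not the actual URModel source) of why collecting RDDs and concatenating them with ++ funnels everything through the driver:

import org.apache.spark.{SparkConf, SparkContext}

object CollectConcatSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("collect-concat-sketch").setMaster("local[*]"))

    // Stand-ins for the per-event-type correlator RDDs (hypothetical data).
    val rdds = Seq(sc.parallelize(1 to 1000000), sc.parallelize(1 to 1000000))

    // The pattern I see in the stack trace: each RDD is collected to the
    // driver and the local arrays are concatenated with ++ ($plus$plus in
    // the UI), so the whole model passes through a single JVM.
    val onDriver: Array[Int] = rdds.map(_.collect()).reduce(_ ++ _)

    // A distributed union, by contrast, keeps the data on the executors.
    val distributed = sc.union(rdds)

    println(s"driver-side size = ${onDriver.length}, distributed count = ${distributed.count()}")
    sc.stop()
  }
}

If that reading is right, the driver-side concatenation would dominate at 250M events, which would also explain why changing executor memory and cores makes no difference. Am I interpreting the UI correctly?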
Any suggestions?
Thanks, Bolmo