Tez doesn’t provide a distributed cache. it relies on the YARN equivalent, local resources (or something like that, the naming is very confusing).
Cascading does go to great pains to emulate the distributed cache behavior by adding the things that would be in the MR distcache to the YARN resource interface. fwiw, it also pre-configures YARN to recognize the ‘lib’ folder of the job jar, if there is one. we do this so users moving from MR to Tez don’t have to use YARN apis to get the same behaviors.
that’s not to say there aren’t bugs, but for external libraries (those not stuck into the lib folder of the job jar) to show up on disk in the remote CLASSPATH, this mechanism has to work. there is no other way external libs will show up ‘local’ to the job jar once on the cluster.
so if we weren’t pushing jars into the cluster to be loaded from the local CLASSPATH, the jobs would fail, not go slow.
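to make the "fail, not go slow" point concrete, here is a minimal plain-Java sketch (no Cascading, Tez, or YARN apis involved). it only shows what a JVM does when a class is missing from its classpath: the load throws immediately. the class name `com.example.udf.MyFunction` and the helper `canLoad` are made up for illustration, and the empty classpath stands in for a container where the external jar never got localized.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;

final class ClasspathCheck {

    // Returns true if className can be loaded from the given classpath entries
    // (plus the JDK's own bootstrap classes), false if the JVM cannot find it.
    public static boolean canLoad(String className, URL... classpath) {
        // A null parent means only the bootstrap loader and the given URLs are searched.
        try (URLClassLoader loader = new URLClassLoader(classpath, null)) {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            // This is the failure a task would hit: an immediate error, not a slowdown.
            return false;
        } catch (IOException e) {
            // Only thrown when closing the loader fails.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A JDK class is always visible through the bootstrap loader.
        System.out.println("java.lang.String loadable: " + canLoad("java.lang.String"));

        // Hypothetical user class; with no jar on the classpath it cannot be found.
        System.out.println("com.example.udf.MyFunction loadable: "
                + canLoad("com.example.udf.MyFunction"));
    }
}
```

if the external jar had not been shipped to the node, a task that needs a class from it would hit the second case on first use and die, which is why a slow job doesn’t look like a missing-jar problem to me.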
maybe i’m missing the issue.
or it’s an issue with YARN.
ckw