setting the tmp directory

Elliot Chow

Oct 27, 2012, 4:14:36 PM
to scoob...@googlegroups.com
Would it be possible to make the location of the tmp directory configurable?  Right now, it is hard-coded to .scoobi-tmp under my home directory.  On our cluster, we have a relatively small quota in each user's home directory, but we have more space in our group's directory.  I think I am running into problems for jobs with a large amount of intermediate data.

Right now, it seems like this is not configurable:

  private lazy val scoobiTmpDir = FileSystem.get(configuration).getHomeDirectory.toUri.getPath + "/.scoobi-tmp/"
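
(For reference, getHomeDirectory on HDFS typically resolves to /user/<username>, so all the intermediate job data ends up under /user/<username>/.scoobi-tmp/.)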


Eric Springer

Oct 27, 2012, 9:47:38 PM
to scoob...@googlegroups.com
Thanks Elliot, looks like a regression. The base directory should be
overridable by setting "scoobi.workdir". I created an issue for it:

https://github.com/NICTA/scoobi/issues/163
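
In the meantime, something along these lines should work once the fix lands (a sketch only - I'm assuming ScoobiConfiguration forwards set() to the underlying Hadoop Configuration, and the group path is just a placeholder):

object WordCount extends ScoobiApp {
  def run() {
    // hypothetical override: point Scoobi's working directory at a path
    // with more quota headroom (placeholder path)
    configuration.set("scoobi.workdir", "/group/shared/scoobi-work")
    // ... build and persist DLists as usual
  }
}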

Thanks!

Eric Torreborre

Oct 29, 2012, 3:07:39 AM
to scoob...@googlegroups.com
Hi Elliot,

I fixed this issue today, but the fix relies on adding a new method to the ScoobiConfiguration object (setScoobiDir). Since this information is Scoobi-specific, I want to abstract the way it is stored from the API client (even if it is effectively stored in the Hadoop Configuration object for now). So, with the latest SNAPSHOT you will be able to write:

object WordCount extends ScoobiApp {
  configuration.setScoobiDir("~/shared-drive")

  def run() {
     ....
  }
 
}

The new SNAPSHOT jar should be published tomorrow, as we still have some issues with our Jenkins server, which might be too loaded at the moment (it is a shared resource among several teams).

Cheers,

Eric.

Elliot Chow

Oct 29, 2012, 8:59:03 PM
to scoob...@googlegroups.com
Awesome, thanks!

Elliot Chow

Oct 31, 2012, 2:16:53 AM
to scoob...@googlegroups.com
Hi,

I am getting a new error - is it related to this change (see below)?  For this particular case, I did not try to set the Scoobi directory.


Exception in thread "main" java.io.IOException: Failed to set permissions of path: file:/user/ellchow/.scoobi/scoobi-20121030-203155-ABTestBidData$-e5cb455e-ee00-4e18-ae0f-d3e6d8839250/staging/ellchow-1001159255/.staging to 0700
        at org.apache.hadoop.fs.RawLocalFileSystem.checkReturnValue(RawLocalFileSystem.java:525)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:499)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318)
        at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:798)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:792)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:792)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
        at com.nicta.scoobi.impl.exec.MapReduceJob$$anonfun$executeJob$1.apply(MapReduceJob.scala:238)
        at com.nicta.scoobi.impl.exec.MapReduceJob$$anonfun$executeJob$1.apply(MapReduceJob.scala:232)
        at scalaz.syntax.IdOps$class.$bar$greater(IdOps.scala:15)
        at scalaz.syntax.ToIdOps$$anon$1.$bar$greater(IdOps.scala:68)
        at com.nicta.scoobi.impl.exec.MapReduceJob.run(MapReduceJob.scala:90)
        at com.nicta.scoobi.impl.exec.Executor$.executeMSCR(Executor.scala:140)
        at com.nicta.scoobi.impl.exec.Executor$.executeArr(Executor.scala:123)
        at com.nicta.scoobi.impl.exec.Executor$.com$nicta$scoobi$impl$exec$Executor$$executeOnce(Executor.scala:108)
        at com.nicta.scoobi.impl.exec.Executor$$anonfun$executeMSCR$1.apply(Executor.scala:132)
        at com.nicta.scoobi.impl.exec.Executor$$anonfun$executeMSCR$1.apply(Executor.scala:132)
        at scala.collection.immutable.Set$Set4.foreach(Set.scala:149)
        at com.nicta.scoobi.impl.exec.Executor$.executeMSCR(Executor.scala:132)
        at com.nicta.scoobi.impl.exec.Executor$.executeArr(Executor.scala:123)
        at com.nicta.scoobi.impl.exec.Executor$.com$nicta$scoobi$impl$exec$Executor$$executeOnce(Executor.scala:106)
        at com.nicta.scoobi.impl.exec.Executor$$anonfun$executeMSCR$1.apply(Executor.scala:132)
        at com.nicta.scoobi.impl.exec.Executor$$anonfun$executeMSCR$1.apply(Executor.scala:132)
        at scala.collection.immutable.Set$Set4.foreach(Set.scala:149)
        at com.nicta.scoobi.impl.exec.Executor$.executeMSCR(Executor.scala:132)
        at com.nicta.scoobi.impl.exec.Executor$.executeArr(Executor.scala:123)
        at com.nicta.scoobi.impl.exec.Executor$.com$nicta$scoobi$impl$exec$Executor$$executeOnce(Executor.scala:106)
        at com.nicta.scoobi.impl.exec.Executor$$anonfun$executeMSCR$1.apply(Executor.scala:132)
        at com.nicta.scoobi.impl.exec.Executor$$anonfun$executeMSCR$1.apply(Executor.scala:132)
        at scala.collection.immutable.Set$Set4.foreach(Set.scala:149)
        at com.nicta.scoobi.impl.exec.Executor$.executeMSCR(Executor.scala:132)
        at com.nicta.scoobi.impl.exec.Executor$.executeArr(Executor.scala:123)
        at com.nicta.scoobi.impl.exec.Executor$.com$nicta$scoobi$impl$exec$Executor$$executeOnce(Executor.scala:106)
        at com.nicta.scoobi.impl.exec.Executor$$anonfun$executeMSCR$1.apply(Executor.scala:132)
        at com.nicta.scoobi.impl.exec.Executor$$anonfun$executeMSCR$1.apply(Executor.scala:132)
        at scala.collection.immutable.Set$Set2.foreach(Set.scala:106)
        at com.nicta.scoobi.impl.exec.Executor$.executeMSCR(Executor.scala:132)
        at com.nicta.scoobi.impl.exec.Executor$.executeArr(Executor.scala:123)
        at com.nicta.scoobi.impl.exec.Executor$.executeArrOutput(Executor.scala:119)
        at com.nicta.scoobi.application.HadoopMode$$anonfun$executeDListPersister$1.apply(HadoopMode.scala:119)
        at com.nicta.scoobi.application.HadoopMode$$anonfun$executeDListPersister$1.apply(HadoopMode.scala:116)
        at scalaz.package$State$$anon$1.apply(package.scala:127)
        at scalaz.package$State$$anon$1.apply(package.scala:126)
        at scalaz.StateT$class.eval(StateT.scala:26)
        at scalaz.package$State$$anon$1.eval(package.scala:126)
        at com.nicta.scoobi.application.Persister$$anon$2.apply(Persister.scala:76)
        at com.nicta.scoobi.application.Persister$.persist(Persister.scala:58)
        at com.nicta.scoobi.Persist$class.persist(Scoobi.scala:52)
        at com.nicta.scoobi.Scoobi$.persist(Scoobi.scala:23)
        at com.ebay.mlr.dp.hadoop.jobs.ABTestBidData$.run(ABTestBidData.scala:98)
        at com.nicta.scoobi.application.ScoobiApp$$anonfun$main$1.apply$mcV$sp(ScoobiApp.scala:75)
        at com.nicta.scoobi.application.ScoobiApp$$anonfun$main$1.apply(ScoobiApp.scala:72)
        at com.nicta.scoobi.application.ScoobiApp$$anonfun$main$1.apply(ScoobiApp.scala:72)
        at com.nicta.scoobi.application.Hadoop$class.runOnCluster(Hadoop.scala:81)
        at com.ebay.mlr.dp.hadoop.jobs.ABTestBidData$.runOnCluster(ABTestBidData.scala:31)
        at com.nicta.scoobi.application.Hadoop$class.executeOnCluster(Hadoop.scala:55)
        at com.ebay.mlr.dp.hadoop.jobs.ABTestBidData$.executeOnCluster(ABTestBidData.scala:31)
        at com.nicta.scoobi.application.Hadoop$$anonfun$onCluster$1.apply(Hadoop.scala:41)
        at com.nicta.scoobi.application.InMemoryHadoop$class.withTimer(InMemory.scala:49)
        at com.ebay.mlr.dp.hadoop.jobs.ABTestBidData$.withTimer(ABTestBidData.scala:31)
        at com.nicta.scoobi.application.InMemoryHadoop$class.showTime(InMemory.scala:57)
        at com.ebay.mlr.dp.hadoop.jobs.ABTestBidData$.showTime(ABTestBidData.scala:31)
        at com.nicta.scoobi.application.Hadoop$class.onCluster(Hadoop.scala:41)
        at com.ebay.mlr.dp.hadoop.jobs.ABTestBidData$.onCluster(ABTestBidData.scala:31)
        at com.nicta.scoobi.application.Hadoop$class.onHadoop(Hadoop.scala:47)
        at com.ebay.mlr.dp.hadoop.jobs.ABTestBidData$.onHadoop(ABTestBidData.scala:31)
        at com.nicta.scoobi.application.ScoobiApp$class.main(ScoobiApp.scala:72)
        at com.ebay.mlr.dp.hadoop.jobs.ABTestBidData$.main(ABTestBidData.scala:31)
        at com.ebay.mlr.dp.hadoop.jobs.ABTestBidData.main(ABTestBidData.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I then tried setting the directory as you described, but it does not seem to take effect, and I get the same error.  When I set it on the command line with -Dscoobi.dir=/tmp/directory/of/choice, the directory is set correctly and the job starts, but then it crashes (see below).  For some reason, it appears to be running in local mode.

[INFO] LocalJobRunner -
[INFO] Step - Map  95%    Reduce   0%
[INFO] LocalJobRunner -
[INFO] Step - Map  97%    Reduce   0%
[INFO] MapTask - Starting flush of map output
[WARN] LocalDirAllocator$AllocatorPerContext - Failed to create /hadoop01/scratch/taskTracker/ellchow/jobcache/job_local_0001/attempt_local_0001_m_000000_0
[WARN] LocalDirAllocator$AllocatorPerContext - Failed to create /hadoop02/scratch/taskTracker/ellchow/jobcache/job_local_0001/attempt_local_0001_m_000000_0
[WARN] LocalDirAllocator$AllocatorPerContext - Failed to create /hadoop03/scratch/taskTracker/ellchow/jobcache/job_local_0001/attempt_local_0001_m_000000_0
[WARN] LocalDirAllocator$AllocatorPerContext - Failed to create /hadoop04/scratch/taskTracker/ellchow/jobcache/job_local_0001/attempt_local_0001_m_000000_0
[WARN] LocalDirAllocator$AllocatorPerContext - Failed to create /hadoop05/scratch/taskTracker/ellchow/jobcache/job_local_0001/attempt_local_0001_m_000000_0
[WARN] LocalDirAllocator$AllocatorPerContext - Failed to create /hadoop06/scratch/taskTracker/ellchow/jobcache/job_local_0001/attempt_local_0001_m_000000_0
[WARN] LocalDirAllocator$AllocatorPerContext - Failed to create /hadoop07/scratch/taskTracker/ellchow/jobcache/job_local_0001/attempt_local_0001_m_000000_0
[WARN] LocalDirAllocator$AllocatorPerContext - Failed to create /hadoop08/scratch/taskTracker/ellchow/jobcache/job_local_0001/attempt_local_0001_m_000000_0
[WARN] LocalDirAllocator$AllocatorPerContext - Failed to create /hadoop09/scratch/taskTracker/ellchow/jobcache/job_local_0001/attempt_local_0001_m_000000_0
[WARN] LocalDirAllocator$AllocatorPerContext - Failed to create /hadoop10/scratch/taskTracker/ellchow/jobcache/job_local_0001/attempt_local_0001_m_000000_0
[WARN] LocalDirAllocator$AllocatorPerContext - Failed to create /hadoop11/scratch/taskTracker/ellchow/jobcache/job_local_0001/attempt_local_0001_m_000000_0
[WARN] LocalDirAllocator$AllocatorPerContext - Failed to create /hadoop12/scratch/taskTracker/ellchow/jobcache/job_local_0001/attempt_local_0001_m_000000_0
[INFO] LocalJobRunner -
[INFO] Step - Map 100%    Reduce   0%
[INFO] LocalJobRunner -
[WARN] LocalJobRunner - job_local_0001 <java.lang.IllegalArgumentException: n must be positive>java.lang.IllegalArgumentException: n must be positive
        at java.util.Random.nextInt(Random.java:250)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:283)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:323)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1274)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1182)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:608)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:676)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:335)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
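
My guess from the trace: the job is going through the LocalJobRunner but picking up the cluster's mapred.local.dir setting, and since none of the /hadoopNN/scratch paths exist on the machine the job actually runs on, LocalDirAllocator is left with zero usable directories and Random.nextInt(0) throws "n must be positive".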



Ben Lever

Nov 6, 2012, 5:09:16 AM
to scoob...@googlegroups.com
Hi Elliot,

I'm currently getting an error that could be related to the one you've posted about - i.e. trying to run locally when it should be running on the cluster. It's occurring on a different cluster from the one we test against as part of continuous integration, which is why this bug snuck through, but my suspicion is that the problem was introduced in commit 75d849.

We're currently debugging this one, so we'll let you know once we've got to the bottom of it - hopefully the fix resolves your issue too.

Cheers,
Ben.

Alex Cozzi

Nov 6, 2012, 1:59:21 PM
to scoob...@googlegroups.com
I was also thinking that it would be good for Scoobi to conform to most other Hadoop languages (Pig, Hive, etc.) and put temporary results under hdfs:/tmp instead of inside the user's home directory: on our cluster, for example, /tmp is exempt from quota restrictions, while each user's home directory is limited.
I know that, thanks to the latest change, the tmp directory is now configurable, but a different default would help new users on most clusters.
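
Something like this is what I have in mind (a sketch only - assuming the new setScoobiDir accepts an absolute HDFS path; the exact layout would be up to cluster admins):

object MyJob extends ScoobiApp {
  // hypothetical: a per-user working area under /tmp, outside the quota
  configuration.setScoobiDir("/tmp/scoobi/" + System.getProperty("user.name"))

  def run() {
    // ...
  }
}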

Alex

Ben Lever

Nov 7, 2012, 7:47:51 PM
to scoob...@googlegroups.com
Hi Elliot - we've pushed a change to master now that may fix this problem for you. Would you be able to try that out?



Elliot Chow

Nov 8, 2012, 11:04:37 PM
to scoob...@googlegroups.com
Were these changes also pushed to the cdh3 branch?  I tried compiling and using that branch and it looks like it still tries to run locally.

I'll try again to make sure I didn't mess anything up.

Ben Lever

Nov 8, 2012, 11:15:07 PM
to scoob...@googlegroups.com
The commit that fixed my problem was 217ac9, and it has been merged into the cdh3 branch.

If that commit doesn't fix your problem, it would be really handy if you could run with debugging on and share the trace with us. Just add "-- scoobi verbose.all.[scoobi]" to the end of your command-line parameters.
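
For example (jar name, main class, and job arguments are placeholders; the flag itself is verbatim from above):

  hadoop jar my-job.jar com.example.MyJob input output -- scoobi verbose.all.[scoobi]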

Cheers!

Elliot Chow

Nov 11, 2012, 1:24:14 PM
to scoob...@googlegroups.com
Looks like it works fine - it must not have been published properly to my local repository before.  Thanks!