Strange naming of steps/targets when Drake asks user to confirm y/n


Alza

Mar 5, 2015, 9:43:26 AM
to drake-w...@googlegroups.com
Hi there,

I am trying out Drake for a project, and I have a question about how the step/target names are displayed when running Drake.

I have a simple Drakefile with one step, with two outputs as follows:

1) to hdfs, e.g. "hdfs://localhost:8020/tmp/somefile" 
2) to /tmp, e.g. /tmp/somefile.md5

However, when I run Drake and it asks me to confirm the step running order, the step is displayed as:

1: /my/current/directory/././hdfs://localhost:8020/tmp/somefile, /my/current/directory/././/tmp/somefile.md5 <-  [missing output]

Is there something I am doing wrong? I was expecting something easier to read like:

1: hdfs://localhost:8020/tmp/somefile, /tmp/somefile.md5 <-  [missing output]

Thanks

Alex.

Alan Malloy

Mar 5, 2015, 3:28:24 PM
to drake-w...@googlegroups.com
In order to specify files which are not relative to your $BASE directory, you have to precede them with a ! character. So, !/tmp/somefile, and !hdfs://localhost:8020/tmp/somefile. I agree it seems bizarre that drake can't figure this out itself, but that seems to be how it is implemented, and it's in the spec that any file that doesn't have ! in front of it is interpreted relative to your BASE - see the section "special variable: BASE". So, I think it's working as intended, but the intention is kinda silly and I wouldn't mind a pull request to fix this (in which case I'd amend the spec).
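
As a rough illustration of the rule described above, here is a minimal Python sketch. It is a simplified model, not Drake's actual code: `resolve` is a hypothetical helper, and it does not reproduce the exact `././` segments Drake prints. It only shows why an absolute-looking name ends up glued onto BASE unless it is marked with `!`.

```python
def resolve(name: str, base: str) -> str:
    """Simplified model of the BASE rule (hypothetical, not Drake's code)."""
    if name.startswith("!"):
        # "!" marks an off-base name: drop the marker, use the rest as written.
        return name[1:]
    # Everything else is treated as relative to BASE, even if it looks absolute.
    return base.rstrip("/") + "/" + name


base = "/my/current/directory"
print(resolve("/tmp/somefile.md5", base))                     # glued onto BASE
print(resolve("!/tmp/somefile.md5", base))                    # taken as written
print(resolve("!hdfs://localhost:8020/tmp/somefile", base))   # taken as written
```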

Artem Boytsov

Mar 5, 2015, 4:26:11 PM
to Alan Malloy, drake-w...@googlegroups.com
Hardcoding absolute filenames into the workflow is bad practice: it makes it very hard to run two versions of it, run the same workflow with different data, share the workflow, etc., much as we'd all frown at hardcoding absolute file paths into a Makefile.

I feel like the problem is not so much the $BASE idea as multiple filesystems, and the right solution would be to add support for multiple filesystems to $BASE, for example:

BASE=/tmp (sets base for local file system)
BASE=hdfs://localhost:8028/tmp (sets base for HDFS file system)

a <- hdfs:b (translated into /tmp/a <- hdfs://localhost:8028/tmp/b)

To make this work nicely with workflows that don't run across multiple filesystems, we might need to add another variable to indicate the base filesystem. For example:

BASE=hdfs://localhost:8028/tmp

hdfs:a <- hdfs:b (inconvenient)

BASE_FS=hdfs

a <- b (same)
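
To make the proposal concrete, here is a small Python sketch of how a per-filesystem BASE plus a default filesystem could resolve names. None of this exists in Drake: `resolve`, the `bases` mapping, and the `"file"` key for the local filesystem are assumptions made for illustration only.

```python
def resolve(name: str, bases: dict, default_fs: str = "file") -> str:
    """Resolve a name against a per-filesystem BASE (hypothetical semantics)."""
    fs, sep, rest = name.partition(":")
    if sep and fs in bases:
        # Explicit filesystem prefix, e.g. "hdfs:b".
        return bases[fs].rstrip("/") + "/" + rest
    # No recognised prefix: use the default filesystem (the BASE_FS idea).
    return bases[default_fs].rstrip("/") + "/" + name


# One BASE per filesystem, as in the first example above.
bases = {"file": "/tmp", "hdfs": "hdfs://localhost:8028/tmp"}

print(resolve("a", bases))                      # local base
print(resolve("hdfs:b", bases))                 # HDFS base
print(resolve("a", bases, default_fs="hdfs"))   # with BASE_FS=hdfs
```

With `default_fs="hdfs"`, the unprefixed `a` resolves to the same place as `hdfs:a`, which is the convenience BASE_FS is meant to buy.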

Having said that, this particular example doesn't actually need extended use of "!" even if we want to treat all filenames as absolute, because Drake could identify these filenames as absolute (i.e. starting with "/" for local, or filesystem prefix "hdfs:/"). This looks like a bug/oversight which can be easily fixed.

The intention of "!" is not to allow specifying absolute paths, but to allow specifying an off-base path, which can be relative to CWD. But since absolute paths can be detected automatically, describing it as "off-base" is not optimal. It seems better to describe it as CWD-base (rather than off-base), and say that whenever a filename is prepended with "!", it's relative to CWD rather than BASE. Since the default BASE is CWD, this should make no difference in this case. And it only works with local filesystems, so we could issue an error in other cases.
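
A minimal Python sketch of that reading of "!" follows. It is an assumption about the proposed semantics, not existing Drake behaviour: `resolve` is a hypothetical helper, and treating any name containing `://` as non-local is a simplification.

```python
import posixpath


def resolve(name: str, base: str, cwd: str) -> str:
    """'!' means relative to CWD; anything else is relative to BASE (hypothetical)."""
    if name.startswith("!"):
        rest = name[1:]
        if "://" in rest:
            # "!" only makes sense for local paths in this reading.
            raise ValueError("'!' is only supported on the local filesystem: " + rest)
        return posixpath.normpath(posixpath.join(cwd, rest))
    return posixpath.normpath(posixpath.join(base, name))


print(resolve("out.csv", base="/work/base", cwd="/work"))      # relative to BASE
print(resolve("!notes.txt", base="/work/base", cwd="/work"))   # relative to CWD
```

When BASE is left at its default (CWD), both branches give the same result, so the marker makes no difference.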

Let me know what you think.

Yours,
  Artem.


Alza

Mar 6, 2015, 10:06:38 AM
to drake-w...@googlegroups.com, al...@factual.com
Thanks all for the responses.

After hitting the problem I looked in the documentation, and after a while I realised I was running into the $BASE behaviour and so I used '!' to have Drake understand that my inputs and outputs are absolute.

As a newcomer, I would say that it would be nice if the defaults were intuitive, so if inputs or outputs use an absolute url or absolute path, these are treated as such.

I think this would greatly help beginners who are just trying to find their way around and experiment etc, since it's probably the behaviour they would expect. Any steps initially written using absolute urls or paths could easily be changed at a later point to be relative and use $BASE, if needed.

I understand and agree with the point about absolute urls and paths being bad practice, but I don't think what is being suggested would conflict with the $BASE behaviour, since that would surely only apply if inputs and outputs were not absolute? As such, it would be great if this change could be introduced for absolute urls and paths.

Thanks

Alex.

Artem Boytsov

Mar 12, 2015, 1:09:19 AM
to Alza, drake-w...@googlegroups.com, Alan Malloy
I agree with your sentiment. I strongly feel that default Drake behavior should be CWD, and I think the proposal I outlined in my previous email should fix the existing problems with $BASE. What do you think? Would you still find it unintuitive, and why?

Aaron, Alan, please chime in!

Yours,
  Artem.

Alan Malloy

Mar 16, 2015, 9:08:52 PM
to drake-w...@googlegroups.com
I just committed https://github.com/Factual/drake/commit/35a144a34c29275e02b076c58cf5789e91436456, which auto-detects absolute paths and scheme-specific paths. It doesn't do anything fancier like per-backend BASE, just fixes the confusing behavior of absolute paths.
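
For readers who want the idea without reading the commit, here is a short Python sketch of what "auto-detects absolute paths and scheme-specific paths" could look like. It is not the code from the commit: `is_absolute`, `resolve`, and the scheme pattern are illustrative assumptions.

```python
import re

# A name such as "hdfs://host/path" starts with a scheme followed by ":/".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:/")


def is_absolute(name: str) -> bool:
    """True for local absolute paths and scheme-qualified names (assumed rule)."""
    return name.startswith("/") or bool(_SCHEME.match(name))


def resolve(name: str, base: str) -> str:
    if is_absolute(name):
        # Already absolute: leave it alone instead of prefixing BASE.
        return name
    return base.rstrip("/") + "/" + name


base = "/my/current/directory"
print(resolve("hdfs://localhost:8020/tmp/somefile", base))
print(resolve("/tmp/somefile.md5", base))
print(resolve("somefile", base))
```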


Alza

Mar 17, 2015, 10:34:00 AM
to drake-w...@googlegroups.com
Great, thanks! I'm hoping to try it out this week and report back.

Alza

Mar 17, 2015, 8:31:32 PM
to drake-w...@googlegroups.com
OK, I just did some quick testing, and it seems to be working as described earlier in the thread, i.e. absolute paths and absolute hdfs urls are treated as such, and '!' is no longer required as a workaround.

Great stuff :)