Rebuilding tup database to account for pre-existing files?

Sam Sam

May 18, 2021, 10:20:09 AM
to tup-users
Hi everyone,

I'm trying to use tup to model a data-oriented pipeline where several programs are chained to generate large datasets. So far tup looks great, but I was wondering how I could init a project where some targets are already generated.

Simply running tup init and then tup fails, as expected, with tup complaining that a generated node already exists.

Is this a use case that is definitely out of scope for tup?

Thanks a lot,
Sam

Mike Shal

May 19, 2021, 1:28:49 PM
to tup-...@googlegroups.com
Hi Sam,

I'm not sure I fully understand your workflow. Are you looking for a way of turning the "generated node already exists" error into a warning message and allowing tup to overwrite the file? I think I had plans to add a flag for that, but it doesn't exist yet. The workaround for this case is to remove the files yourself (or start with a fresh checkout of your tree before experimenting with tup). Since the file is removed this way, tup then needs to re-generate the output, of course. Adding the flag should be pretty straightforward to help with cases where it's impractical to remove a large number of files.
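
For a tree with many pre-existing outputs, the manual removal step could be scripted. The following is only a minimal sketch, not something tup provides: the function name `remove_stale_outputs` and the hand-maintained list of output paths are assumptions made for illustration, and the script does not read Tupfiles or the tup database.

```python
from pathlib import Path


def remove_stale_outputs(root, outputs):
    """Delete pre-existing generated files under `root`.

    `outputs` is a hand-maintained list of paths relative to `root`
    (an assumption for this sketch; nothing here parses Tupfiles).
    Returns the relative paths that were actually removed.
    """
    removed = []
    for rel in outputs:
        path = Path(root) / rel
        # Only touch regular files that exist; skip anything else.
        if path.is_file():
            path.unlink()
            removed.append(rel)
    return removed


if __name__ == "__main__":
    # Hypothetical output names, for illustration only.
    for rel in remove_stale_outputs(".", ["data/stage1.out", "data/stage2.out"]):
        print("removed", rel)
```

After the listed files are gone, running tup regenerates them, so this only saves the effort of deleting files by hand, not the time spent rebuilding them.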

If you're trying to find a way to get tup to recognize a file as an output in the database *and* avoid having to run the command with tup to generate it, I think that would be problematic. Without running the command in tup, you wouldn't allow discovery of all the normal inputs to the command that are determined from the FUSE or DLL/shared library wrappers. Tup then wouldn't know when it needs to re-run the command.

Some background on sticky inputs (those you list in Tupfiles) and normal inputs (those discovered by wrapping and running the program) is here: http://gittup.org/tup/ex_dependencies.html and http://gittup.org/tup/ex_generated_header.html

Let me know if I misunderstood your use case. If you can show it with a small example Tupfile that could help, too.

Thanks,
-Mike

Sam Sam

May 19, 2021, 1:58:38 PM
to tup-...@googlegroups.com
Hi Mike,

You understood my use case perfectly: I’m trying to avoid re-running some time-consuming tasks by storing their targets on a remote volume.
After checking out my pipeline source code (including the Tupfile) from a version control system, I run tup init and an rsync to fetch some of these large targets, and I was wondering if tup could continue from there.

I definitely understand that recognising those pre-existing targets would defeat tup’s core features, so I’ll just stick with some more standard use cases :)

(Still, just for fun, I’ll try to rsync the generated files along with the tup database and see how it goes.)

@mike: In any case, thanks a lot for your work. I got several build systems running under tup after just browsing the documentation a couple of times; it’s really well-thought-out software.




Mike Shal

May 19, 2021, 6:34:36 PM
to tup-...@googlegroups.com
I believe that syncing the database as well should work, so long as you make sure mtimes on generated files are kept, which rsync with -a should do.
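
One way to check that a sync really kept the mtimes is to compare the two trees afterwards. This is a minimal sketch under stated assumptions: the function name `mtime_mismatches` is made up for illustration, and it compares timestamps at whole-second resolution, since the thread does not say what resolution tup itself relies on.

```python
import os


def mtime_mismatches(src_root, dst_root):
    """Return relative paths whose mtime differs between the two trees.

    Files present under `src_root` but missing under `dst_root` are
    reported as well. Timestamps are compared in whole seconds, which
    is an assumption of this sketch.
    """
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.exists(dst):
                mismatches.append(rel)
                continue
            if int(os.stat(src).st_mtime) != int(os.stat(dst).st_mtime):
                mismatches.append(rel)
    return sorted(mismatches)
```

An empty list means every file under the source tree has a counterpart with the same modification time in the synced copy; any path it returns is a file whose timestamp was not carried over.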

Is there a reason you are wanting to re-create new .tup directories frequently rather than just re-using the same one each time?
 

Thanks for the feedback! That is refreshing to hear :)

-Mike