Putting timestamps in pkg_tar

360 views
Skip to first unread message

Sean Suchter

unread,
Oct 5, 2016, 1:49:00 PM10/5/16
to bazel-dev
Hi folks -

I ran into the issue referenced in issue 1299. My solution was a trivial diff:

index 650b901..2bf6b4d 100644
--- a/tools/build_defs/pkg/build_tar.py
+++ b/tools/build_defs/pkg/build_tar.py
@@ -90,7 +90,8 @@ class TarFile(object):
     # If mode is unspecified, derive the mode from the file's mode.
     if mode is None:
       mode = 0o755 if os.access(f, os.X_OK) else 0o644
-    self.tarfile.add_file(dest, file_content=f, mode=mode)
+    mtime = os.path.getmtime(f)
+    self.tarfile.add_file(dest, file_content=f, mode=mode, mtime=mtime)

   def add_tar(self, tar):
     """Merge a tar file into the destination tar file.

I'm working on a submission per https://bazel.io/contributing.html. Only thing that I can't find is where I'd add a test case for this. Where are the test cases for pkg_tar in general?

I'd welcome any feedback on where to put tests or whether this fix is appropriate.

Thanks, Sean


Sean Suchter

unread,
Oct 5, 2016, 2:01:37 PM10/5/16
to bazel-dev
Hi folks -

In trying to submit a code review to Gerrit, I ran into this error message:

Seans-MacBook-Pro:~/bazel/tools/build_defs% git push https://bazel.googlesource.com/bazel HEAD:refs/for/master
Total 0 (delta 0), reused 0 (delta 0)
remote: Processing changes: refs: 1, done
 ! [remote rejected] HEAD -> refs/for/master (no new changes)
error: failed to push some refs to 'https://bazel.googlesource.com/bazel'

I wonder if it is because of something related to this step (from the Contributing page):

  • Have an automatically generated "Change Id" line in your commit message. If you haven't used Gerrit before, it will print a bash command to create the git hook and then you will need to run `git commit --amend` to add the line.
I didn't get any bash command suggested to create the git hook.

I'd appreciate any advice on how I could properly submit the code review. (And also anything about where to work on unit tests.)

Thanks, Sean

Damien Martin-guillerez

unread,
Oct 5, 2016, 2:30:00 PM10/5/16
to Sean Suchter, bazel-dev

Hello, you usually get that error when you try to push a branch that has no change (either compared to master or to an already uploaded patch). Are you sure you are on the good branch? What does `git log -1` says?

First round of review in the same time: we don't want to always keep the timestamp, this should be an option of the rule (like a stamp attribute).


--
You received this message because you are subscribed to the Google Groups "bazel-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-dev+...@googlegroups.com.
To post to this group, send email to baze...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-dev/d0593faf-009f-4059-927a-9c2885182e0d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sean Suchter

unread,
Oct 5, 2016, 2:40:32 PM10/5/16
to Damien Martin-guillerez, bazel-dev
Thanks Damien!

Seans-MacBook-Pro:~/bazel/tools/build_defs% git log -1 --color=never
commit fc475658721b86a81351cb8841aa8f5ad57d828f
Author: Damien Martin-Guillerez <dmar...@google.com>
Date:   Wed Oct 5 15:24:59 2016 +0000

    Add jdk8 tags to java_integration_test

    We test some JDK8 features only and it is not worth the bucket
    to separate them now.

    --
    MOS_MIGRATED_REVID=135234490

Seans-MacBook-Pro:~/bazel/tools/build_defs% git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

modified:   pkg/build_tar.py

On your review comment:

I take it you’re suggesting that I make a keep_mtime option to the rule? Should its default be true or false? I understand it’s a changed behavior to make it be ‘true’ but I wonder if that’s a reasonable default? (When would one ever want to lose the mtimes? That’s different than the defaults of typical package managers and tar.) I could make it false, if you prefer. (I can always set it to true in my BUILD)

Thanks,
Sean

Damien Martin-guillerez

unread,
Oct 5, 2016, 2:43:25 PM10/5/16
to Sean Suchter, bazel-dev
On Wed, Oct 5, 2016 at 8:40 PM Sean Suchter <ssuc...@pepperdata.com> wrote:
Thanks Damien!

Seans-MacBook-Pro:~/bazel/tools/build_defs% git log -1 --color=never
commit fc475658721b86a81351cb8841aa8f5ad57d828f
Author: Damien Martin-Guillerez <dmar...@google.com>
Date:   Wed Oct 5 15:24:59 2016 +0000

    Add jdk8 tags to java_integration_test

    We test some JDK8 features only and it is not worth the bucket
    to separate them now.

    --
    MOS_MIGRATED_REVID=135234490

Seans-MacBook-Pro:~/bazel/tools/build_defs% git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

modified:   pkg/build_tar.py


So you haven't committed your change that's why :)
 
On your review comment:

I take it you’re suggesting that I make a keep_mtime option to the rule?

I would call it "stamp" :)
 
Should its default be true or false? I understand it’s a changed behavior to

Default should be false to be backward compatible. It could also allow to be set to a certain value to keep caching while having reasonable value for the mtime.
 
make it be ‘true’ but I wonder if that’s a reasonable default? (When would 
one ever want to lose the mtimes? That’s different than the defaults of

Almost every time, it kills caching to keep the mtimes

Sean Suchter

unread,
Oct 5, 2016, 4:33:40 PM10/5/16
to Damien Martin-guillerez, bazel-dev
Thanks for the git commit pointer. I successfully pushed, but had a merge conflict. I will rebase and fix. (I’m a daily user of mercurial, so don’t always get the git commands right.) (https://bazel-review.googlesource.com/#/c/6470/)

On your review comment:

I take it you’re suggesting that I make a keep_mtime option to the rule?

I would call it "stamp" :)

You don’t think this will be confused with the ‘stamp’ option in bazel that encodes build information that is different than the mtime?

Should its default be true or false? I understand it’s a changed behavior to

Default should be false to be backward compatible. It could also allow to be set to a certain value to keep caching while having reasonable value for the mtime.

Understood, will make the default false.

make it be ‘true’ but I wonder if that’s a reasonable default? (When would 
one ever want to lose the mtimes? That’s different than the defaults of

Almost every time, it kills caching to keep the mtimes

Thanks for pointing this out. I’ll admit, I don’t quite understand, but will start with a false default.

For my edification - you’re talking about a case that one of the files to be included has a spurious mtime modification and so would have to be re-included unnecessarily? I’d think that the mtime of the file to be included would likely only be changed if it was somehow actually a new change (in the case of a source file) or a new generated file (in the case of a rule output).

Thanks again for this very helpful interaction.

Sean

You received this message because you are subscribed to a topic in the Google Groups "bazel-dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/bazel-dev/iwOPKRhf_wE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to bazel-dev+...@googlegroups.com.

To post to this group, send email to baze...@googlegroups.com.

pcj...@gmail.com

unread,
Oct 5, 2016, 7:31:43 PM10/5/16
to bazel-dev, dmar...@google.com


On Wednesday, October 5, 2016 at 2:33:40 PM UTC-6, Sean Suchter wrote:
Thanks for the git commit pointer. I successfully pushed, but had a merge conflict. I will rebase and fix. (I’m a daily user of mercurial, so don’t always get the git commands right.) (https://bazel-review.googlesource.com/#/c/6470/)

On your review comment:

I take it you’re suggesting that I make a keep_mtime option to the rule?

I would call it "stamp" :)

You don’t think this will be confused with the ‘stamp’ option in bazel that encodes build information that is different than the mtime?

Should its default be true or false? I understand it’s a changed behavior to

Default should be false to be backward compatible. It could also allow to be set to a certain value to keep caching while having reasonable value for the mtime.

Understood, will make the default false.

make it be ‘true’ but I wonder if that’s a reasonable default? (When would 
one ever want to lose the mtimes? That’s different than the defaults of

Almost every time, it kills caching to keep the mtimes

Thanks for pointing this out. I’ll admit, I don’t quite understand, but will start with a false default.


Chiming in here.  The main point of the code is to make a tar file deterministic.  The reproducibility characteristic here is not really about avoiding bugs, but making sure that a hash of the tarfile is dependent only on the file content.  Bazel may use this hash to help determine if the inputs or outputs change.  By making all tar layers of an image deterministic, you get a deterministic image.

BTW: this is how git works also (does not store timestamps).

Sean Suchter

unread,
Oct 5, 2016, 8:03:51 PM10/5/16
to bazel-dev, pcj...@gmail.com, dmar...@google.com
On October 5, 2016 at 4:31:44 PM, pcj...@gmail.com (pcj...@gmail.com) wrote:


make it be ‘true’ but I wonder if that’s a reasonable default? (When would 
one ever want to lose the mtimes? That’s different than the defaults of

Almost every time, it kills caching to keep the mtimes

Thanks for pointing this out. I’ll admit, I don’t quite understand, but will start with a false default.

Chiming in here.  The main point of the code is to make a tar file deterministic.  The reproducibility characteristic here is not really about avoiding bugs, but making sure that a hash of the tarfile is dependent only on the file content.  Bazel may use this hash to help determine if the inputs or outputs change.  By making all tar layers of an image deterministic, you get a deterministic image.

BTW: this is how git works also (does not store timestamps).

Hm, I thought that Git does store the commit timestamp. (Not necessarily your computer’s mtime on the file, but the timestamp of the commit. You could use this to reconstruct the “important” time of when a file last changed.)

Do you think this change in general is bad, or are you just suggesting that we keep the option to use the mtime as default-off? I definitely understand the reasoning of why we should not have the default be preserve mtime (thanks to both of you for educating me!).

Would it be useful to also (probably a separate CL, I don’t even know if it’s really possible) make the hashing code ignore timestamps specifically when hashing a tarball? It’s probably a bad idea, but I thought I’d mention it.

I will point out that there is a case where the file timestamp (not in my project, but definitely theoretically possible) *is* an important part of information. (So a file with the same content but a new timestamp is not the same as a file with an old timestamp.) Having the option to preserve (but not the requirement, I get your points now) would be useful in those contexts.

Thanks,

Sean


Austin Schuh

unread,
Oct 6, 2016, 11:59:54 AM10/6/16
to Sean Suchter, bazel-dev, pcj...@gmail.com, dmar...@google.com
On Wed, Oct 5, 2016 at 5:03 PM Sean Suchter <ssuc...@pepperdata.com> wrote:
On October 5, 2016 at 4:31:44 PM, pcj...@gmail.com (pcj...@gmail.com) wrote:


make it be ‘true’ but I wonder if that’s a reasonable default? (When would 
one ever want to lose the mtimes? That’s different than the defaults of

Almost every time, it kills caching to keep the mtimes

Thanks for pointing this out. I’ll admit, I don’t quite understand, but will start with a false default.

Chiming in here.  The main point of the code is to make a tar file deterministic.  The reproducibility characteristic here is not really about avoiding bugs, but making sure that a hash of the tarfile is dependent only on the file content.  Bazel may use this hash to help determine if the inputs or outputs change.  By making all tar layers of an image deterministic, you get a deterministic image.

BTW: this is how git works also (does not store timestamps).

Hm, I thought that Git does store the commit timestamp. (Not necessarily your computer’s mtime on the file, but the timestamp of the commit. You could use this to reconstruct the “important” time of when a file last changed.)

Do you think this change in general is bad, or are you just suggesting that we keep the option to use the mtime as default-off? I definitely understand the reasoning of why we should not have the default be preserve mtime (thanks to both of you for educating me!).

Would it be useful to also (probably a separate CL, I don’t even know if it’s really possible) make the hashing code ignore timestamps specifically when hashing a tarball? It’s probably a bad idea, but I thought I’d mention it.

I will point out that there is a case where the file timestamp (not in my project, but definitely theoretically possible) *is* an important part of information. (So a file with the same content but a new timestamp is not the same as a file with an old timestamp.) Having the option to preserve (but not the requirement, I get your points now) would be useful in those contexts.

Thanks,

Sean


Making the hashing code ignore the timestamps sounds dangerous.  Any targets that depend on the output would be built using the old timestamps, and not have the same hash.  That does not sound worth the risk.

Bazel has a flag already to control enabling or disabling of timestamps.  It is the --stamp flag.  Is there any way you can conditionally enable timestamping tied to that flag?

Austin

Damien Martin-guillerez

unread,
Oct 6, 2016, 12:03:52 PM10/6/16
to Austin Schuh, Sean Suchter, bazel-dev, pcj...@gmail.com
No there is not because you don't get access to that flag from Skylark. The hashing code will still take the timestamp but people will have to be careful in using that option.
Reply all
Reply to author
Forward
0 new messages