Hello up there,
I'm preparing a software+data archive to put onto a DVD as bare git
repositories. For convenience I decided to also put msysgit on the
disk, along with a simple read-only git-filesystem mounter based on
fuse/dokan. Everything fits tightly, so I'm short of space.
It is known that, because hardlinks on Windows are not easy, the files
in libexec/git-core/*.exe are duplicated and waste space on a
FAT/NTFS filesystem. For the record, the unpacked portable 1.8.1.2 takes
233M, 153M of which is libexec/git-core/.
But since iso9660 supports hardlinks, and I'm preparing the disk on
Linux, I was going to save that space by running the files through the
`hardlink` utility [1], which basically compares files and links pairs
with equal content. The result would then be passed to genisoimage, and
voila.
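For readers without `hardlink` at hand, the idea can be sketched in a few lines of shell. This is only a rough illustration, not how the real tool is implemented: the function name link_identical is made up here, and it assumes ordinary file names (no newlines, backslashes or leading spaces).

```sh
# Replace files with identical content under DIR by hardlinks to one copy.
# Hypothetical helper for illustration; assumes ordinary file names.
link_identical() {
    prev_sum= prev_path=
    # Sorting by checksum puts candidate duplicates on adjacent lines.
    find "$1" -type f -exec md5sum {} + | sort |
    while read -r sum path; do
        if [ "$sum" = "$prev_sum" ] && cmp -s "$prev_path" "$path"; then
            # Same content: link the later file to the first one,
            # unless the two names already point at the same inode.
            [ "$prev_path" -ef "$path" ] || ln -f "$prev_path" "$path"
        else
            prev_sum=$sum prev_path=$path
        fi
    done
}
```

Running `link_identical libexec/git-core` before building the image would be the intended use; the cmp call guards against the unlikely case of two different files sharing a checksum.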
BUT
experience showed that filtering msysgit portable through `hardlink`
reduces space only to

            /       libexec/git-core/
    before  233M    153M
    after   119M    42M

"only", because on Linux, du -sh for libexec/git-core/ is ~ 14M.
I've looked into it, and it turned out that the files in
libexec/git-core/ in msysgit portable are not all the same, e.g.:
$ cd libexec/git-core/
$ md5sum *.exe |sort | tail -20
e5334a495e2ef49a625257f71d91004d git-status.exe
e5334a495e2ef49a625257f71d91004d git-stripspace.exe
e5334a495e2ef49a625257f71d91004d git-symbolic-ref.exe
e70488a6ef8b6acf110105fbe4f99f17 git-revert.exe
e70488a6ef8b6acf110105fbe4f99f17 git-rev-list.exe
e70488a6ef8b6acf110105fbe4f99f17 git-rev-parse.exe
e70488a6ef8b6acf110105fbe4f99f17 git-rm.exe
e8f77f297a3944933e1b7f40a1bd5e57 git-push.exe
e8f77f297a3944933e1b7f40a1bd5e57 git-read-tree.exe
e8f77f297a3944933e1b7f40a1bd5e57 git-receive-pack.exe
e8f77f297a3944933e1b7f40a1bd5e57 git-reflog.exe
eb2ac41905f27d0ef63b3eb1fd11069e git-ls-files.exe
eb2ac41905f27d0ef63b3eb1fd11069e git-ls-remote.exe
eb2ac41905f27d0ef63b3eb1fd11069e git-ls-tree.exe
eb2ac41905f27d0ef63b3eb1fd11069e git-mailinfo.exe
eb2ac41905f27d0ef63b3eb1fd11069e git-mailsplit.exe
fee6e18f42ae9d24f7c15bce81c14c1b git-index-pack.exe
fee6e18f42ae9d24f7c15bce81c14c1b git-init-db.exe
fee6e18f42ae9d24f7c15bce81c14c1b git-init.exe
fee6e18f42ae9d24f7c15bce81c14c1b git-log.exe
$ hexdump -C git-init.exe >1
$ hexdump -C git-status.exe >2
$ diff -u 1 2
--- 1 2013-04-22 18:49:46.000000000 +0400
+++ 2 2013-04-22 18:49:54.000000000 +0400
@@ -6,12 +6,12 @@
00000050 69 73 20 70 72 6f 67 72 61 6d 20 63 61 6e 6e 6f |is program canno|
00000060 74 20 62 65 20 72 75 6e 20 69 6e 20 44 4f 53 20 |t be run in DOS |
00000070 6d 6f 64 65 2e 0d 0d 0a 24 00 00 00 00 00 00 00 |mode....$.......|
-00000080 50 45 00 00 4c 01 06 00 25 b6 0b 51 00 00 00 00 |PE..L...%..Q....|
+00000080 50 45 00 00 4c 01 06 00 30 b6 0b 51 00 00 00 00 |PE..L...0..Q....|
00000090 00 00 00 00 e0 00 2f 03 0b 01 02 38 00 1e 11 00 |....../....8....|
000000a0 00 e8 14 00 00 02 02 00 80 12 00 00 00 10 00 00 |................|
000000b0 00 30 11 00 00 00 40 00 00 10 00 00 00 02 00 00 |.0....@.........|
000000c0 04 00 00 00 01 00 00 00 04 00 00 00 00 00 00 00 |................|
-000000d0 00 30 17 00 00 04 00 00 64 92 15 00 03 00 00 00 |.0......d.......|
+000000d0 00 30 17 00 00 04 00 00 6f 92 15 00 03 00 00 00 |.0......o.......|
000000e0 00 00 20 00 00 10 00 00 00 00 10 00 00 10 00 00 |.. .............|
000000f0 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 |................|
00000100 00 00 17 00 fc 17 00 00 00 20 17 00 a4 02 00 00 |......... ......|
The first difference, at position 0x88, is the TimeDateStamp of struct
_IMAGE_FILE_HEADER [2], i.e. the files were produced at different times.
The second difference, at 0xd8, falls on the CheckSum field of the PE
optional header, so it follows from the first (both bytes grow by 0x0b).
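To check the timestamp without reading a hexdump, something like the following could be used. It is a minimal sketch: pe_field and pe_timestamp are names invented here, and it relies on the standard PE layout, where the 4-byte offset of the "PE\0\0" signature is stored at 0x3c and TimeDateStamp sits 8 bytes after the start of that signature.

```sh
# Print the 32-bit little-endian value at byte OFFSET of FILE, in decimal.
# od -tx1 prints bytes in file order, so host endianness does not matter.
pe_field() {
    set -- $(od -An -v -tx1 -j "$2" -N 4 "$1")
    echo $((0x$4$3$2$1))
}

# Print TimeDateStamp (seconds since the Unix epoch) of a PE file.
# 60 = 0x3c holds the offset of the PE signature; the signature (4 bytes),
# Machine (2) and NumberOfSections (2) precede TimeDateStamp.
pe_timestamp() {
    pe_field "$1" $(( $(pe_field "$1" 60) + 8 ))
}
```

For the two files dumped above, with bytes 25 b6 0b 51 and 30 b6 0b 51 at 0x88, this should give 1359722021 and 1359722032, i.e. 11 seconds apart.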
That is strange, because all those "builtin" exes should be copies of,
if not links to, the main git.exe:
---- 8< ---- git cat-file blob 4msysgit/master:Makefile
...
BUILT_INS += git-init$X
BUILT_INS += git-status$X
...
$(BUILT_INS): git$X
$(QUIET_BUILT_IN)$(RM) $@ && \
ln git$X $@ 2>/dev/null || \
ln -s git$X $@ 2>/dev/null || \
cp git$X $@
and in fact, after installing Git-1.8.1.2-preview20130201.exe under Wine,
the files in "Program Files/Git/libexec/git-core/" are almost all the
same, except for a small number of separate-program helpers:
Program Files/Git/libexec/git-core$ md5sum *.exe |sort |tail -20
9e8db36fe71fc99edbdd429967df222d git-show-ref.exe
9e8db36fe71fc99edbdd429967df222d git-stage.exe
9e8db36fe71fc99edbdd429967df222d git-status.exe
9e8db36fe71fc99edbdd429967df222d git-stripspace.exe
9e8db36fe71fc99edbdd429967df222d git-symbolic-ref.exe
9e8db36fe71fc99edbdd429967df222d git-tag.exe
9e8db36fe71fc99edbdd429967df222d git-tar-tree.exe
9e8db36fe71fc99edbdd429967df222d git-unpack-file.exe
9e8db36fe71fc99edbdd429967df222d git-unpack-objects.exe
9e8db36fe71fc99edbdd429967df222d git-update-index.exe
9e8db36fe71fc99edbdd429967df222d git-update-ref.exe
9e8db36fe71fc99edbdd429967df222d git-update-server-info.exe
9e8db36fe71fc99edbdd429967df222d git-upload-archive.exe
9e8db36fe71fc99edbdd429967df222d git-var.exe
9e8db36fe71fc99edbdd429967df222d git-verify-pack.exe
9e8db36fe71fc99edbdd429967df222d git-verify-tag.exe
9e8db36fe71fc99edbdd429967df222d git-whatchanged.exe
9e8db36fe71fc99edbdd429967df222d git-write-tree.exe
c4c3b9e4c25be92f4b03263645bd4220 git-http-push.exe
f6ff08965609c8303002855bef1cf326 git-fast-import.exe
Program Files/Git/libexec/git-core$ md5sum *.exe |sort |awk '{print $1}' | uniq --count
1 01c8ab52a56f8e69cdd6a2f450d9c9cd
1 029d1a96aac352299a9e22e217e28d0a
1 0daffe44f7244b18283e9ff713b9f20f
1 0e7909aa44250f16dfebef1c4fbdfe31
2 16a0324ac55d288e8b93860c6adbe034
1 195749301ad47b35b938cce51e476557
1 1f37b05044ed832aad1acf778ac8e218
1 2ada68594c0e989f7f78746a472f8d7f
2 48b26163f4c09b8158cecb5dfb1cc1d0
1 4a025edcead3ecd769ce6d04b8cfc6fb
1 70a43b7fd81b59317a57d8fc93ae0d49
1 8720f9c263b4a73e951fc96611ee8afe
108 9e8db36fe71fc99edbdd429967df222d
1 c4c3b9e4c25be92f4b03263645bd4220
1 f6ff08965609c8303002855bef1cf326
and after hardlinkifying the installed non-portable msysgit, du goes down to 11M.
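The achievable size can also be estimated without modifying anything, by summing the size of one file per distinct checksum. A rough sketch, with a made-up function name (unique_bytes) and the same assumption of ordinary file names:

```sh
# Print the total size in bytes of DIR if every group of identical files
# were stored only once. Hypothetical helper; assumes ordinary file names.
unique_bytes() {
    find "$1" -type f -exec md5sum {} + | sort |
    while read -r sum path; do
        # Count a file only the first time its checksum is seen.
        [ "$sum" = "$prev" ] || wc -c < "$path"
        prev=$sum
    done | awk '{ total += $1 } END { print total + 0 }'
}
```

Comparing `unique_bytes libexec/git-core` for the portable and the installed trees should show the same gap as the du figures above.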
So I'd like to ask:
Why are the files in libexec/git-core/*.exe all different? How
were they built ...
... oh, I see, maybe it's that strip in share/WinGit/copy-files.sh:
strip bin/{[a-fh-z],g[a-oq-z]}*.exe libexec/git-core/*.exe &&
and while it processes the files, it injects the current time into them,
which is why they end up different...
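One way to confirm this is to list the offsets at which two of the builtins differ. If strip's timestamp is the only cause, only a few header offsets should show up. The helper below is a sketch with an invented name (diff_offsets), and it assumes both files have the same length:

```sh
# Print the 0-based hex offsets at which two equally long files differ.
# cmp -l numbers bytes from 1, hence the "- 1".
diff_offsets() {
    cmp -l "$1" "$2" | awk '{ printf "0x%x\n", $1 - 1 }'
}
```

For example, `diff_offsets git-init.exe git-status.exe` would be expected to print 0x88 and 0xd8, matching the hexdump diff above, if nothing else differs.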
Please find attached a patch to fix this problem. Tested only under Wine
and only for generating the portable version.
Thanks,
Kirill
[1] http://jak-linux.org/projects/hardlink/
[2] http://www.csn.ul.ie/~caolan/publink/winresdump/winresdump/doc/pefile2.html
---- 8< ----
From 8e43e5931d16724702db47d4aa8903325b77ece9 Mon Sep 17 00:00:00 2001
From: Kirill Smelkov <ki...@mns.spb.ru>
Date: Mon, 22 Apr 2013 21:28:29 +0400
Subject: [PATCH] copy-files.sh: Preserve bit-exactness of git-*.exe for DONT_REMOVE_BUILTINS
Currently, if client (i.e. share/WinGit/portable-release.sh) asks
copy-files.sh not to remove git builtins, we just keep them in bin/ and
libexec/git-core/ . The problem is, later, when those executables are
stripped, though initially they were identical, they all become
different, because:
For PE, strip (and everything else from binutils) puts the current time
into the TimeDateStamp field of the PE header of the produced object [1]:
---- 8< ---- bfd/peXXigen.c
H_PUT_32 (abfd, time (0), filehdr_out->f_timdat);
and that hurts, if, for example, later a user wants to hardlink'ify
portable git install.
Though strip has --preserve-dates, as said above, it does not affect the
_contents_ of the generated object file - only mtime/ctime are
preserved.
Let's preserve builtins ourselves - we already keep a list of them in
etc/fileList-builtins.txt for the msysgit installer (it relinks them at
install time), and for the portable version, let's first remove all
builtins except git.exe, then strip, and then restore the builtins
bit-for-bit equal to git.exe . This way, later hardlinkifying will work.
[1] http://www.csn.ul.ie/~caolan/publink/winresdump/winresdump/doc/pefile2.html
Signed-off-by: Kirill Smelkov <ki...@mns.spb.ru>
---
share/WinGit/copy-files.sh | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/share/WinGit/copy-files.sh b/share/WinGit/copy-files.sh
index 6bc9427..b94f460 100755
--- a/share/WinGit/copy-files.sh
+++ b/share/WinGit/copy-files.sh
@@ -69,12 +69,9 @@ rm -rf bin/cvs.exe &&
test -f lib/perl5/site_perl/Git.pm &&
gitmd5=$(md5sum bin/git.exe | cut -c 1-32) &&
mkdir etc &&
-if test -z "$DONT_REMOVE_BUILTINS"
-then
- md5sum {bin,libexec/git-core}/git-*.exe libexec/git-core/git.exe |
- sed -n -r "s/^$gitmd5\s+\*?(.*)/\1/p" > etc/fileList-builtins.txt &&
- rm $(cat etc/fileList-builtins.txt)
-fi &&
+md5sum {bin,libexec/git-core}/git-*.exe libexec/git-core/git.exe |
+sed -n -r "s/^$gitmd5\s+\*?(.*)/\1/p" > etc/fileList-builtins.txt &&
+rm $(cat etc/fileList-builtins.txt) && # rm builtins - if needed we'll restore them after strip
(cd $MSYSGITROOT/mingw && tar cf - \
bin/*{tcl,tk,wish,gpg,msmtp,curl.exe,*.crt}* bin/connect.exe \
bin/*{libcurl,libcrypto,libssl,libgsasl,libiconv}* \
@@ -85,6 +82,16 @@ fi &&
tar xf - &&
cp $MSYSGITROOT/mingw/bin/hd2u.exe bin/dos2unix.exe &&
strip bin/{[a-fh-z],g[a-oq-z]}*.exe libexec/git-core/*.exe &&
+if test -n "$DONT_REMOVE_BUILTINS"
+then
+ # restore builtins after git.exe was stripped
+ # (for PE, strip embeds current time into file header, and if we just
+ # pass all git builtins to strip the result will be lots of
+ # not-bit-exact exe's)
+ for b in $(cat etc/fileList-builtins.txt); do
+ ln bin/git.exe $b
+ done
+fi &&
cp $MSYSGITROOT/git/contrib/completion/git-completion.bash etc/ &&
cp $MSYSGITROOT/git/contrib/completion/git-prompt.sh etc/ &&
cp $MSYSGITROOT/etc/termcap etc/ &&
--
1.8.2.1.744.gffa8c7b