Proposal for shell-patch-format [was: Re: more git updates..]
Newsgroups: linux.kernel
From: Rutger Nijlunsing <rut...@nospam.com>
Date: Sun, 10 Apr 2005 13:30:13 +0200
Local: Sun, Apr 10 2005 7:30 am
Subject: Proposal for shell-patch-format [was: Re: more git updates..]

On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote:
> Listing the file paths and their sigs included in a tree to make
> a snapshot of a tree state sounds fine, and diffing two trees by
> looking at the sigs between two such files sounds fine as well.

> But I am wondering what your plans are to handle renames---or
> does git already represent them?

git doesn't represent transitions (or deltas), but only state. So it's
not (much) more than a .tar file from a version-management perspective;
the only difference being that a git-tree has a comment field and a
predecessor-reference, which are currently not used in determining the
'patch' between two trees.

Deltas are derived by comparing different versions and determining
the difference by reverse-engineering the differences which got us
from version A to version B.

Deltas are currently described as patch(1)es. Patches don't have the
concept of 'renaming', so even after determining that file X has been
renamed to Y, we have no container for this fact. A patch(1) only
contains local-file-edits: substitute lines by other lines.

Deltas are not needed to follow a tree; deltas are useful for merging
branches of versions, and for reviewing purposes. This is comparable
to using tar for version management: it is very common to tar up the
current version of your project every week as a poor man's version
management for a one-person, one-project setup.

So what is needed is a way to represent deltas which can contain more
than only traditional patches. I would propose a simple format:
the shell-script in a fixed-format.

Shell-patch format in EBNF:
  <shellpatch> ::= ( <comment>? <command>* )*
  <comment> ::= <commentline>+
    The comment contains the text describing the function of the
    patch following it.
  <commentline> ::= "# " <text>
  <command> ::=
    "mv " <pathname> " " <pathname> "\n" |
    "cp " <filename> " " <filename> "\n" |
    "chmod " <mode> " " <pathname> "\n" |
    "patch <<__UNIQUE_STRING__\n" <patch> "__UNIQUE_STRING__\n"
      (where UNIQUE_STRING must not be contained in patch)
  <filename> ::= <pathname>
    (but pointing to a file)
  <pathname> ::= a pathname relative to '.';
    escaping special characters the shell-way;
    may not contain '..'.

Example:
  # Rename file b to a1, and change a line.
  mv b a1
  patch <<__END__
  *** a1  Sun Apr 10 11:43:37 2005
  --- a2  Sun Apr 10 11:43:41 2005
  ***************
  *** 1,4 ****
    1
    2
  ! from
    3
  --- 1,4 ----
    1
    2
  ! to
    3
  __END__

Advantages:
  - ASCII!
  - a shell-patch is executable without extra tooling
  - a shell-patch is readable and therefore reviewable
  - a shell-patch is forward-compatible: a shell-patch acts
    like a patch (since patch(1) ignores garbage around patch :),
    but not backwards-compatible.
  - extensible
  - the heavy-lifting is done by 'patch'
Disadvantages:
  - no deltas for binary files

Open issues:
  - <comment> could be made more structured; maybe containing fields
    like Subject:, Author:, Signed-By:, certificates, ...
    (BitKeeper seems to be using "# " <field> ":" <value> "\n" lines)
  - patch(1) doesn't know about directories. Should shell-patch
    know about directories? This implies commands working on directories
    too (like directory renaming, mode changing, ...). Otherwise
    directories are implicit (a file in a directory implies the existence
    of that directory). It also implies mkdir and rmdir as shell-patch
    commands.
  - extra commands might be useful to conserve more state(changes):
    ln -s  -- for symbolic links;
    ln     -- for hard links;
    chown  -- for ownership;
    chattr -- for storing extended attributes
    touch  -- for setting timestamps (probably creation time only,
              since mtime is something git relies on)
    ...and for the really adventurous:
    sed 's,<fromstring>,<tostring>,' -- for substitutions
      (this is something darcs supports, but which I think is too
       bothersome to use since it is difficult to reverse-engineer
       from two random trees)
Why a fixed format at all?
  - This way, the executable shell-patch can be proven to be
    harmless to the machine: 'rm -rf /' is a valid shell-script,
    but not a valid shell-patch (since 'rm' is not a valid command,
    random flags like '-rf' are not supported, and '/' is an absolute
    pathname).
  - A fixed format enables tooling to support such a patch format;
    for example creating the reverse-patch, merging patches (yeah,
    'cat' also merges patches...).
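
To illustrate the "provably harmless" point, here is a minimal sketch of a
checker for the grammar above. It is not part of the proposal: the names
(validate_shell_patch, is_safe_path, check_command) are made up for this
example, <mode> is assumed to be an octal mode, and paths are restricted
to a conservative character set instead of supporting shell-style escaping.

```python
import re

# Assumption: only plain characters in paths; shell escaping is not handled.
SAFE_PATH = re.compile(r"^[A-Za-z0-9._+][A-Za-z0-9._+/-]*$")
MODE = re.compile(r"^[0-7]{3,4}$")  # assumption: octal modes only
HEREDOC = re.compile(r"^patch <<([A-Za-z0-9_]+)$")


def is_safe_path(path):
    """True for a relative path with no '..' component and no odd characters."""
    if not SAFE_PATH.match(path):
        return False
    return ".." not in path.split("/")


def check_command(line):
    """Return None if the command line is acceptable, else an error message."""
    words = line.split(" ")
    cmd, args = words[0], words[1:]
    if cmd in ("mv", "cp"):
        ok = len(args) == 2 and all(is_safe_path(a) for a in args)
    elif cmd == "chmod":
        ok = (len(args) == 2 and bool(MODE.match(args[0]))
              and is_safe_path(args[1]))
    else:
        return "command not allowed: %r" % cmd
    return None if ok else "bad arguments: %r" % line


def validate_shell_patch(text):
    """Return a list of (line_number, message); an empty list means valid."""
    problems = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line, lineno = lines[i], i + 1
        i += 1
        if line == "" or line.startswith("# "):
            continue  # blank line or comment
        heredoc = HEREDOC.match(line)
        if heredoc:
            # Skip the embedded patch up to its terminator line.
            end = heredoc.group(1)
            while i < len(lines) and lines[i] != end:
                i += 1
            if i == len(lines):
                problems.append((lineno, "unterminated patch"))
            i += 1
            continue
        error = check_command(line)
        if error:
            problems.append((lineno, error))
    return problems


print(validate_shell_patch("rm -rf /"))
print(validate_shell_patch("mv b a1\nchmod 755 a1"))
```

One caveat: with an unquoted here-document delimiter the shell still
performs expansions inside the patch body, so a real checker would also
have to inspect that body or require a quoted delimiter.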

...what has this to do with git?  Not much and everything, depending
on how you look at it. 'git' is 'tar', and 'shell-patch' is 'patch';
both orthogonal concepts but very usable in combination. We'll look at
getting from two git trees to a shell-patch.

Diffing the trees would not only look at each file and its hash, but
also the other way around: which hash values are used more than once.
For files with the same hash value, compare the contents (and the rest
of the attributes); this is needed since the mapping from file contents
to sha1 is many-to-one, so equal hashes do not strictly prove equal
contents. When the contents are the same, the shell-patch command to
generate is obviously a 'cp'.

For example, we have got two trees in git (pathname -> hash value):
  tree1/file1 -> 1234
  tree1/file2 -> 4567
and
  tree2/file1 -> 3456
  tree2/file3 -> 4567
  tree2/file4 -> 4567

...this could generate the shell-patch (pathnames relative to the tree root):

  # Comments-go-here
  mv file2 file3
  cp file3 file4
  patch file1 <<__FILE_PATCH__
  (patch-goes-here)
  __FILE_PATCH__

...by an algorithm which starts by determining all renames, then all
copies, and finally all patches.
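
The renames, then copies, then patches ordering could be sketched roughly
as follows. This is only an illustration under stated assumptions:
derive_shell_patch is a hypothetical name, trees are plain dicts mapping
pathname to hash, equal hashes are taken to mean equal contents (the
content comparison mentioned above is skipped), and only brand-new
pathnames are considered as rename or copy destinations. Files that
vanish without being renamed are returned separately, since the grammar
has no command for deletion.

```python
def derive_shell_patch(old, new):
    """Return (commands, removed) for two {pathname: hash} trees."""
    by_hash = {}
    for path in sorted(old):
        by_hash.setdefault(old[path], []).append(path)

    # Old paths that no longer exist: candidates for rename sources.
    vanished = {p for p in old if p not in new}

    # Paths that still hold a given content while renames and copies run
    # (patches come last, so changed files still have their old content).
    holder = {}
    for path in sorted(old):
        if path in new:
            holder.setdefault(old[path], path)

    renames, copies, patches = [], [], []
    pending = []
    for dst in sorted(p for p in new if p not in old):
        sources = [p for p in by_hash.get(new[dst], []) if p in vanished]
        if sources:
            vanished.remove(sources[0])
            renames.append(("mv", sources[0], dst))
            holder.setdefault(new[dst], dst)
        else:
            pending.append(dst)

    for dst in pending:
        if new[dst] in holder:
            copies.append(("cp", holder[new[dst]], dst))
        else:
            patches.append(("patch", dst))  # content not seen before

    for path in sorted(new):
        if path in old and old[path] != new[path]:
            patches.append(("patch", path))

    return renames + copies + patches, sorted(vanished)


tree1 = {"file1": "1234", "file2": "4567"}
tree2 = {"file1": "3456", "file3": "4567", "file4": "4567"}
commands, removed = derive_shell_patch(tree1, tree2)
for command in commands:
    print(" ".join(command))
```

For the two example trees this yields a rename of file2 to file3, a copy
of file3 to file4, and a patch for file1.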

Comments?

--
Rutger Nijlunsing ---------------------- linux-kernel at tux.tmfweb.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Newsgroups: linux.kernel
From: tony.l...@intel.com
Date: Sun, 10 Apr 2005 13:50:07 +0200
Local: Sun, Apr 10 2005 7:50 am
Subject: Re: more git updates..

>handle by pure rename only plus the extra delta. The current git don't
>have per file change history. From git's point of view some file deleted
>and the other file appeared with same content.

>It is the top level SCM to handle that correctly.
>Rename a directory will be even more fun.

But from a git perspective it will be very efficient.  Imagine that
Linus decides to rename arch/i386 as arch/x86 ... at the git repository
level this just requires a changeset, a new top level tree, and a new
tree for the arch directory showing that i386 changed to x86.  That's
all ... every file below that is unchanged, so the blobs for the files
are all the same.
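
A toy model makes the object count concrete. The sketch below is not
git's actual object format: put and write_tree are hypothetical helpers
that store blobs and one-directory trees in a dict keyed by SHA-1, and
the changeset object is left out. Renaming a directory then adds only
the trees on the path from the root to the renamed entry.

```python
import hashlib


def put(store, kind, payload):
    """Store payload under the SHA-1 of its content and return the key."""
    data = kind.encode() + b"\0" + payload
    key = hashlib.sha1(data).hexdigest()
    store[key] = data
    return key


def write_tree(store, node):
    """Recursively store a nested dict (directory) or bytes (file)."""
    if isinstance(node, bytes):
        return put(store, "blob", node)
    entries = ["%s %s" % (write_tree(store, node[name]), name)
               for name in sorted(node)]
    return put(store, "tree", "\n".join(entries).encode())


store = {}
before = {
    "README": b"Linux\n",
    "arch": {
        "i386": {"kernel": {"setup.c": b"int setup;\n"}},
        "ia64": {"Makefile": b"all:\n"},
    },
}
write_tree(store, before)
count_before = len(store)

# Same content, but arch/i386 is now called arch/x86.
after = {
    "README": before["README"],
    "arch": {"x86": before["arch"]["i386"], "ia64": before["arch"]["ia64"]},
}
write_tree(store, after)
new_objects = len(store) - count_before
print(new_objects)  # 2: a new top-level tree and a new arch tree
```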

-Tony


Newsgroups: linux.kernel
From: Ralph Corderoy <ra...@inputplus.co.uk>
Date: Sun, 10 Apr 2005 14:00:17 +0200
Local: Sun, Apr 10 2005 8:00 am
Subject: Re: more git updates..

Hi,

Christopher Li wrote:
> On Sat, Apr 09, 2005 at 04:31:10PM -0700, Linus Torvalds wrote:
> > NOTE! This means that each "tree" file basically tracks just a
> > single directory. The old style of "every file in one tree file"
> > still works, but fsck-cache will warn about it. Happily, the git
> > archive itself doesn't have any subdirectories, so git itself is not
> > impacted by it.

> That is really cool stuff. My way to read it, correct me if I am
> wrong, git is a user space version file system. "tree" <--> directory
> and "blob" <--> file.  "commit" to describe the version history.

See the Venti filesystem in Bell Labs's Plan 9 OS.  It too uses SHA-1.

    http://www.cs.bell-labs.com/sys/doc/venti/venti.pdf

    Abstract

    This paper describes a network storage system, called Venti,
    intended for archival data. In this system, a unique hash of a
    block's contents acts as the block identifier for read and write
    operations. This approach enforces a write-once policy, preventing
    accidental or malicious destruction of data. In addition, duplicate
    copies of a block can be coalesced, reducing the consumption of
    storage and simplifying the implementation of clients. Venti is a
    building block for constructing a variety of storage applications
    such as logical backup, physical backup, and snapshot file systems.

    We have built a prototype of the system and present some preliminary
    performance results. The system uses magnetic disks as the storage
    technology, resulting in an access time for archival data that is
    comparable to non-archival data. The feasibility of the write-once
    model for storage is demonstrated using data from over a decade's
    use of two Plan 9 file systems.
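
The core idea in that abstract (the hash of a block is its address,
writes never overwrite, duplicates coalesce) can be sketched in a few
lines. This is an in-memory illustration only; BlockStore and its methods
are hypothetical names, not Venti's interface.

```python
import hashlib


class BlockStore:
    """Write-once store in which the SHA-1 of a block is its identifier."""

    def __init__(self):
        self._blocks = {}

    def write(self, data):
        key = hashlib.sha1(data).hexdigest()
        # An existing block is never replaced; duplicates cost nothing extra.
        self._blocks.setdefault(key, data)
        return key

    def read(self, key):
        data = self._blocks[key]
        # The identifier doubles as an integrity check.
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError("block does not match its identifier")
        return data

    def __len__(self):
        return len(self._blocks)


store = BlockStore()
first = store.write(b"same block")
second = store.write(b"same block")
print(first == second, len(store))
```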

Cheers,

Ralph.



Newsgroups: linux.kernel
From: tony.l...@intel.com
Date: Sun, 10 Apr 2005 14:10:05 +0200
Local: Sun, Apr 10 2005 8:10 am
Subject: Re: more git updates..

>In other words, each "commit" file is very small and cheap, but since
>almost every commit will also imply a totally new tree-file, "git" is
>going to have an overhead of half a megabyte per commit. Oops.

>Damn, that's painful. I suspect I will have to change the format somehow.

Having dodged that bullet with the change to make tree files point at
other tree files ... here's another (potential) issue.

A changeset that touches just one file a few levels down from the top
of the tree (say arch/i386/kernel/setup.c) will make six new files in
the git repository (one for the changeset, four tree files, and a new
blob for the new version of the file). More complex changes make more
files ... but say the average is ten new files per changeset since most
changes touch few files.  With 60,000 changesets in the current tree, we
will start out our git repository with about 600,000 files.  Assuming
the first byte of the SHA1 hash is random, that means an average of 2343
files in each of the objects/xx directories.  Give it a few more years at
the current pace, and we'll have over 10,000 files per directory.  This
sounds like a lot to me ... but perhaps filesystems now handle large
directories enough better than they used to for this to not be a problem?

Or maybe the files should be named objects/xx/yy/zzzzzzzzzzzzzzzz?
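
The estimate above works out as follows; the helper name
files_per_directory is made up for this sketch, and hashes are assumed to
be uniformly distributed.

```python
def files_per_directory(total_files, hex_digits):
    """Average files per leaf directory when the first `hex_digits` hex
    characters of the hash select the directory."""
    return total_files / 16 ** hex_digits


changesets = 60_000
files_per_changeset = 10
total = changesets * files_per_changeset

print(total)                          # 600000
print(files_per_directory(total, 2))  # 2343.75 for objects/xx
print(files_per_directory(total, 4))  # about 9.16 for objects/xx/yy
```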

-Tony


Newsgroups: linux.kernel
From: Linus Torvalds <torva...@osdl.org>
Date: Sun, 10 Apr 2005 17:50:11 +0200
Local: Sun, Apr 10 2005 11:50 am
Subject: Re: more git updates..

On Sun, 10 Apr 2005, Junio C Hamano wrote:

> But I am wondering what your plans are to handle renames---or
> does git already represent them?

You can represent renames on top of git - git itself really doesn't care.  
In many ways you can just see git as a filesystem - it's content-
addressable, and it has a notion of versioning, but I really really
designed it coming at the problem from the viewpoint of a _filesystem_
person (hey, kernels is what I do), and I actually have absolutely _zero_
interest in creating a traditional SCM system.

So to take renaming a file as an example - why do you actually want to
track renames? In traditional SCM's, you do it for two reasons:

 - space efficiency. Most SCM's are based on describing changes to a file,
   and compress the data by doing revisions on the same file. In order to
   continue that process past a rename, such an SCM _has_ to track
   renames, or lose the delta-based approach.

   The most trivial example of this is "diff", ie a rename ends up
   generating a _huge_ diff unless you track the rename explicitly.

   GIT doesn't care. There is _zero_ space efficiency in trying to track
   renames. In fact, it would add overhead to the system, not lessen it.
   That's because GIT fundamentally doesn't do the "delta-within-a-file"  
   model.

 - annotate/blame. This is a valid concern, but the fact is, I never use
   it. It may be a deficiency of mine, but I simply don't do the per-line
   thing when I debug or try to find who was responsible. I do "blame" on
   a much bigger-picture level, and I personally believe (pretty strongly)
   that per-line annotations are not actually a good thing - they come not
   because people _want_ to do things at that low level, but because
   historically, you didn't _have_ the bigger-picture thing.

   In other words, pretty much every SCM out there is based on SCCS
   "mentally", even if not in any other model. That's why people think
   per-line blame is important - you have that mental model.

So consider me deficient, or consider me radical. It boils down to the
same thing. Renames don't matter.

That said, if somebody wants to create a _real_ SCM (rather than my notion
of a pure content tracker) on top of GIT, you probably could fairly easily
do so by imposing a few limitations on a higher level. For example, most
SCM's that track renames require that the user _tell_ them about the
renames: you do a "bk mv" or a "svn rename" or something.

If you want to do the same on top of GIT, then you should think of GIT as
what it is: GIT just tracks contents. It's a filesystem - although a
fairly strange one. How would you track renames on top of that? Easy: add
your own fields to the GIT revision messages: GIT enforces the header, but
you can add anything you want to the "free-form" part that follows it.
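
As an illustration of what such a higher layer might do, the sketch below
reads rename records out of the free-form part of a message. The
"Rename: old -> new" line format and the function name parse_renames are
invented for this example; nothing in git defines them.

```python
def parse_renames(message):
    """Collect (old, new) pairs from hypothetical 'Rename: old -> new' lines."""
    prefix = "Rename: "
    renames = []
    for line in message.splitlines():
        if line.startswith(prefix) and " -> " in line:
            old, new = line[len(prefix):].split(" -> ", 1)
            renames.append((old.strip(), new.strip()))
    return renames


message = (
    "Move the i386 setup code\n"
    "\n"
    "Rename: arch/i386/kernel/setup.c -> arch/x86/kernel/setup.c\n"
)
print(parse_renames(message))
```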

Same goes for any other information where you care about what happens
"within" a file. GIT simply doesn't track it. You can build things on top
of GIT if you want to, though. They may not be as efficient as they would
be if they were built _into_ GIT, but on the other hand GIT does a lot of
other things a hell of a lot faster thanks to its design.

So whether you agree with the things that _I_ consider important probably
depends on how you work. The real downside of GIT may be that _my_ way of
doing things is quite possibly very rare.

But it clearly is the only right way. The fact that everybody else does it
some other way only means that they are wrong.

                Linus


Newsgroups: linux.kernel
From: Linus Torvalds <torva...@osdl.org>
Date: Sun, 10 Apr 2005 18:10:10 +0200
Local: Sun, Apr 10 2005 12:10 pm
Subject: Re: more git updates..

On Sat, 9 Apr 2005 tony.l...@intel.com wrote:

> With 60,000 changesets in the current tree, we will start out our git
> repository with about 600,000 files.  Assuming the first byte of the
> SHA1 hash is random, that means an average of 2343 files in each of the
> objects/xx directories.  Give it a few more years at the current pace,
> and we'll have over 10,000 files per directory.  This sounds like a lot
> to me ... but perhaps filesystems now handle large directories enough
> better than they used to for this to not be a problem?

The good news is that git itself doesn't really care. I think it's
literally _one_ function ("get_sha1_filename()") that you need to change,
and then you need to write a small script that moves files around, and
you're pretty much done.

Also, I did actually debate that issue with myself, and decided that even
if we do have tons of files per directory, git doesn't much care. The
reason? Git never _searches_ for them. Assuming you have enough memory to
cache the tree, you just end up doing a "lookup", and inside the kernel
that's done using an efficient hash, which doesn't actually care _at_all_
about how many files there are per directory.

So I was for a while debating having a totally flat directory space, but
since there are _some_ downsides (linear lookup for cold-cache, and just
that "ls -l" ends up being O(n**2) and things), I decided that a single
fan-out is probably a good idea.

> Or maybe the files should be named objects/xx/yy/zzzzzzzzzzzzzzzz?

Hey, I may end up being wrong, and yes, maybe I should have done a
two-level one. The good news is that we can trivially fix it later (even
dynamically - we can make the "sha1 object tree layout" be a per-tree
config option, and there would be no real issue, so you could make small
projects use a flat version and big projects use a very deep structure
etc). You'd just have to script some renames to move the files around..
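
A rough sketch of a configurable layout and of the renames needed to
switch between layouts. This is a Python analogue for illustration, not
the C function get_sha1_filename(); object_path, relayout_moves and the
levels parameter are assumed names.

```python
import hashlib


def object_path(sha1_hex, levels=1, base="objects"):
    """Map a 40-character hex SHA-1 to a path with `levels` two-character
    fan-out directories (0 = flat, 1 = objects/xx/..., 2 = objects/xx/yy/...)."""
    if len(sha1_hex) != 40:
        raise ValueError("expected a 40-character hex SHA-1")
    dirs = [sha1_hex[2 * i:2 * i + 2] for i in range(levels)]
    return "/".join([base] + dirs + [sha1_hex[2 * levels:]])


def relayout_moves(hashes, old_levels, new_levels):
    """List the (old_path, new_path) renames needed to change the layout."""
    return [(object_path(h, old_levels), object_path(h, new_levels))
            for h in hashes]


digest = hashlib.sha1(b"example content").hexdigest()
for levels in (0, 1, 2):
    print(object_path(digest, levels))
```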

                Linus


Newsgroups: linux.kernel
From: Petr Baudis <pa...@ucw.cz>
Date: Sun, 10 Apr 2005 18:30:12 +0200
Local: Sun, Apr 10 2005 12:30 pm
Subject: [ANNOUNCE] git-pasky-0.1
  Hello,

  so I "released" git-pasky-0.1, my set of patches and scripts upon
Linus' git, aimed at human usability and, to an extent, SCM-like usage.

  You can get it at

        http://pasky.or.cz/~pasky/dev/git/git-pasky-base.tar.bz2

and after unpacking and building (make) do

        git pull pasky

to get the latest changes from my branch. If you already have some git
from my branch which can do pulling, you can bring yourself up to date
by doing just

        gitpull.sh pasky

(but this style of usage is deprecated now). Please see the README for
some details regarding usage etc. You can find the changes from the last
announcement in the ChangeLog (the previous announcement corresponds to
commit id 5125d089ad862f16a306b4942155092e1dce1c2d). The most important
change is probably recursive diff addition, and making git ignore the
nsec of ctime and mtime, since it is totally unreliable and likes to
taint random files as modified.

  My near future plans include especially some merge support; I think it
should be rather easy, actually. I'll also add some simple tagging
mechanism. I've decided to postpone the file moving detection, since
there's no big demand for it now. ;-)

  I will also need to do more testing on the linux kernel tree.
Committing patch-2.6.7 on 2.6.6 kernel and then diffing results in

        $ time gitdiff.sh `parent-id` `tree-id` >p
        real    5m37.434s
        user    1m27.113s
        sys     2m41.036s

which is pretty horrible, it seems to me. Any benchmarking help is of
course welcomed, as well as any other feedback.

  BTW, what would be the best (most complete) source for the BK tree
metadata? Should I dig it from the BKCVS gateway, or is there a better
source? Where did you get the sparse git database from, Linus? (BTW, it
would be nice to get sparse.git with the directories as separate.)

  Have fun,

--
                                Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.


Newsgroups: linux.kernel
From: Linus Torvalds <torva...@osdl.org>
Date: Sun, 10 Apr 2005 19:00:13 +0200
Local: Sun, Apr 10 2005 1:00 pm
Subject: Re: [ANNOUNCE] git-pasky-0.1

On Sun, 10 Apr 2005, Petr Baudis wrote:

> Where did you get the sparse git database from, Linus? (BTW, it
> would be nice to get sparse.git with the directories as separate.)

When we were trying to figure out how to avert the BK disaster, one of
Tridge's concerns (and, in my opinion, the only really valid one) was that
you couldn't get the BK data in some SCM-independent way.

So I wrote some very preliminary scripts (on top of BK itself) to extract
the data, to show that BK could generate a SCM-neutral file format (a very
stupid one and horribly useless for anything but interoperability, but
still...). I was hoping that that would convince Tridge that trying to
muck around with the internal BK file format was not worth it, and avert
the BK trainwreck.

Larry was ok with the idea to make my export format actually be natively
supported by BK (ie the same way you have "bk export -tpatch"), but Tridge
wanted to instead get at the native data and be difficult about it. As a
result, not only can I no longer use BK, but we also don't have a nice
export format from BK.

Yeah, I'm a bit bitter about it.

Anyway, the sparse data came out of my hack. It's very inefficient, and I
estimated that doing the same for the kernel would have taken ten solid
days of conversion, mainly because my hack was really just that: a quick
hack to show that BK could do it. Larry could have done it a lot better.

I'll re-generate the sparse git-database at some point (and I'll probably
do so from the old GIT database itself, rather than re-generating it from
my old BK data).

                Linus


Newsgroups: linux.kernel
From: Rutger Nijlunsing <rut...@nospam.com>
Date: Sun, 10 Apr 2005 19:10:12 +0200
Local: Sun, Apr 10 2005 1:10 pm
Subject: Re: more git updates..

[snip]

- merging.
  When the parent tree renames a file, it's easier for an out-of-tree
  patch to get up-to-date.

- reviewing.
  A huge patch with 2000 added lines and 1990 removed lines is more
  difficult to review than a rename + 10 lines patch.

> So consider me deficient, or consider me radical. It boils down to the
> same thing. Renames don't matter.

When you've got no out-of-tree patches since you've got the
parent-of-all-trees, then they don't matter, that's true :)

> So whether you agree with the things that _I_ consider important probably
> depends on how you work. The real downside of GIT may be that _my_ way of
> doing things is quite possibly very rare.

--
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------

Newsgroups: linux.kernel
From: Ingo Molnar <mi...@elte.hu>
Date: Sun, 10 Apr 2005 19:40:07 +0200
Local: Sun, Apr 10 2005 1:40 pm
Subject: Re: [ANNOUNCE] git-pasky-0.1

* Petr Baudis <pa...@ucw.cz> wrote:

>   I will also need to do more testing on the linux kernel tree.
> Committing patch-2.6.7 on 2.6.6 kernel and then diffing results in

>    $ time gitdiff.sh `parent-id` `tree-id` >p
>    real    5m37.434s
>    user    1m27.113s
>    sys     2m41.036s

> which is pretty horrible, it seems to me. Any benchmarking help is of
> course welcomed, as well as any other feedback.

it seems from the numbers that your system doesn't have enough RAM for
this and is getting IO-bound?
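
The arithmetic behind that reading of the quoted numbers, as a small
sketch (the variable names are mine, and it assumes the work was
essentially one process at a time, so that wall-clock time not spent on
the CPU was spent waiting):

```python
def seconds(minutes, secs):
    """Convert the 'XmY.YYYs' form printed by time into seconds."""
    return minutes * 60 + secs


real = seconds(5, 37.434)
user = seconds(1, 27.113)
system = seconds(2, 41.036)

cpu = user + system   # time actually spent computing
waiting = real - cpu  # wall-clock time not accounted for by the CPU

print(round(cpu, 3), round(waiting, 3), round(cpu / real, 2))
```

Roughly a quarter of the wall-clock time is unaccounted for by CPU use,
and system time is nearly twice user time.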

        Ingo


Newsgroups: linux.kernel
From: Rik van Riel <r...@redhat.com>
Date: Sun, 10 Apr 2005 19:40:14 +0200
Local: Sun, Apr 10 2005 1:40 pm
Subject: Re: more git updates..

On Sat, 9 Apr 2005, Linus Torvalds wrote:
> I've rsync'ed the new git repository to kernel.org, it should all be there
> in /pub/linux/kernel/people/torvalds/git.git/ (and it looks like the
> mirror scripts already picked it up on the public side too).

GCC 4 isn't very happy.  Mostly sign changes, but also something
that looks like a real error:

gcc -g -O3 -Wall   -c -o fsck-cache.o fsck-cache.c
fsck-cache.c: In function 'main':
fsck-cache.c:59: warning: control may reach end of non-void function 'fsck_tree' being inlined
fsck-cache.c:62: warning: control may reach end of non-void function 'fsck_commit' being inlined

I assume that fsck_tree and fsck_commit should complain loudly
if they ever get to that point - but since I'm not quite sure,
there's no patch, sorry.

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


Newsgroups: linux.kernel
From: Paul Jackson <p...@engr.sgi.com>
Date: Sun, 10 Apr 2005 19:40:12 +0200
Local: Sun, Apr 10 2005 1:40 pm
Subject: Re: more git updates..

Ralph wrote:
> but good enough for
> most uses that people will get caught out when it fails.

Exactly.

If Linus persists in this diff-tree output format, using two lines for
changed files, then I will have to add the following sed script to my
arsenal:

  sed '/^</ { N; s/\n>/ / }'

It collapses pairs of lines:

<100664 4870bcf91f8666fc788b07578fb7473eda795587 Makefile
>100664 5493a649bb33b9264e8ed26cc1f832989a307d3b Makefile

to the single line:

<100664 4870bcf91f8666fc788b07578fb7473eda795587 Makefile 100664 5493a649bb33b9264e8ed26cc1f832989a307d3b Makefile
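
For those who don't know sed, roughly the same pairing can be sketched in
Python. The function name collapse_pairs is made up here, and the
behaviour differs slightly from the sed script in corner cases such as
two consecutive '<' lines.

```python
def collapse_pairs(lines):
    """Join each '<' line with an immediately following '>' line."""
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if (line.startswith("<") and i + 1 < len(lines)
                and lines[i + 1].startswith(">")):
            # Drop the leading '>' of the second line and join with a space.
            out.append(line + " " + lines[i + 1][1:])
            i += 2
        else:
            out.append(line)
            i += 1
    return out


diff_tree_output = [
    "<100664 4870bcf91f8666fc788b07578fb7473eda795587 Makefile",
    ">100664 5493a649bb33b9264e8ed26cc1f832989a307d3b Makefile",
]
for line in collapse_pairs(diff_tree_output):
    print(line)
```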

However, more people will get bit by this git glitch than know sed.

--
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <p...@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


Discussion subject changed to "git-pasky-0.1" by Ingo Molnar
Ingo Molnar  
Newsgroups: linux.kernel
From: Ingo Molnar <mi...@elte.hu>
Date: Sun, 10 Apr 2005 19:50:07 +0200
Local: Sun, Apr 10 2005 1:50 pm
Subject: Re: [ANNOUNCE] git-pasky-0.1

* Willy Tarreau <wi...@w.ods.org> wrote:

probably not the only problem - but if we are lucky then his system was
just thrashing within the kernel repository and then most of the overhead
is the _unnecessary_ IO that happened due to that (which causes CPU
overhead just as much). The dominant system time suggests so, to a
certain degree. Maybe this is wishful thinking.

        Ingo


Discussion subject changed to "more git updates.." by Ingo Molnar
Ingo Molnar  
Newsgroups: linux.kernel
From: Ingo Molnar <mi...@elte.hu>
Date: Sun, 10 Apr 2005 19:50:08 +0200
Local: Sun, Apr 10 2005 1:50 pm
Subject: Re: more git updates..

* Rik van Riel <r...@redhat.com> wrote:

> GCC 4 isn't very happy.  Mostly sign changes, but also something that
> looks like a real error:

> gcc -g -O3 -Wall   -c -o fsck-cache.o fsck-cache.c
> fsck-cache.c: In function 'main':
> fsck-cache.c:59: warning: control may reach end of non-void function 'fsck_tree' being inlined
> fsck-cache.c:62: warning: control may reach end of non-void function 'fsck_commit' being inlined

> I assume that fsck_tree and fsck_commit should complain loudly if they
> ever get to that point - but since I'm not quite sure there's no
> patch, sorry.

i sent a patch for most of the sign errors, but the above is a case of gcc
not noticing that the function can never ever exit the loop, and thus
cannot get to the 'return' point.

        Ingo


Discussion subject changed to "git-pasky-0.1" by Willy Tarreau
Willy Tarreau  
Newsgroups: linux.kernel
From: Willy Tarreau <wi...@w.ods.org>
Date: Sun, 10 Apr 2005 19:50:08 +0200
Local: Sun, Apr 10 2005 1:50 pm
Subject: Re: [ANNOUNCE] git-pasky-0.1

Not the only problem, without I/O, he will go down to 4m8s (u+s) which
is still in the same order of magnitude.

willy



Discussion subject changed to "more git updates.." by Paul Jackson
Paul Jackson  
Newsgroups: linux.kernel
From: Paul Jackson <p...@engr.sgi.com>
Date: Sun, 10 Apr 2005 20:30:17 +0200
Local: Sun, Apr 10 2005 2:30 pm
Subject: Re: more git updates..

Tony wrote:
> Or maybe the files should be named objects/xx/yy/zzzzzzzzzzzzzzzz?

I tend to size these things with the square root of the number of
leaf nodes.  If I have 2,560,000 leaves (your 10,000 files in each
of 16*16 directories), then I will aim for 1600 directories of
1600 leaves each.

My backup is sized for about this number of leaves, and it uses:

        xxx/xxxzzzzzzzzzzzzzzzz

(I repeat the xxx in the leaf name - easier to code.)

I don't think there is any need for two levels.  There are 4096
different values of three digit hex numbers.  That's ok in one
directory.

The only question would be 'xx' or 'xxx' - two or three digits.

This one is on the cusp in my view - either works.
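
The sizing rule above can be checked with a little shell arithmetic. This is only a sketch: isqrt and object_path are hypothetical helper names, and the path layout follows the xxx/xxxzzz... scheme described for the backup, not necessarily what git itself does.

```shell
# Integer square root by simple counting; fine for envelope-sized numbers.
isqrt() {
    n=$1
    r=0
    while [ $(( (r + 1) * (r + 1) )) -le "$n" ]; do
        r=$(( r + 1 ))
    done
    echo "$r"
}

# Map an object name to "xxx/xxxzzz...": a three-digit directory, with the
# full name (including those three digits) repeated as the leaf.
object_path() {
    name=$1
    rest=${name#???}          # everything after the first three characters
    prefix=${name%"$rest"}    # hence just the first three characters
    printf '%s/%s\n' "$prefix" "$name"
}

isqrt 2560000                 # directories (and leaves per directory) to aim for
echo $(( 16 * 16 * 16 ))      # distinct three-digit hex prefixes
object_path 4870bcf91f8666fc788b07578fb7473eda795587
```

With 2,560,000 leaves the square root is 1600, which sits between the 256 directories that two hex digits give and the 4096 that three give; that is why the choice is on the cusp.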

--
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <p...@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


Discussion subject changed to "git-pasky-0.1" by Petr Baudis
Petr Baudis  
Newsgroups: linux.kernel
From: Petr Baudis <pa...@ucw.cz>
Date: Sun, 10 Apr 2005 21:00:09 +0200
Local: Sun, Apr 10 2005 3:00 pm
Subject: Re: Re: [ANNOUNCE] git-pasky-0.1
Dear diary, on Sun, Apr 10, 2005 at 07:45:12PM CEST, I got a letter
where Ingo Molnar <mi...@elte.hu> told me that...

It turns out to be the forks for doing all the cuts and such what is
bogging it down so awfully (doing diff-tree takes 0.48s ;-). I do about
15 forks per change, I guess, and for some reason cut takes a lot of
time on its own.

I've rewritten the cuts with the use of bash arrays and other smart
stuff. I somehow don't feel comfortable using this and prefer the
old-fashioned ways, but it would be plain unusable without this.

Now I'm down to

        real    1m21.440s
        user    0m32.374s
        sys     0m42.200s

and I kinda doubt if it is possible to cut this much down. Almost no
disk activity, I have almost everything cached by now, apparently.

Anyway, you can git pull to get the optimized version.

Thanks for the help,

--
                                Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.


Discussion subject changed to "more git updates.." by Paul Jackson
Paul Jackson  
Newsgroups: linux.kernel
From: Paul Jackson <p...@engr.sgi.com>
Date: Sun, 10 Apr 2005 21:10:09 +0200
Local: Sun, Apr 10 2005 3:10 pm
Subject: Re: more git updates..

Linus wrote:
>  It's a filesystem - although a
> fairly strange one.

Ah ha - that explains the read-tree and write-tree names.

The read-tree pulls stuff out of this file system into
your working files, clobbering local edits.  This is like
the read(2) system call, which clobbers stuff in your
read buffer.

The write-tree pushes stuff down into the file system,
just like write(2) pushes data into the kernel.

I was getting all kind of frustrated yesterday trying
to use Linus's git commands, coming at these names with my
SCM hat on.

That way of thinking really doesn't work well here.

I will have to look more closely at pasky's GIT toolkit
if I want to see an SCM style interface.

--
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <p...@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


Discussion subject changed to "git-pasky-0.1" by Willy Tarreau
Willy Tarreau  
Newsgroups: linux.kernel
From: Willy Tarreau <wi...@w.ods.org>
Date: Sun, 10 Apr 2005 21:20:12 +0200
Local: Sun, Apr 10 2005 3:20 pm
Subject: Re: Re: [ANNOUNCE] git-pasky-0.1

On Sun, Apr 10, 2005 at 08:45:22PM +0200, Petr Baudis wrote:
> It turns out to be the forks for doing all the cuts and such what is
> bogging it down so awfully (doing diff-tree takes 0.48s ;-). I do about
> 15 forks per change, I guess, and for some reason cut takes a lot of
> time on its own.

> I've rewritten the cuts with the use of bash arrays and other smart
> stuff. I somehow don't feel comfortable using this and prefer the
> old-fashioned ways, but it would be plain unusable without this.

I've encountered the same problem in a config-generation script a while
ago. Fortunately, bash provides enough ways to remove most of the forks,
but the result is less portable.

I've downloaded your code, but it does not compile here because of the
tv_nsec fields in struct stat (2.4, glibc 2.2), so I cannot use it to
get the most up to date version to take a look at the script. Basically,
all the 'cut' and 'sed' can be removed, as well as the 'dirname'. You
can also call mkdir only if the dirs don't exist. I really think you
should end up with only one fork in the loop to call 'diff'.
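
A rough sketch of two of those fork-free replacements, assuming a POSIX shell. The helper names dir_of and ensure_dir are invented here and are not taken from the git-pasky scripts.

```shell
# dirname without a fork. Assumes a relative path with no trailing slash;
# absolute paths such as "/foo" are not handled by this sketch.
dir_of() {
    case $1 in
        */*) printf '%s\n' "${1%/*}" ;;   # strip the last "/component"
        *)   printf '.\n' ;;              # no slash: current directory
    esac
}

# Call mkdir only when the directory is missing, so the common case
# (directory already there) is a builtin test and costs no fork.
ensure_dir() {
    [ -d "$1" ] || mkdir -p "$1"
}

dir_of drivers/net/tun.c
dir_of Makefile
```

Both helpers use only builtins on the hot path, which leaves 'diff' as the one unavoidable external command per changed file.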

> Now I'm down to

>    real    1m21.440s
>    user    0m32.374s
>    sys     0m42.200s

> and I kinda doubt if it is possible to cut this much down. Almost no
> disk activity, I have almost everything cached by now, apparently.

It is very common to cut times by a factor of 10 or more when replacing
common unix tools by pure shell. Dynamic library initialization also
takes a lot of time nowadays, and probably you have localisation which
is big too. Sometimes, just wiping a few variables at the top of the
shell might remove some useless overhead.

> Anyway, you can git pull to get the optimized version.

> Thanks for the help,

Willy



Discussion subject changed to "more git updates.." by Paul Jackson
Paul Jackson  
Newsgroups: linux.kernel
From: Paul Jackson <p...@engr.sgi.com>
Date: Sun, 10 Apr 2005 21:30:12 +0200
Local: Sun, Apr 10 2005 3:30 pm
Subject: Re: more git updates..

> Some thing like the following patch, may be turn off able.

Take out an old envelope and compute on it the odds of this
happening.

Say we have 10,000 kernel hackers, each producing one
new file every minute, for 100 hours a week.  And we've
cloned a small army of Andrew Mortons to integrate
the resulting tsunami of patches.  And Linus is well
cared for in the state funny farm.

What is the probability that this check will fire even
once, between now and 10 billion years from now, when
the Sun has become a red giant destroying all life on
planet Earth?
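
Here is one way to fill in that envelope. It is a sketch under several assumptions that are not stated in this message: that the check in question guards against two different files getting the same 160-bit object name, that the names behave like uniform random values, that a year has 52 working weeks, and that the usual birthday approximation n^2 / (2 * 2^160) applies. The function name collision_odds is made up.

```shell
# Rough odds of at least one collision among n random 160-bit names.
collision_odds() {
    awk 'BEGIN {
        # hackers * files per hour * hours per week * weeks per year
        files_per_year = 10000 * 60 * 100 * 52
        # total files over ten billion years
        n = files_per_year * 1e10
        # birthday approximation for a 160-bit name space
        printf "%.1e\n", (n * n) / (2 * 2^160)
    }'
}

collision_odds
```

Under those assumptions the answer comes out at roughly 3 in 10 billion over the whole ten billion years.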

--
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <p...@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


Discussion subject changed to "git-pasky-0.1" by Sean
Sean  
Newsgroups: linux.kernel
From: "Sean" <seanl...@sympatico.ca>
Date: Sun, 10 Apr 2005 22:00:20 +0200
Local: Sun, Apr 10 2005 4:00 pm
Subject: Re: [ANNOUNCE] git-pasky-0.1
On Sun, April 10, 2005 12:55 pm, Linus Torvalds said:

> Larry was ok with the idea to make my export format actually be natively
> supported by BK (ie the same way you have "bk export -tpatch"), but
> Tridge wanted to instead get at the native data and be difficult about
> it. As a result, I can now not only not use BK any more, but we also don't
> have a nice export format from BK.

> Yeah, I'm a bit bitter about it.

Linus,

With all due respect, Larry could have dealt with this years ago and
removed the motivation for Tridge and others to pursue reverse
engineering.   Instead he chose to insult and question the motives of
everyone that wanted open-source access to the Linux history data.  The
blame for the current situation falls firmly on the choice to use a
closed-source SCM for Linux and the actions of the company that owned it.

Sean



Paul Jackson  
Newsgroups: linux.kernel
From: Paul Jackson <p...@engr.sgi.com>
Date: Sun, 10 Apr 2005 22:50:10 +0200
Local: Sun, Apr 10 2005 4:50 pm
Subject: Re: [ANNOUNCE] git-pasky-0.1
Good lord - you don't need to use arrays for this.

The old-fashioned ways have their ways.  Both the 'set'
command and the 'read' command can split args and assign
to distinct variable names.

Try something like the following:

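  # Join each "<"/">" pair into one line, then put a space after the
  # leading op character so that read can split it off as its own field.
  diff-tree -r "$id1" "$id2" |
        sed -e '/^</ { N; s/\n>/ / }' -e 's/./& /' |
        while read -r op mode1 sha1 name1 mode2 sha2 name2
        do
                ... various common stuff ...
                case "$op" in
                "+")
                        ...
                        ;;
                "-")
                        ...
                        ;;
                "<")
                        # Quote everything so odd file names cannot break test or diff.
                        test "$name1" = "$name2" || die "mismatched names"
                        label1=$(mkbanner "$loc1" "$id1" "$name1" "$mode1" "$sha1")
                        label2=$(mkbanner "$loc2" "$id2" "$name1" "$mode2" "$sha2")
                        diff -L "$label1" -L "$label2" -u "$loc1" "$loc2"
                        ;;
                esac
        done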
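
The 'set' command can do the same splitting as 'read'. A minimal sketch, assuming file names contain no whitespace; parse_line is a hypothetical name and the output format is only for show.

```shell
# Split one diff-tree style line into op, mode, sha and name using only
# shell builtins (no cut, no sed, no arrays).
parse_line() {
    line=$1
    rest=${line#?}            # drop the one-character op
    op=${line%"$rest"}        # keep just the op
    set -f                    # do not glob-expand the fields
    set -- $rest              # word-split: $1=mode $2=sha $3=name
    set +f
    printf 'op=%s mode=%s sha=%s name=%s\n' "$op" "$1" "$2" "$3"
}

parse_line '<100664 4870bcf91f8666fc788b07578fb7473eda795587 Makefile'
```

Because 'set --' overwrites the positional parameters, this form is best kept inside a small function so it does not clobber the caller's arguments.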

--
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <p...@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


Linus Torvalds  
Newsgroups: linux.kernel
From: Linus Torvalds <torva...@osdl.org>
Date: Sun, 10 Apr 2005 22:50:10 +0200
Local: Sun, Apr 10 2005 4:50 pm
Subject: Re: Re: [ANNOUNCE] git-pasky-0.1

On Sun, 10 Apr 2005, Petr Baudis wrote:

> It turns out to be the forks for doing all the cuts and such what is
> bogging it down so awfully (doing diff-tree takes 0.48s ;-). I do about
> 15 forks per change, I guess, and for some reason cut takes a lot of
> time on its own.

Heh.

Can you pull my current repo, which has "diff-tree -R" that does what the
name suggests, and which should be faster than the 0.48 sec you see..

It may not matter a lot, since actually generating the diff from the file
contents is what is expensive, but remember my goal: I want the expense of
a diff-tree to be relative to the size of the diff, so that implies that
small diffs have to be basically instantaneous. So I care.

So I just tried the 2.6.7->2.6.8 diff, and for me the new recursive
"diff-tree" can generate the _list_ of files changed in zero time:

        real    0m0.079s
        user    0m0.067s
        sys     0m0.024s

but then _doing_ the diff is pretty expensive (in this case 3800+ files
changed, so you have to unpack 7600+ objects - and even unpacking isn't
the expensive part, the expense is literally in the diff operation
itself).

Me, the stuff I automate is the small steps. Doing a single checkin. So
that's the case I care about going fast, when a "diff-tree" will likely
have maybe five files or something. That's why I want the small
incremental cases to go fast - if it takes me a minute to generate a diff
for a _release_, that's not a big deal. I make one release every other
month, but I work with lots of small patches all the time.

Anyway, with a fast diff-tree, you should be able to generate the list of
objects for a fast "merge". That's next.

(And by "merge", I of course mean "suck". I'm talking about the old CVS
three-way merge, and you have to specify the common parent explicitly and
it won't handle any renames or any other crud. But it would get us to
something that might actually be useful for simple things. Which is why
"diff-tree" is important - it gives the information about what to tell
merge).

                                Linus


Discussion subject changed to "more git updates.." by Linus Torvalds
Linus Torvalds  
Newsgroups: linux.kernel
From: Linus Torvalds <torva...@osdl.org>
Date: Sun, 10 Apr 2005 23:00:13 +0200
Local: Sun, Apr 10 2005 5:00 pm
Subject: Re: more git updates..

On Sun, 10 Apr 2005, Paul Jackson wrote:

> Ah ha - that explains the read-tree and write-tree names.

> The read-tree pulls stuff out of this file system into
> your working files, clobbering local edits.  This is like
> the read(2) system call, which clobbers stuff in your
> read buffer.

Yes. Except it's a two-stage thing, where the staging area is always the
"current directory cache".

So a "read-tree" always reads the tree information into the directory
cache, but does not actually _update_ any of the files it "caches". To do
that, you need to do a "checkout-cache" phase.

Similarly, "write-tree" writes the current directory cache contents into a
set of tree files. But in order to have that match what is actually in
your directory right now, you need to have done an "update-cache" phase
before you did the "write-tree".

So there is always a staging area between the "real contents" and the
"written tree".

> That way of thinking really doesn't work well here.

> I will have to look more closely at pasky's GIT toolkit
> if I want to see an SCM style interface.

Yes. You really should think of GIT as a filesystem, and of me as a
_systems_ person, not an SCM person. In fact, I tend to detest SCMs. I
think the reason I worked so well with BitKeeper is that Larry used to do
operating systems. He's also a systems person, not really an SCM person.
Or at least he's in between the two.

My operations are like the "system calls". Useless on their own: they're
not real applications, they're just how you read and write files in this
really strange filesystem. You need to wrap them up to make them do
anything sane.

For example, take "commit-tree" - it really just says that "this is the
new tree, and these other trees were its parents". It doesn't do any of
the actual work to _get_ those trees written.

So to actually do the high-level operation of a real commit, you need to
first update the current directory cache to match what you want to commit
(the "update-cache" phase).

Then, when your directory cache matches what you want to commit (which is
NOT necessarily the same thing as your actual current working area - if
you don't want to commit some of the changes you have in your tree, you
should avoid updating the cache with those changes), you do stage 2, ie
"write-tree". That writes a tree node that describes what you want to
commit.

Only THEN, as phase three, do you do the "commit-tree". Now you give it
the tree you want to commit (remember - that may not even match your
current directory contents), and the history of how you got here (ie you
tell commit what the previous commit(s) were), and the changelog.

So a "commit" in SCM-speak is actually three totally separate phases in my
filesystem thing, and each of the phases (except for the last
"commit-tree" which is the thing that brings it all together) is actually
in turn many smaller parts (ie "update-cache" may have been called
hundreds of times, and "write-tree" will write several tree objects that
point to each other).
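
The three commit phases described above can be tried out with a short script. This is a sketch, not the 2005 tool set: it uses the plumbing names found in present-day git ("git update-index" where this message says "update-cache"), which is an assumption about the reader's tools rather than something from this thread. The function name, file name and identity values are placeholders.

```shell
# Run the three phases in a scratch repository and print what got committed.
three_phase_commit() {
    repo=$1
    (
        cd "$repo" || exit 1
        git init -q .

        # commit-tree needs an identity; these values are placeholders.
        GIT_AUTHOR_NAME=example GIT_AUTHOR_EMAIL=example@example.invalid
        GIT_COMMITTER_NAME=example GIT_COMMITTER_EMAIL=example@example.invalid
        export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

        echo "staged" > file.txt
        git update-index --add file.txt      # phase 1: update the cache
        echo "not staged" > file.txt         # working file now differs from the cache

        tree=$(git write-tree)               # phase 2: write tree objects from the cache
        commit=$(echo "changelog" | git commit-tree "$tree")   # phase 3: tie tree and changelog together

        git cat-file -p "$commit:file.txt"   # show the committed contents
    )
}

d=$(mktemp -d)
three_phase_commit "$d"
rm -rf "$d"
```

The committed blob is the one that was in the cache at write-tree time, not the later working copy, which is the point about the staging area made above.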

Similarly, a "checkout" really is about first finding the tree ID you want
to check out, and then bringing it into the "directory cache" by doing a
"read-tree" on it. You can then actually update the directory cache
further: you might "read-tree" _another_ project, or you could decide that
you want to keep one of the files you already had.

So in that scenario, after doing the read-tree you'd do an "update-cache"
on the file you want to keep in your current directory structure, which
updates your directory cache to be a _mix_ of the original tree you now
want to check out _and_ of the file you want to use from your current
directory. Then doing a "checkout-cache -a" will do the actual
checkout, and only at that point does your working directory really get
changed.
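
The two-stage checkout can be sketched the same way. Again this assumes present-day git plumbing ("git checkout-index" where this message says "checkout-cache"); the function and file names are made up for illustration.

```shell
# Show that read-tree fills only the cache, and that the working file
# appears only after the separate checkout step.
two_stage_checkout() {
    repo=$1
    (
        cd "$repo" || exit 1
        git init -q .

        # Build one tree object to check out later.
        echo "hello" > file.txt
        git update-index --add file.txt
        tree=$(git write-tree)

        # Start again with an empty cache and an empty working directory.
        rm -f .git/index file.txt

        git read-tree "$tree"                # stage 1: tree -> directory cache
        [ -e file.txt ] && echo "present" || echo "absent"

        git checkout-index -a                # stage 2: cache -> working files
        [ -e file.txt ] && echo "present" || echo "absent"
    )
}

d=$(mktemp -d)
two_stage_checkout "$d"
rm -rf "$d"
```

Between the two stages there is room to adjust the cache, for instance by updating it with a file you want to keep, before anything in the working directory is touched.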

Btw, you don't even have to have any working directory files at all. Let's
say that you have two independent trees, and you want to create a new
commit that is the join of those two trees (where one of the trees takes
precedence). You'd do a "read-tree <a> <b>", which will create a directory
cache (but not check out) that is the union of the <a> and <b> trees (<b>
will override). And then you can do a "write-tree" and commit the
resulting tree - without ever having _any_ of those files checked out.

                Linus


Discussion subject changed to "git-pasky-0.1" by Petr Baudis
Petr Baudis  
Newsgroups: linux.kernel
From: Petr Baudis <pa...@ucw.cz>
Date: Sun, 10 Apr 2005 23:30:19 +0200
Local: Sun, Apr 10 2005 5:30 pm
Subject: Re: Re: Re: [ANNOUNCE] git-pasky-0.1
Dear diary, on Sun, Apr 10, 2005 at 09:13:19PM CEST, I got a letter
where Willy Tarreau <wi...@w.ods.org> told me that...

Ok, I decided to stop this nsec madness (since it broke show-diff
anyway at least on my ext3), and you get it only if you pass -DNSEC
to CFLAGS now. Hope this fixes things for you. :-)

BTW, I regularly update the public copy as accessible on the web.

> all the 'cut' and 'sed' can be removed, as well as the 'dirname'. You
> can also call mkdir only if the dirs don't exist. I really think you
> should end up with only one fork in the loop to call 'diff'.

You still need to extract the file by cat-file too. ;-) And rm the files
after it compares them (so that we don't fill /tmp with crap like
certain awful programs like to do). But I will conditionalize the mkdir
calls, thanks for the suggestion - I think that's the last bit to be
squeezed from this loop (I'll yet check on the read proposal - I
considered it before and turned it down for some reason, can't remember why
anymore, though).

Thanks,

--
                                Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.

