Hello all,
I’ve been using Mercurial on Windows via TortoiseHg on a large repository converted from CVS, and for the most part it works well. We have a central repo hosted on Linux, which we push to over ssh, and we develop on Windows. However, there are persistent problems with line endings, and possibly whitespace generally, which my colleagues and I can’t seem to resolve.
Firstly, we were finding that Visual Studio kept telling us that files had inconsistent line endings. This never happened before with CVS so it would appear to be something Mercurial was doing to the files, since nothing else has really changed. We still edit exclusively in Visual Studio, with the only other tool performing modifications being Mercurial/TortoiseHg. We couldn’t find why this occurred so we installed the hgeol extension, and set it to make all our source files convert to ‘native’ format, in the hope that this might help. Unfortunately this doesn’t seem to have fixed the problem as we are still getting the inconsistency message from Visual Studio on occasion, and occasionally being prompted to check in files with whitespace changes to each line in the file, indicating the line ending has been altered. (As you can imagine, this tends to muddy the change log and makes annotation difficult.)
Secondly, change detection seems not to work the way I would expect, and I can’t help but feel that this is related to whitespace or line endings too. One of our source generation tools regenerated a file with identical content, but it shows up in TortoiseHg as having been altered. A check with the command line tool shows that the file is indeed listed as modified by “hg status”. When I try “hg diff” on that file, there’s no change reported. (I spotted FAQ 4.9: “hg status shows changed files but hg diff doesn't!”, and tried the --git flag, but it still reports no difference.) It can’t be going on the timestamp, surely?
So, does anybody have any idea why:
- Mercurial appears to do something funny to the line endings
- The eol extension doesn’t appear to be converting things consistently
- Hg status says a file has changes but hg diff suggests that it doesn’t?
Any help or suggestions appreciated!
--
Ben Sizer
On 9/2/2010 6:37 PM, Ben Sizer wrote:
> Firstly, we were finding that Visual Studio kept telling us that files had inconsistent line endings. This never happened before with CVS so it would
> Ben Sizer wrote:
> > Firstly, we were finding that Visual Studio kept telling us that files had inconsistent line endings.
> We used to have this problem, but it was Visual Studio causing the problem (also, we were using subversion).
How could you tell it was caused by Visual Studio?
It would seem unlikely that several developers, all using Visual Studio on Windows, would somehow be inserting anything other than \r\n endings. Indeed when we used CVS, there were never any such warnings. We checked in consistent files and got consistent files out. Even when CVS had to do merges, the line endings would be fine. Obviously it's impossible to rule out the possibility that somehow we never saw the inconsistency and that CVS fixed it all up for us in flight, but it seems more likely that Mercurial is getting something wrong at the merge stage, possibly to do with applying Windows-format changesets on a Linux system when we push, I don't know.
> Christian Boos wrote:
>
>> The problem, if it is considered to be one, can be reproduced using
>> the following script, with the eol extension active, on Windows (so
>> native == CRLF):
>>
>> $ hg init eol-issue
>> $ cd eol-issue
>> $ echo -e "CRLF\r" > crlf
>> $ hg ci -Am "file with crlf in repos"
>> adding crlf
>> $ hg status
>> $ echo "[patterns]" > .hgeol
>> $ echo "** = native" >> .hgeol
>> $ hg status
>> M crlf
>> ? .hgeol
>>
>> But:
>>
>> $ hg diff
>> diff -r 0b6d36c04da6 crlf
>> --- a/crlf Thu Sep 02 20:48:09 2010 +0200
>> +++ b/crlf Thu Sep 02 20:51:48 2010 +0200
>> @@ -1,1 +1,1 @@
>> -CRLF
>> +CRLF
Is this diff now just showing you that the crlf file will be modified by
the next commit? I know you cannot really see the change in line endings
but I think that is more a problem with the diff format.
When you say '** = native', you are asking for files to have native line
endings in the working copy and *LF* line endings in the repository. Use
[repository]
native = CRLF
in the .hgeol file if you want to override what the repository-native
line endings should be.
> Interesting test case. I expect my idea of what end of line handling
> should do may differ from other people's, but I would have thought
> that setting up a system to handle line ending conversions should
> actually result in fewer commits, not more of them, ie. it becomes
> more tolerant of differing line endings. This seems to be the
> opposite, where setting a conversion preference can actually make the
> system believe a file has outstanding changes when you've not touched
> it. Something like this could possibly explain some of the issues I've
> been seeing.
The eol extension will *normalize* the line endings stored in the
repository, so yes, it can certainly be the case that a file now has
outstanding changes when you enable the eol extension.
It would be nice if the extension would take note of the existing line
endings and automatically convert back and forth between them -- see
this issue which was opened just yesterday:
http://mercurial.selenic.com/bts/issue2355
--
Martin Geisler
Mercurial links: http://mercurial.ch/
> The eol extension will *normalize* the line endings stored in the repository,
> so yes, it can certainly be the case that a file now has outstanding changes
> when you enable the eol extension.
Ok. I appreciate that once you enable this option, a conversion needs to take place. However I would suggest that ideally (a) the conversion shouldn't occur until you have a changeset to commit involving that file, and (b) the conversion should not form part of a changeset itself, but rather be something that occurs after the change has been applied to the data and before it hits the disk in the repository. After all, what I want from line-ending handling is for it to be handled transparently, not for there to be changesets with nothing but line-ending alterations there.
Perhaps this was not your use case when developing this extension? Do you have any advice on achieving this or working around it? Or indeed, any insight into what exactly Mercurial is doing with the line-endings? I know the eol extension is not the root cause of our issues as we had the problem before we used it. But it doesn't appear to solve them in a way that we can use either. :)
> Martin Geisler wrote:
>
>> The eol extension will *normalize* the line endings stored in the
>> repository, so yes, it can certainly be the case that a file now has
>> outstanding changes when you enable the eol extension.
>
> Ok. I appreciate that once you enable this option, a conversion needs
> to take place. However I would suggest that ideally (a) the conversion
> shouldn't occur until you have a changeset to commit involving that
> file, and (b) the conversion should not form part of a changeset
> itself, but rather be something that occurs after the change has been
> applied to the data and before it hits the disk in the repository.
You cannot have a change that is not part of a changeset -- we have no
"room" to store such a change.
> After all, what I want from line-ending handling is for it to be
> handled transparently, not for there to be changesets with nothing but
> line-ending alterations there.
If you have a file with the "wrong" line endings in the repository, then
there will be one such changeset after you enable the extension. That
should be all.
> Perhaps this was not your use case when developing this extension?
The use case is projects involving people on different platforms where
everybody wants to have native line endings.
> Do you have any advice on achieving this or working around it? Or
> indeed, any insight into what exactly Mercurial is doing with the
> line-endings? I know the eol extension is not the root cause of our
> issues as we had the problem before we used it. But it doesn't appear
> to solve them in a way that we can use either. :)
Well, standard Mercurial won't touch your line endings at all. Mercurial
treats all files as binary and so it gives you back the bytes you gave
it originally.
Perhaps it would be simpler for you to just not use the eol extension
and commit the files with CRLF line endings in the repository? After
all, it's no crime to have CRLF files in a repository :)
[snip]
> other than \r\n endings. Indeed when we used CVS, there were never
> any such warnings. We checked in consistent files and got consistent
> files out. Even when CVS had to do merges, the line endings would be
> fine. Obviously it's impossible to rule out the possibility that
> somehow we never saw the inconsistency and that CVS fixed it all up
> for us in flight, but it seems more likely that Mercurial is getting
Depends on what CVS you had been using. CVSNT was very good at that
point. Internally, they used the Linux line ending, but on Windows (or
even Mac IIRC), they changed them while checking out or updating...
Best regards
Andreas
--
("`-''-/").___..--''"`-._
`o_ o ) `-. ( ).`-.__.`)
(_Y_.)' ._ ) `._ `. ``-..-'
_..`--'_..-_/ /--'_.' .'
(il).-'' (li).' ((!.-'
Andreas Tscharner an...@vis.ethz.ch ICQ-No. 14356454
> Ben Sizer wrote:
> Ok. I appreciate that once you enable this option, a conversion needs
> to take place. However I would suggest that ideally (a) the conversion
> shouldn't occur until you have a changeset to commit involving that
> file, and (b) the conversion should not form part of a changeset
> itself, but rather be something that occurs after the change has been
> applied to the data and before it hits the disk in the repository.
> You cannot have a change that is not part of a changeset -- we have no
> "room" to store such a change.
Yeah, that makes sense. In that case, it seems like this is the sort of change that I'd like to happen automatically after an update and before a commit, so that there is no change to what is in the repository, just to what happens to your working copy. At least, that's the use case that I would want.
> > After all, what I want from line-ending handling is for it to be
> > handled transparently, not for there to be changesets with nothing but
> > line-ending alterations there.
> If you have file with the "wrong" line endings in the repository, then there will be one such changeset after you enable the extension. That should be all.
Unfortunately just one such changeset is enough to ruin hg annotate, polluting it with a visible change to every line that carries little useful information.
Additionally, we're finding there are lots and lots of individual changesets for line ending changes, as we gradually work through various files. This makes tracking the important changes somewhat less convenient.
(This isn't specific to the eol extension. It's just a problem we're finding with the line ending issues we have.)
> > Do you have any advice on achieving this or working around it? Or
> > indeed, any insight into what exactly Mercurial is doing with the
> > line-endings? I know the eol extension is not the root cause of our
> > issues as we had the problem before we used it. But it doesn't appear
> > to solve them in a way that we can use either. :)
> Well, standard Mercurial wont touch your line endings at all. Mercurial
> treats all files as binary and so it gives you back the bytes you gave it
> originally.
I'm having trouble seeing how that is true in practice, since we are pushing Windows format files to a Linux repository, and when others pull them back to Windows, Visual Studio claims the line endings are inconsistent. Now, I certainly won't claim that Visual Studio is perfect, but this is a problem that only appeared when we migrated to Mercurial from CVS.
Perhaps there is something else messing things up, like Kdiff3. Nothing else has changed in our tool chain except we converted the repository and moved from WinCVS to TortoiseHg for individual developers.
> Perhaps it would be simpler for you to just not use the eol extension and
> commit the files with CRLF line endings in the repository? After all, it's
> no crime to have CRLF files in a repository :)
That's exactly what we were doing, but somehow the line endings were (and still are) coming back inconsistent. Harvey Chapman wrote yesterday that he felt that Visual Studio was the cause of the problem, and maybe he's right, but it's hard to see how that would be the case given that there were no such problems with CVS and that they appeared as soon as we moved to Mercurial/TortoiseHg with no other changes.
--
Ben Sizer
> Ben Sizer wrote:
> > Even when CVS had to do merges, the line endings would be fine.
> > Obviously it's impossible to rule out the possibility that somehow we
> > never saw the inconsistency and that CVS fixed it all up for us in
> > flight, but it seems more likely that Mercurial is getting
> Depends on what CVS you had been using. CVSNT was very good at that
> point. Internally, they used the Linux line ending, but on Windows
> (or even Mac IIRC), they changed them while checking out or updating...
That definitely makes some sort of sense. That's the sort of behaviour I'd like to see from Mercurial, doing that sort of change at that stage, if you explicitly configure it to do so of course.
It doesn't explain how the inconsistent line endings are arriving in the first place however. I suppose I need some way of diagnosing exactly what state each version of the file is in: working copy, local repo copy, and remote repo copy.
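One way to diagnose this is to count the line-ending bytes in each copy of the file. The following is a minimal Python sketch, not part of Mercurial or TortoiseHg; the function names are made up for illustration, and it expects the raw bytes of a file read in binary mode.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains one LF and one CR, so subtract those.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def classify(data: bytes) -> str:
    """Return 'CRLF', 'LF', 'CR', 'mixed', or 'none' for the given bytes."""
    counts = count_line_endings(data)
    kinds = [name for name, n in counts.items() if n > 0]
    if not kinds:
        return "none"
    if len(kinds) == 1:
        return kinds[0]
    return "mixed"


if __name__ == "__main__":
    import sys
    # Usage: python eolcheck.py FILE [FILE ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            content = handle.read()
        print(path, classify(content), count_line_endings(content))
```

For the working copy, point it at the file directly. For a committed revision, the output of "hg cat -r REV FILE" can be saved to a temporary file and checked the same way; as far as I know, "hg cat" only applies decode filters when --decode is given, so that should show the bytes as stored.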
--
Ben Sizer
Ah yes, that's right, if I do 'hg ci', there will indeed be a commit
created, which is also not completely intuitive as both the repository
and the working directory had exactly the same byte content (CRLF).
But after the commit, the repository content has been "normalized" to
LF. Good!
> When you say '** = native', you are asking for files to have native line
> endings in the working copy and *LF* line endings in the repository.
Ok, that's what I wanted. More precisely, I wanted to have native line
endings in the working copy and actually didn't care what's in the
repository, as I didn't expect it would make a difference. But, my
fault, I overlooked the [repository] section of .hgeol and indeed the
extension help states that the default storage for "native" files is
"LF". I guess I expected the "auto" value suggested in issue2355 to be
the default...
> The eol extension will *normalize* the line endings stored in the
> repository, so yes, it can certainly be the case that a file now has
> outstanding changes when you enable the eol extension.
>
> It would be nice if the extension would take note of the existing line
> endings and automatically convert back and forth between them -- see
> this issue which was opened just yesterday:
>
> http://mercurial.selenic.com/bts/issue2355
Thanks for the detailed reply, and a +1 for the "auto" feature which
would make a useful default (assuming mixed line endings are not
allowed, detecting the repository format on the fly shouldn't be a problem).
-- Christian
> Martin Geisler wrote:
>
>> You cannot have a change that is not part of a changeset -- we have
>> no "room" to store such a change.
>
> Yeah, that makes sense. In that case, it seems like this is the sort
> of change that I'd like to happen automatically after an update and
> before a commit, so that there is no change to what is in the
> repository, just to what happens to your working copy. At least,
> that's the use case that I would want.
That is actually also what the eol extension does: it installs a set of
filters that are applied on all bytes read and written to the working
copy.
So on Windows, when you read LF bytes from the repository (the actual
history) it will write CRLF bytes to the working copy. And when you
commit your CRLF bytes from the working copy, they are filtered back to
LF bytes in the repository.
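To make the two directions concrete, here is a small Python sketch of such a filter pair. It is an illustration only, assuming CRLF in the working copy and LF in the repository; it is not the eol extension's actual code, and the function names are hypothetical.

```python
def to_repo(data: bytes) -> bytes:
    """Encode direction: working copy -> repository (CRLF becomes LF)."""
    return data.replace(b"\r\n", b"\n")


def to_working_copy(data: bytes) -> bytes:
    """Decode direction: repository -> working copy (LF becomes CRLF)."""
    # Normalise first so an existing CRLF is not turned into CR CR LF.
    return to_repo(data).replace(b"\n", b"\r\n")


stored = b"int main()\n{\n}\n"        # what the history holds
checked_out = to_working_copy(stored)  # what lands on disk on Windows
assert checked_out == b"int main()\r\n{\r\n}\r\n"
assert to_repo(checked_out) == stored  # round trip gives the stored bytes back
```

Under these assumptions, a working-copy file that already has LF endings passes through to_repo unchanged, while a file stored with CRLF in the history no longer matches the filtered working copy and so appears modified.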
>> If you have file with the "wrong" line endings in the repository,
>> then there will be one such changeset after you enable the extension.
>> That should be all.
>
> Unfortunately just one such changeset is enough to ruin hg annotate,
> polluting it with a visible change to every line that carries little
> useful information.
I don't think it will ruin anything -- use annotate in TortoiseHg and
when you see a line ending change, just right-click on the line and choose
'Annotate Parent'. That way it is super easy to "peel off" each change
until you reach the one you are looking for.
> Additionally, we're finding there are lots and lots of individual
> changesets for line ending changes, as we gradually work through
> various files. This makes tracking the important changes somewhat less
> convenient.
It should be just one change. If, in a clean working copy, you do
hg update null
hg update
then you ensure that the filters are run on all files, so all
relevant changes can be committed in one changeset.
The problem I worry about is that the eol extension does indeed work on
a file-by-file basis, so if you enable it while you have lots of files
checked out, then they won't be converted.
>> Well, standard Mercurial wont touch your line endings at all.
>> Mercurial treats all files as binary and so it gives you back the
>> bytes you gave it originally.
>
> I'm having trouble seeing how that is true in practice, since we are
> pushing Windows format files to a Linux repository, and when others
> pull them back to Windows, Visual Studio claims the line endings are
> inconsistent.
Please check what extensions you have enabled and please try to
reproduce with a small file for which you can make a hexdump.
Mercurial is really not touching the bytes by default -- you have to go
out of your way to make it do that by enabling extensions such as the
eol or the keyword extensions.
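If no hexdump tool is at hand on Windows, a few lines of Python will do. This is just a sketch with a made-up function name; a CRLF ending shows up as "0d 0a" and a bare LF as "0a".

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex values and printable ASCII, one row per `width` bytes."""
    pad = width * 3  # room for "xx " per byte so the text column lines up
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{pad}} {text_part}")
    return "\n".join(rows)


if __name__ == "__main__":
    import sys
    # Usage: python hexdump.py FILE
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            print(hexdump(handle.read()))
```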
>> Perhaps it would be simpler for you to just not use the eol extension
>> and commit the files with CRLF line endings in the repository? After
>> all, it's no crime to have CRLF files in a repository :)
>
> That's exactly what we were doing, but somehow the line endings were
> (and still are) coming back inconsistent. Harvey Chapman wrote
> yesterday that he felt that Visual Studio was the cause of the
> problem, and maybe he's right, but it's hard to see how that would be
> the case given that there were no such problems with CVS and that they
> appeared as soon as we moved to Mercurial/TortoiseHg with no other
> changes.
Strange... I think older versions of TortoiseHg came with the win32text
extension enabled by default. However, the eol extension aborts if it
sees that win32text is loaded so I don't think that is your problem.
> Harvey Chapman wrote:
>
>> Ben Sizer wrote:
>>> Firstly, we were finding that Visual Studio kept telling us that files had inconsistent line endings.
>
>> We used to have this problem, but it was Visual Studio causing the problem (also, we were using subversion).
>
> How could you tell it was caused by Visual Studio?
I'm working from foggy memory here. It wasn't consistent. I think it happened when we copy/pasted code from a webpage or some other CRLF source into a LF-only file in a VS editor window. I think VS would preserve the CRs. Actually, this kind of makes sense to me now since we used to copy/paste unicode characters into source files for testing as well because that was the easiest way to throw some foreign characters into our test code. In other words, I'd be upset if VS modified a "CRLF" that is actually part of some foreign characters. Also, we often use VS in VMWare machines and we do copy/paste from one operating system to another. I could be completely wrong about this, but I think that was the problem.
Even if it wasn't VS, it wasn't Mercurial in our case because we were using SVN and had never even heard of Hg (oh, how I wish we had).
>Ben Sizer wrote:
>>
>> Yeah, that makes sense. In that case, it seems like this is the sort
>> of change that I'd like to happen automatically after an update and
>> before a commit, so that there is no change to what is in the
>> repository, just to what happens to your working copy. At least,
>> that's the use case that I would want.
> That is actually also what the eol extension does: it installs a set
> of filters that are applied on all bytes read and written to the
> working copy.
Ideally, that would be enough for us. But in practice it's not worked out. We don't really care how the data is stored, just as long as the endings stay consistent, but for some reason they're not.
>>> If you have file with the "wrong" line endings in the repository,
>>> then there will be one such changeset after you enable the extension.
>>> That should be all.
>>
>> Unfortunately just one such changeset is enough to ruin hg annotate,
>> polluting it with a visible change to every line that carries little
>> useful information.
>
> I don't think it will ruin anything -- use annotate in TortoiseHg and
> when you see line ending change, just right-click on the line and choose
> 'Annotate Parent'. That way it is super easy to "peel off" each change
> until you reach the one you are looking for.
Ah, thanks for the tip. That definitely makes it a bit more manageable.
>> Additionally, we're finding there are lots and lots of individual
>> changesets for line ending changes, as we gradually work through
>> various files. This makes tracking the important changes somewhat less
>> convenient.
>
>It should be just one change. If you in a clean working copy do
>
> hg update null
> hg update
>
>then you should ensure that the filters are run on all files and so all
>relevant changes can be committed in one changeset.
That's not the experience we had, unfortunately. We had clean repositories after enabling the eol extension, and then the next commit did indeed change a large number of files in the way you are describing, typically replacing the whole file. But it didn't convert every file, and some of the remaining ones have given us problems since then.
>>> Well, standard Mercurial wont touch your line endings at all.
>>> Mercurial treats all files as binary and so it gives you back the
>>> bytes you gave it originally.
>>
>> I'm having trouble seeing how that is true in practice, since we are
>> pushing Windows format files to a Linux repository, and when others
>> pull them back to Windows, Visual Studio claims the line endings are
>> inconsistent.
>
>Please check what extensions you have enabled and please try to
>reproduce with with a small file for which you can make a hexdump.
Locally, our copies of TortoiseHg just have eol enabled, with our source files set to 'native' locally. Remotely, our central push repo on a virtual Linux box appears to not have any extensions enabled at all. I wonder if this asymmetry is part of the problem.
Unfortunately we really don't know what causes this to happen so deliberately producing a test case is going to be difficult. But when it next arises, I'll try and get a copy of the file.
--
Ben Sizer
I can confirm older versions did load the win32text extension in the
site-wide Mercurial.ini file, but none of the conversion hooks were
enabled so it was mostly harmless. We quit enabling win32text at all
a long time ago. The first version of Mercurial.ini checked into our
repo in the 0.7 time frame (17 months ago) already had it disabled.
--
Steve Borho
> Martin Geisler wrote:
>
>> Please check what extensions you have enabled and please try to
>> reproduce with with a small file for which you can make a hexdump.
>
> Locally, our copies of TortoiseHg just have eol enabled, with our
> source files set to 'native' locally. Remotely, our central push repo
> on a virtual Linux box appears to not have any extensions enabled at
> all. I wonder if this asymmetry is part of the problem.
No, extensions on the server are not the problem. All changesets in
Mercurial contain a hash value which defines their "identity" and that
would change drastically if even a single bit is flipped by the server.
So if you see a changeset with hash value 41c42b69055b on the server and
you pull that into your own clone, then you can be certain that they are
identical, bit for bit.
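As a rough illustration of why a hash pins down the bytes, the Python sketch below hashes two versions of the same text that differ only in line endings. It is not how Mercurial computes a changeset hash (that also covers parents and metadata); it only shows that SHA-1, the hash function Mercurial uses, gives unrelated values when the bytes differ at all. The sample content and names are invented.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-1 hex digest of the given bytes."""
    return hashlib.sha1(data).hexdigest()


crlf_version = b"line one\r\nline two\r\n"
lf_version = b"line one\nline two\n"

# Same text to a human reader, different bytes, different hash.
print("CRLF:", digest(crlf_version))
print("LF:  ", digest(lf_version))
print("equal?", digest(crlf_version) == digest(lf_version))
```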
> Unfortunately we really don't know what causes this to happen so
> deliberately producing a test case is going to be difficult. But when
> it next arises, I'll try and get a copy of the file.
Okay. Until then, please disable the eol extension too and just commit
the files with CRLF into the repository. That seems like the most
sensible way to manage your project since I don't think you need to
convert the files to LF at all.
Yeah, sorry, that idea did not make much sense now that I think about it
again.
> On Sep 3, 2010, at 5:53 AM, Ben Sizer wrote:
>
> > Harvey Chapman wrote:
> >
> >> Ben Sizer wrote:
> >>> Firstly, we were finding that Visual Studio kept telling us that files had inconsistent line endings.
> >
> >> We used to have this problem, but it was Visual Studio causing the problem (also, we were using subversion).
> >
> > How could you tell it was caused by Visual Studio?
>
> I'm working from foggy memory here. It wasn't consistent.
I know from personal experience you can get inconsistent line endings
from subversion without Visual Studio being part of the
equation. Subversion has a per-user setting that does what the eol
extension does for hg. Which just made things worse unless everyone
used it, as it created three types of users: two types who added their
native line endings when adding/editing text, and one type that
converted all the line endings to the canonical form whenever they
checked in a file.
<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
[...example snipped...]
>> Is this diff now just showing you that the crlf file will be modified
>> by the next commit? I know you cannot really see the change in line
>> endings but I think that is more a problem with the diff format.
>
>Ah yes, that's right, if I do 'hg ci', there will indeed be a commit created,
>which is also not completely intuitive as both the repository and the working
>directory had exactly the same byte content (CRLF).
>But after the commit, the repository content has been "normalized" to LF. Good!
The strange thing is, this is not the behaviour I see. Or at least, not the way I understand it.
My .hgeol contains this line:
**.cpp = native
Native is Windows in this case. We don't explicitly specify a repository format.
I have a file in my local repository that is in Unix format, for whatever reason. (Loads of them in fact, which I expect is the underlying issue for my problems.)
Mercurial - with eol enabled - does not believe this file has any amendments waiting to check in - is this because, since the file type and the repository are both 'native', it assumes no conversion need ever take place?
When I introduce an arbitrary modification to the file, the diff looks normal - only the specific modification is being noted. It's making no attempt to convert the other lines. My non-native file is apparently going to be committed in non-native format.
Yet I have seen examples like the one Christian posted - where a file appears to have changes that you need to check in, purely because the eol extension spotted that the line endings weren't what it expected. I don't understand why it happens some times and not others.
In fact, if I shelve the change, it still shows up as modified - presumably because it has the hidden line changes to commit. If I do this with eol switched off, the file is unmodified, as expected.
(Note that it wouldn't do that otherwise, which is what I noted in an earlier email when I pointed out that you do not actually get a single commit changing all your line endings when you switch eol on, as was suggested - you appear to get individual ones when you make commits for other reasons.)
Once I committed this, the changeset contained no files, and no patch. This phantom changeset is very bizarre.
>> When you say '** = native', you are asking for files to have native
>> line endings in the working copy and *LF* line endings in the repository.
>
>> The eol extension will *normalize* the line endings stored in the
>> repository, so yes, it can certainly be the case that a file now has
>> outstanding changes when you enable the eol extension.
And this is what I find strange - normalisation, to me, implies converting to a known quantity. But in this case, it's as if it won't attempt to do so, because it's already 'outside' the normal conversion path. I would hope that an extension could make an attempt to convert any file with consistent endings to the 'normal' type. (And I would hope that it does this transparently, although it seems from previous answers that sometimes it does this, sometimes it needs an explicit change, and I don't quite understand which is which yet.)
--
Ben Sizer
I should add that it did actually change the file in my working copy from Unix to Windows/native, however.
> From: Christian Boos [mailto:cb...@neuf.fr]
>
>> Ah yes, that's right, if I do 'hg ci', there will indeed be a commit
>> created, which is also not completely intuitive as both the
>> repository and the working directory had exactly the same byte
>> content (CRLF). But after the commit, the repository content has been
>> "normalized" to LF. Good!
>
> The strange thing is, this is not the behaviour I see. Or at least,
> not the way I understand it.
>
> My .hgeol contains this line:
> **.cpp = native
>
> Native is Windows in this case. We don't explicitly specify a
> repository format.
Okay -- the repository format will then default to LF (Unix format).
> I have a file in my local repository that is in Unix format, for
> whatever reason. (Loads of them in fact, which I expect is the
> underyling issue for my problems.)
>
> Mercurial - with eol enabled - does not believe this file has any
> amendments waiting to check in - is this because, since both the file
> type and the repository are both 'native', it assumes no conversion
> needs ever take place?
It is probably because the file was added with Unix format before you
enabled the extension or before you added the .hgeol file in this
repository. What happens if you do
hg update null
hg update
That should empty your working copy and then checkout all files again.
It is during the *checkout* that the extension has a chance to modify
the file content that is written to disk. In this case, it should go in
and change the LF to CRLF when writing the file to the working copy.
After this 'hg status' and 'hg diff' should show no changes.
> When I introduce an arbitrary modification to the file, the diff looks
> like normal - only the specific modification is being noted.
Yes, this is because the diff is computed as follows:
1) the file is read from the working copy
2) a filter is run that turns the file into Unix format (repository
native format). In this case the filter is a no-op since you happen
to have the file in Unix format already.
3) the filtered working copy file is compared with the version stored in
the repository, which is also in Unix format. This results in a small
change.
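The three steps above can be sketched in Python as follows. This is only a model of the comparison under the stated assumptions (LF as the repository format), not Mercurial's implementation; the function names are hypothetical and difflib stands in for the real diff code.

```python
import difflib


def normalise(data: bytes) -> bytes:
    """Step 2: filter working-copy bytes into the repository format (LF)."""
    return data.replace(b"\r\n", b"\n")


def split_lines(data: bytes) -> list:
    # Split on LF only, so a stray CR stays attached to its line.
    return data.decode("latin-1").split("\n")


def status_and_diff(working: bytes, stored: bytes):
    """Return (modified?, unified diff lines) for one file."""
    filtered = normalise(working)   # steps 1 and 2
    modified = filtered != stored   # step 3: byte comparison
    diff = list(difflib.unified_diff(
        split_lines(stored), split_lines(filtered),
        "a/file", "b/file", lineterm=""))
    return modified, diff


def changed_lines(diff: list) -> list:
    """Keep only the added/removed lines, dropping the file headers."""
    return [line for line in diff
            if line[:1] in "+-" and not line.startswith(("---", "+++"))]
```

With a file stored in LF and a one-line edit in the working copy, only that line shows up. With a file stored in CRLF and an untouched CRLF working copy, every line shows up as changed even though the visible text is identical, which matches the "-CRLF / +CRLF" diff quoted earlier in the thread.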
> It's making no attempt to convert the other lines. My non-native file
> is apparently going to be committed in non-native format.
It is committed in LF format since that is the repository-native format.
> Yet I have seen examples like the one Christian posted - where a file
> appears to have changes that you need to check in, purely because the
> eol extension spotted that the line endings weren't what it expected.
> I don't understand why it happens some times and not others.
That should happen when you have a file that is already committed with
Windows format in the repository. After you enable the rule in .hgeol,
that file will be converted -- hopefully it is converted after a new
fresh checkout.
> In fact, if I shelve the change, it still shows up as modified -
> presumably because it has the hidden line changes to commit. If I do
> this with eol switched off, the file is unmodified, as expected.
>
> (Note that it wouldn't do that otherwise, which is what I noted in an
> earlier email when I pointed out that you do not actually get a single
> commit changing all your line endings when you switch eol on, as was
> suggested - you appear to get individual ones when you make commits
> for other reasons.)
>
> Once I committed this, the changeset contained no files, and no patch.
> This phantom changeset is very bizarre.
That is strange... can you reproduce this with a small test case? That
would be very helpful.
>>> When you say '** = native', you are asking for files to have native
>>> line endings in the working copy and *LF* line endings in the
>>> repository.
>>
>>> The eol extension will *normalize* the line endings stored in the
>>> repository, so yes, it can certainly be the case that a file now has
>>> outstanding changes when you enable the eol extension.
>
> And this is what I find strange - normalisation, to me, implies
> converting to a known quantity. But in this case, it's as if it won't
> attempt to do so, because it's already 'outside' the normal conversion
> path. I would hope that an extension could make an attempt to convert
> any file with consistent endings to the 'normal' type. (And I would
> hope that it does this transparently, although it seems from previous
> answers that sometimes it does this, sometimes it needs an explicit
> change, and I don't quite understand which is which yet.)
I also wish it was transparent :( However, until we get a reproducible
test case there is not much I can do about it.
>> Mercurial - with eol enabled - does not believe this file has any
>> amendments waiting to check in - is this because, since both the file
>> type and the repository are both 'native', it assumes no conversion
>> needs ever take place?
>
>It is probably because the file was added with Unix format before you enabled the extension or before you added the .hgeol file in this repository. What happens if you do
>
> hg update null
> hg update
>
>That should empty your working copy and then checkout all files again.
Thanks for all your explanations, Martin.
I guess that's the problem - since we never cleared out all the files, and just did an update to the 'clean' copy, we assumed everything was ready to go, but in fact we have working directories full of files in the wrong format. I'll look into this, although I think it might be easier just to change it to stop converting anything to Windows format. I'd also like to avoid any additional changesets being added in the process.
I think the problem is that we've tried to use eol to fix our line ending consistency problem, the way that WinCVS did, but it doesn't look like eol is designed for that. As a result we've converted files that we didn't need to convert and caused ourselves more trouble!
>> Yet I have seen examples like the one Christian posted - where a file
>> appears to have changes that you need to check in, purely because the
>> eol extension spotted that the line endings weren't what it expected.
>> I don't understand why it happens some times and not others.
>
>That should happen when you have a file that is already committed with
>Windows format in the repository. After you enable the rule in .hgeol,
>that file will be converted -- hopefully it is converted after a new
>fresh checkout.
But since we didn't do a fresh checkout, that's why we didn't get the "one big fix" changeset, and is why it'll happen to any file we change in the future, right? Unless we switch eol off, or switch it to using Unix for the working copy?
>> In fact, if I shelve the change, it still shows up as modified -
>> presumably because it has the hidden line changes to commit. If I do
>> this with eol switched off, the file is unmodified, as expected.
>>
>> (Note that it wouldn't do that otherwise, which is what I noted in an
>> earlier email when I pointed out that you do not actually get a single
>> commit changing all your line endings when you switch eol on, as was
>> suggested - you appear to get individual ones when you make commits
>> for other reasons.)
>>
>> Once I committed this, the changeset contained no files, and no patch.
>> This phantom changeset is very bizarre.
>
>That is strange... can you reproduce this with a small test case? That
>would be very helpful.
Unfortunately not. I think it might be something specific to the TortoiseHg shelve system. Simply adding and deleting the change locally as a sort of fake shelve doesn't have the same effect and I don't have the normal shelve extension installed.
Here are the symptoms, anyway, of an empty changeset (which I can't seem to remove):
D:\Dev>hg log -r 33660 -v -p
changeset: 33660:166d3aae35bb
tag: tip
user: Ben Sizer <be...@monumentalgames.com>
date: Tue Sep 07 16:34:19 2010 +0100
description:
test (back me out!)
D:\Dev>hg backout 33660
nothing changed
changeset 33660:166d3aae35bb backs out changeset 33660:166d3aae35bb
D:\Dev>hg log -r 33660 -v -p
changeset: 33660:166d3aae35bb
tag: tip
user: Ben Sizer <be...@monumentalgames.com>
date: Tue Sep 07 16:34:19 2010 +0100
description:
test (back me out!)
D:\Dev>
To be honest it looks like a flaw either in TortoiseHg or Mercurial itself - although I am well aware that 90% of the time, the person who says that is wrong! :)
> From: Martin Geisler [mailto:m...@mgsys.dk] On Behalf Of Martin Geisler
>
> Thanks for all your explanations, Martin.
You're welcome :)
> I guess that's the problem - since we never cleared out all the files,
> and just did an update to the 'clean' copy, we assumed everything was
> ready to go, but in fact we have working directories full of files in
> the wrong format. I'll look into this, although I think it might be
> easier just to change it to stop converting anything to Windows
> format. I'd also like to avoid any additional changesets being added
> in the process.
Right -- if you don't have any need for files to be in Unix format, then
don't use the eol extension. That extension is only there for people who
have working copies on both Windows and Linux/Mac, *and* who have users
that dislike having non-native line endings in their files.
All editors I know on Linux/Mac have no problem with Windows line
endings and there are also plenty of editors on Windows that understand
Unix line endings just fine.
>>> Yet I have seen examples like the one Christian posted - where a
>>> file appears to have changes that you need to check in, purely
>>> because the eol extension spotted that the line endings weren't what
>>> it expected. I don't understand why it happens some times and not
>>> others.
>>
>> That should happen when you have a file that is already committed
>> with Windows format in the repository. After you enable the rule in
>> .hgeol, that file will be converted -- hopefully it is converted
>> after a new fresh checkout.
>
> But since we didn't do a fresh checkout, that's why we didn't get the
> "one big fix" changeset, and is why it'll happen to any file we change
> in the future, right? Unless we switch eol off, or switch it to using
> Unix for the working copy?
As long as you leave the eol extension enabled, it will try to convert
the files it comes across.
Yeah, that looks like a bug -- normally, empty changesets make no sense,
so we try to prevent them. But as you see, Mercurial can obviously make
them via the internal API.
Anyway I have been reading through this thread and I don't think I see
a good answer to what should be done. So I'm going to ask:
If I have inconsistent line endings in a repository, what should I do
to get rid of the inconsistencies?
Matt Schulte
On Windows:
1) Set up Mercurial with the eol conversion of your choice (I use
cleverencode/cleverdecode).
2) Install cygwin (so you have the find and xargs commands).
3) Run this:
hg clone https://....whatever.../ mydir
cd mydir
# ... wait a few seconds ...
find . \( -name .hg -prune \) -o -type f -print0 | xargs -0 -n 50 touch
hg ci -m "Fix EOL characters"
hg push
This will change the timestamp on every file, causing cleverencode to
try re-encoding it (i.e. stripping the CR from CRLF line endings), and
causing a corrected version of all files to be checked in.
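If installing cygwin just for find and xargs is not an option, the touch step can be approximated in Python. This is a rough sketch rather than a tested replacement for the command above; `touch_tracked_tree` is a hypothetical helper name, and like the find command it skips anything under a directory called .hg.

```python
import os


def touch_tracked_tree(root: str) -> int:
    """Set the mtime of every regular file under root to now, skipping .hg."""
    touched = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .hg in place so os.walk never descends into it.
        dirnames[:] = [d for d in dirnames if d != ".hg"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Mirror "find -type f": regular files only, no symlinks.
            if os.path.isfile(path) and not os.path.islink(path):
                os.utime(path, None)  # None means "use the current time"
                touched += 1
    return touched
```

Calling `touch_tracked_tree(".")` from the root of the fresh clone would stand in for the find/xargs line; the commit and push steps stay the same.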
eric
We use Mercurial at work with 150+ devs on Windows. My strong advice is
this: Don't use any source-altering extensions such as EOL. It's not
necessary if you're only on Windows.
We have used our current setup for a year or so, without problems,
developing with Visual Studio. I promise you, Mercurial doesn't change
anything in the files unless you tell it to.
Like Martin also said, Mercurial never changes anything in files checked
in or out on its own.
/Sune
Unfortunately we have a repository of Unix format files, which were migrated from CVS. Our Mercurial repository is on Linux, and we build on Linux too. It's just that the development takes place on Windows.
As such, we are going to be continuing with the eol extension, having cleaned out all the old Unix format copies, to try and get all the files into Windows format in future.
> We have used our current setup for a year or so, without problems, developing with Visual Studio. I promise you, Mercurial doesn't change anything in the files unless you tell it to.
Yes, it appears that it is more a case that all the other source control systems perform invisible fixes that Mercurial doesn't.
--
Ben Sizer
> From: Sune Foldager [mailto:cr...@cyanite.org]
>>
>> We use Mercurial at work with 150+ devs on Windows. My strong advice
>> is this: Don't use any source-altering extensions such as EOL. It's not
>> necessary if you're only on Windows.
>
> Unfortunately we have a repository of Unix format files, which were
> migrated from CVS. Our Mercurial repository is on Linux, and we build
> on Linux too. It's just that the development takes place on Windows.
That does not immediately imply that you need the eol extension -- you
only need it if your build system cannot handle CRLF files on Linux. All
compilers I know of can handle Windows line endings just fine.
As long as bash/sh shell scripts are not part of the build system,
that's true.
--
Mark A. Flacy <mfl...@verizon.net>
[extensions]
hgext.win32text=
[encode]
# Encode files that don't contain NUL characters.
** = cleverencode:
** = cleverdecode:
Matt Schulte
It's more that we need the conversion that the "only-consistent = False" option appears to offer. Also, it's preferable for us to have native files on each platform for maximum compatibility with various editors and tools.
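To make the "consistent" versus "inconsistent" distinction concrete, here is a small Python sketch of how a file's line endings could be classified before deciding whether to convert it. It is an illustration only and not how the eol extension is implemented; `classify_line_endings` and `normalize_to_lf` are hypothetical names, and bare CR (old Mac) endings are deliberately ignored.

```python
def classify_line_endings(data: bytes) -> str:
    """Return 'crlf', 'lf', 'mixed' or 'none' for a file's contents."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LFs that are not part of a CRLF
    if crlf and lf:
        return "mixed"
    if crlf:
        return "crlf"
    if lf:
        return "lf"
    return "none"


def normalize_to_lf(data: bytes) -> bytes:
    """Convert every CRLF to LF, whether or not the file was consistent."""
    return data.replace(b"\r\n", b"\n")


print(classify_line_endings(b"a\r\nb\r\n"))  # crlf
print(classify_line_endings(b"a\nb\n"))      # lf
print(classify_line_endings(b"a\r\nb\n"))    # mixed
```

A tool that only touches consistent files would skip the "mixed" case; converting those as well is the behaviour we are after.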
--
Ben Sizer
er... close, but not quite. You need a separate decode section.
[extensions]
hgext.win32text=
[encode]
** = cleverencode:
[decode]
** = cleverdecode:
I also added a hook to double check that things are working correctly:
[hooks]
pretxncommit.crlf = python:hgext.win32text.forbidcrlf
That hook is useful to avoid mistakes even w/o the cleverencode stuff if
you prefer to configure your editor to use unix line endings. It's not
always possible (or easy) to be sure that every editor you use on
windows will behave correctly, so I prefer to keep the cleverencode
stuff enabled.
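For anyone who wants to run a similar check outside of a hook, the idea can be sketched in plain Python. This is not the code of the win32text hook, just a minimal stand-alone approximation; `has_crlf` and `files_with_crlf` are hypothetical names, and treating any file with a NUL byte as binary is an assumption borrowed from the comment in the config earlier in the thread.

```python
def has_crlf(data: bytes) -> bool:
    """True for text content (no NUL byte) that contains a CRLF."""
    return b"\0" not in data and b"\r\n" in data


def files_with_crlf(paths):
    """Return the paths whose contents would fail a 'no CRLF' check."""
    bad = []
    for path in paths:
        with open(path, "rb") as f:
            if has_crlf(f.read()):
                bad.append(path)
    return bad
```

Passing it the list of files about to be committed gives the files that still have Windows line endings, so they can be fixed before the commit is attempted.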
btw, if you ever need to disable the cleverencode setting for a single
clone, you can put this in that repository's .hg/hgrc file:
# Disable the win32text extension (cleverencode/cleverdecode) for the current clone:
[extensions]
hgext.win32text=!
[encode]
** = !
[decode]
** = !