Janis Papanagnou wrote:
> The file(1) man page points to magic(5). Depending on the actual file
> characteristics there seem to be more than one entry possible for a
> file, independent of its extension. As a wild guess: maybe depending
> on byte order or any such file characteristic.
There are indeed multiple possible entries for a "file type". The point of
'file' and 'magic' is exactly that 'file' tries to determine the type (and
possible sub-types/versions/etc.) from the file's content (certain special
byte/string sequences near the beginning of the file, aka "magic numbers")
instead of from the extension. In fact, 'file' completely ignores the
extension.
Since there is an unlimited number of file formats, 'file' has to rely on
heuristics, which are easy to trick.
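To illustrate the idea, here is a rough Python sketch of content-based
detection. It is not how libmagic is actually implemented: the signature
table and the function name guess_type are made up for this example, and
real magic(5) entries support offsets, masks, nested tests and much more.

```python
# Hypothetical, heavily simplified signature table: (leading bytes, description).
# The byte sequences are the well-known starts of these formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "Zip archive"),
    (b"<?xml", "XML document"),
]


def guess_type(data: bytes) -> str:
    """Guess a type from the leading bytes only; the file name is never used."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    # Fallback when nothing matches, similar in spirit to what 'file' reports
    # for unrecognized content.
    return "data"


def guess_file(path: str) -> str:
    """Read the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return guess_type(handle.read(16))
```

With this sketch, guess_type(b"%PDF-1.4 but really just text") returns
"PDF document" no matter what the file is called, which shows both that the
extension plays no role and how easily such a heuristic is tricked.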
> It may be helpful if you inspect that 'magic' file on your system and
> see what entries are present for XML and what's the difference in the
> definition of the respective entries with those two text strings that
> you observed.
Unfortunately, there is no single "magic" text file on Ubuntu (and Debian)
anymore. It has been replaced with a compiled binary file
(/usr/share/misc/magic.mgc). One has to install the source of the 'file' (or
libmagic1) package to view the different magic files that this magic.mgc is
compiled from. The relevant file (in the source) is "Magdir/sgml". I cannot
read it fluently, but I didn't find an obvious explanation for the symptoms
described by the OP.
Still, the observation I described in my other post should be a helpful
start for asking the file/libmagic devs about this. :-)
HTH