Detecting if a file is binary

Nordlöw

Nov 24, 2009, 10:23:34 AM

Is there a way in emacs-lisp code to detect if a file is binary, that is,
that it does *not* contain text in a valid multi-byte character coding?
Or can every possible combination of bytes always be correctly decoded
by some character coding?

/Nordlöw

to...@tuxteam.de

Nov 24, 2009, 12:42:04 PM

to Nordlöw, help-gn...@gnu.org

Yes, it can: every byte sequence decodes under some coding. In the
one-byte encodings of the ISO-8859-x family, for example, each byte
represents a valid code point. In UTF-8, by contrast, there are byte
sequences which can't (or at least shouldn't) occur.
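
A minimal Emacs Lisp sketch of the UTF-8 point, under stated assumptions: the function names `my-bytes-valid-utf-8-p` and `my-file-valid-utf-8-p` are hypothetical, and the check relies on Emacs keeping bytes it cannot decode as raw-byte characters in the range #x3FFF80..#x3FFFFF. How strictly the `utf-8` coding system treats edge cases such as overlong forms is not verified here.

```elisp
(require 'cl-lib)

(defun my-bytes-valid-utf-8-p (bytes)
  "Return non-nil if the unibyte string BYTES decodes cleanly as UTF-8.
Hypothetical helper.  Assumes that bytes Emacs cannot decode survive
as raw-byte characters, i.e. characters >= #x3FFF80."
  (let ((decoded (decode-coding-string bytes 'utf-8)))
    ;; Any raw-byte character left over means some byte did not fit UTF-8.
    (not (cl-some (lambda (c) (>= c #x3FFF80)) decoded))))

(defun my-file-valid-utf-8-p (file)
  "Return non-nil if the contents of FILE decode cleanly as UTF-8.
Hypothetical helper; reads the whole file without any decoding."
  (with-temp-buffer
    ;; Work on raw bytes, with no coding-system or format conversion.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    (my-bytes-valid-utf-8-p (buffer-string))))
```

Note what this does and does not tell you: a Latin-1 file containing "Nordlöw" fails the check although it is plain text, while a pure-ASCII file full of NUL bytes passes it. Valid UTF-8 is therefore neither necessary nor sufficient for "text".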

I think the only way to gain some confidence is by statistical analysis
of the text.
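
One possible form of such a statistical check, as a rough sketch rather than an established Emacs facility: sample the start of the file, call it binary if a NUL byte appears, and otherwise look at the share of control bytes. The names `my-bytes-probably-binary-p` and `my-file-probably-binary-p`, the 8192-byte sample, and the 0.1 threshold are all assumptions chosen for illustration.

```elisp
(defun my-bytes-probably-binary-p (bytes &optional threshold)
  "Guess whether the unibyte string BYTES is binary data.
Hypothetical heuristic: a NUL byte means binary; otherwise BYTES is
binary if the fraction of control bytes exceeds THRESHOLD (default 0.1)."
  (let ((threshold (or threshold 0.1))
        (total (length bytes))
        (control 0)
        (nul-seen nil))
    (dotimes (i total)
      (let ((b (aref bytes i)))
        (cond ((= b 0) (setq nul-seen t))
              ;; Control bytes other than TAB, LF, FF and CR, plus DEL.
              ((or (= b 127)
                   (and (< b 32) (not (memq b '(9 10 12 13)))))
               (setq control (1+ control))))))
    (cond ((zerop total) nil)           ; an empty file counts as text
          (nul-seen t)
          (t (> (/ (float control) total) threshold)))))

(defun my-file-probably-binary-p (file &optional sample-size)
  "Apply `my-bytes-probably-binary-p' to the first SAMPLE-SIZE bytes of FILE.
Hypothetical helper; SAMPLE-SIZE defaults to 8192."
  (with-temp-buffer
    ;; Read raw bytes only, without decoding.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file nil 0 (or sample-size 8192))
    (my-bytes-probably-binary-p (buffer-string))))
```

This only yields a guess: UTF-16 text contains many NUL bytes and would be reported as binary, so the threshold and the set of "harmless" control bytes need tuning for the files at hand.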

Regards
-- tomás
