What's M$'s single-quote char?

Ali...@gmail.com

Jul 6, 2009, 8:27:39 AM

WTF: why do these docs show weird stuff for the
simple [plain ASCII] single-quote char?!

I wanted to see what a *.doc uses for the 'single quote char',
so I started from the default mc 'script' which shows most
file formats as plain text:
... catdoc -w %f || word2x -f text %f - || strings %f
[which I think means: try catdoc; if that fails, word2x; if
that fails, strings]. But since I've only got 'strings',
I tried:
cat *.doc | strings | fmt > doc2text

Which AFAIK simply extracts only the printable-ASCII runs,
reformats them to lines under about 75 chars (fmt's default
width), and then saves the result to the file doc2text.

BUT: the single quotes were all missing, e.g. the one in "man's hat".
And searching the original *.doc directly is problematic,
e.g. # cat headsofargumentoffirstand.doc | grep tate
== Binary file (standard input) matches
Because most text utils are line-based and expect the
Unix line terminator, I can't see the single-quote char.

E.g. mc's editor finds "...constitutes a breach of the States..",
where the missing single-quote char is probably sitting between
the "e" and the "s", but, being non-ASCII M$hit-style, it isn't rendered.

BTW: piping through 'fmt' (cat x | fmt | grep State) cuts the
lines to a manageable length, but still:
# cat x | fmt | grep State
== Binary file (standard input) matches

And man grep says:
" -a, --text
Process a binary file as if it were text; this is equivalent to
the --binary-files=text option. "

which prompted me to try: # cat x | fmt | grep -a State
== which shows e.g. "..the States positive obligation.."

So here's the 2nd question: what common util will show the
hex/binary/octal value of the byte(s) at a given position in
a file?

In this particular case, perhaps sed could extract the known
'line' [after applying fmt] to a file, where mc could show
the numeric value of every char, including the non-ASCII ones?
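On the 2nd question: od(1) can skip to a byte offset (-j), dump a fixed number of bytes (-N), and print them as hex (-t x1). Here is a minimal sketch. It assumes GNU or BSD od and a grep that supports -a, -b and -o (GNU grep has all three). The helper names show_bytes and byte_offset are made up for illustration:

```shell
# Hypothetical helper: show_bytes FILE OFFSET COUNT
# Dumps COUNT bytes starting at 0-based byte OFFSET as hex,
# with the address column suppressed (-An).
show_bytes() {
    od -An -t x1 -j "$2" -N "$3" "$1"
}

# Hypothetical helper: byte_offset FILE PATTERN
# Prints the 0-based byte offset of the first match of PATTERN.
#   -a  treat the binary .doc as text
#   -b  print byte offsets
#   -o  make the offset point at the match itself, not the line start
byte_offset() {
    LC_ALL=C grep -abo -- "$2" "$1" | head -n 1 | cut -d: -f1
}

# Example (file name assumed): the byte right after "State"
#   off=$(byte_offset x.doc State); show_bytes x.doc "$((off + 5))" 1
```

If that prints 92, the byte is cp1252's right single quotation mark (U+2019), which is what Word typically stores for an apostrophe in 8-bit text. Both od -j and grep -b work in byte offsets, so this sidesteps the line-based tools entirely.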

== Chris Glur.

PS. This is the 'transparent' type of post, showing the
reasoning behind it, which *I'd* also like to see more of on
Usenet, rather than the common format:
"do wizz-bang-wow",
without any background explanation.


Bill Marcum

Jul 6, 2009, 1:23:40 PM

["Followup-To:" header set to comp.os.linux.misc.]

On 2009-07-06, Ali...@gmail.com <Ali...@gmail.com> wrote:
>
>
> WTF do we have to get these docs showing weird-stuff for the
> simple [plain ascii] single quote char ?!
>
> I wanted to see what a *.doc uses for 'single quote char',
> so based on the default mc 'script' which shows most
> file formats as plain-text:
> ... catdoc -w %f || word2x -f text %f - || strings %f
> [which I think means, try catdoc OR word2x or strings
> on the file, but since I've only got 'strings';
> I tried:
> cat *.doc | strings | fmt > doc2text
>
MS uses separate characters for opening and closing quotes. Try this:
cat *.doc | strings -e S | recode -f cp1252..ascii | fmt
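In the Windows-1252 (cp1252) code page, the curly quotes are single bytes. The left and right single quotes are 0x91 and 0x92, and the left and right double quotes are 0x93 and 0x94. If recode isn't installed, here is a minimal tr-based sketch. It assumes the text extracted by strings -e S really is cp1252, and the function name is hypothetical:

```shell
# Hypothetical helper: map cp1252 curly quotes to their ASCII equivalents.
#   octal 221/222 = 0x91/0x92 (left/right single quote) -> '
#   octal 223/224 = 0x93/0x94 (left/right double quote) -> "
# LC_ALL=C keeps tr working on raw bytes even in a UTF-8 locale.
cp1252_quotes_to_ascii() {
    LC_ALL=C tr '\221\222\223\224' "''\"\""
}

# Example (file name assumed):
#   strings -e S x.doc | cp1252_quotes_to_ascii | fmt
```

This maps bytes one for one, so other cp1252 punctuation such as dashes and the ellipsis is left as is. A full converter such as recode or iconv handles the whole code page.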


Robert Heller

Jul 6, 2009, 1:55:32 PM

MS uses their own exotic versions of many common *ASCII* punctuation
characters. For some reason, known *only* to the people at MS, the
*standard* ASCII characters are not good enough. I have *heard* it has
something to do with variable-width fonts, but I don't understand why
these fonts can't use suitable glyphs at the *standard* ASCII character
positions. Hotmail.com footers often use these exotic characters as
well, which does bad things when such messages are read by a plain-ASCII
E-Mail client.


--
Robert Heller -- 978-544-6933
Deepwoods Software -- Download the Model Railroad System
http://www.deepsoft.com/ -- Binaries for Linux and MS-Windows
hel...@deepsoft.com -- http://www.deepsoft.com/ModelRailroadSystem/

jellybean stonerfish

Jul 6, 2009, 3:26:45 PM

On Mon, 06 Jul 2009 12:55:32 -0500, Robert Heller wrote:

> MS uses their own exotic versions of many common *ASCII* punctuation
> characters. For some reason, known *only* to the people at MS, the
> *standard* ASCII characters are not good enough.

Purposeful un-interoperability is the reason.

Loki Harfagr

Jul 6, 2009, 5:35:56 PM

Mon, 06 Jul 2009 19:26:45 +0000, jellybean stonerfish did cat :

>> MS uses their own exotic versions of many common *ASCII* punctuation
>> characters. For some reason, known *only* to the people at MS, the
>> *standard* ASCII characters are not good enough.
>
> Purposeful un-interoperability is the reason.

/modquote

Blumf

Jul 7, 2009, 5:02:20 AM

Robert Heller wrote:
> MS uses their own exotic versions of many common *ASCII* punctuation
> characters. For some reason, known *only* to the people at MS, the
> *standard* ASCII characters are not good enough.

Not exactly true. They're using matched curling quote marks, have been
since Word 6 IIRC. Very irritating when you wanted just plain quotes but
it is a standard now, covered by Unicode, and the PHB likes it (always a
MS priority. What? You think they got rich by making good software? :) )

Quick google turned up this which seems to cover the basics:
http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

Should be easy enough to transcode the fancy quotes to old-fashioned
neutral ones with a bit of sed or something.
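Here is a minimal sed sketch of that transcoding. It assumes the text is already UTF-8 (e.g. after converting it from cp1252), where the curly quotes are U+2018/U+2019 and U+201C/U+201D, each a three-byte sequence. The function name is hypothetical. The patterns are built with printf octal escapes so the script itself stays plain ASCII:

```shell
# Hypothetical helper: replace UTF-8 curly quotes with neutral ASCII ones.
fix_curly_quotes() {
    lsq=$(printf '\342\200\230')   # U+2018 LEFT SINGLE QUOTATION MARK
    rsq=$(printf '\342\200\231')   # U+2019 RIGHT SINGLE QUOTATION MARK
    ldq=$(printf '\342\200\234')   # U+201C LEFT DOUBLE QUOTATION MARK
    rdq=$(printf '\342\200\235')   # U+201D RIGHT DOUBLE QUOTATION MARK
    # LC_ALL=C makes sed match the byte sequences literally in any locale.
    LC_ALL=C sed -e "s/$lsq/'/g" -e "s/$rsq/'/g" \
                 -e "s/$ldq/\"/g" -e "s/$rdq/\"/g"
}

# Example (file names assumed): fix_curly_quotes < in.txt > out.txt
```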

Blumf

~kurt

Jul 7, 2009, 8:52:23 PM

Blumf <bl...@hotSPAMmail.com> wrote:
>
> Should be easy enough to transcode the fancy quotes to old-fashioned
> neutral ones with a bit of sed or something.

I'm not sure why it was expected that opening and closing quotes would
be the same in a Word document, or any other word processing, desktop
publishing, or document markup language. They are not the same
character in real formatted text.

- Kurt

Robert Heller

Jul 7, 2009, 10:53:25 PM

True. But Microsoft doesn't even use the ASCII apostrophe for apostrophes
in MS-Word documents! LaTeX uses ` and ' (`` and '' for doubles).

>
> - Kurt

~kurt

Jul 9, 2009, 12:05:57 AM

Robert Heller <hel...@deepsoft.com> wrote:
> At Tue, 07 Jul 2009 19:52:23 -0500 ~kurt <actino...@earthlink.net> wrote:
>>
>> I'm not sure why it was expected that opening and closing quotes would
>> be the same in a Word document, or any other word processing, desktop
>> publishing, or document markup language. They are not the same
>> character in real formatted text.
>
> True. But Microsoft doesn't even use the ASCII apostrophe for apostrophes
> in MS-Word documents! LaTeX uses ` and ' (`` and '' for doubles).

You got me curious - neither does LaTeX. The .tex input file is ASCII -
by definition....

The .dvi file uses 0x60 (`) and 0x27 (') for single quotes. It also
uses 0x22 (") for closing double quotes. But, the .dvi file appears to
use 0x5c for opening double quotes?

Either way, the MS codes for this situation seem to be fairly well
documented. No idea if MS supplied them, or if they had to be reverse
engineered. It is their format, they should be able to do whatever they
want with it. I really can't fault them here.

- Kurt

Maxwell Lol

Jul 9, 2009, 7:13:42 AM

~kurt <actino...@earthlink.net> writes:

>> True. But Microsoft doesn't even use the ASCII apostrophe for apostrophes
>> in MS-Word documents! LaTeX uses ` and ' (`` and '' for doubles).
>

> Either way, the MS codes for this situation seem to be fairly well
> documented. No idea if MS supplied them, or if they had to be reverse
> engineered. It is their format, they should be able to do whatever they
> want with it. I really can't fault them here.

people use them in email, and if you run a mailing list that converts
single messages into digest mode, it's one of those things you have to
convert....

~kurt

Jul 9, 2009, 9:47:03 PM

Maxwell Lol <nos...@com.invalid> wrote:
>
> people use them in email, and if you run a mailing list that converts
> single messages into digest mode, it's one of those things you have to
> convert....

Well, people are stupid. I still can't understand what exactly people
are gaining where I work with all the fancy formatted email that gets
sent around nowadays. It is as bad as the people who feel the need to
send a jpg image inside of a PowerPoint presentation (no joke, seen it
happen), or an Excel spreadsheet for a short list of items. Hell, some
morons actually compose email in Word, and then send that as an
attachment!

- Kurt

Blumf

Jul 10, 2009, 4:51:17 AM

~kurt wrote:
> Well, people are stupid. I still can't understand what exactly people
> are gaining where I work with all the fancy formatted email that gets
> sent around nowadays. It is as bad as the people who feel the need to
> send a jpg image inside of a PowerPoint presentation (no joke, seen it
> happen)

I can beat that; screen shot of a console window, embedded in a Word
doc. Punch line being, half the text I was interested in had scrolled
off the visible area.

Blumf

Logan Rathbone

Jul 10, 2009, 11:13:06 AM

Wow, and I thought HTML email was annoying...

Honestly, I find that 99% of computer users don't think about file
formats at all. They don't think "ah, this is a Microsoft Word
document, MIME type 'application/msword', and here's the reason I have
chosen to use this format", but rather "this is a Word document. This
is what I use to type stuff."

Damnit, people. *Care* more!

jellybean stonerfish

Jul 10, 2009, 12:31:26 PM

On Fri, 10 Jul 2009 15:13:06 +0000, Logan Rathbone wrote:

> Honestly, I find that 99% of computer users don't think about file
> formats at all. They don't think "ah, this is a Microsoft Word
> document, MIME type 'application/msword', and here's the reason I have
> chosen to use this format", but rather "this is a Word document. This
> is what I use to type stuff."
>
> Damnit, people. *Care* more!

Most users think a .doc file is text.

Dan C

Jul 10, 2009, 1:46:13 PM

It was, back in the heyday of MSDOS...


--
"Ubuntu" -- an African word, meaning "Slackware is too hard for me".
The Usenet Improvement Project: http://improve-usenet.org
Ahhhhhhhh!: http://brandybuck.site40.net/pics/relieve.jpg

Auric__

Jul 10, 2009, 2:00:03 PM

On Fri, 10 Jul 2009 17:46:13 GMT, Dan C wrote:

> On Fri, 10 Jul 2009 16:31:26 +0000, jellybean stonerfish wrote:
>
>> On Fri, 10 Jul 2009 15:13:06 +0000, Logan Rathbone wrote:
>>
>>> Honestly, I find that 99% of computer users don't think about file
>>> formats at all. They don't think "ah, this is a Microsoft Word
>>> document, MIME type 'application/msword', and here's the reason I have
>>> chosen to use this format", but rather "this is a Word document. This
>>> is what I use to type stuff."
>>>
>>> Damnit, people. *Care* more!
>>
>> Most users think a .doc file is text.
>
> It was, back in the heyday of MSDOS...

...twenty years ago. (I still sometimes find software with plain-text .doc
files. Kinda rare, but still happens.)

--
Alcohol makes you immune to gravity. And bulletproof.

~kurt

Jul 10, 2009, 11:04:31 PM

Auric__ <not.m...@email.address> wrote:
>
> ...twenty years ago. (I still sometimes find software with plain-text .doc
> files. Kinda rare, but still happens.)

Yeah, a library in my current project has a bunch of .doc documentation
files that are all ASCII.

- Kurt

Peter Chant

Jul 11, 2009, 10:48:26 AM

Blumf wrote:

> Not exactly true. They're using matched curling quote marks, have been
> since Word 6 IIRC. Very irritating when you wanted just plain quotes but
> it is a standard now, covered by Unicode, and the PHB likes it (always a
> MS priority. What? You think they got rich by making good software? :) )

To be fair, m4, which is used to configure sendmail, uses the very odd
quote character at the top left of the keyboard; IIRC so does bash.

Pete

--
http://www.petezilla.co.uk

Robert Heller

Jul 11, 2009, 11:12:17 AM

The backquote character is a legitimate ASCII character. M$ uses 8-bit
character codes *instead* of available ASCII characters, and does so
where they shouldn't (like in Hotmail.com et al. footers).

>
> Pete

Grant

Jul 11, 2009, 4:38:01 PM

On Sat, 11 Jul 2009 15:48:26 +0100, Peter Chant <REMpe...@CAPpetezilla.ITALSco.uk> wrote:

>To be fair, m4, which is used to configure sendmail, uses the very odd
>quote character at the top left of the keyboard; IIRC so does bash.

I don't use m4 here (directly). Its common name is backtick '`'.

In recommended usage, bash no longer uses the backtick: command
substitution in new shell scripts uses $() instead, for example
'timestamp=$(date +%F-%T)' instead of the obsolete backtick version
'timestamp=`date +%F-%T`'.
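One commonly cited reason to prefer $() is that it nests without escaping. With backticks, an inner substitution needs its own backticks escaped. Here is a small sketch; the variable names are only illustrative:

```shell
# Both assignments end up holding the string "inner: hi".
# With $(), the inner substitution nests as-is.
new_style=$(echo "inner: $(echo hi)")
# With backticks, the inner pair has to be escaped as \` ... \`.
old_style=`echo "inner: \`echo hi\`"`
echo "$new_style"
echo "$old_style"
```

With deeper nesting, the backslashes in the backtick form multiply, while the $() form stays readable.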

Slackware scripts are full of backticks, showing their incredible age ;)

I miss the days (daze?) of 7-bit ascii, so much easier back then, at
least for those with English as only language and only the odd US English
spelling issue to worry about :)

And WordStar used a character's high bit (msb, bit 7) as a word
delimiter in document mode.
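Because WordStar's marker lived in bit 7, a quick way to read such a file as plain ASCII is to clear that bit on every byte. Here is a minimal sketch with tr. It assumes the file is otherwise 7-bit text, and any genuine 8-bit characters would be mangled too. The function name is hypothetical:

```shell
# Hypothetical helper: clear bit 7 of every byte by mapping
# 0x80-0xFF (octal 200-377) onto 0x00-0x7F (octal 000-177).
# LC_ALL=C keeps tr working on raw bytes, not multibyte characters.
strip_high_bit() {
    LC_ALL=C tr '\200-\377' '\000-\177'
}

# Example (file names assumed): strip_high_bit < letter.ws > letter.txt
```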

Grant.
--
http://bugsplatter.id.au

Grant

Jul 11, 2009, 4:49:26 PM

On Sat, 11 Jul 2009 10:12:17 -0500, Robert Heller <hel...@deepsoft.com> wrote:

>The backquote character is a legitimate ASCII character. M$ uses 8-bit
>character codes *instead* of available ASCII characters, and does so
>where they shouldn't (like in Hotmail.com et al. footers).

MSFT does all sorts of things they shouldn't -- virtual monopolies are
like that...

Recently I viewed a .pdf file where the author had let the .pdf engine
replace all straight ("unisex") quotes with 66/99 and 6/9 style quotes.
Okay, you think? No: it destroyed all the scripts in the document, because
a straight-quoted value like "a" became ``a´´ (a pair each of #96 'grave
accent', i.e. backtick, and #180 'acute accent').
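Before running a script copied out of such a PDF, it can help to list every line that contains a non-ASCII byte (curly quotes, ´, and friends). Here is a minimal sketch. It assumes grep's -a and -n options and uses the C locale, so every byte above 0x7E counts as non-printable. The function name is hypothetical:

```shell
# Hypothetical helper: print "LINE:text" for every line of FILE containing
# a byte that is neither printable ASCII nor whitespace.
# -a forces grep to treat the file as text even if it looks binary.
find_non_ascii() {
    LC_ALL=C grep -an '[^[:print:][:space:]]' "$1"
}

# Example (file name assumed): find_non_ascii copied-from-pdf.sh
```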

It's a mess, yes?

Grant.
--
http://bugsplatter.id.au

Auric__

Jul 13, 2009, 11:41:13 AM

On Sat, 11 Jul 2009 20:49:26 GMT, Grant wrote:

> On Sat, 11 Jul 2009 10:12:17 -0500, Robert Heller <hel...@deepsoft.com>
> wrote:
>
>>The backquote character is a legitimate ASCII character. M$ uses 8-bit
>>character codes *instead* of available ASCII characters, and does so
>>where they shouldn't (like in Hotmail.com et al. footers).
>
> MSFT does all sorts of things they shouldn't -- virtual monopolies are
> like that...

"Virtual"?

--
Armageddon, here we come!
