"Jo-Anne" <
Jo-...@nowhere.com> wrote
| ��� Actors: Melvyn Douglas, Shirley MacLaine, Peter Sellers
| ��� Directors: Hal Ashby
| ��� Format: Special Edition, Subtitled, Widescreen
|
| Any idea of what's happening and what I can do to stop it?
|
Maybe try View -> Character encoding -> UTF-8?
It's a complex issue and your case seems to be quirky.
I don't know what was on the webpage, but the text
you pasted is UTF-8 code for 3 question marks in diamonds.
The character values are EF BF BD.
http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%F6&mode=char
That is, 3 bytes are used to display each question mark.
You're viewing the text as ANSI, so each of those bytes
shows up as its own character. The question is why there are 3 ?
in diamonds in the first place. Maybe they were supposed
to be emoticons or something you don't have a font for?
I'm not sure. In other words, you're seeing a corruption
of text that's of no value to you, anyway. :)
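If you're curious, here's a rough Python 3 sketch of exactly that
(Python is just my choice of tool here): the three bytes EF BF BD are
one character when read as UTF-8, but three characters when read as
ANSI (codepage 1252).

    raw = b'\xef\xbf\xbd'
    print(raw.decode('utf-8'))    # one character: U+FFFD, the question mark in a diamond
    print(raw.decode('cp1252'))   # three separate characters: 'ï¿½'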
The problem arises because there are different ways
to render byte values, according to different text encodings.
Everything is always bytes, or numerical values. In
a text file those bytes represent characters. But there
are different ways to do that.
ASCII text uses the byte values 0-127 to display
English characters and a few others -- basically what's
on your keyboard. In the early days that was all anyone
needed.
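A quick way to see that, in the same Python sketch style:

    for ch in 'A', 'z', '7', '#':
        print(ch, ord(ch))         # A 65, z 122, 7 55, # 35 -- all under 128
    print('fox'.encode('ascii'))   # b'fox' -- three bytes, one per character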
ANSI, which was most common until recently, uses all
256 values in a byte. One byte per character. What's
displayed depends on the system "codepage". The first
128 are ASCII, but from 128-255 depends on language
settings. In other words, on your computer, EF in ANSI
(character 239) is an i with an umlaut. On a Russian
machine it will probably be a Russian character. In UTF-8,
in this case, it's only part of the code for one character.
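To make that concrete, a small sketch (the codepage names here are the
ones Python uses: 'cp1252' for Western, 'cp1251' for Russian):

    b = b'\xef'                 # byte value EF, i.e. 239
    print(b.decode('cp1252'))   # 'ï' -- i with an umlaut on a Western setup
    print(b.decode('cp1251'))   # 'п' -- a Cyrillic letter on a Russian setup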
Mike Easter referred to "char set 1252". That's the ANSI
codepage most commonly used for Western text. (1033, which
you may also see, is the Windows locale ID for US English.)
Codepage, character set, character encoding...
those all refer to how ANSI text is displayed above
byte value 127.
ANSI allows Western characters to be represented
by one byte. With globalization that's not good enough.
Languages like Japanese and Chinese won't fit. In order
to fit all characters we have Unicode, which in its basic
form uses 2 bytes per character. 0-127 are still ASCII. Beyond
that, there's room for about 65K more characters.
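Roughly, in the same sketch style (Python assumed):

    print(ord('A'))    # 65 -- same value as in ASCII
    print(ord('é'))    # 233 -- still fits in one byte under ANSI
    print(ord('日'))   # 26085 -- one of the ~65K characters that need 2 bytes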
UTF-8 is a compromise. Rather than use 2 bytes for
all characters and break a lot of software, it's a way
to render unicode as multi-byte, in which plain ASCII text
doesn't have to be changed. So it's less of
a jarring transition than it would be to suddenly convert
everything to 2-byte unicode. UTF-8 has become the
standard in most webpages. UTF-8 is still ASCII for byte
values 0-127, but beyond that it uses 2-4 bytes to display
a character. So "quick brown fox" is identical at the byte
level in ASCII, ANSI, or UTF-8. But emoticons, various
marks, and non-Western languages are displayed using
2-4 bytes each.
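Here's a small sketch of that point (Python again, with 'cp1252'
standing in for ANSI):

    s = 'quick brown fox'
    print(s.encode('ascii') == s.encode('cp1252') == s.encode('utf-8'))  # True
    print('é'.encode('cp1252'))   # b'\xe9' -- one byte in ANSI
    print('é'.encode('utf-8'))    # b'\xc3\xa9' -- two bytes in UTF-8
    print('€'.encode('utf-8'))    # b'\xe2\x82\xac' -- three bytes in UTF-8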
So what does all that mean? Usually it's not an issue.
That's probably why you haven't seen a problem before.
If you see funky characters you can try viewing as UTF-8.
In this case it's best to just remove the text. It serves
no purpose for you.
In browsers you usually won't have to be concerned.
The webpage encoding tells the browser how to interpret
the text. The one place where a problem might arise would
be if you save a UTF-8 encoded webpage, edit it, then
save that as ANSI text. You could end up with stuff like
capital A with an accent over it littering the page. Microsoft's
site is one that does that. They use UTF-8 for spaces and
curly quotes, even though the webpage is in English. The
result is that it has to be kept as UTF-8 or else the
corrupted characters (from an ANSI point of view) have to
be removed.
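If you want to see where that A-with-an-accent debris comes from, here's
a rough sketch (Python assumed; a curly quote and a non-breaking space
are my guesses at the kind of characters involved):

    curly = '\u201c'                               # left curly quote
    print(curly.encode('utf-8'))                   # b'\xe2\x80\x9c' -- 3 bytes in UTF-8
    print(curly.encode('utf-8').decode('cp1252'))  # 'â€œ' -- what an ANSI view shows
    nbsp = '\u00a0'                                # non-breaking space
    print(nbsp.encode('utf-8').decode('cp1252'))   # 'Â ' -- the stray capital A with an accent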
For the record, Notepad has handled different encodings
for a long time. When you save a file you'll see you have
options. If you try to save one encoding type as another,
Notepad will warn you. You could play around with that
to get a sense of the differences.