
A few questions about encoding


Νικόλαος Κούρας

Jun 9, 2013, 6:44:57 AM
A few questions about encoding please:

>> Since 1 byte can hold up to 256 chars, why not utf-8 use 1-byte for
>> values up to 256?

>Because then how do you tell when you need one byte, and when you need
>two? If you read two bytes, and see 0x4C 0xFA, does that mean two
>characters, with ordinal values 0x4C and 0xFA, or one character with
>ordinal value 0x4CFA?

I mean utf-8 could use 1 byte for storing the 1st 256 characters. I meant up to 256, not above 256.


>> UTF-8 and UTF-16 and UTF-32
>> I though the number beside of UTF- was to declare how many bits the
>> character set was using to store a character into the hdd, no?

>Not exactly, but close. UTF-32 is completely 32-bit (4 byte) values.
>UTF-16 mostly uses 16-bit values, but sometimes it combines two 16-bit
>values to make a surrogate pair.

Is a surrogate pair like hitting, for example, Ctrl-A, meaning a combination character that consists of 2 different characters?
Is this what a surrogate is? A pair of 2 chars?


>UTF-8 uses 8-bit values, but sometimes
>it combines two, three or four of them to represent a single code-point.

'a' to be utf8 encoded needs 1 byte to be stored ? (since ordinal = 97)
'α΄' to be utf8 encoded needs 2 bytes to be stored ? (since ordinal is > 127 )
'a chinese ideogram' to be utf8 encoded needs 4 bytes to be stored ? (since ordinal > 65000 )

Does the number of bytes needed to store a character depend solely on the character's ordinal value in the Unicode table?

Fábio Santos

Jun 9, 2013, 8:18:08 AM
to Νικόλαος Κούρας, pytho...@python.org


In short, a UTF-8 character takes 1 to 4 bytes. A UTF-16 character takes 2 or 4 bytes. A UTF-32 character always takes 4 bytes.

The process of converting characters to bytes is called encoding. The opposite, converting bytes back to characters, is decoding. Python handles both with the encode() and decode() methods. You normally don't need to care about this kind of thing.
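
A minimal sketch of the byte counts and the encode/decode round trip described above. The sample characters are illustrative choices, and the "-le" codec names are used only so that no byte-order mark is counted:

```python
# Encoding turns a str (characters) into bytes; decoding goes back.
samples = ["a", "α", "䁚", "\U0001D401"]  # ASCII, Greek, CJK, beyond the BMP

sizes = {}
for ch in samples:
    # Byte counts in UTF-8, UTF-16 and UTF-32 (little-endian, no BOM).
    sizes[ch] = tuple(len(ch.encode(codec))
                      for codec in ("utf-8", "utf-16-le", "utf-32-le"))
    # Round trip: decoding the encoded bytes gives the original character.
    assert ch.encode("utf-8").decode("utf-8") == ch
    print("U+%04X" % ord(ch), sizes[ch])
```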

Nobody

Jun 9, 2013, 1:01:06 PM
On Sun, 09 Jun 2013 03:44:57 -0700, Νικόλαος Κούρας wrote:

>>> Since 1 byte can hold up to 256 chars, why not utf-8 use 1-byte for
>>> values up to 256?
>
>>Because then how do you tell when you need one byte, and when you need
>>two? If you read two bytes, and see 0x4C 0xFA, does that mean two
>>characters, with ordinal values 0x4C and 0xFA, or one character with
>>ordinal value 0x4CFA?
>
> I mean utf-8 could use 1 byte for storing the 1st 256 characters. I
> meant up to 256, not above 256.

But then you've used up all 256 possible bytes for storing the first 256
characters, and there aren't any left for use in multi-byte sequences.

You need some means to distinguish between a single-byte character and an
individual byte within a multi-byte sequence.

UTF-8 does that by allocating specific ranges to specific purposes.
0x00-0x7F are single-byte characters, 0x80-0xBF are continuation bytes of
multi-byte sequences, 0xC0-0xFF are leading bytes of multi-byte sequences.
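
The three ranges can be sketched as a small classifier. The function name is hypothetical, and the sketch follows the ranges exactly as stated above; it does not check that a few values in the leading range (0xC0, 0xC1 and 0xF5-0xFF) never occur in valid UTF-8:

```python
def classify_utf8_byte(byte):
    """Classify one byte value using the ranges given above."""
    if byte <= 0x7F:
        return "single"        # 0x00-0x7F: a complete one-byte character
    if byte <= 0xBF:
        return "continuation"  # 0x80-0xBF: inside a multi-byte sequence
    return "lead"              # 0xC0-0xFF: starts a multi-byte sequence

encoded = "aα䁚".encode("utf-8")  # one 1-byte, one 2-byte, one 3-byte character
kinds = [classify_utf8_byte(b) for b in encoded]
print(kinds)
```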

This scheme has the advantage of making UTF-8 non-modal, i.e. if a byte is
corrupted, added or removed, it will only affect the character containing
that particular byte; the decoder can re-synchronise at the beginning of
the following character.

OTOH, with encodings such as UTF-16, UTF-32 or ISO-2022, adding or
removing a byte will result in desynchronisation, with all subsequent
characters being corrupted.
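
A rough illustration of the difference, assuming Python 3 and its errors="replace" handler (the sample text and the choice of which byte to drop are arbitrary):

```python
text = "αβγ"

# UTF-8: drop the second byte of 'α' (its continuation byte).
utf8_damaged = text.encode("utf-8")
utf8_damaged = utf8_damaged[:1] + utf8_damaged[2:]
utf8_result = utf8_damaged.decode("utf-8", errors="replace")

# UTF-16 (big-endian, no BOM): drop the very first byte.
utf16_damaged = text.encode("utf-16-be")[1:]
utf16_result = utf16_damaged.decode("utf-16-be", errors="replace")

print(ascii(utf8_result))   # only the damaged character is lost
print(ascii(utf16_result))  # every following character is misread
```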

> A surrogate pair is like itting for example Ctrl-A, which means is a
> combination character that consists of 2 different characters? Is this
> what a surrogate is? a pari of 2 chars?

A surrogate pair is a pair of 16-bit codes used to represent a single
Unicode character whose code is greater than 0xFFFF.

The 2048 codepoints from 0xD800 to 0xDFFF inclusive aren't used to
represent characters, but "surrogates". Unicode characters with codes
in the range 0x10000-0x10FFFF are represented in UTF-16 as a pair of
surrogates. First, 0x10000 is subtracted from the code, giving a value in
the range 0-0xFFFFF (20 bits). The top ten bits are added to 0xD800 to
give a value in the range 0xD800-0xDBFF, while the bottom ten bits are
added to 0xDC00 to give a value in the range 0xDC00-0xDFFF.
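
The arithmetic above can be sketched as follows. The function name is hypothetical, and U+1D401 is used as the sample code point:

```python
def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000       # 20-bit value, 0..0xFFFFF
    high = 0xD800 + (offset >> 10)      # top ten bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom ten bits
    return high, low

high, low = to_surrogate_pair(0x1D401)  # MATHEMATICAL BOLD CAPITAL B
print(hex(high), hex(low))

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
expected = bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
assert chr(0x1D401).encode("utf-16-be") == expected
```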

Because the codes used for surrogates aren't valid as individual
characters, scanning a string for a particular character won't
accidentally match part of a multi-word character.

> 'a' to be utf8 encoded needs 1 byte to be stored ? (since ordinal = 97)
> 'α΄' to be utf8 encoded needs 2 bytes to be stored ? (since ordinal is > 127 )
> 'a chinese ideogramm' to be utf8 encoded needs 4 byte to be stored ? (since ordinal > 65000 )

Most Chinese, Japanese and Korean (CJK) characters have codepoints within
the BMP (i.e. <= 0xFFFF), so they only require 3 bytes in UTF-8. The
codepoints above the BMP are mostly for archaic ideographs (those no
longer in normal use), mathematical symbols, dead languages, etc.

> The amount of bytes needed to store a character solely depends on the
> character's ordinal value in the Unicode table?

Yes. UTF-8 is essentially a mechanism for representing 31-bit unsigned
integers such that smaller integers require fewer bytes than larger
integers (subsequent revisions of Unicode cap the range of possible
codepoints to 0x10FFFF, as that's all that UTF-16 can handle).

Chris “Kwpolska” Warrick

Jun 9, 2013, 1:12:22 PM
to Νικόλαος Κούρας, pytho...@python.org
On Sun, Jun 9, 2013 at 12:44 PM, Νικόλαος Κούρας <nikos...@gmail.com> wrote:
> A few questiosn about encoding please:
>
>>> Since 1 byte can hold up to 256 chars, why not utf-8 use 1-byte for
>>> values up to 256?
>
>>Because then how do you tell when you need one byte, and when you need
>>two? If you read two bytes, and see 0x4C 0xFA, does that mean two
>>characters, with ordinal values 0x4C and 0xFA, or one character with
>>ordinal value 0x4CFA?
>
> I mean utf-8 could use 1 byte for storing the 1st 256 characters. I meant up to 256, not above 256.

It is required so the computer can know where characters begin.
0x0080 (first non-ASCII character) becomes 0xC280 in UTF-8. Further
details here: http://en.wikipedia.org/wiki/UTF-8#Description

>>> UTF-8 and UTF-16 and UTF-32
>>> I though the number beside of UTF- was to declare how many bits the
>>> character set was using to store a character into the hdd, no?
>
>>Not exactly, but close. UTF-32 is completely 32-bit (4 byte) values.
>>UTF-16 mostly uses 16-bit values, but sometimes it combines two 16-bit
>>values to make a surrogate pair.
>
> A surrogate pair is like itting for example Ctrl-A, which means is a combination character that consists of 2 different characters?
> Is this what a surrogate is? a pari of 2 chars?

http://en.wikipedia.org/wiki/UTF-16#Code_points_U.2B10000_to_U.2B10FFFF

Long story short: codepoint - 0x10000 (up to 20 bits) → two 10-bit
numbers → 0xD800 + first_half, 0xDC00 + second_half. Rephrasing:

We take MATHEMATICAL BOLD CAPITAL B (U+1D401). If you have UTF-8: 𝐁

It is over 0xFFFF, so we need to use a surrogate pair. Subtracting
0x10000, we end up with 0xD401, or 0b1101010000000001. That is a 16-bit
number and we need a 20-bit one, so we throw in some leading
zeroes and end up with 0b00001101010000000001. Split it in half and
we get 0b0000110101 and 0b0000000001, which we can now shorten to
0b110101 and 0b1, or translate to hex as 0x0035 and 0x0001. 0xD800 +
0x0035 and 0xDC00 + 0x0001 → 0xD835 0xDC01. Type it into python and:

>>> b'\xD8\x35\xDC\x01'.decode('utf-16be')
'𝐁'

And before you ask: that “BE” stands for Big-Endian. Little-Endian
would mean reversing the bytes in each 16-bit code unit, which would make it
'\x35\xD8\x01\xDC' (it is easiest to see with the first 256 characters:
'a', U+0061, is stored as 0x61 0x00 in a little-endian encoding).

Another question you may ask is whether surrogates can clash with real
characters: 0xD800…0xDFFF are reserved in Unicode
for the purposes of UTF-16, so there are no conflicts.
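
A short check of the byte order, assuming Python 3; the codec names "utf-16-be" and "utf-16-le" select the order explicitly and write no byte-order mark:

```python
bold_b = "\U0001D401"  # MATHEMATICAL BOLD CAPITAL B

big = bold_b.encode("utf-16-be")     # high byte of each 16-bit unit first
little = bold_b.encode("utf-16-le")  # low byte of each 16-bit unit first
print(big, little)

# For a BMP character such as 'a' (U+0061) only one 16-bit unit is involved.
a_big = "a".encode("utf-16-be")
a_little = "a".encode("utf-16-le")
print(a_big, a_little)
```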

>>UTF-8 uses 8-bit values, but sometimes
>>it combines two, three or four of them to represent a single code-point.
>
> 'a' to be utf8 encoded needs 1 byte to be stored ? (since ordinal = 97)
> 'α΄' to be utf8 encoded needs 2 bytes to be stored ? (since ordinal is > 127 )

yup. α is at 0x03B1, or 945 decimal.

> 'a chinese ideogramm' to be utf8 encoded needs 4 byte to be stored ? (since ordinal > 65000 )

Not necessarily, as CJK characters start at U+2E80, which is in the
3-byte range (0x0800 through 0xFFFF) — the table is here:
http://en.wikipedia.org/wiki/UTF-8#Description

--
Kwpolska <http://kwpolska.tk> | GPG KEY: 5EAAEA16
stop html mail | always bottom-post
http://asciiribbon.org | http://caliburn.nl/topposting.html

Νικόλαος Κούρας

Jun 12, 2013, 5:09:05 AM
>> (*) infact UTF8 also indicates the end of each character

> Up to a point. The initial byte encodes the length and the top few
> bits, but the subsequent octets aren’t distinguishable as final in
> isolation. 0x80-0xBF can all be either medial or final.


So, the first high bits are a directive that UTF-8 uses to know how many
bytes each character is being represented as.

0-127 codepoints (characters) use 1 bit to signify they need 1 byte for
storage and the remaining 7 bits to actually store the character?

while

128-256 codepoints (characters) use 2 bits to signify they need 2 bytes for
storage and the remaining 14 bits to actually store the character?

Isn't 14 bits way too many to store a character?

Steven D'Aprano

Jun 12, 2013, 5:24:42 AM
On Wed, 12 Jun 2013 09:09:05 +0000, Νικόλαος Κούρας wrote:

> Isn't 14 bits way to many to store a character ?

No.

There are 1114111 possible characters in Unicode. (And in Japan, they
sometimes use TRON instead of Unicode, which has even more.)

If you list out all the combinations of 14 bits:

0000 0000 0000 00
0000 0000 0000 01
0000 0000 0000 10
0000 0000 0000 11
[...]
1111 1111 1111 10
1111 1111 1111 11

you will see that there are only 32767 (2**15-1) such values. You can't
fit 1114111 characters with just 32767 values.



--
Steven

Νικόλαος Κούρας

Jun 12, 2013, 7:23:49 AM
Thanks Steven,
So, how many bytes does UTF-8 use to store codepoints > 127?

For example, for codepoints 256, 1345, 16474?

Dave Angel

Jun 12, 2013, 8:43:05 AM
to pytho...@python.org
Actually, it's worse. There are 16384 such values (2**14), assuming you
include null, which you did in your list.
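
These counts can be checked at the interactive prompt. In this sketch the variable names are arbitrary; note that the highest code point is 1114111 (0x10FFFF), so counting zero there are 1114112 code points in total:

```python
values_in_14_bits = 2 ** 14            # every combination of 14 bits, zero included
unicode_code_points = 0x10FFFF + 1     # U+0000 through U+10FFFF inclusive
bits_needed = (0x10FFFF).bit_length()  # bits required for the largest code point

print(values_in_14_bits, unicode_code_points, bits_needed)
```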

--
DaveA

Ulrich Eckhardt

Jun 12, 2013, 8:52:09 AM
Am 12.06.2013 13:23, schrieb Νικόλαος Κούρας:
> So, how many bytes does UTF-8 stored for codepoints > 127 ?

What has your research turned up? I personally consider it lazy and
disrespectful to get lots of pointers that you could use for further
research and then ask for more info before you have even followed these links.


> example for codepoint 256, 1345, 16474 ?

Yes, examples exist. Gee, if there only was an information network that
you could access and where you could locate information on various
programming-related topics somehow. Seriously, someone should invent
this thing! But still, even without it, you have all the tools (i.e.
Python) in your hand to generate these examples yourself! Check out ord,
bin, encode, decode for a start.


Uli

Nobody

Jun 12, 2013, 4:30:23 PM
On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote:

> So, how many bytes does UTF-8 stored for codepoints > 127 ?

U+0000..U+007F 1 byte
U+0080..U+07FF 2 bytes
U+0800..U+FFFF 3 bytes
>=U+10000 4 bytes

So, 1 byte for ASCII, 2 bytes for other Latin characters, Greek, Cyrillic,
Arabic, and Hebrew, 3 bytes for Chinese/Japanese/Korean, 4 bytes for dead
languages and mathematical symbols.
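
The table can be sketched as a function and checked against Python's own encoder for the three code points asked about. The function name is hypothetical:

```python
def utf8_length(code_point):
    """Number of bytes UTF-8 uses for a code point, following the table above."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    if code_point <= 0x10FFFF:
        return 4
    raise ValueError("not a Unicode code point")

lengths = {cp: utf8_length(cp) for cp in (256, 1345, 16474)}
for cp, n in lengths.items():
    # Compare with what the real encoder produces.
    assert n == len(chr(cp).encode("utf-8"))
print(lengths)
```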

The mechanism used by UTF-8 allows sequences of up to 6 bytes, for a total
of 31 bits, but UTF-16 is limited to U+10FFFF (slightly more than 20 bits).

Steven D'Aprano

Jun 12, 2013, 8:13:34 PM
On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote:

> So, how many bytes does UTF-8 stored for codepoints > 127 ?

Two, three or four, depending on the codepoint.


> example for codepoint 256, 1345, 16474 ?

You can do this yourself. I have already given you enough information in
previous emails to answer this question on your own, but here it is again:

Open an interactive Python session, and run this code:

c = ord(16474)
len(c.encode('utf-8'))


That will tell you how many bytes are used for that example.



--
Steven

Steven D'Aprano

Jun 12, 2013, 9:40:44 PM
On Wed, 12 Jun 2013 21:30:23 +0100, Nobody wrote:

> The mechanism used by UTF-8 allows sequences of up to 6 bytes, for a
> total of 31 bits, but UTF-16 is limited to U+10FFFF (slightly more than
> 20 bits).

Same with UTF-8 and UTF-32, both of which are limited to U+10FFFF because
that is what Unicode is limited to.

The *mechanism* of UTF-8 can go up to 6 bytes (or even 7 perhaps?), but
that's not UTF-8, that's UTF-8-plus-extra-codepoints. Likewise the
mechanism of UTF-32 could go up to 0xFFFFFFFF, but doing so means you
don't have Unicode chars any more, and hence your byte-string is not
valid UTF-32:

py> b = b'\xFF'*8
py> b.decode('UTF-32')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)


--
Steven

Chris Angelico

Jun 12, 2013, 10:01:55 PM
to pytho...@python.org
On Thu, Jun 13, 2013 at 11:40 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> The *mechanism* of UTF-8 can go up to 6 bytes (or even 7 perhaps?), but
> that's not UTF-8, that's UTF-8-plus-extra-codepoints.

And a proper UTF-8 decoder will reject "\xC0\x80" and "\xed\xa0\x80",
even though mathematically they would translate into U+0000 and U+D800
respectively. The UTF-16 *mechanism* is limited to no more than
Unicode has currently used, but I'm left wondering if that's actually
the other way around - that Unicode planes were deemed to stop at the
point where UTF-16 can't encode any more. Not that it matters; with
most of the current planes completely unallocated, it seems unlikely
we'll be needing more.
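
A quick way to see both rejections, assuming the strict UTF-8 decoder in Python 3 (the list name is arbitrary):

```python
rejected = []
for raw in (b"\xc0\x80", b"\xed\xa0\x80"):  # overlong U+0000, surrogate U+D800
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        rejected.append(raw)
        print(raw, "rejected:", exc.reason)
```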

ChrisA

Νικόλαος Κούρας

Jun 13, 2013, 2:09:19 AM
On 13/6/2013 3:13 πμ, Steven D'Aprano wrote:
> On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote:
>
>> So, how many bytes does UTF-8 stored for codepoints > 127 ?
>
> Two, three or four, depending on the codepoint.

The number of bytes needed by UTF-8 to store a code-point (character)
depends on the ordinal value of the code-point in the Unicode charset,
correct?

If this is correct, then the higher the ordinal value (which is a decimal
integer) in the Unicode charset, the more bytes are needed for storage.

It's like the bigger a decimal integer is, the bigger the binary number it
produces.

Is this correct?


>> example for codepoint 256, 1345, 16474 ?
>
> You can do this yourself. I have already given you enough information in
> previous emails to answer this question on your own, but here it is again:
>
> Open an interactive Python session, and run this code:
>
> c = ord(16474)
> len(c.encode('utf-8'))
>
>
> That will tell you how many bytes are used for that example.
This is actually wrong.

ord()'s argument must be a character whose ordinal value we want.

>>> chr(16474)
'䁚'

Some Chinese symbol.
So code-point '䁚' has a Unicode ordinal value of 16474, correct?

and encoding this glyph's ordinal value to binary gives us
the following bytes:

>>> bin(16474).encode('utf-8')
b'0b100000001011010'

Now, we take two symbols out:

the 'b' prefix, which is there to tell us that we are looking at a bytes
object, as well as
the '0b' prefix, which is there to tell us that we are looking at a binary
representation of a bytes object

Thus, we count 15 bits left.
So it says 15 bits, which is 1 bit less than 2 bytes.
Are the above statements correct please?


but thinking this through more and more:

>>> chr(16474).encode('utf-8')
b'\xe4\x81\x9a'
>>> len(b'\xe4\x81\x9a')
3

it seems that the bytestring the encode process produces is of length 3.

So i take it is 3 bytes?

But there is a mismatch between what bin(16474).encode('utf-8') and
chr(16474).encode('utf-8') are telling us here.

Care to explain that too please?






Νικόλαος Κούρας

Jun 13, 2013, 2:21:28 AM
On 12/6/2013 11:30 μμ, Nobody wrote:
> On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote:
>
>> So, how many bytes does UTF-8 stored for codepoints > 127 ?
>
> U+0000..U+007F 1 byte
> U+0080..U+07FF 2 bytes
> U+0800..U+FFFF 3 bytes
> >=U+10000 4 bytes

'U' stands for Unicode code-point which means a character right?

How can you be able to tell up to what character utf-8 needs 1 byte or 2
bytes or 3?


And some of the bytes' bits are used to tell where a code-point's
representation stops, right? I mean, if we have a code-point that needs
2 bytes to be stored, then the high bit must be set to 1 to signify that
this character's encoding stops at 2 bytes.

I just know that 2^8 = 256, which at first look means 256 places, i.e.
256 positions to hold a code-point, which in turn means a character.

We take the high bit out and then we have 2^7, which is enough positions
for 0-127 standard ASCII. The high bit is set to '0' to signify that the char is
encoded in 1 byte.

Please tell me that I understood correctly so far.

But how about for 2 or 3 or 4 bytes?

Am I saying it correctly?



jmfauth

Jun 13, 2013, 2:28:40 AM

------

UTF-8, Unicode (consortium): 1 to 4 *Unicode Transformation Unit*

UTF-8, ISO 10646: 1 to 6 *Unicode Transformation Unit*

(still current, unless really freshly modified)

jmf

Chris Angelico

Jun 13, 2013, 2:48:34 AM
to pytho...@python.org
On Thu, Jun 13, 2013 at 4:21 PM, Νικόλαος Κούρας <sup...@superhost.gr> wrote:
> How can you be able to tell up to what character utf-8 needs 1 byte or 2
> bytes or 3?

You look up Wikipedia, using the handy links that have been put to you
MULTIPLE TIMES.

ChrisA

Steven D'Aprano

Jun 13, 2013, 3:11:08 AM
On Thu, 13 Jun 2013 09:09:19 +0300, Νικόλαος Κούρας wrote:

> On 13/6/2013 3:13 πμ, Steven D'Aprano wrote:

>> Open an interactive Python session, and run this code:
>>
>> c = ord(16474)
>> len(c.encode('utf-8'))
>>
>>
>> That will tell you how many bytes are used for that example.
> This si actually wrong.
>
> ord()'s arguments must be a character for which we expect its ordinal
> value.

Gah!

That's twice I've screwed that up. Sorry about that!


> >>> chr(16474)
> '䁚'
>
> Some Chinese symbol.
> So code-point '䁚' has a Unicode ordinal value of 16474, correct?

Correct.


> where in after encoding this glyph's ordinal value to binary gives us
> the following bytes:
>
> >>> bin(16474).encode('utf-8')
> b'0b100000001011010'

No! That creates a string from 16474 in base two:

'0b100000001011010'

The leading 0b is just syntax to tell you "this is base 2, not base 8
(0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.

Then you encode the string '0b100000001011010' into UTF-8. There are 17
characters in this string, and they are all ASCII characters so they take
up 1 byte each, giving you bytes b'0b100000001011010' (in ASCII form). In
hex form, they are:

b'\x30\x62\x31\x30\x30\x30\x30\x30\x30\x30\x31\x30\x31\x31\x30\x31\x30'

which takes up a lot more room, which is why Python prefers to show ASCII
characters as characters rather than as hex.

What you want is:

chr(16474).encode('utf-8')


[...]
> Thus, there we count 15 bits left.
> So it says 15 bits, which is 1-bit less that 2 bytes. Is the above
> statements correct please?

No. There are 17 BYTES there. The string "0" doesn't get turned into a
single bit. It still takes up a full byte, 0x30, which is 8 bits.


> but thinking this through more and more:
>
> >>> chr(16474).encode('utf-8')
> b'\xe4\x81\x9a'
> >>> len(b'\xe4\x81\x9a')
> 3
>
> it seems that the bytestring the encode process produces is of length 3.

Correct! Now you have got the right idea.
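
To connect the two results, here is a sketch that contrasts the 17 bytes of the text form with the 3 bytes of the character, and packs the 3 bytes by hand. It assumes the standard three-byte UTF-8 layout (1110xxxx 10xxxxxx 10xxxxxx); the variable names are arbitrary:

```python
code_point = 16474                      # 0x405A

# bin() returns text, so encoding it yields one byte per digit character.
as_text = bin(code_point)
assert isinstance(as_text, str)
text_bytes = as_text.encode("utf-8")

# Three-byte UTF-8 layout: 1110xxxx 10xxxxxx 10xxxxxx (16 payload bits).
manual = bytes([
    0xE0 | (code_point >> 12),          # top 4 bits
    0x80 | ((code_point >> 6) & 0x3F),  # middle 6 bits
    0x80 | (code_point & 0x3F),         # low 6 bits
])
print(len(text_bytes), manual, chr(code_point).encode("utf-8"))
```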




--
Steven

Νικόλαος Κούρας

Jun 13, 2013, 3:42:40 AM
On 13/6/2013 10:11 πμ, Steven D'Aprano wrote:

>> >>> chr(16474)
>> '䁚'
>>
>> Some Chinese symbol.
>> So code-point '䁚' has a Unicode ordinal value of 16474, correct?
>
> Correct.
>
>
>> where in after encoding this glyph's ordinal value to binary gives us
>> the following bytes:
>>
>> >>> bin(16474).encode('utf-8')
>> b'0b100000001011010'

An observation here that I would like you to confirm as valid.

1. A code-point and the code-point's ordinal value are associated into a
Unicode charset. They have the so called 1:1 mapping.

So, i was under the impression that by encoding the code-point into
utf-8 was the same as encoding the code-point's ordinal value into utf-8.

That is why i tried to:
bin(16474).encode('utf-8') instead of chr(16474).encode('utf-8')

So, now i believe they are two different things.
The code-point *is what actually* needs to be encoded and *not* its
ordinal value.


> The leading 0b is just syntax to tell you "this is base 2, not base 8
> (0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.

But byte objects are represented as '\x' instead of the aforementioned
'0x'. Why is that?


> No! That creates a string from 16474 in base two:
> '0b100000001011010'

I disagree here.
16474 is a number in base 10. Doing bin(16474) we get the binary
representation of number 16474 and not a string.
Why do you say we receive a string while python presents a binary number?


> Then you encode the string '0b100000001011010' into UTF-8. There are 17
> characters in this string, and they are all ASCII characters to they take
> up 1 byte each, giving you bytes b'0b100000001011010' (in ASCII form).

0b100000001011010 stands for a number in base 2 for me, not a string.
Have I understood something wrong?


Chris Angelico

Jun 13, 2013, 3:58:04 AM
to pytho...@python.org
On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας <sup...@superhost.gr> wrote:
> On 13/6/2013 10:11 πμ, Steven D'Aprano wrote:
>> No! That creates a string from 16474 in base two:
>> '0b100000001011010'
>
> I disagree here.
> 16474 is a number in base 10. Doing bin(16474) we get the binary
> representation of number 16474 and not a string.
> Why you say we receive a string while python presents a binary number?

You can disagree all you like. Steven cited a simple point of fact,
one which can be verified in any Python interpreter. Nikos, you are
flat wrong here; bin(16474) creates a string.

ChrisA

Νικόλαος Κούρας

Jun 13, 2013, 4:08:04 AM
On 13/6/2013 10:58 πμ, Chris Angelico wrote:
> On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας <sup...@superhost.gr> wrote:
>> On 13/6/2013 10:11 πμ, Steven D'Aprano wrote:
>>> No! That creates a string from 16474 in base two:
>>> '0b100000001011010'
>>
>> I disagree here.
>> 16474 is a number in base 10. Doing bin(16474) we get the binary
>> representation of number 16474 and not a string.
>> Why you say we receive a string while python presents a binary number?
>
> You can disagree all you like. Steven cited a simple point of fact,
> one which can be verified in any Python interpreter. Nikos, you are
> flat wrong here; bin(16474) creates a string.

Indeed python enclosed it in single quotes, '0b100000001011010', and not
as 0b100000001011010, which in fact makes it a string.

But since bin(16474) seems to create a string rather than the expected
number (at least in my mind), then how do we get the binary
representation of the number 16474 as a number?

Chris Angelico

Jun 13, 2013, 4:20:38 AM
to pytho...@python.org
In Python 2:
>>> 16474

In Python 3, you have to fiddle around with ctypes, but broadly
speaking, the same thing.

ChrisA

Νικόλαος Κούρας

Jun 13, 2013, 5:41:41 AM
typing 16474 in interactive session both in python 2 and 3 gives back
the number 16474

while we want the binary representation of the number 16474


Nobody

Jun 13, 2013, 6:02:38 AM
On Thu, 13 Jun 2013 12:01:55 +1000, Chris Angelico wrote:

> On Thu, Jun 13, 2013 at 11:40 AM, Steven D'Aprano
> <steve+comp....@pearwood.info> wrote:
>> The *mechanism* of UTF-8 can go up to 6 bytes (or even 7 perhaps?), but
>> that's not UTF-8, that's UTF-8-plus-extra-codepoints.
>
> And a proper UTF-8 decoder will reject "\xC0\x80" and "\xed\xa0\x80", even
> though mathematically they would translate into U+0000 and U+D800
> respectively. The UTF-16 *mechanism* is limited to no more than Unicode
> has currently used, but I'm left wondering if that's actually the other
> way around - that Unicode planes were deemed to stop at the point where
> UTF-16 can't encode any more.

Indeed. 5-byte and 6-byte sequences were originally part of the UTF-8
specification, allowing for 31 bits. Later revisions of the standard
imposed the UTF-16 limit on Unicode as a whole.

Steven D'Aprano

Jun 13, 2013, 7:49:44 AM
On Thu, 13 Jun 2013 12:41:41 +0300, Νικόλαος Κούρας wrote:

>> In Python 2:
>>>>> 16474
> typing 16474 in interactive session both in python 2 and 3 gives back
> the number 16474
>
> while we want the the binary representation of the number 16474

Python does not work that way. Ints *always* display in decimal.
Regardless of whether you enter the int in binary:

py> 0b100000001011010
16474


octal:

py> 0o40132
16474


or hexadecimal:

py> 0x405A
16474


ints always display in decimal. The only way to display in another base
is to build a string showing what the int would look like in a different
base:

py> hex(16474)
'0x405a'

Notice that the return value of bin, oct and hex are all strings. If they
were ints, then they would display in decimal, defeating the purpose!
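
A small sketch of the same point, using only built-ins (the variable names are arbitrary):

```python
n = 16474

as_binary_text = bin(n)               # a str: '0b100000001011010'
as_hex_text = format(n, "#x")         # a str: '0x405a'
bare_bits = format(n, "b")            # same digits as bin(), without the 0b prefix
round_trip = int(as_binary_text, 2)   # parsing the text gives the int back

print(type(as_binary_text).__name__, as_binary_text, bare_bits, as_hex_text)
print(round_trip)                     # displayed in decimal, like any int
```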


--
Steven

Νικόλαος Κούρας

Jun 13, 2013, 10:19:47 AM
On 13/6/2013 2:49 μμ, Steven D'Aprano wrote:

Please confirm these are true statements:

A code-point and the code-point's ordinal value are associated into a
Unicode charset. They have the so called 1:1 mapping.

So, i was under the impression that by encoding the code-point into
utf-8 was the same as encoding the code-point's ordinal value into utf-8.

So, now i believe they are two different things.
The code-point *is what actually* needs to be encoded and *not* its
ordinal value.


> The leading 0b is just syntax to tell you "this is base 2, not base 8
> (0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.

But byte objects are represented as '\x' instead of the aforementioned
'0x'. Why is that?

> ints always display in decimal. The only way to display in another base
> is to build a string showing what the int would look like in a different
> base:
>
> py> hex(16474)
> '0x405a'
>
> Notice that the return value of bin, oct and hex are all strings. If they
> were ints, then they would display in decimal, defeating the purpose!

Thank you, I didn't know that! Indeed it works like this.

To encode a number we have to turn it into a string first.

"16474".encode('utf-8')
b'16474'

That 'b' stands for bytes.
How can I view this bytes object's representation as hex() or as bin()?

============
Also:
>>> len('0b100000001011010')
17

You said this string consists of 17 chars.
Why does the leading '0b' syntax count as bits as well? Shouldn't it be 15
bits instead of 17?




Cameron Simpson

Jun 13, 2013, 9:00:37 PM
to Νικόλαος Κούρας, pytho...@python.org
On 13Jun2013 17:19, Nikos as SuperHost Support <sup...@superhost.gr> wrote:
| A code-point and the code-point's ordinal value are associated into
| a Unicode charset. They have the so called 1:1 mapping.
|
| So, i was under the impression that by encoding the code-point into
| utf-8 was the same as encoding the code-point's ordinal value into
| utf-8.
|
| So, now i believe they are two different things.
| The code-point *is what actually* needs to be encoded and *not* its
| ordinal value.

Because there is a 1:1 mapping, these are the same thing: a code
point is directly _represented_ by the ordinal value, and the ordinal
value is encoded for storage as bytes.

| > The leading 0b is just syntax to tell you "this is base 2, not base 8
| > (0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.
|
| But byte objects are represented as '\x' instead of the
| aforementioned '0x'. Why is that?

You're confusing a "string representation of a single number in
some base (eg 2 or 16)" with the "string-ish representation of a
bytes object".

The former is just notation for writing a number in different bases, eg:

27 base 10
1b base 16
33 base 8
11011 base 2

A common convention, and the one used by hex(), oct() and bin() in
Python, is to prefix the non-base-10 representations with "0x" for
base 16, "0o" for base 8 ("o"ctal) and "0b" for base 2 ("b"inary):

27
0x1b
0o33
0b11011

This allows the human reader or a machine lexer to decide what base
the number is written in, and therefore to figure out what the
underlying numeric value is.

Conversely, consider the bytes object consisting of the values [97,
98, 99, 27, 10]. In ASCII (and UTF-8 and the iso-8859-x encodings)
these may all represent the characters ['a', 'b', 'c', ESC, NL].
So when "printing" a bytes object, which is a sequence of small integers representing
values stored in bytes, it is compact to print:

b'abc\x1b\n'

which is ['a', 'b', 'c', chr(27), newline].

The slosh (\) is the common convention in C-like languages and many
others for representing special characters not directly represented
by themselves. So "\\" for a slosh, "\n" for a newline and "\x1b"
for character 27 (ESC).

The bytes object is still just a sequence of integers, but because
it is very common to have those integers represent text, and very
common to have some text one wants represented as bytes in a direct
1:1 mapping, this compact text form is useful and readable. It is
also legal Python syntax for making a small bytes object.

To demonstrate that this is just a _representation_, run this:

>>> [ i for i in b'abc\x1b\n' ]
[97, 98, 99, 27, 10]

at an interactive Python 3 prompt. See? Just numbers.

| To encode a number we have to turn it into a string first.
|
| "16474".encode('utf-8')
| b'16474'
|
| That 'b' stand for bytes.

Syntactic details. Read this:
http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

| How can i view this byte's object representation as hex() or as bin()?

See above. A bytes is a _sequence_ of values. hex() and bin() format
individual values in hexadecimal or binary respectively. You could
do this:

for value in b'16474':
    print(value, hex(value), bin(value))
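
As a variation on that loop, here is a sketch that builds the whole hex and binary views as strings with format(); the variable names are arbitrary:

```python
data = "16474".encode("utf-8")   # five ASCII digits, one byte each

values = list(data)              # the underlying small integers
hex_view = " ".join(format(v, "02x") for v in data)
bin_view = " ".join(format(v, "08b") for v in data)

print(values)
print(hex_view)
print(bin_view)
```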

Cheers,
--
Cameron Simpson <c...@zip.com.au>

Uhlmann's Razor: When stupidity is a sufficient explanation, there is no need
to have recourse to any other.
- Michael M. Uhlmann, assistant attorney general
for legislation in the Ford Administration

Nick the Gr33k

Jun 14, 2013, 1:34:37 AM
On 14/6/2013 1:46 πμ, Dennis Lee Bieber wrote:
> On Wed, 12 Jun 2013 09:09:05 +0000 (UTC), Νικόλαος Κούρας
> <sup...@superhost.gr> declaimed the following:
>
>>>> (*) infact UTF8 also indicates the end of each character
>>
>>> Up to a point. The initial byte encodes the length and the top few
>>> bits, but the subsequent octets aren’t distinguishable as final in
>>> isolation. 0x80-0xBF can all be either medial or final.
>>
>>
>> So, the first high-bits are a directive that UTF-8 uses to know how many
>> bytes each character is being represented as.
>>
>> 0-127 codepoints(characters) use 1 bit to signify they need 1 bit for
>> storage and the rest 7 bits to actually store the character ?
>>
> Not quite... The leading bit is a 0 -> which means 0..127 are sent
> as-is, no manipulation.

So, in utf-8, the leading bit, which is a zero, is actually a flag to
tell that the code-point needs 1 byte to be stored, and the remaining 7 bits
are for the actual value of code-points 0-127?

>> 128-256 codepoints(characters) use 2 bit to signify they need 2 bits for
>> storage and the rest 14 bits to actually store the character ?
>>
> 128..255 -- in what encoding? These all have the leading bit with a
> value of 1. In 8-bit encodings (ISO-Latin-1) the meaning of those values is
> inherent in the specified encoding and they are sent as-is.

So, latin-iso or greek-iso, the leading 0 is not a flag like it is in
utf-8 encoding because latin-iso and greek-iso and all *-iso use all 8
bits for storage?

But, in utf-8, the leading bit, which is 1, is to tell that the
code-point needs 2 bytes to be stored and the remaining 7 bits are for the
actual value of code-points 128-255?

But why 2 bytes? The leading 1 is a flag and the remaining 7 bits can hold
the encoded value.

But that is not the case, since we know that utf-8 needs 2 bytes to store
code-points 128-255.


> 1110 starts a three byte sequence, 11110 starts a four byte sequence...
> Basically, count the number of leading 1-bits before a 0 bit, and that
> tells you how many bytes are in the multi-byte sequence -- and all bytes
> that start with 10 are supposed to be the continuations of a multibyte set
> (and not a signal that this is a 1-byte entry -- those only have a leading
> 0)
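
The counting rule quoted above can be sketched in a few lines of Python 3. This is only an illustration, not code from the thread: the helper name `utf8_sequence_length` is made up here, and it does no validation beyond counting the leading 1-bits.

```python
def utf8_sequence_length(lead_byte):
    """Guess the length of a UTF-8 sequence from its first byte.

    Hypothetical helper for illustration only.
    Returns 0 when the byte is a continuation byte (10xxxxxx).
    """
    if lead_byte & 0b10000000 == 0:
        return 1                      # 0xxxxxxx: sent as-is
    ones = 0
    mask = 0b10000000
    while lead_byte & mask:           # count the 1-bits before the first 0
        ones += 1
        mask >>= 1
    if ones == 1:
        return 0                      # 10xxxxxx: continuation, not a start
    return ones

for text in ('a', 'α', '€'):
    encoded = text.encode('utf-8')
    # the length read off the first byte should match the real length
    print(text, len(encoded), utf8_sequence_length(encoded[0]))
```

For each of the three sample characters the second and third printed columns should agree.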

Why doesn't it work like this?

leading 0 = 1 byte flag
leading 1 = 2 bytes flag
leading 00 = 3 bytes flag
leading 01 = 4 bytes flag
leading 10 = 5 bytes flag
leading 11 = 6 bytes flag

Wouldn't it be more logical?


> Original UTF-8 allowed for 31-bits to specify a character in the Unicode
> set. It used 6 bytes -- 48 bits total, but 7 bits of the first byte were
> the flag (6 leading 1 bits and a 0 bit), and two bits (leading 10) of each
> continuation.

utf8 6 bytes = 48 bits - 7 bits (from the first byte) - 2 bits (for each
continuation) * 5 = 48 - 7 - 10 = 31 bits indeed to store the actual
code-point. But 2^31 is still a huge number to store any kind of
character, isn't it?
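
The same arithmetic can be checked for every sequence length. A small sketch, assuming the original 1-to-6-byte layout described in the quote; the function name `payload_bits` is invented for this example. Current UTF-8 (RFC 3629) stops at 4 bytes.

```python
def payload_bits(n_bytes):
    """Bits left for the code-point in an n-byte sequence (original UTF-8 layout).

    Hypothetical helper: the lead byte spends n ones plus a zero on the flag,
    and every continuation byte spends 2 bits on its '10' marker.
    """
    if n_bytes == 1:
        return 7                          # 0xxxxxxx
    lead = 8 - (n_bytes + 1)              # bits left in the first byte
    continuation = 6 * (n_bytes - 1)      # 6 bits in each following byte
    return lead + continuation

for n in range(1, 7):
    print(n, payload_bits(n))             # the 6-byte row gives 31
```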





--
What is now proved was at first only imagined!

Zero Piraeus

unread,
Jun 14, 2013, 2:00:57 AM6/14/13
to pytho...@python.org
:

On 14 June 2013 01:34, Nick the Gr33k <sup...@superhost.gr> wrote:
> Why doesn't it work like this?
>
> leading 0 = 1 byte flag
> leading 1 = 2 bytes flag
> leading 00 = 3 bytes flag
> leading 01 = 4 bytes flag
> leading 10 = 5 bytes flag
> leading 11 = 6 bytes flag
>
> Wouldn't it be more logical?

Think about it. Let's say that, as per your scheme, a leading 0
indicates "1 byte" (as is indeed the case in UTF8). What things could
follow that leading 0? How does that impact your choice of a leading
00 or 01 for other numbers of bytes?

... okay, you're obviously going to need to be spoon-fed a little more
than that. Here's a byte:

01010101

Is that a single byte representing a code point in the 0-127 range, or
the first of 4 bytes representing something else, in your proposed
scheme? How can you tell?

Now look at the way UTF8 does it:
<http://en.wikipedia.org/wiki/Utf-8#Description>

Really, follow the link and study the table carefully. Don't continue
reading this until you believe you understand the choices that the
designers of UTF8 made, and why they made them.

Pay particular attention to the possible values for byte 1. Do you
notice the difference between that scheme, and yours:

0xxxxxxx
1xxxxxxx
00xxxxxx
01xxxxxx
10xxxxxx
11xxxxxx

If you don't see it, keep looking until you do ... this email gives
you more than enough hints to work it out. Don't ask someone here to
explain it to you. If you want to become competent, you must use your
brain.

-[]z.

Nick the Gr33k

unread,
Jun 14, 2013, 2:59:59 AM6/14/13
to
On 14/6/2013 4:00 πμ, Cameron Simpson wrote:
> On 13Jun2013 17:19, Nikos as SuperHost Support <sup...@superhost.gr> wrote:
> | A code-point and the code-point's ordinal value are associated into
> | a Unicode charset. They have the so called 1:1 mapping.
> |
> | So, i was under the impression that by encoding the code-point into
> | utf-8 was the same as encoding the code-point's ordinal value into
> | utf-8.
> |
> | So, now i believe they are two different things.
> | The code-point *is what actually* needs to be encoded and *not* its
> | ordinal value.
>
> Because there is a 1:1 mapping, these are the same thing: a code
> point is directly _represented_ by the ordinal value, and the ordinal
> value is encoded for storage as bytes.

So, you are saying that:

chr(16474).encode('utf-8') #being the code-point encoded

ord(chr(16474)).encode('utf-8') #being the code-point's ordinal
encoded which gives an error.

That shows us that a character is what is being encoded to utf-8 but
the character's ordinal cannot be.

So, why do you say "....and the ordinal value is encoded for storage as
bytes." ?


> | > The leading 0b is just syntax to tell you "this is base 2, not base 8
> | > (0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.
> |
> | But byte objects are represented as '\x' instead of the
> | aforementioned '0x'. Why is that?
>
> You're confusing a "string representation of a single number in
> some base (eg 2 or 16)" with the "string-ish representation of a
> bytes object".

>>> bin(16474)
'0b100000001011010'
that is a binary format string representation of number 16474, yes?

>>> hex(16474)
'0x405a'
that is a hexadecimal format string representation of number 16474, yes?

WHILE:

b'abc\x1b\n' = a string representation of a byte, which in turn is a
series of integers, so that makes this a string representation of
integers, is this correct?

\x1b = ESC character

\ = for separating bytes
x = to flag that the following bytes are going to be represented as hex
values? What exactly does 'x' mean here? character perhaps?

Still it's not clear in my head what the difference between '0x1b' and
'\x1b' is:

I think:
0x1b = an integer represented in hex format

\x1b = a character represented in hex format

Is this true?




> | How can i view this byte's object representation as hex() or as bin()?
>
> See above. A bytes is a _sequence_ of values. hex() and bin() print
> individual values in hexadecimal or binary respectively.

>>> for value in b'\x97\x98\x99\x27\x10':
... print(value, hex(value), bin(value))
...
151 0x97 0b10010111
152 0x98 0b10011000
153 0x99 0b10011001
39 0x27 0b100111
16 0x10 0b10000


>>> for value in b'abc\x1b\n':
... print(value, hex(value), bin(value))
...
97 0x61 0b1100001
98 0x62 0b1100010
99 0x63 0b1100011
27 0x1b 0b11011
10 0xa 0b1010


Why do these two give different values when printed?

Nick the Gr33k

unread,
Jun 14, 2013, 3:28:32 AM6/14/13
to
On 14/6/2013 9:00 πμ, Zero Piraeus wrote:
> :
>
> On 14 June 2013 01:34, Nick the Gr33k <sup...@superhost.gr> wrote:
>> Why doesn't it work like this?
>>
>> leading 0 = 1 byte flag
>> leading 1 = 2 bytes flag
>> leading 00 = 3 bytes flag
>> leading 01 = 4 bytes flag
>> leading 10 = 5 bytes flag
>> leading 11 = 6 bytes flag
>>
>> Wouldn't it be more logical?
>
> Think about it. Let's say that, as per your scheme, a leading 0
> indicates "1 byte" (as is indeed the case in UTF8). What things could
> follow that leading 0? How does that impact your choice of a leading
> 00 or 01 for other numbers of bytes?
>
> ... okay, you're obviously going to need to be spoon-fed a little more
> than that. Here's a byte:
>
> 01010101
>
> Is that a single byte representing a code point in the 0-127 range, or
> the first of 4 bytes representing something else, in your proposed
> scheme? How can you tell?

Indeed.

You cannot tell if it stands for 1 byte or a 4 byte sequence:

0 + 1010101 = leading 0 stands for 1byte representation of a code-point

01 + 010101 = leading 01 stands for 4byte representation of a code-point

The problem here in my scheme of how utf8 encoding works is that you
cannot tell whether the flag is '0' or '01'.

The same happens with leading '1' and '11'. You cannot tell what the flag is,
so you cannot know if the Unicode code-point is being represented as a
2-byte sequence or a 6-byte sequence.

Understood
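
The ambiguity just described can be shown mechanically. A minimal sketch, with the two flag tables written out by hand as bit-strings; `matching_flags` is a made-up helper, not anything from the thread or from a library.

```python
# Flags from the proposed scheme and from real UTF-8 (up to 4 bytes),
# mapped to the sequence length they announce (0 = continuation byte).
proposed_flags = {'0': 1, '1': 2, '00': 3, '01': 4, '10': 5, '11': 6}
utf8_flags = {'0': 1, '110': 2, '1110': 3, '11110': 4, '10': 0}

def matching_flags(byte_bits, flags):
    """Return every flag that the given 8-bit string could start with."""
    return sorted(flag for flag in flags if byte_bits.startswith(flag))

print(matching_flags('01010101', proposed_flags))   # ['0', '01']: ambiguous
print(matching_flags('01010101', utf8_flags))       # ['0']: exactly one reading
print(matching_flags('11001110', utf8_flags))       # ['110']
```

In the UTF-8 table no flag is the beginning of another flag, which is why a decoder never has to guess.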


> Now look at the way UTF8 does it:
> <http://en.wikipedia.org/wiki/Utf-8#Description>
>
> Really, follow the link and study the table carefully. Don't continue
> reading this until you believe you understand the choices that the
> designers of UTF8 made, and why they made them.
>
> Pay particular attention to the possible values for byte 1. Do you
> notice the difference between that scheme, and yours:
>
> 0xxxxxxx
> 1xxxxxxx
> 00xxxxxx
> 01xxxxxx
> 10xxxxxx
> 11xxxxxx
>
> If you don't see it, keep looking until you do ... this email gives
> you more than enough hints to work it out. Don't ask someone here to
> explain it to you. If you want to become competent, you must use your
> brain.

0xxxxxxx
110xxxxx 10xxxxxx
1110xxxx 10xxxxxx 10xxxxxx
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

I did read the link but I still cannot see why:

1. '110' is the flag for a 2-byte code-point
2. in the 2nd byte and every subsequent byte the leading flag has to
be '10'
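
Looking at the actual bits may help with both points. The sketch below (the helper name `utf8_bits` is made up) prints the UTF-8 bytes of a character in binary. Because continuation bytes always start with '10' and start bytes never do, any single byte can be classified on its own, without looking at its neighbours.

```python
def utf8_bits(char):
    """Return the UTF-8 encoding of one character as space-separated 8-bit groups."""
    return ' '.join(format(byte, '08b') for byte in char.encode('utf-8'))

print(utf8_bits('a'))   # 01100001
print(utf8_bits('α'))   # 11001110 10110001
print(utf8_bits('€'))   # 11100010 10000010 10101100
```

Dropping the flag bits of 'α' (110 and 10) leaves 01110 110001, which is 0x3B1, the character's ordinal.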

Antoon Pardon

unread,
Jun 14, 2013, 3:36:29 AM6/14/13
to pytho...@python.org
Op 13-06-13 10:08, Νικόλαος Κούρας schreef:
You don't. You should remember that python (or any programming language)
doesn't print numbers. It always prints string representations of
numbers. It is just that we are so used to the decimal representation
that we think of that representation as being the number.

Normally that is not a problem but it can cause confusion when you are
working with multiple representations.

--
Antoon Pardon

Nick the Gr33k

unread,
Jun 14, 2013, 3:49:51 AM6/14/13
to
Hold on!
You are basically saying here that:


>>> 16474
16474

is not a number as we think but instead is a string representation of a
number?

I don't think so. If it were a string representation of a number, that
would print the following:

>>> 16474
'16474'

Python prints numbers:

>>> 16474
16474
>>> 0b100000001011010
16474
>>> 0x405a
16474

It prints them all in decimal format though.
But when we need a decimal integer to be turned into bin() or hex() we
can use bin(number) or hex(number) and just remove the pair of single quotes.

Antoon Pardon

unread,
Jun 14, 2013, 4:22:31 AM6/14/13
to pytho...@python.org
Op 14-06-13 09:49, Nick the Gr33k schreef:
> On 14/6/2013 10:36 πμ, Antoon Pardon wrote:
>> Op 13-06-13 10:08, Νικόλαος Κούρας schreef:
>>>
>>> Indeed python embraced it in single quoting '0b100000001011010' and
>>> not as 0b100000001011010 which in fact makes it a string.
>>>
>>> But since bin(16474) seems to create a string rather than an expected
>>> number(at leat into my mind) then how do we get the binary
>>> representation of the number 16474 as a number?
>>
>> You don't. You should remember that python (or any programming language)
>> doesn't print numbers. It always prints string representations of
>> numbers. It is just so that we are so used to the decimal representation
>> that we think of that representation as being the number.
>>
>> Normally that is not a problem but it can cause confusion when you are
>> working with multiple representations.
> Hold on!
> You are basically saying here that:
>
>
> >>> 16474
> 16474
>
> is not a number as we think but instead is a string representation of a
> number?
Yes, or if you prefer what python prints is the decimal notation of the number.

>
> I dont think so, if it were a string representation of a number that
> would print the following:
>
> >>> 16474
> '16474'

No it wouldn't, You are confusing representation in the everyday meaning
with representation as python jargon.


> Python prints numbers:
No it doesn't. Numbers are abstract concepts that can be represented in
various notations; these notations are strings. Those notational strings
end up being printed. As I said before, we are so used to using the
decimal notation that we often use the notation and the number interchangeably
without a problem. But when we are working with multiple notations, that
can become confusing, and we should be careful to separate numbers from their
representations/notations.


> but when we need a decimal integer

There are no decimal integers. There is only a decimal notation of the number.
Decimal, octal etc are not characteristics of the numbers themselves.
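
A short Python 3 sketch of that distinction, using the number from this thread. The variable names are just for illustration.

```python
n = 16474

# Four notations in source code, one and the same number.
print(n == 0b100000001011010 == 0x405a == 0o40132)   # True

# Building different notational strings (numerals) for that one number.
numerals = [str(n), bin(n), hex(n), oct(n)]
print(numerals)

# Parsing each numeral back; base 0 tells int() to honour the prefix.
print([int(s, 0) for s in numerals])
```

The last line prints the same number four times, in decimal, because decimal is simply the notation print() uses for integers.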

--

Antoon Pardon

Nick the Gr33k

unread,
Jun 14, 2013, 4:37:02 AM6/14/13
to
On 14/6/2013 11:22 πμ, Antoon Pardon wrote:

>> Python prints numbers:
> No it doesn't. Numbers are abstract concepts that can be represented in
> various notations; these notations are strings. Those notational strings
> end up being printed. As I said before, we are so used to using the
> decimal notation that we often use the notation and the number interchangeably
> without a problem. But when we are working with multiple notations, that
> can become confusing, and we should be careful to separate numbers from their
> representations/notations.

How do we separate a number then from its representation/notation?

What is a notation anyway? Is it a way of displaying? But that would be
a representation then....

Please explain this line as it uses both terms.

No it doesn't, numbers are abstract concepts that can be represented in
various notations

>> but when we need a decimal integer
>
> There are no decimal integers. There is only a decimal notation of the number.
> Decimal, octal etc are not characteristics of the numbers themselves.

So everything we see like:

16474
nikos
abc123

everything is a string and nothing is a number? not even number 1?

Heiko Wundram

unread,
Jun 14, 2013, 5:06:55 AM6/14/13
to pytho...@python.org
Am 14.06.2013 10:37, schrieb Nick the Gr33k:
> So everything we see like:
>
> 16474
> nikos
> abc123
>
> everything is a string and nothing is a number? not even number 1?

Come on now, this is _so_ obviously trolling, it's not even remotely
funny anymore. Why doesn't killfiling work with the mailing list version
of the python list? :-(

--
--- Heiko.

Nick the Gr33k

unread,
Jun 14, 2013, 5:32:56 AM6/14/13
to
I'm not trolling man, I just have a hard time understanding why numbers
act as strings.

Cameron Simpson

unread,
Jun 14, 2013, 6:19:16 AM6/14/13
to Nick the Gr33k, pytho...@python.org
On 14Jun2013 11:37, Nikos as SuperHost Support <sup...@superhost.gr> wrote:
| On 14/6/2013 11:22 πμ, Antoon Pardon wrote:
|
| >>Python prints numbers:
| >No it doesn't. Numbers are abstract concepts that can be represented in
| >various notations; these notations are strings. Those notational strings
| >end up being printed. As I said before, we are so used to using the
| >decimal notation that we often use the notation and the number interchangeably
| >without a problem. But when we are working with multiple notations, that
| >can become confusing, and we should be careful to separate numbers from their
| >representations/notations.
|
| How do we separate a number then from its representation/notation?

Shrug. When you "print" a number, Python transcribes a string
representation of it to your terminal.

| What is a notation anyway? Is it a way of displaying? But that
| would be a representation then....

Yep. Same thing. A "notation" is a particular formal method of
representation.

| No it doesn't, numbers are abstract concepts that can be represented in
| various notations
|
| >>but when we need a decimal integer
| >
| >There are no decimal integers. There is only a decimal notation of the number.
| >Decimal, octal etc are not characteristics of the numbers themselves.
|
| So everything we see like:
|
| 16474
| nikos
| abc123
|
| everything is a string and nothing is a number? not even number 1?

Everything you see like that is textual information. Internally to
Python, various types are used: strings, bytes, integers etc. But
when you print something, text is output.
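
To make the difference between the integer 16 and the string '16' concrete, here is a small sketch for an interactive Python 3 session (the names `number` and `text` are just for illustration). The prompt echoes repr() of a value, which is why a string shows its quotes there and an integer does not.

```python
number = 16       # an int: Python can do arithmetic with it
text = '16'       # a str: two characters, '1' and '6'

print(type(number).__name__, type(text).__name__)   # int str
print(number + 1)             # 17
print(text + '1')             # 161 (string concatenation)
print(repr(number))           # 16
print(repr(text))             # '16'
print(str(number) == text)    # True: the decimal numeral of 16 is the string '16'
```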

Cheers,
--
Cameron Simpson <c...@zip.com.au>

A long-forgotten loved one will appear soon. Buy the negatives at any price.

Fábio Santos

unread,
Jun 14, 2013, 6:20:53 AM6/14/13
to Heiko Wundram, pytho...@python.org


On 14 Jun 2013 10:20, "Heiko Wundram" <mode...@modelnine.org> wrote:
>
> Am 14.06.2013 10:37, schrieb Nick the Gr33k:
>>

>> So everything we see like:
>>
>> 16474
>> nikos
>> abc123
>>
>> everything is a string and nothing is a number? not even number 1?
>
>

> Come on now, this is _so_ obviously trolling, it's not even remotely funny anymore. Why doesn't killfiling work with the mailing list version of the python list? :-(

I have skimmed the archives for this month, and I estimate that a third of this month's activity on this list was helping this person. About 80% of that is wasted in explaining basic concepts he refuses to read in links given to him. A depressingly large number of replies to his posts are seemingly ignored.

Since this is a lot of spam, I feel like leaving the list, but I also honestly want to help people use python and the replies to questions of others often give me much insight on several matters.

Cameron Simpson

unread,
Jun 14, 2013, 6:14:54 AM6/14/13
to Nick the Gr33k, pytho...@python.org
On 14Jun2013 09:59, Nikos as SuperHost Support <sup...@superhost.gr> wrote:
| On 14/6/2013 4:00 πμ, Cameron Simpson wrote:
| >On 13Jun2013 17:19, Nikos as SuperHost Support <sup...@superhost.gr> wrote:
| >| A code-point and the code-point's ordinal value are associated into
| >| a Unicode charset. They have the so called 1:1 mapping.
| >|
| >| So, i was under the impression that by encoding the code-point into
| >| utf-8 was the same as encoding the code-point's ordinal value into
| >| utf-8.
| >|
| >| So, now i believe they are two different things.
| >| The code-point *is what actually* needs to be encoded and *not* its
| >| ordinal value.
| >
| >Because there is a 1:1 mapping, these are the same thing: a code
| >point is directly _represented_ by the ordinal value, and the ordinal
| >value is encoded for storage as bytes.
|
| So, you are saying that:
|
| chr(16474).encode('utf-8') #being the code-point encoded
|
| ord(chr(16474)).encode('utf-8') #being the code-point's ordinal
| encoded which gives an error.
|
| That shows us that a character is what is being encoded to utf-8
| but the character's ordinal cannot be.
|
| So, why do you say "....and the ordinal value is encoded for storage
| as bytes." ?

No, I mean conceptually, there is no difference between a codepoint
and its ordinal value. They are the same thing.

Inside Python itself, a character (a string of length 1; there is
no separate character type) is a distinct type from an integer.
Internally, the characters in a string are stored numerically: as
Unicode codepoints, that is, as their ordinal values.

It is a meaningful idea to store a Python string encoded into bytes
using some text encoding scheme (utf-8, iso-8859-7, what have you).

It is not a meaningful thing to store a number "encoded" without
some more context. The .encode() method that accepts an encoding
name like "utf-8" is specifically an encoding procedure FOR TEXT.

So strings have such a method, and integers do not.

When you write:

chr(16474)

you receive a _string_, containing the single character whose ordinal
is 16474. It is meaningful to transcribe this string to bytes using
a text encoding procedure like 'utf-8'.

When you write:

ord(chr(16474))

you get an integer. Because ord() is the reverse of chr(), you get
the integer 16474.

Integers do not have .encode() methods that accept a _text_ encoding
name like 'utf-8' because integers are not text.
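
A quick Python 3 sketch of the above. The `to_bytes` call at the end is an addition for contrast, not something discussed in this thread: it serialises the raw integer and has nothing to do with text encodings.

```python
ch = chr(16474)                    # a string of length 1
encoded = ch.encode('utf-8')       # text -> bytes via a text encoding
print(encoded)                     # b'\xe4\x81\x9a'
print(encoded.decode('utf-8') == ch)   # True: decoding reverses it

n = ord(ch)                        # back to the integer 16474
print(hasattr(n, 'encode'))        # False: integers have no .encode()

# Storing the integer itself as bytes is a different operation:
print(n.to_bytes(2, 'big'))        # b'@Z', i.e. the two bytes 0x40 0x5a
```

Note that the UTF-8 form takes three bytes while the raw integer fits in two; they are different byte sequences for different purposes.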

| >| > The leading 0b is just syntax to tell you "this is base 2, not base 8
| >| > (0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.
| >|
| >| But byte objects are represented as '\x' instead of the
| >| aforementioned '0x'. Why is that?
| >
| >You're confusing a "string representation of a single number in
| >some base (eg 2 or 16)" with the "string-ish representation of a
| >bytes object".
|
| >>> bin(16474)
| '0b100000001011010'
| that is a binary format string representation of number 16474, yes?

Yes.

| >>> hex(16474)
| '0x405a'
| that is a hexadecimal format string representation of number 16474, yes?

Yes.

| WHILE:
| b'abc\x1b\n' = a string representation of a byte, which in turn is a
| series of integers, so that makes this a string representation of
| integers, is this correct?

A "bytes" Python object. So not "a byte", 5 bytes.
It is a string representation of the series of byte values,
ON THE PREMISE that the bytes may well represent text.
On that basis, b'abc\x1b\n' is a reasonable way to display them.

In other contexts this might not be a sensible way to display these
bytes, and then another format would be chosen, possibly hand
constructed by the programmer, or equally reasonable, the hexlify()
function from the binascii module.
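
For instance, a minimal sketch of displaying the same five bytes without any assumption that they are text. `bytes.hex()` is a later addition to Python 3 (3.5 onwards) and is shown only as an alternative.

```python
import binascii

data = b'abc\x1b\n'

print(binascii.hexlify(data))   # b'6162631b0a'
print(data.hex())               # 6162631b0a (Python 3.5+)
print(list(data))               # [97, 98, 99, 27, 10]
```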

| \x1b = ESC character

Considering the bytes to be representing characters, then yes.

| \ = for separating bytes

No, \ to introduce a sequence of characters with special meaning.

Normally a character in a b'...' item represents the byte value
matching the character's Unicode ordinal value. But several characters
are hard or confusing to place literally in a b'...' string. For
example a newline character or an escape character.

'a' means 97.
'\n' means 10 (newline, hence the 'n').
'\x1b' means 27 (escape, value 0x1b in hexadecimal).
And, of course, '\\' means a literal slosh, value 92.

| x = to flag that the following bytes are going to be represented as
| hex values? What exactly does 'x' mean here? character perhaps?

A slosh followed by an 'x' means there will be 2 hexadecimal digits
to follow, and those two digits represent the byte value.

So, yes.

| Still it's not clear in my head what the difference between '0x1b' and
| '\x1b' is:

They're the same thing in two similar but slightly different formats.

0x1b is a legitimate "bare" integer value in Python.

\x1b is a sequence you find inside strings (and "byte" strings, the
b'...' format).
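
A small Python 3 sketch of the two notations side by side (the variable names are illustrative only):

```python
number = 0x1b          # an integer literal written in hexadecimal
text = '\x1b'          # a str containing one character, ESC
raw = b'\x1b'          # a bytes object containing one byte

print(number, type(number).__name__)    # 27 int
print(len(text), type(text).__name__)   # 1 str
print(ord(text) == number)              # True: the character's ordinal is 27
print(raw[0] == number)                 # True: indexing bytes gives an int
print(chr(number) == text)              # True
```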

| i think:
| 0x1b = an integer represented in hex format

Yes.

| \x1b = a character represented in hex format

Yes.

| >| How can i view this byte's object representation as hex() or as bin()?
| >
| >See above. A bytes is a _sequence_ of values. hex() and bin() print
| >individual values in hexadecimal or binary respectively.
|
| >>> for value in b'\x97\x98\x99\x27\x10':
| ... print(value, hex(value), bin(value))
| ...
| 151 0x97 0b10010111
| 152 0x98 0b10011000
| 153 0x99 0b10011001
| 39 0x27 0b100111
| 16 0x10 0b10000
|
|
| >>> for value in b'abc\x1b\n':
| ... print(value, hex(value), bin(value))
| ...
| 97 0x61 0b1100001
| 98 0x62 0b1100010
| 99 0x63 0b1100011
| 27 0x1b 0b11011
| 10 0xa 0b1010
|
|
| Why do these two give different values when printed?

97 is in base 10 (9*10+7=97), but the notation '\x97' is base 16, so 9*16+7=151.
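
The same point as a short sketch: to get the byte values 97, 98, 99 one either writes them in decimal inside bytes([...]) or writes their hexadecimal form after \x.

```python
print(list(b'\x97\x98\x99'))           # [151, 152, 153]: \x digits are base 16
print(9 * 16 + 7)                      # 151
print(bytes([97, 98, 99]))             # b'abc'
print(b'\x61\x62\x63' == b'abc')       # True: 0x61 is 97
```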

Cheers,
--
Cameron Simpson <c...@zip.com.au>

I'm Bubba of Borg. Y'all fixin' to be assimilated.

Antoon Pardon

unread,
Jun 14, 2013, 6:50:14 AM6/14/13
to pytho...@python.org
Op 14-06-13 10:37, Nick the Gr33k schreef:
> On 14/6/2013 11:22 πμ, Antoon Pardon wrote:
>
>>> Python prints numbers:
>> No it doesn't. Numbers are abstract concepts that can be represented in
>> various notations; these notations are strings. Those notational strings
>> end up being printed. As I said before, we are so used to using the
>> decimal notation that we often use the notation and the number
>> interchangeably
>> without a problem. But when we are working with multiple notations, that
>> can become confusing, and we should be careful to separate numbers
>> from their
>> representations/notations.
>
> How do we separate a number then from its representation/notation?
What do you mean? Internally there is no representation linked
to the number, so there is nothing to be separated. Only when
a number needs to be printed, is a representation for that
number built and displayed.


> What is a notation anyway? Is it a way of displaying? But that would
> be a representation then....
Yes, a notation is a representation. However "representation" is also
a bit of python jargon that has a specific meaning. So in order not
to confuse things with multiple possible meanings for "representation" I
chose to use "notation".


>> There are no decimal integers. There is only a decimal notation of
>> the number.
>> Decimal, octal etc are not characteristics of the numbers themselves.
>
>
> So everything we see like:
>
> 16474
> nikos
> abc123
>
> everything is a string and nothing is a number? not even number 1?
There is a difference between "everything we see" as you
write earlier and just plain "everything" as you write
later. Python works with numbers, but at the moment
it has to display such a number it has to produce something
that is printable. So it will build a string that can be
used as a notation for that number, a numeral. And that
is what will be displayed.

--
Antoon.

Antoon Pardon

unread,
Jun 14, 2013, 7:09:03 AM6/14/13
to pytho...@python.org
Op 14-06-13 11:32, Nick the Gr33k schreef:

> I'm not trolling man, I just have a hard time understanding why numbers
> act as strings.
They don't. Nobody claimed numbers acted like strings. What was explained
was that when numbers are displayed, they are converted into a notational
string, which is then displayed. This was to clear up your confusion between
numerals and numbers, which you displayed by writing something like "the
binary representation as a number".

--
Antoon Pardon

Heiko Wundram

unread,
Jun 14, 2013, 6:15:11 AM6/14/13
to pytho...@python.org
Am 14.06.2013 11:32, schrieb Nick the Gr33k:
> I'm not trolling man, I just have a hard time understanding why numbers
> act as strings.

If you can't grasp the conceptual differences between numbers and
their/a representation, it's probably best if you stayed away from
programming altogether.

I don't think you're actually as thick as you sound, but rather either
you're simply too damn lazy to take the time to inform yourself from all
the hints/links/information you've been given, or you're trolling. I'm
still leaning towards the second.

--
--- Heiko.

rusi

unread,
Jun 14, 2013, 7:51:46 AM6/14/13
to
On Jun 14, 3:20 pm, Fábio Santos <fabiosantos...@gmail.com> wrote:
> > Come on now, this is _so_ obviously trolling, it's not even remotely
>
> funny anymore. Why doesn't killfiling work with the mailing list version of
> the python list? :-(
>
> I have skimmed the archives for this month, and I estimate that a third of
> this month's activity on this list was helping this person. About 80% of
> that is wasted in explaining basic concepts he refuses to read in links
> given to him. A depressingly large number of replies to his posts are
> seemingly ignored.
>
> Since this is a lot of spam, I feel like leaving the list, but I also
> honestly want to help people use python and the replies to questions of
> others often give me much insight on several matters.

Adding my +1 to this sentiment.

In older saner and more politically incorrect times, when there was a
student who was as idiotic as Nikos, he would be made to:
-- run five rounds of the field
-- stay after school
-- write pages of "I shall not talk in class"

In the age of cut-n-paste the last has lost its sting. Likewise the
first two are hard to administer across the internet.
Still if we are genuinely interested in solving this problem, ways may
be found, for example:

Any question from Nikos that has any English error, should be returned
with:
Correct your English before we look at your python.

If he is brazen enough to correct one error and leave the other 35,
then we put in a 24-hour delay for each reply.

I am sure others can come up with better solutions if we wish.

The alternative is that this disease has an unfavorable prognosis:
[Yes Nikos is an infectious disease: I believe I can pull out mails
from Steven and Grant Edwards whic hare begng tolook sspcicious ly
like Nikos [Sorry Im not much good at imitation!] ]

And that unfavorable prognosis is what Fabio is suggesting -- people
will start leaving the list/group.

Nikos:
This is not against you personally. Just your current mode of conduct
towards this list.
And that mode quite simply is this: You have no interest in python,
you are only interested in the immediate questions of your web-hosting.

Mark Lawrence

unread,
Jun 14, 2013, 7:57:02 AM6/14/13
to pytho...@python.org
On 14/06/2013 11:20, Fábio Santos wrote:

>
> Since this is a lot of spam, I feel like leaving the list, but I also
> honestly want to help people use python and the replies to questions of
> others often give me much insight on several matters.
>

Plenty of genuine people needing genuine help on the tutor mailing list,
or have you been there already?

--
"Steve is going for the pink ball - and for those of you who are
watching in black and white, the pink is next to the green." Snooker
commentator 'Whispering' Ted Lowe.

Mark Lawrence

rusi

unread,
Jun 14, 2013, 8:09:05 AM6/14/13
to
On Jun 14, 4:51 pm, rusi <rustompm...@gmail.com> wrote:
> On Jun 14, 3:20 pm, Fábio Santos <fabiosantos...@gmail.com> wrote:
>
> > > Come on now, this is _so_ obviously trolling, it's not even remotely
>
> > funny anymore. Why doesn't killfiling work with the mailing list version of
> > the python list? :-(
>
> > I have skimmed the archives for this month, and I estimate that a third of
> > this month's activity on this list was helping this person. About 80% of
> > that is wasted in explaining basic concepts he refuses to read in links
> > given to him. A depressingly large number of replies to his posts are
> > seemingly ignored.
>
> > Since this is a lot of spam, I feel like leaving the list, but I also
> > honestly want to help people use python and the replies to questions of
> > others often give me much insight on several matters.
>
> Adding my +1 to this sentiment.

Since identifying a disease by the right name is key to finding a
cure:
Nikos is not trolling or spamming; he is help-vampiring.

Let's call it that.

Heiko Wundram

unread,
Jun 14, 2013, 8:31:53 AM6/14/13
to pytho...@python.org
Am 14.06.2013 14:09, schrieb rusi:
> Since identifying a disease by the right name is key to finding a
> cure:
> Nikos is not trolling or spamming; he is help-vampiring.

Just to explain the trolling allegation: I'm not talking about him
wanting to get his scripts fixed, that's help-vampiring most certainly,
and an extreme form of that (thanks btw. for pointing me to that term,
whoever did).

I was talking about his repeated attempts at "making conversation" by
asking questions about encoding, short-circuit evaluation and such which
seem like they are relevant for him to solve his problem, but due to his
persistence of understanding things in a wrong way/not understanding
them at all/repeating the same misunderstandings time after time have
drifted off into endless repetitions of the same facts by helpful
posters, and have gotten a lot of people seriously annoyed (also, due to
other facts such as him changing his NNTP hosts and/or From-addresses
which breaks kill-filing).

Now, if that latter behaviour isn't trolling, I don't know what is.
Simply nobody who takes what he does at least a little bit seriously is
_as_ thick as he makes himself seem.

--
--- Heiko.

Nick the Gr33k

unread,
Jun 14, 2013, 8:36:11 AM6/14/13
to
Hold on.

number = an abstract sense
numeral = ?
notation = ?
representation = ?

Nick the Gr33k

unread,
Jun 14, 2013, 8:41:25 AM6/14/13
to
On 14/6/2013 1:19 μμ, Cameron Simpson wrote:
> On 14Jun2013 11:37, Nikos as SuperHost Support <sup...@superhost.gr> wrote:
> | On 14/6/2013 11:22 πμ, Antoon Pardon wrote:
> |
> | >>Python prints numbers:
> | >No it doesn't, numbers are abstract concepts that can be represented in
> | >various notations, these notations are strings. Those notational strings
> | >end up being printed. As I said before we are so used to using the
> | >decimal notation that we often use the notation and the number interchangeably
> | >without a problem. But when we are working with multiple notations that
> | >can become confusing and we should be careful to separate numbers from their
> | >representations/notations.
> |
> | How do we separate a number then from its representation/notation?
>
> Shrug. When you "print" a number, Python transcribes a string
> representation of it to your terminal.

>>> 16
16

So the output 16 is in fact a string representation of the number 16 ?

Then how do 16 and '16' differ?

>
> | What is a notation anyway? Is it a way of displaying? But that
> | would be a representation then....
>
> Yep. Same thing. A "notation" is a particular formal method of
> representation.


Can you elaborate please?
> | No it doesn't, numbers are abstract concepts that can be represented in
> | various notations
> |
> | >>but when we need a decimal integer
> | >
> | >There are no decimal integers. There is only a decimal notation of the number.
> | >Decimal, octal etc are not characteristics of the numbers themselves.
> |
> | So everything we see like:
> |
> | 16474
> | nikos
> | abc123
> |
> | everything is a string and nothing is a number? not even number 1?
>
> Everything you see like that is textual information. Internally to
> Python, various types are used: strings, bytes, integers etc. But
> when you print something, text is output.
>
> Cheers,
>
Thanks!

Joel Goldstick

Jun 14, 2013, 8:44:11 AM
to Nick the Gr33k, pytho...@python.org
go away Nick.  Go far away.  You are not a good person.  You are not even a good Troll.  You are just nick the *ick.  You should take up something you can do better than this.. like maybe sleeping


Nick the Gr33k

Jun 14, 2013, 8:45:15 AM
to
On 14/6/2013 1:20 μμ, Fábio Santos wrote:
>
> On 14 Jun 2013 10:20, "Heiko Wundram" <mode...@modelnine.org
I'am not spamming and as you say i dare to ask what other don't if they
don't understand something.
I'am not trolling and actually help people with my questions(as you
admitted yourself) and you are helping with your replies.

we are all benefit out of this.

Nick the Gr33k

Jun 14, 2013, 8:50:37 AM
to
On 14/6/2013 2:51 μμ, rusi wrote:

> Nikos:
> This is not against you personally. Just your current mode of conduct
> towards this list.
> And that mode quite simply is this: You have no interest in python,
> you are only interested in the immediate questions of your web-hosting.

If that were true, I wouldn't be asking for detailed explanations of the
solutions provided to me, nor would I be asking to understand why the
way I tried it is wrong, what the correct way of writing the code is,
and why.

So, if I had no interest in actually learning Python, I would just cut
and paste the provided code without worrying about what it actually
does, since knowing that it came from you would be enough to know that
it works.

Heiko Wundram

Jun 14, 2013, 8:58:05 AM
to pytho...@python.org
Am 14.06.2013 14:45, schrieb Nick the Gr33k:
> we are all benefit out of this.

Let's nominate you for a nobel prize, saviour of python-list!

--
--- Heiko.

Nick the Gr33k

Jun 14, 2013, 8:59:01 AM
to
On 14/6/2013 1:50 μμ, Antoon Pardon wrote:

> Python works with numbers, but at the moment
> it has to display such a number it has to produce something
> that is printable. So it will build a string that can be
> used as a notation for that number, a numeral. And that
> is what will be displayed.

so a number is just a number but when this number needs to be displayed
into a monitor, then the printed form of that number we choose to call
it a numeral?

So, a numeral = a string representation of a number. Is this correct?

Antoon Pardon

Jun 14, 2013, 9:25:27 AM
to pytho...@python.org
Op 14-06-13 14:36, Nick the Gr33k schreef:
I already explained these in previous responses. I am not going to repeat
myself. IMO you are out of place here. You belong in a tutorial class on
basic computing concepts. There you can acquire the knowledge that is
more or less expected of those who want to contribute here. I don't mind
the occasional gap in knowledge, but with you it seems more that there is
an occasional grain of knowledge in a sea of ignorance. To remedy the
former a single explanation is mostly sufficient. To remedy the latter you
need a tutorial course.

Now there is nothing wrong in being ignorant. The question is how do you
proceed from there. The answer is not by starting a project that is far
above your ability and pestering the experts in the hope they will spoon
feed you.

Fábio Santos

Jun 14, 2013, 9:25:54 AM
to Heiko Wundram, pytho...@python.org
On Fri, Jun 14, 2013 at 1:58 PM, Heiko Wundram <mode...@modelnine.org> wrote:
> Am 14.06.2013 14:45, schrieb Nick the Gr33k:
>
>> we are all benefit out of this.
>
>
> Let's nominate you for a nobel prize, saviour of python-list!
>

I don't want to be saved. I just found out how to mute conversations in gmail.

--
Fábio Santos

Antoon Pardon

Jun 14, 2013, 9:52:38 AM
to pytho...@python.org
Op 14-06-13 14:59, Nick the Gr33k schreef:
Yes, when you print an integer, what actually happens is something along
the following algorithm (python 2 code):


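def write_int(out, nr):
    # Build the decimal numeral for nr one digit at a time, then write it.
    ord0 = ord('0')
    lst = []
    negative = False
    if nr < 0:
        negative = True
        nr = -nr
    while nr:
        digit = nr % 10                # lowest decimal digit
        lst.append(chr(digit + ord0))  # digit value -> digit character
        nr //= 10                      # integer division drops that digit
    if negative:
        lst.append('-')
    lst.reverse()                      # digits were collected lowest first
    if not lst:
        lst.append('0')                # zero never entered the loop
    numeral = ''.join(lst)
    out.write(numeral)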
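
For comparison, a minimal Python 3 sketch of the same digit-by-digit idea,
extended to other bases. The name to_numeral and its base parameter are made
up for this illustration; in real code str() and format() already do this job.
The point is only that one number has several numerals, and that the numeral
is a string while the number is not.

```python
def to_numeral(number, base=10):
    """Build the string notation (numeral) for an integer in the given base."""
    digits = "0123456789abcdef"
    if not 2 <= base <= len(digits):
        raise ValueError("base must be between 2 and 16")
    if number == 0:
        return "0"
    negative = number < 0
    number = abs(number)
    chars = []
    while number:
        # Peel off the lowest digit and keep the rest of the number.
        number, digit = divmod(number, base)
        chars.append(digits[digit])
    if negative:
        chars.append("-")
    # Digits were collected lowest first, so reverse them.
    return "".join(reversed(chars))


n = 16
print(to_numeral(n))            # 16  (decimal notation)
print(to_numeral(n, 8))         # 20  (octal notation, same number)
print(to_numeral(n, 16))        # 10  (hexadecimal notation, same number)
print(to_numeral(n) == str(n))  # True
print(n + 1, str(n) + "1")      # 17 161  (arithmetic vs. string concatenation)
```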

--
Antoon Pardon





Nick the Gr33k

Jun 14, 2013, 9:58:20 AM
to
On 14/6/2013 1:14 μμ, Cameron Simpson wrote:
> Normally a character in a b'...' item represents the byte value
> matching the character's Unicode ordinal value.

The only thing that i didn't understood is this line.
First please tell me what is a byte value

> \x1b is a sequence you find inside strings (and "byte" strings, the
> b'...' format).

\x1b is a character(ESC) represented in hex format

b'\x1b' is a byte object that represents what?


>>> chr(27).encode('utf-8')
b'\x1b'

>>> b'\x1b'.decode('utf-8')
'\x1b'

After decoding it gives the char ESC in hex format
Shouldn't it result in value 27 which is the ordinal of ESC ?

> No, I mean conceptually, there is no difference between a code-point
> and its ordinal value. They are the same thing.

Why Unicode charset doesn't just contain characters, but instead it
contains a mapping of (characters <--> ordinals) ?

I mean what we do is to encode a character like chr(65).encode('utf-8')

What's the reason of existence of its corresponding ordinal value since
it doesn't get involved into the encoding process?

Thank you very much for taking the time to explain.

Joel Goldstick

Jun 14, 2013, 11:21:39 AM
to Nick the Gr33k, pytho...@python.org
let's cut to the chase and start with telling us what you DO know Nick.  That would take less typing


Nick the Gr33k

Jun 14, 2013, 11:26:08 AM
to
On 14/6/2013 6:21 μμ, Joel Goldstick wrote:
> let's cut to the chase and start with telling us what you DO know Nick.
> That would take less typing
Well, my biggest successes up until now were to build 3 websites
utilizing database saves and retrievals

in PHP
in Perl
and later in Python

with absolute ignorance of

Apache Configuration:
CGI:
Linux:

with just basic knowledge of linux.
I'am very proud of it.

Neil Cerutti

Jun 14, 2013, 11:54:06 AM
to
On 2013-06-14, Antoon Pardon <antoon...@rece.vub.ac.be> wrote:
> Now there is nothing wrong in being ignorant. The question is
> how do you proceed from there. The answer is not by starting a
> project that is far above your ability and pestering the
> experts in the hope they will spoon feed you.

A major issue is this: the spoon-feeding he does receive is
ineffective. Smart, well-meaning, articulate people's time is
getting squandered.

I read the responses. I've learned things from them. But Nikos
has not. And once a discussion devolves into reiteration, even that
value is lost.

And perhaps worst of all, there's none of the closure or
vicarious catharsis that usually comes from a well-designed
educational transaction.

--
Neil Cerutti

Mark Lawrence

Jun 14, 2013, 12:12:00 PM
to pytho...@python.org
On 14/06/2013 13:58, Heiko Wundram wrote:
> Am 14.06.2013 14:45, schrieb Nick the Gr33k:
>> we are all benefit out of this.
>
> Let's nominate you for a nobel prize, saviour of python-list!
>

The Nobel prize is unsuited in a situation like this, maybe the ACM
Turing Award?

Chris Angelico

Jun 14, 2013, 1:03:02 PM
to pytho...@python.org
On Sat, Jun 15, 2013 at 1:26 AM, Nick the Gr33k <sup...@superhost.gr> wrote:
> Well, my biggest successes up until now were to build 3 websites utilizing
> database saves and retrievals
>
> in PHP
> in Perl
> and later in Python
>
> with absolute ignorance of
>
> Apache Configuration:
> CGI:
> Linux:
>
> with just basic knowledge of linux.
> I'am very proud of it.

Translation:

"I just built a car. I don't know anything about internal combustion
engines or road rules or metalwork, and I'm very proud of the
monstrosity that I'm now selling to my friends."

Would you buy a car built by someone who proudly announces that he has
no clue how to build one? Why do you sell web hosting services when
you have no clue how to provide them?

ChrisA

D'Arcy J.M. Cain

Jun 14, 2013, 1:13:23 PM
to Heiko Wundram, pytho...@python.org
On Fri, 14 Jun 2013 11:06:55 +0200
Heiko Wundram <mode...@modelnine.org> wrote:
> Come on now, this is _so_ obviously trolling, it's not even remotely
> funny anymore. Why doesn't killfiling work with the mailing list
> version of the python list? :-(

A big problem, other than Mr. Support's shenanigans with his email
address, is that even those of us who seem to have successfully
*plonked* him get the responses to him. The biggest issue with a troll
isn't so much the annoying emails from him but the amplified slew of
responses. That's the point of a troll after all.

The answer is to always make sure that you include the previous poster
in the reply as a Cc or To. I filter out any email that has the string
"sup...@superhost.gr" in a header so I would also filter out the
replies if people would follow that simple rule.

I have suggested this before but the push back I get is that then
people would get two copies of the email, one to them and one to the
list. My answer is simple. Get a proper email system that filters out
duplicates. Is there an email client out there that does not have this
facility?

--
D'Arcy J.M. Cain <da...@druid.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 788 2246 (DoD#0082) (eNTP) | what's for dinner.
IM: da...@Vex.Net, VOIP: sip:da...@Vex.Net

Chris Angelico

Jun 14, 2013, 1:31:12 PM
to pytho...@python.org
On Sat, Jun 15, 2013 at 3:13 AM, D'Arcy J.M. Cain <da...@druid.net> wrote:
> The answer is to always make sure that you include the previous poster
> in the reply as a Cc or To. I filter out any email that has the string
> "sup...@superhost.gr" in a header so I would also filter out the
> replies if people would follow that simple rule.
>
> I have suggested this before but the push back I get is that then
> people would get two copies of the email, one to them and one to the
> list. My answer is simple. Get a proper email system that filters out
> duplicates. Is there an email client out there that does not have this
> facility?

The main downside to that is not the first response, to
somebody@somewhere and python-list, but the subsequent ones. Do you
include everyone's addresses? And if so, how do they then get off the
list? (This is a serious consideration. I had some very angry people
asking me to unsubscribe them from a (private) mailman list I run, but
they weren't subscribed at all - they were being cc'd.)

I prefer to simply mail the list. You should be able to mute entire
threads, and he doesn't start more than a couple a day usually.

ChrisA

D'Arcy J.M. Cain

Jun 14, 2013, 1:56:19 PM
to Chris Angelico, pytho...@python.org
On Sat, 15 Jun 2013 03:31:12 +1000
Chris Angelico <ros...@gmail.com> wrote:
> On Sat, Jun 15, 2013 at 3:13 AM, D'Arcy J.M. Cain <da...@druid.net>
> wrote:
> > I have suggested this before but the push back I get is that then
> > people would get two copies of the email, one to them and one to the
> > list. My answer is simple. Get a proper email system that filters
> > out duplicates. Is there an email client out there that does not
> > have this facility?
>
> The main downside to that is not the first response, to
> somebody@somewhere and python-list, but the subsequent ones. Do you
> include everyone's addresses? And if so, how do they then get off the

No, I think Ccing the From is enough. Other than the OP who is already
*plonked* replies to the replies tend to have at least a modicum of
information.

> I prefer to simply mail the list. You should be able to mute entire
> threads, and he doesn't start more than a couple a day usually.

But then I have to deal with each thread. I don't want to deal with
them at all.

Tim Chase

Jun 14, 2013, 3:00:17 PM
to D'Arcy J.M. Cain, pytho...@python.org
On 2013-06-14 13:56, D'Arcy J.M. Cain wrote:
> > I prefer to simply mail the list. You should be able to mute
> > entire threads, and he doesn't start more than a couple a day
> > usually.
>
> But then I have to deal with each thread. I don't want to deal with
> them at all.

At least Thunderbird had the ability to set up a filter of the form
"If the sender matches 'x...@example.com' then kill this thread" so
the thread-killing (or sub-thread killing) was automatic.

I set that up for Xah posts and my life was far better.

I've since switched to Claws for my mail and miss that kill-thread
functionality. :-/

-tkc



D'Arcy J.M. Cain

Jun 14, 2013, 3:17:10 PM
to Tim Chase, pytho...@python.org
On Fri, 14 Jun 2013 14:00:17 -0500
Tim Chase <pytho...@tim.thechases.com> wrote:
> I set that up for Xah posts and my life was far better.

Has he disappeared or is my filtering just really successful?

> I've since switched to Claws for my mail and miss that kill-thread
> functionality. :-/

Heh. Exactly what I am using.

Grant Edwards

Jun 14, 2013, 3:40:10 PM
to
On 2013-06-14, Chris Angelico <ros...@gmail.com> wrote:
> On Sat, Jun 15, 2013 at 3:13 AM, D'Arcy J.M. Cain <da...@druid.net> wrote:
>> The answer is to always make sure that you include the previous poster
>> in the reply as a Cc or To. I filter out any email that has the string
>> "sup...@superhost.gr" in a header so I would also filter out the
>> replies if people would follow that simple rule.
>>
>> I have suggested this before but the push back I get is that then
>> people would get two copies of the email, one to them and one to the
>> list. My answer is simple. Get a proper email system that filters out
>> duplicates. Is there an email client out there that does not have this
>> facility?
>
> The main downside to that is not the first response, to
> somebody@somewhere and python-list, but the subsequent ones. Do you
> include everyone's addresses? And if so, how do they then get off the
> list? (This is a serious consideration. I had some very angry people
> asking me to unsubscribe them from a (private) mailman list I run, but
> they weren't subscribed at all - they were being cc'd.)

I think the answer is to automatically kill all threads started by
"him".

Unfortunately, I don't know if that's possible in most newsreaders.

--
Grant Edwards, grant.b.edwards at gmail.com
Yow! A dwarf is passing out somewhere in Detroit!

Guy Scree

Jun 14, 2013, 6:50:44 PM
to
Your best bet would be to keep an eye on www.edx.org for the next
offering of Introduction to Computer Science and Programming 6.00x
(probably starts in Sept). It's a no-cost way to take a Python course.

On Fri, 14 Jun 2013 12:32:56 +0300, Nick the Gr33k
<sup...@superhost.gr> wrote:

>On 14/6/2013 12:06 μμ, Heiko Wundram wrote:
>> Am 14.06.2013 10:37, schrieb Nick the Gr33k:
>>> So everything we see like:
>>>
>>> 16474
>>> nikos
>>> abc123
>>>
>>> everything is a string and nothing is a number? not even number 1?
>>
>> Come on now, this is _so_ obviously trolling, it's not even remotely
>> funny anymore. Why doesn't killfiling work with the mailing list version
>> of the python list? :-(
>>
>

Walter Hurry

Jun 14, 2013, 7:32:58 PM
to
On Sat, 15 Jun 2013 03:03:02 +1000, Chris Angelico wrote:

> Why do you sell web hosting services when you
> have no clue how to provide them?
>
And why do you continue responding to this timewaster? Please, please
just killfile him and let's all move on.

Cameron Simpson

Jun 14, 2013, 8:26:32 PM
to Nick the Gr33k, pytho...@python.org
On 14Jun2013 16:58, Nikos as SuperHost Support <sup...@superhost.gr> wrote:
| On 14/6/2013 1:14 μμ, Cameron Simpson wrote:
| >Normally a character in a b'...' item represents the byte value
| >matching the character's Unicode ordinal value.
|
| The only thing that i didn't understood is this line.
| First please tell me what is a byte value

The numeric value stored in a byte. Bytes are just small integers
in the range 0..255; the values available with 8 bits of storage.

| >\x1b is a sequence you find inside strings (and "byte" strings, the
| >b'...' format).
|
| \x1b is a character(ESC) represented in hex format

Yes.

| b'\x1b' is a byte object that represents what?

An array of 1 byte, whose value is 0x1b or 27.
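
A quick Python 3 sketch of that, in case it helps. The helper name
byte_values is made up for this illustration; list(), len() and indexing on
a bytes object are the real built-in behaviour.

```python
def byte_values(data):
    """Return the integer value (0..255) of every byte in a bytes object."""
    return list(data)


esc = b'\x1b'
print(len(esc))              # 1  (one byte of storage)
print(byte_values(esc))      # [27]
print(esc[0] == 0x1b == 27)  # True  (hex 1b and decimal 27 are the same value)
print(bytes([27]) == esc)    # True  (building the same object from the number)
print(byte_values(b'ABC'))   # [65, 66, 67]
```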

| >>> chr(27).encode('utf-8')
| b'\x1b'

Transcribing the ESC Unicode character to byte storage.

| >>> b'\x1b'.decode('utf-8')
| '\x1b'

Reading a single byte array containing a 27 and decoding it assuming 'utf-8'.
This obtains a single character string containing the ESC character.

| After decoding it gives the char ESC in hex format
| Shouldn't it result in value 27 which is the ordinal of ESC ?

When printing strings, the non-printable characters in the string are
_represented_ in hex format, so \x1b was printed.
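
To see the difference between the character, its ordinal and the way the
interpreter displays it, something like this Python 3 sketch should do. The
describe helper is invented for the example; ord(), chr() and repr() are the
standard built-ins.

```python
def describe(text):
    """Return (list of ordinals, displayed form) for a string."""
    return [ord(ch) for ch in text], repr(text)


esc = b'\x1b'.decode('utf-8')
ordinals, shown = describe(esc)
print(len(esc))        # 1  (a string holding a single character)
print(ordinals)        # [27]  (the ordinal you were expecting)
print(shown)           # '\x1b'  (only how that string is displayed)
print(esc == chr(27))  # True
```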

| > No, I mean conceptually, there is no difference between a code-point
| > and its ordinal value. They are the same thing.
|
| Why Unicode charset doesn't just contain characters, but instead it
| contains a mapping of (characters <--> ordinals) ?

Look, as far as a computer is concerned a character and an ordinal
are the same thing because you just store character ordinals in
memory when you store a string.

When characters are _displayed_, your Terminal (or web browser or
whatever) takes character ordinals and looks them up in a _font_,
which is a mapping of character ordinals to glyphs (character
images), and renders the character image onto your screen.

| I mean what we do is to encode a character like chr(65).encode('utf-8')
| What's the reason of existence of its corresponding ordinal value
| since it doesn't get involved into the encoding process?

Stop thinking of Unicode code points and ordinal values as separate
things. They are effectively two terms for the same thing. So there
is no "corresponding ordinal value". 65 _is_ the ordinal value.

When you run:

    chr(65).encode('utf-8')

you're going:

    chr(65) ==> 'A'
        Producing a string with just one character in it.
        Internally, Python stores an array of character ordinals, thus: [65]

    'A'.encode('utf-8')
        Walk along all the ordinals in the string and transcribe them as bytes.
        For 65, the byte encoding in 'utf-8' is a single byte of value 65.
        So you get an array of bytes (a "bytes object" in Python), thus: [65]
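
The same walk for a few characters outside ASCII, as a small Python 3 sketch.
The function name utf8_bytes and the sample characters are chosen for this
example only. It shows that the ordinal is what the encoder works from, and
that larger ordinals need more bytes in UTF-8.

```python
def utf8_bytes(ch):
    """Return (ordinal, list of UTF-8 byte values) for one character."""
    return ord(ch), list(ch.encode('utf-8'))


# Latin A, Greek alpha, the euro sign, and a musical G clef.
for ch in ['A', '\u03b1', '\u20ac', '\U0001d11e']:
    print(utf8_bytes(ch))
# (65, [65])
# (945, [206, 177])
# (8364, [226, 130, 172])
# (119070, [240, 157, 132, 158])
```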

--
Cameron Simpson <c...@zip.com.au>

The double cam chain setup on the 1980's DOHC CB750 was another one of
Honda's pointless engineering breakthroughs. You know the cycle (if you'll
pardon the pun :-), Wonderful New Feature is introduced with much fanfare,
WNF is fawned over by the press, WNF is copied by the other three Japanese
makers (this step is sometimes optional), and finally, WNF is quietly dropped
by Honda.
- Blaine Gardner, <blga...@sim.es.com>

Cameron Simpson

Jun 14, 2013, 8:28:29 PM
to Nick the Gr33k, pytho...@python.org
On 14Jun2013 15:59, Nikos as SuperHost Support <sup...@superhost.gr> wrote:
| So, a numeral = a string representation of a number. Is this correct?

No, a numeral is an individual digit from the string representation of a number.
So: 65 requires two numerals: '6' and '5'.
--
Cameron Simpson <c...@zip.com.au>

In life, you should always try to know your strong points, but this is
far less important than knowing your weak points.
Martin Fitzpatrick <mfitzp...@scot.bbc.co.uk>

Ben Finney

Jun 14, 2013, 8:42:20 PM
to pytho...@python.org
"D'Arcy J.M. Cain" <da...@druid.net> writes:

> The answer is to always make sure that you include the previous poster
> in the reply as a Cc or To.

Dragging the discussion from one forum (comp.lang.python) to another
(every person's individual email) is obnoxious. Please don't.

> I have suggested this before but the push back I get is that then
> people would get two copies of the email, one to them and one to the
> list.

In my case, I don't want to receive the messages by email *at all*. I
participate in this forum using a non-email system, and it works fine so
long as people continue to participate in this forum.

Even for those who do participate by email, though, your approach is
broken:

> My answer is simple. Get a proper email system that filters out
> duplicates.

The message sent to the individual typically arrives earlier (since it
is sent straight from you to the individual), and the message on the
forum arrives later (since it typically requires more processing).

But since we're participating in the discussion on the forum and not in
individual email, it is the later one we want, and the earlier one
should be deleted.

So at the point the first message arrives, it isn't a duplicate. The
mail program will show it anyway, because “remove duplicates” can't
catch it when there's no duplicate yet.

The proper solution is for you not to send that one at all, and send
only the message to the forum.

You do this by using your mail client's “reply to list” function, which
uses the RFC 2369 information in every mailing list message.

Is there any mail client which doesn't have this function? If so, use
your vendor's bug reporting system to request this feature as standard,
and/or switch to a better mail client until they fix that.

--
\ “Timid men prefer the calm of despotism to the boisterous sea |
`\ of liberty.” —Thomas Jefferson |
_o__) |
Ben Finney

Cameron Simpson

Jun 14, 2013, 10:09:57 PM
to pytho...@python.org
On 15Jun2013 10:42, Ben Finney <ben+p...@benfinney.id.au> wrote:
| "D'Arcy J.M. Cain" <da...@druid.net> writes:
| Even for those who do participate by email, though, your approach is
| broken:
| > My answer is simple. Get a proper email system that filters out
| > duplicates.
|
| The message sent to the individual typically arrives earlier (since it
| is sent straight from you to the individual), and the message on the
| forum arrives later (since it typically requires more processing).
|
| But since we're participating in the discussion on the forum and not in
| individual email, it is the later one we want, and the earlier one
| should be deleted.

They're the same message! (Delivered twice.) Replying to either is equivalent.
So broadly I don't care which gets deleted; it works regardless.

| So at the point the first message arrives, it isn't a duplicate. The
| mail program will show it anyway, because “remove duplicates” can't
| catch it when there's no duplicate yet.

But it can when the second one arrives. This is true regardless of
the delivery order.

The correct approach is to file the one message wherever it matches.
Messages to me+list (I use the list, not the newsgroup) get filed
in my inbox and _also_ in my python folder. I do that based on the
to/cc headers for the most part, so either message's arrival files
the same way.

I delete dups on entry to a mail folder (automatically, of course)
instead of at mail filing time, but the effect is equivalent.

| The proper solution is for you not to send that one at all, and send
| only the message to the forum.

Bah. Plenty of us like both. In the inbox alerts me that someone
replied to _my_ post, and in the python mail gets it nicely threaded.

| You do this by using your mail client's “reply to list” function, which
| uses the RFC 2369 information in every mailing list message.

No need, but a valid option.

| Is there any mail client which doesn't have this function? If so, use
| your vendor's bug reporting system to request this feature as standard,
| and/or switch to a better mail client until they fix that.

Sorry, I could have sworn you said you weren't using a mail client for this...
--
Cameron Simpson <c...@zip.com.au>

You've read the book. You've seen the movie. Now eat the cast.
- Julian Macassey, describing "Watership Down"

Denis McMahon

Jun 15, 2013, 2:31:56 AM
to
On Fri, 14 Jun 2013 12:32:56 +0300, Nick the Gr33k wrote:

> I'm not trolling, man, I just have a hard time understanding why numbers
> act as strings.

It depends on the context.

--
Denis McMahon, denismf...@gmail.com

Denis McMahon

Jun 15, 2013, 2:34:46 AM
to
On Fri, 14 Jun 2013 16:58:20 +0300, Nick the Gr33k wrote:

> On 14/6/2013 1:14 μμ, Cameron Simpson wrote:
>> Normally a character in a b'...' item represents the byte value
>> matching the character's Unicode ordinal value.

> The only thing that i didn't understood is this line.
> First please tell me what is a byte value

Seriously? You don't understand the term byte? And you're the support
desk for a webhosting company?

--
Denis McMahon, denismf...@gmail.com

Zero Piraeus

Jun 14, 2013, 9:33:06 AM
to pytho...@python.org
:

On 14 June 2013 08:50, Nick the Gr33k <sup...@superhost.gr> wrote:
>
> So, if I had no interest in actually learning Python, I would just cut and
> paste the provided code without worrying about what it actually does, since
> knowing that it came from you would be enough to know that it works.

Worrying what it actually does is good; an inquiring mind is a
prerequisite for becoming a good programmer.

Another prerequisite is discipline. That means the discipline to try
and work out for yourself what's going on, rather than repeatedly
spamming this list with trivial enquiries.

It also means the discipline to both read and type carefully: until
and unless you learn to take more care in how you express yourself,
both in code and in prose, you will be plagued by syntax errors and
frustrated responses respectively.

I have only skimmed it, but you might find the following tutorial helpful:

http://learnpythonthehardway.org/

Many of the early exercises may seem too basic, and you'll be tempted
to skip them - given your conduct here, I imagine you'll be *strongly*
tempted to skip them. Don't. You need to learn discipline.

-[]z.

Ian Kelly

Jun 14, 2013, 12:51:46 PM
to Python
On Fri, Jun 14, 2013 at 6:09 AM, rusi <rusto...@gmail.com> wrote:
> Since identifying a disease by the right name is key to finding a
> cure:
> Nikos is not trolling or spamming; he is help-vampiring.

I think he's a very dedicated troll elaborately disguised as a help
vampire. Remember that one of the names he previously used to post to
this list was "Ferrous Cranus".

http://redwing.hutman.net/~mreed/warriorshtm/ferouscranus.htm

Robert Kern

Jun 15, 2013, 5:30:59 AM
to pytho...@python.org
On 2013-06-15 03:09, Cameron Simpson wrote:
> On 15Jun2013 10:42, Ben Finney <ben+p...@benfinney.id.au> wrote:
> | "D'Arcy J.M. Cain" <da...@druid.net> writes:
> | Even for those who do participate by email, though, your approach is
> | broken:
> | > My answer is simple. Get a proper email system that filters out
> | > duplicates.
> |
> | The message sent to the individual typically arrives earlier (since it
> | is sent straight from you to the individual), and the message on the
> | forum arrives later (since it typically requires more processing).
> |
> | But since we're participating in the discussion on the forum and not in
> | individual email, it is the later one we want, and the earlier one
> | should be deleted.
>
> They're the same message! (Delivered twice.) Replying to either is equivalent.
> So broadly I don't care which gets deleted; it works regardless.
>
> | So at the point the first message arrives, it isn't a duplicate. The
> | mail program will show it anyway, because “remove duplicates” can't
> | catch it when there's no duplicate yet.
>
> But it can when the second one arrives. This is true regardless of
> the delivery order.

Ben said that he doesn't use email for this list. Neither do I. We use one of
the newsgroup mirrors. If you Cc us, we will get a reply on the newsgroup (where
we want it) and a reply in our email (where we don't). The two systems cannot
talk to each other to delete the other message.

> | You do this by using your mail client's “reply to list” function, which
> | uses the RFC 2369 information in every mailing list message.
>
> No need, but a valid option.
>
> | Is there any mail client which doesn't have this function? If so, use
> | your vendor's bug reporting system to request this feature as standard,
> | and/or switch to a better mail client until they fix that.
>
> Sorry, I could have sworn you said you weren't using a mail client for this...

He's suggesting that *you*, who are using a mail reader, use its "reply to
list" functionality, or request it if it is not present.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Ben Finney

Jun 15, 2013, 7:29:35 AM
to pytho...@python.org
Cameron Simpson <c...@zip.com.au> writes:

> On 15Jun2013 10:42, Ben Finney <ben+p...@benfinney.id.au> wrote:
> | The message sent to the individual typically arrives earlier (since
> | it is sent straight from you to the individual), and the message on
> | the forum arrives later (since it typically requires more
> | processing).
> |
> | But since we're participating in the discussion on the forum and not
> | in individual email, it is the later one we want, and the earlier
> | one should be deleted.
>
> They're the same message! (Delivered twice.) Replying to either is
> equivalent.

Wrong. They have the same Message-Id, but one of them is delivered via
the mailing list, and has the correct RFC 2369 fields in the header to
continue the discussion there.

The one delivered individually is the one to discard, since it was not
delivered via the mailing list.
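
As a rough illustration of that rule, here is a Python 3 sketch using the
standard email package. It is only a sketch under assumptions: the function
name keep_list_copies is made up, treating the presence of a List-Post header
as the sign of the list-delivered copy is my simplification, and the
addresses are placeholders. A real mail filter would need more care.

```python
from email import message_from_string


def keep_list_copies(raw_messages):
    """Collapse messages sharing a Message-ID, preferring the list copy."""
    chosen = {}
    for index, raw in enumerate(raw_messages):
        msg = message_from_string(raw)
        # Messages without an ID cannot be matched, so give each its own key.
        key = (msg.get("Message-ID") or "").strip() or "no-id-%d" % index
        current = chosen.get(key)
        # Keep the first copy seen, unless a later one came via the list.
        if current is None or (msg.get("List-Post") and not current.get("List-Post")):
            chosen[key] = msg
    return list(chosen.values())


direct = (
    "Message-ID: <abc@example.invalid>\n"
    "To: someone@example.invalid\n"
    "Subject: Re: test\n"
    "\n"
    "body\n"
)
via_list = (
    "Message-ID: <abc@example.invalid>\n"
    "List-Post: <mailto:list@example.invalid>\n"
    "Subject: Re: test\n"
    "\n"
    "body\n"
)

kept = keep_list_copies([direct, via_list])
print(len(kept))             # 1
print(kept[0]["List-Post"])  # <mailto:list@example.invalid>
```

The result is the same whichever copy arrives first, which is the part a
plain "drop the second arrival" filter gets wrong.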

> Bah. Plenty of us like both. In the inbox alerts me that someone
> replied to _my_ post, and in the python mail gets it nicely threaded.

Your mail client doesn't alert you to a message addressed to you?

> Sorry, I could have sworn you said you weren't using a mail client for
> this...

As I already said, this is demonstrating the fact that “reply to all” is
broken even for the use case of participating via email.

--
\ “Software patents provide one more means of controlling access |
`\ to information. They are the tool of choice for the internet |
_o__) highwayman.” —Anthony Taylor |
Ben Finney

D'Arcy J.M. Cain

Jun 15, 2013, 7:58:27 AM
to pytho...@python.org
On Sat, 15 Jun 2013 21:29:35 +1000
Ben Finney <ben+p...@benfinney.id.au> wrote:
> > Bah. Plenty of us like both. In the inbox alerts me that someone
> > replied to _my_ post, and in the python mail gets it nicely
> > threaded.
>
> Your mail client doesn't alert you to a message addressed to you?

Every message in my mailbox is addressed to me otherwise I wouldn't
get it. Do you mean the To: line? Which address? I have about a
dozen addresses not counting the plus sign addresses like the one you
use for this list. Which one should I treat as special?

> > Sorry, I could have sworn you said you weren't using a mail client
> > for this...
>
> As I already said, this is demonstrating the fact that “reply to all”
> is broken even for the use case of participating via email.

As the person who proposed this I would like to point out that I never
suggested "reply to all". I suggested including the poster that you
are replying to.

Grant Edwards

Jun 15, 2013, 10:44:55 AM
to
On 2013-06-15, Denis McMahon <denismf...@gmail.com> wrote:
> On Fri, 14 Jun 2013 16:58:20 +0300, Nick the Gr33k wrote:
>
>> On 14/6/2013 1:14 ????, Cameron Simpson wrote:
>>> Normally a character in a b'...' item represents the byte value
>>> matching the character's Unicode ordinal value.
>
>> The only thing that I didn't understand is this line.
>> First please tell me what a byte value is
>
> Seriously? You don't understand the term byte? And you're the support
> desk for a webhosting company?

Well, we haven't had this thread for a week or so...

There is some ambiguity in the term "byte". It used to mean the
smallest addressable unit of memory (which varied in the past -- at
one point, both 20 and 60 bit "bytes" were common). These days the
smallest addressable unit of memory is almost always 8 bits on desktop
and embedded processors (but often not on DSPs). That's why when IEEE
standards want to refer to an 8-bit chunk of data they use the term
"octet".

:)


Nick the Gr33k

Jun 15, 2013, 10:49:13 AM
to
On 15/6/2013 5:44 μμ, Grant Edwards wrote:

> There is some ambiguity in the term "byte". It used to mean the
> smallest addressable unit of memory (which varied in the past -- at
> one point, both 20 and 60 bit "bytes" were common). These days the
> smallest addressable unit of memory is almost always 8 bits on desktop
> and embedded processors (but often not on DSPs). That's why when IEEE
> standards want to refer to an 8-bit chunk of data they use the term
> "octet".

What's the difference between a byte and a byte's value?

Roy Smith

Jun 15, 2013, 10:59:32 AM
to
In article <kphul7$74q$1...@reader1.panix.com>,
Grant Edwards <inv...@invalid.invalid> wrote:

> There is some ambiguity in the term "byte". It used to mean the
> smallest addressable unit of memory (which varied in the past -- at
> one point, both 20 and 60 bit "bytes" were common).

I would have defined it more like, "some arbitrary collection of
adjacent bits which hold some useful value". Doesn't need to be
addressable, nor does it need to be the smallest such thing.

For example, on the PDP-10 (36-bit word), it was common to treat a word
as either four 9-bit bytes, or five 7-bit bytes (with one bit left
over), depending on what you were doing. And, of course, a nybble was
something smaller than a byte!

And, yes, especially in networking, everybody talks about octets when
they want to make sure people understand what they mean.
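
As a rough illustration of the 36-bit arithmetic above, here is a sketch that packs and unpacks 9-bit or 7-bit bytes in a single word. It assumes the bytes are left-justified, so any leftover bit sits at the low end; the function names are made up for the example and this is not meant to mirror any real PDP-10 instruction.

```python
WORD_BITS = 36


def pack_word(values, byte_bits):
    """Pack byte values into one 36-bit word, left-justified."""
    count = WORD_BITS // byte_bits             # 4 for 9-bit, 5 for 7-bit
    leftover = WORD_BITS - count * byte_bits   # 0 for 9-bit, 1 for 7-bit
    if len(values) != count:
        raise ValueError("need exactly %d values" % count)
    mask = (1 << byte_bits) - 1
    word = 0
    for value in values:
        word = (word << byte_bits) | (value & mask)
    return word << leftover                    # unused bits end up at the bottom


def split_word(word, byte_bits):
    """Split a 36-bit word back into its left-justified bytes."""
    count = WORD_BITS // byte_bits
    leftover = WORD_BITS - count * byte_bits
    mask = (1 << byte_bits) - 1
    return [
        (word >> (leftover + byte_bits * (count - 1 - i))) & mask
        for i in range(count)
    ]


seven_bit = pack_word([ord(c) for c in "HELLO"], 7)   # five 7-bit bytes
nine_bit = pack_word([1, 2, 3, 4], 9)                  # four 9-bit bytes
print(split_word(seven_bit, 7), split_word(nine_bit, 9))
```

The same 36 bits are a different number of "bytes" depending on which width the program chooses, which is the point of the example.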

Nick the Gr33k

Jun 15, 2013, 11:14:02 AM
to
On 15/6/2013 5:59 μμ, Roy Smith wrote:

> And, yes, especially in networking, everybody talks about octets when
> they want to make sure people understand what they mean.

1 byte = 8 bits

In networking, though, since we do not use variable-length encoding
schemes like UTF-8, how do we tell where a byte value starts and where
it stops?

Do we need a start bit and a stop bit for that?

Steven D'Aprano

Jun 15, 2013, 11:30:18 AM
to
On Sat, 15 Jun 2013 17:49:13 +0300, Nick the Gr33k wrote:

> What's the difference between a byte and a byte's value?

Nothing.
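
A small sketch of what that looks like in Python 3, assuming UTF-8 as the encoding for the sample string: a bytes object is a sequence of integers from 0 to 255, and each of those integers is the byte.

```python
# One ASCII letter and one Greek letter, encoded as UTF-8.
data = "aα".encode("utf-8")

# Iterating over (or indexing) a bytes object yields plain ints.
values = list(data)

print(len(data))                 # number of bytes, not number of characters
print(values)                    # the byte values themselves
print(data[0] == ord("a"))       # for ASCII, byte value equals the ordinal
```

The letter 'a' takes one byte and 'α' takes two, so the three integers in values are all there is to the encoded string.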


--
Steven

Joel Goldstick

Jun 15, 2013, 11:35:17 AM
to Nick the Gr33k, pytho...@python.org
On Sat, Jun 15, 2013 at 11:14 AM, Nick the Gr33k <sup...@superhost.gr> wrote:
> On 15/6/2013 5:59 μμ, Roy Smith wrote:
>
>> And, yes, especially in networking, everybody talks about octets when
>> they want to make sure people understand what they mean.
>
> 1 byte = 8 bits
>
> In networking, though, since we do not use variable-length encoding
> schemes like UTF-8, how do we tell where a byte value starts and where
> it stops?
>
> Do we need a start bit and a stop bit for that?

And this is specific to Python how?


--
What is now proved was at first only imagined!

Steven D'Aprano

Jun 15, 2013, 11:40:35 AM
to
On Sat, 15 Jun 2013 07:58:27 -0400, D'Arcy J.M. Cain wrote:

> I suggested including the poster that you are replying to.

In the name of all that's good and decent in the world, why on earth
would you do that when replying to a mailing list??? They're already
getting a reply. Sending them TWO identical replies is just rude.


--
Steven

Chris “Kwpolska” Warrick

Jun 15, 2013, 12:41:41 PM
to Steven D'Aprano, pytho...@python.org
Mailman is intelligent enough not to send a second copy in that case.
This message was sent with a CC, and you got only one copy.

--
Kwpolska <http://kwpolska.tk> | GPG KEY: 5EAAEA16
stop html mail | always bottom-post
http://asciiribbon.org | http://caliburn.nl/topposting.html

D'Arcy J.M. Cain

Jun 15, 2013, 1:07:29 PM
to Chris “Kwpolska” Warrick, pytho...@python.org
On Sat, 15 Jun 2013 18:41:41 +0200
Chris “Kwpolska” Warrick <kwpo...@gmail.com> wrote:
> On Sat, Jun 15, 2013 at 5:40 PM, Steven D'Aprano
> <steve+comp....@pearwood.info> wrote:
> > In the name of all that's good and decent in the world, why on earth
> > would you do that when replying to a mailing list??? They're already
> > getting a reply. Sending them TWO identical replies is just rude.
>
> Mailman is intelligent enough not to send a second copy in that case.
> This message was sent with a CC, and you got only one copy.

Actually, no. Mailman is not your MTA. It only gets the email sent to
the mailing list. Your MTA sends the other one directly so Steve is
correct. He gets two copies. If his client doesn't suppress the
duplicate then he will be presented with both.

Steven D'Aprano

Jun 15, 2013, 1:12:50 PM
to
On Sat, 15 Jun 2013 18:41:41 +0200, Chris “Kwpolska” Warrick wrote:

> On Sat, Jun 15, 2013 at 5:40 PM, Steven D'Aprano
> <steve+comp....@pearwood.info> wrote:
>> On Sat, 15 Jun 2013 07:58:27 -0400, D'Arcy J.M. Cain wrote:
>>
>>> I suggested including the poster that you are replying to.
>>
>> In the name of all that's good and decent in the world, why on earth
>> would you do that when replying to a mailing list??? They're already
>> getting a reply. Sending them TWO identical replies is just rude.
>
> Mailman is intelligent enough not to send a second copy in that case.
> This message was sent with a CC, and you got only one copy.

Wrong. I got two copies. One via comp.lang.python, and one direct to me.


--
Steven

W. Trevor King

Jun 15, 2013, 1:25:28 PM
to D'Arcy J.M. Cain, pytho...@python.org
On Sat, Jun 15, 2013 at 01:07:29PM -0400, D'Arcy J.M. Cain wrote:
> On Sat, 15 Jun 2013 18:41:41 +0200 Chris “Kwpolska” Warrick wrote:
> > On Sat, Jun 15, 2013 at 5:40 PM, Steven D'Aprano wrote:
> > > In the name of all that's good and decent in the world, why on earth
> > > would you do that when replying to a mailing list??? They're already
> > > getting a reply. Sending them TWO identical replies is just rude.
> >
> > Mailman is intelligent enough not to send a second copy in that case.
> > This message was sent with a CC, and you got only one copy.
>
> Actually, no. Mailman is not your MTA. It only gets the email sent to
> the mailing list. Your MTA sends the other one directly so Steve is
> correct. He gets two copies. If his client doesn't suppress the
> duplicate then he will be presented with both.

Mailman can (optionally) assume that addresses listed in To, CC, …
fields received an out-of-band copy, and not mail them an
additional copy [1].
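
The idea can be sketched roughly as follows. This is not Mailman's actual code, just an illustration of the logic using the standard-library email module; the function name, the subscriber set, and the example.org addresses are all hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses


def recipients_to_skip(raw_message, subscribers):
    """Return the subscribers already named in the To or Cc fields."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    # getaddresses() yields (display name, address) pairs.
    explicit = {addr.lower() for _name, addr in getaddresses(fields)}
    return {s for s in subscribers if s.lower() in explicit}


# Hypothetical post to a list, with one subscriber also CC'd directly.
raw = (
    "From: carol@example.org\n"
    "To: some-list@example.org\n"
    "Cc: Bob <bob@example.org>\n"
    "\n"
    "Hi\n"
)
subscribers = {"alice@example.org", "bob@example.org"}
skipped = recipients_to_skip(raw, subscribers)
print(skipped)
```

A list that applies this kind of check would send the post to alice only, on the assumption that bob already received the CC'd copy.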

Cheers,
Trevor

[1]: http://www.gnu.org/software/mailman/mailman-member/node21.html

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

Chris “Kwpolska” Warrick

Jun 15, 2013, 1:25:21 PM
to pytho...@python.org
On Sat, Jun 15, 2013 at 7:07 PM, D'Arcy J.M. Cain <da...@druid.net> wrote:
> On Sat, 15 Jun 2013 18:41:41 +0200
> Chris “Kwpolska” Warrick <kwpo...@gmail.com> wrote:
>> On Sat, Jun 15, 2013 at 5:40 PM, Steven D'Aprano
>> <steve+comp....@pearwood.info> wrote:
>> > In the name of all that's good and decent in the world, why on earth
>> > would you do that when replying to a mailing list??? They're already
>> > getting a reply. Sending them TWO identical replies is just rude.
>>
>> Mailman is intelligent enough not to send a second copy in that case.
>> This message was sent with a CC, and you got only one copy.
>
> Actually, no. Mailman is not your MTA. It only gets the email sent to
> the mailing list. Your MTA sends the other one directly so Steve is
> correct. He gets two copies. If his client doesn't suppress the
> duplicate then he will be presented with both.

The source code seems to think otherwise:

http://bazaar.launchpad.net/~mailman-coders/mailman/3.0/view/head:/src/mailman/handlers/avoid_duplicates.py

On Sat, Jun 15, 2013 at 7:12 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> Wrong. I got two copies. One via comp.lang.python, and one direct to me.

You are subscribed through Usenet and not
<http://mail.python.org/mailman/listinfo/python-list>, in which case
the above doesn’t apply, because Mailman throws the mail to Usenet and
not you personally.

Nick the Gr33k

Jun 15, 2013, 1:30:29 PM
to
You are spamming my thread.

rusi

Jun 15, 2013, 1:36:00 PM
to
On Jun 15, 10:30 pm, Nick the Gr33k <supp...@superhost.gr> wrote:
>
> You are spamming my thread.

With you as our spamming-guru, Onward! Sky is the limit!

Steven D'Aprano

Jun 15, 2013, 1:47:14 PM
to
On Sat, 15 Jun 2013 19:25:21 +0200, Chris “Kwpolska” Warrick wrote:

> On Sat, Jun 15, 2013 at 7:07 PM, D'Arcy J.M. Cain <da...@druid.net>
> wrote:
>> On Sat, 15 Jun 2013 18:41:41 +0200
>> Chris “Kwpolska” Warrick <kwpo...@gmail.com> wrote:
>>> On Sat, Jun 15, 2013 at 5:40 PM, Steven D'Aprano
>>> <steve+comp....@pearwood.info> wrote:
>>> > In the name of all that's good and decent in the world, why on earth
>>> > would you do that when replying to a mailing list??? They're already
>>> > getting a reply. Sending them TWO identical replies is just rude.
>>>
>>> Mailman is intelligent enough not to send a second copy in that case.
>>> This message was sent with a CC, and you got only one copy.
>>
>> Actually, no. Mailman is not your MTA. It only gets the email sent to
>> the mailing list. Your MTA sends the other one directly so Steve is
>> correct. He gets two copies. If his client doesn't suppress the
>> duplicate then he will be presented with both.
>
> The source code seems to think otherwise:

Mailman is not the only mailing list software in the world, and the
feature you are referring to is optional.


> http://bazaar.launchpad.net/~mailman-coders/mailman/3.0/view/head:/src/mailman/handlers/avoid_duplicates.py
>
> On Sat, Jun 15, 2013 at 7:12 PM, Steven D'Aprano
> <steve+comp....@pearwood.info> wrote:
>> Wrong. I got two copies. One via comp.lang.python, and one direct to
>> me.
>
> You are subscribed through Usenet and not
> <http://mail.python.org/mailman/listinfo/python-list>, in which case the
> above doesn’t apply, because Mailman throws the mail to Usenet and not
> you personally.

I still get two copies if you CC me. That's still unnecessary and rude.
If I wanted a copy emailed to me, I'd subscribe via email rather than via
news. Whether you agree or not, I'd appreciate if you respect my wishes
rather than try to wiggle out of it on a technicality.



--
Steven

Michael Torrie

Jun 15, 2013, 1:49:41 PM
to pytho...@python.org
On 06/15/2013 11:30 AM, Nick the Gr33k wrote:
> You are spamming my thread.

No, he's not. The subject is changed on this branch of the thread, so
it's easy to see in any good e-mail reader that this sub-thread or
branch is diverting. This is proper list etiquette.

