Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?

Richard Lynch

Jul 9, 2007, 3:29:37 AM
On Fri, July 6, 2007 1:23 am, Stanislav Malyshev wrote:
>> You mean this will break:
>>
>> <?php
>> $mask = 0xf0;
>> $value = $_POST['foo'] & $mask;
>> ?>
>>
>> because of Unicode?
>
> I'd say it won't do what it did before. Though I'm not sure bit
> operations on unicode make any sense at all... The problem here is the
> requirement conflict - how PHP can possibly know if $_POST['foo'] is a
> bit field or unicode string?

I'm starting to be quite concerned about PHP 6 Unicode, then...

Maybe strings should be UTF-8 until declared otherwise or something,
because this just won't fly...

As for how it knows?

I dunno. Aren't there headers to indicate what kind of data is coming
in?

Should there be?

If there aren't, or can't be, then you have to let ME tell you what it
is.

You can't just go assuming I've got UTF-16 data coming in --
especially not when the entire Internet has been built and subsisted
on ASCII (more or less) for over a decade.

>> But if I haven't done something new-fangled to make a string be some
>> new-fangled Unicode thingie, then it's just plain old ASCII, no?
>>
>> Or PHP can just assume that anyway...
>
> It can't if we want to keep UTF-16. UTF-16 unlike UTF-8 is not
> compatible with ascii. We could have some "smart downgrade" attempt -
> Python 2 currently does something like this - but it won't work in all
> situations.

This is nuts.

Anybody who actually NEEDS Unicode ought to be the ones who have to
type a new keyword or something, not the bazillion users who have no
need for Unicode and likely never will...
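Stanislav's point that UTF-16, unlike UTF-8, is not byte-compatible with ASCII is easy to check. The sketch below uses Python 3 only as a stand-in: its separate `str` and `bytes` types roughly mirror PHP 6's IS_UNICODE/IS_STRING split, and nothing here is a PHP 6 API. It encodes the same ASCII-only text three ways and compares the resulting bytes.

```python
def encodings_of(text: str) -> dict:
    """Return the byte representation of `text` in a few encodings."""
    return {
        "ascii": text.encode("ascii"),
        "utf-8": text.encode("utf-8"),
        # Little-endian without a BOM, so the output is just the 16-bit code units.
        "utf-16-le": text.encode("utf-16-le"),
    }

enc = encodings_of("PHP")
print(enc["ascii"])      # b'PHP'
print(enc["utf-8"])      # b'PHP' (ASCII text is byte-for-byte identical in UTF-8)
print(enc["utf-16-le"])  # b'P\x00H\x00P\x00' (every character gains a zero byte)
```

Pure-ASCII data is already valid UTF-8, byte for byte. A UTF-16 representation doubles every character, so existing byte strings cannot simply be reinterpreted as UTF-16.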

>> But an old script ought to just work...
>
> Sometimes it's not possible - if you use the same variable as string
> and
> bitfield, and bit representation of the string changes, it can't just
> work anymore, something needs to be done to bring them together.

It's just an ASCII string, same as it's always been.

Don't go changing that out from under users for the zillion lines of
code already written.

If you need some new-fangled UTF-16 datatype stringie, then go ahead
and give yourself one.

But don't change all MY data to UTF-16 when it isn't UTF-16!!!

You've got 10 YEARS of legacy data built up being managed by billions
of scripts.

In what sane world do you suddenly declare all that data isn't ASCII
any more and claim that it's UTF-16 when UTF-16 isn't backwards
compatible with ASCII?

>>> Unicode code points can be defined with \u, but PHP6 breaks
>>> existing
>>> octal
>>> and hex escape sequences.
>
> I don't understand what this means...

I think I know...

I have code like this, somewhere:

if (preg_match("|[\xF0-\xFF]|", $data)) {
    $data = un_microsuck($data);
}

un_microsuck() basically detects and converts any of the goof-ball
extended ASCII from MS products (Word, Outlook, etc) to an HTML
equivalent character.

But now \xF0 isn't going to be ASCII 128 anymore, is it?

Or maybe \xF0 will "work" but the octal \360 won't?

Yikes.

You think PHP 5 adoption rate was slow?

PHP 6 will be GLACIAL if you're changing that much out from under people.

Changing the definition of a string, arguably the most basic data type
in PHP, is not a Good Idea.

I'm sorry not to have spoken up earlier -- I simply failed to
understand what it was anybody was talking about before. :-(

Cripes, now I have to be the curmudgeon who won't let go of PHP 5. :-(

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Richard Lynch

Jul 9, 2007, 3:48:15 AM
On Fri, July 6, 2007 11:48 am, Antony Dovgal wrote:
> On 06.07.2007 20:44, Stanislav Malyshev wrote:
>>> You don't buy a Porsche if you need a taxi, why would you install
>>> PHP6 if
>>> you don't need Unicode?
>>
>> Namespaces ;)
>
> This reason is only valid if we don't backport such things from PHP6
> to PHP5
> (5.3, 5.5 or whatever it would be), which I think we should do.

Then PHP 6 is going to be a very weird beast...

It adds only the Unicode feature that a tiny niche market needs,
because everything else will be back-ported to PHP 5.

So the only adopters of PHP 6 will be:
Users who need Unicode
Users who just cannot wait for that new feature to get back-ported
Masochists :-)

For how long will this back-port to PHP 5 policy be in effect?

The whole lifetime of PHP 6?

Until PHP 7 is "stable"?

Until the Unicode advocates get tired of back-porting? :-v

Will users then jump from PHP 5 to 7?

Stanislav Malyshev

Jul 9, 2007, 4:08:58 AM
> Maybe strings should be UTF-8 until declared otherwise or something,
> because this just won't fly...

UTF-8 would not help you with bits (since nobody guarantees you incoming
data is valid UTF-8) and it's impossible to do any Unicode stuff on
UTF-8 - you'd have to convert it to UTF-16 and back on every step.

> I dunno. Aren't there headers to indicate what kind of data is coming
> in?

I know of no headers that can tell you "parameter 'foo' in a form is a
bitmask so please do not try to see it as text".

> If there aren't, or can't be, then you have to let ME tell you what it
> is.

You can. Use binary strings and explicit conversions.
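Here is a minimal sketch of the "binary strings plus explicit conversions" discipline. It is written in Python 3, where `bytes` plays roughly the role of a PHP 6 binary string and `str` the role of a Unicode string. The function name `handle_request_body` and the encodings used are hypothetical illustrations, not PHP 6 functions or settings.

```python
def handle_request_body(raw: bytes, declared_encoding: str = "utf-8") -> bytes:
    """Hypothetical handler: decode once on input, work on text, encode once on output."""
    # Input boundary: the caller states what the bytes are; nothing is guessed.
    text = raw.decode(declared_encoding)
    # Inside the program, operate on characters rather than bytes.
    reply = f"Got {len(text)} characters: {text.upper()}"
    # Output boundary: convert back to bytes explicitly.
    return reply.encode("utf-8")

print(handle_request_body("ąb".encode("utf-8")))    # UTF-8 input, 2 characters
print(handle_request_body(b"caf\xe9", "latin-1"))   # ISO-8859-1 input, 4 characters
```

The design choice is that only the two boundaries know about encodings. Everything in between sees characters.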

> You can't just go assuming I've got UTF-16 data coming in --
> especially not when the entire Internet has been built and subsisted
> on ASCII (more or less) for over a decade.

Actually, there's an INI parameter that says which encoding the incoming
data is in. The problem is not that - the problem is that PHP can't know
that you pass bit fields inside textual information (and in HTTP all
parameters are textual), so you have to work with it manually.
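Stanislav's point that a bit field carried in a textual HTTP parameter must be handled manually can be sketched in Python 3 with the standard `urllib.parse` module. The parameter name `foo` comes from Richard's example; the values are made up. The sketch shows the two explicit choices a script has to make: parse the value as a number, or recover the raw bytes before any character decoding.

```python
from urllib.parse import parse_qs, unquote_to_bytes

MASK = 0xF0  # same mask as in Richard's PHP example

def mask_numeric(query: str) -> int:
    """Treat foo=245 as a decimal number, then apply the mask."""
    value = parse_qs(query)["foo"][0]  # parse_qs always returns text (str)
    return int(value) & MASK           # explicit text -> int conversion

def mask_raw_bytes(encoded_value: str) -> bytes:
    """Treat a percent-encoded value such as %F5%0F as raw bytes and mask each byte."""
    raw = unquote_to_bytes(encoded_value)  # no character decoding happens here
    return bytes(b & MASK for b in raw)

# Letting the text layer decode the bytes instead loses information:
# 0xF5 is not valid UTF-8, so parse_qs (errors="replace" by default) yields U+FFFD.
lossy = parse_qs("foo=%F5%0F")["foo"][0]

print(mask_numeric("foo=245"))     # 240
print(mask_raw_bytes("%F5%0F"))    # b'\xf0\x00'
print(lossy == "\ufffd\x0f")       # True
```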

> Anybody who actually NEEDS Unicode ought to be the ones who have to
> type a new keyword or something, not the bazillion users who have no
> need for Unicode and likely never will...

If they have no need for unicode, why run unicode-enabled PHP? Turn it
off and get all your strings untouched.

> It's just an ASCII string, same as it's always been.

IS_STRING

> If you need some new-fangled UTF-16 datatype stringie, then go ahead
> and give yourself one.

IS_UNICODE

> But don't change all MY data to UTF-16 when it isn't UTF-16!!!

Then you can't use Unicode mode, because in Unicode mode a text string
is UTF-16. If it's not a text string, you should say so; PHP doesn't
have any way to know.

> In what sane world do you suddenly declare all that data isn't ASCII
> any more and claim that it's UTF-16 when UTF-16 isn't backwards
> compatible with ASCII?

Python tried that. They are moving to the model PHP 6 uses in Python 3. It
must not be that silly an idea, I guess.

> But now \xF0 isn't going to be ASCII 128 anymore, is it?

ASCII doesn't have any characters beyond 0x7f AFAIK, but it doesn't
matter, I get what you mean. \xF0 in unicode mode would be U+00F0 of
course. Now how preg_match should handle it depends on preg_match.
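The two readings of `\xF0` described here can be compared side by side in a short Python 3 sketch. Python stands in for PHP 6 only as an analogy: a `bytes` literal corresponds to a binary string and a `str` literal to a Unicode string. The function name `describe_xf0` is hypothetical.

```python
def describe_xf0() -> dict:
    """Compare 0xF0 as a byte and as a text code point."""
    as_bytes = b"\xf0"   # binary string: one byte with value 240
    as_text = "\xf0"     # text string: code point U+00F0 (LATIN SMALL LETTER ETH)
    try:
        as_bytes.decode("ascii")
        is_ascii = True
    except UnicodeDecodeError:
        is_ascii = False  # ASCII stops at 0x7F, so byte 0xF0 is not ASCII
    return {
        "byte_value": as_bytes[0],               # 240
        "code_point": f"U+{ord(as_text):04X}",   # 'U+00F0'
        "is_ascii": is_ascii,                    # False
        "utf8_bytes": as_text.encode("utf-8"),   # b'\xc3\xb0': two bytes, not one
    }

print(describe_xf0())
```

Once U+00F0 is written out as UTF-8, it occupies two bytes (0xC3 0xB0). Byte-oriented code that expects a single 0xF0 byte will therefore no longer see one.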
--
Stanislav Malyshev, Zend Software Architect
st...@zend.com http://www.zend.com/
(408)253-8829 MSN: st...@zend.com

Tomas Kuliavas

Jul 9, 2007, 4:11:58 AM
>>>> Unicode code points can be defined with \u, but PHP6 breaks
>>>> existing octal and hex escape sequences.
>>
>> I don't understand what this means...
>
> I think I know...
>
> I have code like this, somewhere:
>
> if (preg_match("|[\xF0-\xFF]|", $data)){
> $data = un_microsuck($data);
> }
>
> un_microsuck() basically detects and converts any of the goof-ball
> extended ASCII from MS products (Word, Outlook, etc) to an HTML
> equivalent character.
>
> But now \xF0 isn't going to be ASCII 128 anymore, is it?

\xF0 never was ASCII. ASCII (ISO 646) is a 7-bit character set. \xF0 is
decimal 240; it is 8-bit.

> Or maybe \xF0 will "work" but the octal \360 won't?

Are you sure that you can't do that by setting unicode.something_encoding
to iso-8859-1 or windows-1252?
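Whether this helps depends on which single-byte encoding is declared. As a rough illustration in Python 3, the standard codec names `latin-1` and `cp1252` stand in for whatever value such an INI setting would take; the setting name is deliberately left vague above. The sketch decodes the same bytes both ways. The two encodings agree on 0xA0-0xFF, including 0xF0. Only Windows-1252, however, maps bytes such as 0x93 and 0x94 to the curly quotes that Word produces.

```python
def decode_both(data: bytes) -> tuple:
    """Decode the same bytes as ISO-8859-1 and as Windows-1252."""
    return data.decode("latin-1"), data.decode("cp1252")

# 0x93 and 0x94 are the "smart quotes" that Microsoft Word inserts.
latin1, cp1252 = decode_both(b"\x93hi\x94")
print(repr(latin1))   # '\x93hi\x94': C1 control characters, not quotes
print(repr(cp1252))   # '“hi”': U+201C and U+201D

# 0xF0 decodes to the same character, U+00F0, in both encodings.
print(decode_both(b"\xf0") == ("\xf0", "\xf0"))   # True
```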

--
Tomas

Alexey Zakhlestin

Jul 9, 2007, 4:17:57 AM
On 7/9/07, Richard Lynch <c...@l-i-e.com> wrote:
>
> Anybody who actually NEEDS Unicode ought to be the ones who have to
> type a new keyword or something, not the bazillion users who have no
> need for Unicode and likely never will...

I wonder who you mean here.
I can't remember many non-Unicode internet sites built during the last 5 years.

German, Spanish, Japanese, Russian…
Internet shops have titles in these languages, communities have users
with nicknames (at least) in these languages, company sites are
multilingual these days, etc.

ASCII is probably OK only for adult sites (where people do not care
about texts) and some intranet sites.

--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/

Andrei Zmievski

Jul 9, 2007, 2:44:44 PM
Once again, you're trying to work with bytes inside Unicode strings,
which just does not make sense. What do you propose we do, somehow
automatically detect that you used \x inside a Unicode string and
turn it into a binary one? Or simply allow one to stick any byte
sequence inside what is supposed to be a valid UTF-16 string?

If you're trying to generate a UTF-8 string on a byte by byte basis,
then it needs to be a binary string, I'm sorry. Whether you do this
via being in unicode.semantics=off mode or via using b"" prefix is up
to you.
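This distinction can be sketched in Python 3, where `bytearray`/`bytes` play roughly the role of a PHP 6 binary string. UTF-8 is assembled byte by byte in a binary buffer and decoded to text only once, at the end. The helper name `build_utf8` is hypothetical. Writing the same escapes into a text literal instead yields two unrelated code points, which is what happens with `\x` inside a Unicode string.

```python
def build_utf8(byte_values) -> str:
    """Hypothetical helper: assemble UTF-8 byte by byte, then decode once."""
    buf = bytearray()
    for value in byte_values:
        buf.append(value)        # each element is a raw byte, 0..255
    return buf.decode("utf-8")   # only at this point does it become text

print(build_utf8([0xC4, 0x85]))   # ą (U+0105)

# The same escapes in a *text* literal are two separate code points, not UTF-8.
text = "\xc4\x85"
print([f"U+{ord(c):04X}" for c in text])   # ['U+00C4', 'U+0085']
```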

-Andrei

> unicode.fallback_encoding => 'utf-8' => 'utf-8'
> unicode.filesystem_encoding => no value => no value
> unicode.http_input_encoding => 'utf-8' => 'utf-8'
> unicode.output_encoding => 'utf-8' => 'utf-8'
> unicode.runtime_encoding => 'utf-8' => 'utf-8'
> unicode.script_encoding => 'utf-8' => 'utf-8'
> unicode.semantics => On => On
> unicode.stream_encoding => UTF-8 => UTF-8
>
> --- test.php ---
> <?php
> $string1 = "ą";
> $string2 = "\xC4\x85";
> var_dump($string1 == $string2);
> var_dump(preg_match("/[\240-\377]/", $string1));
> var_dump(preg_match("/[\240-\377]/", $string2));
> ?>
> ---
>
> ą is in UTF-8 (latin small letter a with ogonek, Latin Extended-A
> range). It is stored as two bytes, 0xC4 0x85.
>
> Expected result and actual result for PHP 5.2.0:
> ---
> bool(true)
> int(1)
> int(1)
> ---
> The "/[\240-\377]/" range should match the 0xC4 byte.
>
> Actual result (PHP6):
> ---
> bool(false)
> int(0)
> int(1)
> ---
>
> --
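The following Python 3 sketch does not run PHP 6. It mimics the two string models (byte strings versus code-point strings) and produces the same three values as the PHP 5.2.0 and PHP 6 runs reported above. This makes the reason for the PHP 6 output visible. As text, "ą" is the single code point U+0105, which lies outside 0xA0-0xFF. "\xC4\x85" becomes the two code points U+00C4 and U+0085, and the first of these is inside the range. The function names are hypothetical.

```python
import re

S = "\u0105"             # 'ą', LATIN SMALL LETTER A WITH OGONEK
RANGE = "[\xa0-\xff]"    # same range as PHP's "/[\240-\377]/" (octal 240..377 = 0xA0..0xFF)

def binary_mode():
    """Roughly PHP 5 / unicode.semantics=off: strings are byte sequences."""
    s1 = S.encode("utf-8")   # b'\xc4\x85', what the UTF-8 source file holds
    s2 = b"\xc4\x85"         # the escaped literal, also two bytes
    pattern = re.compile(RANGE.encode("latin-1"))
    return s1 == s2, int(bool(pattern.search(s1))), int(bool(pattern.search(s2)))

def text_mode():
    """Roughly PHP 6 with unicode.semantics=on: strings are code points."""
    s1 = S                   # one code point, U+0105 (261): outside 0xA0..0xFF
    s2 = "\xc4\x85"          # two code points, U+00C4 and U+0085
    pattern = re.compile(RANGE)
    return s1 == s2, int(bool(pattern.search(s1))), int(bool(pattern.search(s2)))

print(binary_mode())   # (True, 1, 1): same as the PHP 5.2.0 output
print(text_mode())     # (False, 0, 1): same as the reported PHP 6 output
```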

Andrei Zmievski

Jul 9, 2007, 5:41:34 PM
Yes, backporting major features from PHP 6 to 5 will slow down PHP 6
adoption, and I'd like to avoid it if possible.

There is a way to run two engines side by side, by the way: in
separate instances of Apache. It's really not that complicated.

-Andrei


On Jul 6, 2007, at 6:13 AM, Stefan Priebsch wrote:

> IMHO backporting a lot of features to PHP4 is a major reason for the
> slow PHP5 adoption. Basically, it seems that everybody who is not using
> OOP feels that PHP4 is fine for them.
>
> I'd say committing to backporting stuff from PHP6 to PHP5 will yield a
> similar situation: very slow or no PHP6 adoption.
>
> BTW, can't the unicode switch be done at compile time? So one can
> compile PHP6 Unicode and PHP6 non-Unicode. Then if there is a clever way
> of running both engines in parallel, there should be no performance
> impact inside the non-unicode engine. Since there are both versions of
> the engine (that can maybe even be selected by a certain statement in the
> main PHP file of the application), unicode and non-unicode users are
> happy. And there is only one version of PHP in the market, to conquer it
> all.
>
> There must be a reason to upgrade to a new PHP version (usually
> features, maybe performance increase etc.). But there also must be no
> reason not to upgrade. But you all know this, it has been said before.
>
> Kind regards,
>
> Stefan
>
> --
>> e-novative> - We make IT work for you.
>
> e-novative GmbH - HR: Amtsgericht München HRB 139407
> Sitz: Wolfratshausen - GF: Dipl. Inform. Stefan Priebsch
>
> http://www.e-novative.de

Andrei Zmievski

Jul 9, 2007, 5:42:50 PM
As we see now, yes they will be in PHP 6.

-Andrei


On Jul 6, 2007, at 7:28 AM, Stefan Priebsch wrote:

> Pierre wrote:
>> Namespace is one _very_ important reason. If we need a "marketing"
>
> I agree. But AFAIK namespaces were not supposed to be in PHP6, at least
> not in PHP 6.0. Is there an official position on whether namespaces will
> be in PHP 6.0?

Andrei Zmievski

Jul 9, 2007, 5:53:29 PM
And I think that we shouldn't, since it removes a big incentive for
people to move to PHP 6.

Really, we need to get folks to use Unicode natively as much as
possible. It is the way of the future, and not some "obscure
feature", as some here have suggested. This kind of attitude is
precisely why we've had and continue to have such an
internationalization mess when it comes to building applications.

-Andrei


On Jul 6, 2007, at 9:48 AM, Antony Dovgal wrote:

> On 06.07.2007 20:44, Stanislav Malyshev wrote:
>>> You don't buy a Porsche if you need a taxi, why would you install
>>> PHP6 if you don't need Unicode?
>> Namespaces ;)
> This reason is only valid if we don't backport such things from
> PHP6 to PHP5 (5.3, 5.5 or whatever it would be), which I think we
> should do.
>

> --
> Wbr, Antony Dovgal

Antony Dovgal

Jul 9, 2007, 6:12:15 PM
On 10.07.2007 01:48, Andrei Zmievski wrote:
> And I think that we shouldn't, since it removes a big incentive for
> people to move to PHP 6.

I don't really see much sense in forcing people to use PHP6 if we accept the "PHP5 = PHP6 - Unicode" formula.
They are just different things, period.

> Really, we need to get folks to use Unicode natively as much as possible.

Andrei, I personally don't need Unicode at all.
I know, that may sound weird, but that's true.

> This kind of attitude is
> precisely why we've had and continue to have such an
> internationalization mess when it comes to building applications.

What attitude are you talking about here?

I'm trying to be honest with myself in the first place.
Do _I_ like that horrible IS_STRING/IS_UNICODE mess we have atm? No.
Do _I_ want to maintain this mess in the future just because of some bad design decision in the past? No way, we had enough of that already.

I would love to have clean and easy PHP6 without all the "compatibility", which creates gazillion problems to both users and developers.
Please notice that I didn't call Unicode useless crap or whatever others may think about it,
I just want PHP6 to be Unicode-only release because it would make my personal life much easier
without complicating others' lives.

Stefan Priebsch

Jul 9, 2007, 6:24:33 PM
Andrei Zmievski wrote:

> As we see now, yes they will be in PHP 6.

:-))


--

>e-novative> - We make IT work for you.

e-novative GmbH - HR: Amtsgericht München HRB 139407
Sitz: Wolfratshausen - GF: Dipl. Inform. Stefan Priebsch

http://www.e-novative.de

--

Christopher Jones

Jul 9, 2007, 6:33:00 PM

I also think we shouldn't backport features to PHP5. We should

(i) keep PHP5 a stable release with a known feature set for developers
to use.

(ii) have a smaller code base to maintain in PHP5, reducing the
overhead of merging.

(iii) avoid exacerbating the future situation with uptake of PHP6 vs
PHP5 that we now face with PHP5 vs PHP4.

Chris

Andrei Zmievski wrote:
> And I think that we shouldn't, since it removes a big incentive for
> people to move to PHP 6.
>
> Really, we need to get folks to use Unicode natively as much as
> possible. It is the way of the future, and not some "obscure feature",
> as some here have suggested. This kind of attitude is precisely why
> we've had and continue to have such an internationalization mess when it
> comes to building applications.
>
> -Andrei
>
>
> On Jul 6, 2007, at 9:48 AM, Antony Dovgal wrote:
>
>> On 06.07.2007 20:44, Stanislav Malyshev wrote:
>>>> You don't buy a Porsche if you need a taxi, why would you install
>>>> PHP6 if you don't need Unicode?
>>> Namespaces ;)
>> This reason is only valid if we don't backport such things from PHP6
>> to PHP5 (5.3, 5.5 or whatever it would be), which I think we should do.
>>
>> --Wbr, Antony Dovgal

--
Christopher Jones, Oracle
Email: christop...@oracle.com Tel: +1 650 506 8630
Blog: http://blogs.oracle.com/opal/ PHP Book: http://tinyurl.com/f8jad

Nicolas Bérard-Nault

Jul 9, 2007, 6:57:13 PM

Permit me to give my 2 cents on that and share my small bit of experience
with PHP 6.

First of all, I totally agree with you Antony. I'm currently working on
deploying a big codebase in PHP 6 (for those of you who didn't know, I'm the
GSoC student working on refactoring Jaws for PHP 6) and my head started to
ache when I began understanding all the complications of the unicode
implementation as it is right now. Basically, having that
unicode.semantics PHP_INI switch just totally kills the fun because I
have to have a working application if it is ON or OFF. Long story short,
this forces me to explicitly define each string as either binary or
unicode, which doesn't make any "PHP sense". It's actually the first time
I'm forced to explicitly specify a variable type in PHP and I'm not sure
I'm the only one who's not happy about this. I like the unicode support
and really appreciate all the work that's been done on it but I
absolutely think it should be implemented without that headache/pain in
the ass switch that'll make transition even tougher for everyone.

In that case, I can say simplicity is certainly not dumb.

--
Nicolas Bérard-Nault (nic...@gmail.com)
Student, D.E.C. Sciences, Lettres & Arts
Cégep de Sherbrooke

Homepage: http://nicobn.googlepages.com


Stanislav Malyshev

Jul 9, 2007, 7:02:06 PM
> Do _I_ like that horrible IS_STRING/IS_UNICODE mess we have atm? No.

I don't think there's any way of having both unstructured character data
and Unicode text represented without having two distinct types. Either
that or you'd have to tell on each step which one it is, and that would
suck much more.

> I would love to have clean and easy PHP6 without all the
> "compatibility", which creates gazillion problems to both users and
> developers.

Fixing unicode=on does not remove the IS_STRING/IS_UNICODE duality. We
still have two kinds of data - unstructured bit stream and structured
text. If we want strlen("превед") to return 6 - since that Russian word
has 6 characters - then we have no way but to recognize that it's not just
a collection of bits but Unicode text, and that would require a separate
type, as I see it. And as I see it, this is the source of the problems
when people try to operate on text as a bit stream and vice versa.

Unless I totally missed what mess you are referring to...
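The strlen() example can be made concrete with a short Python 3 sketch. Python's `len()` on a `str` counts code points, which is the behavior wanted here from strlen() on Unicode text; it is only an analogy, not PHP's function. A byte-oriented strlen(), as in PHP 5, sees the UTF-8 length instead.

```python
word = "превед"                            # the Russian word from the example above

chars = len(word)                          # counts code points (text string)
utf8_len = len(word.encode("utf-8"))       # what a byte-oriented strlen() sees
utf16_len = len(word.encode("utf-16-le"))  # size in a UTF-16 buffer, no BOM

print(chars, utf8_len, utf16_len)          # 6 12 12
```

Each Cyrillic letter takes two bytes in UTF-8 and one 16-bit unit in UTF-16. Only a type that knows it holds text can report 6.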


--
Stanislav Malyshev, Zend Software Architect
st...@zend.com http://www.zend.com/
(408)253-8829 MSN: st...@zend.com

--

Jani Taskinen

Jul 9, 2007, 7:07:37 PM
Antony Dovgal wrote:

> On 10.07.2007 01:48, Andrei Zmievski wrote:
>> And I think that we shouldn't, since it removes a big incentive for
>> people to move to PHP 6.
>
> I don't really see much sense in forcing people to use PHP6 if we accept
> the "PHP5 = PHP6 - Unicode" formula.
> They are just different things, period.
>
>> Really, we need to get folks to use Unicode natively as much as possible.
>
> Andrei, I personally don't need Unicode at all.
> I know, that may sound weird, but that's true.
>
>> This kind of attitude is precisely why we've had and continue to have
>> such an internationalization mess when it comes to building
>> applications.
>
> What attitude are you talking about here?
>
> I'm trying to be honest with myself in the first place.
> Do _I_ like that horrible IS_STRING/IS_UNICODE mess we have atm? No.
> Do _I_ want to maintain this mess in the future just because of some bad
> design decision in the past? Noway, we had enough of that already.
>
> I would love to have clean and easy PHP6 without all the
> "compatibility", which creates gazillion problems to both users and
> developers.
> Please notice that I didn't call Unicode useless crap or whatever others
> may think about it, I just want PHP6 to be Unicode-only release because
> it would make my personal life much easier
> without complicating others' lives.

Thank you Antony. This is exactly how I think too.

--Jani

Johannes Schlüter

Jul 9, 2007, 7:37:55 PM
Hi,

On Mon, 2007-07-09 at 15:33 -0700, Stanislav Malyshev wrote:
> Fixing unicode=on does not remove the IS_STRING/IS_UNICODE duality. We
> still have two kinds of data - unstructured bit stream and structured
> text.

But we still have the mess that most internal structures (function
tables, class tables, ...) hold either an IS_STRING or an IS_UNICODE
depending on a configuration option - just check the number of
UG(unicode)?IS_UNICODE:IS_STRING (that one even got a macro,
ZEND_STR_TYPE) kind of checks - these make the code way harder to read
and maintain.

And again: it is as easy to run PHP 5 and PHP 6 on the same host as to run
PHP 6 with Unicode and PHP 6 without it, so I can't see a BC benefit of
that setting, but I can see that it gives us two products with the same
name - PHP 6. And that's bad.

johannes

Andi Gutmans

Jul 9, 2007, 10:41:58 PM
The large amount of the dual IS_UNICODE/IS_STRING code will need to stay in
the code base anyway, as we will be supporting binary strings in PHP 6.
So it's not accurate that all these maintenance issues will be resolved by
not supporting unicode_semantics=off.

I believe, unlike what Andrei said, that for a large part of our community
(probably the majority) default unicode_semantics=on will not be of
interest (we don't live in a purist's world). Many won't want to run it
because it's going to be significantly slower and will be harder for
them to work with. This community will be best served by being able to run
in native 8-bit mode and having some Unicode functionality available
if/when needed. Having dual mode in PHP 6 is not the same as forking two
code bases. Things like namespaces still automatically reach both
audiences.

If we're talking from a pure "what is most useful to the majority of our
users" perspective, I'd actually argue that explicit Unicode strings would
be the most convenient, i.e. instead of doing b"8bitstring" you'd do
U"unicodestring". Other languages do the same and there are reasons for
that. As we've decided on a more aggressive (and risky) approach, I
think having this dual mode is extremely important. It will also make
the upgrade path easier.

Btw, I don't know how many of you have actually tried to port PHP 5 apps
to PHP 6, but it's quite a disaster. We made some fixes in the past 2-3
weeks and it's getting better, but it still requires a lot of work. If we
don't make this easy then this is all not worth much.

This project has never been a purist's project, which is why it's been so
successful; let's not start now...

Andi

Lukas Kahwe Smith

Jul 10, 2007, 1:05:23 AM

On 10.07.2007, at 01:19, chris# wrote:

> On Mon, 9 Jul 2007 14:38:03 -0700, Andrei Zmievski
> <and...@gravitonic.com> wrote:
>> Yes, backporting major features from PHP 6 to 5 will slow down PHP 6
>> adoption, and I'd like to avoid it if possible.
>>
>> There is a way to run two engines side by side, by the way: in
>> separate instances of Apache. It's really not that complicated.
>
> Isn't there some evidence of the ability to run two engines side-by-side
> with only one instance of Apache; thereby eliminating some overhead?
> Wouldn't that actually be easier?
> I could have sworn I saw that somewhere.

maybe someone could make runkit really fast .. *nudge*

regards,
Lukas

Derick Rethans

Jul 10, 2007, 2:42:21 AM
On Mon, 9 Jul 2007, chris# wrote:

> On Mon, 9 Jul 2007 14:38:03 -0700, Andrei Zmievski <and...@gravitonic.com> wrote:
> > Yes, backporting major features from PHP 6 to 5 will slow down PHP 6
> > adoption, and I'd like to avoid it if possible.
> >
> > There is a way to run two engines side by side, by the way: in
> > separate instances of Apache. It's really not that complicated.
>
> Isn't there some evidence of the ability to run two engines
> side-by-side with only one instance of Apache; thereby eliminating
> some overhead? Wouldn't that actually be easier? I could have sworn I
> saw that somewhere.

You can do that with fastcgi and lighttpd, not with apache.

Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

Derick Rethans

Jul 10, 2007, 2:46:18 AM

On Mon, 9 Jul 2007, Nicolas Bérard-Nault wrote:

> Permit me to give my 2 cents on that and share my small bit of experience
> with PHP 6.
>
> First of all, I totally agree with you Antony. I'm currently working on
> deploying a big codebase in PHP 6 (for those of you who didn't know, I'm the
> GSoC student working on refactoring Jaws for PHP 6) and my head started to
> ache when I began understanding all the complications of the unicode
> implementation as it is right now. Basically, having that
> unicode.semantics PHP_INI switch just totally kills the fun because I
> have to have a working application if it is ON or OFF.

Why? Just state that it only works when it is turned ON - I am pretty
sure that that's the way we'll go.

> Long story short, this forces me to
> explicitly define each string as either binary or unicode, which doesn't
> make any "PHP sense". It's actually the first time I'm forced to explicitly
> specify a variable type in PHP and I'm not sure I'm the only one who's not
> happy about this. I like the unicode support and really appreciate all the
> work that's been done on it but I absolutely think it should be implemented
> without that headache/pain in the ass switch that'll make transition even
> tougher for everyone.

That I agree with :)

Derick


Alexey Zakhlestin

Jul 10, 2007, 4:17:16 AM
On 7/10/07, Derick Rethans <der...@php.net> wrote:

> You can do that with fastcgi and lighttpd, not with apache.

Not true.
You can do that with ANY server which supports FastCGI (Apache can do that too!)

Actually, I believe FastCGI mode should get some "advertising" from
php.net, as it really has its advantages.

--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/

Derick Rethans

Jul 10, 2007, 4:22:08 AM
On Tue, 10 Jul 2007, Alexey Zakhlestin wrote:

> On 7/10/07, Derick Rethans <der...@php.net> wrote:
>
> > You can do that with fastcgi and lighttpd, not with apache.
>
> not true. you can do that with ANY server which uses fastcgi (apache
> can do that too!)
>
> Actualy, I believe fastcgi-mode should get some "advertising" from
> php.net as it really have its advantages

You're right - didn't think of that, but then again, it was a sneak
commercial for lighttpd ;-)

Derick

Stefan Priebsch

Jul 10, 2007, 4:39:08 AM
Derick Rethans wrote:

> You can do that with fastcgi and lighttpd, not with apache.

(With how much effort) could it be made possible with Apache? Deciding
on unicode and non-unicode PHP at compile time and -optionally- running
two engines in parallel could IMHO ease some performance issues and also
make the code easier to maintain.

Kind regards,

Stefan

--
>e-novative> - We make IT work for you.

e-novative GmbH - HR: Amtsgericht München HRB 139407
Sitz: Wolfratshausen - GF: Dipl. Inform. Stefan Priebsch

http://www.e-novative.de

--

Andi Gutmans

Jul 10, 2007, 12:34:52 PM
I was thinking a bit more about this yesterday. Even if I agreed with
this discussion (which I don't at this point in time), I think we are
having it far too early. We currently have a very big problem with the
ability to upgrade to PHP 6, and making decisions without people actually
getting their feet wet and seeing what the issues are is not a good
idea. Purist decisions tend to fail when they meet the real world.

What I really think we need to do for this release, which we haven't
been good at doing in the past, is build a PHP Compatibility Team which
tries to port many applications to PHP 6 and finds the issues in doing
this port (both with unicode_semantics=on/off). We can then learn from
this experience and have good documentation on how to upgrade to both
modes and in some cases, like we have done in the past 2-3 weeks, tweak
PHP 6 to not break backwards compatibility. It is possible in many
cases.

It's something we are willing to spend time on and, as I mentioned, have
already started to do, but it would really require a larger number of
volunteers to pick various apps and do it.

This kind of information would be far more valuable to the project at
this point than a prolonged thread about a piece of software which isn't
finished (and would also give more information for a discussion like the
one we've been having). No one really knows how good or bad a situation
we are in right now. I know from my end it doesn't look great yet.

Andi


Evert | Rooftop

Jul 10, 2007, 12:43:18 PM

Andi Gutmans wrote:
> What I really think we need to do for this release, which we haven't
> been good at doing in the past, is build a PHP Compatibility Team which
> tries to port many applications to PHP 6 and finds the issues in doing
> this port (both with unicode_semantics=on/off). We can then learn from
> this experience and have good documentation on how to upgrade to both
> modes and in some cases, like we have done in the past 2-3 weeks, tweak
> PHP 6 to not break backwards compatibility. It is possible in many
> cases.
>
I'd volunteer for this. Does it help you guys to get started with this
today, or should I be waiting till there's more agreement on some of
this stuff?

Evert

Andi Gutmans

Jul 10, 2007, 6:05:14 PM

I think the sooner the better as it's valuable information for the dev
team.
It'd probably be a good idea to have a Wiki where we can document issues
that/common use-cases which are encountered.
Maybe we should have a Wiki on one of the php.net servers for such
purposes?
Andi

Larry Garfield

Jul 10, 2007, 8:09:51 PM

On Monday 09 July 2007, Stanislav Malyshev wrote:
> > Do _I_ like that horrible IS_STRING/IS_UNICODE mess we have atm? No.
>
> I don't think there's any way of having both unstructured character data
> and Unicode text represented without having two distinct types. Either
> that or you'd have to tell on each step which one it is, and that would
> suck much more.
>
> > I would love to have clean and easy PHP6 without all the
> > "compatibility", which creates gazillion problems to both users and
> > developers.
>
> Fixing unicode=on does not remove the IS_STRING/IS_UNICODE duality. We
> still have two kinds of data - unstructured bit stream and structured
> text. If we want strlen("превед") to return 6 - since that Russian word
> has 6 characters - then we have no way but recognize that it's not just
> a collection of bits but Unicode text, and that would require separate
> type, as I see it. And as I see it, this is the source of the problems
> when people try to operate on text as on bit stream and vice versa.
>
> Unless I totally missed what mess you are referring to...

I am coming into this discussion decidedly late here, so please thwap me
gently if this is a FAQ. Do we have any idea of what percentage of strings
in the "wild" would break if treated as Unicode vs. not?

If 90% of the strings in use would work fine if treated as unicode, then it
would make sense to just always assume Unicode unless explicitly specified
otherwise.

If 90% of the strings in use would die if treated as Unicode, then Unicode
should probably be the exception and only when explicitly defined.

I'm not liking the ghosts of magic_quotes I'm seeing implied here with
different modes for the server to be in. That sounds like it would make
writing code that works the same everywhere and is not ugly to read (crapload
of markers or lots of conditionals) quite difficult.

As I said, feel free to assuage my fear if appropriate. :-)

--
Larry Garfield AIM: LOLG42
la...@garfieldtech.com ICQ: 6817012

"If nature has made any one thing less susceptible than all others of
exclusive property, it is the action of the thinking power called an idea,
which an individual may exclusively possess as long as he keeps it to
himself; but the moment it is divulged, it forces itself into the possession
of every one, and the receiver cannot dispossess himself of it." -- Thomas
Jefferson

Evert | Rooftop

Jul 10, 2007, 10:14:15 PM

Andi Gutmans wrote:
> I think the sooner the better as it's valuable information for the dev
> team.
> It'd probably be a good idea to have a Wiki where we can document issues
> that/common use-cases which are encountered.
> Maybe we should have a Wiki on one of the php.net servers for such
> purposes?
> Andi
>
>
Is anyone aware of a list of, say, the top 10 PHP applications?

When such a wiki is set up, how would you suggest writing such
documents? At least a generic guide would be good (e.g. common pitfalls).
Should I be documenting the per-project specifics as well?

Evert

Larry Garfield

Jul 11, 2007, 1:18:57 AM

On Tuesday 10 July 2007, Evert | Rooftop wrote:
> Andi Gutmans wrote:
> > I think the sooner the better as it's valuable information for the dev
> > team.
> > It'd probably be a good idea to have a Wiki where we can document issues
> > that/common use-cases which are encountered.
> > Maybe we should have a Wiki on one of the php.net servers for such
> > purposes?
> > Andi
>
> Is anyone aware of a list with a, say top 10 PHP applications?
>
> When such a wiki is setup, how would you suggest to write such
> documents.. At least a generic guide would be good (e.g.: common pitfalls)
> Should I be documenting the per-project specifics as well?
>
> Evert

Top 10 by what metric? If I had to guess based on market share, I'd say
(unordered):

Drupal
Squirrelmail
WordPress
phpMyAdmin
MediaWiki
Joomla
PHPBB

And I run out of steam here. :-) That's just my guess, though.

Probably a better place to look would be to see what is commonly
pre-installable or pre-installed at shared hosts. phpMyAdmin and
Squirrelmail seem to be everywhere. WordPress, Drupal, Joomla, and PHPBB
seem to turn up in "free scripts!" lists a lot.

Evert | Rooftop

Jul 11, 2007, 1:23:43 AM

Larry Garfield wrote:
>
> Top 10 by what metric? If I had to guess based on market share, I'd say
> (unordered):
>
> Drupal
> Squirrelmail
> WordPress
> phpMyAdmin
> MediaWiki
> Joomla
> PHPBB
>

That will keep me busy =)

Evert

Evert | Rooftop

Jul 11, 2007, 1:36:22 AM

One final question:

When converting code, should I assume "unicode.semantics" is on or off?

If it's on, I would be making sure everything is properly cast to binary
strings where this is needed; if it's off, the focus would be on making
sure the application runs on both PHP5 and PHP6.

What makes the most sense here? Personally, I would try it assuming it's
off, as this is the mode the development teams are most likely to
target.

Evert

Robert Lemke

Jul 11, 2007, 2:50:07 AM

On 11.07.2007 at 07:20, Evert|Rooftop wrote:

>> Top 10 by what metric? If I had to guess based on market share,
>> I'd say (unordered):
>>
>> Drupal
>> Squirrelmail
>> WordPress
>> phpMyAdmin
>> MediaWiki
>> Joomla
>> PHPBB

hey, and what about TYPO3? ;-)

Honestly, I've tried the current version of TYPO3 (4.x) with PHP6, and
it seems it is not very difficult to adapt. Most of the errors
were of type E_STRICT, and with unicode.semantics off it probably
needs few changes because we don't rely on PHP functions for Unicode
support.

I'm currently working on TYPO3 5.0, which comes with a new codebase
specifically written for PHP6, and I agree with Nicolas that the
unicode.semantics switch spoils the fun a little. We just have to
hope that enough hosting companies offer PHP6-based webspaces with
unicode.semantics turned on. And if they don't, we'll have to start
an initiative and ask hosters specifically to offer such a product.

Robert
--
http://typo3.org/gimmefive

Richard Quadling

Jul 11, 2007, 4:15:37 AM

On 11/07/07, Evert | Rooftop <ev...@rooftopsolutions.nl> wrote:

> Larry Garfield wrote:
> >
> > Top 10 by what metric? If I had to guess based on market share, I'd say
> > (unordered):
> >
> > Drupal
> > Squirrelmail
> > WordPress
> > phpMyAdmin
> > MediaWiki
> > Joomla
> > PHPBB
> >
>
> That will keep me busy =)
>
> Evert
>

Would it also be worth checking some of the frameworks too? Prado, eZ, Zend?
--
-----
Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"

Lukas Kahwe Smith

Jul 11, 2007, 4:24:35 AM

On 11.07.2007, at 07:15, Larry Garfield wrote:

> On Tuesday 10 July 2007, Evert | Rooftop wrote:
>> Andi Gutmans wrote:
>>> I think the sooner the better as it's valuable information for
>>> the dev
>>> team.
>>> It'd probably be a good idea to have a Wiki where we can document
>>> issues
>>> that/common use-cases which are encountered.
>>> Maybe we should have a Wiki on one of the php.net servers for such
>>> purposes?
>>> Andi
>>
>> Is anyone aware of a list with a, say top 10 PHP applications?
>>
>> When such a wiki is setup, how would you suggest to write such
>> documents.. At least a generic guide would be good (e.g.: common
>> pitfalls)
>> Should I be documenting the per-project specifics as well?
>>
>> Evert
>
> Top 10 by what metric? If I had to guess based on market share,
> I'd say
> (unordered):
>
> Drupal
> Squirrelmail
> WordPress
> phpMyAdmin
> MediaWiki
> Joomla
> PHPBB
>
> And I run out of steam here. :-) That's just my guess, though.
>
> Probably a better place to look would be to see what is commonly
> pre-installable or pre-installed at shared hosts. phpMyAdmin and
> Squirrelmail seem to be everywhere. WordPress, Drupal, Joomla, and
> PHPBB
> seem to turn up in "free scripts!" lists a lot.
>

we tried to get most of the top php OSS projects into the primary
testers group:
http://oss.backendmedia.com/PhP4yz
http://oss.backendmedia.com/PhP5yz
http://oss.backendmedia.com/PhP6yz

regards,
Lukas

Lukas Kahwe Smith

Jul 11, 2007, 4:27:41 AM

On 11.07.2007, at 00:02, Andi Gutmans wrote:

> I think the sooner the better as it's valuable information for the dev
> team.
> It'd probably be a good idea to have a Wiki where we can document
> issues
> that/common use-cases which are encountered.
> Maybe we should have a Wiki on one of the php.net servers for such
> purposes?

Well, I have been asking for a wiki for quite some time. Currently a
lot of the release management runs on the OSS wiki of my old company:
http://oss.backendmedia.com/PHPTODO/

Once we have the wiki on php.net servers we could more easily
integrate the login management etc. It makes absolute sense to have
this, and IIRC the only opposition has always come from Rasmus, who insists
that things like this should be in CVS (yes, I know we have a todo
file in CVS). But it seems to me like most internals developers have
shown their preference for a wiki with their "feet".

Jani Taskinen

Jul 11, 2007, 4:33:42 AM

On Wed, 2007-07-11 at 10:21 +0200, Lukas Kahwe Smith wrote:
> we tried to get most of the top php OSS projects into the primary
> testers group:
> http://oss.backendmedia.com/PhP4yz
> http://oss.backendmedia.com/PhP5yz
> http://oss.backendmedia.com/PhP6yz

Emphasis on word "tried" ? :D
Is there some procedure to follow for releases regarding those testers
anyway?

--Jani

Tomas Kuliavas

Jul 11, 2007, 6:04:14 AM

>> > I think the sooner the better as it's valuable information for the dev
>> > team.
>> > It'd probably be a good idea to have a Wiki where we can document
>> issues
>> > that/common use-cases which are encountered.
>> > Maybe we should have a Wiki on one of the php.net servers for such
>> > purposes?
>> > Andi
>>
>> Is anyone aware of a list with a, say top 10 PHP applications?
>>
>> When such a wiki is setup, how would you suggest to write such
>> documents.. At least a generic guide would be good (e.g.: common
>> pitfalls)
>> Should I be documenting the per-project specifics as well?
>>
>> Evert
>
> Top 10 by what metric? If I had to guess based on market share, I'd say
> (unordered):
>
> Drupal
> Squirrelmail

SquirrelMail

1. remove session_unregister call

2. fix get_magic_quotes_gpc() call

3. Turn off unicode.semantics in webserver configuration or php.ini

SquirrelMail scripts are designed to work with binary strings. Lots of
SquirrelMail functions are not compatible with unicode.semantics=on. Some
calls are not prepared to handle changes in crc32(), base64_encode(),
fputs() and fwrite(). If scripts keep backwards compatibility, they will
need wrappers for most of affected string and stream functions.

Some unicode.semantics=on side effects can be fixed without splitting
functions between PHP5 and PHP6, but unicode.script_encoding can't be set
with ini_set() and must be declared on top of all affected scripts.


--
Tomas

Richard Lynch

Jul 11, 2007, 9:09:18 PM

On Mon, July 9, 2007 3:06 am, Stanislav Malyshev wrote:
>> But now \xF0 isn't going to be ASCII 128 anymore, is it?
>
> ASCII doesn't have any characters beyond 0x7f AFAIK, but it doesn't
> matter, I get what you mean. \xF0 in unicode mode would be U+00F0 of
> course. Now how preg_match should handle it depends on preg_match.

I should have said "Extended ASCII".

And, unfortunately, there are at least 3 commonly-used "Extended
ASCII" out there, and, yes, this is exactly what Unicode is trying to
solve.

Only problem is, the data coming into most web apps is usually NOT
UTF-16, nor even UTF-8, but "Windows Extended ASCII" (more or less)
and most end users of PHP do not have the luxury of being able to have
a dedicated server.

So they are going to be stuck with their data getting totally munged
into UTF-16 on new PHP installations and, if I'm following this thread
correctly, NOT going to be able to get back to the actual data that
came IN to their web application.

So the ISPs aren't going to install PHP 6 because their users are
going to be screaming at them that it broke their applications.

Or they'll all install it with this goofy non-Unicode mode, in which
case, there's not much point to them having installed it, and y'all
will be effectively maintaining 3 branches:
PHP 5
PHP 6 ASCII
PHP 6 Unicode

Unless you drop PHP 6 ASCII, in which case even fewer will bother to
install PHP 6, not even in unicode.semantics off mode.

Seems to me we're painted into a corner where the number of people who
actually install PHP 6 is going to be abysmally small...

But maybe I'm just being pessimistic.

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

Richard Lynch

Jul 11, 2007, 9:16:01 PM

On Mon, July 9, 2007 3:07 am, Tomas Kuliavas wrote:
>>>>> Unicode code points can be defined with \u, but PHP6 breaks
>>>>> existing octal and hex escape sequences.
>>>
>>> I don't understand what this means...
>>
>> I think I know...
>>
>> I have code like this, somewhere:
>>
>> if (preg_match("|[\xF0-\xFF]|", $data)){
>> $data = un_microsuck($data);
>> }
>>
>> un_microsuck() basically detects and converts any of the goof-ball
>> extended ASCII from MS products (Word, Outlook, etc) to an HTML
>> equivalent character.
>>
>> But now \xF0 isn't going to be ASCII 128 anymore, is it?
>
> \xF0 never was ASCII. ASCII (ISO-646) is 7bit character set. \xF0 is
> decimal 240. It is 8bit.

Don't tell me.

Tell Microsoft.

Cuz I sure as heck get a LOT of input data >> \x7f and I have to do
something reasonable with it...

And I did say "extended ASCII" in the other paragraph, after all...

>> Or maybe \xF0 will "work" but the octal \360 won't?
>
> Are you sure that you can't do that by setting
> unicode.something_encoding
> to iso-8859-1 or windows-1252?

I dunno.

Doesn't really matter if I can't set those in .htaccess, that's for sure.

[joke type="semi"]
All this work going into Unicode, and nobody is pushing to replace
(CR|CRLF|LF) with a new Unicode all-platform newline character?
[/joke]

Richard Lynch

Jul 11, 2007, 9:18:43 PM

On Mon, July 9, 2007 3:13 am, Alexey Zakhlestin wrote:
> On 7/9/07, Richard Lynch <c...@l-i-e.com> wrote:
>>
>> Anybody who actually NEEDS Unicode ought to be the ones who have to
>> type a new keyword or something, not the bazillion users who have no
>> need for Unicode and likely never will...
>
> I wonder whom do you mean here.
> I can't remember many non-unicode internet-sites built during the last
> 5 years.

Errrr.

Maybe you've only looked at really humungous corporate sites?

Cuz there are a few million sites in the past 5 years that wouldn't
know what to do with Unicode if it walked up and bit them...

Richard Lynch

Jul 11, 2007, 9:22:46 PM

On Mon, July 9, 2007 1:41 pm, Andrei Zmievski wrote:
> Once again, you're trying to work with bytes inside Unicode strings,
> which just does not make sense.

From our perspective, you've gone and changed a fundamental data
structure out from under us, in a non-backwards-compatible way, and
broken a whole bunch of working code, for a feature we don't use, and
can't turn off [*]

This is said without rancor nor animosity, but to explain why we (and
many users) are going to have a very high wtf factor with this.

* Assuming a shared-host environment budget and external factors make
moving to a different host impossible.

I think the PHP core developers frequently forget that there are a LOT
of PHP developers/users out there with severe budget constraints that
just don't have the kinds of resources you are presuming are available
to "solve" the problems being created here...

I can always find a host who will do what I want with enough effort,
but a LOT of users will just give up on PHP 6 and stick with 5 (or 4
even) rather than do that...

Richard Lynch

Jul 11, 2007, 9:30:13 PM

On Wed, July 11, 2007 3:11 am, Richard Quadling wrote:
> On 11/07/07, Evert | Rooftop <ev...@rooftopsolutions.nl> wrote:
>> Larry Garfield wrote:
>> >
>> > Top 10 by what metric? If I had to guess based on market share,
>> I'd say
>> > (unordered):
>> >
>> > Drupal
>> > Squirrelmail
>> > WordPress
>> > phpMyAdmin
>> > MediaWiki
>> > Joomla
>> > PHPBB

I saw a reference in this thread to webhosts that don't upgrade
because cPanel didn't work, no?
[Larry said that, I think...]

So, I dunno, maybe the various panels that all those webhosters use
should be a candidate...

I mean, they all seem to have those panel thingies, even if I
personally use them as rarely as humanly possible...

[Talk about making easy things impossible... :-)]

I got no idea which ones are the most common, though.

Richard Lynch

Jul 11, 2007, 9:45:06 PM

On Tue, July 10, 2007 7:06 pm, Larry Garfield wrote:
> If 90% of the strings in use would work fine if treated as unicode,
> then it
> would make sense to just always assume Unicode unless explicitly
> specified
> otherwise.

If that 10% includes enough users who have written millions of lines of
code in a self-consistent manner that voids ALL their work, you may
want to re-think this 90% number you have chosen...

And of course you need 2 distinct data types for Unicode and strings.

What I don't understand is why you'd lock things down so that:

a) the default "string" is Unicode, breaking XX% of existing applications

b) the end user can't readily change a) in a huge percentage of
existing install base (read: non-dedicated hosting or mixed-user
servers with shared httpd.conf settings)


I realize it's far too late by now to do anything about it, most
likely, but why in the world didn't you just choose a new keyword to
define/declare a string as Unicode?

And did I dream the thread on this way back when where it was stated
that Unicode was backwards-compatible, so this wouldn't be a problem?

Yet now it seems that UTF-16 is *not* backwards-compatible, and this
seems like a pretty big problem to me.

Oh well. I guess I'll just shut up and hope most of my code doesn't
break when I go copying/pasting it into new sites that are locked into
Unicode mode with no way for me to change that...

Richard Lynch

Jul 11, 2007, 9:53:42 PM

On Mon, July 9, 2007 5:24 pm, Christopher Jones wrote:
>
> I also think we shouldn't backport features to PHP5. We should

I believe the only serious reason FOR this is if you want to drop the
semantics OFF in PHP 6...

If getting new features requires upgrading to 6 and taking the Unicode
stuff that we theorize will break a great deal of code...

Richard Lynch

Jul 11, 2007, 9:58:30 PM

On Tue, July 10, 2007 11:30 am, Andi Gutmans wrote:
> What I really think we need to do for this release, which we haven't
> been good at doing in the past, is build a PHP Compatibility Team
> which
> tries to port many applications to PHP 6 and finds the issues in doing
> this port (both with unicode_semantics=on/off). We can then learn from
> this experience and have good documentation on how to upgrade to both
> modes and in some cases, like we have done in the past 2-3 weeks,
> tweak
> PHP 6 to not break backwards compatibility. It is possible in many
> cases.

This all sounds great...

Where are all the developers you need going to come from? :-v

Is it time yet for, say, the squirrelMail developers to try to run
their app in PHP 6 and tell you what all broke?

You wanna announce that somewhere and take a flood of bug reports in
bugs.php.net?

Just tossing out the idea...

Richard Lynch

Jul 11, 2007, 9:59:49 PM

Seems to me...

Both need to be done.

Do both, or pick one if you can't do both, and somebody else will do
the other. That's how FLOSS works. :-)

On Wed, July 11, 2007 12:33 am, Evert | Rooftop wrote:
> One final question..
>
> should I assume while converting code "unicode.semantics" is on or
> off?
>
> If its on I would be making sure everything is properly casted to
> binary
> strings where this is needed, if it's off the focus would be on making
> sure the application runs on both PHP5 and PHP6..
>
> What makes the most sense here? I would personally say I would try it
> assuming its off, as this is the most likely for the development teams
> to target for ..
>
> Evert

Larry Garfield

Jul 11, 2007, 10:01:47 PM

On Wednesday 11 July 2007, Richard Lynch wrote:

> And did I dream the thread on this way back when where it was stated
> that Unicode was backwards-compatible, so this wouldn't be a problem?
>
> Yet now it seems that UTF-16 is *not* backwards-compatible, and this
> seems like a pretty big problem to me.
>
> Oh well. I guess I'll just shut up and hope most of my code doesn't
> break when I go copying/pasting it into new sites that are locked into
> Unicode mode with no way for me to change that...

AFAIK, UTF-8 is backward compatible with ASCII. UTF-16 is not. That's why
Microsoft defaults to UTF-16 (when they don't default to Windows-1252 or
whatever crap it is) and the rest of the universe (at least the parts of it
that I've seen) defaults to UTF-8.
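
Larry's ASCII-compatibility point can be checked at the byte level. Below is a minimal sketch in PHP, assuming the iconv extension is enabled (it is in most builds); UTF-16BE is chosen only so that iconv emits no byte-order mark. It concerns encoded bytes on the wire or on disk, not how PHP 6 stores strings internally.

```php
<?php
// Minimal sketch: UTF-8 keeps ASCII bytes as-is, UTF-16 does not.
// Assumes the iconv extension is available.
$ascii = "PHP";

$utf8  = iconv("ASCII", "UTF-8", $ascii);    // identical bytes
$utf16 = iconv("UTF-8", "UTF-16BE", $ascii); // big-endian, no BOM

echo bin2hex($utf8), "\n";   // 504850
echo bin2hex($utf16), "\n";  // 005000480050

// An ASCII string is already valid UTF-8 with the same meaning...
var_dump($utf8 === $ascii);  // bool(true)

// ...while UTF-16 doubles the length and adds NUL bytes, so code that
// treats the data as plain 8-bit ASCII sees something different.
var_dump(strlen($utf16));    // int(6)
```

Read back as if it were ASCII, the UTF-16 form has a NUL byte in front of every letter, which is the incompatibility described above.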

Olivier Hill

Jul 11, 2007, 10:04:11 PM

Is there a reason why the last 10 messages on this thread are coming from you?

It might just be me, but answering in the same email would be great.

Olivier

Rasmus Lerdorf

Jul 11, 2007, 10:17:36 PM

Richard Lynch wrote:
> On Tue, July 10, 2007 7:06 pm, Larry Garfield wrote:
>> If 90% of the strings in use would work fine if treated as unicode,
>> then it
>> would make sense to just always assume Unicode unless explicitly
>> specified
>> otherwise.
>
> If that 10% includes enough users who have written millions of line of
> code in a self-consistent manner that voids ALL their work, you may
> want to re-think this 90% number you have chosen...
>
> And of course you need 2 distinct data types for Unicode and strings.
>
> What I don't understand is why you'd lock things down so that:
>
> a) the default "string" is Unicode, breaking XX% of existing applications
>
> b) the end user can't readily change a) in a huge percentage of
> existing install base (read: non-dedicated hosting or mixed-user
> servers with shared httpd.conf settings)
>
>
> I realize it's far too late by now to do anything about it, most
> likely, but why in the world didn't you just choose a new keyword to
> define/declare a string as Unicode?
>
> And did I dream the thread on this way back when where it was stated
> that Unicode was backwards-compatible, so this wouldn't be a problem?
>
> Yet now it seems that UTF-16 is *not* backwards-compatible, and this
> seems like a pretty big problem to me.

Richard, you are rather confused on this Unicode stuff. The fact that
PHP and ICU use UTF-16 internally has absolutely nothing to do with
what is exposed at the scripting level.

The only things that will break in a standard application are things that
rely on strings being binary. Normal text passing back and forth
between the browser and the server will work just fine.

The breakages, apart from various bugs at this early stage, are limited
to places where the code is expecting to see a binary string and PHP
hasn't been able to determine this automatically. And hopefully we can
come up with ways to automatically determine when something should
default to a binary string.

But if you write:

$a = "マニュアル";
echo $a[1];

and you expect to have that spew out 0xe3, then yes, it will break
because it will result in ニ which is what it really should do.

And yes, I know a lot of people reading this list don't care much for
other charsets, but people reading an English mailing list are rather
self-selecting.

-Rasmus
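
The byte-versus-character distinction in Rasmus's example can be sketched in Python (used here only as an illustration, since PHP 6 never shipped in this form). Each katakana character in "マニュアル" takes three bytes in UTF-8, and every one of them starts with the byte 0xE3, so byte-oriented indexing hands back a fragment of a character rather than ニ:

```python
# "manual" written in katakana: five characters (code points).
word = "マニュアル"

# Unicode semantics: indexing selects whole characters.
second_char = word[1]

# Binary semantics: indexing selects raw bytes of the UTF-8 encoding.
raw = word.encode("utf-8")
first_byte = raw[0]   # lead byte of マ
second_byte = raw[1]  # a continuation byte, not a character on its own

print(second_char)                        # ニ
print(len(word), len(raw))                # 5 15
print(hex(first_byte), hex(second_byte))  # 0xe3 0x83
```
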

Tomas Kuliavas

Jul 12, 2007, 12:20:15 AM
>>>>>> Unicode code points can be defined with \u, but PHP6 breaks
>>>>>> existing octal and hex escape sequences.
>>>>
>>>> I don't understand what this means...
>>>
>>> I think I know...
>>>
>>> I have code like this, somewhere:
>>>
>>> if (preg_match("|[\xF0-\xFF]|", $data)){
>>> $data = un_microsuck($data);
>>> }
>>>
>>> un_microsuck() basically detects and converts any of the goof-ball
>>> extended ASCII from MS products (Word, Outlook, etc) to an HTML
>>> equivalent character.
>>>
>>> But now \xF0 isn't going to be ASCII 128 anymore, is it?
>>
>> \xF0 never was ASCII. ASCII (ISO-646) is 7bit character set. \xF0 is
>> decimal 240. It is 8bit.
>
> Don't tell me.
>
> Tell Microsoft.
>
> Cuz I sure as heck get a LOT of input data >> \x7f and I have to do
> something reasonable with it...
>
> And I did say "extended ASCII" in the other paragraph, after all...
>
>>> Or maybe \xF0 will "work" but the octal \360 won't?
>>
>> Are you sure that you can't do that by setting
>> unicode.something_encoding to iso-8859-1 or windows-1252?
>
> I dunno.
>
> Doesn't really matter if I can't set those in .htaccess, that's for sure.
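
The `preg_match("|[\xF0-\xFF]|", $data)` worry comes down to whether `\xF0` names a byte or a character. A rough analogy in Python, whose `re` module keeps bytes patterns and text patterns separate (this is not PHP 6 behavior, just the same two readings side by side): as bytes, the class matches raw 0xF0-0xFF, as in a Windows-1252 encoded "ð"; as characters, it matches code points U+00F0-U+00FF, and the UTF-8 encoding of that same "ð" contains no byte in the range at all.

```python
import re

BYTE_CLASS = re.compile(rb"[\xF0-\xFF]")  # raw bytes 0xF0..0xFF
CHAR_CLASS = re.compile("[\xF0-\xFF]")    # code points U+00F0..U+00FF

text = "ð"  # U+00F0, LATIN SMALL LETTER ETH

cp1252_bytes = text.encode("cp1252")  # b'\xf0'
utf8_bytes = text.encode("utf-8")     # b'\xc3\xb0'

print(bool(BYTE_CLASS.search(cp1252_bytes)))  # True
print(bool(BYTE_CLASS.search(utf8_bytes)))    # False: 0xC3 and 0xB0 are out of range
print(bool(CHAR_CLASS.search(text)))          # True
```
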

All unicode.* settings except unicode.semantics are PHP_INI_ALL, so they
can be changed per directory or per script.

From README.UNICODE
----
Script Encoding
===============
...
If you cannot change the encoding system wide, you can use a pragma to
override the INI setting in a local script:

<?php declare(encoding = 'Shift-JIS'); ?>
----
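
The pragma matters because the same source bytes only read correctly with the encoding they were saved in. A rough Python analogy follows (nothing here runs the PHP pragma itself); `read_source` is a hypothetical helper standing in for the script reader:

```python
def read_source(raw: bytes, declared_encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode script bytes with the declared encoding."""
    return raw.decode(declared_encoding)


# "マニュアル" as saved by a Shift-JIS editor: 2 bytes per character.
sjis_bytes = "マニュアル".encode("shift_jis")

print(len(sjis_bytes))                       # 10
print(read_source(sjis_bytes, "shift_jis"))  # マニュアル

try:
    read_source(sjis_bytes)  # reading with the default UTF-8 fails
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```
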

--
Tomas

Larry Garfield

Jul 12, 2007, 12:43:21 AM
Because he's Richard. He always does that. You should see him on
php-general. :-)

On Wednesday 11 July 2007, Olivier Hill wrote:
> Is there a reason why the last 10 messages on this thread are coming from
> you?
>
> It might just be me, but answering in the same email would be great.
>
> Olivier
>
> On 7/11/07, Richard Lynch <c...@l-i-e.com> wrote:
> > Seems to me...
> >
> > Both need to be done.
> >
> > Do both, or pick one if you can't do both, and somebody else will do
> > the other. That's how FLOSS works. :-)

--
Larry Garfield AIM: LOLG42
la...@garfieldtech.com ICQ: 6817012

"If nature has made any one thing less susceptible than all others of
exclusive property, it is the action of the thinking power called an idea,
which an individual may exclusively possess as long as he keeps it to
himself; but the moment it is divulged, it forces itself into the possession
of every one, and the receiver cannot dispossess himself of it." -- Thomas
Jefferson

M. Sokolewicz

Jul 12, 2007, 6:18:42 AM
Richard Lynch wrote:
> On Wed, July 11, 2007 3:11 am, Richard Quadling wrote:
>> On 11/07/07, Evert | Rooftop <ev...@rooftopsolutions.nl> wrote:
>>> Larry Garfield wrote:
>>>> Top 10 by what metric? If I had to guess based on market share, I'd say
>>>> (unordered):
>>>>
>>>> Drupal
>>>> Squirrelmail
>>>> WordPress
>>>> phpMyAdmin
>>>> MediaWiki
>>>> Joomla
>>>> PHPBB
>
> I saw a reference in this thread to webhosts that don't upgrade
> because cPanel didn't work, no?
> [Larry said that, I think...]
>
> So, I dunno, maybe the various panels that all those webhosters use
> should be a candidate...
>
> I mean, they all seem to have those panel thingies, even if I
> personally use them as rarely as humanly possible...
>
> [Talk about making easy things impossible... :-)]
>
> I got no idea which ones are the most common, though.
>

cPanel
Plesk
Ensim

Sebastian Mendel

Jul 12, 2007, 6:20:57 AM
Larry Garfield wrote:

> On Tuesday 10 July 2007, Evert | Rooftop wrote:
>> Andi Gutmans wrote:
>>> I think the sooner the better as it's valuable information for the dev
>>> team.
>>> It'd probably be a good idea to have a Wiki where we can document issues
>>> and common use-cases that are encountered.
>>> Maybe we should have a Wiki on one of the php.net servers for such
>>> purposes?
>>> Andi
>> Is anyone aware of a list with a, say top 10 PHP applications?
>>
>> When such a wiki is setup, how would you suggest to write such
>> documents.. At least a generic guide would be good (e.g.: common pitfalls)
>> Should I be documenting the per-project specifics as well?
>>
>> Evert
>
> Top 10 by what metric? If I had to guess based on market share, I'd say
> (unordered):
>
> Drupal
> Squirrelmail
> WordPress
> phpMyAdmin

phpMyAdmin runs fine with PHP 6, apart from masses of E_NOTICE/E_STRICT
messages (due to PHP 4 compatibility until the 2.11 release this year).

If you find problems, tell me.


--
Sebastian Mendel

www.sebastianmendel.de

Stanislav Malyshev

Jul 12, 2007, 5:42:30 PM
> From our perspective, you've gone and changed a fundamental data
> structure out from under us, in a non-backwards-compatible way, and
> broken a whole bunch of working code, for a feature we don't use, and
> can't turn off [*]

Supporting Unicode requires such a change. It is a big deal: Unicode does
change the way one thinks about textual information. Text is not a
collection of 8-bit integers anymore. But this step needs to be taken if
we want to be able to write applications for modern environments that
require multi-language and multi-locale support. PHP 6 is meant to take
this step.

> I can always find a host who will do what I want with enough effort,
> but a LOT of users will just give up on PHP 6 and stick with 5 (or 4
> even) rather than do that...

Maybe. But we have the unicode.semantics=off option to give them a
smoother transition.
--
Stanislav Malyshev, Zend Software Architect
st...@zend.com http://www.zend.com/
(408)253-8829 MSN: st...@zend.com

Derick Rethans

Jul 13, 2007, 4:07:20 AM
On Wed, 11 Jul 2007, Richard Quadling wrote:

> On 11/07/07, Evert | Rooftop <ev...@rooftopsolutions.nl> wrote:
> > Larry Garfield wrote:
> > >
> > > Top 10 by what metric? If I had to guess based on market share, I'd say
> > > (unordered):
> > >
> > > Drupal
> > > Squirrelmail
> > > WordPress
> > > phpMyAdmin
> > > MediaWiki
> > > Joomla
> > > PHPBB
> > >
> >
> > That will keep me busy =)
>
> Would it also be worth checking some of the frameworks too? Prado, eZ,
> Zend?

I tested the eZ Components a couple of months ago, and it didn't seem
that bad. Now it's messier, but I haven't really checked why.

regards,
Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

Richard Lynch

Jul 14, 2007, 4:23:28 AM
On Wed, July 11, 2007 9:14 pm, Rasmus Lerdorf wrote:
> Richard, you are rather confused on this Unicode stuff.

I'm 100% certain we can all agree on that point. :-)

> The fact that
> PHP and ICU use UTF-16 internally has absolutely nothing to do with
> what is exposed at the scripting level.

But somebody has just said that it will, didn't they?

That GPC data will be Unicode, and trying to use it as ASCII will break?

> The only things that will break in a standard application are things
> that rely on strings being binary. Normal text passing back and forth
> between the browser and the server will work just fine.
>
> The breakages, apart from various bugs at this early stage, are limited
> to places where the code is expecting to see a binary string and PHP
> hasn't been able to determine this automatically. And hopefully we can
> come up with ways to automatically determine when something should
> default to a binary string.
>
> But if you write:
>
> $a = "マニュアル";
> echo $a[1];

Whoa.

That was weird...

It was just a bunch of question marks when I read it, and now it's a
bunch of symbols (variants on afz mostly) in my reply...

> and you expect to have that spew out 0xe3, then yes, it will break
> because it will result in ニ which is what it really should do.

You have me beat at the "...if you write" part, because I have no idea
how to make my keyboard make those symbols... :-v

My only concern is that:

http://example.com/?foo=bar
echo $_GET['foo'][2];
should still print out 'a' just like it always has.

And:
http://example.com/?mask=100110
echo $_GET['mask'] & '110010';
should print out 100010 just like it always has.

Folks keep saying that bit-string manipulation makes no sense in
Unicode, and that's fine, I guess...

If a scripter is trying to do that, then see if the string is ASCII
[01]* and typecast it to binary string or whatever and just move on
with life in the old way.
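
In PHP 4/5, `&` between two strings works byte by byte on the characters, which is what makes `"100110" & "110010"` come out as `"100010"`: ASCII `'1'` is 0x31 and `'0'` is 0x30, so a result byte is 0x31 only where both inputs have a `'1'`. If one operand is an integer literal such as `110010`, PHP converts the string to an integer and does an arithmetic AND on the decimal numbers instead. A Python sketch of both behaviors; `string_and` is a hypothetical helper, not a PHP or Python API:

```python
def string_and(a: bytes, b: bytes) -> bytes:
    """Byte-wise AND of two byte strings, truncated to the shorter one,
    mirroring how PHP applies & when both operands are strings."""
    return bytes(x & y for x, y in zip(a, b))


print(string_and(b"100110", b"110010"))  # b'100010'

# Mixing a string with an integer literal converts the string to an int,
# so the operands are decimal numbers, not bit strings.
print(int("100110") & 110010)  # 99594
```
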

> And yes, I know a lot of people reading this list don't care much for
> other charsets, but people reading an English mailing list are rather
> self-selecting.

I love the idea of users being able to write things in their own
language, and somehow it magically all just "looks right" when I slam
it into the database with mysql_real_escape_string and spew it back
out to the browser with htmlentities!

But it never quite seems to work out, in my limited experience,
because some software somewhere always manages to mangle it...

And I realize the whole point of Unicode in PHP 6 is to keep PHP 6 from
being that piece of software that mangles it, and I'm sure you guys are
getting that bit right. Well, I hope so anyway. :-)

I especially hope so, because if you don't get it right, I'll never be
able to tell, as I wouldn't notice the difference if it's broken or
not just by looking at the text in anything other than English.

I just get real concerned when it seems to me like a lot of scripts
are going to break, based on what folks who should know post here...

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

Tomas Kuliavas

Jul 14, 2007, 4:44:02 AM
>> But if you write:
>>
>> $a = "マニュアル";
>> echo $a[1];
>
> Whoa.
>
> That was weird...
>
> It was just a bunch of question marks when I read it, and now it's a
> bunch of symbols (variants on afz mostly) in my reply...

Your browser or operating system does not support Japanese characters, and
the translation selected in your Hostbaby Webmail (or, to use its real
name, SquirrelMail) does not support Japanese characters in replies.

According to Google Translate, the $a variable stores the word 'manual'
written in Japanese.
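
The "variants on afz" Richard described are consistent with the UTF-8 bytes being displayed as Windows-1252 (an inference from the thread, not a diagnosis of his webmail). A small Python sketch reproduces the effect and reverses it:

```python
original = "マニュアル"

# Misreading the UTF-8 bytes as Windows-1252 yields the familiar mojibake:
# every character turns into "ã" followed by two more symbols.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # ãƒžãƒ‹ãƒ¥ã‚¢ãƒ«

# The damage is reversible as long as no byte was lost along the way.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```
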

--
Tomas

Rasmus Lerdorf

Jul 14, 2007, 10:04:12 AM
Richard Lynch wrote:
>> $a = "マニュアル";
>> echo $a[1];
>
> Whoa.
>
> That was weird...

Right, your mail client doesn't handle Unicode correctly. You might
want to do something about that.

-Rasmus

Uwe Schindler

Jul 14, 2007, 10:23:26 AM
That sounds "good" to my ears.

Software that relies on "old" non-Unicode behaviour must be written in a
way that handles non-Unicode and Unicode behaviour in two different ways.
But, for example, a rewritten "Squirrelmail" that runs exclusively on PHP6
would be a good thing.

So you could write in your release notes: "We have this new version,
SquirrelMail++, that runs only on hosts running PHP6. Using it would bring
a great speed and performance increase, because the Unicode add-ons are
only available here. If you need an old non-Unicode version, you have to
stay with our old historic version." The old historic Squirrelmail version
without Unicode support would stay supported for some time. But all users
would know: if I want new features, I should think about changing to PHP6;
all other users could stay on the old version.

In the case of the fantastic software "SquirrelMail++PHP6-only" (which I
would use on my servers, too), I would think in this direction!

-----
Uwe Schindler
thet...@php.net - http://www.php.net
NSAPI SAPI developer
Bremen, Germany

> -----Original Message-----
> From: Rasmus Lerdorf [mailto:ras...@lerdorf.com]
> Sent: Saturday, July 14, 2007 4:00 PM
> To: c...@l-i-e.com
> Cc: inte...@lists.php.net
> Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?
>
> Richard Lynch wrote:
> >> $a = "マニュアル";
> >> echo $a[1];
> >
> > Whoa.
> >
> > That was weird...
>
> Right, your mail client doesn't handle Unicode correctly. You might
> want to do something about that.
>
> -Rasmus

Uwe Schindler

Jul 14, 2007, 10:36:50 AM
> In the case of the fantastic software "SquirrelMail++PHP6-only" (which I
> would use on my servers, too) I would think in this direction!

My last post was aimed specifically at the complaining guy from
SquirrelMail: Squirrelmail is a fantastic example of software that would,
in a rewritten form, make use of PHP6 at many points (there are many bugs
with Unicode in it...) and profit from it!

Uwe

-----
Uwe Schindler
thet...@php.net - http://www.php.net
NSAPI SAPI developer
Bremen, Germany

Tomas Kuliavas

Jul 14, 2007, 10:59:25 AM
> That sounds "good" to my ears.
>
> Software that relies on "old" non-Unicode behaviour must be written in a
> way that handles non-Unicode and Unicode behaviour in two different ways.
> But, for example, a rewritten "Squirrelmail" that runs exclusively on PHP6
> would be a good thing.
>
> So you could write in your release notes: "We have this new version,
> SquirrelMail++, that runs only on hosts running PHP6. Using it would bring
> a great speed and performance increase, because the Unicode add-ons are
> only available here. If you need an old non-Unicode version, you have to
> stay with our old historic version." The old historic Squirrelmail version
> without Unicode support would stay supported for some time. But all users
> would know: if I want new features, I should think about changing to PHP6;
> all other users could stay on the old version.
>
> In the case of the fantastic software "SquirrelMail++PHP6-only" (which I
> would use on my servers, too), I would think in this direction!

There is nothing in the current PHP6 version that SquirrelMail can use.
The newest features it needs are provided by PHP 5.1.0. Limiting the code
to PHP6 would reduce the user base. SquirrelMail can work on PHP6 with
unicode.semantics=off if two lines in one script are fixed.

P.S. I am not a SquirrelMail guy. I am a former SquirrelMail developer and
I use my own modified SquirrelMail version. It does not have issues with
Japanese.

--
Tomas

Richard Lynch

Jul 17, 2007, 5:42:57 PM
On Sat, July 14, 2007 9:00 am, Rasmus Lerdorf wrote:
> Richard Lynch wrote:
>>> $a = "マニュアル";
>>> echo $a[1];
>>
>> Whoa.
>>
>> That was weird...
>
> Right, your mail client doesn't handle Unicode correctly. You might
> want to do something about that.

Or not, since I don't have any chance of reading Japanese even if the
characters "look right"...

I was in Paris once, and using a French keyboard didn't improve my
French either. :-)

I could switch to a "real" mail client instead of the webhost supplied
SquirrelMail, I suppose...

Last time I tried to do that, the Linux mail client "ate" a bunch of
my email and really messed things up badly. I think it was KMail...

Other times I found the sync time of an IMAP mail client to be rather
abysmal compared to the web-based email...

Maybe I just store too much old email or something, but I'm not seeing
much reason to switch, since neither of the two renderings were
readable.

It was only interesting that it "switched" in read mode and compose
mode is all.

PS I suspect a newer SquirrelMail would handle Unicode just fine, and
I'm sure my webhost will upgrade long before I need them to.

Or I can install a new SquirrelMail on my own server and run whatever
version I want, if I go learn Japanese first...

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?
