UTF-8 again, and this time I have proof it's modwsgi


gert

Jul 22, 2008, 5:55:32 PM
to modwsgi
Server A http://91.121.53.159/tom/ssh.htm is working, except for the
crossed-out text lines rendered at the bottom, wtf?
Server B http://87.98.218.86/ssh/ssh.htm Just fill in some first and
last name. I was in the middle of making something here until I
realized I had the UTF-8 crap again.

BOTH have identical locales and programs (proc.py)
root@ns1:/srv/www# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Server A modwsgi trunk
Server B modwsgi 2.1

Apache conf identical

So does this have to do with the ./configure, make, install steps or
with the source code? Because I am going WHAAAAA here again.

Nimrod A. Abing

Jul 23, 2008, 12:19:23 AM
to mod...@googlegroups.com

You're probably not escaping < and > and other special HTML characters.
--
Best Regards,
Nimrod A. Abing

W http://arsenic.ph/
W http://preownedcar.com/
W http://preownedbike.com/
W http://abing.gotdns.com/

gert

Jul 23, 2008, 4:07:57 AM
to modwsgi
Nope, if you compare the server responses in firebug you can clearly
see encoding problems on server B

Graham Dumpleton

Jul 23, 2008, 10:20:02 AM
to mod...@googlegroups.com
2008/7/23 gert <gert.c...@gmail.com>:

>
> Nope, if you compare the server responses in firebug you can clearly
> see encoding problems on server B

Last time you were saying mod_wsgi-3.0-TRUNK was broken. This time you
are saying it is mod_wsgi 2.1 and that mod_wsgi-3.0-TRUNK is fine.

Last time no further followup was done as you didn't respond to it
being pointed out that you weren't encoding your CDATA correctly. Have
you addressed the CDATA issues?

Did you perhaps also swap which system was running 2.1 and which was
running 3.0? That would explain why the problem now appears with the
other version but on the same system, meaning it is something system
specific and not mod_wsgi.

As before, supply the smallest and simplest code which exhibits the
problem, preferably something that doesn't depend on forking off ssh
processes, so that it is something that others can actually run if
need be.

If the version of mod_wsgi matters, swap the versions around on each
system and see if the problem moves with the mod_wsgi version or stays
on the same system.

Graham

gert

Jul 23, 2008, 12:44:56 PM
to modwsgi
On Jul 23, 4:20 pm, "Graham Dumpleton" <graham.dumple...@gmail.com>
wrote:
> 2008/7/23 gert <gert.cuyk...@gmail.com>:
>
> > Nope, if you compare the server responses in firebug you can clearly
> > see encoding problems on server B
>
> Last time you were saying mod_wsgi-3.0-TRUNK was broken. This time you
> are saying it is mod_wsgi 2.1 and that mod_wsgi-3.0-TRUNK is fine.

Well, I never was good at numbers :) The last time, both versions did
not work for me until I changed the locales, which resulted in the
current server B http://91.121.53.159/tom/ssh.htm , let's call it the
happy utf-8 server. It just prints forked bash stuff out without any
modifications.

> Last time no further followup was done as you didn't respond to it
> being pointed out that you weren't encoding your CDATA correctly. Have
> you addressed the CDATA issues?

No, because it works on the happy utf-8 server. As you can see in the
example, it happily dumps whatever proc.py is printing.
Firebug shows it all: the raw response coming out is totally different
on exactly the same server setup except for the modwsgi versions.
It goes wrong before I even touch it with my javascripts.

> Did you also perhaps swap around which system was running 2.1 and
> which was running 3.0 and thus why the problem appears now for other
> version, but perhaps same system, thus meaning it is something system
> specific and not mod_wsgi?

I will find out in a minute when i install trunk version.

> As before, supply the smallest and most simplest code which exhibits
> the problem, preferably something that doesn't depend on forking off
> ssh processes so that it is something that others can actually run if
> need be.

But it only happens when forking processes :) As you can see in the
other examples they all work.

> If which version of mod_wsgi is used matters, swap around versions on
> each system and see if problem moves with mod_wsgi version or stays on
> same system.

ok i am on it.

gert

Jul 23, 2008, 1:11:13 PM
to modwsgi
That's it, nothing to do with modwsgi. Now I have identical versions
and still utf-8 %$&^$& on server B

gert

Jul 23, 2008, 1:38:05 PM
to modwsgi
http://87.98.218.86/tom/ssh.htm
http://91.121.53.159/tom/ssh.htm

wtf wtf wtf ???? they are like twin servers, why ????

Brian Smith

Jul 23, 2008, 1:45:06 PM
to mod...@googlegroups.com

Where is the source code to this program?

When you run this program through another WSGI container on the same server
(e.g. the simple CGI one in the WSGI specification or wsgiref), what
happens?

Thanks,
Brian

gert

Jul 23, 2008, 2:04:36 PM
to modwsgi
http://code.google.com/p/appwsgi/source/browse

http://code.google.com/p/appwsgi/source/browse/trunk/www/lib/proc.py

I did not try the cgi version yet, but I think it will be the same as
the wsgi version. Something on the server messes things up at a very
low level and ignores the locale.

Brian Smith

Jul 23, 2008, 2:13:39 PM
to mod...@googlegroups.com
Gert wrote:
> I did not try the cgi version yet but i think it will be the
> same as the wsgi version. Something on the server that messes
> things up on a very low level that ignores locale.

If so, then it really isn't a problem with mod_wsgi.

- Brian

Robert Coup

Jul 23, 2008, 6:24:38 PM
to mod...@googlegroups.com

Umm, changes to /etc/environment (e.g. LANG, LC_*, etc.) don't take effect until a reboot. And if you're setting them in ~/.bashrc or something, then they won't take effect until a new login.

If you're getting really desperate, mount one filesystem from the other via sshfs or something and run "diff -r /etc/"?

Rob :)

gert

Jul 23, 2008, 6:34:30 PM
to modwsgi
Yep, I know, but it is still something that has to be documented for
modwsgi users.
If you pushed an elevator button, would you feel safe if the one next
to it had a display saying %FL$OR 4?

gert

Jul 23, 2008, 6:49:59 PM
to modwsgi
On Jul 24, 12:24 am, "Robert Coup" <robert.c...@koordinates.com>
wrote:
I added the locale command to see if it is really, really on utf8.
I also removed the ls -l command, which seems to be the only command
with encoding problems.
Can somebody already make out what's wrong with ls -l?
And how can I filter out the bad encodings?

http://87.98.218.86/tom/ssh.htm
http://91.121.53.159/tom/ssh.htm

gert

Jul 23, 2008, 7:22:56 PM
to modwsgi
How can I safety-wire this, so that no weird characters get into my
xml file no matter what stdout is giving me?

proc = subprocess.Popen(['python', '/srv/www/tom/proc.py'],
                        stdout=subprocess.PIPE)
xml += proc.stdout.read()
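
One way to approach the question above, as a minimal sketch rather than a
definitive fix: decode the child's bytes explicitly, drop every character
that XML 1.0 does not allow (which includes the ESC byte, 0x1b), and only
then escape the markup characters. The helper name safe_xml_text is made up
for illustration and is not part of proc.py or mod_wsgi.

```python
import re
from xml.sax.saxutils import escape

# Characters XML 1.0 forbids in document content: C0 controls other than
# tab, LF and CR, plus U+FFFE and U+FFFF. (Lone surrogates are not handled.)
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def safe_xml_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: turn raw process output into XML-safe text."""
    # Undecodable bytes become U+FFFD instead of raising an exception.
    text = raw.decode(encoding, errors="replace")
    # Remove characters an XML parser would reject outright.
    text = _XML_ILLEGAL.sub("", text)
    # Escape &, < and > so the text cannot be read as markup.
    return escape(text)
```

Note that this only removes the ESC byte itself; the printable remainder of
a terminal sequence (for example "[0m") would still be in the text.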


Graham Dumpleton

Jul 23, 2008, 10:02:04 PM
to mod...@googlegroups.com
2008/7/23 gert <gert.c...@gmail.com>:

>
> Server A http://91.121.53.159/tom/ssh.htm working except for the
> crosses out text lines rendered at the bottom wtf ?
> Server B http://87.98.218.86/ssh/ssh.htm Just fill in some first and
> last name I was in the middle of making something here until i
> realized i had the UTF-8 crap again,
>
> BOTH identical locals and programs (proc.py)
> root@ns1:/srv/www# locale
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_PAPER="en_US.UTF-8"
> LC_NAME="en_US.UTF-8"
> LC_ADDRESS="en_US.UTF-8"
> LC_TELEPHONE="en_US.UTF-8"
> LC_MEASUREMENT="en_US.UTF-8"
> LC_IDENTIFICATION="en_US.UTF-8"
> LC_ALL=

Can we go right back to the start?

Where are you setting all the locale values on the system? I.e., system
wide, user account, Apache envvars file, in Python code somehow?

Also, are you running embedded mode or daemon mode for the application?

Locale environment variables are one of those horrible cases like TZ
where system C functions depend on environment variables. These can be
a problem in Apache where multiple applications are being run and each
is trying to set them to different values. In Python it gets more
confusing, as you could set them through os.environ and they appear to
be set, but code in a different interpreter could also set them just
afterwards, and those will take precedence in setting the C level
environment variables.
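
To see what the Apache process actually inherited, as opposed to what a
login shell reports with the locale command, a tiny diagnostic WSGI app
along these lines could help. This is only a sketch: the name locale_report
and the list of variables are assumptions, and the values it prints depend
entirely on how Apache was started.

```python
import locale
import os

# Variables worth comparing between the two servers (an illustrative list).
LOCALE_VARS = ("LANG", "LANGUAGE", "LC_ALL", "LC_CTYPE", "TZ")


def locale_report(environ, start_response):
    """Hypothetical diagnostic WSGI app: report the process-level locale."""
    # os.environ is the environment of the server process, not the
    # per-request WSGI environ dictionary.
    lines = ["%s=%r" % (name, os.environ.get(name)) for name in LOCALE_VARS]
    # Passing no second argument queries the current setting without changing it.
    lines.append("setlocale(LC_CTYPE)=%r" % locale.setlocale(locale.LC_CTYPE))
    lines.append("preferred encoding=%r" % locale.getpreferredencoding(False))
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Running the same app on both servers and diffing the two responses would
show whether the environments really are identical.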

I have created myself a task to explain such issues in documentation. :-)

Graham

Nimrod A. Abing

Jul 24, 2008, 4:50:52 AM
to mod...@googlegroups.com
On Wed, Jul 23, 2008 at 4:07 PM, gert <gert.c...@gmail.com> wrote:
>
> Nope, if you compare the server responses in firebug you can clearly
> see encoding problems on server B

I don't think this has anything to do with mod_wsgi or UTF encoding. I
find two things you are doing wrong here:

1. Both servers run different versions of mod_wsgi and apparently two
different apps. How are you supposed to pinpoint that it is indeed
mod_wsgi that is at fault when you have two different things running
on two different versions of mod_wsgi.

2. The output from the command on server 1 contains unescaped
characters that are being parsed as HTML. This is why the rest of the
output has "crossed out" letters: your output is somehow causing the
browser to render parts of it as if it had appeared between <strike>
or <del> tags.

Problem #1, try running the same app on both servers and see if there
is any difference. If there is, then make sure that you have locale
settings configured the same on both servers.

Problem #2, try passing your shell output to cgi.escape() before
sending it to the browser.

Near as I can tell by looking at the mod_wsgi source, it is not doing
anything that will mess up character encodings. Like I said, this is
not an encoding issue. Your code is spitting out stuff that causes the
browser to interpret it as HTML tags. So try using cgi.escape() on
your output.
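
A short sketch of the escaping step suggested here. cgi.escape() was the
usual call in 2008; it was removed in Python 3.8, and html.escape() is the
standard library replacement. The sample string is invented for
illustration and is not real output from proc.py.

```python
import html

# Made-up shell output containing characters a browser would treat as markup.
shell_output = 'total 4\n-rw-r--r-- 1 root root 0 <del>.txt & "notes"'

# html.escape() replaces &, < and >, and by default also both quote characters.
escaped = html.escape(shell_output)
print(escaped)
```

For text that goes into an XML document rather than HTML,
xml.sax.saxutils.escape() does the same job for &, < and >.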

Nimrod A. Abing

Jul 24, 2008, 5:00:36 AM
to mod...@googlegroups.com

Sorry did not read down this thread. Looks like you already fixed it
by running the same app on both servers.
Try running your output through cgi.escape() before sending it out as
XML. Or just wrap the output in between CDATA.

Graham Dumpleton

Jul 24, 2008, 8:57:12 AM
to mod...@googlegroups.com
2008/7/24 Nimrod A. Abing <nimrod...@gmail.com>:

>
> On Thu, Jul 24, 2008 at 1:38 AM, gert <gert.c...@gmail.com> wrote:
>>
>> http://87.98.218.86/tom/ssh.htm
>> http://91.121.53.159/tom/ssh.htm
>>
>> wtf wtf wtf ???? they are like twin servers, why ????
>
> Sorry did not read down this thread. Looks like you already fixed it
> by running the same app on both servers.
> Try running your output through cgi.escape() before sending it out as
> XML. Or just wrap the output in between CDATA.

Then we get back to the prior discussion about ensuring that output
doesn't contain characters that would cause CDATA section to be ended
prematurely.
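
The only sequence that ends a CDATA section is "]]>", so one common way to
handle it is to split the section at that point. A minimal sketch, with a
made-up helper name (wrap_cdata) and made-up sample text:

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Hypothetical helper: wrap text in CDATA, splitting any ']]>' inside it."""
    # "]]>" becomes "]]" + end of section + start of new section + ">",
    # so the terminator never appears inside a single section.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Round trip: the parser should hand back the original text unchanged.
doc = "<out>" + wrap_cdata("weird ]]> output & <tags>") + "</out>"
parsed = ET.fromstring(doc).text
```

CDATA only switches off markup recognition. It does not make control bytes
such as ESC legal in XML 1.0, so those still have to be filtered first.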

Graham

Nimrod A. Abing

Jul 24, 2008, 9:45:42 AM
to mod...@googlegroups.com

I suggest that he try and use cgi.escape() first though. It looks like
the Javascript on the client side is just blindly injecting stuff into
DOM without properly escaping it. That's a big security no-no. I
certainly hope *that* sort of code does not get used in production.

gert

Jul 24, 2008, 12:18:58 PM
to modwsgi


On Jul 24, 3:45 pm, "Nimrod A. Abing" <nimrod.ab...@gmail.com> wrote:
> On Thu, Jul 24, 2008 at 8:57 PM, Graham Dumpleton
> <graham.dumple...@gmail.com> wrote:
>
> > 2008/7/24 Nimrod A. Abing <nimrod.ab...@gmail.com>:
>
> >> On Thu, Jul 24, 2008 at 1:38 AM, gert <gert.cuyk...@gmail.com> wrote:
>
> >>>http://87.98.218.86/tom/ssh.htm
> >>>http://91.121.53.159/tom/ssh.htm
>
> >>> wtf wtf wtf ???? they are like twin servers, why ????
>
> >> Sorry did not read down this thread. Looks like you already fixed it
> >> by running the same app on both servers.
> >> Try running your output through cgi.escape() before sending it out as
> >> XML. Or just wrap the output in between CDATA.
>
> > Then we get back to the prior discussion about ensuring that output
> > doesn't contain characters that would cause CDATA section to be ended
> > prematurely.
>
> I suggest that he try and use cgi.escape() first though. It looks like
> the Javascript on the client side is just blindly injecting stuff into
> DOM without properly escaping it. That's a big security no-no. I
> certainly hope *that* sort of code does not get used in production.

OK, that's it, I am going to make a firebug screenshot because we have
a lot of UTF8 disbelievers in here :)
I totally 100% agree that me not escaping data is bad. But my point is
that the data itself that needs escaping is different on each
server?

Modifying the code again, just a sec, and making a screenshot of what
I am receiving in firebug.

gert

Jul 24, 2008, 1:01:14 PM
to modwsgi

Ramiro Morales

Jul 24, 2008, 1:09:40 PM
to mod...@googlegroups.com
On Thu, Jul 24, 2008 at 2:01 PM, gert <gert.c...@gmail.com> wrote:

> http://87.98.218.86/server2.png

There is something there that looks like escape sequences (for syntax
coloring?) around the ls command output, being inserted by the shell
on the server with problems.
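
For reference, terminal color codes are ordinary bytes in the output: an
ESC character, a "[", some parameters and a final letter. A sketch of
stripping them with a regular expression follows. The sample listing is
invented, and the pattern only covers CSI sequences, not every escape
sequence a terminal program can emit.

```python
import re

# CSI sequence: ESC [ , parameter bytes (0x30-0x3f), intermediate bytes
# (0x20-0x2f), then one final byte (0x40-0x7e). Color codes end in "m".
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove CSI escape sequences such as color codes from text."""
    return ANSI_CSI.sub("", text)


# Made-up example of a colored directory listing.
sample = "\x1b[0m\x1b[01;34mbin\x1b[0m\n\x1b[01;32mproc.py\x1b[0m"
plain = strip_ansi(sample)
```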

--
Ramiro Morales

gert

Jul 24, 2008, 1:25:34 PM
to modwsgi
On Jul 24, 7:09 pm, "Ramiro Morales" <cra...@gmail.com> wrote:
> On Thu, Jul 24, 2008 at 2:01 PM, gert <gert.cuyk...@gmail.com> wrote:
> >http://87.98.218.86/server2.png
>
> There is something there that looks escape sequences (for syntax
> coloring?) around
> the ls command output, being inserted by the shell in the server with problems.

Sweet mother of god :) you are right, my putty session on server 2
does not show colors either, but does show colors on server 1. Damn,
you're good :)
How can I disable colors?

gert

Jul 24, 2008, 1:48:28 PM
to modwsgi
WHAAAAAAAAAAAAAAAAAAAA !!!! IT WORKS !!!!! WHOOOOOOOOOOO

ls -l --color=none
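
Besides the command-line flag, the child process can be started with an
environment that does not invite colors in the first place. The following
is a sketch under assumptions: run_plain is a hypothetical helper, and
whether a given tool honors TERM or LS_COLORS depends on that tool.

```python
import os
import subprocess
import sys


def run_plain(argv):
    """Hypothetical helper: run argv (a list, no shell) with color hints removed."""
    env = dict(os.environ)          # copy, so the server's own environment is untouched
    env["TERM"] = "dumb"            # hint to TERM-aware tools: not a color terminal
    env.pop("LS_COLORS", None)      # drop any per-user color table
    done = subprocess.run(argv, env=env, stdout=subprocess.PIPE, check=True)
    return done.stdout              # raw bytes; decoding is a separate decision


# With GNU ls one would call, for example:
#   run_plain(["ls", "-l", "--color=never", "/srv/www"])
# The line below uses the Python interpreter as a portable stand-in child.
child_term = run_plain([sys.executable, "-c", "import os; print(os.environ['TERM'])"])
```

Passing the command as a list means no shell is involved, so shell aliases
and startup files cannot add options behind your back.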

http://87.98.218.86/tom/ssh.htm


Nimrod A. Abing

Jul 24, 2008, 10:06:56 PM
to mod...@googlegroups.com

Suit yourself. But as Ramiro already pointed out, those "funny"
characters are not UTF-8. Those are escape sequences for altering
colors on a terminal. Which is why I keep telling you, escape your
data first, *even* if you are putting it into a CDATA section. Because
in your case, you are blindly injecting stuff into the browser's DOM.
This is a Very Bad Thing (tm) to do. As you have already seen, some of
your output will have a strikethrough mark, which you get if the
browser thinks it's seen an opening <strike> or <del> tag. Also,
browsers like Firefox are very lenient when it comes to *bad* tag soup
so it will still think < del > is <del>.

Also, why so keen to put the blame on mod_wsgi and UTF handling?
Looking at the code for mod_wsgi response handling, it is pretty much
"encoding agnostic". Unless you're fiddling around with the
headers, it does not really care what sort of data you stuff into
the response body. That is, if something breaks in your output due to
character encoding, then you are doing something wrong in *your* Python
code, so that is the first place that you should look.

gert

Jul 25, 2008, 12:52:36 PM
to modwsgi
On Jul 25, 4:06 am, "Nimrod A. Abing" <nimrod.ab...@gmail.com> wrote:
> Suit yourself. But as Ramiro already pointed out, those "funny"
> characters are not UTF-8. Those are escape sequences for altering
> colors on a terminal. Which is why I keep telling you, escape your
> data first, *even* if you are putting it into a CDATA section. Because
> in your case, you are blindly injecting stuff into the browser's DOM.
> This is a Very Bad Thing (tm) to do. As you have already seen, some of
> your output will have a strikethrough mark which you get if the
> browsers thinks it's seen an opening <strike> or <del> tag. Also
> browsers like Firefox are very lenient when it comes to *bad* tag soup
> so it will still think < del > is <del>.

Now that I know what it is, I am going to start on the cgi.escape() stuff.

> Also, why so keen to put the blame on mod_wsgi and UTF handling?
> Looking at the code for mod_wsgi response handling, it pretty much
> makes it "encoding agnostic". Unless you're fiddling around with the
> headers, it does not really care what you sort of data you stuff into
> the response body. That is, if something breaks in your output due to
> character encoding then you are doing something wrong in *your* Python
> code so that is the first place that you should look.

Come on, if you open a document to be read as UTF-8 while it is
actually iso-whatever you get the same type of #*&$^@(&*^
So I think 90% of people like me would not have had the slightest idea
we were seeing terminal colors here.

PS Graham, DO NOT forget to mention BYTE STREAM and TERMINAL COLORS in
your documentation :-)

Nimrod A. Abing

Jul 25, 2008, 1:13:44 PM
to mod...@googlegroups.com

Well, sometimes it just pays to know what it is you are actually
doing. The first thing you should look into when it comes to character
encoding is your own Python code. You have to remember that you are
free to spit out Unicode from your Python code and stuff it into your
response object as long as it goes into your response *body*. Whether
or not you set the correct headers for your Content-Type and content
encoding is entirely up to you. Don't expect mod_wsgi to automatically
know that you are sending out UTF-8 data. I know it may sound like a
good idea to do that, but trust me, you *don't* want mod_wsgi to
automatically detect Content-Type for the response.
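
To make the point about headers concrete, here is a minimal sketch of a
WSGI application that chooses its own encoding and declares it. The sample
text is invented. Under Python 3 (PEP 3333) the response body must be
bytes, so the application has to encode text itself before returning it.

```python
def application(environ, start_response):
    """Minimal WSGI app: encode the text and declare the charset explicitly."""
    text = "na\u00efve caf\u00e9 \u2713\n"   # a str, not yet bytes
    body = text.encode("utf-8")             # the encoding is the app's choice...
    start_response("200 OK", [
        # ...and the server will not guess it, so it is declared here.
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

If the charset in the header and the bytes in the body disagree, the
browser will show garbage even though the server passed everything through
untouched.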

I now see why you are so confused. In essence, those weird characters
are "valid" UTF-8 as far as your browser is concerned. You're going to
get "boxes", or the *correct* character representation of the byte if
your current font has a glyph for it.

Lastly let me repeat: Do *not* just blindly insert random stuff into
your current HTML DOM without escaping data -- even if it comes from
your own server. You will leave your site open to exploits via XSS and
script injection if you do so.

Brian Smith

Jul 25, 2008, 1:36:49 PM
to mod...@googlegroups.com
Gert wrote:
> Come on if you open a document to be read as UTF-8 while it
> is actual iso-whatever you get the same type of #*&$^@(&*^ So
> i think 90% of people like me would not had the slightest
> idea we where seeing terminal colors here.
>
> PS Graham DO NOT forget to mention BYTE STREAM and TERMINAL
> COLORS in your documentation :-)

If you had taken the time to test the program under another WSGI
container then you would have seen that it doesn't have anything to do with
mod_wsgi at all. The WSGI specification comes with a CGI reference
implementation embedded right inside it. Once you had reproduced it on another
container then you would see that the problem is off-topic for this list and
the problem shouldn't have been posted here.
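
A rough sketch of what such a test could look like with the standard
library's wsgiref package. The application below is a stand-in, not the
poster's code, and call_app is a hypothetical helper that invokes a WSGI
app in-process without any web server at all.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Stand-in app; replace with an import of the app under test."""
    body = b"hello from a bare WSGI container\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app):
    """Hypothetical helper: call a WSGI app and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)   # fills in a plausible request environ
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    result = app(environ, start_response)
    try:
        body = b"".join(result)       # exact bytes the app produced
    finally:
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], body


status, headers, body = call_app(application)
```

If the strange bytes already show up in body here, the web server is not
involved. To serve the same app over HTTP without Apache,
wsgiref.simple_server.make_server("", 8000, application).serve_forever()
is enough, and wsgiref.handlers.CGIHandler().run(application) runs it as a
plain CGI script.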

An AJAX-driven site that executes external processes through the shell is
pretty much the worst test case that you can provide. If you think you have
found a real problem with mod_wsgi then you should reduce the problem down
to the smallest test case possible and then submit *that*, instead of
blaming mod_wsgi so that everybody else has to dig through your application
to find out that mod_wsgi isn't at fault at all. If you had done this, then
it is very likely that you would have found the bugs in your program in the
process.

Regards,
Brian

gert

Jul 25, 2008, 2:07:37 PM
to modwsgi
Hell, I did not even know there were hidden color bytes in a terminal.
So if I had not blamed mod_wsgi, it would have been the linux
kernel or apache or even my own mother :P

Now it's all like "of course, terminal colors", but it took a lot of
posts before the really, really clever guy mentioned terminal colors
:-) Although I mentioned in my second post

"Nope, if you compare the server responses in firebug you can clearly
see encoding problems on server B"

But no they had to hammer on my bad javascript :P

Anyway we all learnt something here
1) I am a pain in Graham's modwsgi ass
2) Colors are like characters