You're probably not escaping < and > and other special HTML characters.
--
Best Regards,
Nimrod A. Abing
W http://arsenic.ph/
W http://preownedcar.com/
W http://preownedbike.com/
W http://abing.gotdns.com/
Last time you were saying mod_wsgi-3.0-TRUNK was broken. This time you
are saying it is mod_wsgi 2.1 and that mod_wsgi-3.0-TRUNK is fine.
Last time no further followup was done as you didn't respond to it
being pointed out that you weren't encoding your CDATA correctly. Have
you addressed the CDATA issues?
Did you perhaps also swap around which system was running 2.1 and
which was running 3.0? That would explain why the problem now appears
for the other version, but perhaps on the same system, which would mean
it is something system specific and not mod_wsgi.
As before, supply the smallest and simplest code which exhibits
the problem, preferably something that doesn't depend on forking off
ssh processes so that it is something that others can actually run if
need be.
If which version of mod_wsgi is used matters, swap around versions on
each system and see if problem moves with mod_wsgi version or stays on
same system.
Graham
Where is the source code to this program?
When you run this program through another WSGI container on the same server
(e.g. the simple CGI one in the WSGI specification or wsgiref), what
happens?
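A minimal sketch of what running it under another container could look like, using wsgiref from the standard library. The `application` body here is a made-up stand-in for the real program's output, and `call_in_process` and `serve` are hypothetical helper names, not anything from the original program:

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    # Stand-in body; replace with the smallest output that shows the problem.
    body = b"<del>not markup</del>\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_in_process(app):
    """Call the app directly, with no web server involved at all."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a plausible WSGI environ
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


def serve(app, port=8000):
    """Serve the app with wsgiref's reference server (blocks until interrupted)."""
    with make_server("", port, app) as httpd:
        httpd.serve_forever()
```

If the bytes returned by `call_in_process(application)` already contain the odd characters, the container is not what is adding them.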
Thanks,
Brian
If so, then it really isn't a problem with mod_wsgi.
- Brian
Can we go right back to the start?
Where are you setting all the locale values on the system? I.e., system
wide, user account, Apache envvars file, in Python code somehow?
Also, are you running embedded mode or daemon mode for the application?
Locale environment variables are one of those horrible cases like TZ
where system C functions depend on environment variables. These can be
a problem in Apache where multiple applications are being run and each
is trying to set them to different values. In Python it gets more
confusing as you could set them through os.environ and they appear to
be set, but code in a different interpreter could also set them just
after and those will take precedence in setting C level environment
variables.
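One way to see what the running process actually ends up with is to log it from inside the application. This is only a sketch: `describe_locale` and `LOCALE_VARS` are hypothetical names, and the variables listed are an assumption about which ones matter on a given system.

```python
import locale
import os

# Assumed set of variables worth reporting; adjust for the system in question.
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def describe_locale():
    """Return what this process currently sees, for logging from inside the app."""
    report = {name: os.environ.get(name) for name in LOCALE_VARS}
    # With no second argument, setlocale() only queries the C-level setting.
    report["c_level_LC_CTYPE"] = locale.setlocale(locale.LC_CTYPE)
    report["preferred_encoding"] = locale.getpreferredencoding(False)
    return report
```

Comparing this report between the two servers, and between embedded and daemon mode, should show whether the values the code sees differ from the ones that were configured.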
I have created myself a task to explain such issues in documentation. :-)
Graham
I don't think this has anything to do with mod_wsgi or UTF encoding. I
find two things you are doing wrong here:
1. Both servers run different versions of mod_wsgi and apparently two
different apps. How are you supposed to pinpoint that it is indeed
mod_wsgi that is at fault when you have two different things running
on two different versions of mod_wsgi?
2. The output from the command on server 1 contains unescaped
characters that are being parsed as HTML. This is why the rest of the
output has "crossed out" letters. This is because your output is
somehow causing the browser to render parts of it as if it has
appeared in between <strike> or <del> tags.
Problem #1, try running the same app on both servers and see if there
is any difference. If there is, then make sure that you have locale
settings configured the same on both servers.
Problem #2, try passing your shell output to cgi.escape() before
sending it to the browser.
Near as I can tell by looking at the mod_wsgi source, it is not doing
anything that will mess up character encodings. Like I said, this is
not an encoding issue. Your code is spitting out stuff that causes the
browser to interpret it as HTML tags. So try using cgi.escape() on
your output.
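A minimal sketch of that escaping step. Note that `cgi.escape()` was removed in Python 3.8; `html.escape()` is the standard library replacement and is used here. `render_output` is a hypothetical helper name, and wrapping the result in `<pre>` is an assumption about how the page displays it.

```python
import html


def render_output(raw):
    """Escape shell output so the browser shows it as text, not markup."""
    # html.escape() turns <, >, & (and quotes, by default) into entities.
    return "<pre>" + html.escape(raw) + "</pre>"
```

With this in place, a stray `<del>` in the command output is displayed literally instead of striking through the rest of the page.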
Sorry did not read down this thread. Looks like you already fixed it
by running the same app on both servers.
Try running your output through cgi.escape() before sending it out as
XML. Or just wrap the output in between CDATA.
Then we get back to the prior discussion about ensuring that output
doesn't contain characters that would cause CDATA section to be ended
prematurely.
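For reference, the usual way to keep a CDATA section from being closed early is to split any `]]>` in the data across two sections. A sketch, with `to_cdata` and `parse_back` as hypothetical helper names and `<r>` as an arbitrary wrapper element:

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def parse_back(text):
    """Round-trip through an XML parser to check the text survives unchanged."""
    return ET.fromstring("<r>" + to_cdata(text) + "</r>").text
```

This only protects the XML layer. Whatever the client then inserts into the DOM still needs escaping.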
Graham
I suggest that he try and use cgi.escape() first though. It looks like
the Javascript on the client side is just blindly injecting stuff into
DOM without properly escaping it. That's a big security no-no. I
certainly hope *that* sort of code does not get used in production.
> http://87.98.218.86/server2.png
There is something there that looks like escape sequences (for syntax
coloring?) around the ls command output, being inserted by the shell on
the server with problems.
--
Ramiro Morales
Suit yourself. But as Ramiro already pointed out, those "funny"
characters are not UTF-8. Those are escape sequences for altering
colors on a terminal. Which is why I keep telling you, escape your
data first, *even* if you are putting it into a CDATA section. Because
in your case, you are blindly injecting stuff into the browser's DOM.
This is a Very Bad Thing (tm) to do. As you have already seen, some of
your output will have a strikethrough mark which you get if the
browser thinks it's seen an opening <strike> or <del> tag. Also
browsers like Firefox are very lenient when it comes to *bad* tag soup
so it will still think < del > is <del>.
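If the color sequences themselves are unwanted, they can be removed before the output is escaped. A sketch, assuming the sequences are ordinary CSI sequences (ESC followed by `[`); `strip_ansi` and `ANSI_CSI` are hypothetical names:

```python
import re

# CSI sequence: ESC [ then parameter bytes, intermediate bytes, and a final byte,
# e.g. "\x1b[01;34m" to switch color and "\x1b[0m" to reset.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text):
    """Remove terminal color/formatting sequences from command output."""
    return ANSI_CSI.sub("", text)
```

This complements HTML escaping rather than replacing it: stripping removes the terminal control codes, escaping handles `<`, `>` and `&`.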
Also, why so keen to put the blame on mod_wsgi and UTF handling?
Looking at the code for mod_wsgi response handling, it pretty much
makes it "encoding agnostic". Unless you're fiddling around with the
headers, it does not really care what sort of data you stuff into
the response body. That is, if something breaks in your output due to
character encoding then you are doing something wrong in *your* Python
code so that is the first place that you should look.
Well, sometimes it just pays to know what it is you are actually
doing. The first thing you should look into when it comes to character
encoding is your own Python code. You have to remember that you are
free to spit out Unicode from your Python code and stuff it into your
response object as long as it goes into your response *body*. Whether
or not you set the correct headers for your Content-Type and content
encoding is entirely up to you. Don't expect mod_wsgi to automatically
know that you are sending out UTF-8 data. I know it may sound like a
good idea to do that, but trust me, you *don't* want mod_wsgi to
automatically detect Content-Type for the response.
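A minimal sketch of doing that explicitly in the application, written for Python 3 where the response body must be bytes. The body text is made up for illustration:

```python
def application(environ, start_response):
    # Encode the text yourself; the server passes the bytes through as-is.
    body = "r\u00e9sum\u00e9 \u2713\n".encode("utf-8")
    start_response("200 OK", [
        # Declare the encoding you actually used; nothing will guess it for you.
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The charset in the header and the codec passed to `encode()` have to agree, otherwise the browser decodes the bytes with the wrong encoding.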
I now see why you are so confused. In essence, those weird characters
are "valid" UTF-8 as far as your browser is concerned. You're going to
get "boxes" or the *correct* character representation of the byte if
your current font has a glyph for it.
Lastly let me repeat: Do *not* just blindly insert random stuff into
your current HTML DOM without escaping data -- even if it comes from
your own server. You will leave your site open to exploits via XSS and
script injection if you do so.
If you had taken the time to test the program under another WSGI
container then you would have seen that it doesn't have anything to do with
mod_wsgi at all. The WSGI specification comes with a CGI reference
implementation embedded right inside it. Once you had reproduced it on another
container, you would have seen that the problem is off-topic for this list and
shouldn't have been posted here.
An AJAX-driven site that executes external processes through the shell is
pretty much the worst test case that you can provide. If you think you have
found a real problem with mod_wsgi then you should reduce the problem down
to the smallest test case possible and then submit *that*, instead of
blaming mod_wsgi so that everybody else has to dig through your application
to find out that mod_wsgi isn't at fault at all. If you had done this, then
it is very likely that you would have found the bugs in your program in the
process.
Regards,
Brian