Minor encoding problem


Mathieu Dubois

Jul 4, 2017, 5:09:42 PM
to pylons-discuss
Dear all,

I'm far from being an expert on that so pardon my ignorance.

We have 2 servers (say prod and dev) running the same code (each in a virtualenv). We recently upgraded to Ubuntu 16.04 (dev was upgraded from Ubuntu 14.04, and prod from Ubuntu 12.04 to 14.04 and then to 16.04). We use the version of pyramid provided by Ubuntu but we installed. The virtualenvs were recreated from scratch for Ubuntu 16.04 and the same version of pyramid_jinja2 was installed there. The code is run with pserve (started by circus) and we use Apache as a reverse proxy to redirect to the internet (AFAIU the configuration is the same).

Our login page contains a '©'. This is served correctly from dev (the HTML code contains '©') but not from prod (there the character comes out garbled, e.g. as 'Â©'). So something is different somewhere, but I have no idea where to look. It's not blocking, but I would like to be sure that we don't miss something. Any advice?
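
For readers hitting the same symptom, here is a minimal sketch of one common way a '©' gets garbled: the page is sent as UTF-8 bytes, but the client decodes them as Latin-1. That Latin-1 is the wrong charset in play here is an assumption for illustration; the thread does not establish which charset the browser fell back to.

```python
# The copyright sign is one code point, but two bytes in UTF-8.
text = "©"
utf8_bytes = text.encode("utf-8")

# Decoding those two bytes as Latin-1 (assumed wrong charset) yields
# two characters instead of one.
garbled = utf8_bytes.decode("latin-1")

# Decoding with the charset the bytes were written in round-trips cleanly.
restored = utf8_bytes.decode("utf-8")

print(repr(utf8_bytes), garbled, restored)
```

The bytes on the wire are the same in both cases; only the charset the reader assumes differs.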

Mathieu

Steve Piercy

Jul 4, 2017, 5:24:49 PM
to pylons-...@googlegroups.com
UTF-8 all the things.

* Use a meta tag in the HTML to set a character set of UTF-8.
* Ensure your files are encoded with UTF-8.
* Avoid a byte order mark (BOM) in files as that may override
other settings.
* Ensure web servers serve text/html as UTF-8. A simple text
file containing only HTML boilerplate with "©" or the classic
"räksmörgås" should be sufficient to test.
* If you hit a database, ensure the connection uses UTF-8.

There may be a couple of other places to check, too, but you
get the idea.
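
The two file-level checks in the list above (valid UTF-8, no BOM) can be sketched roughly as follows. The function name `check_utf8_file` is hypothetical, and this only inspects the bytes on disk; it says nothing about what the web server or database connection does with them.

```python
import codecs
from pathlib import Path


def check_utf8_file(path):
    """Return a list of problems found in the file at `path` (empty if none)."""
    data = Path(path).read_bytes()
    problems = []
    # A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    if data.startswith(codecs.BOM_UTF8):
        problems.append("starts with a UTF-8 byte order mark")
    # Strict decoding fails on the first byte sequence that is not valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte offset {exc.start}")
    return problems
```

Running it over the template directory on each server would be one way to rule out the files themselves before looking at the web server.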

--steve


------------------------
Steve Piercy, Soquel, CA

Jonathan Vanasco

Jul 4, 2017, 10:22:40 PM
to pylons-discuss
Aside from what Steve said, each of those Ubuntu distros uses a different version of Python, and each will run a different version of Pyramid.

You should standardize your deployments (and dev) to use only virtualenv packages AND track/standardize which version of Python you use. IMHO, you should almost never rely on a Linux distro's package installers for Python libraries; specify and manage them with a requirements.txt and pip.
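
As a rough way to see whether two servers really run the same versions, a small report like the one below could be run inside each virtualenv and the outputs compared. This is only a sketch: `environment_report` is a hypothetical helper, and it assumes Python 3.8 or later for `importlib.metadata`, which may not match the interpreter on the servers in question.

```python
import sys
from importlib import metadata


def environment_report(packages):
    """Map 'python' and each package name to its installed version (or None)."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            report[name] = None
    return report


if __name__ == "__main__":
    # Package names taken from the thread; extend the list as needed.
    for name, version in environment_report(["pyramid", "pyramid_jinja2"]).items():
        print(f"{name}=={version}")
```

Once the versions agree, pinning them in requirements.txt keeps them from drifting apart again.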

Steve Piercy

Jul 5, 2017, 6:40:52 PM
to pylons-discuss
FTR, I just so happened to be doing some work on webob.org, which is only a simple HTML file and was exhibiting this exact issue. The fix was simply to add this to the <head>:

<meta charset="utf-8">

--steve



Mathieu Dubois

Jul 6, 2017, 4:11:08 AM
to pylons-discuss
Dear all,

Thanks for the answers. First of all I solved the problem by adding <meta charset="utf-8">.

But what puzzles me is the difference between the 2 machines, which are fairly similar: same OS version (although not the exact same history), deployment in a virtualenv, same source code (the jinja files are UTF-8), same locale, etc. So I was suspecting a subtle Pyramid, Python, or Apache configuration issue...

Mathieu

Steve Piercy

Jul 6, 2017, 4:56:13 AM
to pylons-...@googlegroups.com
I would suspect that one of your Apache instances is not setting
the character set to UTF-8, but to something less robust, or not
setting it at all. You'd have to compare the versions of Apache
installed as well as all the configuration files. Check for
AddDefaultCharset:
https://httpd.apache.org/docs/current/mod/core.html#adddefaultcharset
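
One way to compare what each server actually declares is to look at the charset parameter of the Content-Type response header. The sketch below only parses a header value; the function name `charset_from_content_type` and the sample header values are hypothetical, and fetching the real header from each server is left out.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()


if __name__ == "__main__":
    # Made-up header values for illustration, not taken from the servers above.
    for value in ("text/html; charset=UTF-8", "text/html", "text/html; charset=ISO-8859-1"):
        print(value, "->", charset_from_content_type(value))
```

If one server reports utf-8 and the other reports nothing or a different charset, the difference is in the response headers rather than in the page itself.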

--steve



Jonathan Vanasco

Jul 6, 2017, 11:32:37 AM
to pylons-discuss


On Thursday, July 6, 2017 at 4:56:13 AM UTC-4, Steve Piercy wrote:
I would suspect one of your Apache's not setting the character set to UTF-8, but something less robust or not at all.  

I misread the question above and thought the issue was in reading a file, not serving it. Steve is right; this is most likely the reason.

Mathieu Dubois

Jul 17, 2017, 4:15:08 PM
to pylons-discuss
Sorry for the long delay. AFAIK the Apache configuration for AddDefaultCharset is the same on both servers... But the error does come from Apache: when I connect directly to the pserve socket (not the public one proxied through Apache), the encoding is always correct (even without <meta charset="utf-8">). This is rather mysterious, but since the problem is solved, let's forget it. Thanks for the help.