Problem with Unicode Russian characters

ver

May 31, 2011, 7:44:30 AM
to Mojolicious
I have a web form for updating entity data (name, description, etc.)
via AJAX. I've written a simple method to update the database:

    sub update {
        my $self = shift;
        # Collect all request parameters into a hash reference
        my $data = $self->req->params->to_hash;

        FW::Core::Model->db->update('entity_table', $data);
    }

When I submit a Russian string, for example "Привет, Вася" ("Hello,
Vasya"; note the comma and space), my method receives it as
"Привет, ����".

Data::Dumper output:
{
  'msg' => "\x{<some_cyrillic_codes>}, \x{fffd}\x{fffd}\x{fffd}\x{fffd}"
}

A newly generated Mojolicious application works fine with this action:

    welcome => sub {
        my $self = shift;
        my $data = $self->req->params->to_hash;
        $self->render(json => $data);
    }

What is wrong? I have been trying to solve this issue for about a week.

Vladimir Gusakov

May 31, 2011, 7:48:25 AM
to mojol...@googlegroups.com
Adding the "use utf8;" pragma to your controller code will solve this issue.

Vladimir


--
You received this message because you are subscribed to the Google Groups "Mojolicious" group.
To post to this group, send email to mojol...@googlegroups.com.
To unsubscribe from this group, send email to mojolicious...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mojolicious?hl=en.


ver

May 31, 2011, 7:51:51 AM
to Mojolicious

On May 31, 3:48 pm, Vladimir Gusakov <v.gusa...@perevedem.ru> wrote:
> "use utf8;" pragma in your controller code will solve this issue
I already put it into my code, but nothing changed :( "use encoding
'utf-8';" doesn't help either.
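A probable reason the pragma changes nothing here: "use utf8;" only tells perl that the program's own source text is UTF-8, so it affects string literals written in the file, not data arriving in a request. A minimal core-Perl sketch (assuming the script itself is saved as UTF-8) that compares the same literal with and without the pragma:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same four Cyrillic letters written as code points, for comparison.
my $chars = "\x{412}\x{430}\x{441}\x{44F}";

# Without "use utf8" (the default), the literal is read as raw UTF-8 bytes.
my $without_pragma = "Вася";

# "use utf8" is lexically scoped, so it only affects this block.
my $with_pragma = do {
    use utf8;
    "Вася";
};

printf "without use utf8: %d units\n", length $without_pragma;    # 8 bytes
printf "with use utf8:    %d units\n", length $with_pragma;       # 4 characters
```

Incoming form parameters, by contrast, have to be decoded from bytes at runtime, independently of this pragma.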

koleg

May 31, 2011, 8:48:33 AM
to Mojolicious
Try Encode::_utf8_off, like this:

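    use Encode ();
    use Mojo::JSON;

    sub get_user {
        my $self   = shift;
        my $params = $self->req->params->to_hash;
        my $q      = $params->{user};

        # Clear the UTF-8 flag so $q reaches the JSON decoder as raw bytes
        Encode::_utf8_off($q);

        my $json = Mojo::JSON->new;
        my $user = $json->decode($q);
        # ........
    }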
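For context on the suggestion above: Encode::_utf8_off only clears perl's internal UTF-8 flag and exposes whatever byte representation the string happens to have internally. The Encode documentation marks it as internal and warns against frivolous use; Encode::encode is the documented way to turn characters into bytes. A small core-Perl sketch (not from the thread) showing that the two agree for Cyrillic text but not for a Latin-1 character that perl stores without the flag:

```perl
use strict;
use warnings;
use Encode ();

# Return the bytes obtained by clearing the UTF-8 flag on a copy of $str.
sub bytes_via_utf8_off {
    my ($str) = @_;    # copying from @_ leaves the caller's string untouched
    Encode::_utf8_off($str);
    return $str;
}

my $cyrillic = "\x{412}\x{430}\x{441}\x{44F}";   # code points > 255: stored flagged
my $latin1   = "\x{E9}";                         # e-acute, stored as one Latin-1 byte

# For Cyrillic both approaches yield the same 8 UTF-8 bytes.
my $off_cyr = bytes_via_utf8_off($cyrillic);
my $enc_cyr = Encode::encode('UTF-8', $cyrillic);

# For the unflagged Latin-1 string, _utf8_off is a no-op (1 byte),
# while encode produces the proper 2-byte UTF-8 sequence.
my $off_lat = bytes_via_utf8_off($latin1);
my $enc_lat = Encode::encode('UTF-8', $latin1);

printf "cyrillic: %d vs %d bytes\n", length $off_cyr, length $enc_cyr;
printf "latin1:   %d vs %d bytes\n", length $off_lat, length $enc_lat;
```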

ver

May 31, 2011, 9:20:06 AM
to Mojolicious
That doesn't work either, because the characters are already "broken"
before they reach the controller method =( I tried sniffing the HTTP
headers, and they contain normal Cyrillic characters.

Lyle

May 31, 2011, 6:46:54 PM
to mojol...@googlegroups.com
I suspect you may be saving the file in your native character encoding
for Cyrillic characters, such as Windows-1251 (cp1251). Make sure you
save the file as UTF-8. Editors like KomodoEdit handle this kind of
thing very well.
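To illustrate the kind of mismatch Lyle describes: if text is stored as Windows-1251 but later decoded as UTF-8, the decoder cannot make sense of the bytes and substitutes U+FFFD, the same replacement character that appears in the Data::Dumper output at the top of the thread. A small core-Perl sketch using Encode; the exact number of replacement characters may depend on the Encode version, so it only checks that some appear:

```perl
use strict;
use warnings;
use Encode ();

my $text = "\x{412}\x{430}\x{441}\x{44F}";    # "Vasya" in Cyrillic, as characters

# Store it in a legacy single-byte encoding, as a cp1251 editor might.
my $cp1251_bytes = Encode::encode('cp1251', $text);

# Decode those bytes as if they were UTF-8 (the mismatch).
# With the default CHECK, malformed input becomes U+FFFD instead of dying.
my $misread = Encode::decode('UTF-8', $cp1251_bytes);

# Decoding with the right charset recovers the original text.
my $correct = Encode::decode('cp1251', $cp1251_bytes);

my $replacements = () = $misread =~ /\x{FFFD}/g;
printf "cp1251 bytes: %d, U+FFFD in misread: %d, round trip ok: %s\n",
    length $cp1251_bytes, $replacements, ($correct eq $text ? 'yes' : 'no');
```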


Lyle

Oroszi, Róbert

May 31, 2011, 6:52:10 PM
to mojol...@googlegroups.com
I had a similar problem with Hungarian characters a few months ago (using DBI and DBD::mysql).
The cause was an old version of DBI from the CentOS repository.
I upgraded both DBI and DBD::mysql via CPAN, and now it works like a charm. :)

2011/5/31 ver <v...@0xff.su>

ver

Jun 2, 2011, 2:36:37 AM
to Mojolicious


On Jun 1, 2:46 am, Lyle <webmas...@cosmicperl.com> wrote:
> I suspect you may be saving the file in your native character encoding
> for cyrillic characters, such as Windows-1251 (cp1251). Ensure you save
> the file as utf8. Editors like KomodoEdit handle this kind of thing very
> well.
No, I don't use files. The browser sends a POST request with UTF-8
data, and when I receive it in my controller, the data is already
broken (U+FFFD replacement characters).

ver

Jun 2, 2011, 4:22:55 AM
to Mojolicious
Hmm, strange... When I dump $self->param('name'), everything is fine,
but when I use $self->req->params->to_hash, I get the broken
Unicode characters...

ver

Jun 2, 2011, 5:12:44 AM
to Mojolicious
I looked at Mojo::Parameters->parse() and found where the data breaks:
    # Escaped value
    if (index($value, '%') >= 0) {
        print "1st: $value\n";    # %D0%B0%D0%B1%D0%B2%2C %D0%B3%D0%B4%D0%B5
        url_unescape $value;
        print "2nd: $value\n";    # абв, ������
        my $backup = $value;
        decode $charset, $value if $charset;
        $value = $backup unless defined $value;
    }

But this works correctly:

    perl -e 'use URI::Escape; print uri_unescape "%D0%B0%D0%B1%D0%B2%2C %D0%B3%D0%B4%D0%B5"'

What's wrong? I'm racking my brain over this =\
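The excerpt above follows a "decode, and keep the raw value if that fails" pattern. A rough core-Perl analogue, using Encode's FB_CROAK check to detect invalid input; decode_or_keep is a hypothetical helper name, not a Mojo function, and Mojo's own decode may report failure differently:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper mirroring the "decode, else restore the backup" logic.
sub decode_or_keep {
    my ($charset, $bytes) = @_;
    my $copy  = $bytes;    # decode() with a CHECK value may consume its input
    my $chars = eval { Encode::decode($charset, $copy, Encode::FB_CROAK()) };
    return defined $chars ? $chars : $bytes;
}

# Cyrillic "abv" encoded as UTF-8 decodes to three characters.
my $good = decode_or_keep('UTF-8', "\xD0\xB0\xD0\xB1\xD0\xB2");

# 0xFF never occurs in valid UTF-8, so decoding croaks and the raw byte is kept.
my $bad = decode_or_keep('UTF-8', "\xFF");

printf "good: %d chars, bad kept as %d raw byte(s)\n", length $good, length $bad;
```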

ver

Jun 2, 2011, 7:25:10 AM
to Mojolicious
For now, I've replaced "url_unescape $value;" with the following, and
it works, but modifying the Mojo source is not a good idea:

    $value =~ s/%(D0)%([0-9A-Fa-f]{2})/chr(hex($1)).chr(hex($2))/eg;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
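Outside Mojo, a single generic %XX substitution followed by one UTF-8 decode should already produce the expected text, without a special case for the D0 lead byte. A standalone core-Perl sketch, assuming the value is form-encoded UTF-8 (form_unescape_utf8 is a hypothetical helper name):

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode one application/x-www-form-urlencoded value.
sub form_unescape_utf8 {
    my ($value) = @_;
    $value =~ tr/+/ /;                                # '+' encodes a space in form data
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;    # each %XX becomes one raw byte
    return Encode::decode('UTF-8', $value);           # UTF-8 bytes -> characters
}

my $decoded = form_unescape_utf8('%D0%B0%D0%B1%D0%B2%2C %D0%B3%D0%B4%D0%B5');

# Print through an encoding layer so the wide characters come out readable.
binmode STDOUT, ':encoding(UTF-8)';
print "$decoded\n";
```

Doing the UTF-8 decode once, after every escape is resolved, keeps each two-byte Cyrillic sequence intact.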

Michael Ludwig

Jun 4, 2011, 10:24:19 AM
to mojol...@googlegroups.com
ver wrote on 02.06.2011 at 02:12 (-0700):
> I have seen Mojo::Parameters->parse() and found when data is breaking.
> # Escaped value
> if (index($value, '%') >= 0) {
> print "1st: $value\n"; # %D0%B0%D0%B1%D0%B2%2C %D0%B3%D0%B4%D0%B5
> url_unescape $value;
> print "2nd: $value\n"; # абв, ������
> my $backup = $value;
> decode $charset, $value if $charset;
> $value = $backup unless defined $value;
> }
>
> but this works right:
>
> perl -e 'use URI::Escape; print uri_unescape "%D0%B0%D0%B1%D0%B2%2C %D0%B3%D0%B4%D0%B5"'
>
> What's wrong? I have breaking my brain =\

Both work fine, at least using Mojolicious-1.41:

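use v5.10;
use strict;
use warnings;
use URI::Escape 'uri_unescape';
use Mojo::Util 'url_unescape';

my $value = '%D0%B0%D0%B1%D0%B2%2C %D0%B3%D0%B4%D0%B5';
my $copy = $value;

# URI::Escape returns the unescaped UTF-8 bytes
say $value;
$value = uri_unescape $value;
say $value;
# Turn the bytes into characters and print them through an encoding layer
utf8::decode $value;
binmode STDOUT, ':encoding(UTF-8)';
say $value;

# Reset STDOUT to raw bytes before the second test
binmode STDOUT;

# Mojo::Util's url_unescape (as in Mojolicious 1.41) modifies its argument in place
say $copy;
url_unescape $copy;
say $copy;
utf8::decode $copy;
binmode STDOUT, ':encoding(UTF-8)';
say $copy;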

--
Michael Ludwig
