make_modules and UTF8

jfrm

Nov 20, 2018, 5:15:41 AM

to Rose::DB::Object

G'day. I noticed that pound signs in my CGI scripts were being output as diamonds with a ? inside, which usually indicates that a page declared as UTF-8 is trying to output a Latin1 character.

Investigating further, I can see that the Rose module .pm file that contains the pound sign within the code (and which correctly has use utf8; at the top) is indeed encoded as Latin1.

This file is generated with Rose make_modules. The batch Perl script that generates the file using make_modules has all the following at the top:

use utf8;

use open ':encoding(utf8)';

binmode(STDOUT, ":utf8");

The make_modules call incorporates two additional files via module_postamble, but both of these are encoded and opened as UTF-8.

I currently conclude that make_modules is creating a Latin1 encoded file even though all the input is UTF8. Is there a way to control this and have it output a UTF8 file?

Many thanks for any tips. 8o)

Peter Karman

Nov 20, 2018, 10:45:52 AM

to rose-db...@googlegroups.com

jfrm wrote on 11/20/18 4:15 AM:

Dealing with encodings is hard but well-known.

Can you link to the .pm Rose file you mention?

--
Peter Karman . he/him/his . 785.337.0405 . https://karpet.github.io/

jfrm

Nov 21, 2018, 3:23:59 PM

to Rose::DB::Object

I can attach the file that generates the scripts. I've cleaned it up somewhat, taken out the irrelevant stuff, and attached it to this post. Is that what you mean?

Thanks for your help.

gendb.pl

Peter Karman

Dec 28, 2018, 2:07:53 PM

to rose-db...@googlegroups.com

jfrm wrote on 11/21/18 2:23 PM:

Rose seems to open the files for writing without any encoding specified:

https://metacpan.org/source/JSIRACUSA/Rose-DB-Object-0.815/lib/Rose/DB/Object/Loader.pm#L409

which suggests to me that it assumes bytes (no encoding).
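To illustrate what "no encoding" means in practice, here is a minimal sketch in plain Perl (it does not call Rose at all; `hex_written` is a hypothetical helper made up for this demo). It assumes no default layers are set through the environment (e.g. `PERL_UNICODE`). Note also that the `open` pragma is lexically scoped, so a `use open` line in the calling script does not apply to `open()` calls compiled in another file such as Loader.pm, and `binmode(STDOUT, ...)` only affects STDOUT.

```perl
use strict;
use warnings;
use utf8;                        # literals in this file are character strings
use File::Temp qw(tempfile);

# Print $text through a handle opened with $mode, then return the bytes
# that reached the disk as a hex string such as "C2 A3".
sub hex_written {
    my ($mode, $text) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;

    open(my $out, $mode, $path) or die "open $path: $!";
    {
        no warnings 'utf8';      # silence "Wide character in print" for the demo
        print {$out} $text;
    }
    close $out or die "close $path: $!";

    open(my $in, '<:raw', $path) or die "open $path: $!";
    my $bytes = do { local $/; <$in> };
    close $in;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $pound_only = "£";            # U+00A3, fits in one byte
my $pound_wide = "£€";           # U+20AC does not fit in one byte

my %seen = (
    bare_pound  => hex_written('>', $pound_only),                  # bare open
    bare_wide   => hex_written('>', $pound_wide),                  # bare open
    layer_pound => hex_written('>:encoding(UTF-8)', $pound_only),  # explicit layer
);
print "$_: $seen{$_}\n" for sort keys %seen;
```

With a bare open, the pound sign alone comes out as the single Latin1 byte `A3`, while the string that also contains a character above U+00FF comes out as UTF-8 (`C2 A3 E2 82 AC`). With the explicit layer the pound sign is written as `C2 A3`. That difference is one possible reason, not confirmed in this thread, why generated files could end up in different encodings depending on their content.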

It's not clear to me whether the UTF-8 multibyte codepoints you are seeing are
coming from Rose code or your code. In either case, I typically run all my
generated files through
https://metacpan.org/pod/Search::Tools::UTF8#to_utf8(-text,-charset-) just to be
sure, and then something like Encode::encode_utf8() before writing.
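As a rough sketch of the "encode before writing" step, using only the core Encode module: `write_utf8_file` is a hypothetical helper name, and the Search::Tools::UTF8 call is left out here so the example has no non-core dependencies.

```perl
use strict;
use warnings;
use utf8;                                    # "£" below is a character, not bytes
use Encode qw(encode_utf8);
use File::Temp qw(tempfile);

# Hypothetical helper: write a character string to $path as UTF-8 octets.
sub write_utf8_file {
    my ($path, $text) = @_;
    my $bytes = encode_utf8($text);          # characters -> UTF-8 octets
    open(my $fh, '>:raw', $path) or die "open $path: $!";
    print {$fh} $bytes;                      # octets pass through :raw untouched
    close $fh or die "close $path: $!";
    return length $bytes;                    # number of octets written
}

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;
my $octets = write_utf8_file($path, "£5\n"); # 3 characters, 4 octets
print "wrote $octets octets\n";
```

Encoding explicitly and writing through `:raw` means the result does not depend on whatever default layer the handle would otherwise get.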

I always suggest writing the smallest possible example or test case to
demonstrate the problem. The sheer act of doing that often reveals to me what
the problem is, and if it doesn't, then it's easier for others to reproduce.

pek

jfrm

Dec 29, 2018, 4:20:15 PM

to Rose::DB::Object

Thanks for the further feedback. I did some more work since my last post, and in doing so I noticed that most of the files generated by Rose::DB::Object::Loader do end up as UTF-8, but two end up as Latin1. This seems very bizarre, as I cannot see any fundamental difference between a source file that ends up as Latin1 and one that ends up as UTF-8: both have UTF-8 characters within them. Very probably it is due to something that I've caused somehow.

What I did in the end was a horrible hack: I just open the files that I know end up as Latin1 and re-write them as UTF-8. A better workaround would be to work out in code which files are Latin1 and which aren't, but there doesn't seem to be any easy and perfect way of doing that, and my time is limited.
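For what it's worth, an imperfect but simple way to tell the two apart in code is to try a strict UTF-8 decode and fall back to Latin1 only when that fails. The sketch below uses only core modules; `to_utf8_bytes` and `rewrite_as_utf8` are hypothetical helper names, and it assumes every input is either UTF-8 or Latin1.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK);

# Return UTF-8 octets for $bytes, assuming the input is either valid
# UTF-8 already or Latin1 (ISO-8859-1).
sub to_utf8_bytes {
    my ($bytes) = @_;
    my $copy = $bytes;           # decode() with a CHECK flag may modify its input
    my $is_utf8 = eval { decode('UTF-8', $copy, FB_CROAK); 1 };
    return $bytes if $is_utf8;   # strict decode succeeded: leave as-is
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

# Rewrite $path in place as UTF-8. Returns 1 if the file was changed,
# 0 if it was already valid UTF-8.
sub rewrite_as_utf8 {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "open $path: $!";
    my $bytes = do { local $/; <$in> } // '';
    close $in;

    my $fixed = to_utf8_bytes($bytes);
    return 0 if $fixed eq $bytes;

    open(my $out, '>:raw', $path) or die "open $path: $!";
    print {$out} $fixed;
    close $out or die "close $path: $!";
    return 1;
}

# A Latin1 pound sign (A3) is converted; a UTF-8 one (C2 A3) is kept.
printf "%vd\n", to_utf8_bytes("\xA3");
printf "%vd\n", to_utf8_bytes("\xC2\xA3");
```

This is not a perfect test: a Latin1 file whose high bytes happen to form valid UTF-8 sequences (for example `C3 A3`) would be treated as UTF-8 and left alone. Pure-ASCII files pass through unchanged, since ASCII is valid UTF-8.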

You are absolutely right that I could and should have reduced the problem down further and better but I did not have enough time when I posted. I posted in the hope that there was a quick and easy answer. One day I hope to come back and try to do the right thing...

Thanks for the pointer to Search::Tools::UTF8. Looks very useful.
