(Net::LDAP) Automatically convert attributes into utf8 when writing

pe rl

Aug 25, 2015, 7:00:03 AM
to perl...@perl.org
Hi, we are using an old version of Net::LDAP (0.39) in an old perl
installation (5.10.1). Recently we have changed the ldap server, and now it
uses utf8 in the entry attributes, so we are getting problems with reading and
writing attributes with Net::LDAP.

To solve it, I have read the documentation of Net::LDAP 0.39 ( at
https://metacpan.org/pod/release/GBARR/perl-ldap-0.39/lib/Net/LDAP.pod ), and
I see there is an option ("raw") in the constructor: attributes matching its
regex are left as raw bytes, and all other attributes are treated as utf8. I
have tested it, and it works for reading from the ldap server (attribute
strings are marked as utf8, so they are treated correctly by our programs), but
it doesn't work for writing (our latin1 strings are not being converted
automatically into utf8 before being sent).

So it looks like the "raw" option works for reading but not for writing. Is
there any quick way to use the "raw" regex also for writing? The alternative
would be to review all of our code and manually encode all the values to utf8
before passing them to Net::LDAP, but it would mean a lot of work. It would be
better if we could change the Net::LDAP library itself to automatically
convert attributes into utf8, same as for reading. For example, a new
option "raw_for_writing" could be added to the constructor:

      $ldap = Net::LDAP->new(
          $server,
          port            => $port,
          raw             => qr/(?i:^jpegPhoto|;binary)/,
          raw_for_writing => 1,    # proposed new option
      );

I see that the automatic conversion for reading is done at the "decode"
function of "Entry.pm":

    sub decode {

      ...

      if (CHECK_UTF8 && $arg{raw}) {
        $result->{objectName} = Encode::decode_utf8($result->{objectName})
          if ('dn' !~ /$arg{raw}/);

      ...

        foreach my $elem (@{$self->{asn}{attributes}}) {
          map { $_ = Encode::decode_utf8($_) } @{$elem->{vals}}
            if ($elem->{type} !~ /$arg{raw}/);
        }
      }

And I see that there is an "encode" function in "Entry.pm", that doesn't do
the magic:

    sub encode {
      $LDAPEntry->encode( shift->{asn} );
    }

Would it be sufficient to add some similar code to the "Entry::encode"
function in order to automatically encode attributes to utf8 before being sent?
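A minimal sketch of what such a mirror of the decode loop might look like, written as a standalone helper rather than a patch. The name encode_asn_values is hypothetical, and the data layout (an "attributes" list whose elements have "type" and "vals") is assumed from the decode excerpt above. Whether hooking this into Entry::encode would cover every add/modify path is not something the excerpt shows.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Net::LDAP.
# Assumes $asn looks like { attributes => [ { type => ..., vals => [...] } ] },
# the same shape the quoted decode() loop walks.
sub encode_asn_values {
    my ($asn, $raw) = @_;
    for my $elem (@{ $asn->{attributes} || [] }) {
        next if $elem->{type} =~ /$raw/;          # leave raw/binary attributes alone
        utf8::encode($_) for @{ $elem->{vals} };  # in place: characters -> UTF-8 bytes
    }
    return $asn;
}

my $asn = {
    attributes => [
        { type => 'cn',        vals => ["Jos\xe9"] },    # latin1 string
        { type => 'jpegPhoto', vals => ["\xff\xd8"] },   # binary data
    ],
};
encode_asn_values($asn, qr/(?i:^jpegPhoto|;binary)/);
```

Note that utf8::encode() has no way of telling whether a value is already UTF-8 encoded, so running such a helper twice over the same data would double-encode it.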

Any suggestion to reduce the amount of code to be changed in our programs?

Thank you

Keutel, Jochen (mlists)

Aug 25, 2015, 7:15:01 AM
to perl...@perl.org
Hello,
instead of patching Net::LDAP you should use utf8::encode() and
utf8::decode() in your perl code.

See http://perldoc.perl.org/5.10.1/utf8.html .
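For illustration, a small sketch of how these two functions behave. Both modify their argument in place, and both are available without loading any module. The sample string is made up for the example.

```perl
use strict;
use warnings;

my $name = "Jos\xe9";          # 4 characters; the last one is 0xE9 (latin1 e-acute)

utf8::encode($name);           # in place: now the UTF-8 byte sequence
my $bytes_len = length $name;  # counts bytes: J, o, s, 0xC3, 0xA9

my $copy = $name;
utf8::decode($copy);           # in place: back to a character string
my $chars_len = length $copy;  # counts characters again
```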

Regards, Jochen.

pe rl

Aug 25, 2015, 7:45:02 AM
to Keutel, Jochen (mlists), perl...@perl.org
Thank you, I already knew utf8::encode() and utf8::decode().

They are not necessary when reading/searching in the ldap server, since
Net::LDAP already has a "raw" option in the constructor to automatically
encode/decode strings. It is working for us, and the only change required has
been to add the "raw" option to the constructor.

The problem appears when writing to the ldap server. I have started to modify
our code with utf8::encode(), by adding it to every attribute in all of our
functions. The problem is that it is very labor-intensive, since I will have to
modify every attribute that appears in our programs. We have a lot of functions
that create/modify/delete entries in the ldap server, so I will have to change
a lot of code to manually encode attribs to utf8, and then test all of the
changes.

It would be much simpler if Net::LDAP automatically encoded the
attributes by using the regex passed into the "raw" option of the constructor,
since the changes in our programs would be zero. In my first message I pasted
the code in Net::LDAP that decodes the attributes when reading from the ldap
server, and it looks simple. Encoding attributes when writing to the ldap
server could probably be simple as well, and the changes required in
Net::LDAP are probably minimal compared to the changes required in our code.

Thank you



pe rl

Aug 25, 2015, 3:00:02 PM
to perl...@perl.org

pe rl

Aug 26, 2015, 3:30:03 AM
to perl...@perl.org
I will try to explain it more clearly. Currently you can read utf8 attributes out
of the box by using the "raw" option in the constructor:

    $ldap = Net::LDAP->new(
        $server,
        port => $port,
        raw  => qr/not_utf_attrib/,
    );

so you can look for entries, and all the attributes will be converted
automatically from utf8 (except "not_utf_attrib" attribute):

    $mesg = $ldap->search(
        base   => "c=US",
        filter => "(&(sn=Barr) (o=Texas Instruments))",
    );
    # Process the entry attributes in $mesg here without worrying about utf8 values.

But if you create an entry, you will have to manually convert all attributes
into utf8 (except "not_utf_attrib" attribute):

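    # $attrs is a hash ref of attribute name => value.
    for my $c (keys %$attrs)
    {
        # Encode plain scalar values in place; references (multi-valued
        # attributes) and the raw attribute are skipped.
        utf8::encode($attrs->{$c})
            if (not ref $attrs->{$c} and $c ne 'not_utf_attrib');
    }

    $mesg = $ldap->add( $dn, attrs => [ %$attrs ] );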

This manual conversion should be unnecessary, since Net::LDAP already knows the
attributes that must be treated as utf8 and the attributes that mustn't (with
the "raw" option), so they could be converted automatically by Net::LDAP.
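Until something like that exists, one way to keep the manual conversion in a single place is a small helper that is called right before every add/modify. This is only a sketch: encode_attrs is a hypothetical name, it assumes every non-raw value is a latin1 (not yet encoded) string, and unlike the loop above it also handles multi-valued attributes given as array refs.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of %$attrs with values UTF-8 encoded,
# skipping attribute names that match the same regex passed as "raw".
sub encode_attrs {
    my ($attrs, $raw) = @_;
    my %out;
    for my $name (keys %$attrs) {
        my $is_list = ref($attrs->{$name}) eq 'ARRAY';
        my @vals    = $is_list ? @{ $attrs->{$name} } : ($attrs->{$name});
        if ($name !~ /$raw/) {
            utf8::encode($_) for @vals;   # @vals holds copies, caller's data is untouched
        }
        $out{$name} = $is_list ? [@vals] : $vals[0];
    }
    return \%out;
}

my $attrs = {
    cn             => "Jos\xe9",
    mail           => [ 'jose@example.com', "j\xe9\@example.com" ],
    not_utf_attrib => "\xe9",
};
my $encoded = encode_attrs($attrs, qr/not_utf_attrib/);
# $mesg = $ldap->add( $dn, attrs => [ %$encoded ] );
```

Because the helper works on copies, calling it more than once on the original data does not double-encode anything.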

So I'm proposing to add an option (raw_for_writing) to the constructor that
tells Net::LDAP to automatically convert attributes into utf8 when
creating/modifying entries:

    $ldap = Net::LDAP->new(
        $server,
        port            => $port,
        raw             => qr/not_utf_attrib/,
        raw_for_writing => 1,
    );

Thank you


( Off topic: There is a delay before a posted mail appears in the web page
http://www.nntp.perl.org/group/perl.ldap/2015/08.html . That is why I sent my
first post twice: I thought that subscription was required, since the mail
didn't appear in the web page )



Peter Marschall

Aug 29, 2015, 8:00:02 AM
to perl...@perl.org
Hi,

On Tuesday, 25. August 2015 13:37:15 pe rl wrote:
> They are not necessary when reading/searching in the ldap server, since
> Net::LDAP already has a "raw" option in the constructor to automatically
> encode/decode strings. It is working for us, and the only change required
> has been to add the "raw" option to the constructor.

I think you misinterpret the purpose of the raw option.

Its goal is to convert the byte strings coming from the LDAP server that
represent UTF-8 encoded directory strings from byte semantics to
Perl scalars with character semantics.

On the other hand, perl-ldap expects scalars in character semantics when
it comes to writing directory strings to an LDAP server.

It is not perl-ldap's job to translate between scalars in Perl's character
semantics and various input or output encodings of your application.
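The difference between the two semantics can be seen in a short sketch. It uses Encode::decode_utf8, the same call that appears in the decode() excerpt quoted earlier in the thread; the sample value is made up.

```perl
use strict;
use warnings;
use Encode ();

my $from_server = "Jos\xc3\xa9";                      # UTF-8 bytes as they arrive on the wire
my $chars       = Encode::decode_utf8($from_server);  # character semantics

my $byte_len = length $from_server;   # counts bytes
my $char_len = length $chars;         # counts characters

# "." matches exactly one character, so only the decoded string matches.
my $bytes_match = $from_server =~ /^Jos.$/ ? 1 : 0;
my $chars_match = $chars       =~ /^Jos.$/ ? 1 : 0;
```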


> The problem appears when writting to the ldap server. I have started to
> modify our code with utf8::encode(), by adding it to every attribute in all
> of our functions. The problem is that it is very inefficient, since I will
> have to modify every attribute that appears in our programs. We have a lot
> of functions that create/modify/delete entries in the ldap server, so I
> will have to change a lot of code to manually encode attribs to utf8, and
> then test all of the changes.

It is not perl-ldap's job to translate between scalars in Perl's character
semantics and various input or output encodings of your application.

This is the application's task.
If you - as you write - need to convert every attribute using utf8::encode(),
then your application seems to use a mixture of byte & character semantics.

In that case please do yourself a favour and switch over to character
semantics by correctly converting input to character semantics when it
happens:
- for file & console input you can use the ":encoding(...)" layer to make
sure you get character semantics instead of byte semantics
- for @ARGV a simple
$_ = Encode::decode('UTF-8', $_) for @ARGV;
should be sufficient.

You may also have a look at the 'utf8::all' package that does a lot of the
above for you automatically.

Please read the perlunicode manual page for more detailed information.
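A small sketch of decoding at the input boundary, as described above. The temporary file and the iso-8859-1 encoding are assumptions standing in for a legacy latin1 input file, and @raw_args stands in for @ARGV.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Write a small latin1 file to stand in for a legacy input file.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "Jos\xe9\n";
close $fh;

# Decode at the boundary: the layer turns latin1 bytes into characters.
open my $in, '<:encoding(iso-8859-1)', $path or die "open $path: $!";
my $line = <$in>;
close $in;
chomp $line;

# Command-line style arguments arriving as UTF-8 bytes.
my @raw_args = ("Jos\xc3\xa9");
my @args     = map { decode('UTF-8', $_) } @raw_args;
```

After this point both values are ordinary character strings and compare equal, regardless of the encoding they arrived in.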

Best
Peter

--
Peter Marschall
pe...@adpm.de

pe rl

Aug 31, 2015, 4:00:01 AM
to Peter Marschall, perl...@perl.org
Thank you for your information.

Finally I added "utf8::encode" to all the attribs, so now it works.

Converting our code (@_ and file i/o) into utf8 was an option, but I discarded it because we have a lot of files (our project is nearly a framework, not a few files), including modules that read translation string files for several languages, so converting everything into utf8 would be a lot of extra work.

Our project is rather old; it was created back when utf8 was not yet in common use. This is the reason why it is so difficult for us to convert everything into utf8. Anyway I believe we will have to convert it some day, as you proposed.

Thank you

