
Parsers code for current L10n formats used in Mozilla


Ricardo Palomares Martínez

Jul 18, 2015, 1:05:33 PM
to mozilla-t...@lists.mozilla.org

Hi,

I'm in the process of writing a replacement for MozillaTranslator.
Nothing disruptive in the frontend, since it remains a Java desktop
tool. The changes are in the persistence layer (Derby, a Java SQL
database, managed with JPA), the internal data model, which I expect
to accommodate L20n in the future, and more extensive and standard
help features for users (a glossary manager and a better translation
memory).

I have tight constraints, as this is my final year project (I'm 20
years late) and I have to reach near feature parity with
MozillaTranslator by the end of August. I'm already using SAX to parse
DTDs in much the same way as MozillaTranslator does, and I might
be forced to copy its parser for Properties files (which I also use
for INI files), but I wonder if Mozilla has some parser code for
Properties and INI files that I could browse to write a Java
version. BNF grammar files could work, too, as I could try
a parser generator such as ANTLR.

TIA

--
Proyecto NAVE
Mozilla Localization Project, es-ES Team
http://www.proyectonave.es/
Diaspora: rick...@diasp.eu

Matjaz Horvat

Jul 18, 2015, 2:41:20 PM
to Ricardo Palomares Martínez, mozilla-t...@lists.mozilla.org
Ricardo,

In Pontoon we use patched Silme to parse/serialize most of Mozilla formats
(dtd, properties, ini):
http://hg.mozilla.org/l10n/silme/
https://wiki.mozilla.org/Silme

-Matjaž

Axel Hecht

Jul 20, 2015, 8:03:27 AM
to mozilla-t...@lists.mozilla.org
.properties is hard, as there are effectively two incompatible parsers
used by our architectures.

There's one class of parsers that tries to reverse engineer what
http://mxr.mozilla.org/mozilla/source/xpcom/ds/nsPersistentProperties.cpp#146
does, which is what's behind gecko's stringbundle API. Silme is probably
in that class, and compare-locales is too.

Then there's the stuff that gaia does, which resembles properties only
on the surface, to be honest. We just recently learned that ':' is a
key-value separator in gecko, but totally part of the key (and used as
such) in gaia.
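
To make the ':' difference concrete, here is a minimal Java sketch. It is not a port of either real parser: the class name, the method, and the boolean flag are all hypothetical, and it ignores escapes, comments, and line continuations. It only shows how the same line yields different keys depending on whether ':' counts as a separator.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

/** Hypothetical helper: splits one .properties line under two separator policies. */
final class SeparatorSketch {

    /**
     * Splits at the first '=' and, when colonSeparates is true, also at the first ':'.
     * Assumption: no escapes, comments, or continuation lines are handled here.
     */
    static Map.Entry<String, String> split(String line, boolean colonSeparates) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '=' || (colonSeparates && c == ':')) {
                String key = line.substring(0, i).trim();
                String value = line.substring(i + 1).trim();
                return new SimpleEntry<>(key, value);
            }
        }
        // No separator found: treat the whole line as a key with an empty value.
        return new SimpleEntry<>(line.trim(), "");
    }

    public static void main(String[] args) {
        String line = "accessibility:label = Close";

        // ':' is a separator: key "accessibility", value "label = Close".
        Map.Entry<String, String> colonSplits = split(line, true);
        System.out.println(colonSplits.getKey() + " -> " + colonSplits.getValue());

        // ':' is part of the key: key "accessibility:label", value "Close".
        Map.Entry<String, String> colonInKey = split(line, false);
        System.out.println(colonInKey.getKey() + " -> " + colonInKey.getValue());
    }
}
```

A tool that has to read both variants would need to choose the policy per file or per project, since the line itself does not say which one applies.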

All very much sad faces. Yay l20n to get rid of the gaia variant.

The INI usage in gecko is pretty ad-hoc, only supports one category etc.
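
For a sense of how small a single-section INI reader can be, here is a rough Java sketch. Everything in it is an assumption made for illustration: the class name, the two regular expressions, the choice of ';' and '#' as comment markers, and the decision to reject a second section. It is not derived from the gecko or compare-locales code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical single-section INI reader; the patterns are illustrative assumptions. */
final class SimpleIniSketch {

    // "[Section]" header line.
    private static final Pattern SECTION = Pattern.compile("^\\[([^\\]]+)\\]$");
    // "key = value" with optional whitespace around '='.
    private static final Pattern ENTRY = Pattern.compile("^([^=]+?)\\s*=\\s*(.*)$");

    /** Returns the key/value pairs in file order; throws if a second section appears. */
    static Map<String, String> parse(String text) {
        Map<String, String> entries = new LinkedHashMap<>();
        boolean sectionSeen = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            // Skip blank lines and comments (assumed to start with ';' or '#').
            if (line.isEmpty() || line.startsWith(";") || line.startsWith("#")) {
                continue;
            }
            if (SECTION.matcher(line).matches()) {
                if (sectionSeen) {
                    throw new IllegalArgumentException("only one section is supported: " + line);
                }
                sectionSeen = true;
                continue;
            }
            Matcher m = ENTRY.matcher(line);
            if (m.matches()) {
                entries.put(m.group(1), m.group(2));
            }
            // Lines that match neither pattern are silently ignored in this sketch.
        }
        return entries;
    }

    public static void main(String[] args) {
        String ini = "; comment\n[Strings]\nTitle=Updater\nInfoText = Installing updates\n";
        System.out.println(parse(ini));
    }
}
```

A real tool would probably want to keep comments and original whitespace as well, so that files can be written back without spurious diffs.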

The compare-locales parsers are at
http://hg.mozilla.org/l10n/compare-locales/file/default/compare_locales/parser.py

Axel

Ricardo Palomares Martínez

Jul 21, 2015, 2:43:14 PM
to tools...@lists.mozilla.org

On 20/07/15 at 14:04, Axel Hecht wrote:
> .properties is hard, as there are effectively two incompatible
> parsers used by our architectures.
>
> There's one class that tries to reverse engineer what
> http://mxr.mozilla.org/mozilla/source/xpcom/ds/nsPersistentProperties.cpp#146



Thanks. I'll probably base my Java parser on the C++ code. Being a
Java guy, I'm increasingly seeing Python code as if it were written
in the Cyrillic alphabet. :-)


> Then there's the stuff that gaia does, which resembles properties
> only on the surface, to be honest. We just recently learned that
> ':' is a key-value separator in gecko, but totally part of the
> key (and used as such) in gaia.


That's likely because the C++ code follows the Java Properties
specification more closely, and that specification allows ":" to be
used as a key-value separator. :-)

http://docs.oracle.com/javase/8/docs/api/java/util/Properties.html#load-java.io.Reader-
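
A quick way to see this behaviour is to feed `java.util.Properties` a line that uses ":" and another where the colon is escaped. The sketch below uses only the standard library; the class name and sample keys are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Small demo of how java.util.Properties.load(Reader) treats ':' in a key. */
final class ColonSeparatorDemo {

    /** Loads properties from an in-memory string. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for a StringReader; rethrown unchecked to keep the demo short.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // First line: an unescaped ':' ends the key.
        // Second line: "\:" keeps the colon inside the key, so '=' is the separator.
        Properties props = load("greeting:Hello\nwith\\:colon=value\n");

        System.out.println(props.getProperty("greeting"));    // Hello
        System.out.println(props.getProperty("with:colon"));  // value
    }
}
```

So a key that contains a literal colon has to be written with a backslash before it if the file is meant to be read by the standard Java loader.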


> All very much sad faces. Yay l20n to get rid of the gaia
> variant.
>
> The INI usage in gecko is pretty ad-hoc, only supports one
> category etc.
>
> The compare-locales parsers are at
> http://hg.mozilla.org/l10n/compare-locales/file/default/compare_locales/parser.py



Thank you. The regexp for the INI parser may be useful.

Once the application is mature enough to be used to translate
something other than DTD files, I'll let the curtains currently
hiding it move aside a bit. :-)

Regards,

--
Proyecto NAVE
Mozilla Localization Project, es-ES Team
http://www.proyectonave.es/
Diaspora: rick...@diasp.eu