
Japanese (speaking) developer needed for a bit of regex magic


Sebastian

Apr 20, 2010, 12:47:02 PM
to python-amazon-p...@googlegroups.com
Hi all,

I'm working on Python bindings for the Amazon Product Advertising API
(http://pypi.python.org/pypi/python-amazon-product-api/) which
supports the different localised versions - among them a Japanese one
(for http://www.amazon.co.jp).

All other locales return error messages in English. Only the Japanese
locale uses Japanese, which my regular expressions cannot handle at the
moment.

Is there anyone fluent enough in Japanese to give me a hand? The bit
of code that needs tweaking can be found here:
http://bitbucket.org/basti/python-amazon-product-api/src/tip/amazonproduct.py#cl-152

A simple diff would help me greatly.

Thanks for your effort!
Seb.

P.S. If you have questions, I've set up a mailing list at
python-amazon-produ...@googlegroups.com.

Ben Finney

Apr 21, 2010, 12:52:12 AM
Sebastian <ba...@redtoad.de> writes:

> All other locales return error messages in English. Only the Japanese
> locale uses Japanese, which my regular expressions cannot handle at the
> moment.

What exactly are you expecting to happen, and what exactly happens
instead?

General advice with character sets in Python applies: always explicitly
declare the encoding of input, then decode to Unicode internally as early
as possible, and process all text that way. Only fix into an encoding
when it's time to output.
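
As a rough illustration of that advice, here is a sketch for Python 3, where str is already Unicode. The sample bytes and the variable names are made up for this example:

```python
# Hypothetical input: UTF-8 bytes as they might arrive from a network response.
raw = "は、BrowseNodeIdの値として無効です。".encode("utf-8")

# Decode once, at the boundary, naming the encoding explicitly.
text = raw.decode("utf-8")

# All processing happens on the decoded text.
has_parameter = "BrowseNodeId" in text

# Encode again only when the result is written out.
output = text.encode("utf-8")
```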

--
\ “I find the whole business of religion profoundly interesting. |
`\ But it does mystify me that otherwise intelligent people take |
_o__) it seriously.” —Douglas Adams |
Ben Finney

Chris Rebert

Apr 21, 2010, 3:07:36 AM
to Ben Finney, pytho...@python.org
On Tue, Apr 20, 2010 at 9:52 PM, Ben Finney <ben+p...@benfinney.id.au> wrote:
> Sebastian <ba...@redtoad.de> writes:
>> All other locales return error messages in English. Only the Japanese
>> locale uses Japanese, which my regular expressions cannot handle at the
>> moment.
>
> What exactly are you expecting to happen, and what exactly happens
> instead?
>
> General advice with character sets in Python applies: always explicitly
> declare the encoding of input, then decode to Unicode internally as early
> as possible, and process all text that way. Only fix into an encoding
> when it's time to output.

I think he has more of a *literal* language problem: He doesn't know
Japanese and thus can't read the Japanese error message in order to
develop a regex for it. I assume there's some reason he can't just do
a blind equality test on the error message string(s).

Cheers,
Chris
--
http://blog.rebertia.com

Sebastian

Apr 21, 2010, 5:31:11 AM
> General advice with character sets in Python applies: always explicitly
> declare the encoding of input, then decode to Unicode internally as early
> as possible, and process all text that way. Only fix into an encoding
> when it's time to output.

Maybe I was too vague when describing my problem. As Chris correctly
guessed, I have a literal language problem.

> > All other locales return error messages in English. Only the Japanese
> > locale uses Japanese, which my regular expressions cannot handle at the
> > moment.
>
> What exactly are you expecting to happen, and what exactly happens
> instead?

My regular expressions turn the Amazon error messages into Python
exceptions.

This works fine as long as they are in English: "??? is not a valid
value for BrowseNodeId. Please change this value and retry your
request.", for instance, will raise an InvalidParameterValue
exception. However, the Japanese version returns the error message "???
は、BrowseNodeIdの値として無効です。値を変更してから、再度リクエストを実行してください。" which will not be
successfully handled.
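
To make the failure concrete, here is a minimal sketch of the approach. The pattern, the function name and the exception class are assumptions for illustration, not the actual code in amazonproduct.py:

```python
import re


class InvalidParameterValue(Exception):
    """Hypothetical exception class; the real module's class may differ."""


# Assumed pattern for the English boilerplate quoted above.
ENGLISH_PATTERN = re.compile(
    r"(?P<value>.+?) is not a valid value for (?P<parameter>\w+)\."
)


def to_exception(message):
    """Return an exception for a recognised message, or None otherwise."""
    match = ENGLISH_PATTERN.match(message)
    if match is None:
        return None
    return InvalidParameterValue(match.group("parameter"), match.group("value"))


english = ("??? is not a valid value for BrowseNodeId. "
           "Please change this value and retry your request.")
japanese = "??? は、BrowseNodeIdの値として無効です。値を変更してから、再度リクエストを実行してください。"

english_result = to_exception(english)    # an InvalidParameterValue instance
japanese_result = to_exception(japanese)  # None: the pattern never matches
```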

This renders my module pretty much useless for Japanese users.

I was therefore wondering if someone with more knowledge of Japanese
than me could have a look at my expressions. Maybe the Japanese messages
are completely different...

I have a collection of sample messages here (all files *-jp-*.xml):
http://bitbucket.org/basti/python-amazon-product-api/src/tip/tests/2009-11-01/

Any help is appreciated!

Cheers,
Sebastian

Ben Finney

Apr 21, 2010, 7:28:04 AM
Sebastian <ba...@redtoad.de> writes:

> My regular expressions turn the Amazon error messages into Python
> exceptions.
>
> This works fine as long as they are in English: "??? is not a valid
> value for BrowseNodeId. Please change this value and retry your
> request.", for instance, will raise an InvalidParameterValue
> exception. However, the Japanese version returns the error message
> "??? は、BrowseNodeIdの値として無効です。値を変更してから、再度リクエス
> トを実行してください。" which will not be successfully handled.
>
> This renders my module pretty much useless for Japanese users.

Your problem, then, appears to be that you're attacking the issue at the
wrong layer. Parsing messages in natural language and hoping to
reconstruct a structure is going to be an exercise in frustration.

Doesn't the API have defined response codes and parameters that you can
use, instead of parsing error strings in various natural languages?

--
\ “Giving every man a vote has no more made men wise and free |
`\ than Christianity has made them good.” —Henry L. Mencken |
_o__) |
Ben Finney

Sebastian

Apr 21, 2010, 7:46:48 AM
> > My regular expressions turn the Amazon error messages into Python
> > exceptions.
>
> > This works fine as long as they are in English: "??? is not a valid
> > value for BrowseNodeId. Please change this value and retry your
> > request.", for instance, will raise an InvalidParameterValue
> > exception. However, the Japanese version returns the error message
> > "??? は、BrowseNodeIdの値として無効です。値を変更してから、再度リクエス
> > トを実行してください。" which will not be successfully handled.
>
> > This renders my module pretty much useless for Japanese users.
>
> Your problem, then, appears to be that you're attacking the issue at the
> wrong layer. Parsing messages in natural language and hoping to
> reconstruct a structure is going to be an exercise in frustration.
>
> Doesn't the API have defined response codes and parameters that you can
> use, instead of parsing error strings in various natural languages?

No, unfortunately not. If it did, I would have used it.

The Amazon API returns an XML response which contains error messages
if a request fails. These messages consist of an error code and an
error description in natural language. Luckily, the description seems
to stick to the same format and is (in all but one case) in plain
English. Much to my dismay I discovered that the Japanese locale
translates the error message!

For example, this is the bit of XML returned for the German locale:

<Errors>
  <Error>
    <Code>AWS.InvalidParameterValue</Code>
    <Message>??? is not a valid value for BrowseNodeId. Please change this value and retry your request.</Message>
  </Error>
</Errors>

The corresponding part from the Japanese locale looks like this:

<Errors>
  <Error>
    <Code>AWS.InvalidParameterValue</Code>
    <Message>??? &#12399;&#12289;BrowseNodeId&#12398;&#20516;&#12392;&#12375;&#12390;&#28961;&#21177;&#12391;&#12377;&#12290;&#20516;&#12434;&#22793;&#26356;&#12375;&#12390;&#12363;&#12425;&#12289;&#20877;&#24230;&#12522;&#12463;&#12456;&#12473;&#12488;&#12434;&#23455;&#34892;&#12375;&#12390;&#12367;&#12384;&#12373;&#12356;&#12290;</Message>
  </Error>
</Errors>

Of course, one could argue that the type of error (in this case
"AWS.InvalidParameterValue") would be enough. However, in order to
return a meaningful error message, I would like to parse the
description itself - and for this some knowledge of Japanese would be
helpful.
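
For illustration, pulling the code and the description out of such a response could look roughly like this. It is a sketch using the standard library's ElementTree; the class names and the lookup table are hypothetical, and real responses may carry an XML namespace that this ignores:

```python
import xml.etree.ElementTree as ET

RESPONSE = """<Errors>
  <Error>
    <Code>AWS.InvalidParameterValue</Code>
    <Message>??? is not a valid value for BrowseNodeId. Please change this value and retry your request.</Message>
  </Error>
</Errors>"""


class AmazonError(Exception):
    """Hypothetical base class for errors reported by the API."""


class InvalidParameterValue(AmazonError):
    """Hypothetical class for AWS.InvalidParameterValue."""


# Assumed mapping from error codes to exception classes.
EXCEPTIONS_BY_CODE = {"AWS.InvalidParameterValue": InvalidParameterValue}


def errors_from_xml(xml_text):
    """Build one exception per <Error> element, keeping the full message."""
    root = ET.fromstring(xml_text)
    errors = []
    for error in root.iter("Error"):
        code = error.findtext("Code")
        message = error.findtext("Message")
        # Unknown codes fall back to the generic class.
        exc_class = EXCEPTIONS_BY_CODE.get(code, AmazonError)
        errors.append(exc_class(code, message))
    return errors


errors = errors_from_xml(RESPONSE)
```

The exception class is chosen from the code alone, so this part works the same way for every locale; only the message text differs.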

Chris Rebert

Apr 21, 2010, 8:09:29 AM
to Sebastian, pytho...@python.org

Just throwing this out there, but perhaps you could grep for the
relevant terms in the error message and intuit it from there?
For example:

    # terms = whatever the actual param names are
    terms = "BrowseNodeId FooNodeId FooQueryType".split()
    for term in terms:
        if term in err_msg:
            # AmazonError, err_msg and err_code are assumed to exist already
            raise AmazonError(err_code + " for " + repr(term))

Terry Reedy

Apr 21, 2010, 12:48:10 PM
to pytho...@python.org
On 4/21/2010 7:46 AM, Sebastian wrote:

> The Amazon API returns an XML response which contains error messages
> if a request fails. These messages consist of an error code and an
> error description in natural language. Luckily, the description seems
> to stick to the same format and is (in all but one case) in plain
> English. Much to my dismay I discovered that the Japanese locale
> translates the error message!

Could you, when you get an error message, resubmit the request in the
standard locale so you get the messages in English? Or is the 'locale'
set by the url -- amazon.com versus amazon.co.jp?

After you parse, are you trying to formulate a substitute message in
Japanese?

Terry Jan Reedy

Terry Reedy

Apr 21, 2010, 12:54:05 PM
to pytho...@python.org
On 4/21/2010 5:31 AM, Sebastian wrote:

> This works fine as long as they are in English:
> "??? is not a valid value for BrowseNodeId.
> Please change this value and retry your request.",
> for instance, will raise an InvalidParameterValue
> exception. However, the Japanese version returns the error message "???
> は、BrowseNodeIdの値として無効です。値を変更してから、再度リクエストを実行してください。"

My daughter, in 2nd year college Japanese, says that the above is
basically a translation of the English boilerplate. The only variable
info is 'BrowseNodeId', which you can read just fine already.
So we do not understand what your problem is and what you want to
accomplish.
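
If the goal is only to pull out the variable parts, a pattern along these lines might do. This is a sketch that assumes the Japanese boilerplate is always worded exactly as in the sample above; the pattern name is made up for illustration:

```python
import re

# Assumed fixed boilerplate: "<value> は、<parameter>の値として無効です。"
JAPANESE_PATTERN = re.compile(
    r"(?P<value>.+?)\s*は、(?P<parameter>[A-Za-z]+)の値として無効です。"
)

message = "??? は、BrowseNodeIdの値として無効です。値を変更してから、再度リクエストを実行してください。"

match = JAPANESE_PATTERN.match(message)
value = match.group("value")          # the rejected value
parameter = match.group("parameter")  # the parameter name
```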

> I have a collection of sample messages here (all files *-jp-*.xml):
> http://bitbucket.org/basti/python-amazon-product-api/src/tip/tests/2009-11-01/

Is this a commercial product? Are you willing to pay for serious help,
if needed?

Terry Jan Reedy


Sebastian

Apr 22, 2010, 2:27:46 AM
> > This works fine as long as they are in English:
> > "??? is not a valid value for BrowseNodeId.
> > Please change this value and retry your request.",
> > for instance, will raise an InvalidParameterValue
> > exception. However, the Japanese version returns the error message "???
> > は、BrowseNodeIdの値として無効です。値を変更してから、再度リクエストを実行してください。"
>
> My daughter, in 2nd year college Japanese, says that the above is
> basically a translation of the English boilerplate. The only variable
> info is 'BrowseNodeId', which you can read just fine already.
> So we do not understand what your problem is and what you want to
> accomplish.
>
> > I have a collection of sample messages here (all files *-jp-*.xml):
> > http://bitbucket.org/basti/python-amazon-product-api/src/tip/tests/20...
>
> Is this a commercial product? Are you willing to pay for serious help,
> if needed?
>
> Terry Jan Reedy

I just wanted to know if the Japanese version said the same. I'll
probably simply return the error message in full. Any Japanese
(speaking) developer will then know what caused the exception.

Thanks for your help.
