SOAP XML error for Chinese Names in ManagedCustomerService


maxSonic Sun

Jan 20, 2016, 7:22:09 PM
to AdWords API Forum
Hi,

I am getting the error shown in the first picture below. It seems to be caused by the decoding of the SOAP XML. The second picture is my code. However, if I remove "name" from the selector, everything works fine.
My customers have lots of Chinese names in their MCC accounts. I think the error is caused by a Unicode problem between Python and the Chinese characters. How can I solve this problem and avoid this kind of error?



Mark Saniscalchi

Jan 21, 2016, 8:39:42 PM
to AdWords API Forum
Hello,

We aren't able to replicate this issue with account names containing unicode characters. In my case, I ran the get_account_hierarchy.py sample and got the expected result:

CustomerId, Name
--Redacted, マーク Test Account

I think this might have to do with something else. In the stacktrace, I noticed that you aren't using the sax parser embedded in the suds-jurko library. On even closer observation, I noticed that you weren't even using the suds-jurko library. This is almost definitely the cause of the problem, as I don't think our library is compatible with the original suds library, which is now woefully unmaintained. I suggest uninstalling both googleads and the suds library you're using currently. Then reinstall googleads with pip using the following command:

pip install googleads

Our setup.py file should then install all of the necessary dependencies automatically, which should resolve this problem.

Regards,
Mark Saniscalchi

maxSonic Sun

Feb 1, 2016, 7:51:27 AM
to AdWords API Forum
Hi,

There is something weird here: I am able to parse some other Unicode characters, and also other accounts of my customer. I have listed all the pip libraries on my machine, and I think I have installed them all correctly. By the way, I didn't change any of the imports in the .py file, so how could it have switched to another library if I did nothing to the library?

This issue also happens in the Java lib for the same account of our customer.

After some debugging, it seems there is no way to parse an id like: 15-abc-1-mmep-中国-IOS. Please note that the - is different from -.
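
Look-alike dash characters can be told apart by listing the code point and Unicode name of each character in the name. Below is a minimal sketch; the helper name `describe_chars` and the sample string (which deliberately contains a fullwidth hyphen, U+FF0D) are hypothetical illustrations, not the actual account name.

```python
import unicodedata


def describe_chars(text):
    """Return (char, code point, Unicode name) for every character
    that is not a plain ASCII letter or digit."""
    described = []
    for ch in text:
        if ord(ch) < 128 and ch.isalnum():
            continue  # skip ordinary ASCII letters and digits
        described.append(
            (ch, 'U+%04X' % ord(ch), unicodedata.name(ch, 'UNKNOWN')))
    return described


# Hypothetical sample: an ASCII hyphen, a fullwidth hyphen, then two
# Chinese characters written as escapes so the file stays ASCII-only.
sample = u'15-abc\uff0d\u4e2d\u56fd'
for ch, code_point, name in describe_chars(sample):
    print('%s %s' % (code_point, name))
```

Two dashes that look the same on screen will show up with different code points in this listing if they really are different characters.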

Best Regards,
Sonic Sun

Mark Saniscalchi

Feb 1, 2016, 3:32:44 PM
to AdWords API Forum
Hello Sonic,

I'd just like to confirm, could you tell me what the version number is in the following file:

/project/apiservice/venvdocker/local/lib/python2.7/site-packages/suds/version.py

Thanks,
Mark

maxSonic Sun

Feb 2, 2016, 2:12:24 AM
to AdWords API Forum
Hi Mark,

Here is the version:
__version__ = "0.6"
__build__ = ""

Best Regards
Sonic Sun

Mark Saniscalchi

Feb 5, 2016, 5:08:31 PM
to AdWords API Forum
Hello Sonic,

That looks correct, so you are using the right version of suds-jurko at least.

On my workstation, I created a campaign with the name that you're having issues parsing and ran the get_campaigns.py example. It ran (sort of) without issues:

INFO:oauth2client.client:Refreshing access_token
DEBUG:suds.transport.http:opening (https://adwords.google.com/api/adwords/cm/v201509/CampaignService?wsdl)
DEBUG:suds.transport.http:sending:
URL: https://adwords.google.com/api/adwords/cm/v201509/CampaignService
HEADERS: {'Soapaction': '""', 'SOAPAction': '""', 'Content-Type': 'text/xml; charset=utf-8', 'Content-type': 'text/xml; charset=utf-8', 'Authorization': u'REDACTED'}
MESSAGE:
<?xml version="1.0" encoding="UTF-8"?><SOAP-ENV:Envelope xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="https://adwords.google.com/api/adwords/cm/v201509" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="https://adwords.google.com/api/adwords/cm/v201509" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Header><tns:RequestHeader><tns:clientCustomerId>REDACTED</tns:clientCustomerId><tns:developerToken>REDACTED</tns:developerToken><tns:userAgent>REDACTED</tns:userAgent><tns:validateOnly>false</tns:validateOnly><tns:partialFailure>false</tns:partialFailure></tns:RequestHeader></SOAP-ENV:Header><ns0:Body><ns1:get><ns1:serviceSelector><ns1:fields>Id</ns1:fields><ns1:fields>Name</ns1:fields><ns1:fields>Status</ns1:fields><ns1:paging><ns1:startIndex>0</ns1:startIndex><ns1:numberResults>100</ns1:numberResults></ns1:paging></ns1:serviceSelector></ns1:get></ns0:Body></SOAP-ENV:Envelope>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/logging/__init__.py", line 859, in emit
    msg = self.format(record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 732, in format
    return fmt.format(record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 471, in format
    record.message = record.getMessage()
  File "/usr/local/lib/python2.7/logging/__init__.py", line 335, in getMessage
    msg = msg % self.args
  File "/usr/local/lib/python2.7/dist-packages/suds/__init__.py", line 168, in <lambda>
    __str__ = lambda x: unicode(x).encode('utf-8')
  File "/usr/local/lib/python2.7/dist-packages/suds/transport/__init__.py", line 96, in __unicode__
    %s""" % (self.code, self.headers, self.message)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 577: ordinal not in range(128)
Logged from file http.py, line 89
Campaign with id '374649914', name '15-abc-1-mmep-中国-IOS', and status 'ENABLED' was found.

The error you're seeing there is a known issue that the suds logger has with unicode characters, but it doesn't prevent the sample from completing. As you can see, the Campaign's name can be retrieved and printed without issues. This definitely seems like an issue specific to the environment the code is run on. I went a step further and ran this on a separate VM instance, and was able to reproduce it in that case.
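
The byte 0xe4 in that traceback is consistent with the campaign name: it is the first byte of the UTF-8 encoding of 中. Here is a minimal sketch of the failure mode, assuming (as the traceback suggests) that UTF-8 bytes end up being decoded with the ascii codec somewhere in the logging path:

```python
# UTF-8 bytes of the Chinese part of the campaign name (the two
# characters are written as escapes so the file stays ASCII-only).
name_bytes = u'\u4e2d\u56fd'.encode('utf-8')

# First byte of the encoded name as an integer (Python 2 and 3).
first_byte = bytearray(name_bytes)[0]
print('first byte: 0x%02x' % first_byte)

ascii_failed = False
try:
    # Decoding UTF-8 bytes as ASCII fails on the first byte >= 0x80.
    name_bytes.decode('ascii')
except UnicodeDecodeError as error:
    ascii_failed = True
    print('ascii decode failed: %s' % error)

# Decoding with the matching codec round-trips cleanly.
decoded = name_bytes.decode('utf-8')
print(decoded == u'\u4e2d\u56fd')
```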

The workstation has its default character encoding set to utf-8 and the VM has it set to ANSI_X3.4-1968 (that is, ASCII), which is probably related. I think suds might have some flaky behavior here depending on the environment used. In the meantime, you can avoid this error by encoding the output of string fields, for example:

    # Display results.
    if 'entries' in page:
      for campaign in page['entries']:
        print ('Campaign with id \'%s\', name \'%s\', and status \'%s\' was '
               'found.' % (campaign['id'], campaign['name'].encode('utf-8'),
                           campaign['status'].encode('utf-8')))

I'll continue investigating this, as a better fix may need to come upstream from the suds library.
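
To compare two machines, it may help to print the encodings that the Python process itself reports. A minimal sketch using only the standard library; the helper name `report_encodings` is made up for illustration:

```python
import locale
import sys


def report_encodings():
    """Collect the encodings the current Python process will use."""
    return {
        # Encoding derived from the locale settings (LANG, LC_ALL, ...).
        'preferred': locale.getpreferredencoding(),
        # Encoding for implicit byte/text conversion
        # ('ascii' by default on Python 2).
        'default': sys.getdefaultencoding(),
        # Encoding used when printing; may be None when stdout is
        # piped or replaced.
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Encoding used for file names.
        'filesystem': sys.getfilesystemencoding(),
    }


for key, value in sorted(report_encodings().items()):
    print('%-10s %s' % (key, value))
```

Running this on both the workstation and the VM, in the same way the real script is launched, should show whether they differ.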

Regards,
Mark

maxSonic Sun

Feb 10, 2016, 7:20:56 AM
to AdWords API Forum
Hi Mark,

Thanks for your reply :)

I will try your solution to work around the issue. I will report back if I encounter any other problem.

Best Regards,
Sonic Sun

maxSonic Sun

Feb 10, 2016, 2:09:18 PM
to AdWords API Forum
Hi Mark,

I have tried it, but the call fails inside the suds client before any result comes back :(
In the code below, graph is always {}, so there is never any content in it to encode. The error message is "<unknown>:1:29168: not well-formed (invalid token)", which I think comes from the XML parsing part, not the logger.
        graph = dict()
        try:
            graph = customer_service.get(selector)
        except Exception as e:
            print str(e)

I tried printing the XML from the MessagePlugin's receive method, and it printed without error. Encoding and decoding it as utf-8 also worked fine inside the receive method, but outside that method the error occurred, i.e. "<unknown>:1:29168: not well-formed (invalid token)".
After debugging for a while, it looks like it may be a problem in suds. However, since I have the same problem with the Java client, I think the SOAP XML from Google may contain something illegal.
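
One way to test that suspicion is to scan the reply text for characters that XML 1.0 does not allow, since a single such character is enough for the parser to report "not well-formed (invalid token)". The sketch below is written for Python 3 and uses only the standard library. The helper names and the sample payload (with an embedded vertical tab, U+000B) are hypothetical; it does not show that the real response contains such a character.

```python
import xml.etree.ElementTree as ET


def is_valid_xml_char(code_point):
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (code_point in (0x9, 0xA, 0xD)
            or 0x20 <= code_point <= 0xD7FF
            or 0xE000 <= code_point <= 0xFFFD
            or 0x10000 <= code_point <= 0x10FFFF)


def find_invalid_xml_chars(text):
    """Return (offset, code point) for every character XML 1.0 forbids."""
    return [(offset, 'U+%04X' % ord(ch))
            for offset, ch in enumerate(text)
            if not is_valid_xml_char(ord(ch))]


# Hypothetical reply fragment containing a vertical tab (U+000B).
payload = u'<name>15-abc\u000b\u4e2d\u56fd</name>'
print(find_invalid_xml_chars(payload))

parse_failed = False
try:
    ET.fromstring(payload.encode('utf-8'))
except ET.ParseError as error:
    parse_failed = True
    print('parse failed: %s' % error)
```

If the scan of the real reply comes back empty, the payload's characters are legal and the cause is more likely on the decoding side.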

I noticed that you are using the CampaignService. Could you try the ManagedCustomerService instead? Although I cannot see the code on Google's side, I suspect there is a problem with how the XML for the customer service is generated. The code example is https://github.com/googleads/googleads-python-lib/blob/master/examples/adwords/v201509/account_management/get_account_hierarchy.py

Best Regards,
Sonic Sun

maxSonic Sun

Feb 19, 2016, 2:42:16 PM
to AdWords API Forum
Hi Mark, 

Any update?

Best Regards,
Sonic

Mark Saniscalchi

Feb 19, 2016, 4:44:58 PM
to AdWords API Forum
Hello Sonic,

Sorry for the delay. If you look at my initial response, I did run and successfully retrieve an account name containing utf-8 characters using the get_account_hierarchy.py example in v201509 using Python 2. This was on a machine where the default encoding is utf-8. I provided the code above as a potential workaround, but if that still fails to work, I would suggest changing the default encoding used by your VM to utf-8.

You may recall that I was able to replicate your issue on a VM using the ANSI_X3.4-1968 (ASCII) encoding. I was also able to resolve the problem on that VM by changing the default encoding to utf-8. The suds-jurko library doesn't seem to state this as a requirement, but it doesn't seem to handle utf-8 characters well otherwise. Going forward, we'll be stating in our documentation that setting the default encoding to utf-8 is a requirement.

Regards,
Mark

maxSonic Sun

Feb 20, 2016, 7:14:47 AM
to AdWords API Forum
Hi Mark,

On my server machine, the locale is
ubuntu@ip-10-0-24-250:~$ locale -a
C
C.UTF-8
en_US.utf8
POSIX

But I still have the problem. The system is Ubuntu 14.04 Server on AWS, so this should not be an ANSI_X3.4-1968 (ASCII) encoding problem.
And by the way, my problem is not caused by the logger issue you pointed out in the previous post; it is caused by the decoding of the XML. The scenario you described does not happen anywhere on my machine.

As you can see, it is a SAX parser problem.

Mark Saniscalchi

Feb 23, 2016, 8:47:30 PM
to AdWords API Forum
Hello Sonic,

I can see that you have en_US.utf8 as an available locale, but is it actually being used? What is the output of echo $LANG? I've reproduced and resolved your issue by setting the locale to a different value, so I suspect you might not actually have that set correctly on the VM.
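
Note that LANG is only one of the variables involved: under POSIX rules LC_ALL overrides LC_CTYPE, which in turn overrides LANG, and the process running the Python code may not inherit the same environment as an interactive shell. A minimal sketch that reports which variable actually decides the character encoding; the helper name `effective_ctype_locale` is made up for illustration:

```python
import os


def effective_ctype_locale(environ=None):
    """Return (variable, value) for the setting that governs encoding.

    POSIX precedence: LC_ALL, then LC_CTYPE, then LANG. Unset or empty
    variables are skipped; with none set the locale falls back to "C".
    """
    if environ is None:
        environ = os.environ
    for variable in ('LC_ALL', 'LC_CTYPE', 'LANG'):
        value = environ.get(variable)
        if value:
            return variable, value
    return None, 'C'


variable, value = effective_ctype_locale()
if variable is None:
    print('no locale variable set, falling back to %s' % value)
else:
    print('%s=%s' % (variable, value))
```

Calling this from inside the script that talks to the API, rather than from a login shell, shows the environment the script really sees.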

I pointed out the logger issue earlier because it appeared in the output and was not related to your case, so I wanted to make that clear. Sorry if that instead caused any confusion.

Regards,
Mark

maxSonic Sun

Feb 24, 2016, 11:27:19 AM
to AdWords API Forum
Hi Mark,

Same encoding.

ubuntu@ip-10-0-26-250:~$ echo $LANG
en_US.UTF-8

Best Regards
Sonic

maxSonic Sun

Mar 7, 2016, 3:03:06 PM
to AdWords API Forum
Hi Mark,

Any update?

Best Regards,
Sonic Sun