Character sets and CSV report file downloads

Reed

Jan 12, 2011, 4:38:28 PM
to AdWords API Forum
I'm using the reporting service to define and then download keyword
performance files in CSV format. I've started loading data for a
client in China, and am running into a character-set problem. My
calls to the 201008 API to pull XML files for campaign, etc., data
using my existing application work fine - the XML files contain the
Chinese characters. But my application to define and then download
the reporting service CSV files is bringing back the Chinese character
strings as mangled data. How do I control the character set for the
report service's data files?

Thanks!
-reed

Jai

Jan 12, 2011, 5:28:27 PM
to adwor...@googlegroups.com
Hi Reed,
CSV files do not handle multi-byte characters well. You could try putting the multi-byte fields within double quotes, or exporting the data as a UTF-8 txt file.

--
Jai

Reed

Jan 12, 2011, 8:42:14 PM
to AdWords API Forum
I disagree. I pull csv-formatted data files from Baidu's API (which
looks a lot like the AdWords API interface from a few generations ago)
with no problem.
-reed

AdWords API Advisor

Jan 12, 2011, 11:14:28 PM
to AdWords API Forum
Hi Reed,

Do you have a report definition id I can refer to?

Cheers,
Anash P. Oommen,
AdWords API Advisor.

Reed

Jan 13, 2011, 8:30:13 AM
to AdWords API Forum
The report ID is 60440936 - I have sent a sample file to the private
email address.

thanks
Reed

Reed

Jan 13, 2011, 12:06:27 PM
to AdWords API Forum
FYI, I have already tried specifying a character set of either "BIG5"
or "GB2312" in the HTTP GET to download the file, but that doesn't
seem to help. When I pull the csv files with Chinese characters in
them from the Baidu API that's what I do to get the correct encoding.
I think I need some way to tell the AdWords API how to encode the data
on its end...

thanks
Reed

Reed

Jan 13, 2011, 1:53:53 PM
to AdWords API Forum
More details. Below are the http headers from AdWords and from Baidu
for pulling CSV files via http GET that have Chinese characters in
them. Note that AdWords forces the character set to UTF-8, where
Baidu doesn't specify it. I think this is pertinent, because I can
tell Java what character set to interpret the data as for the Baidu
download, and that seems to work. But it is also important to
remember that the only character set the Baidu API has to deal with is
for Chinese characters, whereas AdWords has to deal with all character
sets. As we start pulling other non-Latin character set performance
data in CSV format files, this is going to become an important need.

Adwords:
HTTP/1.1 200 OK
Content-Disposition: attachment; filename="Keyword Perf Rpt Without Conversions.csv"
Connection: close
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Thu, 13 Jan 2011 18:38:06 GMT
Server: GSE
X-Frame-Options: SAMEORIGIN
Pragma: no-cache
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
X-XSS-Protection: 1; mode=block
Content-Type: text/csv; charset=UTF-8
X-Content-Type-Options: nosniff

Baidu:
HTTP/1.1 200 OK
Content-Disposition: attachment; filename=20110113-190532.csv
Connection: close
Accept-Ranges: bytes
Date: Thu, 13 Jan 2011 11:15:21 GMT
Server: Apache
Content-Length: 27308
ETag: "1ed4076-6aac-499b84bd31700"
Last-Modified: Thu, 13 Jan 2011 11:05:32 GMT
Content-Type: text/csv
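
The practical difference between the two responses is the charset parameter on Content-Type: AdWords declares UTF-8, Baidu declares nothing. A rough sketch of how a Java client might honor a declared charset and only fall back to a caller-chosen one (GB2312, say, for the Baidu files) when none is declared. The class and method names are hypothetical, not part of either API, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ContentTypeCharset {

    /**
     * Returns the charset declared in a Content-Type header value,
     * or the supplied fallback when none is declared or it is unknown.
     */
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // parts[0] is the media type (e.g. "text/csv"); the rest are parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The value may legally be quoted: charset="UTF-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1;
        // AdWords-style header: the declared charset wins.
        System.out.println(fromContentType("text/csv; charset=UTF-8", fallback));
        // Baidu-style header: nothing declared, so the fallback is used.
        System.out.println(fromContentType("text/csv", fallback));
    }
}
```

With this approach, asking for BIG5 or GB2312 on the client side has no effect on the AdWords file, because the bytes on the wire are already UTF-8; the client has to decode them as such.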

Hope this helps narrow things down,
-reed

Reed

Jan 14, 2011, 8:01:29 AM
to AdWords API Forum
This problem just gets stranger and stranger. I think I found a
"solution" - if I specify CSVFOREXCEL as the report format instead of
CSV, then I get a double-byte file that Excel opens and shows the
correct Chinese characters. BUT - the file is tab-delimited, not
comma delimited, despite the format being CSVFOREXCEL.
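
If the CSVFOREXCEL file really is double-byte and tab-delimited as described above, a Java reader could look roughly like the sketch below. It assumes the file is UTF-16 with a byte-order mark, which this thread does not confirm, and it uses a naive split that ignores quoted fields. The class name is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class TabDelimitedReport {

    /**
     * Decodes the raw bytes as UTF-16 (the decoder uses the byte-order mark
     * to pick the endianness) and splits each non-empty line on tabs.
     */
    static List<String[]> parse(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_16);
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            if (!line.isEmpty()) {
                // Limit -1 keeps trailing empty columns.
                rows.add(line.split("\t", -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Build a small little-endian UTF-16 sample with a BOM (FF FE).
        byte[] body = "Keyword\tClicks\r\n\u4e2d\u6587\t12\r\n"
                .getBytes(StandardCharsets.UTF_16LE);
        byte[] raw = new byte[body.length + 2];
        raw[0] = (byte) 0xFF;
        raw[1] = (byte) 0xFE;
        System.arraycopy(body, 0, raw, 2, body.length);

        for (String[] row : parse(raw)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```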

??
-reed

AdWords API Advisor

Jan 18, 2011, 2:06:32 AM
to AdWords API Forum
Hi Reed,

I've opened issues with the AdWords team for both the cases you
reported. I'll update you once I hear from them.

Cheers,
Anash P. Oommen,
AdWords API Advisor.

AdWords API Advisor

Jan 19, 2011, 1:18:49 AM
to AdWords API Forum
Hi Reed,

A quick followup - I tried running reports against a test account with
Chinese ads yesterday and noticed that CSV report format is actually
encoded in UTF-8. So if your program were UTF-8 aware, then the data
wouldn't be mangled when you read it.
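
Since Java came up earlier in the thread, here is a minimal sketch of what "UTF-8 aware" could mean there: name the charset explicitly when wrapping the download stream instead of relying on the platform default. The class and method names are illustrative only and not part of the AdWords client library; the code assumes Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class Utf8ReportReader {

    /** Reads all lines from the stream, decoding the bytes explicitly as UTF-8. */
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the downloaded report body: UTF-8 bytes with Chinese text.
        byte[] sample = "Keyword,Clicks\n\u4e2d\u6587,12\n"
                .getBytes(StandardCharsets.UTF_8);

        // Decoded as UTF-8, the keyword survives intact.
        System.out.println(readLines(new ByteArrayInputStream(sample)).get(1));

        // Decoded with a single-byte charset, the same bytes come out mangled.
        System.out.println(new String(sample, StandardCharsets.ISO_8859_1));
    }
}
```

In a real client the InputStream would come from the HTTP response for the report download rather than from a byte array.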

Excel does not recognize UTF-8 when it opens a csv file directly; it
expects csv files to be ASCII encoded (in practice, the system's default
code page), and hence displays mangled characters when you open the
downloaded csv in Excel. That's a different issue, and is not due to
the report contents being encoded incorrectly.
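
One commonly used workaround on the Excel side, offered as an assumption rather than anything the AdWords API provides, is to prepend a UTF-8 byte-order mark (EF BB BF) to the downloaded file before opening it; many Excel versions on Windows use that mark to detect UTF-8. A small sketch, with a hypothetical class name:

```java
import java.nio.charset.StandardCharsets;

final class ExcelFriendlyCsv {

    // UTF-8 encoding of U+FEFF, the byte-order mark.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns the CSV bytes with a UTF-8 BOM in front, unless one is already there. */
    static byte[] withUtf8Bom(byte[] csv) {
        boolean hasBom = csv.length >= 3
                && csv[0] == UTF8_BOM[0]
                && csv[1] == UTF8_BOM[1]
                && csv[2] == UTF8_BOM[2];
        if (hasBom) {
            return csv;
        }
        byte[] out = new byte[csv.length + UTF8_BOM.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(csv, 0, out, UTF8_BOM.length, csv.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] csv = "Keyword,Clicks\n\u4e2d\u6587,12\n".getBytes(StandardCharsets.UTF_8);
        byte[] marked = withUtf8Bom(csv);
        System.out.println("Added " + (marked.length - csv.length) + " BOM bytes");
    }
}
```

Programs that read the file afterwards need to skip the three extra bytes, so this is best applied only to the copy meant for Excel.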

Cheers,
Anash P. Oommen,
AdWords API Advisor.
