postBody text encoding

bdolman

Nov 18, 2011, 3:03:18 PM

to HTTP Archive Specification

First, a thank you to all those who have worked hard to define the HAR
spec. I use the Charles web debugging proxy and having HAR support has
been really great.

I want to create a HAR file of a PUT request. The PUT request uploads
a binary file. As expected, in the resulting HAR file I can see my
uploaded file data in "request.postData.text". "text" is defined in
the spec as "Plain text posted data". Therefore, there is some
encoding that has to happen to represent my binary file as text. But
since there is not an "encoding" attribute in the "postData"
structure, how do I know how to decode "text" to get back to the
original binary data?

In the case of Charles, when it creates the HAR file it treats the
original binary data as text in some platform-specific encoding (on OS
X it chooses MacRoman), then converts that to Unicode and encodes non-
ASCII characters as Unicode escape sequences in the form "\u00ce".
This is bad, because if I'm writing a HAR parser I would need to know
that the original encoding was MacRoman so I can do the Unicode ->
MacRoman -> Binary transformation. But again, this isn't Charles's
fault, it's the spec's.
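
To illustrate the round trip described above, here is a minimal Python sketch. It assumes the writer decoded the body as MacRoman (Python's `mac_roman` codec) and that the JSON serializer escaped non-ASCII characters as `\uXXXX`. The helper name `recover_bytes` and the sample bytes are made up for illustration; none of this is taken from Charles itself.

```python
import json

def recover_bytes(har_text: str, platform_encoding: str = "mac_roman") -> bytes:
    """Undo the writer's transcoding: Unicode -> platform encoding -> bytes.

    Only works if the reader guesses the same encoding the writer used,
    which is exactly the information the HAR file does not carry.
    """
    return har_text.encode(platform_encoding)

# Hypothetical binary upload body (not valid text in any meaningful sense).
original = bytes([0x89, 0x50, 0x4E, 0x47, 0xCE, 0x00, 0xFF])

# What the writer is assumed to do: treat the bytes as MacRoman text,
# then serialize to JSON with non-ASCII characters escaped.
text = original.decode("mac_roman")
serialized = json.dumps({"text": text})

# What a reader has to do to get the bytes back.
parsed = json.loads(serialized)["text"]
recovered = recover_bytes(parsed)
```

If the reader guesses a different encoding than the writer used, the final step either fails or silently yields different bytes.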

It seems this problem was solved in 1.2 for response data by adding an
"encoding" attribute. Could this also be added for request data?

Thanks,
Ben

Jan Honza Odvarko

Nov 21, 2011, 3:53:00 AM

to http-archive-...@googlegroups.com

Yep, I think this is really missing.

I've included it in a list of possible suggestions for HAR 1.3 (just posted):
https://groups.google.com/d/topic/http-archive-specification/9wYrqin3Fsc/discussion

Honza

Karl von Randow

Nov 19, 2011, 1:28:38 PM

to HTTP Archive Specification

On Nov 19, 9:03 am, bdolman <ben.dol...@gmail.com> wrote:
> <snip>

> I want to create a HAR file of a PUT request. The PUT request uploads
> a binary file. As expected, in the resulting HAR file I can see my
> uploaded file data in "request.postData.text". "text" is defined in
> the spec as "Plain text posted data". Therefore, there is some
> encoding that has to happen to represent my binary file as text. But
> since there is not an "encoding" attribute in the "postData"
> structure, how do I know how to decode "text" to get back to the
> original binary data?
>
> In the case of Charles, when it creates the HAR file it treats the
> original binary data as text in some platform-specific encoding (on OS
> X it chooses MacRoman), then converts that to Unicode and encodes non-
> ASCII characters as Unicode escape sequences in the form "\u00ce".
> This is bad, because if I'm writing a HAR parser I would need to know
> that the original encoding was MacRoman so I can do the Unicode ->
> MacRoman -> Binary transformation. But again, this isn't Charles
> fault, it's the spec's.
>
> It seems this problem was solved in 1.2 for response data by adding an
> "encoding" attribute. Could this also be added for request data?

I'm the developer of Charles. It's interesting that Charles just
chooses the platform default encoding in this instance. I'm going to
change Charles to use ISO-8859-1 consistently. I like ISO-8859-1 for
this purpose because decoding an arbitrary array of bytes with it is
never lossy, whereas I think UTF-8 can be lossy when the bytes contain
invalid sequences. MacRoman has the same property, so this change
shouldn't have any material impact.
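
A quick Python sketch of the difference, assuming a decoder that substitutes a replacement character for invalid UTF-8 (a strict decoder would raise an error instead). The variable names are just for illustration.

```python
# Every possible byte value, as a stand-in for arbitrary binary data.
raw = bytes(range(256))

# ISO-8859-1 maps each byte to exactly one code point, so the round trip
# through text always gives back the original bytes.
latin1_roundtrip = raw.decode("iso-8859-1").encode("iso-8859-1")

# UTF-8 rejects many byte sequences. With errors="replace" the invalid
# bytes become U+FFFD and the original values cannot be recovered.
utf8_roundtrip = raw.decode("utf-8", errors="replace").encode("utf-8")

latin1_is_lossless = latin1_roundtrip == raw
utf8_is_lossless = utf8_roundtrip == raw
```

Here `latin1_is_lossless` comes out true and `utf8_is_lossless` false. A reader still needs to know that ISO-8859-1 was the encoding used, though.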

I agree an "encoding" attribute for postData seems reasonable and
consistent. We'd then include the body as base64 (if necessary) and no
encoding issues would exist. We might need to use base64 whenever we
don't know the encoding, as otherwise we end up doing some undefined
transcoding.
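
Roughly what that could look like, as a Python sketch. The "encoding" field on postData is the proposal from this thread, not something HAR 1.2 defines, and the function names are hypothetical. As a simplifying assumption, the sketch treats a body as text only if it is valid UTF-8 and falls back to base64 otherwise.

```python
import base64

def encode_post_data(body: bytes, mime_type: str) -> dict:
    """Build a postData-like dict; base64 is used when the body is not valid UTF-8."""
    try:
        return {"mimeType": mime_type, "text": body.decode("utf-8")}
    except UnicodeDecodeError:
        return {
            "mimeType": mime_type,
            "text": base64.b64encode(body).decode("ascii"),
            "encoding": "base64",  # proposed attribute, not in HAR 1.2
        }

def decode_post_data(post_data: dict) -> bytes:
    """Recover the original request body bytes from the dict above."""
    if post_data.get("encoding") == "base64":
        return base64.b64decode(post_data["text"])
    return post_data["text"].encode("utf-8")

# Hypothetical bodies: one binary upload, one plain form post.
binary_body = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])
text_body = b"name=value"

binary_entry = encode_post_data(binary_body, "application/octet-stream")
text_entry = encode_post_data(text_body, "application/x-www-form-urlencoded")
```

With an explicit marker like this, a reader never has to guess which transcoding the writer applied.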

cheers,
Karl
