wxHTTP does not support unicode post buffer

113 views
Skip to first unread message

Eran Ifrah

unread,
Jan 15, 2012, 11:18:45 AM1/15/12
to wx-u...@googlegroups.com
I found 2 bugs in wxHTTP:

1) When a POST buffer is provided and there is no "Content-Lengh" header set, wxHTTP set it automatically for you however, the value set by this line:

  SetHeader( wxT("Content-Length"), wxString::Format( wxT("%lu"), (unsigned long)m_post_buf.Len() ) );

However, when sending the buffer, the content sent is done like this (which might result in different length() depends on the buffer content):

  const wxScopedCharBuffer buf(m_post_buf.To8BitData());
  Write(buf, buf.length());

2) If the buffer contains unicode string (e.g. Hebrew strings) the conversion "m_post_buf.To8BitData()" fails (i.e. buf.length() returns 0)
Which obviously breaks the HTTP protocol.

Attached is a patch that fixes both problems by checking the conversion result of m_post_buf.To8BitData() and if it fails, it tries to use the method:
m_post_buf.mb_str(wxConvUTF8)

It is not perfect, but it fixes most problems I encountered + it enables me to send unicode strings using the post buffer to my WebAPI service using wxHTTP.

An ideal solution will be to change the API SetPostBuffer(const wxString& buf) => SetPostBuffer(const wxMemoryBuffer& buf) and let the user make sure that the buffer is converted properly

I will try to send another patch later that puts some comments on http.h API (it was hard to find a good example of how to send a post buffer using this class)

--
Eran Ifrah
Author of the cross platform, open source C++ IDE: http://www.codelite.org
TimeWarden, a parental control software: http://www.smartkoders.com
http.cpp.patch

Vadim Zeitlin

unread,
Jan 15, 2012, 2:07:04 PM1/15/12
to wx-u...@googlegroups.com
[Please follow up to wx-dev if you're subscribed there]

On Sun, 15 Jan 2012 18:18:45 +0200 Eran Ifrah <eran....@gmail.com> wrote:

EI> 1) When a POST buffer is provided and there is no "Content-Lengh" header
EI> set, wxHTTP set it automatically for you however, the value set by this
EI> line:
EI>
EI> SetHeader( wxT("Content-Length"), wxString::Format( wxT("%lu"), (unsigned
EI> long)m_post_buf.Len() ) );
EI>
EI> However, when sending the buffer, the content sent is done like this (which
EI> might result in different length() depends on the buffer content):
EI>
EI> const wxScopedCharBuffer buf(m_post_buf.To8BitData());
EI> Write(buf, buf.length());
EI>
EI> 2) If the buffer contains unicode string (e.g. Hebrew strings) the
EI> conversion "m_post_buf.To8BitData()" fails (i.e. buf.length() returns 0)
EI> Which obviously breaks the HTTP protocol.

Both are wrong but I'm not sure about the fix:

EI> Attached is a patch that fixes both problems by checking the conversion
EI> result of m_post_buf.To8BitData() and if it fails, it tries to use the
EI> method:
EI> m_post_buf.mb_str(wxConvUTF8)

This means that the data sent will use different encoding depending on the
buffer contents which just doesn't seem right.

Also, I'm a bit hazy on HTTP here and don't have time to reread RFC 2616
now but don't we need to specify Content-Type if we encode the contents as
UTF-8?

EI> An ideal solution will be to change the API SetPostBuffer(const wxString&
EI> buf) => SetPostBuffer(const wxMemoryBuffer& buf) and let the user make sure
EI> that the buffer is converted properly

We could add such an overload. And/or we could also add wxMBConv parameter
to SetPostBuffer() to explicitly select the encoding. The main problem I
see is that we'd need to continue using Latin-1 and not UTF-8 as default
for compatibility.

EI> I will try to send another patch later that puts some comments on http.h
EI> API (it was hard to find a good example of how to send a post buffer using
EI> this class)

I'd appreciate if you could upload patches to the Trac. Please also see
http://trac.wxwidgets.org/wiki/HowToSubmitPatches, in particular the parts
about making the patch from wxWidgets directory or its parent and not using
the hard TABs.

TIA,
VZ

--
TT-Solutions: wxWidgets consultancy and technical support
http://www.tt-solutions.com/

Eran Ifrah

unread,
Jan 15, 2012, 2:48:36 PM1/15/12
to wx-u...@googlegroups.com
Indeed, in my code I manually set it using the SetHeader method:
SetHeader(wxT("Content-Type"), wxT("text/html; charset=UTF-8"));

It seems like a good idea to add it inside wxHTTP as well when using the conversion to wxUTF8
 
EI> An ideal solution will be to change the API SetPostBuffer(const wxString&
EI> buf) => SetPostBuffer(const wxMemoryBuffer& buf) and let the user make sure
EI> that the buffer is converted properly

 We could add such an overload. And/or we could also add wxMBConv parameter
to SetPostBuffer() to explicitly select the encoding. The main problem I
see is that we'd need to continue using Latin-1 and not UTF-8 as default
for compatibility.
 
IMO, adding new API is the way to go: "SetPostBuffer(const wxMemoryBuffer &buffer)" this way, it frees the wxHTTP class from guessing which conversion to use (it also does not make sense to me to use wxString when sending .gzip file, or images) and we are not bound by compatibility issues

EI> I will try to send another patch later that puts some comments on http.h
EI> API (it was hard to find a good example of how to send a post buffer using
EI> this class)

 I'd appreciate if you could upload patches to the Trac. Please also see
http://trac.wxwidgets.org/wiki/HowToSubmitPatches, in particular the parts
about making the patch from wxWidgets directory or its parent and not using
the hard TABs.

Will do 
 TIA,
VZ

--
TT-Solutions: wxWidgets consultancy and technical support
              http://www.tt-solutions.com/

Vadim Zeitlin

unread,
Jan 15, 2012, 5:29:11 PM1/15/12
to wx-u...@googlegroups.com
On Sun, 15 Jan 2012 21:48:36 +0200 Eran Ifrah <eran....@gmail.com> wrote:

EI> > EI> Attached is a patch that fixes both problems by checking the conversion
EI> > EI> result of m_post_buf.To8BitData() and if it fails, it tries to use the
EI> > EI> method:
EI> > EI> m_post_buf.mb_str(wxConvUTF8)
EI> >
EI> > This means that the data sent will use different encoding depending on the
EI> > buffer contents which just doesn't seem right.
EI> >
EI> > Also, I'm a bit hazy on HTTP here and don't have time to reread RFC 2616
EI> > now but don't we need to specify Content-Type if we encode the contents as
EI> > UTF-8?
EI> >
EI> Indeed, in my code I manually set it using the SetHeader method:
EI> SetHeader(wxT("Content-Type"), wxT("text/html; charset=UTF-8"));
EI>
EI> It seems like a good idea to add it inside wxHTTP as well when using the
EI> conversion to wxUTF8

Well, the point is that you can only call SetHeader() in your code if you
actually know what the charset is. And if SetPostBuffer() used whatever
encoding works, you wouldn't know it.

EI> IMO, adding new API is the way to go: "SetPostBuffer(const wxMemoryBuffer
EI> &buffer)" this way, it frees the wxHTTP class from guessing which
EI> conversion to use (it also does not make sense to me to use wxString when
EI> sending .gzip file, or images) and we are not bound by compatibility issues

SetPostBuffer(wxMemoryBuffer) or even SetPostBuffer(const void*, size_t)
would be fine. If you could make a patch for this it would be gratefully
accepted.

I still think we must do something in the existing SetPostBuffer(wxString)
though. Maybe we should just deprecate it? We could also add some
SetPostText(const wxString& mimetype, const wxString& text) which would add
the Content-Type header with the appropriate MIME type and tell people to
use this instead of SetPostBuffer(wxString) for text data.

In fact, arguably even SetPostBuffer(wxMemoryBuffer) should also take a
"mimetype" argument, shouldn't it?

Regards,

Eran Ifrah

unread,
Jan 15, 2012, 11:22:53 PM1/15/12
to wx-u...@googlegroups.com
I created new ticket with a modified patch that includes:
- Depreactes the current SetPostBuffer
- Adds new SetPostBuffer(const wxMemoryBuffer &postBuffer, const wxString &contentType);
- Adds new SetPostText(const wxString &postBuffer, const wxString &contentType, const wxMBConv& conv = wxConvLibc);
- Changed: wxString wxHTTP::m_post_buf => wxMemoryBuffer wxHTTP::m_postBuffer
- Added doxygen comments on top of the new methods

Here is the patch:

 I still think we must do something in the existing SetPostBuffer(wxString)
though. Maybe we should just deprecate it? We could also add some
SetPostText(const wxString& mimetype, const wxString& text) which would add
the Content-Type header with the appropriate MIME type and tell people to
use this instead of SetPostBuffer(wxString) for text data.

 In fact, arguably even SetPostBuffer(wxMemoryBuffer) should also take a
"mimetype" argument, shouldn't it?

Indeed. The new patch does that now
 
 Regards,
VZ

--
TT-Solutions: wxWidgets consultancy and technical support
              http://www.tt-solutions.com/

Catalin

unread,
Jan 16, 2012, 2:02:32 AM1/16/12
to wx-u...@googlegroups.com
From: Eran Ifrah <eran....@gmail.com>

- Adds new SetPostBuffer(const wxMemoryBuffer &postBuffer, const wxString &contentType);
- Adds new SetPostText(const wxString &postBuffer, const wxString &contentType, const wxMBConv& conv = wxConvLibc);

These 2 methods will be somewhat special as they will also set "Content-Type" header and there will be multiple ways of setting this header which might break some existing code. If this is final maybe there should be a note in the wxHTTP::SetHeader() docs mentioning that  "Content-Type" could be set/overwritten if SetPostXXX() is used.
Also, would they be more readable if the order of the [first] two parameters is reversed?

- Added doxygen comments on top of the new methods

These should be in `interface\wx\protocol\http.h` and mention that they are new, since 2.9.4.

Nitpick: IIRC the spacing for if-s is `if ( condition )`.
m_postBuffer does not need to be initialized explicitly, does it?

HTH,
C

Eran Ifrah

unread,
Jan 16, 2012, 2:12:42 AM1/16/12
to wx-u...@googlegroups.com
On Mon, Jan 16, 2012 at 9:02 AM, Catalin <cat...@yahoo.com> wrote:
From: Eran Ifrah <eran....@gmail.com>

- Adds new SetPostBuffer(const wxMemoryBuffer &postBuffer, const wxString &contentType);
- Adds new SetPostText(const wxString &postBuffer, const wxString &contentType, const wxMBConv& conv = wxConvLibc);

These 2 methods will be somewhat special as they will also set "Content-Type" header
Yes they will set it, however the comment I placed on top of the each function explicitly mentions that

 and there will be multiple ways of setting this header which might break some existing code. If this is final maybe there should be a note in the wxHTTP::SetHeader() docs mentioning that  "Content-Type" could be set/overwritten if SetPostXXX() is used.
No it will not break current code, since I only set it if it is not empty.

          if(!m_contentType.IsEmpty())
            {
                SetHeader(wxT("Content-Type"), m_contentType);
            }
So in the current code, m_contentType member is empty so it will not break anything in this regard

Also, would they be more readable if the order of the [first] two parameters is reversed?
Not to me... but it a matter of personal taste here
 
- Added doxygen comments on top of the new methods

These should be in `interface\wx\protocol\http.h` and mention that they are new, since 2.9.4.

Nitpick: IIRC the spacing for if-s is `if ( condition )`.
m_postBuffer does not need to be initialized explicitly, does it?

HTH,
C

--
Please read http://www.wxwidgets.org/support/mlhowto.htm before posting.
 
To unsubscribe, send email to wx-users+u...@googlegroups.com
or visit http://groups.google.com/group/wx-users

Eran Ifrah

unread,
Jan 16, 2012, 6:09:34 AM1/16/12
to wx-u...@googlegroups.com
FYI: I updated the patch at trac so it is now against trunk and not 2.9.3

Vadim Zeitlin

unread,
Jan 16, 2012, 8:11:14 AM1/16/12
to wx-u...@googlegroups.com
On Sun, 15 Jan 2012 23:02:32 -0800 (PST) Catalin <cat...@yahoo.com> wrote:

C> From: Eran Ifrah <eran....@gmail.com>
C> >
C> >- Adds new SetPostBuffer(const wxMemoryBuffer &postBuffer, const wxString &contentType);
C> >
C> >- Adds new SetPostText(const wxString &postBuffer, const wxString &contentType, const wxMBConv& conv = wxConvLibc);
C>
C> These 2 methods will be somewhat special as they will also
C> set "Content-Type" header and there will be multiple ways of setting
C> this header which might break some existing code. If this is final maybe
C> there should be a note in the wxHTTP::SetHeader() docs mentioning
C> that  "Content-Type" could be set/overwritten if SetPostXXX() is used.

Not only this but I think that the current attempts to play nice with
explicit SetHeader("Content-Type") calls are more confusing than useful
because I'd expect SetPostXXX() to override "Content-Type" even if it had
been set before but even more importantly I'd expect an explicit
SetHeader() to override the content type set by SetPostXXX() if it's called
after it but this doesn't happen now.

I'm not sure what's the best way to deal with this however. We probably
should always use whatever was passed to SetHeader(), even if it was called
before SetPostXXX() as this would be the simplest (to explain, if not to
implement) and would avoid changing the behaviour of old code using
SetPostBuffer(wxString) when it's updated to use one of the new overloads.
This means that the patch needs to be modified to remember if
SetHeader("Content-Type") was called however.

C> Also, would they be more readable if the order of the [first] two
C> parameters is reversed?

I think so, yes. The string and its encoding should go together. However
then it becomes inconsistent with SetPostBuffer(wxMemoryBuffer)... So the
best would be probably to use the following signatures:

SetPostBuffer(wxString contentType, wxMemoryBuffer)
SetPostText(wxString contentType, wxString buf, wxMBConv)

What do you think?


C> > - Added doxygen comments on top of the new methods
C>
C> These should be in `interface\wx\protocol\http.h` and mention that they
C> are new, since 2.9.4.

Exactly. It's another thing which must be done before applying this.

C> Nitpick: IIRC the spacing for if-s is `if ( condition )`.

Yes, this would be more consistent with the existing code in this file.

C> m_postBuffer does not need to be initialized explicitly, does it?

No. Neither does m_contentType.


Eran, would it be possible for you to update the patch as discussed if you
agree with the points above?

TIA,

Catalin

unread,
Jan 16, 2012, 9:23:37 AM1/16/12
to wx-u...@googlegroups.com
From: Vadim Zeitlin <va...@wxwidgets.org>

So the
best would be probably to use the following signatures:

    SetPostBuffer(wxString contentType, wxMemoryBuffer)
    SetPostText(wxString contentType, wxString buf, wxMBConv)

Yes, this is what I had in mind myself.

Sounds good that SetHeader() should have higher priority than SetPostXXX in setting the header. However calling SetPostXXX twice without any SetHeader() should still update contentType.

Regards,
C

Eran Ifrah

unread,
Jan 16, 2012, 10:52:01 AM1/16/12
to wx-u...@googlegroups.com
I updated the patch so now if a "Content-Type" header is present, it will not be overwritten
 
C> Also, would they be more readable if the order of the [first] two
C> parameters is reversed?

 I think so, yes. The string and its encoding should go together. However
then it becomes inconsistent with SetPostBuffer(wxMemoryBuffer)... So the
best would be probably to use the following signatures:

       SetPostBuffer(wxString contentType, wxMemoryBuffer)
       SetPostText(wxString contentType, wxString buf, wxMBConv)

What do you think?

Done
 

C> > - Added doxygen comments on top of the new methods
C>
C> These should be in `interface\wx\protocol\http.h` and mention that they
C> are new, since 2.9.4.

I added @since 2.9.4 to both of the new functions
 
 Exactly. It's another thing which must be done before applying this.

C> Nitpick: IIRC the spacing for if-s is `if ( condition )`.

 Yes, this would be more consistent with the existing code in this file.

C> m_postBuffer does not need to be initialized explicitly, does it?

 No. Neither does m_contentType.


 Eran, would it be possible for you to update the patch as discussed if you
agree with the points above?

Ticket updated
 
 TIA,
VZ

--
TT-Solutions: wxWidgets consultancy and technical support
              http://www.tt-solutions.com/
Reply all
Reply to author
Forward
0 new messages