wxTextFile: write BOM at the beginning


Alexandru Ciprian Branescu

Jan 8, 2014, 9:42:19 AM
to wx-u...@googlegroups.com
Is it possible to write a BOM at the beginning of a Unicode encoded file? (I cannot find a way currently).

I found this ticket (http://trac.wxwidgets.org/ticket/11196) which proposes adding a special class for that, but it's surprising that it is not present in wxTextFile, because it could very easily be added to the constructor or the Write() function.

Can someone (dis)confirm this?

Alex.

Vadim Zeitlin

Jan 8, 2014, 10:04:11 AM
to wx-u...@googlegroups.com
On Wed, 8 Jan 2014 06:42:19 -0800 (PST) Alexandru Ciprian Branescu wrote:

ACB> Is it possible to write a BOM at the beginning of a Unicode encoded file?
ACB> (I cannot find a way currently).

Well, you can always do it manually, it's not difficult at all, especially
if you always write the file in some fixed encoding.

ACB> I found this ticket (http://trac.wxwidgets.org/ticket/11196) which proposes
ACB> adding a special class for that, but it's surprising that it is not present
ACB> in wxTextFile, because it could very easily be added to the constructor or
ACB> the Write() function.
ACB>
ACB> Can someone (dis)confirm this?

I don't think it belongs to wxTextFile, this is something you should be
able to do with wxFile or wxFFile as well. I think we should just extract
the existing code in wxBOM class, as mentioned in the ticket, and provide a
way to create the BOM corresponding to the given encoding. Then you'd write
it into a file yourself trivially...
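
As a rough illustration of the kind of helper described above (one that returns the BOM for a given encoding), here is a minimal sketch using only the C++ standard library. TextEncoding and BomBytesFor are hypothetical names chosen for this example and are not part of wxWidgets; the byte sequences are the standard Unicode BOMs.

```cpp
#include <string>

// Hypothetical encoding tag for this sketch (not a wxWidgets type).
enum class TextEncoding { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Return the raw BOM bytes for the given encoding, or an empty string
// for TextEncoding::None. Explicit lengths keep the embedded NUL bytes
// of the UTF-32 BOMs from truncating the string.
inline std::string BomBytesFor(TextEncoding enc)
{
    switch (enc) {
    case TextEncoding::Utf8:    return std::string("\xEF\xBB\xBF", 3);
    case TextEncoding::Utf16LE: return std::string("\xFF\xFE", 2);
    case TextEncoding::Utf16BE: return std::string("\xFE\xFF", 2);
    case TextEncoding::Utf32LE: return std::string("\xFF\xFE\x00\x00", 4);
    case TextEncoding::Utf32BE: return std::string("\x00\x00\xFE\xFF", 4);
    case TextEncoding::None:    break;
    }
    return std::string();
}
```

With something like this, the caller would write the returned bytes at the very start of the file, before any encoded text.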

Regards,
VZ

--
TT-Solutions: wxWidgets consultancy and technical support
http://www.tt-solutions.com/

Alexandru Ciprian Branescu

Jan 8, 2014, 10:18:28 AM
to wx-u...@googlegroups.com


On Wednesday, January 8, 2014 5:04:11 PM UTC+2, Vadim Zeitlin wrote:
> Well, you can always do it manually, it's not difficult at all, especially
> if you always write the file in some fixed encoding.

Thanks for the answer.
Yes, of course, I just didn't want to touch code that doesn't need to be touched (replacing wxTextFile with wxFile) when there should be such a simple solution.
 
> I don't think it belongs to wxTextFile, this is something you should be
> able to do with wxFile or wxFFile as well.

Well, why not? Since you are able to specify the conv object at Write() time, a little step further is only natural.

Vadim Zeitlin

Jan 8, 2014, 10:53:43 AM
to wx-u...@googlegroups.com
On Wed, 8 Jan 2014 07:18:28 -0800 (PST) Alexandru Ciprian Branescu wrote:

ACB> > I don't think it belongs to wxTextFile, this is something you should be
ACB> > able to do with wxFile or wxFFile as well.
ACB>
ACB> Well, why not? Since you are able to specify the conv object at Write()
ACB> time, a little step further is only natural.

I'm not sure if using a conversion is the most intuitive way to handle
this. Granted, it's probably the most natural one from the point of view of
the existing API, but from an outside point of view, would you really
expect to have to create some wxBOMConv object and write to the file using
it?

Alexandru Ciprian Branescu

Jan 8, 2014, 1:19:37 PM
to wx-u...@googlegroups.com
To be more clear, what I've done locally is to modify the header of wxTextFile/Buffer::Write():

bool Write(wxTextFileType typeNew = wxTextFileType_None, const wxMBConv& conv = wxConvAuto(), const char* charBOM = NULL, size_t lengthBOM = 0);

then at application level:

// write the UTF-8 BOM
size_t count=0;
const char* bom( wxConvAuto::GetBOMChars( wxBOM_UTF8, &count ) );
bool ok = file.Write( wxTextFileType_None, wxMBConvUTF8(), bom, count );

Of course, this introduces some redundancy (in theory the BOM could be deduced from wxMBConv type) but I prefer it over nothing.

Alex.

Vadim Zeitlin

Jan 8, 2014, 7:49:52 PM
to wx-u...@googlegroups.com
On Wed, 8 Jan 2014 10:19:37 -0800 (PST) Alexandru Ciprian Branescu wrote:

ACB> To be more clear, what I've done locally is to modify the header of
ACB> wxTextFile/Buffer::Write():
ACB>
ACB> bool Write(wxTextFileType typeNew = wxTextFileType_None, const wxMBConv&
ACB> conv = wxConvAuto(), const char* charBOM = NULL, size_t lengthBOM = 0);
ACB>
ACB> then at application level:
ACB>
ACB> // write the UTF-8 BOM
ACB> size_t count=0;
ACB> const char* bom( wxConvAuto::GetBOMChars( wxBOM_UTF8, &count ) );
ACB> bool ok = file.Write( wxTextFileType_None, wxMBConvUTF8(), bom, count );
ACB>
ACB> Of course, this introduces some redundancy (in theory the BOM could be
ACB> deduced from wxMBConv type) but I prefer it over nothing.

But why do you need to modify anything to do it like this? Just prepend
the BOM data to the first line... The whole point of any specific API would
be to determine the BOM to use automatically.
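
A minimal sketch of the "prepend the BOM to the first line" idea, written against the C++ standard library rather than wxTextFile so that it stays self-contained. PrependUtf8Bom and WriteLines are hypothetical helpers for this example. With wxTextFile the analogous step would presumably be to prepend U+FEFF to the first line before calling Write() with a UTF-8 conversion, but that is an assumption and is not verified here.

```cpp
#include <fstream>
#include <string>
#include <vector>

// UTF-8 encoding of U+FEFF, the byte order mark.
const std::string kUtf8Bom("\xEF\xBB\xBF", 3);

// Prepend the UTF-8 BOM to the first line. If there are no lines yet,
// an empty first line is created to carry it. Calling this twice does
// not add a second BOM.
void PrependUtf8Bom(std::vector<std::string>& lines)
{
    if (lines.empty())
        lines.emplace_back();
    std::string& first = lines.front();
    if (first.compare(0, kUtf8Bom.size(), kUtf8Bom) != 0)
        first.insert(0, kUtf8Bom);
}

// Write the lines to `path` in binary mode, terminating each with '\n'.
// Returns false if the file cannot be opened or a write fails.
bool WriteLines(const std::string& path,
                const std::vector<std::string>& lines)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
        return false;
    for (const std::string& line : lines)
        out << line << '\n';
    out.flush();
    return out.good();
}
```

Because the BOM is simply part of the first line's data, the writing code needs no changes, which is the point made above.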

One thing which we could definitely improve would be to allow obtaining
the BOM bytes in a less ugly way than by using wxConvAuto::GetBOMChars()
(which was never meant to be public anyhow BTW). As previously mentioned,
any simple patches adding such an API would be welcome.