UTF-8 to EBCDIC conversion w/ iconv()

Mr. K.V.B.L.

Feb 11, 2009, 5:31:32 PM

I need a resource to find the correct CCSID string for the iconv()
routine.

Either something like
"IBMCCSID00367"
or
"IBMCCSID000370000101"

works for me, but I can't seem to get a UTF-8 conversion going. I tried

IBMCCSID012080000101 along with
IBMCCSID01208

but the iconv() routine seems to be barfing on anything with 1208 in
it. I saw an older post about this but haven't found any solution yet.

WDS

Feb 11, 2009, 8:00:53 PM

Here's part of a program that I use when I need to figure out
something about iconv. I think it shows what you need to know.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iconv.h>
#include <errno.h>

/* iconv conversions */
iconv_t ASCII_to_EBCDIC;
iconv_t ASCII_to_UTF8;
iconv_t EBCDIC_to_ASCII;
iconv_t EBCDIC_to_UTF8;
iconv_t UTF8_to_ASCII;
iconv_t UTF8_to_EBCDIC;

bool createConversion(iconv_t& theConversion, char *from, char* to) {
theConversion = iconv_open(to, from);
if (-1 == theConversion.return_value) {
printf("iconv_open for %s to %s failed, errno=%d\n",
from, to,
errno);
return false;
}
printf("iconv_open for %s to %s worked\n", from, to);
return true;

}

bool iconvSetup() {
char ASCII_to[33];
char ASCII_fr[33];
char EBCDIC_to[33];
char EBCDIC_fr[33];
char UTF8_to[33];
char UTF8_fr[33];

printf("Setting up iconv\n");
/* Note: iconv assumes job CCSID (probably 37) so we have to force
strings to that CCSID */
#pragma convert(37)
memset(ASCII_to, '\0', 33);
memset(ASCII_fr, '\0', 33);
memset(EBCDIC_to, '\0', 33);
memset(EBCDIC_fr, '\0', 33);
strcpy(ASCII_to, "IBMCCSID01252");
strcpy(ASCII_fr, "IBMCCSID012520000000");
strcpy(EBCDIC_to, "IBMCCSID00037");
strcpy(EBCDIC_fr, "IBMCCSID000370000000");
strcpy(UTF8_to, "IBMCCSID01208");
strcpy(UTF8_fr, "IBMCCSID012080000000");
#pragma convert(0)

bool everythingWorked = createConversion(ASCII_to_EBCDIC,
ASCII_fr,
EBCDIC_to);
everythingWorked = createConversion(ASCII_to_UTF8, ASCII_fr,
UTF8_to) && everythingWorked;
everythingWorked = createConversion(EBCDIC_to_ASCII,
EBCDIC_fr,
ASCII_to) && everythingWorked;
everythingWorked = createConversion(EBCDIC_to_UTF8,
EBCDIC_fr,
UTF8_to) && everythingWorked;
everythingWorked = createConversion(UTF8_to_ASCII, UTF8_fr,
ASCII_to) && everythingWorked;
everythingWorked = createConversion(UTF8_to_EBCDIC, UTF8_fr,
EBCDIC_to) && everythingWorked;

return everythingWorked;

}

...etc...

(corrected, I hope...)

Mr. K.V.B.L.

Feb 12, 2009, 10:22:49 AM

I scanned through your code and I immediately saw your comment that
iconv() assumes strings to be in EBCDIC. I didn't know that.

I got these results however. Anything with 1208, she don't fly.
errno 3021 is invalid argument. I'm running on a V6R1 system here.

iconv_open for IBMCCSID012520000000 to IBMCCSID00037 worked
iconv_open for IBMCCSID012520000000 to IBMCCSID01208 failed, errno=3021
iconv_open for IBMCCSID000370000000 to IBMCCSID01252 worked
iconv_open for IBMCCSID000370000000 to IBMCCSID01208 failed, errno=3021
iconv_open for IBMCCSID012080000000 to IBMCCSID01252 failed, errno=3021
iconv_open for IBMCCSID012080000000 to IBMCCSID00037 failed, errno=3021

I have a shell script called setccsid. It can convert a file to 1208
and it will display correctly, so I know UTF-8 is supported on that
system. Just don't know why it's having trouble here.

WDS

Feb 12, 2009, 11:38:11 AM

On Feb 12, 9:22 am, "Mr. K.V.B.L." <kenverybigl...@gmail.com> wrote:
> I got these results however. Anything with 1208, she don't fly.
> errno 3021 is invalid argument. I'm running on a V6R1 system here.

I apologize. I extracted parts from the whole test program and I left
out some crucial bits in what I posted. Here is more of it that has
the missing parts that were causing you grief. I just tried it on
both v5r4 and v6r1 and it worked on both.

Note that the PASE/AIX parts are what I was experimenting with and I
don't remember if they worked OK or not so no guarantees there.

/*
*/
#ifdef __OS400__
#warning __Compiling for native IBM i
#else
#warning __Compiling for PASE/AIX
#endif

/* Please note that this example was created in a very short amount of
time and is not tested thoroughly */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iconv.h>
#include <errno.h>

/* iconv conversions */
iconv_t ASCII_to_EBCDIC;
iconv_t ASCII_to_UTF8;
iconv_t EBCDIC_to_ASCII;
iconv_t EBCDIC_to_UTF8;
iconv_t UTF8_to_ASCII;
iconv_t UTF8_to_EBCDIC;

bool createConversion(iconv_t& theConversion, char *from, char* to) {
theConversion = iconv_open(to, from);

#ifdef __OS400__
if (-1 == theConversion.return_value) {
#else
if ((iconv_t)-1 == theConversion) {
#endif

printf("iconv_open for %s to %s failed, errno=%d\n", from, to,
errno);
return false;
}
printf("iconv_open for %s to %s worked\n", from, to);
return true;
}

bool iconvSetup() {
char ASCII_to[99];
char ASCII_fr[99];
char EBCDIC_to[99];
char EBCDIC_fr[99];
char UTF8_to[99];
char UTF8_fr[99];

printf("Setting up iconv\n");

#ifdef __OS400__

/* Note: iconv assumes job CCSID (probably 37) so we have to force
strings to that CCSID */
#pragma convert(37)

#endif
memset(ASCII_to, '\0', 99);
memset(ASCII_fr, '\0', 99);
memset(EBCDIC_to, '\0', 99);
memset(EBCDIC_fr, '\0', 99);
memset(UTF8_to, '\0', 99);
memset(UTF8_fr, '\0', 99);
#ifdef __OS400__

strcpy(ASCII_to, "IBMCCSID01252");
strcpy(ASCII_fr, "IBMCCSID012520000000");
strcpy(EBCDIC_to, "IBMCCSID00037");
strcpy(EBCDIC_fr, "IBMCCSID000370000000");
strcpy(UTF8_to, "IBMCCSID01208");
strcpy(UTF8_fr, "IBMCCSID012080000000");

#else
strcpy(ASCII_to, "ISO8859-1");
strcpy(ASCII_fr, "ISO8859-1");
strcpy(EBCDIC_to, "IBM-037");
strcpy(EBCDIC_fr, "IBM-037");
strcpy(UTF8_to, "UTF-8");
strcpy(UTF8_fr, "UTF-8");
#endif
#ifdef __OS400__
#pragma convert(0)
#endif

bool everythingWorked = true;
everythingWorked = createConversion(ASCII_to_EBCDIC, ASCII_fr,
EBCDIC_to) && everythingWorked;

everythingWorked = createConversion(ASCII_to_UTF8, ASCII_fr,
UTF8_to) && everythingWorked;
everythingWorked = createConversion(EBCDIC_to_ASCII, EBCDIC_fr,
ASCII_to) && everythingWorked;
everythingWorked = createConversion(EBCDIC_to_UTF8, EBCDIC_fr,
UTF8_to) && everythingWorked;
everythingWorked = createConversion(UTF8_to_ASCII, UTF8_fr,
ASCII_to) && everythingWorked;
everythingWorked = createConversion(UTF8_to_EBCDIC, UTF8_fr,
EBCDIC_to) && everythingWorked;

return everythingWorked;
}

int convert(iconv_t conversion, char *input, char *output, int outBufSize) {
    char *tempIn = input;
    char *tempOut = output;
    size_t inBytes = strlen(input);
    size_t outBytes = outBufSize;
    size_t rc;

    printf("Converting '%s' via iconv\n", input);
    memset(output, '\0', outBufSize);
    /* iconv() returns a size_t; (size_t)-1 signals failure */
    rc = iconv(conversion,
               &tempIn, &inBytes,
               &tempOut, &outBytes);
    if (((size_t)-1 == rc) || (inBytes != 0) /*|| (outBytes != 0)*/) {
        /* size_t values are cast so they match the %lu format */
        printf("iconv failed, errno=%d, inBytes=%lu, outBytes=%lu\n", errno,
               (unsigned long)inBytes, (unsigned long)outBytes);
        return -1;
    } else if (0 != rc) {
        printf("iconv succeeded but had non-zero rc=%lu\n", (unsigned long)rc);
        return -1;
    }
    return 0;
}

int main()
{
// Initialize stuff
if (!iconvSetup()) {
return -1;
};
...etc...

Mr. K.V.B.L.

Feb 13, 2009, 11:38:49 AM

I think I'm on my way again, thanks for your help. After using
iconv() a few times I still get myself mixed up from time to time. I think
it's time the routine should get an upgrade. You shouldn't need to
pre-allocate your output buffer when C++ is so good at appending to an
object like std::string.

Something like iconv(iconv_t cd, const string input, string& output);

But I digress.
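
For what it's worth, the std::string-style interface wished for above can be layered on top of the existing routine. What follows is only a rough sketch, assuming a POSIX-style iconv() whose descriptor has already been opened; the convertToString name and the 256-byte chunk size are made up for illustration and are not part of any library.

```cpp
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <iconv.h>

// Hypothetical helper, not part of iconv: converts all of `input` with an
// already-opened descriptor and appends the result to `output`.
void convertToString(iconv_t cd, const std::string& input, std::string& output)
{
    // iconv() takes a non-const char**, but it does not modify the input bytes.
    char* in = const_cast<char*>(input.data());
    size_t inLeft = input.size();
    char chunk[256];

    while (inLeft > 0) {
        char* out = chunk;
        size_t outLeft = sizeof chunk;
        size_t rc = iconv(cd, &in, &inLeft, &out, &outLeft);
        int savedErrno = errno;  // capture before anything else can change it

        // Keep whatever was produced, even if the chunk filled up.
        output.append(chunk, sizeof chunk - outLeft);

        // E2BIG only means the chunk was full, so loop again; anything else is fatal.
        if (rc == static_cast<size_t>(-1) && savedErrno != E2BIG) {
            throw std::runtime_error("iconv() failed");
        }
    }
}
```

The caller never sizes an output buffer; the string simply grows as chunks arrive. Stateful encodings would also need a final flush call with a null input pointer, which this sketch leaves out.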

WDS

Feb 13, 2009, 1:39:55 PM

On Feb 13, 10:38 am, "Mr. K.V.B.L." <kenverybigl...@gmail.com> wrote:
> I think I'm on my way again, thanks for your help. After using
> iconv() a few times I still get myself mixed up from time to time. I think
> it's time the routine should get an upgrade. You shouldn't need to
> pre-allocate your output buffer when C++ is so good at appending to an
> object like std::string.
>
> Something like iconv(iconv_t cd, const string input, string& output);

Look up ICU.

Mr. K.V.B.L.

Feb 17, 2009, 10:22:27 AM

I'm going to look at that today. Our V6R1 machine has version 3.8
installed, I think. In the meantime, do you know of any issue
with iconv() that prevents it from being called repeatedly on the same
data? I have a test program where I read a file of UTF-8 data, run
it through iconv() to convert to EBCDIC, manipulate the data,
convert it back to UTF-8 and write a new file. I have this in a
loop. It gets to the fourth or fifth iteration, then it shuts down
because there are some bytes left over for some reason. It should be
the same run every time. Anyway I'm researching this issue today and
looking at ICU. Thanks for all the previous help.

#include <iostream>
#include <string>
#include <sstream>
#include <fstream>
#include <stdexcept>
#include <cstring> // memset, strlen, strerror
#include <cerrno>  // errno
#include <iconv.h>

using namespace std;

#pragma convert(37)
char from_ascii_ccsid[33] = "IBMCCSID008190000000",
from_ebcdic_ccsid[33] = "IBMCCSID000370000000",
from_utf8_ccsid[33] = "IBMCCSID012080000000",
to_ascii_ccsid[33] = "IBMCCSID00819",
to_ebcdic_ccsid[33] = "IBMCCSID00037",
to_utf8_ccsid[33] = "IBMCCSID01208";
#pragma convert(0)

class ICONV {
private :
iconv_t cd;

public :
ICONV(const string fromEncoding, const string toEncoding);
~ICONV();
void convert(char *input, char *output, int inBufSize, int
outBufSize);
};

ICONV::ICONV(const string fromEncoding, const string toEncoding)
{
stringstream sstr;
cd = iconv_open(toEncoding.c_str(), fromEncoding.c_str());
if (cd.return_value < 0) {
sstr << "iconv_open() returned errno=" << errno << " (" <<
strerror(errno) << ").";
throw runtime_error(sstr.str());
}
}

ICONV::~ICONV()
{
iconv_close(cd);
}

void ICONV::convert(char *input, char *output, int inBufSize, int
outBufSize)
{
stringstream sstr;
char *tempIn;
char *tempOut;
size_t outBytes;
size_t inBytes;
int error;

memset(output, '\0', outBufSize);
inBytes = inBufSize;

outBytes = outBufSize;
tempIn = input;
tempOut = output;

error = iconv(cd, &tempIn, &inBytes, &tempOut, &outBytes);
if (-1 == error || inBytes != 0) {
sstr << "iconv() failed, errno=" << errno << ", inBytes=" <<
inBytes << ", outBytes=" << outBytes << " (" << strerror(errno) <<
")";
throw runtime_error(sstr.str());

} else if (0 != error) {

// Might change this to throw errors instead.
}
}

void readUTF8File(string filename, string& str, size_t &length)
{
    _CCSID_T CCSID_utf8(1208); // 1208 is UTF-8
    ifstream file;

    file.open(filename.c_str(), fstream::in, CCSID_utf8);
    if (file.is_open()) {
        // These three lines tell you the size of the file.
        file.seekg(0, ios::end);
        length = file.tellg();
        file.seekg(0, ios::beg);

        char *buffer = new char[length + 1]; // allocate memory for the input

        file.read(buffer, length); // read data as a block
        file.close();              // close the file
        str.assign(buffer);        // stuff the char* data into a C++ std::string
        delete[] buffer;           // release our char* buffer
    }
    else {
        throw runtime_error("readUTF8File(): " + filename + " did not open.");
    }
}

int main(int argc, char *argv[])
{
    char logfileName[100];
    _CCSID_T CCSID_utf8(1208); // 1208 is UTF-8
    ICONV UTF8_to_EBCDIC(from_utf8_ccsid, to_ebcdic_ccsid);
    ICONV EBCDIC_to_UTF8(from_ebcdic_ccsid, to_utf8_ccsid);
    string EBCDIC_str, newstr;

    try {
        for (int i = 0; i < 100; ++i) {
            size_t length;
            string buffer;
            buffer.clear();
            // Read the contents of a file encoded with CCSID 1208.
            readUTF8File("david.xml", buffer, length);

            // Convert the data into EBCDIC.
            // I use a buffer size that is equal to UTF-8 since we're going backwards.
            char *out_buffer = new char[length];
            UTF8_to_EBCDIC.convert(const_cast<char *>(buffer.c_str()),
                                   out_buffer, length, length);
            EBCDIC_str.assign(out_buffer);
            delete[] out_buffer;

            // This is just a segment of code to strip out anything that comes
            // before the opening <?xml version...>. I'm using it to remove
            // HTTP headers.
            newstr = EBCDIC_str.substr(EBCDIC_str.find("<?"));

            // Now convert our stripped data to UTF-8.
            char *new_buffer = new char[newstr.length() * 2];
            memset(new_buffer, 0, newstr.length() * 2);
            EBCDIC_to_UTF8.convert(const_cast<char *>(newstr.c_str()),
                                   new_buffer, newstr.length(), newstr.length() * 2);

            // Now the result gets saved to a new file in UTF-8.
            ofstream outfile;
            stringstream fileNamestream;

            fileNamestream << "david_" << i << ".XML";
            outfile.open(fileNamestream.str().c_str(), fstream::out,
                         CCSID_utf8);
            if (outfile.is_open()) {
                outfile.write(new_buffer, strlen(new_buffer));
                outfile.close();
            }

            delete[] new_buffer;
        }
    }
    catch (const runtime_error& e) { // catch by reference to avoid a copy
        cout << e.what() << endl;
    }
}

WDS

Feb 17, 2009, 11:29:12 AM

On Feb 17, 9:22 am, "Mr. K.V.B.L." <kenverybigl...@gmail.com> wrote:
> I'm going to look at that today. Our V6R1 machine has version 3.8
> installed, I think. In the meantime, do you know of any issue
> with iconv() that prevents it from being called repeatedly on the same
> data? I have a test program where I read a file of UTF-8 data, run
> it through iconv() to convert to EBCDIC, manipulate the data,
> convert it back to UTF-8 and write a new file. I have this in a
> loop. It gets to the fourth or fifth iteration, then it shuts down
> because there are some bytes left over for some reason. It should be
> the same run every time. Anyway I'm researching this issue today and
> looking at ICU. Thanks for all the previous help.

As far as I know it doesn't maintain any state and only works on the
operands you pass in. Are you sure everything is identical when
passed in?

That said, iconv is extremely quirky and I count myself lucky whenever
it works.

I've never used ICU but I read the documentation and it can be kind of
overwhelming because it does so much beyond "just" character set
translations.

Mr. K.V.B.L.

Feb 17, 2009, 12:38:06 PM

I've loaded my previous code with debugs. Everything stays the same
until the 5th iteration. I too can't find anything that would suggest
that iconv() is good for only one conversion then you need to clean up
with iconv_close() and start over. I guess there is a slight bug in
there somewhere but I'll be darned if I can find it. I need to put it
aside for a while and wait for fresh eyes to look at it again.

I'm just using CRTBNDCPP to compile. No weird options or anything. I
might try the IBM Qiconv routine and give that a whirl.
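
One thing that might be worth ruling out, and this is only a guess from reading the posted code rather than a confirmed diagnosis: readUTF8File() allocates length + 1 bytes but never writes a terminator before str.assign(buffer), so the resulting string's length depends on whatever byte happens to follow the data in memory. Below is a minimal sketch of a read that does not rely on a terminator at all. It uses standard C++ only; the readWholeFile name is hypothetical, and it opens the file in plain binary mode rather than with the _CCSID_T argument used above.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical replacement for the read step: returns the file's bytes
// exactly, including any embedded NULs, with no reliance on a terminator.
std::string readWholeFile(const std::string& filename)
{
    std::ifstream file(filename.c_str(), std::ios::in | std::ios::binary);
    if (!file.is_open()) {
        throw std::runtime_error("readWholeFile(): " + filename + " did not open.");
    }
    // Copy every byte from the stream buffer until end of file.
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}
```

With this shape the byte count to hand to iconv() is simply the returned string's size(), so the length and the data cannot drift apart between iterations.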

Mr. K.V.B.L.

Feb 17, 2009, 4:05:07 PM

Well after beating this thing to death I think I've got it. No word
yet on why regular iconv_open() and iconv() fail, but I'm sure it has
to do with how those char* strings are set up. If I use QtqIconvOpen()
and set things up that way, my loop runs through and completes 100
iterations and creates 100 identical files. I'm posting this code
again for my benefit, cos I'm sure 8-9 years down the road something
will come up and I will have forgotten the nuances of using iconv().

Actually I just found http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/apis/iconvopn.htm.
This tells you how to format the char strings for regular iconv_open().
I've fixed the strings up according to the directions and
something is still going on.
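
For reference, my reading of the iconv_open documentation linked above is that both code arguments are fixed 32-byte fields: "IBMCCSID", a five-digit CCSID, then (for the fromcode only) seven option digits, with every remaining byte set to hex zero. That is an assumption drawn from that page, so check it there before relying on it. A rough sketch of a helper that builds such a field follows; the makeCcsidCode name is hypothetical, and on IBM i the characters still have to end up in the job CCSID, which this portable sketch does not address.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: fills a 32-byte code field for iconv_open().
// `options` holds the characters that follow the CCSID in a fromcode
// (for example "0000000"); pass "" when building a tocode.
void makeCcsidCode(char (&code)[32], int ccsid, const char* options)
{
    assert(ccsid >= 0 && ccsid <= 99999);
    assert(std::strlen(options) <= 7);

    char text[32];
    // At most 8 + 5 + 7 = 20 characters, so this always fits.
    std::snprintf(text, sizeof text, "IBMCCSID%05d%s", ccsid, options);

    // Zero the whole field first so the unused tail is hex zeros.
    std::memset(code, 0, sizeof code);
    std::memcpy(code, text, std::strlen(text));
}
```

Zeroing the whole field before copying the text in is the point of the sketch: a buffer that is only strcpy'd into keeps whatever was in the bytes after the terminator.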

Oh well, I have actual work to do and I've found a method that works
reliably. I'll fool with this thing later.

#include <iostream>
#include <string>
#include <sstream>
#include <fstream>
#include <stdexcept>
#include <cstring> // memset, strerror
#include <cerrno>  // errno

#include <qtqiconv.h>

using namespace std;

// There is something wrong with these strings. I need to find the
// documentation on how to set them up.
#pragma convert(37)
char from_ascii_ccsid[99] = "IBMCCSID008190000101",
from_ebcdic_ccsid[99] = "IBMCCSID000370000101",
from_utf8_ccsid[99] = "IBMCCSID012080000101",
to_ascii_ccsid[99] = "IBMCCSID00819",
to_ebcdic_ccsid[99] = "IBMCCSID00037",
to_utf8_ccsid[99] = "IBMCCSID01208";
#pragma convert(0)

class ICONV {
private :
iconv_t cd;

public :
ICONV(const string fromEncoding, const string toEncoding);

ICONV(QtqCode_T *fromEncoding, QtqCode_T *toEncoding);

~ICONV();
void convert(char *input, char *output, int inBufSize, int
outBufSize);
};

ICONV::ICONV(const string fromEncoding, const string toEncoding)
{
stringstream sstr;
cd = iconv_open(toEncoding.c_str(), fromEncoding.c_str());
if (cd.return_value < 0) {
sstr << "iconv_open() returned errno=" << errno << " (" <<
strerror(errno) << ").";
throw runtime_error(sstr.str());
}
}

ICONV::ICONV(QtqCode_T *fromEncoding, QtqCode_T *toEncoding)
{
stringstream sstr;
cd = QtqIconvOpen(toEncoding, fromEncoding);
if (cd.return_value < 0) {
sstr << "QtqIconvOpen() returned errno=" << errno << " (" <<
strerror(errno) << ").";
throw runtime_error(sstr.str());
}
}

ICONV::~ICONV()
{
iconv_close(cd);
}

string EBCDIC_str, newstr;
string buffer;
QtqCode_T fromUTF8, toUTF8, fromEBCDIC, toEBCDIC;

fromUTF8.CCSID = 1208;
fromUTF8.cnv_alternative = 0;
fromUTF8.subs_alternative = 0;
fromUTF8.shift_alternative = 1;
fromUTF8.length_option = 0;
fromUTF8.mx_error_option = 0;
memset(fromUTF8.reserved, 0, 8);

toEBCDIC.CCSID = 37;
toEBCDIC.cnv_alternative = 0;
toEBCDIC.subs_alternative = 0;
toEBCDIC.shift_alternative = 0;
toEBCDIC.length_option = 0;
toEBCDIC.mx_error_option = 0;
memset(toEBCDIC.reserved, 0, 8);

toUTF8.CCSID = 1208;
toUTF8.cnv_alternative = 0;
toUTF8.subs_alternative = 0;
toUTF8.shift_alternative = 0;
toUTF8.length_option = 0;
toUTF8.mx_error_option = 0;
memset(toUTF8.reserved, 0, 8);

fromEBCDIC.CCSID = 37;
fromEBCDIC.cnv_alternative = 0;
fromEBCDIC.subs_alternative = 0;
fromEBCDIC.shift_alternative = 1;
fromEBCDIC.length_option = 0;
fromEBCDIC.mx_error_option = 0;
memset(fromEBCDIC.reserved, 0, 8);

try {
// This way works every time.
ICONV UTF8_to_EBCDIC(&fromUTF8, &toEBCDIC);
ICONV EBCDIC_to_UTF8(&fromEBCDIC, &toUTF8);

/*
Using this method causes failure.

ICONV UTF8_to_EBCDIC(from_utf8_ccsid, to_ebcdic_ccsid);
ICONV EBCDIC_to_UTF8(from_ebcdic_ccsid, to_utf8_ccsid);

*/

for (int i = 0; i < 100; ++i) {

//buffer.clear();
//EBCDIC_str.clear();
//newstr.clear();

size_t length;