Agreed. I'll probably add a streaming API in the next release.
> /Users/nlafon/Dropbox/code/njs/sacacoles.js:48
> var iconv = new Iconv('ISO-8859-1', 'UTF8');
> ^
> Error: EINVAL, Conversion not supported.
Your libiconv does not know how to convert from latin1 to utf8. Make
sure the proper character sets are compiled in.
--
You received this message because you are subscribed to the Google Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com.
To unsubscribe from this group, send email to nodejs+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nodejs?hl=en.
I might just add GNU libiconv as a static dependency to avoid all this
craziness.
> on linux the script starts using 95% CPU for the conversion and never
> finishes.
I wager that 10 MB buffer isn't full. You are passing it in whole, so
iconv is trying to recode the uninitialized data past the end of your input.
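A minimal sketch of the fix suggested above, assuming the data was read into a preallocated scratch buffer: hand the converter only the part that was actually filled. The helper name `filledPart` is made up for illustration, and the `write()` call merely stands in for whatever read filled the buffer.

```javascript
// Hypothetical helper: return only the bytes that were actually written
// into a preallocated buffer, so the converter never sees the unused tail.
function filledPart(buffer, bytesWritten) {
  return buffer.subarray(0, bytesWritten);
}

var big = Buffer.alloc(10 * 1024 * 1024); // 10 MB scratch buffer

// Stand-in for a real read: put four latin1 bytes at the start.
var bytesWritten = big.write('caf\xe9', 0, 'latin1');

// A view on the first four bytes only; this is what should go to
// iconv.convert(), not `big` itself.
var input = filledPart(big, bytesWritten);
```

`subarray()` does not copy, so this costs nothing extra even for large buffers.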
I'll see what I can do. Could you perhaps do a `make distclean` and
post the output of `make install` and `nm iconv.node`?
Thanks!
Ben
I've pushed a commit[1] that applies a few heuristics to the
character encoding name, massaging it into something libiconv likes.
[1] http://github.com/bnoordhuis/node-iconv/commit/9012b03b441ced4349b8a7590369ec02ca9bac87
var fs = require('fs'), Iconv = require('iconv').Iconv;
fs.readFile('404.html', function(ex, data) {
  if (ex) throw ex; // could not read the file, nothing to convert
  var iv = new Iconv('iso-8859-15', 'utf-8');
  console.log(iv.convert(data).toString());
});
request.on('response', function (response) {
  var responseBuffers = [];
  // Collect the raw chunks; do not convert them one by one.
  response.on('data', function (chunk) {
    responseBuffers.push(chunk);
  });
  response.on('end', function() {
    var totalLength = 0, index = 0, concatBuffer, iconv = new Iconv('ISO-8859-1', 'UTF-8'), outputBuffer;
    for (var i=0, currentBuffer; currentBuffer = responseBuffers[i]; i++){
      totalLength += currentBuffer.length;
    }
    // Copy every chunk into one contiguous buffer.
    concatBuffer = new Buffer(totalLength);
    while ((currentBuffer = responseBuffers.shift())){
      currentBuffer.copy(concatBuffer, index, 0);
      index += currentBuffer.length;
    }
    try{
      outputBuffer = iconv.convert(concatBuffer);
    } catch(e){
      // Stop here: outputBuffer is undefined if the conversion failed.
      return errback('Trouble converting text: ' + e);
    }
    callback(outputBuffer.toString('utf8'));
  });
});
Yes, this is the preferred method. Encoding the chunks one-by-one may
not work as expected with stateful or multi-byte character sets. This
is something of a pitfall; I'll update the documentation.
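To illustrate the pitfall, here is a small sketch that uses Node's built-in UTF-8 decoding rather than iconv (an assumption made so the snippet runs without the module installed). It splits a two-byte character across two chunks and compares decoding the chunks separately with decoding them after concatenation.

```javascript
// "é" in UTF-8 is two bytes: 0xC3 0xA9.
var whole = Buffer.from('é', 'utf8');

// Pretend the network delivered the character split across two chunks.
var chunks = [whole.subarray(0, 1), whole.subarray(1)];

// Decoding each chunk on its own: neither half is a valid sequence,
// so the decoder substitutes replacement characters.
var piecewise = chunks.map(function (chunk) {
  return chunk.toString('utf8');
}).join('');

// Concatenating first and decoding once gives the character back intact.
var joined = Buffer.concat(chunks).toString('utf8');
```

The same reasoning applies to any converter that is fed chunk by chunk without keeping state between calls.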
Side note: I've added a buffertools.concat() method to
node-buffertools that lets you concatenate buffers (and strings) as a
one-liner.
buffertools.concat(a, b, 'foo', new Buffer('bar'));
I'll probably add something similar to iconv.convert():
// decode a+b+c into a single result buffer
r = iconv.convert(a, b, c);
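For comparison, a sketch of the same "collect, then concatenate once" pattern using `Buffer.concat()`, which newer Node versions ship in core. The helper `collectBody` and the fake stream are assumptions for illustration only; in real code the single buffer would then go to iconv.convert().

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: buffer every 'data' chunk and hand one
// contiguous Buffer to `done` when the stream ends.
function collectBody(stream, done) {
  var chunks = [];
  stream.on('data', function (chunk) { chunks.push(chunk); });
  stream.on('end', function () { done(Buffer.concat(chunks)); });
}

// Stand-in for an HTTP response: a plain emitter we drive by hand.
var fake = new EventEmitter();
var result = null;
collectBody(fake, function (body) { result = body; });

fake.emit('data', Buffer.from('foo'));
fake.emit('data', Buffer.from('bar'));
fake.emit('end');
// `result` now holds both chunks as one buffer.
```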
I've had an idea:
An analogy: when you add 15 + 17, you first do 5 + 7, which gives 2 and a carry of 1, so the next sum becomes carry + 1 + 1.
In the same manner, a chunk that arrives in a 'data' event and ends in the middle of a multi-byte character could generate a carry to be prepended to the chunk from the next 'data' event.
That does not sound too difficult to do, and it would avoid the need to buffer all the chunks.
Maybe the `new Buffer()`s should be built with a few free bytes of padding at the front, whose only use would be to accommodate the carry (if any) from a previous chunk. The "0" of the buffer would have to be reset accordingly.
I might be perfectly wrong, though, very likely.
--
Jorge.
BTW, to tell whether there's a carry, one would only need to sniff the last few bytes of the buffer (4 at most), it seems to me, not the whole buffer.
--
Jorge.
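A rough sketch of the carry idea for the specific case of UTF-8 input, looking only at the last few bytes of each chunk. The function name `splitUtf8Tail` is invented for this example, it is not part of node-iconv, and it does not cover stateful encodings, which need more than a byte carry.

```javascript
// Split a buffer into the part that ends on a character boundary and
// a "carry": the bytes of a trailing, still incomplete UTF-8 sequence.
function splitUtf8Tail(buf) {
  var i = buf.length - 1;
  var seen = 0;
  // Walk back over continuation bytes (10xxxxxx), three at most.
  while (i >= 0 && seen < 3 && (buf[i] & 0xC0) === 0x80) { i--; seen++; }
  if (i < 0) return { complete: buf, carry: Buffer.alloc(0) };

  // The lead byte says how long the sequence is supposed to be.
  var lead = buf[i];
  var need = lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : lead >= 0xC0 ? 2 : 1;
  var have = buf.length - i;
  if (have < need) {
    return { complete: buf.subarray(0, i), carry: buf.subarray(i) };
  }
  return { complete: buf, carry: Buffer.alloc(0) };
}

// "a€b" is 61 E2 82 AC 62; cut it in the middle of the euro sign.
var whole = Buffer.from('a€b', 'utf8');
var chunks = [whole.subarray(0, 2), whole.subarray(2)];

var carry = Buffer.alloc(0), out = '';
chunks.forEach(function (chunk) {
  // Prepend the carry from the previous chunk, then split again.
  var parts = splitUtf8Tail(Buffer.concat([carry, chunk]));
  out += parts.complete.toString('utf8');
  carry = parts.carry;
});
```

For plain UTF-8 decoding, Node's `string_decoder` module already does this kind of bookkeeping; the sketch only shows that the carry can be found without scanning the whole buffer.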
Just make the carry an object that includes the state as well... (?)
--
Jorge.
Easy in theory, somewhat harder in practice. :-)