The position can *not* be in the middle of a character (the docs say "however, the will not write only partially encoded characters."*). That does mean, however, that the write may not completely fill the buffer: it will leave the last few bytes unused rather than write a partial character.
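To make that concrete, here is a tiny sketch (the 4-byte buffer size and the sample string are just assumptions picked for illustration):

```javascript
// A 4-byte buffer and a string whose last character needs 3 bytes in UTF-8.
const small = Buffer.alloc(4);

// 'a' and 'b' take 1 byte each; '€' (U+20AC) needs 3 bytes, but only 2 are
// left, so it is not written at all and the tail of the buffer stays unused.
const bytesWritten = small.write('ab€');

console.log(bytesWritten, small);
```

So the return value of `write` can be smaller than the buffer length even though the string did not fit.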
When we needed to do this same thing ourselves in the past, I believe we found the most efficient solution was to just scan the string and calculate how many of its characters a given number of UTF-8 bytes covers (this was much more efficient than allocating a new Buffer every time; the whole point of writing into an existing buffer was to avoid allocating buffers ;). There's probably a module somewhere, but I think we used the snippet below. I think you should be able to do `var remaining = str.slice(bytesToOffs(str, buf.write(str)));`
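    // Returns how many UTF-16 code units of str are covered by num_bytes bytes of UTF-8.
    function bytesToOffs(str, num_bytes) {
      var idx = 0;
      while (num_bytes > 0 && idx < str.length) {
        var c = str.charCodeAt(idx);
        var units = 1; // string indices consumed by this character
        if (c <= 0x7F) {
          --num_bytes;
        } else if (c <= 0x07FF) {
          num_bytes -= 2;
        } else if (c >= 0xD800 && c <= 0xDBFF && idx + 1 < str.length &&
                   (str.charCodeAt(idx + 1) & 0xFC00) === 0xDC00) {
          // charCodeAt only returns 16-bit code units, so characters above
          // U+FFFF show up as a surrogate pair: 4 bytes of UTF-8, 2 indices.
          num_bytes -= 4;
          units = 2;
        } else {
          // Rest of the BMP (a lone surrogate is written as U+FFFD, also 3 bytes).
          num_bytes -= 3;
        }
        if (num_bytes >= 0) {
          idx += units;
        }
      }
      return idx;
    }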
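For what it's worth, here is a rough sketch of how a scan like this can be used to push a long string through one small, re-used buffer. The names `codeUnitsForBytes` and `writeInChunks` are hypothetical and not from any library; `codeUnitsForBytes` is a compact stand-in for the scan that relies on `codePointAt`, which did not exist back on Node 0.6.

```javascript
// Hypothetical stand-in for the scan: how many UTF-16 code units of `str`
// are covered by `numBytes` bytes of UTF-8, counting whole characters only.
function codeUnitsForBytes(str, numBytes) {
  let idx = 0;
  while (idx < str.length) {
    const cp = str.codePointAt(idx);
    const size = cp <= 0x7F ? 1 : cp <= 0x7FF ? 2 : cp <= 0xFFFF ? 3 : 4;
    if (size > numBytes) break;      // next character would not fit
    numBytes -= size;
    idx += cp > 0xFFFF ? 2 : 1;      // astral characters span two code units
  }
  return idx;
}

// Hypothetical driver: writes `str` through `buf` one chunk at a time.
// `onChunk` gets a view into `buf`, so it must copy anything it wants to keep.
function writeInChunks(str, buf, onChunk) {
  let remaining = str;
  while (remaining.length > 0) {
    const written = buf.write(remaining);
    if (written === 0) {
      throw new RangeError('buffer is too small for the next character');
    }
    onChunk(buf.subarray(0, written));
    remaining = remaining.slice(codeUnitsForBytes(remaining, written));
  }
}

// Usage: collect copies of the chunks and compare with a one-shot encode.
const parts = [];
writeInChunks('héllo wörld €100 😀!', Buffer.alloc(4), function (chunk) {
  parts.push(Buffer.from(chunk));
});
console.log(Buffer.concat(parts).toString() === 'héllo wörld €100 😀!');
```

The only allocation per chunk here is the `slice` of the remaining string, plus whatever `onChunk` chooses to copy.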
Note: I last looked at this closely on Node 0.6 or so, so there might be better approaches now. Also, if you have a reasonable upper bound on your string size, it might be more efficient to keep one large, re-used buffer, write your string into that first, and use it for other operations (including copying bytes directly from that temp buffer to the other buffer, if you don't mind partial characters being written). Depending on what exactly you need the character index for, though, that might not help.
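A rough sketch of that temp-buffer idea, assuming a 64 KiB upper bound on the encoded string (the bound and the names `scratch` and `copyInByteChunks` are made up for the example). Note that chunks produced this way can split a character across two chunks.

```javascript
// Assumed upper bound on the encoded size of any string we will handle.
const SCRATCH_SIZE = 64 * 1024;
const scratch = Buffer.alloc(SCRATCH_SIZE); // allocated once, re-used

// Hypothetical helper: encode `str` once into the scratch buffer, then copy
// raw bytes into `target` chunk by chunk. No character index is needed, but
// a multi-byte character may be split between two chunks.
function copyInByteChunks(str, target, onChunk) {
  if (target.length === 0) throw new RangeError('target buffer is empty');
  const total = scratch.write(str);
  if (total !== Buffer.byteLength(str)) {
    throw new RangeError('string does not fit in the scratch buffer');
  }
  for (let offset = 0; offset < total; ) {
    // copy() stops when either the source range or the target runs out.
    const copied = scratch.copy(target, 0, offset, total);
    onChunk(target.subarray(0, copied));
    offset += copied;
  }
}

// Usage: 3-byte chunks, reassembled to check nothing was lost.
const pieces = [];
copyInByteChunks('héllo €😀', Buffer.alloc(3), function (chunk) {
  pieces.push(Buffer.from(chunk));
});
console.log(Buffer.concat(pieces).toString() === 'héllo €😀');
```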
Jimb
* The quoted bad grammar is in the current live docs; PR sent.