Buffer.write - Is there an efficient way to get the last index of the string that was written to the buffer

Chethiya Abeysinghe

Jan 23, 2016, 11:07:17 PM
to nodejs

If the Buffer does not have enough space to fit all the bytes, this function returns the number of bytes it could write to the buffer. But is there an efficient way to find the last index of the string that was written to the buffer, so that the rest of the string can be processed separately?

P.S. I get that creating another string from the buffer and getting its length solves this. But my understanding is that it's not that efficient, as it allocates more memory for the new String?
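For reference, the approach described in the P.S. could look roughly like this. It is a minimal sketch assuming UTF-8 and a deliberately tiny 4-byte buffer; the variable names are made up for illustration. It gives the right index, at the cost of one string allocation per call:

```javascript
// Sketch of the "decode what was written and take its length" approach.
const str = 'a\u00e9\u20acz';   // 'a' (1 byte), 'é' (2 bytes), '€' (3 bytes), 'z' (1 byte)
const buf = Buffer.alloc(4);    // too small for the whole string (7 bytes)

// Number of bytes that actually fit.
const written = buf.write(str, 0, 'utf8');

// Decoding the written bytes allocates a new string; its length is the
// number of UTF-16 code units, which is what str.slice() expects.
const charsWritten = buf.toString('utf8', 0, written).length;

// The part of the string that did not fit.
const rest = str.slice(charsWritten);
```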

Thanks

Matt

Jan 25, 2016, 2:48:08 AM
to nod...@googlegroups.com
Because the string could contain non-ascii data, you'd have to create a buffer from the string to get that position. Be aware that the position could be in the middle of a character, so you'll have to deal with that.

You can compare Buffer.byteLength(thestring) with the return value of buf.write(thestring): if the two differ, the write didn't copy the entire string.
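A minimal sketch of that check, assuming UTF-8 and an illustrative 4-byte buffer (the names here are hypothetical):

```javascript
const text = 'h\u00e9llo';       // 'é' takes 2 bytes, so 6 bytes in UTF-8
const small = Buffer.alloc(4);

const needed = Buffer.byteLength(text, 'utf8');   // bytes the full string requires
const written = small.write(text, 0, 'utf8');     // bytes that actually fit

// If fewer bytes were written than needed, part of the string was left out.
const truncated = written < needed;
```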

Jimb Esser

Jan 26, 2016, 11:37:38 PM
to nodejs
The position can *not* be in the middle of a character (the docs say "however, the will not write only partially encoded characters."*). However, that does mean the write may not completely fill the buffer: it will prefer to leave the last bytes unused rather than write a partial character.
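A small illustration of that behavior, assuming UTF-8 and a 4-byte buffer chosen only for the example:

```javascript
const buf = Buffer.alloc(4);

// 'a' and 'b' take one byte each; '€' (U+20AC) needs three bytes,
// but only two remain, so it is not written at all.
const written = buf.write('ab\u20ac', 0, 'utf8');

// Bytes at the end of the buffer that the write left unused.
const unused = buf.length - written;
```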

When we needed to do this same thing ourselves in the past, I believe we found the most efficient solution was to just scan the string and calculate how many characters a given number of UTF-8 bytes corresponds to. This was much more efficient than allocating a new Buffer every time (the whole point of writing into an existing buffer was to avoid allocating buffers ;). There's probably a module somewhere, but I think we used the snippet below. I think you should be able to do var remaining = str.slice(bytesToOffs(str, buf.write(str)));

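// Returns the index into str (in UTF-16 code units) reached after
// consuming at most num_bytes bytes of its UTF-8 encoding.
function bytesToOffs(str, num_bytes) {
  var idx = 0;
  while (num_bytes > 0 && idx < str.length) {
    var c = str.charCodeAt(idx);
    var bytes;
    var units = 1; // UTF-16 code units this character occupies in str
    if (c <= 0x7F) {
      bytes = 1;
    } else if (c <= 0x07FF) {
      bytes = 2;
    } else if (c >= 0xD800 && c <= 0xDBFF && idx + 1 < str.length &&
        (str.charCodeAt(idx + 1) & 0xFC00) === 0xDC00) {
      // charCodeAt never returns more than 0xFFFF; characters above U+FFFF
      // arrive as a surrogate pair, which is one 4-byte UTF-8 sequence.
      bytes = 4;
      units = 2;
    } else {
      // Rest of the BMP; a lone surrogate is written as U+FFFD, also 3 bytes.
      bytes = 3;
    }
    if (bytes > num_bytes) {
      break; // this character did not fit
    }
    num_bytes -= bytes;
    idx += units;
  }
  return idx;
}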
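As an alternative to scanning by hand: newer Node releases expose a global TextEncoder whose encodeInto method reports both how many bytes were written and how many UTF-16 code units of the string were consumed. It was not available when this thread was written, so treat the following as a sketch that assumes a runtime providing it; the variable names are illustrative:

```javascript
const encoder = new TextEncoder();   // always encodes as UTF-8

const str = 'a\u00e9\u20acz';        // 1 + 2 + 3 + 1 bytes
const buf = Buffer.alloc(4);         // a Buffer is a Uint8Array, so it is a valid target

// `read` is the number of UTF-16 code units consumed from str,
// `written` is the number of bytes stored in buf.
const { read, written } = encoder.encodeInto(str, buf);

// Everything from index `read` onward still has to be processed.
const remaining = str.slice(read);
```

Because `read` is already an index into the string, no second pass over the string or extra allocation of a decoded copy is needed.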

Note: I last looked at this closely on Node 0.6 or so, so there might be better approaches now. Also, if you have a reasonable upper bound on your string size, it might be more efficient to keep one large, reused buffer to write your string into first, and use that for other operations (including copying bytes directly from that temp buffer to the other buffer for more efficiency, if you don't mind partial characters being written). Depending on what exactly you need the character index for, though, that might not help.
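The reused temp buffer idea could be sketched like this. The 64-byte scratch size, the 2-byte target and all names are assumptions made for the example, and chunks are cut at byte boundaries, so a multi-byte character can end up split across two chunks:

```javascript
// One scratch buffer, allocated once and reused for every string.
// Assumption: 64 bytes is a safe upper bound for the strings being written.
const scratch = Buffer.alloc(64);

const str = 'a\u20acb';                            // 1 + 3 + 1 = 5 bytes in UTF-8
const total = scratch.write(str, 0, 'utf8');       // encode the string exactly once

const target = Buffer.alloc(2);                    // small destination buffer
const pieces = [];
let off = 0;
while (off < total) {
  // copy() returns the number of bytes copied, limited by the target's size.
  const n = scratch.copy(target, 0, off, total);
  // Buffer.from(buffer) makes a copy; done here only to keep each chunk for inspection.
  pieces.push(Buffer.from(target.subarray(0, n)));
  off += n;
}

// Joining the chunks gives back the original bytes.
const roundTrip = Buffer.concat(pieces).toString('utf8');
```

Here the second chunk starts in the middle of the three-byte '€', which is the partial-character trade-off mentioned above.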

  Jimb

* Quoted bad grammar in the current live docs, PR sent.