Stream Cracked

Ilario Grijalva

Jul 26, 2024, 12:48:35 AM
to liquid-galaxy

This document contains two primary sections and a third section for notes. The first section explains how to use existing streams within an application. The second section explains how to create new types of streams.

The stream/promises API provides an alternative set of asynchronous utility functions for streams that return Promise objects rather than using callbacks. The API is accessible via require('node:stream/promises') or require('node:stream').promises.
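
As a minimal sketch, a promise-based pipeline that gzips a file might look like this (the file names are placeholders):

const { pipeline } = require('node:stream/promises');
const fs = require('node:fs');
const zlib = require('node:zlib');

async function run() {
  // Await the whole pipeline instead of wiring up callbacks.
  await pipeline(
    fs.createReadStream('archive.tar'),
    zlib.createGzip(),
    fs.createWriteStream('archive.tar.gz'),
  );
  console.log('Pipeline succeeded.');
}

run().catch(console.error);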

Remember to handle the signal argument passed into the async generator, especially when the async generator is the source for the pipeline (i.e. the first argument), or the pipeline will never complete.
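
A sketch of a pipeline whose source is an async generator, assuming an out.txt destination; the generator is called with an options object carrying the AbortSignal:

const { pipeline } = require('node:stream/promises');
const fs = require('node:fs');

async function run() {
  await pipeline(
    // As the pipeline source, the generator receives { signal }.
    // Check it (or pass it to awaited calls), otherwise an aborted
    // pipeline may never settle.
    async function* ({ signal }) {
      for (let i = 0; i < 100; i++) {
        if (signal.aborted) return;
        yield `chunk ${i}\n`;
      }
    },
    fs.createWriteStream('out.txt'),
  );
}

run().catch(console.error);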

It is possible, however, for stream implementations to work with other types of JavaScript values (with the exception of null, which serves a special purpose within streams). Such streams are considered to operate in "object mode".

The amount of data potentially buffered depends on the highWaterMark option passed into the stream's constructor. For normal streams, the highWaterMark option specifies a total number of bytes. For streams operating in object mode, the highWaterMark specifies a total number of objects. For streams operating on (but not decoding) strings, the highWaterMark specifies a total number of UTF-16 code units.
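
A small sketch contrasting the two interpretations of highWaterMark (the values are arbitrary):

const { Readable } = require('node:stream');

// In object mode, highWaterMark counts objects.
const objectStream = new Readable({
  objectMode: true,
  highWaterMark: 16,        // buffer up to 16 objects
  read() {},
});

// In normal (byte) mode, it counts bytes.
const byteStream = new Readable({
  highWaterMark: 64 * 1024, // buffer up to 64 KiB
  read() {},
});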

Data is buffered in Readable streams when the implementation calls stream.push(chunk). If the consumer of the Stream does not call stream.read(), the data will sit in the internal queue until it is consumed.

Once the total size of the internal read buffer reaches the threshold specified by highWaterMark, the stream will temporarily stop reading data from the underlying resource until the data currently buffered can be consumed (that is, the stream will stop calling the internal readable._read() method that is used to fill the read buffer).
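
The following sketch uses a deliberately tiny highWaterMark to make the threshold visible:

const { Readable } = require('node:stream');

let chunksLeft = 3;
const source = new Readable({
  highWaterMark: 4, // tiny threshold, for illustration only
  read() {
    if (chunksLeft === 0) {
      this.push(null); // end of stream
      return;
    }
    chunksLeft--;
    // push() returns false once the internal buffer reaches
    // highWaterMark; _read() will not be called again until some
    // of the buffered data has been consumed.
    if (this.push('data') === false) {
      console.log('buffer full; pausing the underlying resource');
    }
  },
});

source.on('data', (chunk) => console.log('consumed:', chunk.toString()));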

Data is buffered in Writable streams when the writable.write(chunk) method is called repeatedly. While the total size of the internal write buffer is below the threshold set by highWaterMark, calls to writable.write() will return true. Once the size of the internal buffer reaches or exceeds the highWaterMark, false will be returned.
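
A minimal sketch of the return value, using a slow destination so the buffer fills up:

const { Writable } = require('node:stream');

const sink = new Writable({
  highWaterMark: 8, // bytes, since this is not an object-mode stream
  write(chunk, encoding, callback) {
    setTimeout(callback, 100); // simulate a slow destination
  },
});

console.log(sink.write('1234'));  // true: buffer still below 8 bytes
console.log(sink.write('56789')); // false: buffer reached the threshold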

A key goal of the stream API, particularly the stream.pipe() method, is to limit the buffering of data to acceptable levels such that sources and destinations of differing speeds will not overwhelm the available memory.

The highWaterMark option is a threshold, not a limit: it dictates the amount of data that a stream buffers before it stops asking for more data. It does not enforce a strict memory limitation in general. Specific stream implementations may choose to enforce stricter limits, but doing so is optional.

Because Duplex and Transform streams are both Readable and Writable, each maintains two separate internal buffers used for reading and writing, allowing each side to operate independently of the other while maintaining an appropriate and efficient flow of data. For example, net.Socket instances are Duplex streams whose Readable side allows consumption of data received from the socket and whose Writable side allows writing data to the socket. Because data may be written to the socket at a faster or slower rate than data is received, each side should operate (and buffer) independently of the other.

The mechanics of the internal buffering are an internal implementation detail and may be changed at any time. However, for certain advanced implementations, the internal buffers can be retrieved using writable.writableBuffer or readable.readableBuffer. Use of these undocumented properties is discouraged.

Applications that are either writing data to or consuming data from a stream are not required to implement the stream interfaces directly and will generally have no reason to call require('node:stream').

The 'close' event is emitted when the stream and any of its underlying resources (a file descriptor, for example) have been closed. The event indicates that no more events will be emitted, and no further computation will occur.

The primary intent of writable.cork() is to accommodate a situation in which several small chunks are written to the stream in rapid succession. Instead of immediately forwarding them to the underlying destination, writable.cork() buffers all the chunks until writable.uncork() is called, which will pass them all to writable._writev(), if present. This prevents a head-of-line blocking situation where data is being buffered while waiting for the first small chunk to be processed. However, use of writable.cork() without implementing writable._writev() may have an adverse effect on throughput.

Destroy the stream. Optionally emit an 'error' event, and emit a 'close' event (unless emitClose is set to false). After this call, the writable stream has ended and subsequent calls to write() or end() will result in an ERR_STREAM_DESTROYED error. This is a destructive and immediate way to destroy a stream. Previous calls to write() may not have drained, and may trigger an ERR_STREAM_DESTROYED error. Use end() instead of destroy() if data should flush before close, or wait for the 'drain' event before destroying the stream.
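
A short sketch of that behavior:

const { Writable } = require('node:stream');

const writable = new Writable({
  write(chunk, encoding, callback) { callback(); },
});

writable.on('error', (err) => console.error('error:', err.message));
writable.on('close', () => console.log('closed'));

writable.destroy(new Error('tear down')); // emits 'error', then 'close'

// Subsequent writes fail with ERR_STREAM_DESTROYED.
writable.write('late chunk', (err) => console.error(err.code));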

Calling the writable.end() method signals that no more data will be written to the Writable. The optional chunk and encoding arguments allow one final additional chunk of data to be written immediately before closing the stream.
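
For example (the file name is a placeholder):

// Write 'hello, ' and then end with 'world!'.
const fs = require('node:fs');
const file = fs.createWriteStream('example.txt');
file.write('hello, ');
file.end('world!');
// Writing more now is not allowed!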

When using writable.cork() and writable.uncork() to manage the buffering of writes to a stream, defer calls to writable.uncork() using process.nextTick(). Doing so allows batching of all writable.write() calls that occur within a given Node.js event loop phase.
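
A sketch of the corking pattern, with a writev() implementation supplied so the batching is observable:

const { Writable } = require('node:stream');

const stream = new Writable({
  // When implemented, _writev() receives all corked chunks at once.
  writev(chunks, callback) {
    console.log(`flushing ${chunks.length} chunks in one batch`);
    callback();
  },
});

stream.cork();
stream.write('some ');
stream.write('data ');
// Deferring uncork() batches every write() made in this phase.
process.nextTick(() => stream.uncork());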

The writable.write() method writes some data to the stream, and calls the supplied callback once the data has been fully handled. If an error occurs, the callback will be called with the error as its first argument. The callback is called asynchronously and before 'error' is emitted.

The return value is true if, after admitting chunk, the internal buffer is still less than the highWaterMark configured when the stream was created. If false is returned, further attempts to write data to the stream should stop until the 'drain' event is emitted.

While a stream is not draining, calls to write() will buffer chunk, and return false. Once all currently buffered chunks are drained (accepted for delivery by the operating system), the 'drain' event will be emitted. Once write() returns false, do not write more chunks until the 'drain' event is emitted. While calling write() on a stream that is not draining is allowed, Node.js will buffer all written chunks until maximum memory usage occurs, at which point it will abort unconditionally. Even before it aborts, high memory usage will cause poor garbage collector performance and high RSS (which is not typically released back to the system, even after the memory is no longer required). Since TCP sockets may never drain if the remote peer does not read the data, writing to a socket that is not draining may lead to a remotely exploitable vulnerability.

Writing data while the stream is not draining is particularly problematic for a Transform, because the Transform streams are paused by default until they are piped or a 'data' or 'readable' event handler is added.

If the data to be written can be generated or fetched on demand, it is recommended to encapsulate the logic into a Readable and use stream.pipe(). However, if calling write() is preferred, it is possible to respect backpressure and avoid memory issues using the 'drain' event:
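
A sketch of that pattern, assuming stream is any Writable (here a deliberately slow one):

const { Writable } = require('node:stream');

const stream = new Writable({
  highWaterMark: 4,
  write(chunk, encoding, callback) { setTimeout(callback, 10); },
});

function write(data, cb) {
  if (!stream.write(data)) {
    // Buffer is full: wait for 'drain' before writing again.
    stream.once('drain', cb);
  } else {
    // Still below highWaterMark: continue on the next tick.
    process.nextTick(cb);
  }
}

// Wait for cb to be called before doing any other write.
write('hello', () => {
  console.log('Write completed, do more writes now.');
});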

Readable streams effectively operate in one of two modes: flowing and paused. These modes are separate from object mode. A Readable stream can be in object mode or not, regardless of whether it is in flowing mode or paused mode.

The important concept to remember is that a Readable will not generate data until a mechanism for either consuming or ignoring that data is provided. If the consuming mechanism is disabled or taken away, the Readable will attempt to stop generating the data.

For backward compatibility reasons, removing 'data' event handlers will not automatically pause the stream. Also, if there are piped destinations, then calling stream.pause() will not guarantee that the stream will remain paused once those destinations drain and ask for more data.

If a Readable is switched into flowing mode and there are no consumers available to handle the data, that data will be lost. This can occur, for instance, when the readable.resume() method is called without a listener attached to the 'data' event, or when a 'data' event handler is removed from the stream.

Adding a 'readable' event handler automatically makes the stream stop flowing, and the data has to be consumed via readable.read(). If the 'readable' event handler is removed, then the stream will start flowing again if there is a 'data' event handler.
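
A minimal sketch of 'readable'-based consumption:

const { Readable } = require('node:stream');

const readable = Readable.from(['one', 'two', 'three']);

// Attaching 'readable' stops the flow; data must be pulled
// explicitly with read() until it returns null.
readable.on('readable', () => {
  let chunk;
  while ((chunk = readable.read()) !== null) {
    console.log(`received: ${chunk}`);
  }
});

readable.on('end', () => console.log('done'));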

When readable.readableFlowing is null, no mechanism for consuming the stream's data is provided. Therefore, the stream will not generate data. While in this state, attaching a listener for the 'data' event, calling the readable.pipe() method, or calling the readable.resume() method will switch readable.readableFlowing to true, causing the Readable to begin actively emitting events as data is generated.

Calling readable.pause(), readable.unpipe(), or receiving backpressure will cause readable.readableFlowing to be set to false, temporarily halting the flowing of events but not halting the generation of data. While in this state, attaching a listener for the 'data' event will not switch readable.readableFlowing to true.
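
The transitions can be observed directly; a sketch using a PassThrough:

const { PassThrough } = require('node:stream');

const stream = new PassThrough();
console.log(stream.readableFlowing); // null: no consumer yet

stream.on('data', (chunk) => console.log(`data: ${chunk}`));
console.log(stream.readableFlowing); // true: 'data' listener attached

stream.pause();
console.log(stream.readableFlowing); // false: explicitly paused

// While paused, adding another 'data' listener does not resume;
// resume() must be called explicitly.
stream.resume();
console.log(stream.readableFlowing); // true again

stream.end('hello');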

The Readable stream API evolved across multiple Node.js versions and provides multiple methods of consuming stream data. In general, developers should choose one of the methods of consuming data and should never use multiple methods to consume data from a single stream. Specifically, using a combination of on('data'), on('readable'), pipe(), or async iterators could lead to unintuitive behavior.
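
For instance, picking async iteration and nothing else is a straightforward choice:

const { Readable } = require('node:stream');

async function consume() {
  const readable = Readable.from(['a', 'b', 'c']);
  // One consumption method only: async iteration handles
  // backpressure and cleanup automatically.
  for await (const chunk of readable) {
    console.log(chunk);
  }
}

consume().catch(console.error);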
