practical application for files with holes

Rainer Weikusat

Feb 4, 2022, 2:09:11 PM
The first I've found, actually.

WebSockets is a mechanism for negotiating an HTTP protocol switch to a
message-boundary preserving, binary full-duplex protocol. ActionCable is
a Ruby-on-Rails framework for implementing chat-applications on top of
WebSockets via exchange of JSON messages. The basic idea is that clients
connect to an ActionCable cloud instance via HTTP, negotiate WebSockets
and then subscribe to so-called channels which can be additionally
subdivided into rooms. Clients can then send messages to channels/rooms
they're subscribed to and ActionCable acts as a seriously dumb multiport
repeater ('hub') in between, forwarding all messages received from any
subscriber to a room/channel to all other subscribers.

In the context of some application, this communication arrangement is
being (ab-)used to implement various remote system management functions
enabling control of a software appliance via a cloud-based web UI, eg,
interactive remote shell access and file uploads. For the usual
(pseudo-) reasons, ActionCable message processing is naively
multithreaded, ie, an incoming message gets processed by the next 'free'
message processing thread independently of all other messages. IOW,
messages sent to the ActionCable cloud instance will be rebroadcast
('sent to subscribers') randomly reordered.

The file upload bit is based on sending the file content as a sequence of
n (n >= 0) fixed-size chunks (8192 bytes of BASE64 text, ie, 6144
content bytes -- asking for an 8-bit clean transport in the world of
practical Unicode in 2022 is just asking for too much!), followed by a
single, possibly undersized final chunk. Random reordering of chunks is
obviously not desirable. Because of this, each chunk has a sequence
number and the receiver must use these numbers to restore the correct
ordering. UNIX supports so-called 'files with holes', ie, seeking to an
arbitrary position beyond the current end of a regular file and then
writing some data there. Considering this, the following algorithm can
be used to restore the correct chunk ordering.

fpos.seq denotes the sequence number of the last chunk which was written
to the file and fpos.bsz its (decoded) size. Both are initialized to
0. seq is the sequence number of the next received block, block_size
the (decoded) size of a full chunk.

1. Calculate t_pos = (seq - fpos.seq) * block_size - fpos.bsz
2. Do a relative seek (SEEK_CUR) by t_pos
3. Write the data from the new block to the file
4. Set fpos.seq to seq and fpos.bsz to the (decoded) size of the block
   just written
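
The steps above can be sketched in Python roughly as follows. This is
only an illustration under assumptions, not the code from the actual
application: the names ChunkWriter, write_chunk and reassemble are made
up for the example, BLOCK_SIZE is assumed to be the 6144 decoded bytes
mentioned earlier, and a POSIX system is assumed for os.pread.

```python
import os
import tempfile

BLOCK_SIZE = 6144  # assumed decoded size of a full chunk


class ChunkWriter:
    """Hypothetical helper: writes chunks arriving in any order to fd."""

    def __init__(self, fd):
        self.fd = fd
        self.seq = 0  # fpos.seq: sequence number of the last chunk written
        self.bsz = 0  # fpos.bsz: decoded size of the last chunk written

    def write_chunk(self, seq, data):
        # Step 1: distance from the current file offset to this chunk's slot.
        t_pos = (seq - self.seq) * BLOCK_SIZE - self.bsz
        # Step 2: relative seek; may go backwards or beyond the end of file.
        os.lseek(self.fd, t_pos, os.SEEK_CUR)
        # Step 3: write the chunk (looping in case of a short write).
        view = memoryview(data)
        while view:
            view = view[os.write(self.fd, view):]
        # Remember what was written so the next offset can be computed.
        self.seq, self.bsz = seq, len(data)


def reassemble(arrivals):
    """Write (seq, data) pairs in arrival order, return the file content."""
    fd, path = tempfile.mkstemp()
    try:
        writer = ChunkWriter(fd)
        for seq, data in arrivals:
            writer.write_chunk(seq, data)
        size = os.fstat(fd).st_size
        return os.pread(fd, size, 0)
    finally:
        os.close(fd)
        os.unlink(path)


# Three full chunks plus an undersized final one, delivered out of order.
chunks = [bytes([65 + i]) * BLOCK_SIZE for i in range(3)] + [b"tail"]
content = reassemble([(s, chunks[s]) for s in (2, 0, 3, 1)])
```

The only state kept between calls is the pair (seq, bsz), which is
enough because the file offset after a write is always
seq * BLOCK_SIZE + bsz.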

A nice property of this is that the algorithm is O(1) for each individual
block, regardless of how severely it (or any previous block) was received
out of order, and that it doesn't need any in-memory state for that
except the last sequence number and block size.
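
For as long as earlier chunks are still missing, their slots are simply
holes in the file. A small sketch of what a reader would see at that
point, again assuming a POSIX system; hole_demo is a made-up name and
the 6144-byte block size is taken from the description above:

```python
import os
import tempfile


def hole_demo(block_size=6144):
    """Write only chunk 2 and report the file size and the gap before it."""
    fd, path = tempfile.mkstemp()
    try:
        # Skip the slots of chunks 0 and 1, which have not arrived yet.
        os.lseek(fd, 2 * block_size, os.SEEK_CUR)
        os.write(fd, b"chunk 2")
        size = os.fstat(fd).st_size
        # Reading from a hole yields zero bytes.
        gap = os.pread(fd, 2 * block_size, 0)
        return size, gap
    finally:
        os.close(fd)
        os.unlink(path)


size, gap = hole_demo()
```

The reported size already covers the missing chunks, and the gap reads
back as zero bytes until the real data gets written there.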