Meta: a usenet server just for sci.math

Ross A. Finlayson

Dec 1, 2016, 11:24:47 PM

I have an idea here to build a usenet server
only for sci.math and sci.logic. The idea is
to find archives of sci.math and sci.logic and
to populate a store of the articles in a more
or less enduring form (say, "on the cloud"),
then to offer some usual news server access
then to, say, 1 month 3 month 6 month retention,
and then some cumulative retention (with a goal
of unlimited retention of sci.math and sci.logic
articles). The idea would be to have basically
various names of servers then reflect those
retentions for various uses for a read-only
archival server and a read-only daily server
and a read-and-write posting server. I'm willing
to invest time and effort to write the necessary
software and gather existing archives and integrate
with existing usenet providers to put together these

Then, where basically it's in part an exercise
in vanity, I've been cultivating some various
notions of how to generate some summaries or
reports of various posts, articles, threads, and
authors, toward the specialization of the cultivation
of summary for reporting and research purposes.

So, I wonder others' idea about such a thing and
how they might see it as a reasonably fruitful
thing, basically for the enjoyment and for the
most direct purposes of the authors of the posts.

I invite comment, as I have begun to carry this out.

Ross A. Finlayson

Dec 2, 2016, 2:19:23 PM
So far I've read through the NNTP specs and looked
a bit at the INND code. Then, the general idea is
to define a filesystem layout convention, that then
would be used for articles, then for having those
on virtual disks (eg, "EBS volumes") or cloud storage
(eg, "S3") in essentially a Write-Once-Read-Many
configuration, where the goal is to implement data
structures that have a forward state machine so that
they remain consistent with unreliable computing
resources (eg, "runtimes on EC2 hosts"), and that
are readily cacheable (and horizontally scaleable).

Then, the runtimes are of the collection and maintenance
of posts ("infeeds" and "outfeeds", backfills), about
summary generation (overview, metadata, key extraction,
information content, working up auto-correlation), then
reader servers, then some maintenance and admin. As a
usual software design principle there is a goal of the
both "stack-on-a-box" and also "abstraction of resources"
and a usual separation of domain, library, routine, and
runtime logic.

So basically it looks like:
1) gather mbox files of sci.math and sci.logic
2) copy those to archive inputs
3) break those out into a filesystem layout for each article
(there are various filesystems that support this many files
these days)
4) generate partition and overview summaries
5) generate various revisioning schemes (the "article numbers"
of the various servers)
6) figure out the incremental addition and periodic truncation
7) establish a low-cost but high-availability endpoint runtime
8) make elastic/auto-scaling service routine behind that
9) have opportunistic / low cost periodic maintenance
10) emit that as a configuration that anybody can run
as "stack-on-a-box" or with usual "free tier" cloud accounts
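
Step 3 could start from something like the following sketch. It assumes the common mbox convention that every message begins at a line starting with "From "; the class and method names are hypothetical, and real archives differ in how they escape body lines that begin with "From ", which this sketch does not handle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits mbox text into one string per article.
// Assumption: each message begins with a separator line starting with "From ".
class MboxSplitter {
    static List<String> split(String mbox) {
        List<String> articles = new ArrayList<>();
        StringBuilder current = null;
        for (String line : mbox.split("\n")) {
            if (line.startsWith("From ")) {
                // A separator line closes the previous article and opens a new one.
                if (current != null) {
                    articles.add(current.toString());
                }
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            articles.add(current.toString());
        }
        return articles;
    }
}
```

Each returned string could then be written to its own file under whatever layout convention is chosen.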

Ross A. Finlayson

Dec 4, 2016, 9:45:04 PM
I've looked into this a bit more and the implementation is
starting to look along these lines.

First there's the ingestion side, or "infeed", basically
the infeed connects and pushes articles. Here then the
basic store of the articles will be an object store (or
here "S3" as an example object store). This is durable
and the object keys are the article's "unique" message-id.

If the message-id already exists in the store, then the
infeed just continues.

The article is stored with matching the message-id, noting
the body offset, and counting the lines, and storing that
with the object. Then, the message-id, pushed to
a queue, can also carry the headers as extracted from
the article, that are relevant to the article and overview,
and the arrival date or effective arrival date. The slow-
and-steady database worker (or, distributed data structure
on "Dynamo tables") then retrieves a queue item, at some
metered rate, and gets an article number for each of the
newsgroups (by some conditional update that might starve a thread)
for each group that is in the newsgroups of the article and
some "all" newsgroup, so that each article also has a (sequential) number.
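
A minimal sketch of the store-if-absent step, using an in-memory map as a stand-in for the object store. The names ArticleStore, Stored and ingest are hypothetical, and it assumes the article text uses CRLF line endings as on the wire.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for the object store, keyed by message-id.
class ArticleStore {
    // What is kept alongside the raw article text.
    static final class Stored {
        final String raw;
        final int bodyOffset; // index of the first character of the body
        final int bodyLines;  // number of CRLF-terminated lines in the body

        Stored(String raw, int bodyOffset, int bodyLines) {
            this.raw = raw;
            this.bodyOffset = bodyOffset;
            this.bodyLines = bodyLines;
        }
    }

    private final ConcurrentMap<String, Stored> objects = new ConcurrentHashMap<>();

    // Returns true if the article was newly stored, false if the message-id
    // already existed (in which case the infeed just continues).
    boolean ingest(String messageId, String raw) {
        int sep = raw.indexOf("\r\n\r\n"); // the blank line that ends the headers
        int bodyOffset = sep < 0 ? raw.length() : sep + 4;
        int bodyLines = 0;
        for (int i = raw.indexOf("\r\n", bodyOffset); i >= 0; i = raw.indexOf("\r\n", i + 2)) {
            bodyLines++;
        }
        return objects.putIfAbsent(messageId, new Stored(raw, bodyOffset, bodyLines)) == null;
    }

    Stored get(String messageId) {
        return objects.get(messageId);
    }
}
```

A second ingest of the same message-id returns false and leaves the first copy untouched, which is the "just continues" case.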

Assigning a sequence is a bit the wicket, because, here
there's basically "eventual consistency" and "forward safe"
operations. Any of the threads, connections, or boxes
could die at any time, then the primary concern is "no
drops, then, no dupes". So, there isn't really a transactional
context to make atomic "for each group, give it the next
sequence value, doing that together for each group's numbering
of articles in an atomic transaction". Luckily, while NNTP
requires strictly increasing values, it allows gaps in the
sequences. So, here, when mapping article-number to message-id
and message-id to article-number, if some other thread has
already stored a value for that article-number, then it can
be re-tried until there is an unused article-number. Updating
the high-water mark can fail if it was updated by another thread,
then to re-try again with the new, which could lead to starvation.
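
The retry-until-unused idea can be sketched locally as below. This is only an illustration under assumptions: a ConcurrentHashMap and an AtomicLong stand in for the conditional updates of a remote table, and the class name GroupNumbering is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-group numbering: numbers only increase, gaps are allowed.
class GroupNumbering {
    private final ConcurrentMap<Long, String> numberToId = new ConcurrentHashMap<>();
    private final AtomicLong highWater;

    GroupNumbering(long lastAssigned) {
        this.highWater = new AtomicLong(lastAssigned);
    }

    long assign(String messageId) {
        long candidate = highWater.get() + 1;
        // Conditional put: if another thread already took this number,
        // retry with the next one until an unused number is found.
        while (numberToId.putIfAbsent(candidate, messageId) != null) {
            candidate++;
        }
        // Conditional update of the high-water mark: retry when another
        // thread moved it first, and never move it backwards.
        long seen;
        while ((seen = highWater.get()) < candidate) {
            if (highWater.compareAndSet(seen, candidate)) {
                break;
            }
        }
        return candidate;
    }

    long highWater() {
        return highWater.get();
    }

    String messageIdAt(long number) {
        return numberToId.get(number);
    }
}
```

Both loops can spin under contention, which is the starvation concern mentioned above.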

(There's a notion then, when an article-number is assigned, to
toss that back onto queue for the rest of the transaction to
be carried out.)

Then, this having established a data structure for the message
store, these are basically the live data structures, distributed,
highly available, fault-tolerant and maintenance free, this
implements the basic function for getting feeds (or new articles)
and also the reader capability, which is basically a protocol
listener that maintains the reader's current group and article.

To implement then some further features of NNTP, there's an idea
to store the article numbers for each group and "all" basically
a bucket for each time period (eg, 1 day), then, that scans over
the articles by their numbers find those as the partitions, then
that sequentially (or rather, increasingly) the rest follow.

To omit or remove articles or expire them for no-archive, that
is basically ignored, but the idea is to maintain for the all
group series of 1000 or 10000 articles then for what offsets in
those series are cancelled. Basically the object store is
write-once, immutable, and flat, where it's yet to be determined
how to backfill the article store from archive files or suck
feeds from live servers with long retentions. Then there's an
idea to start the numbering at 1 000 000 or so and then have
plenty of ranges where to fill in articles as archived or
according to their receipt date header.

Then, as the primary data stores would basically just implement
a simple news server, there are two main notions of priority,
to implement posting and to implement summaries and reports.

Then, as far as I can tell, this pretty much fits within the
"free tier" then that it's pretty economical.

David Melik

Dec 4, 2016, 9:57:19 PM
What about including all the sci.math.* and alt.math.*?

Ross A. Finlayson

Dec 4, 2016, 11:05:10 PM
It's a matter of scale and configuration.

It should scale quite well enough, though at some point
it would involve some money. In rough terms, it looks
like storing 1MM messages is ~$25/month, and supporting
readers is a few cents a day but copying it would be
twenty or thirty dollars. (I can front that.)

I'm for it where it might be useful, where I hope to
establish an archive with the goal of indefinite retention,
and basically to present an archive and for my own
purposes to generate narratives and timelines.

The challenge will be to get copies of archives of these
newsgroups. Somebody out of news.admin.peering might
have some insight into who has the Dejanews CDs or what
there might be in the Internet Archive Usenet Archive,
then in terms of today's news servers which claim about
ten years retention. Basically I'm looking for twenty
plus years of retention.

Now, some development is underway, and in no real hurry.
Basically I'm looking at the runtimes and a software
library to be written, (i.e., interfaces for the components
above and local file-system versions for stack-on-a-box,
implementing a subset of NNTP, in a simple service runtime
that idles really low).

Then, as above, it's kind of a vanity project or author-centric,
about making it so that custom servers could be stood up with
whatever newsgroups you want with the articles filtered
however you'd so care, rendered variously.

Yuri Kreaton

Dec 5, 2016, 12:19:39 AM
talk to the other news group server admins out there, read their web
information, call the dudes or email them, some will hook you up free.
you attach, and you get all sci.math they have and get updates, I think
for free, suck, feed the linkup are called. they may also want to link
up with you for redundancy

not much sw to write either, look for newsgroups w server in the name
for info, some were very good 5 years ago

Ross A. Finlayson

Dec 6, 2016, 7:40:26 PM
I've been studying this a bit more.

I set up a linux development environment
by installing ubuntu to a stick PC, then
installing vim, gcc, java, mvn, git. While
ubuntu is a debian distribution and Amazon
Linux (a designated target) is instead along
the lines of RedHat/Yellowdog (yum, was rpm,
instead of apt-get, for component configuration),
then I'm pretty familiar with these tools.

Looking to the available components, basically
the algorithm is being designed with data
structures that can be local or remote. Then,
these are usually that much more complicated
than just the local or just the remote, and
here also besides the routine or state machine
also the exception or error handling and the
having of the queues everywhere for both
throttling and delay-retries (besides the
usual inline re-tries and about circuit
breaker). So, this is along the lines of
"this is an object/octet store" (and AWS
has an offering "Elastic File System" which
is an NFS Networked File System that looks
quite the bit more economical than S3 for
this purpose), "this is a number allocator"
(without sequence.nextVal in an RDBMS, the
requirements allow some gaps in the sequence,
here to use some DynamoDB table attribute's
"atomic counter"), then along the lines of
"this is a queue" and separately "I push to
queues" and "I pop queues", and about "queue
this for right now" and "queue this for later".
Then, there's various mappings, like id to number
and number to id, where again for no-drops / no-dupes
/ Murphy's-law that the state of the mappings is
basically "forward-safe" and that retries make
the system robust and "self-healing". Other mappings
include a removed/deleted bag, this basically looks
like a subset of a series or range of the assigned
numbers, of the all-table and each group-table,
basically numbers are added as attributes to the
item for the series or range.

Octet Store

Then, as noted above, with Murphy's law, any of the
edges of the flowgraph can break at any time, about
the request/response each that defines the boundary
(and a barrier), there is basically defined an abstract
generic exception "TryableException" that has only two
subclasses, "Retryable" and "Nonretryable". Then, the
various implementations of the data structures in the
patterns of their use variously throw these in puking
back the stack trace, then for inline re-tries, delay
re-tries, and fails. Here there's usually a definition
of "idempotence" for methods that are re-tryable besides
exceptions that might go away. The idea is to build
this into the procedure, so it's all built at compile-
time the correctness of the composition of the steps
of the flowgraph of the procedure.
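
A minimal sketch of that exception hierarchy with an inline-retry helper. The three exception names come from the description above; the helper class Retry, its signature, and the choice to rethrow the last Retryable (so a caller could queue a delayed retry) are assumptions. It assumes Java 8 or later.

```java
import java.util.concurrent.Callable;

// Abstract base with exactly two subclasses, as described above.
abstract class TryableException extends Exception {
    TryableException(String message, Throwable cause) {
        super(message, cause);
    }
}

class Retryable extends TryableException {
    Retryable(String message) {
        super(message, null);
    }
}

class Nonretryable extends TryableException {
    Nonretryable(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical helper for inline retries of an idempotent step.
class Retry {
    static <T> T inline(int maxAttempts, Callable<T> step) throws TryableException {
        Retryable last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.call();
            } catch (Retryable e) {
                last = e; // the condition might go away, so try again
            } catch (TryableException e) {
                throw e;  // Nonretryable: fail immediately
            } catch (Exception e) {
                // Anything unclassified is treated as non-retryable.
                throw new Nonretryable("unexpected failure", e);
            }
        }
        if (last == null) {
            throw new Nonretryable("no attempts were made", null);
        }
        // Inline retries are exhausted; the caller may schedule a delayed retry.
        throw last;
    }
}
```

Because both subclasses are checked exceptions, the compiler forces each step of a workflow to declare or handle them.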

Then, for the runtime, basically it will be some Java
container on the host or in a container, with basically
a cheap simple watchdog/heartbeat that uses signals on
unix (posix) to be keeping the service/routine nodes
(that can fail) up, to bounce (restart) them with signals,
and to reasonably fail and alarm if thrashing of the
child process of the watchdog/nanny, with maybe some
timer update up to the watchdog/heartbeat. Then basically
this runner executes the routine/workflow logic in the jar,
besides that then a mount of the NFS being the only admin
on the box, everything else being run up out of the
environment from the build artifact.

The build artifact then looks that I'd use Spring for
wiring a container and also configuration profiles and
maybe Spring AOP and this kind of thing, i.e., just
spring-core (toward avoiding "all" of spring-boot).

Then, with local (in-memory and file) and remote
(distributed) implementations, basically the
design is to the distributed components, making
abstract those patterns then implementing for the
usual local implementation as standard containers
and usual remote implementation as building transactions
and defined behavior over the network.

Ross A. Finlayson

Dec 9, 2016, 4:46:54 PM
Having been researching this a bit more, and
tapping at the code, I've written out most of
the commands then to build a state machine of
the results, and, having analyzed the algorithm
of article ingestion and group and session state,
have defined interfaces suitable either for local
or remote operation, with the notion that local
operation would be self-contained (with a quite
simple file backing) while remote operation would
be quite usually durable and horizontally scalable.

I've written up a message reader/writer interface
(or "Scanner" and "Printer") for non-blocking I/O
and implementing reading Commands and writing Results
via non-blocking I/O. This should allow connection
scaling, with threads on accepter/closer and reader/
writer and an execution pool for the commands. The
Scanner and Printer use some BufferPool (basically
about 4*1024 or 4K buffers), with an idea that that's
pretty much all the I/O usage of RAM and is reasonably
efficient, and that if RAM is hogged it's simple enough
to self-throttle the reader for the writer to balance
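
A fixed-size pool of 4 KiB buffers might look like the sketch below. The class name follows the BufferPool mentioned above, but its methods are assumptions: acquire returns null when the pool is exhausted, which a reader can treat as the signal to stop reading until the writer gives buffers back.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical bounded pool: all I/O buffering is capped at pages * 4 KiB of RAM.
class BufferPool {
    static final int PAGE = 4 * 1024;
    private final BlockingQueue<ByteBuffer> free;

    BufferPool(int pages) {
        free = new ArrayBlockingQueue<>(pages);
        for (int i = 0; i < pages; i++) {
            free.add(ByteBuffer.allocate(PAGE));
        }
    }

    // Returns a cleared buffer, or null when none are free. A reader that
    // gets null can pause until the writer releases buffers (self-throttling).
    ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        if (b != null) {
            b.clear();
        }
        return b;
    }

    // Hands a buffer back to the pool once its contents have been written out.
    void release(ByteBuffer b) {
        free.offer(b);
    }

    int available() {
        return free.size();
    }
}
```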

About the runtime, basically the idea is to have it
installable as a "well-known service" for "socket
activation" as via inetd or systemd. The runtime is
really rather lean and starts quickly, here on-demand,
that it can be configured as "on-demand" or "long-running".
For some container without systemd or the equivalent,
it could have a rather lean nanny. There's some notion
of integrating heartbeat or status about Main.main(),
then that it runs as "java -jar nntp.jar".

Where the remote backing store or article file system
is some network file system, it also seems that the
runtime would so configure dependency on its file system
resource with quite usual system configuration tools,
for a fault-tolerant and graceful box that reboots as activable.

It interests me that SMTP is quite similar to NNTP. With
an idea of an on-demand server, which is quite rather usual,
these service nodes run on the smallest cloud instances
(here the "t2.nano") and scale to traffic, with a very low
idle or simply the "on-demand" (then for "containerized").

About usenet then I've been studying what it would mean to
be compliant and for example what to do with some "control" or
"junk" (sideband) groups and otherwise what it would mean
and take to make a horizontally scalable elastic cloud
usenet server (and persistent store). This is where the
service node is quite lean, the file store and database
(here of horizontally scalable "tables") is basically unbounded.

Ross A. Finlayson

Dec 11, 2016, 7:19:27 PM
I've collected what RFC's or specs there are for usenet,
then having surveyed the most of the specified use cases,
have cataloged descriptions of the commands about the protocol
that they are self-contained descriptions within the protocol
of each command. Then, for where there is the protocol and
perhaps any exchange or change of the protocol, for example
for TLS, then that is also being worked into the state machine
of sorts (simply enough a loop over the input buffer to generate
command values from the input given the command descriptions),
for that then as commands are generated (and maintained in their
order) that the results (eg, in the parallel) are thus computed
and returned (again back in the order).

Then, within the protocol, and basically for encryption and
compression, these are established within the protocol instead
of, for example, externally to the protocol. So, there is
basically a filter between the I/O reader and I/O writer and
the scanner and the printer, as it were, that scans input data
to commands and writes command results to output data. This is
again with the "non-blocking I/O" then about that the blocks or
buffers I've basically settled to 4 kibibyte (4 KiB) buffers, where,
basically an entire input or output in the protocol (here a message
body or perhaps a list of up to all the article numbers) would be
buffered (in RAM), so I'm looking to spool that off to disk if it
so results that essentially unbounded inputs and outputs are to be
handled gracefully in the limited CPU, RAM, I/O, and disk resources
of the usually quite reliable but formally unreliable computing node
(and at cost).

The data structures for access and persistence evolve as the in-memory
and file-based local editions and networked or cloud remote editions.
The semantics are built out to the remote editions, as then they can be
erased in the difference for efficiencies of the local editions.
The in-memory structures (with the article bodies themselves yet
actually written to a file store) are quite efficient and bounded
by RAM or the heap, the file-based structures which makes use of the
memory-mapped files as you may well know comprise all the content of
"free" RAM caching the disk files may be mostly persistent with
a structure that can be bounded by disk size, then the remote network-
based structures here have a usual expectation of being highly reliable
(i.e., that the remote files, queues, and records have a higher reliability
than any given component in their distributed design, at the corresponding
cost in efficiency and direct performance, but of course, this is design
for correctness).

So, that said, then I'm tapping away at the implementation of a queue of
byte buffers, or the I/O RAM convention. Basically, there is some I/O,
and it may or may not be a complete datum or event in the protocol, which
is 1-client-1-server or a stateful protocol. So, what is read off the
I/O buffer, so the I/O controller can service that and other I/O lines,
is copied to a byte buffer. Then, this is to be filtered as above as
necessary, that it is copied to a list of byte buffers (a double ended
queue or linked list). These buffers maintain their current position
and limit, from their beginning, the "buffer" is these pointers and the
data itself. So, that's their concrete type already, then the scanner
or printer also maintains its scan or print position, that the buffer can
be filled and holds some data, then that as the scan pointer moves past
a buffer boundary, that buffer can be reclaimed, with only moving the
scan pointer when a complete datum is read (here as defined for the scanner
in small constant terms by the command descriptions as above).
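
The buffer queue and scan pointer can be sketched as follows, taking a CRLF-terminated line as the "complete datum". That choice, the class name LineScanner and its methods are assumptions for illustration; the real scanner is driven by the command descriptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical scanner over a queue of byte buffers.
class LineScanner {
    private final Deque<ByteBuffer> buffers = new ArrayDeque<>();

    // Adds a buffer that is ready for reading (position..limit holds data).
    void offer(ByteBuffer readable) {
        buffers.addLast(readable);
    }

    // Returns the next complete line without its CRLF, or null if no
    // complete line is buffered yet; nothing is consumed in that case.
    String nextLine() {
        int total = 0;      // bytes up to and including the CRLF
        boolean found = false;
        byte prev = 0;
        scan:
        for (ByteBuffer b : buffers) {
            for (int i = b.position(); i < b.limit(); i++) {
                byte c = b.get(i); // absolute get: the position does not move
                total++;
                if (prev == '\r' && c == '\n') {
                    found = true;
                    break scan;
                }
                prev = c;
            }
        }
        if (!found) {
            return null;
        }
        byte[] line = new byte[total];
        int copied = 0;
        while (copied < total) {
            ByteBuffer head = buffers.peekFirst();
            int n = Math.min(head.remaining(), total - copied);
            head.get(line, copied, n); // relative get: advances the position
            copied += n;
            if (!head.hasRemaining()) {
                buffers.removeFirst(); // fully scanned, so it can be reclaimed
            }
        }
        return new String(line, 0, total - 2, StandardCharsets.US_ASCII);
    }

    int buffered() {
        return buffers.size();
    }
}
```

A line split across two buffers is returned only once its CRLF has arrived, and the first buffer is dropped from the queue as soon as the scan has passed it.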

So, that is pretty much sorted out, then about that basically it should
ingest articles just fine and be a mostly compliant NNTP server.

Then, generating the overview and such is another bit to get figured out,
which is summary.

Another thing in this design to get figured out is how to implement the
queue and database action for the remote, where, the cost efficiency of
the (managed, durable, redundant) remote database, is on having a more-or-
less constant (and small) rate of reads and writes. Then the distributed
queue will hold the backlog, but, the queue consumer is to be constant
rate not for the node but for the fleet, so I'm looking at how to implement
some leader election (fault-tolerance) or otherwise to have loaner threads
of the runtime for any service of the queue. This is where, ingestion is
de-coupled from inbox, so, there's an idea of having a sentinel queue consumer
(because this data might be high volume or low or zero) on a publish/subscribe,
it listens to the queue and if it gets an item it refuses it and wakes up
the constant-rate (or spiking) queue consumer workers, that then proceed
with the workflow items and then retire themselves if and when traffic drops
to zero again, standing back up the sentinel consumer.

Anyways that's just about how to handle variable load but here there's
that it's OK for the protocol to separate ingestion and inbox, otherwise
establishing the completion of the workflow item from the initial request
involves usual asynchronous completion considerations.

So, that said, then, the design is seeming pretty flexible, then about,
what extension commands might be suitable. Here the idea is about article
transfer and which articles to transfer to other servers. The idea is to
add some X-RETRANSFER-TO command or along these lines,

X-RETRANSFER-TO host [group [dateBegin [dateEnd]]]

then that this simply has the host open a connection to the other host
and offer via IHAVE/CHECK/TAKETHIS all the articles so in the range
or until the connection is closed. This way then, for example, if this
NNTP system was running, and, someone wanted a subset of the articles,
then this command would have them sent out-of-band, or, "automatic out-feed".
Figuring out how to re-distribute or message routing besides simple
message store and retrieval is its own problem.
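
Parsing the proposed command line might look like this sketch. The command syntax is the one given above; the class name is hypothetical, and since no date format has been specified the dates are kept as opaque strings.

```java
// Hypothetical parser for: X-RETRANSFER-TO host [group [dateBegin [dateEnd]]]
class RetransferCommand {
    final String host;
    final String group;     // null when omitted
    final String dateBegin; // null when omitted; format not yet specified
    final String dateEnd;   // null when omitted; format not yet specified

    private RetransferCommand(String host, String group, String dateBegin, String dateEnd) {
        this.host = host;
        this.group = group;
        this.dateBegin = dateBegin;
        this.dateEnd = dateEnd;
    }

    static RetransferCommand parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 2 || t.length > 5 || !t[0].equalsIgnoreCase("X-RETRANSFER-TO")) {
            throw new IllegalArgumentException(
                "syntax: X-RETRANSFER-TO host [group [dateBegin [dateEnd]]]");
        }
        return new RetransferCommand(
            t[1],
            t.length > 2 ? t[2] : null,
            t.length > 3 ? t[3] : null,
            t.length > 4 ? t[4] : null);
    }
}
```

For example, parsing "X-RETRANSFER-TO news.example.org sci.math 20160101" yields a host, a group and a begin date, with the end date left null (the host name and date here are made up).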

Another issue is expiry, I don't really intend to delete anything, because
the purpose is archival, but people still use usenet in some corners of
the internet for daily news, again that's its own problem. Handling
out-of-order ingestion with the backfilling or archives as they can be
discovered is another issue, with that basically being about filling a
corpus of the messages, then trying to organize them that the message
date is effectively the original injection date.

Anyways, it proceeds along these lines.

Ross A. Finlayson

Dec 13, 2016, 3:05:13 AM
One of the challenges of writing this kind of system
is vending the article-id's (or article numbers) for
each newsgroup of each message-id. The message-id is
received with the article as headers and body, or set
as part of the injection info when the article is posted.
So, vending a number means that there is known a previous
number to give the next. Now, this is clear and simple
in a stand-alone environment, with integer increment or
"x = i++". It's not so simple in a distributed environment,
with that the queuing system does not "absolutely guarantee"
no dupes, with the priority being no drops, and also, the
independent workers A and B can't know the shared value of
x to make and take atomic increments, without establishing
a synchronization barrier, here over the network, which is
to be avoided (eg, blocking and locking on a database's
critical transactional atomic sequence.nextval, with, say,
a higher guarantee of no gaps). So, there is a database
for vending strictly increasing numbers, each group of
an article has a current number and there's an "atomic
increment" feature thus that A working on A' will get
i+1 and B working on B' will get i+2 (or maybe i+3, if
for example the previous edition of B died). If A working
on A' and B working on A' duplicated from the queue get
i+1 and i+2, then, there is as mentioned above a conditional
update to make sure the article number always increases,
so there is a gap from the queue dupe or a gap from the
worker drop, but then A or B has a consistent view of the
article-id of A' or B'.
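
The effect of a duplicated queue item can be illustrated with a local stand-in, where an AtomicLong plays the database's atomic counter and putIfAbsent plays the conditional update. The class name NumberVendor is hypothetical and the sketch covers a single group.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical vendor of article numbers for one group.
class NumberVendor {
    private final AtomicLong counter;
    private final ConcurrentMap<String, Long> idToNumber = new ConcurrentHashMap<>();

    NumberVendor(long lastVended) {
        this.counter = new AtomicLong(lastVended);
    }

    // Safe to call more than once for the same message-id (queue dupes):
    // every caller gets the same answer, and unused numbers become gaps.
    long numberFor(String messageId) {
        long vended = counter.incrementAndGet();                 // atomic increment
        Long winner = idToNumber.putIfAbsent(messageId, vended); // conditional update
        return winner == null ? vended : winner;
    }
}
```

If workers A and B both process the same message, one of the two vended numbers is never used, which leaves the gap described above while both workers agree on the article number.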

So, then with having the number, once that's established,
then all's well and good to associate the message-id, and
the article-id.

group: article-id -> message-id
message: groups -> article-ids

Then, looking at the performance, this logical association
is neatly maintainable in the DB tables, with consistent
views for A and B. But it's a limited resource, in this
implementation, there are actually only so many reads and
writes per period. So, workers can steadily chew away the
intake queue, assigning numbers, but then querying for the
numbers is also at a cost, which is primarily what the
reader connections do.

Then, the idea is to maintain the logical associations, of
the message-id <-> article-id, also in a growing file, with
a write-once read-many file about the NFS file system. There's
no file locking, and, writes to the file that are disordered
or contentious could (and by Murphy's law, would) write corrupt
entries to the file. There are various notions of leader election
or straw-pulling for exactly one of A or B to collect the numbers
in order and write them to the article-ids file, one "row" (or 64
byte fixed length record) per number, at the offset 64*number
(as from some 0 or the offset from the first number). But,
consensus and locking for serialization of tasks couples A and B
which are otherwise running entirely independently. So, then
the idea is to identify the next offset for the article-ids file,
and collect a batch of numbers so as to make a block-sized block of
the NFS implementation (eg 4KiB or 8KiB and hopefully configurably
and not 1MiB which is 16Ki records of 64B each). So, as
A and B each collect the numbers (and detect if there were gaps
now) then either (or both) completes a segment to append to the
file. There aren't append modes of the NFS files, which is fine
because actually the block now is written to the computed offset,
which is the same for A and B. In the off chance A and B both
make writes, file corruption doesn't follow because it's the
same content, and it's block size, and it's an absolute offset.
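
A sketch of the fixed-length record file, written at absolute offsets so that repeating a write is harmless. For brevity it writes one 64-byte record at a time rather than a whole block, and it simply rejects message-ids longer than 64 bytes; the class name and methods are hypothetical, and a local file stands in for the NFS file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical article-ids file: one zero-padded 64-byte record per number.
class ArticleIdFile {
    static final int RECORD = 64;
    private final Path path;
    private final long firstNumber;

    ArticleIdFile(Path path, long firstNumber) {
        this.path = path;
        this.firstNumber = firstNumber;
    }

    private long offsetOf(long number) {
        return (number - firstNumber) * RECORD;
    }

    // Writes at an absolute offset; writing the same record twice
    // leaves the same bytes in the same place.
    void write(long number, String messageId) {
        byte[] id = messageId.getBytes(StandardCharsets.US_ASCII);
        if (id.length > RECORD) {
            throw new IllegalArgumentException("oversize message-id");
        }
        ByteBuffer buf = ByteBuffer.wrap(Arrays.copyOf(id, RECORD));
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long pos = offsetOf(number);
            while (buf.hasRemaining()) {
                pos += ch.write(buf, pos);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the message-id, or null if no record is readable at that number.
    String read(long number) {
        ByteBuffer buf = ByteBuffer.allocate(RECORD);
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long pos = offsetOf(number);
            int n;
            while (buf.hasRemaining() && (n = ch.read(buf, pos)) > 0) {
                pos += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] bytes = buf.array();
        int len = 0;
        while (len < RECORD && bytes[len] != 0) {
            len++;
        }
        return len == 0 ? null : new String(bytes, 0, len, StandardCharsets.US_ASCII);
    }
}
```

Whether an NFS client behaves the same way when two hosts write identical blocks is exactly the open question; this only shows the offset arithmetic and the idempotent write on a local file.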

So, in this way, it seems that over time, the contents of the DB
are written out to the sequence by article-id of message-id for
each group

group: article-id -> message-id

besides that the message-id folder contains the article-ids

message-id: groups -> article-id

the content of which is known when the article-id numbers for
the groups of the message are vended.

Then, in the usual routine of looking up the message-id or
article-id given the group, the DB table is authoritative,
but, the NFS file is also correct, where a value exists.
(Also it's immutable or constant and conveniently a file.)
So, readers can map into memory the file, and consult the
offset in the file, to find the message-id for the requested
article-id, if that's not found, then the DB table, where it
would surely be, as the message-id had vended an article-id,
before the group's article-id range was set to include the
new article.

When a range of the article numbers is passed, then effectively,
the lookup will always be satisfied by the file lookup instead
of the DB table lookup, so there won't be the cost of the DB
table lookup. In some off chance the open files of the NFS
(also a limited resource, say 32K) are all exhausted, there's
still a DB table to read, that is a limited and expensive
resource, but also elastic and autoscalable.

Anyways, this design issue also has the benefit of keeping it
so that the file system has a convention with that all the data
remains in the file system, with then usual convenience in
backup and durability concerns, while still keeping it correct
and horizontally scalable, basically with the notion of then
even being able to truncate the database in any lull of traffic,
for that the entire state is consistent on the file system.

It remains to be figured out that NFS is OK with writing duplicate
copies of a file block, toward having this highly reliable workflow

That is basically the design issue then, I'm tapping away on this.

Ross A. Finlayson

Dec 14, 2016, 11:31:59 PM
Tapping away at this idea of a usenet server system,
I've written much of the read routine that is the
non-blocking I/O with the buffer passing and for the
externally coded data and any different coded data
like the unencrypted or uncompressed. I've quite
settled on 4KiB (2^12B) as the usual buffer page,
and it looks that the NFS offering can be so tuned
that its wsize (write size) is 4096 and with an
async NFS write option that that page size will
have that writes are incorruptible (though for
whatever reason they may be lost), and that 4096B
or 64 entries of 64B (2^6B) for a message-id or
oversize-message-id entry will spool off the message-id's of
the group's articles at an offset in the file that
is article-id * (1 << 6). The MTU of Ethernet packets
is often 1500 so having a wsize of 1KiB is not
nonsensible, as many of the writes are of this
granularity, the MTU might be 9001 or jumbo, which
would carry 2 4KiB NFS packets in one Ethernet packet.
Having the NFS rsize (read size) say 32KiB seems not
unreasonable, with that the reads will be pages of the
article-id's, or, the article contents themselves (split
to headers, xrefs, body) from the filesystem that are
mostly some few key and mostly quite altogether > 32 KiB,
which is quite a lot considering that's less than a JPEG
the size of "this". (99+% of Internet traffic was JPEG
and these days is audio/video traffic, often courtesy JPEG.)

Writing the read routine is amusing me with training the
buffers and it amuses me to write code with quite the
few +1 and -1 in the offsets. Usually having +-1 in
the offset computations is a good or a bad thing, rarely
good, with that often it's a sign that the method signature
just isn't being used quite right in terms of the locals,
if not quite as bad as "build a fence a mile then move it
a foot". When +-1 offsets is a good thing, here the operations
on the content of the buffers are rather agnostic the bounds
and amount of the buffers, thus that I/O should be quite
expedient in the routine.

(Written in Java, it should run quite the same on any
runtime with Java 1.4+.)

That said then next I'm looking to implement the Executor pool.

Acceptor -> Reader -> Scanner -> Executor -> Printer -> Writer

The idea of the Executor pool is that there are many connections
or sessions (the protocol is stateful), then that for one session,
its command's results are returned in order, but, that doesn't say
that the commands are executed in order, just that their results
are returned in order. (For some commands, which affect the state
of the session like current group or current article, that being
pretty much it, those also have to be executed sequentially for
consistency's sake.) So, I'm looking to have the commands be
executed in any possible order, for the usual idea of saturating
the bandwidth of the horizontally scalable backend. (Yeah, I
know NFS has limits, but it's unbounded and durable, and there's
overall a consistent, non-blocking toward lock-free view.)
Anyways, basically the Session has a data structure of its
outstanding commands, as they're enqueued to the task executor,
then whether it can go into the out-of-order pool or must stay
in the serial pool. Then, as the commands complete, or for
example timeout after retries on some network burp, those are
queued back up as the FIFO of the Results and as those arrive
the Writer is re-registered with the SocketChannel's Selector
for I/O notifications and proceeds to fill the socket's output
buffer and retire the Command and Result. One aspect of this
is that the Printer/Writer doesn't necessarily get the data on
the heap, the output for example an article is composed from
the FileChannels of the message-id's header, xref, body. Now,
these days, the system doesn't have much of a limit in open
file handles, but as mentioned above there are limits on NFS
file handles. Basically then the data is retrieved as from the
object store (or here an octet store but the entire contents of
the files are written to the output with filesystem transfer
direct to memory or the I/O channel). Then, releasing the
NFS file handles expeditiously basically is to be figured out
with caching the contents, for any retransmission or simply
serving copies of the current articles to any number of
connections. As all these are, read-only, it looks like the
filesystems' built-in I/O caching with, for example, a read-only
client view and no timeout, basically turns the box into a file
cache, because that is what it is.
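
A minimal sketch of "executed in any order, results returned in order" for one session, assuming Java 8+ for brevity (the post targets older runtimes). The class and method names are hypothetical and not taken from the author's code: each command gets a future that may complete at any time, and the writer side only retires results from the head of the session's FIFO.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical names throughout; an illustration, not the author's classes.
final class SessionResults {
    // Outstanding commands of one session, in arrival order.
    private final Queue<CompletableFuture<String>> outstanding = new ArrayDeque<>();

    // Enqueue a command; its future may be completed at any time, in any order.
    synchronized CompletableFuture<String> submit() {
        CompletableFuture<String> f = new CompletableFuture<>();
        outstanding.add(f);
        return f;
    }

    // Retire completed results from the head only, so that the output
    // order equals the command order even if execution order differed.
    synchronized List<String> drainReady() {
        List<String> ready = new ArrayList<>();
        while (!outstanding.isEmpty() && outstanding.peek().isDone()) {
            // join() rethrows if the command failed; a real server would map that to an error reply.
            ready.add(outstanding.poll().join());
        }
        return ready;
    }
}
```

If the second command finishes first, drainReady() returns nothing until the first one is done, then returns both in command order. Commands that change session state (current group, current article) would still have to go through a serial path, which this sketch does not model.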

Then, it looks like there is a case for separate reader and
writer implementations altogether of the NFS or octet store
(that here is an object store for the articles and their
sections, and an octet store for the pages of the tables).
This is with the goal of minimizing network access while
maintaining the correct view. But, an NFS export can't
be mounted twice from the same client (one for reads and
one for writes), and, while ingesting the message can be
done separately from the client, intake has to occur from the
client, then what with a usual distributed cloud queue
implementation having size and content limits, it seems
like it'll be OK.

Ross A. Finlayson

Dec 17, 2016, 5:58:16 PM12/17/16
On Tuesday, December 13, 2016 at 12:05:13 AM UTC-8, Ross A. Finlayson wrote:
> That is basically the design issue then, I'm tapping away on this.

The next thing I'm looking at is how to describe the "range",
as a data structure or in algorithms.

Here a "range" class in the runtime library is usually a
"bounds" class. I'm talking about a range, basically a
1-D range, about basically a subset of the integers,
then that the range is iterating over the subset in order,
about how to maintain that in the most maintainable and
accessible terms (in computational complexity's space and time

So, I'm looking to define a reasonable algebra of individuals,
subsets, segments, and rays (and their complements) that
naturally compose to objects with linear maintenance and linear
iteration and constant access of linear partitions of time-
series data, dense or sparse, with patterns and scale.

This then is to define data structures as so compose that
given a series of items and a predicate, establish the
subset of items as a "range", that then so compose as
above (and also that it has translations and otherwise
is a fungible iterator).

I don't have one of those already in the runtime library.

punch-out <- punches have shapes, patterns? eg 1010
knock-out <- knocks have area
pin-out <- just one
drop-out <-
fall-out <- range is out

Then basically there's a coalescence of all these,
that they have iterators or mark bounds, of the
iterator of the natural range or sequence, for then
these being applied in order

push-up <- basically a prioritization
fill-in <- for a "sparse" range, like the complement upside-down

Then all these have the basic expectation that a range
is the combination of each of these that are expressions
then that they are expressions only of the value of the
iterator, of a natural range.

Then, for the natural range being time, then there is about
the granularity or fine-ness of the time, then that there is
a natural range either over or under the time range.

Then, for the natural range having some natural indices,
the current and effective indices are basically one and
zero based, that all the features of the range are shiftable
or expressed in terms of these offsets.

0 - history

a - z


Whether there are pin-outs or knock-outs rather varies on
whether removals are one-off or half-off.

Then, pin-outs might build a punch-out,
While knock-outs might build a scaled punch-out

Here the idea of scale then is to apply the notions
of stride (stripe, stribe, striqe) to the range, about
where the range is for example 0, 1, .., 4, 5 .., 8, 9
that it is like 1, 3, 5, 7 scaled out.

Then, "Range" becomes quite a first-class data structure,
in terms of linear ranges, to implement usual iterators
like forward ranges (iterators).

Then, for time-forward searches, or to compose results in
ranges from time-forward searches, without altogether loading
into memory the individuals and then sorting them and then
detecting their ranges, there is to be defined how ranges
compose. So, the Range includes a reference to its space
and the Bounds of the Space (in integers then extended
precision integers).

"Constructed via range, slices, ..." (gslices), ....

Then, basically I want that the time series is a range,
that expressions matching elements are dispatched to
partitions in the range, that the returned or referenced
composable elements are ranges, that the ranges compose
basically pair-wise in constant time, thus linearly over
the time series, then that iteration over the elements
is linear in the elements in the range, not in the time
series. Then, it's still linear in the time series,
but sub-linear in the time series, also in space terms.

Here, sparse or dense ranges should have the same small-
linear space terms, with there being maintenance on the
ranges, about there being hysteresis or "worst-case 50/50"
(then basically some inertia for where a range is "dense"
or "sparse" when it has gt or lt .5 elements, then about
where it's just organized that way because there is a re-

So, besides composing, then the elements should have very
natural complements, basically complementing the range by
taking the complement of the range's parts, that each
sub-structure has a natural complement.

Then, pattern and scale are rather related, about figuring
that out some more, and leaving the general purpose, while
identifying the true primitives of these.

Then eventually there is attachment or reference to values
under the range, and general-purpose expressions to return
an iteration or build a range, about the collectors that
establish where range conditions are met and then collapse
after the iteration is done, as possible.

So, there is the function of the range, to iterate, then
there is the building of the range, by iterating. The
default of the range and the space is its bounds (or, in
the extended, that there are none). Then, segments are
identified by beginning and end (and perhaps a scale, about
rigid translations and about then that the space is
unsigned, though unbounded both left and right see
some use). These are dense ranges, then for whether the
range is "naturally" or initially dense or sparse. (The
usual notion is "dense/full" but perhaps that's as
"complement of sparse/empty".) Then, as elements are
added or removed in the space, if they are added range-wise
then that goes to a stack of ranges that any forward
iterator checks before it iterates, about whether the
natural space's next is in or out, or, whether there is
a skip or jump, or a flip then to look for the next item
that is in instead of out.

This is where, the usual enough organization of the data
as collected in time series will be bucketed or partitioned
or sharded into some segment of the space of the range,
that building range or reading range has the affinity to
the relevant bucket, partition, or shard. (This is all
1-D time series data, no need to make things complicated.)

Then, the interface basically "builds" or "reads" ranges,
building given an expression and reading as a read-out
(or forward iteration), about that then the implementation
is to compose the ranges of these various elements of a
topological sort about the bounds/segments and scale/patterns
and individuals.

This is interesting, for an algebra of intervals, or
segments, but here so far I'd been having that the
segments of contiguous individuals are eventually
just segments themselves, but composing those would
see the description as of this algebra. Clearly the
goal is the algebra of the contents of sets of integers
in the integer spaces.

An algebra of sets and segments of integers in integer spaces

An integer space defines elements of a type that are ordered.

An individual integer is an element of this space.

A set of integers is a set of integers, a segment of integers
is a set containing a least and greatest element and all elements
between. A ray of integers is a set containing a least element
and all greater elements or containing a greatest element and
all lesser elements.

A complement of an individual is all the other individuals,
a complement of a set is the intersection of all other sets,
a complement of a segment is all the elements of the ray less
than and the ray greater than all individuals of the segment.
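
One way to make the definitions above concrete is a sketch, under assumptions: a subset of a bounded integer space [lo, hi] is held as sorted, disjoint, non-adjacent closed segments, so complement and intersection are linear in the number of segments rather than in the number of integers. The class name and layout are hypothetical, rays are only modeled as segments touching a bound, both operands are assumed to share the same space, and hi is assumed to be less than Long.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a subset of the integer space [lo, hi], held as
// sorted, disjoint, non-adjacent closed segments {begin, end}.
final class SegmentSet {
    final long lo, hi;
    final List<long[]> segments;

    SegmentSet(long lo, long hi, List<long[]> segments) {
        this.lo = lo; this.hi = hi; this.segments = segments;
    }

    // Complement within the space: the gaps before, between, and after the segments.
    SegmentSet complement() {
        List<long[]> out = new ArrayList<>();
        long next = lo;                              // first element not yet accounted for
        for (long[] s : segments) {
            if (s[0] > next) out.add(new long[] {next, s[0] - 1});
            next = s[1] + 1;
        }
        if (next <= hi) out.add(new long[] {next, hi});
        return new SegmentSet(lo, hi, out);
    }

    // Intersection by a linear merge of the two sorted segment lists.
    SegmentSet intersect(SegmentSet other) {
        List<long[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < segments.size() && j < other.segments.size()) {
            long[] a = segments.get(i), b = other.segments.get(j);
            long begin = Math.max(a[0], b[0]), end = Math.min(a[1], b[1]);
            if (begin <= end) out.add(new long[] {begin, end});
            if (a[1] < b[1]) i++; else j++;          // advance whichever segment ends first
        }
        return new SegmentSet(lo, hi, out);
    }

    // Union via De Morgan, so it needs nothing beyond complement and intersection.
    SegmentSet union(SegmentSet other) {
        return complement().intersect(other.complement()).complement();
    }

    // Number of individuals in the set.
    long count() {
        long n = 0;
        for (long[] s : segments) n += s[1] - s[0] + 1;
        return n;
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder();
        for (long[] s : segments) sb.append('[').append(s[0]).append(',').append(s[1]).append(']');
        return sb.toString();
    }
}
```

In the space [0, 9], the set with segments [2,4] and [7,8] has the complement [0,1], [5,6], [9,9], and complementing twice gives back the original segments.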

What are the usual algebras of the compositions of individuals,
sets, segments, and rays?

Then basically all kinds of things that are about subsets
of things in a topological or ordered space should basically
have a first-class representation as (various kinds of)
elements in the range algebra.

So, I'm wondering what there is already for
"range algebra" and "range calculus".

Ross A. Finlayson

Dec 18, 2016, 8:48:15 PM12/18/16
Some of the features of these subsets of a
range of integers are available as a usual
bit vector, eg with ffs ("find-first-set")
memory scan instructions,
and as well usual notions of compressed bitmap
indices, with some notion of random access to
the value of a bit by its index and variously
iterating over the elements. Various schemes
to compress the bitmaps down to uncompressed
regions with representing words' worths of bits
may suit parts of the implementation, but I'm
looking for a "pyramidal" or "multi-resolution"
organization of efficient bits, and also flags,
about associating various channels of bits with
the items or messages.
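
As a rough illustration of the "pyramidal" idea, here is a two-level bitmap sketch over indices 0..4095, where a 64-bit summary word records which of the 64 data words are non-empty. The class is hypothetical; Long.numberOfTrailingZeros plays the role of find-first-set here (JVMs typically compile it to a single instruction, though that is an assumption about the runtime, not a guarantee).

```java
// Hypothetical two-level ("pyramidal") bitmap over indices 0..4095:
// summary bit w is set exactly when words[w] has any bit set.
final class TwoLevelBits {
    private final long[] words = new long[64];
    private long summary;

    void set(int i) {
        words[i >>> 6] |= 1L << (i & 63);
        summary |= 1L << (i >>> 6);
    }

    void clear(int i) {
        int w = i >>> 6;
        words[w] &= ~(1L << (i & 63));
        if (words[w] == 0) summary &= ~(1L << w);   // keep the summary exact
    }

    boolean get(int i) {
        return (words[i >>> 6] & (1L << (i & 63))) != 0;
    }

    // Smallest set index >= from (from >= 0), or -1 if none.
    // Empty words are skipped by scanning the summary instead of the words.
    int nextSet(int from) {
        if (from >= 4096) return -1;
        int w = from >>> 6;
        long rest = words[w] & (-1L << (from & 63));   // bits at or above 'from' in its own word
        if (rest != 0) return (w << 6) + Long.numberOfTrailingZeros(rest);
        if (w == 63) return -1;                        // also avoids a shift by 64 below
        long later = summary & (-1L << (w + 1));       // non-empty words after w
        if (later == 0) return -1;
        int next = Long.numberOfTrailingZeros(later);
        return (next << 6) + Long.numberOfTrailingZeros(words[next]);
    }
}
```

More levels, or separate "channels" of flags per message, would follow the same pattern; this sketch stops at two levels and one channel.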

Then, with having narrowed down the design for
what syntax to cover, and, mostly selected data
structures for the innards, then I've been looking
to the data throughput, then some idea of support
of client features.

Throughput is basically about how to keep the
commands moving through. For this, there's a
single thread that reads off the network interface's
I/O buffers, it was also driving the scanner, but
adding encryption and compression layers, then there's
also adding a separate thread to drive the scanner
thus that the network interface is serviced on demand.
Designing a concurrent data structure basically has
a novel selector (as of the non-blocking I/O) to
then pick off a thread from the pool to run the
scanner. Then, on the "printer" side and writing
off to the network interface, it is similar, with
having the session or connection's resources run
the compression and encryption, then for the I/O
thread as servicing the network interface. Basically
this is having put a collator/relay thread between
the I/O threads and the scanner/printer threads
(where the commands are run by the executor pool).

Then, a second notion has been the support of TLS.
It looks like I would simply sign a certificate and expect
users to check and install it themselves in their
trust-store for SSL/TLS. That said, it isn't really
a great solution, because, if someone compromises any
of the CA's, certificate authorities, in the trust
store (any of them), then a man-in-the-middle could
sign a cert, and it would be on the server to check
that the content hash reflected the server cert from
the handshake. What might be better would be to have
that each client, signs their own certificate, for the
server to present. This way, the client and server
each sign a cert, and those are exchanged. When the
server gets the client cert, it restarts the negotiation
now with using the client-signed cert as the server
cert. This way, there's only a trust anchor of depth
1 and the trust anchors are never exchanged and can
not be cross-signed nor otherwise would ever share
a trust root. Similarly the server gets the server-
signed cert back from the client then that TLS could
proceed with a session ticket and that otherwise there
would be a stronger protection from compromised CA
certs. Then, this could be pretty automatic with
a simple enough browser interface or link to set up TLS.
Then the server and client would only trust themselves
and each other (and keep their secrets private).

Then, for browsing, a reading of IMAP, the Internet
Message Access Protocol, shows a strong affinity with
the organization of Usenet messages, with newsgroups
as mailboxes. As well, implementing an IMAP server
that is backed by the NNTP server has then that the
search artifacts and etcetera (and this was largely
a reason why I need this improved "range" pattern)
would build for otherwise making deterministic date-
oriented searches over the messages in the NNTP server.
IMAP has a strong affinity with NNTP, and is a very
similar protocol and is implemented much the same
way. Then it would be convenient for users with
an IMAP client to simply point to ""
or what and get usenet through their email browser.

Ross A. Finlayson

Dec 24, 2016, 1:21:16 AM12/24/16
About implementing usenet with reasonably
modern runtimes and an eye toward
unlimited retention, basically looking
into "microtasks" for the routine or
workflow instances, as are driven with
non-blocking I/O throughout, basically
looking to memoize the steps as through
a finite state machine, for restarts as
of a thread, then to go from "service
oriented" to "message oriented".

This involves writing a bit of an
HTTP client for rather usual web
service calls, but with high speed
non-blocking I/O (less threads, more
connections). Also this involves a
sufficient abstraction.

Ross A. Finlayson

Jan 6, 2017, 4:57:00 PM1/6/17
This writing some software for usenet service
is coming along with the idea of how to implement
the fundamentally asynchronous non-blocking routine.
This is crystallizing in pattern as a: re-routine,
in reference to computing's usual: co-routine.

The idea of the re-routine is that there are only
so many workers, threads, of the runtime. The usual
runtimes (and this one, Java, say) support preemptive
multithreading as a means of implementing cooperative
multithreading, with the maintenance of separate stacks
(of, the stack machine of usual C-like procedural runtimes)
and some thread-per-connection model. This is somewhat
reasonable for the composition of blocking APIs, but
not so much for the composition of non-blocking APIs
and about how to not have many thread-per-connection
resources with essentially zero duty cycle that instead
could maintain for themselves the state machine of their
routine (with simplified forward states and a general
exception and error routine), for cooperative multi-threading.

The idea of this re-routine then is to connect functions,
there's a scope for variables in the scope, there is
execution of the functions (or here the routines, as
the "re-routines") then the instance of the re-routine
is re-entrant in the sense that as partial results are
accumulated the trace of the routine is marked out, with
leaving in the scope the current or partial or intermediate
results. Then, the asynchronous workers that fulfill each
routine (eg, with a lookup, a system call, or a network
call) are separate worker units dedicated to their domain
(of the routine, not the re-routine, and they can be blocking,
polling for their fleet, or callback with the ticket).

Then, this is basically a network machine and protocol,
here about NNTP and IMAP, and its resources are often
then of network machines and protocols (eg networked
file systems, web services). Then, these "machines"
of the "re-routine" being built (basically for the
streaming model instead of the batch model if you
know what I'm talking about) defining the logical
outcomes of the composition of the inputs and the
resulting outputs in terms of scopes as a model of
the cooperative multithreading, these re-routines
then are seeing for the pattern then that the
source template is about implicitly establishing
the scope and the passing and calling convention
(without a bunch of boilerplate or "callback confusion",
"async hell"). This is where the re-routine, when
a routine worker fills in a partial result and resubmits
the re-routine (with the responsibility/ownership of
the re-routine) that it is re-evaluated from the beginning,
because it is constant linear in reading forward for the
item the state of its overall routine, thusly implicit
without having to build a state machine, as it is
declaratively the routine.

So, I am looking at this as my solution as to how to
establish a very efficient (in resource and performance
terms) formally correct protocol implementation (and
with very simple declarative semantics of usual forward,
linear routines).

This "re-routine" pattern then as a model of cooperative
multithreading sees the complexity and work into the
catalog of blocking, polling, and callback support,
then for usual resource injection of those as all
supported with references to usual sequential processes
(composition of routine).

Ross A. Finlayson

Jan 21, 2017, 5:33:23 PM1/21/17
I've about sorted out how to implement the re-routine.

Basically a re-routine is a suspendable composite
operation, with normal declarative flow-of-control
syntax, that memo-izes its partial results, and
re-executes the same block of statements then to
arrive at its pause, completion, or exit.

Then, the command and executor are passed to the
implementation that has its own (or maybe the
same) execution resources, eg a thread or connection
pool. This resolves the value of the asynchronous
operation, and then re-submits the re-routine to
its originating executor. The re-routine re-runs
(it runs through the branching or flow-of-control
each time, but that's small in the linear and all
the intermediate products are already computed,
and the syntax is usual and in the language).
The re-routine then either re-suspends (as it
launches the next task) or completes or exits (errors).
Whether it suspends, completes or exits, the
re-routine just returns, and the executor then
is specialized and just checks the re-routine
whether it's suspended (and just drops it, the
new responsible launched will re-submit it),
or whether it's completed or errored (to call
back to the originating commander the result of
the command).
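
The description above can be sketched roughly as follows. This is a single-threaded illustration under stated assumptions, not the author's implementation: the names (ReRoutineSketch, step, Suspend) are hypothetical, two plain queues stand in for the originating executor and the asynchronous workers, and suspension is signalled by unwinding with an exception, which the post does not specify.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of a re-routine; single-threaded for clarity.
final class ReRoutineSketch {
    // Unwinds the body when a result is not memoized yet (this sketch's own choice).
    static final class Suspend extends RuntimeException {
        Suspend() { super(null, null, false, false); }   // no stack trace needed
    }

    private final Map<String, Object> memo = new HashMap<>();   // partial results
    private final Queue<Runnable> executor;   // stands in for the originating executor
    private final Queue<Runnable> workers;    // stands in for the asynchronous workers
    private final Function<ReRoutineSketch, String> body;       // the plain, linear routine
    String result;   // set once the body runs to completion
    int runs;        // how many times the body was started

    ReRoutineSketch(Queue<Runnable> executor, Queue<Runnable> workers,
                    Function<ReRoutineSketch, String> body) {
        this.executor = executor; this.workers = workers; this.body = body;
    }

    // One asynchronous step: return the memoized value, or launch the work and suspend.
    @SuppressWarnings("unchecked")
    <T> T step(String key, Supplier<T> work) {
        if (memo.containsKey(key)) return (T) memo.get(key);
        workers.add(() -> {
            memo.put(key, work.get());     // the worker resolves the value...
            executor.add(this::run);       // ...and re-submits the re-routine
        });
        throw new Suspend();
    }

    // Re-runs the body from the beginning; memoized steps return immediately.
    void run() {
        runs++;
        try {
            result = body.apply(this);     // completed
        } catch (Suspend s) {
            // suspended: just drop it, the launched worker will re-submit
        }
    }

    // Drives a two-step routine to completion.
    static ReRoutineSketch demo() {
        Queue<Runnable> executor = new ArrayDeque<>();
        Queue<Runnable> workers = new ArrayDeque<>();
        ReRoutineSketch r = new ReRoutineSketch(executor, workers, rr -> {
            String group = rr.step("group", () -> "sci.math");
            Integer count = rr.step("count", () -> 42);
            return group + " has " + count + " articles";
        });
        executor.add(r::run);
        while (!executor.isEmpty() || !workers.isEmpty()) {
            (executor.isEmpty() ? workers : executor).poll().run();
        }
        return r;
    }
}
```

The body in demo() reads as ordinary sequential code, yet it is started three times: twice it suspends at the first step that is not memoized, and the third time it runs through. A concurrent version would need a thread-safe memo and real executors, and error handling is left out here.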

In this manner, it seems like a neat way to basically
establish the continuation, for this "non-blocking
asynchronous operation", while at the same time
the branching and flow of control is all in the
language, with the usual unsurprising syntax and
semantics, for cooperative multi-threading. The
cost is in wrapping the functional callers of the
routine and setting up their factories and otherwise
as via injection (and they can block the calling
thread, or have their own threads and block, or
be asynchronous, without changing the definition
of the routine).

So, having sorted this mostly out, then the usual
work as of implementing the routines for the protocol
can so proceed then with a usual notion of a framework
of support for both the simple declaration of routine
and the high performance (and low resource usage) of
the delegation of routine, and support for injection
for test and environment, and all in the language
with minimal clutter, no byte-code modification,
and a ready wrapper for libraries of arbitrary
run-time characteristic.

This solves some problems.

j4n bur53

Jan 22, 2017, 3:50:02 PM1/22/17
Try this one maybe:

Ross A. Finlayson wrote:

John Gabriel

Jan 22, 2017, 5:01:21 PM1/22/17
Yes please. Take all your fellow cranks with you to a new usenet. The sooner the better.

Ross A. Finlayson

Jan 22, 2017, 8:01:13 PM1/22/17
No, thanks, that does not appear to meet my requirements.

j4n bur53

Jan 22, 2017, 8:35:57 PM1/22/17
Something else, what are your hardware specs?

An Inferno on the Head of a Pin

Ross A. Finlayson

Jan 22, 2017, 8:57:42 PM1/22/17
Thanks for your interest, if you read the thread,
I'm talking about an implementation of usenet,
with modern languages and runtimes, but, with
a filesystem convention, and a distributed redundant
store, and otherwise of very limited hardware and
distributed software resources or the "free tier"
of cloud computing (or, any box).

When it comes to message formats, usenet isn't
limited to plain text, it's as simply usual
MIME multimedia. (The user-agent can render
text however it would so care.)

A reputation system is pretty simply implemented
with forwarding posts to various statistics groups
that over time build profiles of authors that
readers may adopt.

Putting an IMAP interface in front of a NNTP gateway
makes it pretty simple to have cross-platform user
interfaces from any IMAP (eg, email) client.

Then, my requirements include backfilling a store
with the groups of interest for implementing summary
and search for archival and research purposes.

Ross A. Finlayson

Jan 22, 2017, 9:03:53 PM1/22/17
(About the 2nd law of thermodynamics, Moore's
law, and the copper process with regards to the
cross-talk about the VLSI or "ultra" VLSI or
the epoch these days, and burning bits, what
you might find of interest is the development of
the "reversible computing", which basically
recycles the bits, and then also that besides
the usual electronic transistor, and besides that
today there can be free-form 3-D IC's or "custom
logic", instead of just the planar systolic clock-
driven chip, there are also "systems on chip" with
regards to electron, photon, and heat pipes as
about the photo-electric and Seebeck/Peltier,
with various remarkably high efficiency models
of computation, this besides the very novel
serial and parallel computational units and
logical machines afforded by 3-D IC's and optics.

About "reasonably simple declaration of routine
in commodity languages on commodity hardware
for commodity engineers for enduring systems",
at cost, see above.)

Ross A. Finlayson

Feb 7, 2017, 3:16:14 AM2/7/17
Not _too_ much progress, has basically seen the adaptation
of this re-routine pattern to the command implementations,
with basically usual linear procedural logic then the
automatic and agnostic composition of the asynchronous
tasks in the usual declarative syntax that then the
pooled (and to be metered) threads are possibly by
design entirely non-blocking and asynchronous, and
possibly by design blocking or otherwise agnostic of
implementation, with then the design of the state
machine of the routine as "eventually consistent"
or forward and making efficient use of the computational
and synchronization resources.

The next part has been about implementing a client "machine"
as complement to the server "machine", where a machine here
is an assembly as it were of threads and executors about the
"reactive" (or functional, event-driven) handling of the
abstract system resources (small pojos, file name, and
linked lists of 4K buffers). The server basically starts
up listening on a port then accepts and starts a session
for any connection and then a reader fills and moves buffers
to each of the sessions of the connections, and signals the
relay then for the scanning of the inputs and then composing
the commands and executing those as these re-routines, that
as they complete, then the results of the commands are then
printed out to buffers (eg, encoded, compressed, encrypted)
then the writer sends that back on the wire. The client
machine then is basically a model of asynchronous and
probably serial computation or a "web service call", these
days often and probably on pooled HTTP connections. This
then is pretty simple with the callbacks and the addressing/
routing of the response back to the re-routine's executor
to then re-submit the re-routine to completion.

I've been looking at other examples of continuations, the
"reactive" programming or these days' "streaming model"
(where the challenge is much in the aggregations), that
otherwise non-blocking or asynchronous programming is
often rather ... recursively ... rolled out where this
re-routine gains even though the flow-of-control is
re-executed over the memoized contents of the re-routines
as they are so composed declaratively, that this makes
what would be "linear" at worst "n squared", but that is
only on how many commands there are in the procedure,
not combined over their execution because all the
intermediate results are memoized (as needed, because
if the implementation is local or a mock instead, the
re-routine is agnostic of asynchronicity and just runs
through linearly, but the relevant point is that the
number of composable units is a small constant thus
that its square is a small constant, particularly
as otherwise being a free model of cooperative multi-
threading, here toward a lock-free design). All the
live objects remain on the heap, but just the objects
and not for example the stack as a serialized continuation.
(This could work out to singleton literals or "coding"
but basically it will have to auto-throttle off heap-max.)
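
To spell out the "n squared" remark under a simplifying assumption (a routine of k asynchronous steps in a straight line, each suspending exactly once): the body is started k+1 times, and the run that follows the j-th completion re-reads j memoized results before reaching new work. The total number of memo reads is then

```latex
\sum_{j=0}^{k} j \;=\; \frac{k(k+1)}{2} \;=\; O(k^2)
```

while each step's actual work is still performed only once. For k around five or ten, that is a few dozen map lookups per command.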

So, shuffling and juggling the identifiers and organizations
around and sifting and sorting what elements of the standard
concurrency and functional libraries (of, the "Java" language)
to settle on for usual neat and concise (and re-usable and
temporally agnostic) declarative flow-of-control (i.e., with
"Future"'s everywhere and as about reasonable or least-surprising
semantics, if any, with usual and plain code also being "in
the convention"), then it is settling on a style.

Well, thanks for reading, it's a rather stream-of-consciousness
narrative, here about the design of pretty re-usable software.

Julio Di Egidio

Feb 7, 2017, 4:05:47 AM2/7/17
On Tuesday, February 7, 2017 at 9:16:14 AM UTC+1, Ross A. Finlayson wrote:

> Not _too_ much progress, has basically seen the adaptation
> of this re-routine pattern to the command implementations,

I do not understand what you are trying to achieve here. As long as Usenet the
protocol is fine per se, the technical problem at least is already solved, i.e.
there is plenty of Usenet server software available... OTOH, the "problem with
Usenet" such that one would want to build an entirely new network seems to me
is more of a socio-cybernetic kind, so I'd rather find interesting discussing,
say, the merits but also the limitations of moderation as an approach, and maybe
even what better could be done. But, again, the technical problem is not really
a problem, in fact that is the easy part....

(Also, I do not see why discuss this in sci.math. Maybe, as
for collective intelligence?)


Ross A. Finlayson

Feb 7, 2017, 2:18:07 PM2/7/17
Sure, I'll limit this.

There is plenty of usenet server software, but it is mostly
INND or BNews/CNews, or a few commercial cousins. The design
of those systems is tied to various economies that don't so much
apply these days. (The use-case, of durable distributed message-
passing, is still quite relevant, and there are many ecosystems
and regimes small and large as about it.) In the days of managed
commodity network and compute resources or "cloud computing", here
as above about requirements, then a modernization is relevant, and
for some developers with the skills, not so distant.

Another point is that the eventual goal is archival, my goal isn't
to start an offshoot, instead to build the system as a working
model of an archive, basically from the author's view as a working
store for extracting material, and from the developer's view as
an example in design with low or no required maintenance and
"scalable" operation for a long time.

You mention, these days there's a lot more
automated reasoning (or, mockingbird generators), as computing
and development affords more and different forms of automated
reasoning, here again the point is for an archival setting to
give them something to read.

Thanks, then, I'll limit this.

Julio Di Egidio

Feb 9, 2017, 2:00:32 AM2/9/17
On Tuesday, February 7, 2017 at 8:18:07 PM UTC+1, Ross A. Finlayson wrote:

> There is plenty of usenet server software, but it is mostly
> INND or BNews/CNews, or a few commercial cousins.

There is plenty of free and open news server software:

> Another point is that the eventual goal is archival, my goal isn't
> to start an offshoot, instead to build the system as a working
> model of an archive, basically from the author's view as a working
> store for extracting material,

I'd have qualms as to what the degree-zero is, namely, I'd think more of hyper-
texts hence a Wiki (or, in the larger, the web itself) as the basic structure.
OTOH, Usenet is a conversational model, for discussions, not even forums.

Regardless, even at that most basic level, you already face the fundamental
problem of the "quality" of the content (for some to be properly defined notion
of quality). For one thing, consider that garbage is garbage even under the
best microscope...

> You mention, these days there's a lot more

I mentioned partly because I do not have a better reference,
partly because, for how basic you want to keep it (and I am all for building
incrementally), I would think it is only considerations at that level that can
provide the fundamental requirements.


Ross A. Finlayson

Mar 21, 2017, 7:10:21 PM3/21/17
I continued tapping away at this.

The re-routines now sit beyond a module or domain definition.
This basically defines the modules' value types like session,
message, article, group, content, wildmat. Then, it also
defines a service layer, as about the relations of the elements
of the domain, so that then the otherwise simple value types
have natural methods as relate them, all implemented behind
a service layer, that implemented with these re-routines is
agnostic of synchronous or asynchronous convention, and
is non-blocking throughout with cooperative multithreading.
This has a factory of factories or industry pattern that provides
the object graph wiring and dynamic proxying to the routine
implementations, that are then defined as traits, that the re-
routine composes the routines as mixins (of the domain's

(This is all "in the language" in Java, with no external dependencies.)

The transport mechanism is basically having abstracted the
attachment for a usual non-blocking I/O framework for the
transport types as of the scattering/gathering or vector I/O
as about then the interface between transport and protocol
(here NNTP, but, generally). Basically in a land of 4K byte buffers,
then those are fed from the Reader/Writer that is the endpoint to
a Feeder/Scanner that is implemented for the protocol and usual
features like encryption and compression, then making Commands
and Results out of those (and modelling transactions or command
sequences as state machines which are otherwise absent), those
systolically carrying out as primitive or transport types to a Printer/
Hopper, that also writes the response (or rather, consumes the buffers
in a highly concurrent highly efficient event and selection hammering).
The selector is another bounded resource, so it's configurable the
SelectorAssignment and there might be a thread for each group of
selectors about FD_SETSIZE, but that's not really at issue as select
went to epoll, but provides an option for that eventuality.
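
As an illustration of the scanner's place between transport and protocol, here is a minimal sketch that turns a stream of buffers into CRLF-terminated command lines (NNTP commands are lines ending in CRLF), carrying a partial line over from one buffer to the next. The class name is hypothetical, and for simplicity it copies bytes onto the heap instead of computing bounds over the buffers, so it shows the framing logic only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner: splits a stream of buffers into CRLF-terminated lines,
// carrying a partial line over from one buffer to the next.
final class LineScanner {
    private final ByteArrayOutputStream partial = new ByteArrayOutputStream();
    private boolean sawCr;   // the previous byte was CR and is not yet emitted

    // Consumes the readable bytes of 'buf' and returns the lines it completed.
    List<String> feed(ByteBuffer buf) {
        List<String> lines = new ArrayList<>();
        while (buf.hasRemaining()) {
            byte b = buf.get();
            if (sawCr && b == '\n') {                 // CRLF: the line is complete
                lines.add(new String(partial.toByteArray(), StandardCharsets.UTF_8));
                partial.reset();
                sawCr = false;
                continue;
            }
            if (sawCr) partial.write('\r');           // a bare CR was data after all
            sawCr = (b == '\r');
            if (!sawCr) partial.write(b);
        }
        return lines;
    }
}
```

Feeding "GROUP sci.ma", then "th" CRLF "ARTICLE 1" CR, then LF "QUIT" CRLF yields no line, then "GROUP sci.math", then "ARTICLE 1" and "QUIT", regardless of where the buffer boundaries fall.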

The transport and protocol routines are pretty well decoupled this
way, and then the protocol domain, modules, and routines are as
well so decoupled (and fall together pretty naturally), much using
quite usual software design patterns (if not necessarily so formally,
quite directly).

The protocol (here NNTP) then is basically in a few files detailing
the semantics of the commands to the scanner as overriding methods
of a Command class, and implementing the action in the domain from
extending the TraitedReRoutine then for a single definition in the NNTP
domain that is implemented in various modules or as collections of services.

Ross A. Finlayson

Apr 9, 2017, 11:20:50 PM4/9/17
I'm still tapping away at this if rather more slowly (or, more sporadically).

The "re-routine" async completion pattern is more than less
figured out (toward high concurrency as a model of cooperative
multi-threading, behind also a pattern of a domain layer, with mix-in
nyms that is also some factory logic), a simple non-blocking I/O socket
service routine is more than less figured out (the server not the client,
toward again high concurrency and flexible and efficient use of machine
or virtualized resources as they are), the commands and their bodies are
pretty much typed up, then I've been trying to figure out some data
structures basically in I/O (Input/Output), or here mostly throughput
as it is about the streams.

I/O datum FIFOs and holders:

buffer queue
handles queue
buffer+handles queue
buffer/buffer[] or buffer[]/buffer in loops
byte[]/byte[] in steps
Input/Output in Streams

Basically any of the filters or adapters is specialized to these input/output
data holders. Then, there are logically enough queues or FIFOs as there are
really implicitly between any communicating sequential processes that are
rate-limited or otherwise non-systolic ("real-time"), here for some ideas about
data structures, as either implement or adapt unbounded single producer/
single consumer (SPSC) queues.

One idea is making the linked container with sentinel nodes
and otherwise making it thread-safe (for a single producer and single
consumer). This is where the queue (or, "monohydra" or "slique") is
rather generally a container, and that here iterations are usually
consuming the queue, but sometimes there are aggregates collected
then to go over the queue. The idea then is that the producer and
consumer have separate views of the queue that the producer does
atomic swap on the tail of the queue and that a consumer's iterator
of elements (as iterable and not just a queue, for using the queue as
a holder and not just a FIFO) returns a marker to the end of the iteration,
for example in computing bounds over the buffers then re-iterating and
flipping the buffers then given the bounds moving the buffers' references
to an output array thus consuming the FIFO.
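
A minimal sketch of such a linked single-producer/single-consumer queue, assuming a sentinel head node and an atomic swap on the tail. The name SpscQueue and its methods are hypothetical, and the "marker" here is simplified to a count of the elements visible to the consumer, so that bounds can be computed in one pass and the elements consumed in a second.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of an unbounded single-producer/single-consumer linked queue.
final class SpscQueue<T> {
    private static final class Node<T> {
        final T item;
        volatile Node<T> next;   // written by the producer, read by the consumer
        Node(T item) { this.item = item; }
    }

    private Node<T> head = new Node<>(null);   // sentinel; consumer-only view
    private final AtomicReference<Node<T>> tail = new AtomicReference<>(head);

    /** Producer side: atomically swap in the new tail, then link the old tail to it. */
    void offer(T item) {
        Node<T> node = new Node<>(item);
        tail.getAndSet(node).next = node;
    }

    /** Consumer side: returns null when nothing is linked yet. */
    T poll() {
        Node<T> next = head.next;
        if (next == null) return null;
        head = next;   // the consumed node becomes the new sentinel
        return next.item;
    }

    /** Consumer side: number of elements linked so far, without consuming them. */
    int visible() {
        int count = 0;
        for (Node<T> n = head.next; n != null; n = n.next) count++;
        return count;
    }
}
```

The producer only ever touches the tail and the consumer only ever touches the head, which is what keeps the two views separate.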

This then combines with the tasks that the tasks driving the I/O (as events
drive the tasks) are basically constant tasks or runnables (constant to the
session or attachment) that just have incremented a count of times to run
thus that there's always a service of the FIFO after the atomic append.
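
The run-count idea can be sketched as follows, assuming one constant task per session and an atomic counter of pending runs. CountedTask and signal are hypothetical names, and the executor and the service routine are supplied by the caller.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one constant task per session, with a count of pending runs.
final class CountedTask implements Runnable {
    private final AtomicInteger pending = new AtomicInteger();
    private final Executor executor;
    private final Runnable service;   // e.g. drains the session's FIFO

    CountedTask(Executor executor, Runnable service) {
        this.executor = executor;
        this.service = service;
    }

    /** Called after each atomic append; schedules the task only if it is not already scheduled. */
    void signal() {
        if (pending.getAndIncrement() == 0) executor.execute(this);
    }

    @Override public void run() {
        int seen;
        do {
            seen = pending.get();
            service.run();                         // service everything appended so far
        } while (pending.addAndGet(-seen) != 0);   // more signals arrived meanwhile: run again
    }
}
```

Since only the signal that moves the count from zero schedules the task, the task is never queued twice, and any append that lands during a run is picked up by another pass of the loop.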

Another idea is this hybrid or serial mix-and-match (SPSC FIFO), of buffers
and handles. This is where the buffer is the data in-line, the handle is a
reference to the data. This is about passing through the handles where
the channels support their transfer, and converting them to inline data
where they don't. That's then about all the combined cases as the above
I/O datum FIFOs and holders, with adapting them so the filter chain blasts
(eg specialized operation), loops (transferring in and out of buffers), steps
(statefully filling and levelling data), or moves (copying the references, the
data in or out or on or off, then to perform the I/O operations) over them.

It seems rather simpler to just adapt the data types to the boundary I/O data
types which are byte buffers (here size-4K pooled memory buffers) and for
that the domain shouldn't know concrete types so much as interfaces, but
the buffers and handles (file handles) and arrays as they are are pretty much
fungible to the serialization of the elements of the domain, that can then
specialize how they build logical inputs and outputs of the commands.

Apr 10, 2017, 8:18:09 AM4/10/17
You could use camel.

Camel is a rule-based routing and mediation engine that provides a
object-based implementation of the Enterprise Integration Patterns
using an application programming interface (or declarative domain-
specific language) to configure routing and mediation rules.

Its name is derived from the camel humps, since the packets
might take flippy-floppy routes. It also provides automatic
integration of the Gamma Functions, so that Archie's post could
be automatically verified whether he computes the factorial
correctly.

Ross A. Finlayson

May 19, 2017, 1:40:27 AM5/19/17
I haven't much worked on this.

Jul 16, 2017, 3:35:17 PM7/16/17
Doing an NNTP server could be complicated, on the other
hand it seems to be specified what to do, you might not
need to invent much by yourself. You could start with a
WILDMAT matcher, the rest doesn't look so difficult:

[C] NEWNEWS news.*,sci.* 19990624 000000 GMT
[S] 230 list of new articles by message-id follows
[S] <>
[S] <>
[S] .

Except for the storage & concurrency problem...
But the vocabulary of the server should be reflected
somewhere, so that we can do the next step:

Call by Meaning, Hesam Samimi, Chris Deaton,
Yoshiki Ohshima, Alessandro Warth, Todd Millstein,
VPRI Technical Report TR-2014-003

The goal would be to have a server that we could
ask "when did BKK the last time hold a torch for
Pythagoras" or ask "is it the first time that AP
reinvents Maxwell equations". Etc...
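
For the WILDMAT matcher suggested above as a starting point, a minimal sketch might look like the following. It assumes the wildmat rules of RFC 3977: "*" matches any run of characters, "?" matches one character, patterns are separated by commas, a leading "!" negates a pattern, and the rightmost matching pattern decides. The class and method names are hypothetical.

```java
// Hypothetical helper; follows the wildmat rules of RFC 3977 ('*', '?', ',' and '!').
final class Wildmat {
    /** True if the rightmost pattern that matches the name is not negated. */
    static boolean matches(String wildmat, String name) {
        boolean result = false;
        for (String pattern : wildmat.split(",")) {
            boolean negated = pattern.startsWith("!");
            if (glob(negated ? pattern.substring(1) : pattern, name)) result = !negated;
        }
        return result;
    }

    /** Iterative glob match with backtracking to the most recent '*'. */
    static boolean glob(String p, String s) {
        int pi = 0, si = 0, star = -1, mark = 0;
        while (si < s.length()) {
            if (pi < p.length() && p.charAt(pi) == '*') {
                star = pi++;          // remember the star, try matching zero characters
                mark = si;
            } else if (pi < p.length() && (p.charAt(pi) == '?' || p.charAt(pi) == s.charAt(si))) {
                pi++;
                si++;
            } else if (star >= 0) {
                pi = star + 1;        // let the last star absorb one more character
                si = ++mark;
            } else {
                return false;
            }
        }
        while (pi < p.length() && p.charAt(pi) == '*') pi++;
        return pi == p.length();
    }
}
```

With this, Wildmat.matches("news.*,sci.*", "sci.math") accepts the group, while Wildmat.matches("sci.*,!sci.logic", "sci.logic") rejects it.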

Ross A. Finlayson

Jul 16, 2017, 6:12:38 PM7/16/17
Implementing search is rather a challenge.

Besides accepter/rejector and usual notions of matching
(eg the superscalar on closed categories), find and query
seems for where besides usual notions of object hashes
as indices that there is to be built up from the accepter/
rejector all sorts of indices as do/don't/don't-matter the
machines of the accepters and rejectors, vis-a-vis going
over input data and the corpus and finding relations (to
the input, or here space of inputs), of the corpus.

That's where, after finding an event for AP, whether
you're interested in the next for him or the first
for someone else. There are quite various ways to
achieve those quite various goals, besides computing
the first goal. Just as an example that's, for example,
the first reasonable AP Maxwell equation (or reference)
or for everybody else, like, who knows about the Maxwell

Search is a challenge, NNTP rather puts it off to IMAP first
for free text search, then for the concept search or
"call by meaning" you reference, basically refining
estimates of the scope of what it takes to find out
what that is.

Then for events in time-series data there's a usual general
model for things as they occur. That could be rather
rich and where causal is separate from associative
(though of course causality is associative).

With the idea of NNTP as a corpus, then a usual line
for establishing tractability of search is to associate
with its contents some document then semantic model, i.e.,
then to generate and maintain that besides otherwise
that the individual items or posts and their references
in the meta-data besides the data are made tractable
then for general ideas of things.

I'm to get to this, the re-routine particularly amuses
me as a programming idiom in the design of more-or-less
detached service routine from the corpus, then about
what body of data so more-than-less naturally results,
with rather default and usual semantics.

Such "natural language" meaning as can be compiled for
efficiency to the very direct in storage and reference,
almost then asks "what will AP come up with, next".

Jul 16, 2017, 6:25:00 PM7/16/17
The search could be a nice benchmark for next
generation commodity i9 CPUs with 20 Logical Cores:

The key for search is a nice index, I recently
experimented with a n-gram index, I did only trigrams.
If your text contains foobar, you make an inverted
index that lists your document for the following keys


When you search foobar, you look up "foo" and "bar". When
you search a pattern like *ooba* you look up "oob" and "a".
Works quite well. I dunno what Elasticsearch, Solr, etc.
exactly do, they are open source, but the stuff is still
obfuscated for me. But they are quite popular. They might
do the same, i.e. an index that doesn't need word boundaries,
and that works similarly for Chinese and German, and could
also query mathematical symbols etc. Some confused marketing
guys recently called these text indexes already databases:
"Elasticsearch moved into the top 10 most popular database
management systems":

RF, I noticed you mentioned search in some of your past
posts about a server yourself.
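
A minimal sketch of a trigram inverted index along these lines, with hypothetical names (TrigramIndex, add, search). As an assumption of this sketch, it looks up every trigram of the search literal, intersects the posting lists, and then verifies the candidates against the stored text, since the trigrams alone do not prove that they occur next to each other.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of an inverted index keyed by trigrams.
final class TrigramIndex {
    private final Map<Integer, String> documents = new HashMap<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Lists the document under every trigram of its text. */
    void add(int id, String text) {
        documents.put(id, text);
        for (int i = 0; i + 3 <= text.length(); i++) {
            postings.computeIfAbsent(text.substring(i, i + 3), k -> new TreeSet<>()).add(id);
        }
    }

    /** Finds the ids of documents containing the literal substring. */
    Set<Integer> search(String literal) {
        Set<Integer> candidates = new TreeSet<>(documents.keySet());
        // Narrow the candidates by intersecting the posting lists of each trigram.
        for (int i = 0; i + 3 <= literal.length(); i++) {
            candidates.retainAll(postings.getOrDefault(literal.substring(i, i + 3), Set.of()));
        }
        // Verify, which also covers literals shorter than three characters.
        candidates.removeIf(id -> !documents.get(id).contains(literal));
        return candidates;
    }
}
```

Because the keys are plain character triples, nothing here depends on word boundaries.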

Ross A. Finlayson

Jun 30, 2020, 12:24:53 AM6/30/20
I haven't much worked on this. The idea of the industry
pattern and for the re-routine makes for quite a bit simply
the modules in memory or distributed and a default free-threaded

Search you mentioned and for example HTTP is adding the SEARCH verb,
for example simple associative conditions that naturally only combine,
and run in parallel, there are of course any number of whatever is the
HTTP SEARCH implementations one might consider, here usenet's is
rudimentary where for example IMAP over it is improved, what for
contextual search and content representation.

Information retrieval and pattern recognition and all that is
plenty huge, here that terms define the corpus.

My implementation of the high-performance selector routine,
the networking I/O selector, with this slique I implemented,
runs up and fine and great up to thousands of connections,
but, it seems like running the standard I/O and non-blocking
I/O in the same actual container, makes that I implemented
the selecting hammering non-blocking I/O toward the 10KC,
though it is in small blocks because here the messages are
small, then for under what conditions it runs server class.

With the non-blocking networking I/O, the scanning and parsing
that assembles messages off the I/O, and that's after compression
and encryption in the layers, that it's implemented in Java and
Java does that, then inside that all the commands in the protocol
then have their implementations in the re-routine, that all
non-blocking itself and free-threaded, makes sense for
co-operative multithreading, of an efficient server runtime
with here the notion of a durable back-end (or running in memory).

Mostowski Collapse

Jun 30, 2020, 1:00:52 PM6/30/20
NNTP is not HTTP. I was using bare metal access to
usenet, not using Google group, via:, unfortunately dead since Corona

So was looking for an alternative. And found this
alternative, which seems fine:

Have Fun!

P.S.: Technical spec of

Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
Location: 2x Falkenstein, 1x New York

advantage of bare metal usenet,
you see all headers of message.

Mostowski Collapse

Jun 30, 2020, 1:05:51 PM6/30/20
And you don't have posting maximums
and wait times dictated by google.

Mostowski Collapse

Jun 30, 2020, 1:17:47 PM6/30/20
If you would make an HTTP front end, you
would not necessarily need a new HTTP method
SEARCH. You can use the usual GET with a
query part as in this fictitious example:<search term>
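
A minimal sketch of building such a GET URL with the search term in the query part. The host, the path and the parameter name q are hypothetical placeholders, since the example URL itself is not preserved here.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SearchUrl {
    // Hypothetical host, path and parameter name, for illustration only.
    private static final String BASE = "http://news.example.invalid/search?q=";

    /** Percent-encodes the term so that spaces and '&' cannot break the query part. */
    static URI forTerm(String term) {
        return URI.create(BASE + URLEncoder.encode(term, StandardCharsets.UTF_8));
    }
}
```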

Mostowski Collapse

Jun 30, 2020, 1:20:53 PM6/30/20
Maybe as a backend, you could use something
from this project:

Apache James, a.k.a. Java Apache Mail Enterprise
Server or some variation thereof, is an open source
SMTP and POP3 mail transfer agent and NNTP news
server written entirely in Java.

The project seems to be still alive:

Version 3.3.0 was released on March 26, 2019.

I dunno whether Apache James already delivers
some added value, in that it provides some
search. If not you need a second backend
for the search.

Mostowski Collapse

Jun 30, 2020, 1:27:41 PM6/30/20

"Apache James Server 3.1.0 and following versions require
Java 1.8. A migration guide for users willing to upgrade
from 2.3 to 3.4.0 is available. If relying on Guice wiring,
you can use some additional components
(Cassandra, **ElasticSearch**, ...)."

Ross A. Finlayson

Nov 16, 2020, 8:00:51 PM11/16/20
In traffic there are two kinds of usenet users,
viewers and traffic through Google Groups,
and, USENET. (USENET traffic.)

Here now Google turned on login to view their
Google Groups - effectively closing the Google Groups
without a Google login.

I suppose if they're used at work or whatever though
they'd be open.

Where I got with the C10K non-blocking I/O for a usenet server,
it scales up though then I think in the runtime is a situation where
it only runs epoll or kqueue that the test scales up, then at the end
or in sockets there is a drop, or it fell off the driver. I've implemented
the code this far, what has all of NNTP in a file and then the "re-routine,
industry-pattern back-end" in memory, then for that running usually.

(Cooperative multithreading on top of non-blocking I/O.)

Implementing the serial queue or "monohydra", or slique,
makes for that then when the parser is constantly parsing,
it seems a usual queue like data structure with parsing
returning its bounds, consuming the queue.

Having the file buffers all down small on 4K pages,
has that a next usual page size is the megabyte.

Here though it seems to make sense to have a natural
4K alignment of the file system representation, then that
it is moving files.

So, then with the new modern Java, that it runs in its own
Java server runtime environment, it seems I would also
need to see whether the cloud virt supported the I/O model
or not, or that the cooperative multi-threading for example
would be single-threaded. (Blocking abstractly.)

Then besides I suppose that could be neatly with basically
the program model, and its file model, being well-defined,
then for NNTP with IMAP organization search and extensions,
those being standardized, seems to make sense for an efficient
news file organization.

Here then it seems for serving the NNTP, and for example
their file bodies under the storage, with the fixed headers,
variable header or XREF, and the message body, then under
content it's same as storage.

NNTP has "OVERVIEW" then from it is built search.
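
As a minimal sketch of building on the overview data, the following parses one tab-separated line of an OVER (XOVER) response, assuming the RFC 3977 field order: article number, Subject, From, Date, Message-ID, References, byte count, line count. It then runs a naive subject match over such lines. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical holder for one line of an OVER/XOVER response (tab-separated fields).
final class OverviewLine {
    final long number;
    final String subject, from, date, messageId, references;
    final long bytes, lines;

    private OverviewLine(String[] f) {
        number = Long.parseLong(f[0]);
        subject = f[1];
        from = f[2];
        date = f[3];
        messageId = f[4];
        references = f[5];
        bytes = f[6].isEmpty() ? -1 : Long.parseLong(f[6]);   // -1 when the field is empty
        lines = f[7].isEmpty() ? -1 : Long.parseLong(f[7]);
    }

    static OverviewLine parse(String line) {
        String[] fields = line.split("\t", -1);   // -1 keeps empty trailing fields
        if (fields.length < 8) throw new IllegalArgumentException("expected at least 8 fields");
        return new OverviewLine(fields);
    }

    /** A naive, case-insensitive subject search over overview lines. */
    static List<OverviewLine> subjectContains(List<String> overview, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<OverviewLine> hits = new ArrayList<>();
        for (String line : overview) {
            OverviewLine o = parse(line);
            if (o.subject.toLowerCase(Locale.ROOT).contains(needle)) hits.add(o);
        }
        return hits;
    }
}
```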

Let's see here then, if I get the load test running, or,
just put a limit under the load while there are no load test
errors, it seems the algorithm then scales under load to be
making usually the algorithm serial in CPU, with: encryption,
and compression (traffic). (Block ciphers instead of serial transfer.)

Then, the industry pattern with re-routines, has that the
re-routines are naturally co-operative in the blocking,
and in the language, including flow-of-control and exception scope.

So, I have a high-performance implementation here.

Ross A. Finlayson

Nov 16, 2020, 8:39:08 PM11/16/20