When dealing with data structures other than arrays, such as trees, graphs,
stacks, and linked lists, what other programming language do you resort to?
Or do you stick with J for all endeavours?
Jimmy
eg. https://code.jsoftware.com/wiki/Essays/Tree_Sum
Sat, 16 Nov 2019, Jimmy Gauvin wrote:
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
--
regards,
Put differently: I think you are asking the wrong question.
(Partially: it's worth thinking about why you pick whichever data
structures...)
((It can also sometimes be useful to look on rosettacode for examples of
various data structure handling mechanisms.))
Thanks,
--
Raul
ORDINAL FRACTIONS - the algebra of data
This paper was submitted to the 10th World Computer Congress, IFIP 1986 conference, but rejected by the referee....
--
Devon McCormick, CFA
Quantitative Consultant
http://www.statemaster.com/encyclopedia/Ordinal-fraction
Bo.
Trees can be represented as boxes containing subtrees. That works, but
is usually more trouble than simply managing an array.
Linked lists are used only for efficiency, and in the cases where that
matters you can easily have a list of indexes to an array of data items.
Stacks are just lists, as Devon said.
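Henry's suggestion of simulating a linked list with a list of indexes into an array of data items can be made concrete. The sketch below is an illustrative Python rendering (names and layout are mine, not from the thread): items live densely in one array, and a parallel array of indexes supplies the links, so splicing after a known node is O(1) with no per-node allocation.

```python
# A linked list simulated with flat arrays: 'data' holds the items and
# 'nxt' holds, for each slot, the index of the following slot (-1 = end).
def make_list():
    return {"data": [], "nxt": [], "head": -1}

def insert_after(lst, pos, value):
    """Append value to the data array and splice its index into the chain
    after slot 'pos' (pos = -1 means insert at the head). Returns the slot."""
    i = len(lst["data"])
    lst["data"].append(value)
    if pos == -1:
        lst["nxt"].append(lst["head"])
        lst["head"] = i
    else:
        lst["nxt"].append(lst["nxt"][pos])
        lst["nxt"][pos] = i
    return i

def traverse(lst):
    out, i = [], lst["head"]
    while i != -1:
        out.append(lst["data"][i])
        i = lst["nxt"][i]
    return out

lst = make_list()
a = insert_after(lst, -1, "head")
b = insert_after(lst, a, "tail")
insert_after(lst, a, "middle")   # O(1) splice between head and tail
print(traverse(lst))             # ['head', 'middle', 'tail']
```

Traversal just follows the index chain, exactly as it would in J with an integer vector; the data stays dense regardless of logical order.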
The datatype I really want is a directory object that acts as an
efficient and easy-to-use associative memory. You put key/values in and
then retrieve a value by presenting its key. Has anyone written an
addon for that?
(Note: the primitive 128!:8 (create a hash for a noun) was added to
J9.01 with this in mind)
Henry Rich
I tried a simple approach, splitting huge arrays in two to insert data, and
it took forever to execute.
Using a simple simulation of a linked list I decreased execution time by
orders of magnitude (2 minutes versus 3 days!).
Viewing arrays as memory, as suggested by Raul, is probably the appropriate
mindset.
Henry Rich
Sun, 17 Nov 2019, Henry Rich wrote:
I very much enjoyed its no-nonsense approach when working with it, and it
has excellent docs (https://www.lua.org/docs.html).
I've always dreamed of having a crossbreed between J and Lua, but I don't
really have the capabilities nor the time to cook up an addon or the like
implementing an interface.
Maybe someone else wants to?
Jan-Pieter
regards,
Danil
Mon, 18 Nov 2019 at 11:57, Jan-Pieter Jacobs <janpiete...@gmail.com>:
The APL/J cousin, K, appears to have the dictionary as pretty central to its
data organisation, but maybe that's more akin to J's locales.
Neither topic is probably directly relevant here, but it might start a hare!
Mike
"Another distinguishing feature of K is the use of dictionaries:
associative tables whose keys are symbols, i.e., internalized strings. In
turn, dictionaries are the building material of a hierarchically organized
global data space called the K-tree"
https://github.com/kevinlawler/kona/wiki/Dictionaries
This is an important case of course, but still a restriction. Tables in Lua
are more fundamental:
"The table type implements associative arrays. An associative array is an
array that can be indexed not only with numbers, but also with strings or
any other value of the language, except *nil*. Moreover, tables have no
fixed size; you can add as many elements as you want to a table
dynamically. Tables are the main (in fact, the only) data structuring
mechanism in Lua, and a powerful one. We use tables to represent ordinary
arrays, symbol tables, sets, records, queues, and other data structures, in
a simple, uniform, and efficient way. Lua uses tables to represent packages
as well. When we write io.read, we mean "the read entry from the io
package". For Lua, that means "index the table io using the string "read"
as the key".
Mon, 18 Nov 2019 at 13:11, 'Mike Day' via Programming <progr...@jsoftware.com>:
On Mon, Nov 18, 2019, 7:10 PM Danil Osipchuk <danil.o...@gmail.com>
wrote:
Henry Rich
~ Gilles
That said, what would they be used to do?
As Henry mentioned in another message, locales already supply the
"get/fetch" semantics.
What he didn't mention is that you can use i. in conjunction with {
and } to do something quite similar, already. So what we're talking
about is potential optimizations.
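Raul's i.-with-{-and-} idiom amounts to keeping parallel key and value tables: look the key up in the key table, then index (or amend) the value table at that position. A minimal sketch, in Python for illustration only (the function names are mine):

```python
# Parallel tables: 'keys' plays the role of the key array searched with i.;
# 'values' is the data array indexed with { and amended with }.
keys, values = [], []

def put(key, value):
    try:
        values[keys.index(key)] = value   # key found: amend in place
    except ValueError:                    # key absent: append a new pair
        keys.append(key)
        values.append(value)

def get(key):
    return values[keys.index(key)]        # raises ValueError if absent

put('alpha', 1)
put('beta', 2)
put('alpha', 99)      # overwrite, as } would amend the value array
print(get('alpha'))   # 99
```

The linear search is the part an optimization (hashing, as with 128!:8) would replace; the interface stays the same.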
Basically, though, I think we're talking about using character lists
as array indices. So how might that work?
Imagine if we could do things like ('key' { array) or (value 'key'}
array) and have that "just work". What would that be like?
One issue: we need to think about keys of differing lengths. So we'd
need some kind of rule to eliminate padding from keys when they're
used.
Another issue: boxing already has meaning in the context of { and }
array indices. Could we get away with just a rule that if the index
type is literal we expect the indices to be rank 1 instead of rank 0?
But here's another issue: these kinds of arrays are sparse. So maybe
what we're really talking about is a variation on sparse arrays with
all dimensions being sparse indices? And, maybe, with arbitrary rank
(implicitly: the rank of the array would be the length of the longest
key and all keys would be padded to that length when used)?
Which brings me back to my previous question: what would they be used to do?
Thanks,
--
Raul
"Incorrect global symbols data may cause misinterpretation of symbol
arrays, or data corruption, or a system crash, or the end of
civilization as we know it."
Thanks,
--
Raul
And as Raul observed, we're talking about an optimization here, so I
wouldn't want to have to convert each noun to a linear representation.
Henry Rich
It is not thoroughly tested, and argument boxing and ranks are not well
thought through, but it should be pretty close to what a straightforward
implementation of the thing would be.
'get__Map key' should give a boxed value for a key, or raise an error if the
key is absent.
'somevalue get__Map key' should give a stored value, or save and return
somevalue if the key is not there (but I cannot catch an exception raised in
a monad called from a dyad - something silly, but I have to go home).
'key put__Map value' stores a value.
'delkey__Map key' removes a key if it is there.
M=: 0 conew'map'
123 put__M 1230
1230
'strkey' put__M 'strval'
strval
(<'boxkey') put__M <'boxval'
┌──────┐
│boxval│
└──────┘
keys__M
┌─────┬────────┬──────────┐
│┌───┐│┌──────┐│┌────────┐│
││123│││strkey│││┌──────┐││
│└───┘│└──────┘│││boxkey│││
│ │ ││└──────┘││
│ │ │└────────┘│
└─────┴────────┴──────────┘
hashes__M
_1845511330 2120780259 _71755953
values__M
┌──────┬────────┬──────────┐
│┌────┐│┌──────┐│┌────────┐│
││1230│││strval│││┌──────┐││
│└────┘│└──────┘│││boxval│││
│ │ ││└──────┘││
│ │ │└────────┘│
└──────┴────────┴──────────┘
get__M 123
┌────┐
│1230│
└────┘
get__M 'not there'
|uncaught throw.: get__M
| get__M'not there'
'to set to something' get__M 'not there'
|uncaught throw.: get
| get y
delkey__M 123
1
get__M 123
|uncaught throw.: get__M
| get__M 123
values__M
┌────────┬──────────┐
│┌──────┐│┌────────┐│
││strval│││┌──────┐││
│└──────┘│││boxval│││
│ ││└──────┘││
│ │└────────┘│
└────────┴──────────┘
keys__M
┌────────┬──────────┐
│┌──────┐│┌────────┐│
││strkey│││┌──────┐││
│└──────┘│││boxkey│││
│ ││└──────┘││
│ │└────────┘│
└────────┴──────────┘
Mon, 18 Nov 2019 at 16:10, Henry Rich <henry...@gmail.com>:
On Mon, Nov 18, 2019 at 1:02 PM Danil Osipchuk <danil.o...@gmail.com>
wrote:
> I quickly put together a minimal implementation along those lines, to have
Mon, 18 Nov 2019 at 21:15, Devon McCormick <devo...@gmail.com>:
For example: https://rosettacode.org/wiki/LZW_compression#J
See also:
https://en.wikipedia.org/wiki/DEFLATE
https://en.wikipedia.org/wiki/LZ77_and_LZ78
https://en.wikipedia.org/wiki/Prediction_by_partial_matching
https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform
Thanks,
--
Raul
Cheers,
Louis
> On Nov 18, 2019, at 1:15 PM, Devon McCormick <devo...@gmail.com> wrote:
>
> I would like to see examples of where a dictionary would be the preferred
And, if you had multiple batches, for example, from:
labels1=: ~.trigrams1
sums1=: #/.~trigrams1
labels2=: ~.trigrams2
sums2=: #/.~trigrams2
labels3=: ~.trigrams3
sums3=: #/.~trigrams3
you'd probably do something like:
labels123=: ~.labels1,labels2,labels3
sums123=: (labels1,labels2,labels3) +//. sums1,sums2,sums3
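For comparison only (not from the thread): the same batch-merge — unique labels, counts per label, then a sum-by-key over the concatenation — is a multiset union. A Python sketch with made-up trigram data:

```python
from collections import Counter

# Each batch reduces to (labels, sums) held together in one Counter;
# merging batches is a sum-by-key over the concatenated labels and sums,
# which is the role +//. plays in the J code above.
trigrams1 = ['the', 'he ', 'e q']
trigrams2 = ['he ', 'e q', ' qu']

batch1 = Counter(trigrams1)
batch2 = Counter(trigrams2)
merged = batch1 + batch2      # the labels123 / sums123 step in one go
print(merged['he '])          # 2
```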
Dictionaries are for where things are a lot messier -- like where the
name/value association is needed in a variety of low-level places
rather than being used to control the calculation.
(And, locales often are sufficient for those cases, but not always.)
Thanks,
--
Raul
Without such a structure the only practical way is to append and (re)sort to maintain an array in a suitable form.
An example of this is the construction of an order book in a real-time trading system, VERY impractical in an array-only language.
However, if the array could be overlaid with a linked list, then the orders could be “appended” on receipt (fast) and a linked list used to update the sort order, so the order book can be traversed very efficiently by attributes such as Price/Time.
Without it, one must sort the order book to find the best Price/Time for matching … and repeat that on each new order received.
I do use C for this for matching engine rules, but have often thought a linked list overlay would be a very powerful way to maintain an ordered array for such a purpose.
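A toy rendering of that overlay (Python, with field names of my own choosing; a real engine would preallocate, keep prev pointers, and manage a free list): orders are appended to dense column arrays, and a next-index column threads them in Price/Time priority, so the best order is always at the head and no sort is ever run.

```python
import itertools

class Book:
    """Bid side of a toy order book: dense columns plus a linked-list overlay."""
    def __init__(self):
        self.price, self.time, self.nxt = [], [], []
        self.head = -1
        self.clock = itertools.count()   # arrival-time stamp

    def add(self, price):
        i = len(self.price)              # append to the dense table (fast)
        self.price.append(price)
        self.time.append(next(self.clock))
        self.nxt.append(-1)
        # splice into the chain: descending price, ties kept in time order
        prev, cur = -1, self.head
        while cur != -1 and self.price[cur] >= price:
            prev, cur = cur, self.nxt[cur]
        self.nxt[i] = cur
        if prev == -1:
            self.head = i
        else:
            self.nxt[prev] = i

    def best(self):
        return None if self.head == -1 else self.price[self.head]

book = Book()
for p in (101, 99, 103, 101):
    book.add(p)
print(book.best())   # 103  (best bid: highest price, earliest time)
```

Appends stay O(1); only the splice walks the chain (from the head here, for simplicity), and the next/prev columns are themselves dense columns of the order table.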
…/Regards Rob
For multi-key indexing, what you would do instead is maintain multiple
"generations" of the data. The big/slow instance gets updated rarely
(folding in the fast/small instance and doing the sorts then during
those updates). At read time you check both of them. The update
overhead here winds up being analogous to the garbage collection
overhead for linked lists. And the traversal overhead can vary - both
linked lists and arrays have overhead, though the details differ, but
in my experience it's straightforward to get the array code to perform
quickly.
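A minimal sketch of that generational scheme (Python; the threshold and names are illustrative, not a proposal): appends go to a small unsorted batch, lookups consult both generations, and a periodic fold merges the batch into the sorted base.

```python
import bisect

# Two generations: a large sorted base rebuilt rarely, plus a small
# unsorted recent batch. Reads consult both; a fold merges them.
base = []      # kept sorted; searched with binary search
recent = []    # fast appends; searched linearly (it stays small)

def insert(x):
    recent.append(x)
    if len(recent) > 4:          # fold threshold (arbitrary here)
        fold()

def fold():
    global base, recent
    base = sorted(base + recent)  # the rare, amortised sort
    recent = []

def contains(x):
    i = bisect.bisect_left(base, x)
    return (i < len(base) and base[i] == x) or x in recent

for x in (3, 1, 2, 5, 4):
    insert(x)
print(contains(4), contains(6))   # True False
```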
Thanks,
--
Raul
Fast append is no issue, appends can be very fast as you suggest. The issue is to maintain an ordering attribute that is fast/custom for each use case (in my case an incoming order).
Linked Lists work very effectively, although having said that I would not try to write a fast matching engine in J.
The overhead associated with the multi key indexing you describe below is impractical for low latency matching, and avoided by managing a free space list for bucket re-use.
I still believe that both an ordering attribute and also a dictionary, would be very useful assets to an array language, but would need to be implemented “natively” within the language to be fully useful (not an add-on, but part of the language notation, as per kdb+ for dictionaries for example).
Thanks Raul, Rob
In kdb+ the “table” datatype is built atop the “dictionary” data type… a table may be perceived as a list of dictionaries (Key value pairs) that are Columns/Values. I’ll send you a paper offline that shows this.
Thanks, Regards Rob
Why do you say that? (How big of a dataset does your UI represent? How
fast is data arriving? And, if it's large, what are you showing to
your users?)
With a linked list, you have array elements scattered throughout
memory, which tends to result in high overhead from cache misses.
With an array, you have array elements densely packed, which gives
high cache coherency.
And, for something like list/array traversal, cache misses are order
of magnitude more expensive than computation.
(So it sounds like you might be working off of untested rumor, or
maybe that your benchmarks did not involve good implementations or did
not involve large data sets.)
As a general rule, sorting does not take perceivable time in J unless
you're working with datasets which are large enough that they would
take years for one person to enter manually. (Or unless we're talking
about secondary effects -- if it takes 30 milliseconds to sort a large
dataset and you're doing animation such that you only have 10
milliseconds to prepare the next frame of imagery, that's a real
problem, and getting things to work right there takes some effort and
some complicating simplifications.)
Put differently: working with these kinds of issues can be tricky, and
can involve issues that many people are not prepared to deal with.
Consequently, people tend to favor a working implementation over any
other implementation (which is fine, but it's a bad idea to generalize
about reasons when you cannot adequately observe what you're working
with). But we also get people repeating ideas (and working off of
ideas) which don't actually play out in practice.
On the other side of things, though, there's another issue to think
about when working with J, which is the single threaded character of
the implementation. Ideally, for UI work and significant computation,
you want the UI to work independent from the computational engine --
the reason is that machines fail often enough and in so many different
ways that people need to be reassured that the thing is still working.
So, with J that might mean something like a javascript based UI and
JHS.
> I still believe that both an ordering attribute and also a
> dictionary, would be very useful assets to an array language, but
> would need to be implemented “natively” within the language to be
> fully useful (not an add-on, but part of the language notation, as
> per kdb+ for dictionaries for example).
That could be, but (speaking personally): I am not going to optimize
code that I don't have access to.
Thanks,
--
Raul
I am speaking from direct experience as I have specialised in the matching engine space since 1992 when I co-wrote the first commercial in memory matching engine (called X-Stream).
This is now one of NasdaqOMX’s 3 matching engines (iNet, OMX Click/Genium and X-Stream) and X-Stream now runs in over 40 global markets (we implemented the first 20 including the Shanghai Stock Exchange, Istanbul, Moscow to name a few).
So there are no “rumours”, unvalidated benchmarks, generalisations or suspect implementations, and data sets were (and are) very large. This is all firsthand experience, testing, measurement and optimisation.
We also capture the entire matching history in a tick database (we use kdb+), and I have also worked in array languages since 1979 at IPSA where I co-wrote Sharp APL/HP with Arthur Whitney (kdb+) so have a fair idea of internal array language architecture.
A large equities exchange can have 5,000 or 10,000 listed securities, futures & options can inflate that an order of magnitude.
For each security an order book consists of bids and offers, it could be anywhere from 10 to 200 rows, but each row is a price tier and could consist of 10 - 1,000 orders in Time priority order.
Our current platform (MarketGrid) runs one crypto client having an order book in Leverage trading that is over 1,000 rows deep (not dissimilar to BITMEX).
Visibility is irrelevant, I am not talking the UI (which is a separate browser based application and communicates via its own memory cache to the engine), I am talking of the server, the part that does centralised in memory order matching for the entire market.
We run the matching engine on Linux servers with anywhere from 32GB to 1+ TB physical memory, barely touching the disk with all our own housekeeping, written mostly in C (have done since the 1990’s).
The transaction rates operate from 100,000 orders per second sustained matching up to 300,000 to 400,000 orders per second on these engines, all fed via APIs (FIX, ITCH, OUCH, Proprietary etc).
So we are talking single digit microsecond latency round-trip from the client. Some markets run 24x7, others 9-5, 5-6 days a week, it all varies by asset class.
C kills C++ for this stuff (simply closer to the metal) and likewise both kill array languages.
I have seen efforts by people to write matching engines in array languages (eg BITMEX is written in kdb+) and they all fail at performance, barely able to consume over 1,000 orders per second.
The key reason is the overhead of the array language to process incoming orders arriving each in a single message upwards of 100,000 orders per second.
You can append to arrays in array languages at this rate but you can’t maintain indices for optimal search matching (head of list) and it is likewise impractical to run shadows of the data.
Cache misses do not impact this matching, it is physical memory and all orders are stored as contiguous rows in the Orders table as are other tables. The linked lists provide the only proven way to scale the matching speeds required.
So I stand by my comments, which were a constructive suggestion that an “ordering attribute” in an array could be a mechanism to help an array language become useful in this space, as I used sorted linked lists in C.
However relying solely on arrays for this kind of performance when receiving and processing over 100,000 “events” (orders) per second will not do it.
I won’t comment further on this thread as little point, but suffice to say it is always useful to know your audience.
Thanks, Rob
I did not know, and I felt I had to convey the ideas you were
suggesting to me, to find out what you were talking about.
> A large equities exchange can have 5,000 or 10,000 listed securities, futures & options can inflate that an order of magnitude.
> For each security an order book consists of bids and offers, it could be anywhere from 10 to 200 rows, but each row is a price tier and could consist of 10 - 1,000 orders in Time priority order.
>
> Our current platform (MarketGrid) runs one crypto client having an order book in Leverage trading that is over 1,000 rows deep (not dissimilar to BITMEX).
> Visibility is irrelevant, I am not talking the UI (which is a separate browser based application and communicates via its own memory cache to the engine), I am talking of the server, the part that does centralised in memory order matching for the entire market.
>
> We run the matching engine on Linux servers with anywhere from 32GB to 1+ TB physical memory, barely touching the disk with all our own housekeeping, written mostly in C (have done since the 1990’s).
> The transaction rates operate from 100,000 orders per second sustained matching up to 300,000 to 400,000 orders per second on these engines, all fed via APIs (FIX, ITCH, OUCH, Proprietary etc).
> So we are talking single digit microsecond latency round-trip from the client. Some markets run 24x7, others 9-5, 5-6 days a week, it all varies by asset class.
This is the kind of information I was looking for.
That said, I am having trouble reconciling the concept of "single
digit microsecond latency round-trip from the client" with traversing
linked list structures of that size at that rate and not being able to
do the same thing with arrays. (I can see achieving that some of the
time -- relatively unloaded systems, and tree structures for quick
traversal, but I'm pretty sure you're going to have stalls of some
sort sooner or later. This might be manageable, though, if the
requirements of the quirks of the algorithm are baked into the
agreements about use of the system.)
Then again, further down in this message, it seems like you're
suggesting that you're not actually traversing the lists.
> C kills C++ for this stuff (simply closer to the metal) and likewise both kill array languages.
> I have seen efforts by people to write matching engines in array languages (eg BITMEX is written in kdb+) and they all fail at performance, barely able to consume over 1,000 orders per second.
> The key reason is the overhead of the array language to process incoming orders arriving each in a single message upwards of 100,000 orders per second.
>
> You can append to arrays in array languages at this rate but you can’t maintain indices for optimal search matching (head of list) and it is likewise impractical to run shadows of the data.
"Head of list" does not sound like "order matching". Maybe a part of a
larger system that does order matching, but it seems to me that "head
of list" means matching already happened. So, presumably, I'm missing
out on some key details.
> Cache misses do not impact this matching, it is physical memory and all orders are stored as contiguous rows in the Orders table as are other tables. The linked lists provide the only proven way to scale the matching speeds required.
Yeah, ok, but it sounds like you are feeding me contradictory views of
the system. So -- again -- presumably I'm missing out on some key
details. (And, this means that if I were to attempt to optimize for
these cases, I would almost certainly do it wrong.)
> So I stand by my comments, which were a constructive suggestion that an “ordering attribute” in an array could be a mechanism to help an array language become useful in this space, as I used sorted linked lists in C.
> However relying solely on arrays for this kind of performance when receiving and processing over 100,000 “events” (orders) per second will not do it.
Color me... uncertain about what you have been claiming...
Thanks,
--
Raul
So yes, I have investigated it, and it holds promise for “components” of the system (e.g. we found it very well suited to reducing network latency by bypassing the TCP/IP stack with a custom message protocol between cooperating machines, such as the market database or web services), but it is not flexible enough for the regular updates of an application of this size.
Hope that explains it, Regards Rob
> On 22 Nov 2019, at 3:52 am, Jimmy Gauvin <jimmy....@gmail.com> wrote:
>
> @Rob Have you considered using FPGA boards to boost your throughput ?
>
> On 22 Nov 2019, at 4:53 am, Raul Miller <rauld...@gmail.com> wrote:
>>
>
> This is the kind of information I was looking for.
>
> That said, I am having trouble reconciling the concept of "single
> digit microsecond latency round-trip from the client" with traversing
> linked list structures of that size at that rate and not being able to
> do the same thing with arrays. (I can see achieving that some of the
> time -- relatively unloaded systems, and tree structures for quick
> traversal, but I'm pretty sure you're going to have stalls of some
> sort sooner or later. This might be manageable, though, if the
> requirements of the quirks of the algorithm are baked into the
> agreements about use of the system.)
>
> Then again, further down in this message, it seems like you're
> suggesting that you're not actually traversing the lists.
No, several lists are maintained. For the primary order matching, the main 2 lists start at BestBid and BestOffer.
Matching is the process of traversing the list (so a 100% hit rate) until no balance remains, or matching ceases (we strike a “worse price”).
This is key to getting the matching algorithm optimised. The lists are stored in the appropriate sorted order (Price/Time).
>
>> C kills C++ for this stuff (simply closer to the metal) and likewise both kill array languages.
>> I have seen efforts by people to write matching engines in array languages (eg BITMEX is written in kdb+) and they all fail at performance, barely able to consume over 1,000 orders per second.
>> The key reason is the overhead of the array language to process incoming orders arriving each in a single message upwards of 100,000 orders per second.
>>
>> You can append to arrays in array languages at this rate but you can’t maintain indices for optimal search matching (head of list) and it is likewise impractical to run shadows of the data.
>
> "Head of list" does not sound like "order matching". Maybe a part of a
> larger system that does order matching, but it seems to me that "head
> of list" means matching already happened. So, presumably, I'm missing
> out on some key details.
As per above, Head of List is the Best Bid (for Buys) or Best Ask (for Sells) and “possible matches” start at that point.
So an incoming Bid will try to start matching at Best Offer etc as above.
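The traversal described — consume from the best price until the incoming order's balance is exhausted or the next price is worse than its limit — can be sketched as follows (Python, toy data and my own names; a real book would walk the linked-list overlay rather than pop from a list):

```python
def match(bid_price, bid_qty, asks):
    """Match an incoming bid against asks sorted by (price asc, time asc).
    'asks' is a list of [price, qty] in Price/Time priority; mutated in place.
    Stops when the balance is gone or the best ask exceeds the bid's limit."""
    fills = []
    while bid_qty > 0 and asks and asks[0][0] <= bid_price:
        price, qty = asks[0]
        take = min(bid_qty, qty)
        fills.append((price, take))
        bid_qty -= take
        if take == qty:
            asks.pop(0)          # head fully consumed; move to next
        else:
            asks[0][1] -= take   # partial fill at the head
    return fills, bid_qty        # any remaining balance rests on the book

asks = [[100, 5], [101, 10], [105, 3]]
fills, rest = match(101, 12, asks)
print(fills)   # [(100, 5), (101, 7)]
print(rest)    # 0
```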
>
>> Cache misses do not impact this matching, it is physical memory and all orders are stored as contiguous rows in the Orders table as are other tables. The linked lists provide the only proven way to scale the matching speeds required.
>
> Yeah, ok, but it sounds like you are feeding me contradictory views of
> the system. So -- again -- presumably I'm missing out on some key
> details. (And, this means that if I were to attempt to optimize for
> these cases, I would almost certainly do it wrong.)
No, I explained that the tables we maintain in memory are contiguous (dense), so we don’t suffer cache misses.
The linked lists themselves are likewise dense, as the list next/prev pointers are columns on the order table… so all dense.
>
>> So I stand by my comments, which were a constructive suggestion that an “ordering attribute” in an array could be a mechanism to help an array language become useful in this space, as I used sorted linked lists in C.
>> However relying solely on arrays for this kind of performance when receiving and processing over 100,000 “events” (orders) per second will not do it.
>
> Color me... uncertain about what you have been claiming…
I wasn’t claiming anything. I suggested that an ordering attribute might allow a matching engine model to be efficiently built in J (and I suspect that, once understood/tuned, it could be applied to other use cases just as efficiently).
Anyway, enough discussion on this thread; if I ever get time (though I rarely do) I will model this to demonstrate what I think is possible and offer a more concrete suggestion & use case.
Thanks, Rob
>
> Thanks,
>
>
> --
> Raul