On Thu, 29 Apr 2010 14:21:49 +0000, Peter Keller wrote: > My question is: This adjustment happens at the _end_ of the array, can I > expand the array at the beginning? > [...] > Or do I have to write the code to allocate the larger array and manually > copy the elements myself?
I think that you have to copy yourself - but that is rather easy to do, see eg REPLACE.
In general, if you know that you will not prepend more than a given number of elements (or you are willing to copy elements occasionally), you can create an array that is large enough, and displace to it using an offset. Then if you need extra elements at the beginning, just displace to a smaller offset.
If I knew how big the final sizes of the arrays were I could use this method, but sadly I don't. I'll file away the method for another day. It looks like under my circumstances, a new array into which I copy all of my disparate pieces looks to be one of the only solutions.
> If I knew how big the final sizes of the arrays were I could use > this method, but sadly I don't. I'll file away the method for another > day. It looks like under my circumstances, a new array into which I copy > all of my disparate pieces looks to be one of the only solutions.
> Thank you for your time.
> -pete
You can make the underlying array adjustable (but not displaced) so that it can grow too. If you need to grow it at both ends you can use two adjustable arrays and roll your own displaced array class to do the indirection.
> If I knew how big the final sizes of the arrays were I could use this > method, but sadly I don't. I'll file away the method for another day. It > looks like under my circumstances, a new array into which I copy all of > my disparate pieces looks to be one of the only solutions.
Sometimes I face a similar problem (collecting an number of items that are either unknown a prior, or I am lazy to calculate). I usually solve it the following way: create array with extra elements, keep using them until I run out, then create a new array with a buffer of extra elements, copy and continue. This way, you will copy occasionally, but not all the time. Example:
(use-package '(:bind))
(defun prepend-elements (vector n &optional (extra 256)) (bind (((:values displaced-to index-offset) (array-displacement vector)) (new-length (+ n (length vector)))) (if (and displaced-to (<= n index-offset)) (make-array new-length :displaced-index-offset (- index-offset n) :displaced-to displaced-to) (let ((new-vector (make-array (+ extra new-length)))) (replace new-vector vector :start1 (+ extra n)) (make-array new-length :displaced-index-offset extra :displaced-to new-vector)))))
> Sometimes I face a similar problem (collecting an number of items that > are either unknown a prior, or I am lazy to calculate). I usually > solve it the following way: create array with extra elements, keep > using them until I run out, then create a new array with a buffer of > extra elements, copy and continue. This way, you will copy > occasionally, but not all the time. Example:
Yeah, I thought of that. But it turns out the layering in my code base makes that difficult to acheive.
My use case is I'm writing a master/worker style communication system which can handle 10,000 to 50,000 clients. Buffer management for me is a serious issue and I need to really conserve memory and not grind the allocator too much.
The context is that when writing a data array to a client, the buffer management layer needs to wrap a header around the data before sending it. I thought about not doing the wrapping at all--just write the small header then the payload, but then that leads to alternating writes of a very small amount of data and then a large amount of data. That oscillation sucks from a network scheduling/bandwidth perspective. I need to write as many bytes as I can as often as I can.
If it really looks like the write buffering will be problematic from a memory allocation point of view, I can make pools of buffers which are reusable for the assembling of the data. I don't want to do that initially since it is more code.
I was just wondering if I had missed something in Lisp's array management which would allow me to grow an array at the head of the array. The answer is somewhat yes, but I'd have to preallocate the array and that means the application code would have to use a special abstraction of make-array. There are problems with that I'd rather avoid.
I guess at this point, I'll just assemble the arrays naively and if the profiler tells me there is a problem there, I'll look into it deeper.
> > Sometimes I face a similar problem (collecting an number of items that > > are either unknown a prior, or I am lazy to calculate). I usually > > solve it the following way: create array with extra elements, keep > > using them until I run out, then create a new array with a buffer of > > extra elements, copy and continue. This way, you will copy > > occasionally, but not all the time. Example:
> Yeah, I thought of that. But it turns out the layering in my code base makes > that difficult to acheive.
> My use case is I'm writing a master/worker style communication system > which can handle 10,000 to 50,000 clients. Buffer management for me is > a serious issue and I need to really conserve memory and not grind the > allocator too much.
> The context is that when writing a data array to a client, the buffer > management layer needs to wrap a header around the data before sending it. > I thought about not doing the wrapping at all--just write the small header > then the payload, but then that leads to alternating writes of a very > small amount of data and then a large amount of data. That oscillation > sucks from a network scheduling/bandwidth perspective. I need to write > as many bytes as I can as often as I can.
> If it really looks like the write buffering will be problematic from > a memory allocation point of view, I can make pools of buffers which > are reusable for the assembling of the data. I don't want to do that > initially since it is more code.
> I was just wondering if I had missed something in Lisp's array management > which would allow me to grow an array at the head of the array. The > answer is somewhat yes, but I'd have to preallocate the array and that > means the application code would have to use a special abstraction of > make-array. There are problems with that I'd rather avoid.
> I guess at this point, I'll just assemble the arrays naively and if the > profiler tells me there is a problem there, I'll look into it deeper.
> Thank you.
> -pete
Could it be that a different data structure would be more appropriate? What about a tree? (also, until recently, I had not realized how many different kinds of tree there are (2-3, 2-3-4, left leaning, etc). That would impact your access time, but in your case it may be manageable, or you could first store in tree, and then transfer into array.
> Tamas K Papp <tkp...@gmail.com> wrote: > > Sometimes I face a similar problem (collecting an number of items that > > are either unknown a prior, or I am lazy to calculate). I usually > > solve it the following way: create array with extra elements, keep > > using them until I run out, then create a new array with a buffer of > > extra elements, copy and continue. This way, you will copy > > occasionally, but not all the time. Example:
> Yeah, I thought of that. But it turns out the layering in my code base makes > that difficult to acheive.
> My use case is I'm writing a master/worker style communication system > which can handle 10,000 to 50,000 clients. Buffer management for me is > a serious issue and I need to really conserve memory and not grind the > allocator too much.
> The context is that when writing a data array to a client, the buffer > management layer needs to wrap a header around the data before sending it. > I thought about not doing the wrapping at all--just write the small header > then the payload, but then that leads to alternating writes of a very > small amount of data and then a large amount of data. That oscillation > sucks from a network scheduling/bandwidth perspective. I need to write > as many bytes as I can as often as I can.
Seems to me then that what you want is not a growable array but rather smarter write buffering.
> If it really looks like the write buffering will be problematic from > a memory allocation point of view, I can make pools of buffers which > are reusable for the assembling of the data. I don't want to do that > initially since it is more code.
> I was just wondering if I had missed something in Lisp's array management > which would allow me to grow an array at the head of the array. The > answer is somewhat yes, but I'd have to preallocate the array and that > means the application code would have to use a special abstraction of > make-array. There are problems with that I'd rather avoid.
> I guess at this point, I'll just assemble the arrays naively and if the > profiler tells me there is a problem there, I'll look into it deeper.
That sounds like a fine plan. Premature optimization and all that. Memcpy is pretty frickin' fast, particularly when compared to network I/O.
Mirko <mirko.vuko...@gmail.com> wrote: > Could it be that a different data structure would be more > appropriate? What about a tree? (also, until recently, I had not > realized how many different kinds of tree there are (2-3, 2-3-4, left > leaning, etc). That would impact your access time, but in your case it > may be manageable, or you could first store in tree, and then transfer > into array.
I know a fair number of data structures, but most of them require consing which is something I need to minimize. Luckily, the storage requirements of my data aren't tree-like. Thank you for your suggesstion.
To give an example of the problems I'm facing, suppose I have 10,000 clients:
If I have a 16k initial read buffer for each client which can grow in size up to 128k as needed, then just for the read buffers alone I'll need:
156MB up to 1.250GB of memory needed.
If I have about 20k data buffers I need for each client to write, then the buffers required for writing are 195MB.
If I have about 20K in memory structures for each client, then another 195MB.
So, under worst case conditions (the clients send me 128KB results all the time instead of averaging around 16K) for 10,000 clients, I need in total
546MB up to 1.640GB.
Now this needed memory for which I can easily account!
That is a lot of memory to keep resident and perform churn on and hence why I'm very paranoid about consing or calling make-array. I use other libraries and things and who knows what their memory usages are. But at the scaling levels I'm desiring, I have to pay attention to it.
RG <rNOSPA...@flownet.com> wrote: > In article <4bd9afe9$0$9898$80265...@spool.cs.wisc.edu>, > Peter Keller <psil...@merlin.cs.wisc.edu> wrote: >> The context is that when writing a data array to a client, the buffer >> management layer needs to wrap a header around the data before sending it. >> I thought about not doing the wrapping at all--just write the small header >> then the payload, but then that leads to alternating writes of a very >> small amount of data and then a large amount of data. That oscillation >> sucks from a network scheduling/bandwidth perspective. I need to write >> as many bytes as I can as often as I can.
> Seems to me then that what you want is not a growable array but rather > smarter write buffering.
I do not deny that, have any references? The nice thing about lisp is I can just go back into that layer and bash the code into something else with relative ease.
>> I guess at this point, I'll just assemble the arrays naively and if the >> profiler tells me there is a problem there, I'll look into it deeper.
> That sounds like a fine plan. Premature optimization and all that. > Memcpy is pretty frickin' fast, particularly when compared to network > I/O.
Speaking of memcpy, in lisp what is the equivalent of memcpy? replace? Hrm... I should use that in a few places I stupidly wrote the byte copies with a do loop.
In article <4bd9c172$0$9895$80265...@spool.cs.wisc.edu>, Peter Keller <psil...@merlin.cs.wisc.edu> wrote:
> RG <rNOSPA...@flownet.com> wrote: > > In article <4bd9afe9$0$9898$80265...@spool.cs.wisc.edu>, > > Peter Keller <psil...@merlin.cs.wisc.edu> wrote: > >> The context is that when writing a data array to a client, the buffer > >> management layer needs to wrap a header around the data before sending it. > >> I thought about not doing the wrapping at all--just write the small header > >> then the payload, but then that leads to alternating writes of a very > >> small amount of data and then a large amount of data. That oscillation > >> sucks from a network scheduling/bandwidth perspective. I need to write > >> as many bytes as I can as often as I can.
> > Seems to me then that what you want is not a growable array but rather > > smarter write buffering.
RG <rNOSPA...@flownet.com> wrote: > In article <4bd9c172$0$9895$80265...@spool.cs.wisc.edu>, > Peter Keller <psil...@merlin.cs.wisc.edu> wrote:
>> RG <rNOSPA...@flownet.com> wrote: >> > In article <4bd9afe9$0$9898$80265...@spool.cs.wisc.edu>, >> > Peter Keller <psil...@merlin.cs.wisc.edu> wrote: >> >> The context is that when writing a data array to a client, the buffer >> >> management layer needs to wrap a header around the data before sending it. >> >> I thought about not doing the wrapping at all--just write the small header >> >> then the payload, but then that leads to alternating writes of a very >> >> small amount of data and then a large amount of data. That oscillation >> >> sucks from a network scheduling/bandwidth perspective. I need to write >> >> as many bytes as I can as often as I can.
>> > Seems to me then that what you want is not a growable array but rather >> > smarter write buffering.
>> I do not deny that, have any references?
> Sorry, no. But surely it's not that hard?
Yeah, it shouldn't be that hard. When it comes to it, I'll just look in the usual place for papers or descriptions of what people already do.
On Thu, 29 Apr 2010 17:27:14 +0000, Peter Keller wrote: > Speaking of memcpy, in lisp what is the equivalent of memcpy? replace? > Hrm... I should use that in a few places I stupidly wrote the byte > copies with a do loop.
Pretty much. Usually, if you want top speed, it pays to look into your implementations sources and see what they use, then call that directly. Or, alternatively, declare array types so that the compiler will do that for you. Eg look at the transforms related to REPLACE in SBCL, you will see that it will be heavily optimized when given enough information.
Peter Keller wrote: > Giovanni Gigante <g...@cidoc.iuav.it> wrote: >>> My question is: This adjustment happens at the _end_ of the array, can I >>> expand the array at the beginning?
>> Naive idea: >> what about building a list (you can cons to the beginning of that) and >> mapping it to an array only when you are done?
> That doesn't solve the problem, suppose:
> (#(1 2 3) (#4 5 6))
> I still have to allocate a 6 element array somewhere to map all of that > into a single array: #(1 2 3 4 5 6)
Do you know for sure that you can do a better job managing all these bits than the garbage collector can. If many/most of incoming chunks of data are short-lived, GC might do better than all the copying.
Peter Keller <psil...@merlin.cs.wisc.edu> writes: > I was just wondering if I had missed something in Lisp's array management > which would allow me to grow an array at the head of the array. The > answer is somewhat yes, but I'd have to preallocate the array and that > means the application code would have to use a special abstraction of > make-array. There are problems with that I'd rather avoid.
Can't you just use the array as a "growable ring buffer"? If the ring buffer is full, you grow the array, and you can move the start or the end pointers into your buffer depending on where you want to add data.
(I use one ring buffer per client in a server I've written, but they're fixed-size with 30000 elements, because in my case if the clients have a lag of that many transactions, it's better for the clients if the server just flushes part of its buffer) -- (espen)
Paul Wallich <p...@panix.com> wrote: > Do you know for sure that you can do a better job managing all these > bits than the garbage collector can. If many/most of incoming chunks of > data are short-lived, GC might do better than all the copying.
I don't know for sure, it is just a feeling I have for where a nsaty thing might happen.
> paul working first, fast later
That is very true, I don't so much want fast, as resident in memory. :)
Thank you all for your suggestions! They have been very helpful!
> From: Peter Keller <psil...@merlin.cs.wisc.edu> > If I knew how big the final sizes of the arrays were I could use > this method, but sadly I don't.
I suggest that you define a new ADT (Abstract Data Type) which provides *exactly* the functionality you want. Once that's well defined, we can brainstorm how to best implement it. For example, you might implement a virtual array as multiple chunks, each a fixed-size (non-adjustable) array, collected together by a self-balancing binary-search-tree whose leaves are the chunks. This would let you insert new segments (chunks) at the start, at the end, or anywhere in the middle, with no need to ever copy blocks of data from an old array to a new array.
> From: Peter Keller <psil...@merlin.cs.wisc.edu> > To give an example of the problems I'm facing, suppose I have 10,000 clients: > If I have a 16k initial read buffer for each client which can grow in > size up to 128k as needed, then just for the read buffers alone I'll need: > 156MB up to 1.250GB of memory needed. > ... > That is a lot of memory to keep resident and perform churn on and hence > why I'm very paranoid about consing or calling make-array. I use other > libraries and things and who knows what their memory usages are. But > at the scaling levels I'm desiring, I have to pay attention to it.
With so many clients you must synchronize, why even *try* to store the data in Lisp arrays? Why not use MySQL as your datastore, or if you need higher performance and have lots of $cash$ then use Oracle instead?
> Now this needed memory for which I can easily account!
> That is a lot of memory to keep resident and perform churn on and hence > why I'm very paranoid about consing or calling make-array. I use other > libraries and things and who knows what their memory usages are. But > at the scaling levels I'm desiring, I have to pay attention to it.
10 years ago this was a lot of memory. Now it's small change. 50 or 100G is something to think about, but less than 4 doesn't count for much unless you're planning on some vast number of instances.
Tim Bradshaw <t...@tfeb.org> writes: > 10 years ago this was a lot of memory. Now it's small change. 50 or > 100G is something to think about, but less than 4 doesn't count for > much unless you're planning on some vast number of instances.
It depends a bit on the life time of the memory and the requirements of your server. I have one server application instance that uses 6-7GB of memory, and which needs to run a full GC freeing most of that once a day. That takes a about 5 seconds (I think, I haven't looked at it in detail for a while, it runs for months without problems now), which is not critical at all for *this* server, since I can run the GC at 2 in the morning when the server rarely is needed. But other applications may have other requirements. -- (espen)
> It depends a bit on the life time of the memory and the requirements of > your server. I have one server application instance that uses 6-7GB of > memory, and which needs to run a full GC freeing most of that once a > day. That takes a about 5 seconds (I think, I haven't looked at it in > detail for a while, it runs for months without problems now), which is > not critical at all for *this* server, since I can run the GC at 2 in > the morning when the server rarely is needed. But other applications may > have other requirements.
I think that if GCs have problems which make using large heaps difficult (and 6 or 7GB is not really large), then GC implementation need to get better - perhaps the Azul people are right about that.