I have not come across any suggestion that memory allocation in C++ is any faster than in Java. What makes a difference is GC pauses. IMHO, if you can live without managed memory, you can do this in Java as well and eliminate GC pauses.
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Native C allocation is single-threaded; Java object allocation is multi-threaded.
I agree that stack allocation doesn't work well in Java even with escape analysis. However, if this is important to you, there are workarounds. ;)
In my experience, how you use the language and structure your program matters more than the choice of language. Where there is a difference is that C++ attracts developers interested in the low-level details, and even in Java many of the best low-level developers have worked in C++.
In short, to get the best results, change the developers rather than, or as well as, the language.
On 24 Sep 2013 14:22, "Matt Fowles" <matt....@gmail.com> wrote:
Peter~

In my experience the biggest difference is stack allocation. In C++, temp objects are usually on the stack. Hence their allocation is a thread-local pointer bump (like Java), but their collection is trivial and requires no additional work (unlike Java). Also, the fact that nested structures don't lead to pointer indirections helps too.

Honestly, with the convenience features of C++11, C++ is becoming a downright pleasant language to develop in.

Matt
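A minimal C++ sketch of the point above (names are illustrative): the temporary lives on the stack, so "allocation" is a stack-pointer bump and "collection" is free when the frame pops, and the nested struct is embedded by value, so reading through it involves no pointer chase.

```cpp
#include <cassert>
#include <cstdint>

// A nested structure: 'Price' is embedded by value inside 'Order',
// so reading o.price.mantissa is a plain load with no indirection.
struct Price { int64_t mantissa; int32_t exponent; };
struct Order { Price price; int64_t quantity; };

// The temporary 'o' lives on the stack: allocated by bumping the
// stack pointer at function entry, reclaimed for free on return.
int64_t notional(int64_t mantissa, int64_t quantity) {
    Order o{{mantissa, -2}, quantity};   // stack-allocated temporary
    return o.price.mantissa * o.quantity;
}
```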
On Tue, Sep 24, 2013 at 5:16 PM, Peter Lawrey <peter....@gmail.com> wrote:
I have not come across any suggestion that memory allocation in C++ is any faster than in Java. What makes a difference is GC pauses. IMHO, if you can live without managed memory, you can do this in Java as well and eliminate GC pauses.
On 24 Sep 2013 13:37, "baboune" <nicolas...@gmail.com> wrote:
Hi,

Maybe this is a stupid question, so be patient. I think this is related to mechanical sympathy.

I was in a conference today (https://www.sics.se/events/cloud-and-big-data-day-2013-program), and Matei Zaharia (Berkeley) said that they were considering re-writing Spark in C/C++ in order to see if it would go faster. He then blamed the performance cost on Java memory management.

Once you go off-heap for the big data management parts (which he indicated Spark is already doing), are the GC costs + object costs so large as to see such a difference in performance? From a memory perspective, does the JVM impose some limits on what can be done to manage memory?

Thanks
Native C allocation is single-threaded; Java object allocation is multi-threaded.
That's not true -- if you use any half decent allocator, you get thread local allocations and good concurrent performance in C++. For example, tcmalloc performs well under concurrency.
The higher fundamental cost comes from the fact that there is work involved on the free()/destroy side of things in C/C++, where no such work exists in a GC'ed environment. If reference counting is involved, it is much higher in cost even when thread-local, as each pointer overwrite involves ref counting work. Even when no ref counting is used, the free operation used will at least double the per-allocation cost compared to a good evacuating collector that leverages the weak generational hypothesis.
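As a concrete illustration of the reference-counting point: every overwrite of a C++ `shared_ptr` performs atomic reference-count traffic, where a plain pointer (or a Java reference) overwrite is a single store. A minimal sketch:

```cpp
#include <cassert>
#include <memory>

// Overwriting a shared_ptr is not one store: it atomically
// increments the new target's count and atomically decrements
// (and possibly destroys) the old target's count.
void overwrite(std::shared_ptr<int>& dst, const std::shared_ptr<int>& src) {
    dst = src;  // atomic increment + atomic decrement
}
```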
In addition, in GC based heaps, even the much lower and amortized cost of GC'ing and efficiently recovering huge contiguous chunks of space can be moved to background threads, such that the actual executing program threads that perform the allocation do not have to pay it directly, making them faster. When plentiful cores exist, this translates into fundamentally better latency for memory management, not just better throughput.
For a fair real-world comparison, it should be taken into account that a good part of C++ allocations are on the stack. The duration of NewGen GC should be added to allocation cost in Java, as every allocation fills eden and increases the frequency of NewGen collections.
Well, pretty much anything that has long lived objects. Which covers the vast majority of useful applications I can think of.
E.g. A queue. A cache. A catalog. An ESB. An in-memory index. A price book...
On Oct 2, 2013, at 2:58 PM, Rajiv Kurian <geet...@gmail.com> wrote:
> Just curious, can you provide an example of a common application where stack based allocation for small short lived objects plus arena allocation for large but short lived objects is not good enough?
This works well but is very limited. Say your application needs some state which lives between messages in this ring buffer. It is hard to have a non-trivial program which is truly stateless.
Why can't a queue be built with arena allocation? Are the cross-thread allocations and frees the issue here? If they are, why can't a producer thread get a chunk from an arena and put it on the queue? The consumer thread processes the entry and returns the memory to the arena. If the queues are fixed size, then we use a circular buffer to store the entries, with arena allocation for dynamically sized buffers. This can be used to build a simple TCP server that accepts connections and reads data on one thread and delegates processing to worker threads.
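The scheme described in the paragraph above could be sketched roughly like this, assuming a single producer and a single consumer and fixed-size slots (all names are illustrative): slots come from a circular arena, and a slot is implicitly recycled when the ring index wraps around.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t SLOTS = 8;        // power of two, so & masks work
constexpr std::size_t SLOT_BYTES = 256;

struct Ring {
    std::array<std::array<std::byte, SLOT_BYTES>, SLOTS> arena{};
    std::atomic<std::size_t> head{0};   // next slot to write (monotonic)
    std::atomic<std::size_t> tail{0};   // next slot to read (monotonic)

    std::byte* claim() {                // producer: grab the next slot
        std::size_t h = head.load(std::memory_order_relaxed);
        if (h - tail.load(std::memory_order_acquire) == SLOTS)
            return nullptr;             // ring full
        return arena[h & (SLOTS - 1)].data();
    }
    void publish() { head.fetch_add(1, std::memory_order_release); }

    std::byte* poll() {                 // consumer: peek the oldest slot
        std::size_t t = tail.load(std::memory_order_relaxed);
        if (t == head.load(std::memory_order_acquire))
            return nullptr;             // ring empty
        return arena[t & (SLOTS - 1)].data();
    }
    void release() { tail.fetch_add(1, std::memory_order_release); }
};
```

Releasing a slot simply advances the tail counter, so the "free" is a single atomic add and the memory is reused the next time the head index wraps onto it.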
On Wednesday, October 2, 2013 3:21:01 PM UTC-7, Gil Tene wrote:
> Well, pretty much anything that has long lived objects. Which covers the vast majority of useful applications I can think of.
Beyond the cross-thread freeing questions, freeing heap-allocated memory in C/C++ is fundamentally more expensive than it is in Java, C#, and other GC'ed environments. This observation is true for all memory allocations that are not arena-based. While arena-based allocation can approximate the efficiency and low cost of management in GC-based heap systems, it can only be applied (in C/C++) in very specific contexts, and does not work for regular objects that can be passed around to libraries, held in collections, etc.
Variable length isn't a problem if you are always allocating. The problem arises from fragmentation caused by de-allocating and allocating. (Fixed-size records don't have the same fragmentation issue.)
Thanks Martin. Preallocation works really well if the sizes are well known and uniform. What if the application is a server processing non-uniform data like images/videos, which can vary vastly in size?
If you have a ring buffer, you can free the data for each message after you process it. I.e., you maintain a write pointer and a free-bytes pointer; when you add a message, you can use the space from the write pointer up to the free-bytes pointer (using clock arithmetic).
Peter.
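A rough sketch of that write-pointer / free-pointer scheme (names are illustrative, and it deliberately ignores messages that straddle the wrap point): both pointers are monotonically increasing byte counters, so all the "clock arithmetic" reduces to subtraction and a modulo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class ByteRing {
    std::vector<uint8_t> buf;
    uint64_t write = 0;   // total bytes ever allocated
    uint64_t freed = 0;   // total bytes ever released
public:
    explicit ByteRing(std::size_t capacity) : buf(capacity) {}

    // Bytes available between the write pointer and the free pointer.
    std::size_t available() const { return buf.size() - (write - freed); }

    // Returns an offset into the ring, or -1 if there is no room.
    // (A real allocator would also handle allocations that straddle
    // the wrap point; omitted for brevity.)
    std::ptrdiff_t alloc(std::size_t n) {
        if (n > available()) return -1;
        std::size_t off = write % buf.size();
        write += n;
        return static_cast<std::ptrdiff_t>(off);
    }

    // Called after the oldest message, of size n, has been processed.
    void release(std::size_t n) { freed += n; }
};
```

Because messages are freed in the same order they were allocated, there is no fragmentation: the free pointer simply chases the write pointer around the ring.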
What I mean is: say I use one ring buffer plus indexes into it to allocate memory chunks, and another ring buffer to actually send these chunks over to another thread. The second ring buffer's structs just contain the indices into the first one, along with some other data. The natural way to recycle memory is to reclaim and reuse those indices when the entries of the second ring buffer (the one used for inter-thread communication) get recycled. But if I, say, allocate a 900 MB chunk, my worker thread could be done with it in a second, yet I would only be able to reuse it after the entire ring buffer recycles. This could cause me to run out of memory even though I have lots of free memory. If I have another data structure where the worker thread records which indices it is done with, then maybe I can consult that every time I run out of memory on the first thread and reclaim those indices.