Draft: Data Structures for Entity Systems

28 views
Skip to first unread message

Adam Martin

unread,
Feb 2, 2014, 2:49:04 PM2/2/14
to altdev...@googlegroups.com
Finally, here's a complete article. Still needs some cleaning, but
it's a complete post this time :)

http://www.altdevblogaday.com/?p=30877&shareadraft=baba30877_52eea01c5965f

Tragically, my 'ecks' key has broken, the last couple of sections have
some funny words :(. I'll fi that when I get my hands on a working KB.

Feedback wanted:

- It's long. I could easily and sensibly split it in two, just
before: "Iterating towards a solution". Thoughts ?
- CPU-caches: how intelligent are they these days? I've had good
eperiences recently, but confess I've not done serious profiling (I
don't have any projects at the moment where they're significant in our
profiling, so it's difficult!). Are the notes on CPU-caching
hopelessly wrong/misleading/worrying-about-nothing?
- ...anything else you want to say :)

Adam

Amir Ebrahimi

unread,
Feb 7, 2014, 1:21:16 AM2/7/14
to altdev...@googlegroups.com
Hi Adam,

Saw your other post and finally decided to comment on your article. I actually had your one article marked as unread, so I could come back to it and give a review.

Great introduction. Some feedback / suggestions:

1. You might want to state what you're attempting to fix earlier on; even before your introduction
2. For this section, consider giving raw data that validates this statement:
"For iterating, we send the whole Array at once
We waste Bus bandwidth (all those empty slots), but the speed-up from contiguous memory outweighs that"
3. I'm not sure I get the final outcome. Is iteration 4 the desired approach? If so, can you explain a bit more about that approach with an example?
4. After finishing the read, I'm still not sure if I understand whether you are maintaining these arrays as the end-all-be-all for component data or if these are assembled from non-contiguous memory (i.e. class data) into contiguous memory before a processor runs through all or a subset of them to calculate something. I didn't read the referenced articles you put at the top, which may be why I'm unclear.

-Amir



Adam

--
You received this message because you are subscribed to the Google Groups "AltDevAuthors" group.
To unsubscribe from this group and stop receiving emails from it, send an email to altdevauthor...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Yggy King

unread,
Feb 9, 2014, 12:48:06 PM2/9/14
to altdev...@googlegroups.com
Hi Adam,
Some feedback on your latest draft. (I've been reading your t-machine posts and considering if/how we should add an entity system to our game, so this is very timely, thank-you.)

- in the Terminology paragraph: "It find it confusing" -- should be "I find it confusing"
- you say "Runtime execution has to operate on “all the Components for Entity X” at a time (anything else causes worst-case behaviour for a modern computer/CPU architecture)" -- wouldn't a modern (streaming) architecture prefer to operate on "all the Components of type Y for all Entities"? You repeatedly come back to this question in the article, but this sentence makes it sound like you treat this as a foregone conclusion, not a thesis to be demonstrated.
- in "How do OS's process data, fast?" -- "if the CPU asks for them net" -- should be "next" (oh, right, ecks key)
- you mention the Artemis Bag class -- I just looked at the source and it appears virtually identical to the standard List<T> in its interface and somewhat in its implementation. Under the hood, List<T> uses arrays for storage. As a List gets larger, it does not have to copy everything in the array because it starts to use multiple arrays. The result is that most access is contiguous, but growing the list is cheaper than for Bag. I wonder has anyone benchmarked the Artemis Bag vs. standard List?
- in "The hard case" paragraph, you use the example of Position + Velocity. You say "Writes the results directly to the Velocity" components -- don't you mean it writes the results to the Position components?
- in "Iterating towards a solution", first paragraph says "Scenario 3 is comparitively rare" -- should be "comparatively rare". And I need more convincing that it's rare. Coming from an OOP background, references between entities are ubiquitous -- if that's not true in an ES-oriented architecture then where did that information go? If the references became reference between Components (I can buy that) then how does that make it any less random / performance-crippling?
- (was about to start pointing out more spelling errors, then remembered your ecks problem)
- the "Iteration" headers should be made to stand out more, as these provide the primary structure of the article
- in Iteration 3, you suggest use of a "hint" for the "Component Types that entity is likely to acquire" -- this suggests that the entity _could_ acquire other component types. If an entity gets a component that was not mentioned in the hint, what happens? Are its components no longer contiguous? How does it get stitched back together for processing?
- more of your table-style illustrations for Iterations 3 and 4 would be very helpful
- I suggest expanding your conclusion a bit, for example a sentence on what works for smaller games (BigArray per Component) and what's needed for larger games (MegaArray + ID tricks + index data).

In part 5 of your t-machine blog you outline a set of tables to hold the data of an entity system. Whether in the current post or a future one it could be useful to draw these tables, discuss how they need to be indexed, and demonstrate data structures that implement the tables.

cheers,
Yggy

Adam Martin

unread,
Feb 9, 2014, 1:16:07 PM2/9/14
to altdev...@googlegroups.com
Thanks for hilighting the typos - silly mistakes, easy to fix.

On the more complex items:

On 9 February 2014 17:48, Yggy King <yg...@zeroandone.ca> wrote:
> - you say "Runtime execution has to operate on "all the Components for
> Entity X" at a time (anything else causes worst-case behaviour for a modern
> computer/CPU architecture)" -- wouldn't a modern (streaming) architecture
> prefer to operate on "all the Components of type Y for all Entities"? You

...together with:

> I need more
> convincing that it's rare. Coming from an OOP background, references between
> entities are ubiquitous -- if that's not true in an ES-oriented architecture

I wanted this to be "assumed knowledge" for the post. I see what you
mean that it's treated as fact, when it's arguable - but IMHO that's
enough of a tangent I can justifiably skip it here?

(Incidentally ... my short answer would be: it's different because the
architectures are fundamentally different (and incompatible) in how
they define and treat "data" and "code".

My original 5 x blog posts on ES's are (IMHO) poor writing. But I
wrote them for 2 specific purposes, that they did well: to increase
take-up, and to ram-home this idea that "everything you expect ... is
wrong". Switching to an ES is a paradigm-shift, and it's easier if you
start by discarding your existing knowledge, and re-learning from
scratch which bits still apply - and which ones don't)

> then where did that information go? If the references became reference
> between Components (I can buy that) then how does that make it any less
> random / performance-crippling?

Your algorithms become more batch-oriented and context-free.

...which is a such a short statement, maybe I can/should include that
at start of the post, as a "recap"?

> - you mention the Artemis Bag class -- I just looked at the source and it
> appears virtually identical to the standard List<T> in its interface and

Yeah ... I only wanted to call it out as "see how small a change you
can make and get a boost".

Is it a distraction? Should I remove mention of this entirely?

> has anyone benchmarked the Artemis Bag vs. standard List?

I haven't, but ... Arni's been a java coder for a long time, and I'm
pretty sure he wouldn't have built it unless it gave a substantial
boost / convenience improvement on the JVM's he was using.

> - in Iteration 3, you suggest use of a "hint" for the "Component Types that
> entity is likely to acquire" -- this suggests that the entity _could_
> acquire other component types. If an entity gets a component that was not
> mentioned in the hint, what happens? Are its components no longer
> contiguous? How does it get stitched back together for processing?

This was meant to be covered by the 3 sentences immediately following:
what goes to CPU is a skiplist / "only read the data in these ranges".
Not clear enough? Covered too quickly?

(towards the end, I was feeling bad about the excessive length, and
it's possible I took brevity much too far )

Adam Martin

unread,
Feb 9, 2014, 1:29:17 PM2/9/14
to altdev...@googlegroups.com
On 7 February 2014 06:21, Amir Ebrahimi <ae.p...@gmail.com> wrote:
>
> 1. You might want to state what you're attempting to fix earlier on; even
> before your introduction
> ...
> 3. I'm not sure I get the final outcome.
> ...
> 4. After finishing the read, I'm still not sure if I understand whether you
> are maintaining these arrays as the end-all-be-all for component data or if

Yes, as a standalone article, it's pretty vague.

I'm wondering: should I fix that, or deliberately make it more
"bloggy" - I think I need to jump one way or the other, yes?

I started this as a blog post:

"For people who've been using ES's for a while, and following my other
posts, here's some more in-depth look at the details most of you
haven't delved into yet. It's required knowledge if you want to get
into better/faster architectures, and it's all on the theme of
in-memory data structures"

...and it feels like making it into a proper, standalone article is a
lot more work :(.

Yggy King

unread,
Feb 9, 2014, 6:39:29 PM2/9/14
to altdev...@googlegroups.com
Hi Adam,
Just going to respond to some of your feedback on my questions. First off though, if I get too much into certain details (and possibly miss others) it's because I'm not just reading as an altdev peer reviewer, but also with a genuine and practical desire to understand and apply the principles. So some of my questions may indicate where I didn't grok elements of your previous 5 posts, others may be fodder for future articles, and perhaps only a small subset need to be addressed in this article. Take my feedback as you see fit, apply what resonates, and publish your article when you're happy, not when I'm happy. (I guess that's a motherhood statement that applies to ADB generally ... ;-)

Anyways ...
- regarding "assumed knowledge" for the post, I don't think you can just say "read and understand my 5 previous posts" -- perhaps better to say something like "In this article I assume the following. For more details and justification, refer to my previous articles." And then state the points you want to accept as given.
- you frequently mention that OOP and ES are fundamentally different and incompatible. It would be great to come up with a short summary of what that means and why it is. In my mind, the important distinction is that in OOP, everything is built around the entity, while in ES, it's really the components that matter, and the entity is little more than a way of finding components.

I think it's fair to say that naive OOP entity processing would be:
foreach entity X in the system
    foreach component Y in X.Components
        Y.Update()

By contrast, the simplest naive ES entity processing would be:
foreach component type T in the system
    foreach component data instance Y of type T
        TProcessor.Update(Y)

Note that there is no reference to entities here. This alone turns ones thinking on its head and could help the OOP programmer shift their paradigm. It also emphasizes the granularity and decouple nature of processors (though one might argue the same of Component.Update methods in the OOP approach).

But that naive processing loop overlooks interactions between components (e.g. Position+Velocity). So more geenrally what we really want is:
foreach entity processor that operates on component types S,T,...
    foreach entity X that has components of type S,T
        STProcessor.Update(getComponent(X,S), getComponent(X,T))

The entity has come back, but only as a way to find related component data instances. And there's the rub -- how do you efficiently organize data to (a) make this loop possible and (b) make it efficient? To me, your article is mainly about exploring this question.

For the simple case (just one component per processor), an array is a fine data structure. If the structure needs to grow (as more entities are created) then the Bag idea makes sense as a mechanism to manage the underlying array. In C#, I would replace Bag<T> with the standard List<T>, but it's still instructive to look at the Bag code. (Not being a Java guy, I have no idea if Java has an efficient equivalent of List<T>.)

To my question about components being added dynamically and your response to "only read the data in these ranges": I meant when new components are added to old entities, not when new entities with new components are created. Does the type hint have to include _every_ component the entity may have in future? If so you should say so. If not then I'm still unclear how ranges solves the problem. My understanding was that ranges dealt with the case of running past the end of the reserved block of entity ID's for a given type hint, since you can have multiple ranges per type hint. Or perhaps this is resolved by the data in the index tables? If so then I think a bit more info on those tables is in order.

cheers,
Yggy

Adam Martin

unread,
Feb 9, 2014, 7:35:46 PM2/9/14
to altdev...@googlegroups.com
On 9 February 2014 23:39, Yggy King <yg...@zeroandone.ca> wrote:
> it's because I'm not just reading as an altdev peer reviewer, but also with
> a genuine and practical desire to understand and apply the principles. So

Bear in mind this post isn't a perfect ES implementation, but more: a
learning journey, in small steps, aimed at coders using many different
languages.

Some of it is basic (especially to C++ devs), but it's a chance to
talk about some general issues relevant to everyone doing ES's.

> - you frequently mention that OOP and ES are fundamentally different and
> incompatible. It would be great to come up with a short summary of what that
> means and why it is. In my mind, the important distinction is that in OOP,

My favourite TL;DR is "ES's are Functional programming embedded in a
non-Functional language".

That's why they map so easily to other Fn-domain things (SQL tables,
batch-processing, multi-threading), and so poorly to Imperative-domain
things (most OOP source code).

...but it might be too obscure to be helpful :). And I've never tried
proving it, computationally - so it might be wrong.

> To my question about components being added dynamically and your response to
> "only read the data in these ranges": I meant when new components are added
> to old entities, not when new entities with new components are created. Does

Sorry, yes, I see what you mean now. The original flow of the post had
a section on allocation/re-allocation, using small mini-arrays. My
repeated re-draftings have deleted it. Oops.

...which gave an obvious answer to your question: move all the
following items down, inside the array, and if it overflows ... flow
into a new mini-array (inserted between this array and the one after).
Simply "don't worry, be crappy" - the overhead for shuffling this is
small.

It's nothing special: a "poor man's" memory management scheme. But
it's easy to implement (which is very important to me now I'm
independent!).

And, importantly: it segues nicely into some other, more cunning, ES
Data Structures that I want to blog about soon.

Yggy King

unread,
Feb 9, 2014, 8:15:14 PM2/9/14
to altdev...@googlegroups.com
Adam, good stuff, look forward to reading the full post (and future ones). (So don't worry about memcpy?)

You mentioned somewhere that you want to build such an ES inside Unity. Interesting, me too :)
Are you going to use only structs for components?
Or allow reference types and assume that if you pre-allocate a pool of them then they'll generally tend to be "contiguous enough"?

BTW, I think you'd find this interesting -- http://igoro.com/archive/gallery-of-processor-cache-effects -- when we say contiguous what we really mean is "can be loaded in multiples of cache line sizes and not evict other chunks from our relevant set," or something like that. (And then there's multi-processor job-based architectures, like PS3, where the rules are different but ES approach also works great).

cheers,
Yggy






Adam Martin

unread,
Feb 10, 2014, 8:33:43 AM2/10/14
to altdev...@googlegroups.com
On 10 February 2014 01:15, Yggy King <yg...@zeroandone.ca> wrote:
> build such an ES inside Unity.
> ...
> Are you going to use only structs for components?
> Or allow reference types and assume that if you pre-allocate a pool of them
> then they'll generally tend to be "contiguous enough"?

I don't know yet. I'm blogging as I go so that when I get to Unity
I've got something to refer back to.

There's lots to give you headaches when planning the data side.
dot-Net versions vs. Mono runtime vs. Unity built-in rules
(single-threaded engine) vs. Unity-editor disabling .Net features
(serialization), etc.

> BTW, I think you'd find this interesting --
> http://igoro.com/archive/gallery-of-processor-cache-effects -- when we say

Thanks, that's a great resource to share with people.

Adam Martin

unread,
Mar 8, 2014, 8:03:05 AM3/8/14
to altdev...@googlegroups.com
Finally published - more diagrams, clearer conclusion, and snipped a
lot of words to make it quicker/easier to read:

http://www.altdevblogaday.com/2014/03/08/data-structures-for-entity-systems-contiguous-memory/

Personally, I find current ADB's CSS hard to read (I'm assuming the
new site will fix this, so I'm not hugely worried) - but for now I've
mirrored on my site, with quick hacks to the CSS for readability:

http://t-machine.org/index.php/2014/03/08/data-structures-for-entity-systems-contiguous-memory/
Reply all
Reply to author
Forward
0 new messages