CoreData poor performance


Florian Zschocke

Aug 12, 2006, 5:10:01 PM
Hello NG,

I have switched my app from NSArchiver to Core Data with an SQLite
persistent store because I thought that a database-like structure
would give better performance and a smaller memory footprint. What I
see now is that the fetch requests are extremely slow (compared with
ADO or DAO it is a shame). Inserting a new object into the context
also takes milliseconds. It seems that the objects from the prior
request are not released; the memory consumption rises with each
request. I need to mutableCopy the fetched arrays - perhaps that is
the reason? I tried an extra release and autorelease, but that leads
to a double free. For the moment the app is much slower and fatter
than with NSArchiver, and that can't be right. So where are the
typical beginner errors in my implementation? How can I get the
prior requests out of memory? Or is it the context that grows with
each request? Should I save after each request? What can I do to
speed things up?


Thank you for your time and help


Florian

Chris Hanson

Aug 12, 2006, 8:34:53 PM

Core Data applications should scale quite well to large data sets when
using an SQLite persistent store. Core Data optimization questions
might be better asked on the cocoa-dev list, since there's a much
higher density of Core Data-experienced developers on that list than on
Usenet.

That said, there are a couple of implementation tactics that are critical
to performance for pretty much any application using a technology like
Core Data:

(1) Maintain a well-normalized data model.
(2) Don't fetch or keep around more data than you need to.

Implementing these tactics will make it much easier both to create
well-performing Core Data applications in the first place, and to
optimize the performance of applications already in progress.

Maintaining a normalized data model is critical for not fetching more
data than you need from a persistent store, because for data
consistency Core Data will fetch all of the attributes of an instance
at once. For example, consider a Person entity that can have a binary
data attribute containing a picture. Even if you're just displaying a
table of Person instances by name, Core Data will still fetch the
picture because it's an attribute of Person. Thus for performance in a
situation like this, you'd normalize your data so that you have a
separate entity, Picture, to represent the picture for a Person on the
other side of a relationship. That way the image data will only be
retrieved from the persistent store if the relationship is actually
traversed; until it's traversed, it will just be represented by a fault.

Similarly, if you have lots of to-many relationships and need to
display summary information about them, de-normalizing your data model
slightly and caching the summary information in the main entity can
help.

For example, say your app works with Authors and Books. Author.books
is a to-many relationship to Book instances and Book.authors is a
to-many relationship to Author instances. You may want to show a table
of Authors that includes the number of Books related to the Author.
However, binding to "books.@count" for that column value will cause the
relationship fault to fire for every Author displayed, which can
generate a lot more traffic to the persistent store than you want.

One strategy would be to de-normalize your data model slightly so
Author also contains a booksCount attribute, and maintains that
whenever the Author.books relationship is maintained. This way you can
avoid firing the Author.books relationship fault just because you want
to display the number of Books an Author is related to, by binding the
column value to "booksCount" instead of "books.@count".

Another thing to be careful of is entity inheritance. It's an
implementation detail, but inheritance in Core Data is single-table.
Thus if you have every entity in your application inheriting from one
abstract entity, it'll all wind up in a single table, potentially
increasing the amount of time fetches take etc. because they require
scanning more data.

You're correct in inferring that retaining or copying the arrays
containing fetch results -- not "fetched arrays," the array you get
back by executing a fetch request is just a container for the results
of the fetch -- will keep those results (and their associated row cache
entries) in memory for as long as you retain the arrays or copies of
them, because the arrays and any copies will be retaining the result
objects from the fetch. And as long as the result objects are in
memory, they'll also be registered with a managed object context.

If you want to prune your in-memory object graph, you can use
-[NSManagedObjectContext refreshObject:mergeChanges:] to effectively
turn an object back into a fault, which can also prune its relationship
faults. A more extreme measure would be to use
-[NSManagedObjectContext reset] to return a context to a clean state
with no changes or registered objects. Finally, you can of course just
ensure that any managed objects that don't have changes are properly
released, following normal Cocoa memory management rules: So long as
your managed object context isn't set to retain registered objects, and
you aren't retaining objects that you've fetched, they'll be released
normally like any other autoreleased objects.
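A sketch of the pruning approach, using period-appropriate Cocoa idioms (variable names are illustrative; this assumes the objects have no unsaved changes you care about):

```objc
#import <CoreData/CoreData.h>

NSError *error = nil;
NSArray *results = [context executeFetchRequest:request error:&error];

// ... use the results ...

// Turn each fetched object back into a fault so its data (and the
// associated row cache entry) can be released.
NSEnumerator *e = [results objectEnumerator];
NSManagedObject *object;
while ((object = [e nextObject]) != nil) {
    [context refreshObject:object mergeChanges:NO];
}

// Or, more drastically, discard everything registered with the context:
// [context reset];
```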

Hopefully this gives you some general strategies that you can apply in
improving the performance of your Core Data-based application. It
would be much easier to help with any performance issues you run into
if you could post them to the cocoa-dev list and describe both what
Shark tells you about where your code is spending its time and show the
relevant portions of your code.

-- Chris

Florian Zschocke

Aug 13, 2006, 1:19:42 PM
Chris Hanson <c...@mac.com> wrote:

Dear Chris,
thank you very much for your very detailed answer. There are a lot of new
aspects I have to think about.



> Core Data applications should scale quite well to large data sets when
> using an SQLite persistent store. Core Data optimization questions
> might be better asked on the cocoa-dev list, since there's a much
> higher density of Core Data-experienced developers on that list than on
> Usenet.
>

Perhaps, but I'm programming a newsreader. So posting here, for me, has
two functions: asking for help and testing the software.

> That said, there are a couple implementation tactics that are critical
> to performance for pretty much any application using a technology like
> Core Data:
>
> (1) Maintain a well-normalized data model.
> (2) Don't fetch or keep around more data than you need to.
>

Regarding 1: what is meant by a well-normalized data model?
In my app I have only one entity. This entity has only a single to-many
relationship, to itself. With that custom class I feed a subclass of
NSOutlineView.

Regarding 2: in my case that would mean I should find a way to feed the
outline view with sequential fetch requests. But that also means I have to
break up the relationship - right?
Perhaps I could store an array of object IDs instead of having this
relationship. But if I recurse down the tree, how long will it take to
fetch the objects by their IDs? Searching the content would also be very
complicated in this case.

>
> You're correct in inferring that retaining or copying the arrays
> containing fetch results -- not "fetched arrays," the array you get
> back by executing a fetch request is just a container for the results
> of the fetch -- will keep those results (and their associated row cache
> entries) in memory for as long as you retain the arrays or copies of
> them, because the arrays and any copies will be retaining the result
> objects from the fetch. And as long as the result objects are in
> memory, they'll also be registered with a managed object context.
>
> If you want to prune your in-memory object graph, you can use
> -[NSManagedObjectContext refreshObject:mergeChanges:] to effectively
> turn an object back into a fault, which can also prune its relationship
> faults. A more extreme measure would be to use
> -[NSManagedObjectContext reset] to return a context to a clean state
> with no changes or registered objects. Finally, you can of course just
> ensure that any managed objects that don't have changes are properly
> released, following normal Cocoa memory management rules: So long as
> your managed object context isn't set to retain registered objects, and
> you aren't retaining objects that you've fetched, they'll be released
> normally like any other autoreleased objects.

I will make some tests with refreshing the objects.

> Hopefully this gives you some general strategies that you can apply in
> improving the performance of your Core Data-based application. It
> would be much easier to help with any performance issues you run into
> if you could post them to the cocoa-dev list and describe both what
> Shark tells you about where your code is spending its time and show the
> relevant portions of your code.

I'm sure that it spends the time fetching the objects.
What makes me hopeful is that saving is very fast.
Perhaps I will join the dev list.

Florian


Florian Zschocke

Sep 10, 2006, 3:15:00 PM

What I found now (and I really should have known this) is that you
can speed up fetch requests by ordering the conditions by ascending
complexity. A boolean or numeric condition should always come before a
text condition. For example, (Male == true) AND (Name like 'Adam') is
much faster than (Name like 'Adam') AND (Male == true).
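In code, that ordering looks like the following (attribute names are illustrative; whether the cheap-comparison-first form actually wins can depend on how the predicate is translated to SQL, so it is worth measuring in your own app):

```objc
#import <CoreData/CoreData.h>

// Cheap boolean comparison first, expensive string comparison second.
NSPredicate *fast = [NSPredicate predicateWithFormat:
    @"male == YES AND name LIKE 'Adam'"];

// Same logical result, but the string comparison is evaluated first.
NSPredicate *slow = [NSPredicate predicateWithFormat:
    @"name LIKE 'Adam' AND male == YES"];

[fetchRequest setPredicate:fast];
```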

Florian

