> There is one thing however that I couldn't make any sense of:
> SQLite maintains an on-disk representation of your app's data, right?
> Your app however deals with an in-memory representation of same data.
> This requires syncing: "[Brent:] Making the switch did mean I had to
> do some things manually that Core Data would have done for me: keeping
> any in-memory items synced with the database storage, mostly."
> If running "[feedItem markAsUnread];" on 10,000 feedItems in Core Data
> was too slow for NNW, and SQLite on the other side super fast…
> …then wouldn't you lose all the benefits of SQLite's performance at
> the very point where you have to sync your in-memory objects (the
> stuff that gets presented to the user) with their newly updated/
> created/deleted/etc on-disk clones in SQLite?
One of my apps (VoodooPad) is in a bit of a similar situation (it can have thousands of pages, which can be in memory, and of course in an SQLite file). My solution to synchronization is to make sure I do as little of it as possible. I try and keep as many of the page objects on disk as possible, and whenever I need to grab one, I pull a fresh one out of the database (so it's possible to have two or more objects in memory that represent the same page).
If there's a change made to that object that the other copies need to know about, a notification is sent out and the other page objects pick it up, or they just get discarded and a new copy is pulled out of the db. This is pretty rare though.
In general, instead of handing around objects all the time I'll hand around UUIDs which represent the page in the database. Then when I need to act on that data or pull it out, I'll just ask my database manager object to hand me back a new object for that UUID. And since I get a new one each time, there's no problem using those guys on multiple threads.
Hopefully that made sense.
August 'Gus' Mueller
Flying Meat Inc.
Nevertheless, I *strongly* recommend using Core Data. As I pointed out in my initial post on the subject, Core Data is awesome. There are a few things that Core Data does slowly -- but devices and Macs are getting faster, and it's a good bet Core Data will keep improving. (Remember the old hockey adage: skate to where the puck is going to be.)
I myself use FMDB when I'm working with non-object things. But for something like an article from a feed (or tweet from Twitter, or post from Facebook, or similar), I highly recommend using Core Data. Any time you're dealing with things that are more object-y than database-y, go with Core Data.
I actually argue the opposite view: I never use Core Data. Here's the rationale behind my position:
In any given application that serializes data to disk, there are really two different models:
- The in-memory application object model.
- The on-disk serialized data model.
The in-memory application object model services the needs of the application; it is a literal representation of the state of your application, in memory. The API and data it provides need only be maintained in the context of that specific version of the application, during that single runtime. You're free to modify the application model during development without constraint, as the specific concepts it is modeling do not need to be shared across releases of the application -- if you add a bit of data or API to your application model and remove it later, there's no concern about long-term maintenance of that data, migrating it across versions, etc.
In contrast to the application model, the on-disk serialized data model is a high-level, abstract representation of your application's data that must be maintained across iterations of your application, and when optimally expressed, will likely not even map 1:1 with the optimal application model. It requires unique consideration on a number of fronts:
- Data longevity, specifically in ensuring that the concepts as modeled are maintainable across the lifetime of your data.
- Data validity -- care must be taken to avoid duplicate, corrupt, or invalid data. Whereas there's often little harm in maintaining multiple read-only in-memory data records, on-disk data should be updated carefully.
- Atomic or transactional updates. Often data changes must be implemented as an all or none transactional/atomic update.
Core Data attempts to tightly weld these two different abstract representations together, allowing developers to leverage a single application model as both an on-disk serialization and an in-memory representation. Unfortunately this abstraction is extremely leaky, and Core Data's requirements spread pervasively through the application's in-memory model:
- Every managed object must be a subclass of NSManagedObject, which means objects inherit a large number of non-overridable methods and strict behavioral requirements imposed by Core Data's internals, preventing the application author from expressing an API better suited to their in-memory model (such as overriding -hash, -isEqual, etc.).
- Managed objects are not thread-safe -- special care must be taken when using GCD or threads directly:
- Managed objects should not be shared across threads (as per Apple's recommendation)
- If instances are shared across threads anyway, it is the application author's responsibility to implement extremely complex and difficult locking to make that sharing safe.
- Care must be taken to safely merge data changes made in multiple managed object contexts on multiple threads, which potentially may occur asynchronously.
- The most lightweight transactional/atomic update mechanism is the NSManagedObjectContext, however, when using multiple managed object contexts, as noted above, care must be taken to merge data correctly between contexts.
The leakiness of this abstraction is extremely similar to that of distributed objects, where the idea was that complexity of network communications could be hidden behind a simple object model; your application's model could also be your network model. This led to the same problems that Core Data has today, but instead with the network protocol poorly welded into the application model.
These approaches to unifying models fail (in my opinion) because they do not take into account that the models are genuinely different, and actually express different data:
- The application model represents the application's in-memory state. It is transient and thus may be iterated on freely to suit the application.
- The on-disk serialization model is an abstract representation of the data the application operates on using the in-memory model. It is not the in-memory model, and must be maintained across versions of the application (or even potentially across different applications). The data must be updated according to transactional requirements of the application, errors must be safely handled in disk serialization, etc.
- In the case of distributed objects, which I posit is extremely similar to Core Data, the network protocol defines not just a means of accessing the peer's state, but a dialog between independent applications with their own, potentially very different internal application model/state. The protocol must handle errors, retries, and other complexities that cannot be adequately represented through seemingly transparent method calls.
This may be an unpopular view -- I honestly don't know, since this is the first time I've taken any time to explain it -- but I think that ultimately, application implementations benefit from a separation of application and serialization models in terms of implementation time, complexity, cost, and stability. It may be possible to achieve this by maintaining distinct application and Core Data-specific object hierarchies, but such an approach would seem to discard most of the advertised value of using Core Data in the first place.