Understanding loading mechanism

Qiwen Chen

Feb 22, 2019, 11:30:51 PM
to zodb
If I use an IFBTree (integer-keyed float-valued BTree) to store some time series data, how does the loading work? For example, if I want to load the entire time series into memory, how many trips happen between the client and the server? The documentation says it uses lazy loading. Does that mean each key-value pair is loaded from the db into memory one at a time?

Jim Fulton

Feb 23, 2019, 10:07:51 AM
to Qiwen Chen, zodb
On Fri, Feb 22, 2019 at 9:30 PM Qiwen Chen <qwc...@gmail.com> wrote:
If I use an IFBTree (integer-keyed float-valued BTree) to store some time series data.

Interesting. What's the domain? Financial data? Scientific?

How does the loading work?

BTrees store data sorted by key in such a way that only part of a tree needs to be loaded into memory to fetch data for a subset of keys. Similarly, if only a subset of keys (like 1) is modified, then only a subset of the BTree is modified and stored.


The leaves of the tree, where the data are stored, are called buckets. Internal tree nodes help find the right bucket for a key or key range.
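For example, a range lookup over a key interval only needs the internal nodes on the path to the matching buckets, plus those buckets themselves. A minimal sketch using the public BTrees API (the timestamps and values here are made up):

    from BTrees.IFBTree import IFBTree

    series = IFBTree()               # 32-bit integer keys -> float values
    series[1550000000] = 101.25      # e.g. epoch-second timestamp -> price
    series[1550000060] = 101.30
    series[1550000120] = 101.10

    # Only the buckets covering the range [1550000000, 1550000060] (and the
    # internal nodes leading to them) have to be loaded for this iteration.
    for ts, value in series.items(1550000000, 1550000060):
        print(ts, value)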
 
For example, if I want to load the entire time series into memory, how many trips happen between the client and the server?

That depends on the access pattern. Assuming you iterated over the tree, you would load roughly N/100 objects, where N is the number of values in the tree. (This is governed by a compile-time maximum bucket-size parameter, which for IFBTrees is 120.)
Each load requires a round-trip.  
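To give a sense of scale, by that estimate a series of 1,000,000 values works out to on the order of 10,000 bucket loads, and therefore roughly 10,000 round trips when iterating with a cold cache.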

Recently, we added a prefetch API that mitigates round trips (by effectively doing them in parallel) if you know you're going to load multiple objects. This could conceivably be leveraged when iterating over a BTree.

Loading the entire tree into memory defeats one of the design goals, which is to have trees that are much larger than application memory. Sometimes, when we want to iterate over trees that are larger than memory, we periodically invoke garbage collection explicitly (it's normally invoked implicitly at transaction boundaries) to free the memory used by leaves we're done with.
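For instance, something along these lines (a rough sketch: big_tree, process(), and the batch size are placeholders, conn is the open ZODB connection, and cacheGC() is the connection's incremental cache garbage collector):

    count = 0
    for key, value in big_tree.items():
        process(key, value)          # whatever per-item work the app does
        count += 1
        if count % 100000 == 0:
            conn.cacheGC()           # let go of buckets we're done with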

The documentation says it uses lazy loading. Does that mean each key-value pair is loaded from the db into memory one at a time?

No, it means the minimum number of nodes (internal or bucket) is loaded to retrieve a given key or key range.

The primary use case of BTrees is storing very large mappings where you usually only want to access very small subsets of keys.

There's a feature of our implementation, though, that might be useful for time series data, depending on the application. I mentioned that BTrees are trees of nodes with data in buckets at the leaves of the trees. But BTrees are also linked lists of buckets. Each internal node has a reference to its first bucket, and each bucket has a reference to the next bucket in the tree. This is to allow efficient iteration.

In the example above, to iterate over the entire tree, we only had to load the top node and the buckets. We didn't have to load or traverse any other internal nodes.

As always, picking and tuning data structures depends on application usage (update and access) patterns, but I can imagine time-series applications for which BTrees could work very well, especially with some compile-time tuning (much larger bucket sizes and different storage types).

Early in my career I worked a lot with hydrologic time series data that was collected over short fixed time intervals (5 or 15 minutes). This data was very expensive to store at the time (the early eighties, when a 300MB disk was the size of a dishwasher and wildly more expensive). Efficient storage was critical, and the nature of this data made it very compressible, typically 90-95%, which was an important factor in how it was stored.

(I could imagine a variation of BTrees for fixed-time-interval data that avoided storing time values in buckets.)

Of course, storage is much cheaper now, but we store a lot more data. The same factors that made a custom storage format for time-series data attractive in the 80s make columnar formats like Parquet and ORC popular today.

(Hm, writing that made me wonder if it would be useful to have a BTree variant that used Numpy/Arrow/xnd arrays to store bucket data.  Perhaps memory-mapped...Hm....)

Jim 


Qiwen Chen

Feb 23, 2019, 4:20:53 PM
to zodb
Thanks for the explanation. It's clear to me now that the loading unit is a bucket in the BTree.

The domain is financial data. My application loads time series by min and max key pairs. It's read-heavy and write-light, so it's well suited.

The prefetch feature is exactly what I am looking for. Before doing any calculation, the application would know the exact min and max key pairs for a list of time series objects. I would like to load them in one go. Any reference you can point me to?

In my application, I use the iterators in BTrees extensively. One big limitation is that I cannot iterate in reverse order. I guess that is because the linked list of buckets is only linked in one direction.

Qiwen

Jim Fulton

Feb 24, 2019, 12:47:11 PM
to Qiwen Chen, zodb
On Sat, Feb 23, 2019 at 2:20 PM Qiwen Chen <qwc...@gmail.com> wrote:
Thanks for the explanation. It's clear to me now that the loading unit is a bucket in the BTree.

The domain is financial data. My application loads time series by min and max key pairs. It's read-heavy and write-light, so it's well suited.

Yup
 
The prefetch feature is exactly what I am looking for. Before doing any calculation, the application would know the exact min and max key pairs for a list of time series objects. I would like to load them in one go.

Knowing the min and max keys doesn't help you as much as you might think. To use prefetch you need oids or (ghost) objects, which BTrees don't have an API to give you. To implement such an API, BTrees would have to load a bunch of internal nodes to find the buckets you want, defeating the purpose.

Similarly, iterators don't expose enough information to allow you to prefetch data. You couldn't really implement an interface to get all of the buckets in one go, because you have to load a bucket to get its next pointer.

What could, conceivably, be done is to have BTree iterators prefetch next buckets as buckets are loaded.  Then, retrieval of the next bucket could be in flight while an application is processing the current bucket.  I'd be open to a PR that implemented this under the control of a new  prefetch keyword argument to the BTree items, keys, and values methods.

  Any reference you can point me to?

Prefetch is a new, somewhat experimental method. It's not documented anywhere yet. Prefetch sounds very appealing until you try to figure out how to leverage it. :-]

There's a prefetch method on connections (_p_jar attributes of persistent objects) that takes one or more oids, persistent objects, or iterables of oids or persistent objects.  It doesn't return anything, but just hints to the connection and underlying storage that the objects will likely be wanted soon.  For ZEO, this causes requests to be sent for that data.  When the data are returned, they're added to the ZEO cache to be available when the application requests it.  If the application requests the data before it's returned, the application will block.
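For example (a fragment rather than a complete program; everything except the prefetch call itself is a placeholder name):

    conn = some_persistent_object._p_jar      # the Connection
    conn.prefetch(obj_a, obj_b)               # persistent objects and/or oids
    conn.prefetch(oid_iterable)               # iterables of either work too
    # Later loads of those objects are served from the ZEO cache (or block
    # until the in-flight request completes) rather than each paying its own
    # round trip.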

At the storage level, prefetch has only been implemented for ZEO. It's ignored for other storages, most notably RelStorage.
 
In my application, I use the iterator in BTrees extensively. One big limitation is that I cannot iterate in reversed order. I guess that is because the linked lists of buckets are only linked in one direction. 

Yes.  When I designed BTrees  I sought to avoid circular references, because I was hoping that we'd have storages that supported reference-counting garbage collection, but we never did that.

If I ever redesigned BTrees, I'd probably make them doubly-linked lists (and probably add more internal pointers to simplify computation).

Something else I noted that you may not have picked up on in my earlier message is that you can also tune BTrees at the C level.  In particular:

  • You can adjust the maximum bucket size to better fit your requirements, perhaps using much bigger buckets for your application.
  • You can use different C types that might fit your needs better. 
Jim


Qiwen Chen

Feb 25, 2019, 12:20:45 PM
to zodb
In this case, I won't be able to leverage the prefetch feature.

My application consumes a lot of data but produces only a limited amount to persist, and the data it consumes is never modified by the application. Before I give up on ZODB for this project, maybe I can try the following. First, I can download all the data needed (a user-defined scope) from a non-ZODB server and create a ZODB file storage on the local disk; that might take a few minutes at most. Then I can leverage the lazy loading while controlling memory usage during the calculation, and all the calculations can still deal with native Python objects. Am I right in thinking that way? Essentially I would be using ZODB as an on-disk cache for data consumption. Application data can still be written to a ZODB server, and it's not a lot of data. Any thoughts?
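Roughly what I have in mind, as a sketch (download_series(), scope, t_min/t_max, and compute() are stand-ins for my own code; the rest is stock ZODB/BTrees):

    import transaction, ZODB, ZODB.FileStorage
    from BTrees.IFBTree import IFBTree

    # One-time (or per-session) population of the local on-disk cache.
    db = ZODB.DB(ZODB.FileStorage.FileStorage('local_cache.fs'))
    conn = db.open()
    root = conn.root()
    for name, points in download_series(scope):   # points: {int timestamp: float}
        tree = IFBTree()
        tree.update(points)
        root[name] = tree
    transaction.commit()

    # Calculations then read lazily, bucket by bucket, within a key range.
    for ts, value in root[name].items(t_min, t_max):
        compute(ts, value)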