Hi Zhenwei,
We structure our data so that we get the maximum benefit from the automatic deduplication the blob format provides. Depending on how your data is organized, even a large dataset may become much more manageable once the common parts are normalized.
You shouldn't have to hold all of the denormalized objects in memory at once. Instead, as you instantiate individual objects, you can add them to the state engine and then drop them on the floor; once they're in the state engine, they are deduped (there's a rough sketch of this pattern below).

Alternatively, in theory there's only one data origination server per environment, so it can be a pretty hefty machine with lots of RAM (it's not unheard-of to have 100s of GB of RAM on a single instance; check out the r3.8xlarge from AWS).

One more alternative: you could shard the data origination server to produce multiple blobs which get loaded on clients, or produce multiple blobs and then combine them to again have a single blob to load on clients (there's a sketch of the sharding approach below as well).
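Here's a rough sketch of the add-and-drop pattern, in case it helps. I'm assuming Hollow-style class names (HollowWriteStateEngine, HollowObjectMapper, HollowBlobWriter); if you're on a different library or version the equivalents may be named differently. The Movie POJO and loadSomeMovies() are just placeholders for your own types and data source, so treat this as illustrative rather than copy-paste ready for your exact setup:

import com.netflix.hollow.core.write.HollowBlobWriter;
import com.netflix.hollow.core.write.HollowWriteStateEngine;
import com.netflix.hollow.core.write.objectmapper.HollowObjectMapper;

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;

public class BlobProducerSketch {

    // Placeholder for whatever your denormalized records look like.
    public static class Movie {
        int id;
        String title;
        List<String> genres;

        Movie(int id, String title, List<String> genres) {
            this.id = id;
            this.title = title;
            this.genres = genres;
        }
    }

    public static void main(String[] args) throws IOException {
        HollowWriteStateEngine writeEngine = new HollowWriteStateEngine();
        HollowObjectMapper mapper = new HollowObjectMapper(writeEngine);

        // Stream the source data through in batches -- the full denormalized
        // dataset never needs to be resident at once.
        for (Movie movie : loadSomeMovies()) {
            // add() copies the record's data into the state engine, where
            // repeated values (e.g. the same genre string across movies)
            // are stored only once...
            mapper.add(movie);
            // ...and then the Movie instance can simply go out of scope.
        }

        // Serialize the deduped state as a snapshot blob for clients to load.
        try (OutputStream os = new BufferedOutputStream(new FileOutputStream("snapshot.blob"))) {
            new HollowBlobWriter(writeEngine).writeSnapshot(os);
        }
    }

    // Placeholder data source; in practice this would page through a DB or feed.
    static List<Movie> loadSomeMovies() {
        return Arrays.asList(
                new Movie(1, "The Matrix", Arrays.asList("Action", "Sci-Fi")),
                new Movie(2, "Inception", Arrays.asList("Action", "Sci-Fi")));
    }
}

The important part is that each source object only lives long enough to be copied into the state engine; the engine holds the deduped form.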
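And a sketch of the sharding alternative: route each record to one of N independent write state engines by a stable key and write one blob per shard. The shard count, file names, and the reuse of the Movie placeholder from the sketch above are all just for illustration; whether clients load individual shards or a downstream step combines them back into a single blob depends on your setup.

import com.netflix.hollow.core.write.HollowBlobWriter;
import com.netflix.hollow.core.write.HollowWriteStateEngine;
import com.netflix.hollow.core.write.objectmapper.HollowObjectMapper;

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ShardedBlobProducerSketch {

    public static void main(String[] args) throws IOException {
        int numShards = 4; // pick this based on how much data fits comfortably per shard

        // One independent write state engine (and mapper) per shard.
        HollowWriteStateEngine[] engines = new HollowWriteStateEngine[numShards];
        HollowObjectMapper[] mappers = new HollowObjectMapper[numShards];
        for (int i = 0; i < numShards; i++) {
            engines[i] = new HollowWriteStateEngine();
            mappers[i] = new HollowObjectMapper(engines[i]);
        }

        // Route each record to a shard by a stable key, so a given record
        // always lands in (and is deduped within) the same shard.
        for (BlobProducerSketch.Movie movie : BlobProducerSketch.loadSomeMovies()) {
            int shard = Math.floorMod(movie.id, numShards);
            mappers[shard].add(movie);
        }

        // Write one snapshot blob per shard. Clients can load the shard(s)
        // they need, or a downstream step can combine them into a single blob.
        for (int i = 0; i < numShards; i++) {
            try (OutputStream os = new BufferedOutputStream(
                    new FileOutputStream("snapshot-shard-" + i + ".blob"))) {
                new HollowBlobWriter(engines[i]).writeSnapshot(os);
            }
        }
    }
}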
Usually, the bigger concern is minimizing the memory footprint on the client instances. If the deduplicated memory footprint works for you on your clients, and the tradeoffs are right for your use case, then you can definitely find ways to produce the blobs.
Drew.