Sky Stability and A Few Questions

Daniel Fagnan

Jul 31, 2013, 7:54:55 PM

to sk...@googlegroups.com

I'm just wondering a few things.

1) How stable is Sky right now? I realize it's pre-1.0 and therefore not at a super stable position right now.

2) When the multi-node support drops, will there be a suitable upgrade path that doesn't involve starting from scratch at that version? For example, if I were to use Sky right now on a single machine, will I be able to upgrade smoothly when multi-node support comes, or is some sort of migration needed?

3) I haven't done any benchmarks, so I might have to find this out myself, but do you know if performance drops beyond a certain point? For example, in the billions of events? I'm not too sure about the specifics of the internal workings and how exactly events are stored. It seems like things aren't being denormalized, and runtime aggregation at the billion-event level seems like it'd be quite slow.

Thanks,

Daniel

Ben Johnson

Jul 31, 2013, 10:52:56 PM

to sk...@googlegroups.com

Daniel-

1) The "unstable" branch of Sky is quite stable. I run it in production and Shopify is also running it in production. That branch will become v0.4.0 once multinode drops. There have been a lot of changes (e.g. switching from LevelDB to LMDB) since v0.3.0, so don't use the master branch.

2) The migration to v0.4.0 from the current unstable branch should be minimal. It will basically start as a cluster of one. New nodes and groups will be added to that original node.

3) I haven't tested in the billions of events yet, but Sky operates on sequential access so it should scale linearly. You'll see a drop-off in performance once you go beyond the size of RAM on your machine. Sky works well on relatively low-cardinality data and does a lot to compact and dedupe data. Event data is typically 50 bytes or less, so a billion events is about 50GB and fits in 64GB of RAM. On my single-node machine I'm querying through about 3GB/sec, so that's a billion events in ~20 seconds. Multinode should scale out linearly, so a ten-node cluster should aggregate a billion events in about 2 seconds.
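The arithmetic behind those estimates can be reproduced with a short Python sketch. It only uses the figures quoted above (50 bytes per event, about 3GB/sec per node) and assumes decimal gigabytes and perfectly linear scale-out across nodes; the function names are made up for illustration and are not part of Sky.

```python
# Back-of-envelope scan-time estimate from the figures quoted in the post.
BYTES_PER_EVENT = 50             # "typically 50 bytes or less"
SCAN_RATE_BYTES_PER_SEC = 3e9    # "about 3GB/sec" on one node (decimal GB assumed)


def dataset_gb(events: int) -> float:
    """Approximate in-memory size of `events` events, in decimal gigabytes."""
    return events * BYTES_PER_EVENT / 1e9


def scan_seconds(events: int, nodes: int = 1) -> float:
    """Estimated time to scan `events` events, assuming linear scale-out."""
    total_bytes = events * BYTES_PER_EVENT
    return total_bytes / (SCAN_RATE_BYTES_PER_SEC * nodes)


if __name__ == "__main__":
    billion = 1_000_000_000
    print(f"data size: {dataset_gb(billion):.0f} GB")
    print(f"1 node:    {scan_seconds(billion):.1f} s")
    print(f"10 nodes:  {scan_seconds(billion, nodes=10):.1f} s")
```

With these inputs the estimate comes out slightly under the rounded numbers in the post (roughly 17 seconds on one node rather than 20), which is consistent with 50 bytes being an upper bound and 3GB/sec being approximate.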

I hope that helps. Let me know if you have any other questions.

Ben

Daniel Fagnan

Aug 1, 2013, 3:12:22 PM

to sk...@googlegroups.com

Awesome, thank you very much.

Just another quick question: I'm curious why you chose to use Lua for the query parsing instead of plain Go?

Ben Johnson

Aug 1, 2013, 3:24:25 PM

to sk...@googlegroups.com

The "unstable" stuff definitely needs some documentation. It's been in flux until recently so I haven't put the time toward it yet.

I used LuaJIT because it lets me take the declarative JSON query, convert it to a Lua program, and then JIT that program into optimized machine code and run it. Part of the reason why Sky is so fast is that there's no interpretation being done -- it's literally running a compiled program to compute the data. Lua and Go then communicate results through MsgPack. Sky originally used LLVM to do the JIT processing, but LLVM is a huge dependency and at the time wasn't easy to install.
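To illustrate the general idea of turning a declarative query into a specialized program, here is a minimal Python sketch. It is not Sky's code: the query shape (`group_by`, `where`), the function names, and the generated source are all hypothetical, and CPython only compiles to bytecode rather than JIT-compiling to machine code the way LuaJIT does. The point is just the pipeline: parse JSON, generate source specialized to that query, compile it once, then run it over the events.

```python
import json


def generate_source(query: dict) -> str:
    """Generate the source of an aggregate function specialized to one query.

    The query format here is hypothetical, not Sky's actual query syntax.
    """
    lines = [
        "def aggregate(events):",
        "    counts = {}",
        "    for event in events:",
    ]
    indent = " " * 8
    cond = query.get("where")
    if cond:
        # The filter is baked into the generated code as literal values.
        lines.append(f"{indent}if event.get({cond['field']!r}) != {cond['equals']!r}:")
        lines.append(f"{indent}    continue")
    lines.append(f"{indent}key = event.get({query['group_by']!r})")
    lines.append(f"{indent}counts[key] = counts.get(key, 0) + 1")
    lines.append("    return counts")
    return "\n".join(lines)


def compile_query(query_json: str):
    """Parse a JSON query, generate its program, and compile it once."""
    query = json.loads(query_json)
    namespace = {}
    exec(compile(generate_source(query), "<query>", "exec"), namespace)
    return namespace["aggregate"]


if __name__ == "__main__":
    query_json = '{"group_by": "action", "where": {"field": "country", "equals": "CA"}}'
    aggregate = compile_query(query_json)
    events = [
        {"action": "signup", "country": "CA"},
        {"action": "signup", "country": "US"},
        {"action": "purchase", "country": "CA"},
        {"action": "signup", "country": "CA"},
    ]
    print(aggregate(events))  # {'signup': 2, 'purchase': 1}
```

Because the field names and filter values end up as constants in the generated function, the per-event loop never has to walk the query structure, which is the same reason a compiled query can beat an interpreted one.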

At some point in the future I might switch from Lua to Julia (http://julialang.org/), which is an LLVM-based language used for statistical processing. LuaJIT runs at about half the speed of C but Julia runs pretty close to C speeds. Plus there's a lot of work being done in Julia to add machine learning and other cool stuff. That won't cause any changes to the external Sky API or data format, so it would be a fairly seamless transition. Also, Julia has nice installation packages across major OSes.

Ben Johnson

b...@skylandlabs.com
