Moloch architecture question - is a "one viewer to rule them all" architecture possible?


Eric G

Jul 29, 2013, 10:59:47 AM
to moloc...@googlegroups.com
So I'm looking at the architecture drawing Andy posted here: https://github.com/aol/moloch/wiki/Architecture



In the end, my goal is to build a Moloch implementation where there's just one viewer instance... basically so that I don't have to be aware of the "context" of the data I'm looking for. What I mean is that in Andy's drawing, if I had the "outside Internet" tap feeding "Moloch server 1" but I had "inside servers" feeding "Moloch server 2," I'd have to be aware of the distinction and log into the right viewer instance, correct?

So what I'm asking with this post is whether the following drawing is a feasible implementation of Moloch (feasible in terms of ability to scale, not terribly inefficient in terms of I/O, etc.). I'm completely open to suggestions or comments on "this thing," is essentially what I'm saying. I have old tap boxes already in place that can run Linux, and it would be super convenient to convert them into what are essentially "dumb capture boxes" that simply grab traffic off the wire and shuffle it over to a common NFS mount that the elasticsearch nodes could "consume" and that the one viewer instance would have access to.
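Concretely, I'm picturing each of those old tap boxes doing little more than something along these lines (the host, export, and interface names below are just placeholders for whatever the real environment looks like):

    # mount the shared pcap store and let moloch-capture write straight into it
    mount -t nfs storage-box:/export/moloch-raw /data/moloch/raw
    # capture on the tap interface using the box's local config
    /data/moloch/bin/moloch-capture -c /data/moloch/etc/config.ini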

The drawing is probably the best way of explaining it (my apologies for the drawing, by the way... I hacked apart Andy's architecture drawing with MS Paint).

Thanks for any and all input!
--
Eric

Andy

Jul 29, 2013, 12:27:24 PM
to moloc...@googlegroups.com
If you are running a single moloch cluster, which really just means all the capture/viewer processes talk to the same elasticsearch cluster, then moloch already takes care of the context for you.  You can talk to any of the viewers, and it will auto-proxy the requests for you if you talk to the wrong one or if you need to talk to multiple ones.  From your drawing it looks like this is what you want, so you don't NEED to set up a separate viewer.
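In other words, every capture box gets essentially the same config, all pointing at the one elasticsearch cluster. A rough, untested sketch, showing only the relevant settings (the hostnames, interface, and paths are placeholders):

    # same few lines in config.ini on every capture box
    cat > /data/moloch/etc/config.ini <<'EOF'
    [default]
    elasticsearch=your-es-cluster:9200
    interface=eth1
    pcapDir=/data/moloch/raw
    viewPort=8005
    EOF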

You MAY also optionally set up a stand-alone viewer process that will proxy all traffic.  It still uses the viewer processes that are local to the capture data behind the scenes.  You would do this if you want to set up apache in front of the viewer for authentication reasons, because of network topology issues, or just because you want to.  This means you have n+1 viewer processes.  This is in fact how we run moloch, since we use apache to authenticate the users instead of the built-in moloch digest auth.
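The apache side is just a normal reverse proxy in front of that extra viewer. A minimal sketch, assuming mod_proxy/mod_proxy_http are loaded, the stand-alone viewer listens on localhost:8005, and the config file path is whatever your distro uses (auth directives left out since those are site specific):

    # drop something like this into the apache config for the viewer vhost
    cat >> /etc/httpd/conf.d/moloch-viewer.conf <<'EOF'
    # ... your RADIUS / digest / whatever auth directives go here,
    # plus whatever is needed to pass the authenticated user to the viewer ...
    ProxyPass        / http://localhost:8005/
    ProxyPassReverse / http://localhost:8005/
    EOF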

If you are running separate elasticsearch clusters, then there is currently no way to do what you want, since something would need to aggregate the search results.



Thanks,
Andy

Eric G

Jul 29, 2013, 12:43:03 PM
to moloc...@googlegroups.com
Awesome, thanks Andy. I suspected the shared elasticsearch might have made the viewer "transparent" to the user; I just wanted confirmation. I am planning on "fronting" node.js with Apache so that I can do RADIUS auth, so the n+1 approach is likely the one we'll go with. I wasn't trying to use separate elasticsearch clusters, so your answer hit the nail on the head.

Thanks for the quick response!

--

Eric G

Jul 29, 2013, 12:49:35 PM
to moloc...@googlegroups.com
On Monday, July 29, 2013 12:27:24 PM UTC-4, Andy wrote:
> If you are running a single moloch cluster, which really just means all the capture/viewer processes talk to the same elasticsearch cluster, then moloch already takes care of the context for you.  You can talk to any of the viewers, and it will auto-proxy the requests for you if you talk to the wrong one or if you need to talk to multiple ones.  From your drawing it looks like this is what you want, so you don't NEED to set up a separate viewer.

The other thing I forgot to mention was the possibility of using block-level deduplication with the design drawing above... presumably if the shared /data/moloch/raw directory resided on shared storage that supported block-level dedup (like ZFS or NetApp dedup), then the packet capture data being fed in from the capture processes could be deduplicated, right?

Is anyone trying to dedup their Moloch capture data? We have a bunch of taps in various places around the network (thankfully we have several Gigamon appliances that manage all the taps), so we get lots of duplication in our current full packet capture implementations. It would be really neat if the actual meat of the pcap, the data, could be deduplicated, while the packet headers stayed in place (which I'd think we could achieve with the storage dedup implementation I'm describing).
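If anyone wants to experiment, ZFS at least makes it cheap to measure (quick sketch; the pool/dataset names below are made up):

    # turn dedup on for the dataset holding the raw pcap...
    zfs set dedup=on tank/moloch-raw
    # ...then see what ratio the pool actually reports once some captures land
    zpool get dedupratio tank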

Just some thoughts.


Andy

Jul 30, 2013, 9:36:57 AM
to moloc...@googlegroups.com
Since most disk dedup is done on disk blocks, which are much larger than single packets, and pcaps contain ms-resolution timestamps on every packet, I'm not sure you'll ever see any savings.  I think disk compression would probably work better.  I tried gzipping a 12G raw pcap file and the result was about 60% the size of the input, so a 40% savings.  Not sure what it would do to disk IO, and it obviously doesn't help with multiple taps seeing the same packets.
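If you want to repeat the test on your own traffic, it was just something along these lines (the file name is a placeholder):

    # compress a copy of one raw capture file and compare the sizes
    gzip -c /data/moloch/raw/some-capture.pcap > /tmp/some-capture.pcap.gz
    ls -lh /data/moloch/raw/some-capture.pcap /tmp/some-capture.pcap.gz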

Andy

Eric G

Jul 30, 2013, 12:26:03 PM
to Andy, moloc...@googlegroups.com


On Jul 30, 2013 9:36 AM, "Andy" <andy...@gmail.com> wrote:
>
> Since most disk dedup is done on disk blocks, which are much larger then single packets, and pcaps contain ms resolution time stamps per packet, I'm not sure you'll ever see any savings.  I think disk compression would probably work better.  I tried a gzip of a 12G pcap raw file and the result was about 60% the size of the input, so 40% savings.  Not sure what it would do to disk IO and obviously doesn't help with multiple taps seeing the same packets.
>

Hmm, I see what you're saying... the "resolution" of the block is too coarse to actually realize any savings with block-level dedup (unless you had some really tiny block sizes set up).

Ahh well. It was just a thought. Thanks for your response, Andy.

--
Eric
http://www.linkedin.com/in/ericgearhart
