SO-Elastic cluster abstraction


Rob B

unread,
Dec 6, 2017, 7:14:30 PM12/6/17
to security-onion
Please advise if there is an install procedure from the ISO that allows the ELK stack to be built on external instances, rather than on the so-master.

Any ideas Wes or Doug?

Via so docker images?

Thx.


Wes

unread,
Dec 6, 2017, 7:30:14 PM12/6/17
to security-onion
Rob,

Are you referring to installing the Elastic components on a completely separate machine (without Security Onion)?

Thanks,
Wes

Rob B

unread,
Dec 6, 2017, 7:43:51 PM12/6/17
to security-onion
Wes,

Yes, I would like SO to communicate with an external stack, which I would like to build to be split-brain tolerant.

Is it simply a matter of running the SO Docker containers on separate instances, editing the .yml configs, and then pointing the data from the SO master to the external ELK stack?

I'm not sure where to configure this within SO, or which installation option to choose from the ISO install.
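
Something like the following is what I had in mind for pointing Logstash at the external nodes (the hosts below are placeholders, and I haven't verified how SO lays out its Logstash pipeline configs):

  # Hypothetical Logstash output stanza pointing at external ES nodes
  output {
    elasticsearch {
      hosts => ["http://10.0.0.21:9200", "http://10.0.0.22:9200"]
    }
  }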


Thanks.

Doug Burks

unread,
Dec 7, 2017, 6:20:33 AM12/7/17
to securit...@googlegroups.com
On Wed, Dec 6, 2017 at 7:43 PM, Rob B <robb...@gmail.com> wrote:
> Wes,
>
> Yes, I would like SO to communicate with an external stack, which I would like to build to be split-brain tolerant.
>
> Is it simply a matter of running the SO Docker containers on separate instances, editing the .yml configs, and then pointing the data from the SO master to the external ELK stack?

Hi Rob,

Is there a particular reason why you don't want to use our standard
distributed deployment model? It should not experience split brain as
we don't do any replication. Each sensor has its own independent
Elasticsearch database and the master server simply queries all
Elasticsearch instances remotely using cross cluster search.
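
As a rough illustration (the cluster aliases and addresses below are made up, and the setting names are the ES 5.x-era cross cluster search settings), the master's Elasticsearch is pointed at each sensor with remote cluster seeds along these lines:

  # elasticsearch.yml on the master (illustrative values)
  search.remote.sensor1.seeds: ["10.0.0.11:9300"]
  search.remote.sensor2.seeds: ["10.0.0.12:9300"]

Queries can then address a remote cluster by alias, e.g. searching the sensor1:logstash-* indices.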

--
Doug Burks

Rob B

unread,
Dec 7, 2017, 8:53:14 AM12/7/17
to security-onion
Doug,

We need more fault tolerance for the shards. A multiple-master, multi-data-node model (minimum three nodes), with the default of 5 shards per index plus replicas, will automatically recover and keep data available in the event an index fails.

It looks like there is no indexer fault tolerance designed in. Did I miss something? Is there another suggestion that would satisfy this need?
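
To make that concrete, what I'm after is roughly an index template like this (name and values illustrative; 5 shards is just the ES 5.x default):

  # PUT _template/logstash (illustrative)
  {
    "template": "logstash-*",
    "settings": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }

With at least one replica spread across three or more data nodes, losing one node's copy of a shard doesn't take the index offline.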

Brant Hale

unread,
Dec 7, 2017, 9:06:13 AM12/7/17
to securit...@googlegroups.com
Some thoughts and questions on this (I don't have much Elastic Stack production experience):

Is it common to lose indexes in ES? Does this happen without a disk error or some unexpected fault?

I am somewhat interested in having an archive of sorts where some logs are saved for a longer time. Does it make sense to have a separate ES box that is queried by the master server as well? Perhaps you could make that instance a cluster?


The distributed query model allows resources to be added at roughly the same rate as you add data, and it also reduces the amount of sensor-to-master communication, which has worked really well for the current version of SO. It would be hard to take the current master server, which is relatively lightweight compared to the sensors, and change it into a large cluster of machines. I like the fault tolerance, but right now I would likely end up running more Docker ES instances rather than deploying more physical machines.





Rob B

unread,
Dec 7, 2017, 9:21:26 AM12/7/17
to security-onion
Yes, thanks to Java garbage collection, indexes can crash, among other performance considerations like swaps and merges. That is why a cluster that doesn't lose data availability is needed in a massive-ingest environment.

At our rate of ingest, and given our query models, we need three data nodes minimum. We are currently running six, two of which can be masters, plus a search balancer. Cold retention/query is also a requirement, which is where a clustered master model is most needed.
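
For reference, the node roles in that topology map to elasticsearch.yml settings roughly like this (ES 5.x-style flags; the split across machines is our own):

  # master-eligible data node
  node.master: true
  node.data: true

  # dedicated data node
  node.master: false
  node.data: true

  # coordinating-only "search balancer" node
  node.master: false
  node.data: false
  node.ingest: false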

Any suggestions?

Brant Hale

unread,
Dec 7, 2017, 10:40:45 AM12/7/17
to securit...@googlegroups.com
Wow, that really blows up the number of machines needed! So it sounds like Elastic is a little bit lossy. It seems problematic to have so many instances running. Given internal issues with indexes, does it make sense to run more, but smaller, Docker instances of ES on a sensor?



Doug Burks

unread,
Dec 7, 2017, 10:52:08 AM12/7/17
to securit...@googlegroups.com
First, please avoid using words like "garbage" as this may be
considered inflammatory and doesn't help the conversation.

Second, I'll note that NSM traditionally has been seen as "best effort," meaning that it's not mission critical like a file server or web server would be. Of course, we want to achieve as much availability as possible, but we do have to balance that with hardware costs, administrative overhead, etc.

What is your actual rate of ingest? How do you know you need three
data nodes? Could you make that three Security Onion sensors all
managed by the same Security Onion master server and searchable via
cross cluster search?

Finally, if you decide that you do need a full Elastic cluster with
high availability, in theory it should be possible to configure
external Elasticsearch instances to join our default Elasticsearch
instance to form a cluster, but it's not something that we've tried or
can support at this time.
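
In rough terms (untested, and all names below are assumptions), joining external nodes to the existing instance would mean matching its cluster name and seeding discovery with the SO box:

  # elasticsearch.yml on each external node (untested sketch)
  cluster.name: securityonion                 # must match whatever the SO instance uses
  discovery.zen.ping.unicast.hosts: ["so-master.example.com:9300"]

Again, that's theory rather than something we support.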
--
Doug Burks

Rob

unread,
Dec 7, 2017, 11:13:39 AM12/7/17
to security-onion
Doug,

Thanks. The word "garbage" was used accurately; it is scientific in nature.
http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html

Elasticsearch runs on Java, and Java is a garbage-collected language, which means you'll run into memory management problems.
Memory is divided into two parts: what you allocate to the Java heap space, and everything else. Elasticsearch does not rely on the Java heap alone; for example, every thread created within the thread pool allocates 256KB of off-heap memory.
The basic thing to understand about heap allocation is: the more memory you give it, the more time Java spends garbage collecting.
Elasticsearch ships with Concurrent Mark Sweep (CMS) as its default garbage collector. https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/cms.html

CMS runs multiple concurrent threads to scan the heap for objects that can be recycled. The main problem with CMS is that it can enter "stop the world" mode, in which the JVM becomes unresponsive until the collection completes. The main cause of a stop-the-world pause is the application changing the state of the heap while CMS was running concurrently, forcing CMS to restart from scratch until it has all the objects marked for deletion. Let's put it this way: CMS performs very poorly when the heap is over 4GB, which is almost always the case with Elasticsearch.

Java 8 brings a brand-new garbage collector called Garbage First, or G1, designed for heaps greater than 4GB. G1 uses background threads to divide the heap into regions of 1MB to 32MB, then scans the regions that contain the most garbage objects first. Elasticsearch and Lucene do not recommend using G1GC, for several reasons, one of them being a nasty bug on 32-bit JVMs that might lead to data corruption. From an operational point of view, though, switching to G1GC was miraculous for us, eliminating stop-the-world pauses and leaving only a few memory management issues.
That said, choosing the right amount of memory to allocate to the heap is the trickiest part of designing an Elasticsearch cluster. Whatever you pick, never allocate more than 31GB to the heap.
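
Concretely, those knobs live in config/jvm.options; an illustrative (not prescriptive) example:

  # config/jvm.options (values are examples only)
  -Xms16g
  -Xmx16g                   # keep the heap at or below ~31GB so compressed oops stay enabled
  ## default collector:
  # -XX:+UseConcMarkSweepGC
  ## switching to G1, with the caveats above:
  # -XX:-UseConcMarkSweepGC
  # -XX:+UseG1GC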

As you suggested, Doug, the PoC architecture I am building here is in fact made up of three sensors and two additional external ES instances, which I will tie into the SO ES master. I will keep you updated on findings from this PoC.

Thank you for confirming my thoughts with your suggestion.


R.B.

Doug Burks

unread,
Dec 7, 2017, 11:24:11 AM12/7/17
to securit...@googlegroups.com
Yes, I'm familiar with "garbage collection". :) It's probably best
to explicitly use the term "garbage collection" to avoid any namespace
collisions! :)

Rob

unread,
Dec 7, 2017, 11:28:14 AM12/7/17
to security-onion
Indeed! ;)

Thanks again... Much appreciated.

Doug Burks

unread,
Dec 7, 2017, 11:45:18 AM12/7/17
to securit...@googlegroups.com
On Thu, Dec 7, 2017 at 11:13 AM, Rob <rba...@netorian.com> wrote:
> As you suggested, Doug, the PoC architecture I am building here is in fact made up of three sensors and two additional external ES instances, which I will tie into the SO ES master. I will keep you updated on findings from this PoC.

If I understand this statement correctly, you would then only have
replication for the data that's on the Security Onion master server
and NOT the data that's on the three Security Onion sensor boxes. Is
that your intention?



--
Doug Burks