Status Update: April 20

Jon Haddad

Apr 20, 2019, 12:15:21 PM
to TLP Dev Tools
Hey folks,

As we continue to build out the tlp-stress and tlp-cluster dev tools, I'll try to provide status updates here, summarizing what's been worked on and what we'll be looking to do in the near future. We'd love to get feedback on what we've done, as well as encourage folks in the community to pitch in on upcoming work.

For both repos, I've gone through each project's GitHub issues, added tags, and elaborated on some issues that were a little hazy.

This week, the focus has been on tlp-cluster.  We've fixed a handful of bugs as well as added some features to make it easier to work on test clusters.

All servers now install a handful of diagnostic utilities by default.  While we have a lot of metrics available in Cassandra and prometheus / Grafana provisioned automatically with every cluster, sometimes it's nice to be able to log into a machine and watch in real time.  Going forward, every machine will have sysstat, dstat, iftop, ifstat and htop installed automatically.  We're also grabbing the async java profiler, for generating flame graphs.  Our long term intent is to create a single command which can be run from your laptop that will collect flame graphs from every node in the cluster.  This will be incredibly convenient when doing load tests under very specific conditions where timing is critical.  See https://github.com/thelastpickle/tlp-cluster/issues/32.
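To make the idea concrete, here's a minimal dry-run sketch of what that single flame-graph command could look like. The host list, file paths, and exact profiler.sh arguments are assumptions on my part, not the actual tlp-cluster implementation; the echo keeps this a dry run that just prints the per-node commands.

```shell
# Hypothetical sketch of issue #32: for each node, run async-profiler
# against the Cassandra JVM and copy the resulting flame graph back.
# echo keeps this a dry run -- it prints the commands instead of running them.
collect_flames() {
    for host in $1; do
        # -d: profiling duration in seconds, -f: output file for the flame graph
        echo "ssh $host './async-profiler/profiler.sh -d 60 -f /tmp/flame.svg \$(pgrep -f CassandraDaemon)'"
        echo "scp $host:/tmp/flame.svg ./flames/flame-$host.svg"
    done
}

collect_flames "10.0.0.1 10.0.0.2 10.0.0.3"
```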

In addition to these utilities we're looking to improve our Grafana dashboards as well as the information we supply.  We're now installing the Prometheus node_exporter on every machine, which allows Prometheus to collect CPU and disk information.  Long term, we're planning on supplying better Grafana defaults.  We're currently exploring programmatically generating the dashboards, potentially with the Grafonnet tool: https://github.com/grafana/grafonnet-lib.  If you have worked with this tool in the past we'd love to hear your experience and get your input: https://github.com/thelastpickle/tlp-cluster/issues/78
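For anyone curious about the node_exporter side, it listens on port 9100 by default, and a minimal Prometheus scrape config for it looks roughly like this (job name and target addresses are placeholders, not what tlp-cluster generates):

```yaml
scrape_configs:
  - job_name: 'node'
    static_configs:
      # node_exporter's default port is 9100; these addresses are illustrative
      - targets: ['10.0.0.1:9100', '10.0.0.2:9100']
```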

The biggest patch this week expanded tlp-cluster support from us-west-2 to every AWS region.  Previously, the region was hard coded.  Now we combine data from AWS with the instance information from https://ec2instances.info/ to pick the right AZs and AMIs automatically for your chosen region.

In the process of doing this, we discovered that just because AWS says an AZ exists doesn't mean it's always available to use.  When provisioning a cluster, you might get an error like this:

aws_instance.cassandra.3: Error launching source instance: Unsupported: Your requested instance type (r3.2xlarge) is not supported in your requested Availability Zone (us-west-2d). Please retry your request by not specifying an Availability Zone or choosing us-west-2c, us-west-2a, us-west-2b.

To address this, we've added an --azs flag which can be used as a workaround when an AZ isn't available.  You can use it by supplying the letters of the AZs you want, for example --azs abc.

We also fixed the way disks are mounted.  We had initially used instances with NVMe drives, so the disk setup looked for an NVMe device by default.  Now that we're using an r3 instance type, the setup looks for xvdb first and falls back to the NVMe drive.  Either way, more instance types will work correctly.
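The fallback logic above can be sketched as a small shell function. The function and device names are my own illustration, not the tlp-cluster code; it also uses a plain existence check (-e) so the fallback is easy to exercise, whereas real disk-setup code would test for a block device (-b).

```shell
# Sketch of the disk-selection fallback: prefer xvdb (e.g. r3 instances),
# then fall back to an NVMe device. Prints the first candidate that exists,
# or "none" (with a failing exit status) when no candidate is present.
pick_data_device() {
    for dev in "$@"; do
        if [ -e "$dev" ]; then
            echo "$dev"
            return 0
        fi
    done
    echo "none"
    return 1
}

pick_data_device /dev/xvdb /dev/nvme0n1 || echo "no data disk found"
```

On an r3 instance this would pick /dev/xvdb; on an NVMe-backed instance it would fall through to /dev/nvme0n1.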

In the process of making tlp-cluster more flexible, we've found some limitations in AWS we'll need to address soon.  Specifically, we'll need to create a VPC for each cluster we launch.  There's a bit of work to do around this to make it seamless; we'll be tracking that here: https://github.com/thelastpickle/tlp-cluster/issues/79

Finally, we've fixed a few bugs, improved user feedback on what next steps to take, and improved security by masking the secret key input when doing the initial profile setup.

Thanks everyone, and have a great weekend.

Jon

