Job Title: SRE - Site Reliability Engineer
Location: Sunnyvale, CA
Duration: 12 months
Job Summary:
Experience with Linux, Python, and shell scripting.
Experience maintaining production systems on AWS and/or GCP.
Experience maintaining Kubernetes clusters and managing and debugging containerized applications (Golang, Java, Python).
Understanding of Kafka, Spark, Storm, Cassandra, Elasticsearch, PostgreSQL, Redis (ElastiCache), ZooKeeper, Nginx, and AWS S3/Google Cloud Storage.
Understanding of infrastructure-as-code tools (e.g. Terraform, AWS CloudFormation, Google Cloud Deployment Manager).
Experience with continuous integration practices and tools (Jenkins, Travis CI, CircleCI, etc.).
Experience with monitoring solutions such as CloudWatch, Stackdriver, Prometheus, Thanos, Graphite, Grafana, ELK, Alert Logic, and Datadog.
Experience with logging service solutions.