Senior DevOps Engineer (Production Support) in Chicago, IL


Tushar Chauhan

9:54 AM
to Tushar Chauhan

Senior DevOps Engineer (Production Support)
Chicago, IL

 

Role Summary

We are seeking a highly experienced Senior DevOps Engineer (Production Support) with deep expertise in AWS, Kubernetes, CI/CD, and cloud-native platforms. This role will focus on operating, stabilizing, and continuously improving production environments, ensuring high availability, performance, and scalability of mission-critical applications.

The ideal candidate is a hands-on DevOps/SRE professional who thrives in fast-paced production environments and can automate, troubleshoot, and optimize distributed systems at scale.

You will work extensively with AWS, Kubernetes (Rancher), Jenkins, GitHub, Terraform, Kafka, Harness, and Python while partnering with engineering, platform, and product teams.

  

Key Responsibilities

Production Operations & Reliability

Provide L2/L3 production support for cloud-native applications running on AWS and Kubernetes.

Own incident triage, root cause analysis (RCA), and resolution for high-severity production issues.

Participate in on-call rotations and drive post-incident improvements.

Improve system reliability, resilience, and observability using SRE best practices.

  

AWS & Cloud Infrastructure

Design and operate scalable AWS environments using:

EC2, EKS, VPC, ALB/NLB

S3, RDS, DynamoDB

IAM, CloudWatch, EventBridge

Optimize cloud cost, performance, and security posture.

Implement multi-account, multi-region architectures.

  

Kubernetes & Container Platforms

Manage and operate Kubernetes clusters (Rancher-managed or EKS).

Troubleshoot:

Pod failures

Resource constraints

Networking issues (CNI, ingress)

Stateful workloads

Improve:

Autoscaling strategies

Cluster resilience

Deployment reliability

  

CI/CD & Developer Enablement

Design and maintain CI/CD pipelines using:

Jenkins

GitHub Actions

Harness (preferred)

Implement:

Blue/green and canary deployments

GitOps workflows

Automated rollbacks

Enable developer self-service deployment platforms.

  

Infrastructure as Code & Automation

Build and maintain infrastructure using:

Terraform (primary)

Python automation

Develop reusable:

IaC modules

Platform templates

Deployment accelerators

Automate provisioning, scaling, and recovery workflows.

  

Kafka & Streaming Platforms

Design and manage Kafka infrastructure including:

Clusters, topics, brokers

Producers/consumers

Schema evolution

Ensure:

High availability

Throughput optimization

Secure connectivity

Integrate Kafka with AWS and Kubernetes ecosystems.

  

Observability & Platform Health

Implement monitoring and alerting using:

CloudWatch / Splunk Observability

Define:

SLIs/SLOs

Alerting thresholds

Runbooks

Proactively identify bottlenecks and prevent outages.

  

Security & Compliance

Implement DevSecOps best practices:

Secrets management

IAM least privilege

Container scanning

Supply chain security

Ensure infrastructure adheres to security and compliance standards.

  

Collaboration & Continuous Improvement

Partner with development teams to:

Improve deployment maturity

Reduce operational toil

Increase automation coverage

Drive:

Platform standardization

Developer experience improvements

Operational excellence initiatives

  

Qualifications

Experience

4 to 10 years in DevOps, SRE, or Production Support roles

Strong experience managing production-grade cloud environments

Proven track record of managing live production incidents

  

Technical Skills

Must Have

AWS (deep hands-on)

Kubernetes (EKS/Rancher)

Splunk

Terraform

Jenkins / GitHub

Kafka

Python or Shell scripting

Linux systems expertise

  

Good to Have

Harness CI/CD

GitOps (ArgoCD/Flux)

Service mesh (Istio/Linkerd)

Observability tools (New Relic, Datadog, Prometheus)

Platform engineering mindset

  

Soft Skills

Strong troubleshooting and debugging mindset

Excellent communication during incidents

Ability to work in high-pressure production environments

Ownership-driven and automation-first approach


