Only Locals
Senior DevOps Engineer (Cloud Platforms)
Location: Atlanta, GA - Day 1 Onsite
Contract
Supporting AI Automation and Operations Teams and Enterprise Data Platform Operations Teams
The team is building a multi-agent SRE (Site Reliability Engineering) platform on an on-premises Google Anthos Kubernetes footprint. The platform automates event detection, topology mapping, and root cause analysis across enterprise data services such as Kafka, OpenSearch, and PostgreSQL. The DevOps engineer in this role joins an established platform team responsible for the build, deployment, and ongoing operation of those services and the AI agents that depend on them.
This is a hands-on, build-oriented role. The engineer is expected to design and implement platform components, not solely operate existing ones.
What you'll do
- Design, build, and maintain Helm charts, Kubernetes manifests, container images, sidecars, and init containers for production agentic and data-platform services
- Develop new platform components where needed (log-processing sidecars, configuration bootstrappers, regression-test harnesses, custom controllers), contributing code in Python, Go, or shell
- Troubleshoot Python application services end-to-end, including reading code, reproducing issues, and contributing fixes back to application repositories
- Own CI/CD pipelines in Jenkins (or a comparable tool) integrated with Bitbucket Data Center; promote services through development, QA, and production environments
- Serve as the senior technical advisor to operations and platform support engineers on Kubernetes platform management, deployment process, and GitOps configuration strategy
- Maintain and patch existing platform code as part of regular operational work
- Mentor mid-level engineers and review proposed changes to platform charts, pipelines, and code for quality, security, and performance
What you bring
- 5+ years of senior or lead DevOps experience in a large enterprise environment (telecom, financial services, healthcare, or similar)
- Production Kubernetes experience, preferably with on-premises distributions such as Google Anthos, Red Hat OpenShift, Rancher, or vanilla kubeadm
- Strong Helm 3 skills, including umbrella charts, sub-charts, and dependency management
- Strong container engineering skills, including authoring and tuning sidecar and init containers for production services
- Working Python development skills: able to read, debug, modify, and extend Flask or FastAPI services
- Experience with GitOps pipelines, Bitbucket or GitHub Enterprise, and an enterprise artifact registry such as JFrog Artifactory, Nexus, or Harbor
- Experience with HashiCorp Vault and OIDC-based Kubernetes RBAC
- Experience instrumenting workloads against an APM and observability stack such as New Relic, OpenTelemetry, OpenSearch, or Elasticsearch
- A demonstrated record of shipping platform components, not solely deploying and configuring third-party software
- Demonstrated ability to design and scale platform solutions, with sound architectural judgment as systems grow from initial deployment to production load and multi-team use
- Working familiarity with AI-assisted development tooling, particularly Claude Code (Anthropic), for routine code authoring, debugging, and infrastructure work
Bonus experience
- Skaffold for multi-cluster developer workflows
- Kubernetes operator development (Kubebuilder, Operator SDK)
- Azure AD / Entra ID OIDC integration into Kubernetes
- Prior experience supporting an AI/ML or agentic platform on Kubernetes (Temporal, vector databases, MCP servers)
- Incident response and runbook authoring experience
Working environment
The role operates within an established GitOps framework with documented branching, promotion, and chart conventions. The engineer is expected to advise the broader team on where a given change belongs within that framework, contribute to the platform handbook, and propagate established patterns rather than introduce one-off solutions.
Regards,
Connecting exceptional talent with exceptional opportunities.