We are seeking an experienced Senior Site Reliability Engineer (SRE) with a strong background in designing, building, and operating enterprise-scale Kubernetes platforms within highly regulated environments such as FedRAMP High and DoD IL5. The ideal candidate will have deep expertise in cloud infrastructure, automation, observability, and compliance, with a passion for building reliable, secure, and scalable platforms.
Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent practical experience).
Minimum of 10 years of experience in Site Reliability Engineering (SRE), DevOps, Platform Engineering, or Infrastructure Engineering.
Hands-on experience designing, deploying, and managing production Kubernetes environments (EKS, AKS, GKE, OpenShift, or upstream Kubernetes).
Proven experience supporting FedRAMP High, DoD IL5, or other highly regulated government cloud environments.
Strong knowledge of cloud platforms, preferably AWS; experience with Azure or Google Cloud is a plus.
Solid understanding of Linux system administration, networking, DNS, load balancing, storage, and distributed systems.
Experience with Infrastructure as Code (IaC), preferably using Terraform.
Hands-on experience with CI/CD tools such as GitHub Actions, GitLab CI, Jenkins, or Argo CD.
Proficiency in scripting or programming using Python, Go (Golang), or similar languages for automation.
Experience implementing and managing observability solutions using Prometheus, Grafana, OpenTelemetry, ELK Stack, or similar monitoring platforms.
Strong understanding of Site Reliability Engineering principles, including Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets.
Experience with incident management, root cause analysis, blameless postmortems, and production support in an on-call environment.
Working knowledge of security and compliance frameworks, including NIST SP 800-53, the Risk Management Framework (RMF), DISA STIGs, and related government security standards.
Excellent troubleshooting, analytical, communication, and collaboration skills.
Experience with service mesh technologies such as Istio or Linkerd.
Familiarity with GitOps methodologies and tools such as Argo CD or Flux.
Experience implementing Policy-as-Code using OPA/Gatekeeper or Kyverno.
Experience managing multi-cluster Kubernetes deployments and hybrid cloud environments.
Knowledge of FIPS-compliant systems and the DoD Cloud Computing Security Requirements Guide (SRG).
Experience supporting Authority to Operate (ATO) processes, audit readiness, and compliance documentation.
Industry certifications such as Certified Kubernetes Administrator (CKA), Certified Kubernetes Security Specialist (CKS), AWS Certified Solutions Architect, AWS Certified DevOps Engineer, Microsoft Azure certifications, Google Cloud certifications, or CompTIA Security+ are highly desirable.
Strong ownership mindset with a focus on reliability, automation, and operational excellence.
Ability to balance platform reliability, security, compliance, and developer productivity.
Excellent collaboration skills with software engineering, security, compliance, and operations teams.
Passion for reducing operational toil through automation and continuous improvement.
Strong problem-solving skills with the ability to thrive in fast-paced, mission-critical production environments.
Prasad
AB Tech Solutions
1604 Spring Hill Road, Suite 208, Vienna, VA 22182