Role: Senior Site Reliability Engineer (SRE)
Location: Remote
Positions: 2
Experience: 10–12+ Years
Visa: Visa-independent candidates only
Mandatory Requirements (Screening Filter)
- Strong Go (Golang) programming: must be hands-on and very strong
- Strong Kubernetes (K8s) experience
- Candidate must be comfortable with coding and automation
- LinkedIn ID + DL Copy mandatory
- Candidate photo/screenshot required during submission
Job Summary
We are looking for a Senior SRE with strong expertise in AWS, Kubernetes, and Golang, focused on building reliable, scalable, and automated systems.
Key Responsibilities
Reliability & Performance
- Design monitoring & alerting systems using CloudWatch, Grafana, Prometheus, Datadog, ELK
- Maintain SLIs/SLOs, error budgets, and system performance
- Implement auto-scaling, health checks, and self-healing systems
- Perform RCA & post-incident reviews
Automation & DevOps
- Build infrastructure using Terraform, Ansible, CloudFormation
- Develop CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins)
- Manage workloads in Kubernetes, ECS, EKS, Lambda
- Implement blue/green, canary deployments & automated rollbacks
Incident Management
- Participate in 24/7 on-call rotation
- Reduce MTTD & MTTR using automation
- Create runbooks & operational playbooks
Security & Compliance
- Implement secure DevOps practices
- Ensure compliance with ISO 27001 and SOC 2
- Manage IAM, secrets, and networking securely
Collaboration
- Work with developers on scalable & reliable system design
- Drive DevOps & SRE best practices
- Contribute to platform improvements
Required Skills
- 6+ years in SRE / DevOps / Infrastructure
- Strong AWS: EC2, EKS/ECS, RDS, Lambda, S3, IAM, VPC
- Hands-on IaC tools: Terraform, Ansible, CloudFormation
- Observability tools: Prometheus, Grafana, CloudWatch, ELK, Datadog
- Programming: Go (must), Python/Bash/PowerShell
- Strong Networking (DNS, Load Balancing)
- Strong troubleshooting & RCA skills
Preferred Skills
- Certifications: AWS / CKA / SRE Foundation
- Experience in Chaos Engineering
- Knowledge of SLI/SLO/Error Budgets
- Experience with multi-region / hybrid architectures
- Exposure to regulated environments (SOC 2, HIPAA, GDPR)