Job Title: Site Reliability Engineer
Duration: 12 months
Visa: GC and USC preferred
Rate: Competitive, please keep it reasonable
Location: Remote
Cloud Infrastructure & Architecture
Design, implement, and manage scalable, secure, and highly available cloud infrastructure on AWS.
Implement infrastructure as code (IaC) using AWS CDK, CloudFormation, or Terraform, ensuring all environments are version-controlled and reproducible.
Architect multi-region and disaster recovery strategies that meet healthcare uptime requirements.
Manage containerized workloads using Docker and Kubernetes, optimizing for cost, performance, and resilience.
Site Reliability Engineering
Define, implement, and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) across all production services.
Build and maintain observability stacks (Datadog, AWS CloudWatch, Sentry) covering metrics, logs, traces, and alerting.
Lead incident response: triage, mitigate, and drive blameless post-incident reviews with actionable follow-ups.
Conduct capacity planning and performance engineering to ensure the platform scales ahead of demand.
Champion error budgets and use them to balance feature velocity with system stability.
Identify, assess, and mitigate operational risks, collaborating with engineering and product teams to evaluate their impact and likelihood before they become incidents.
Participate in and help structure an on-call rotation, ensuring clear escalation paths and fair distribution of after-hours coverage.
Thanks & Regards
Rakhi Rajput
Senior Technical Recruiter
Email: Ra...@hexgs.com
Hexagon Global Services Inc.
15 Corporate Pl S, Piscataway, NJ - 08854