DevOps/Site Reliability Engineer | Austin TX - Hybrid

Sulthan recruiter

Apr 1, 2026, 9:45:00 AM
to Sulthan VJ
Please share your profiles to Sult...@nextgen-is.com

Only share profiles of candidates local to Austin, TX.

Position: Systems Analyst 3 (DevOps/Site Reliability Engineer)

Location: Austin TX - Hybrid

Duration: 5 Months

Client: Texas Health and Human Services Commission – 529601671

 

Job Summary

We are seeking an experienced Systems Analyst 3 with a strong background in Site Reliability Engineering (SRE) and DevOps practices. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of production systems by applying software engineering principles to infrastructure and operations.

This role requires close collaboration with development teams to build resilient, observable, and automated systems that meet defined service level objectives.

 

Key Responsibilities

  • Ensure the availability, reliability, and performance of production systems
  • Design, implement, and maintain scalable and highly available distributed systems
  • Monitor system health using logging, monitoring, and alerting tools
  • Define and manage SLIs, SLOs, and error budgets
  • Perform incident management, root cause analysis (RCA), and postmortems
  • Collaborate with development teams to improve system architecture and performance
  • Automate infrastructure and operational processes using scripting and DevOps tools
  • Implement containerization and orchestration solutions using Docker and Kubernetes
  • Integrate security and compliance requirements into system operations
  • Develop and maintain documentation, including runbooks and operational procedures

 

Required Qualifications

  • Minimum 8 years of experience in Systems Engineering, DevOps, or Site Reliability Engineering
  • Strong expertise in Linux/Unix systems and system internals
  • Proficiency in one or more programming or scripting languages such as Python, Go, Java, or Bash
  • Experience with cloud platforms such as AWS or GCP
  • Hands-on experience with containerization and orchestration tools (Docker, Kubernetes)
  • Strong understanding of monitoring, logging, and alerting concepts
  • Experience working with highly available, distributed systems
  • Experience with incident management and root cause analysis
  • Knowledge of integrating security and compliance into operational workflows

 

Preferred Qualifications

  • Experience with observability tools such as Prometheus, Grafana, Datadog, Splunk, or Application Insights
  • Experience supporting 24x7 production environments and on-call rotations
  • Familiarity with chaos engineering and resiliency testing
  • Experience with feature flags, canary deployments, and progressive delivery
  • Strong documentation and communication skills

 

Work Environment

  • Hybrid work model with onsite presence required in Austin, TX (2 days per week)
  • Standard business hours with potential need for after-hours or weekend support
  • Candidates must be local to Texas