DevOps/Site Reliability Engineer | Austin TX - Hybrid

Sulthan recruiter

Apr 1, 2026, 9:45:00 AM

to Sulthan VJ

Please share your profiles to Sult...@nextgen-is.com

Only share candidates local to Austin, TX

Position: Systems Analyst 3 (DevOps/Site Reliability Engineer)

Location: Austin TX - Hybrid

Duration: 5 Months

Client: Texas Health and Human Services Commission – 529601671

Job Summary

We are seeking an experienced Systems Analyst 3 with a strong background in Site Reliability Engineering (SRE) and DevOps practices. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of production systems by applying software engineering principles to infrastructure and operations.

This role requires close collaboration with development teams to build resilient, observable, and automated systems that meet defined service level objectives.

Key Responsibilities

Ensure the availability, reliability, and performance of production systems
Design, implement, and maintain scalable and highly available distributed systems
Monitor system health using logging, monitoring, and alerting tools
Define and manage SLIs, SLOs, and error budgets
Perform incident management, root cause analysis (RCA), and postmortems
Collaborate with development teams to improve system architecture and performance
Automate infrastructure and operational processes using scripting and DevOps tools
Implement containerization and orchestration solutions using Docker and Kubernetes
Integrate security and compliance requirements into system operations
Develop and maintain documentation, including runbooks and operational procedures

Required Qualifications

Minimum 8 years of experience in Systems Engineering, DevOps, or Site Reliability Engineering
Strong expertise in Linux/Unix systems and system internals
Proficiency in one or more programming or scripting languages such as Python, Go, Java, or Bash
Experience with cloud platforms such as AWS or GCP
Hands-on experience with containerization and orchestration tools (Docker, Kubernetes)
Strong understanding of monitoring, logging, and alerting concepts
Experience working with highly available, distributed systems
Experience with incident management and root cause analysis
Knowledge of integrating security and compliance into operational workflows

Preferred Qualifications

Experience with observability tools such as Prometheus, Grafana, Datadog, Splunk, or Application Insights
Experience supporting 24x7 production environments and on-call rotations
Familiarity with chaos engineering and resiliency testing
Experience with feature flags, canary deployments, and progressive delivery
Strong documentation and communication skills

Work Environment

Hybrid work model with onsite presence required in Austin, TX (2 days per week)
Standard business hours with potential need for after-hours or weekend support
Candidates must be local to Texas
