Hi Vendors,
Position: Senior Site Reliability Engineer (SRE)
Location: Naperville, IL (Local Candidates Only)
Duration: 6-Month Contract
Employment Type: Contract
Rate: $55/hr on C2C
Experience: 10+ Years
Job Description
We are seeking a highly experienced Senior Site Reliability Engineer (SRE) / Application Reliability Engineer to support mission-critical enterprise applications in a 24x7 production environment. The ideal candidate will have extensive experience in production support, incident management, AWS cloud technologies, application reliability, and enterprise operations. This role requires strong troubleshooting skills, a proactive approach to system reliability, and the ability to collaborate with cross-functional teams to ensure maximum application availability and operational excellence.
Responsibilities
- Ensure high availability, reliability, and performance of enterprise applications.
- Monitor production applications, batch processes, and scheduled jobs to maintain operational continuity.
- Lead and coordinate major production incidents (P1/P2) and drive timely resolution.
- Perform Root Cause Analysis (RCA) and implement permanent corrective actions.
- Maintain compliance with SLAs, SLOs, and ITIL best practices.
- Design, enhance, and maintain monitoring dashboards, alerts, and observability solutions.
- Troubleshoot application, infrastructure, and database issues using SQL queries, log analysis, and monitoring tools.
- Support production releases, deployments, change validations, and post-release verification.
- Participate in disaster recovery planning, testing, and business continuity activities.
- Collaborate with Infrastructure, Database, Development, Cloud, and Operations teams to resolve production issues.
- Create and maintain operational documentation, runbooks, troubleshooting guides, and knowledge base articles.
- Drive continuous improvement initiatives to enhance application reliability, operational efficiency, and automation.
Required Qualifications
- 10+ years of experience in Site Reliability Engineering, Production Support, or Application Support.
- Strong experience managing enterprise production environments.
- Hands-on experience with AWS services including EC2, S3, and VPC.
- Strong knowledge of Incident Management, Problem Management, and Change Management.
- Experience performing Root Cause Analysis (RCA) for production issues.
- Strong understanding of SLA/SLO management and ITIL processes.
- Experience with Oracle and SQL Server databases.
- Strong SQL scripting and query optimization skills.
- Experience with GitHub and CI/CD processes.
- Working knowledge of Java and SQR.
- Experience with ServiceNow and Jira.
- Strong knowledge of UNIX, Linux, and Windows operating systems.
- Experience with application monitoring, log analysis, and observability tools.
- Excellent analytical, troubleshooting, and communication skills.
- Ability to work in a fast-paced 24x7 production support environment.
Preferred Qualifications
- Experience supporting Banking, Financial Services, or Insurance (BFSI) applications.
- Experience supporting ERP applications.
- Knowledge of cloud-native application reliability best practices.
- Experience with automation and operational excellence initiatives.
Education
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.