Job Title: Site Reliability Engineer (SRE)
Location: Detroit, MI (Onsite)
Type: Contract
Visa: H1B/H4EAD/TN/L2/GC-EAD
The Work
• Collaborate with cross-functional teams to design, build, and maintain robust, scalable, and fault-tolerant systems
• Work closely with development teams and architects to advocate for reliability best practices during the application development lifecycle
• Design and implement monitoring and alerting to provide real-time visibility into user experience and system health and performance
• Monitor and analyze system performance, proactively identifying potential issues and implementing solutions to ensure optimal performance and reliability
• Develop and maintain automated tools and processes to streamline operational tasks and reduce manual interventions
• Participate in incident response and post-mortems, contributing to continuous improvement efforts
• Conduct capacity planning and resource optimization to handle growing demands on our infrastructure
• Continuously research and evaluate new technologies and practices to enhance the reliability and efficiency of our systems
The Skills You Bring
• Bachelor's degree in Computer Science, Engineering, or related fields preferred (or equivalent practical experience)
• Strong verbal and written communication skills
• 4-8 years of overall experience managing an SRE or DevOps team with an observability workload
• 4-8 years of Agile Management owning SRE roadmaps and deliverables using Scrum / Kanban
• 4-8 years of delivering projects alongside a constant flow of side intake and production response workloads
• Experience presenting to leadership, collaborating effectively, and communicating technical concepts to non-technical business stakeholders
• Proven 5+ years' experience as a Site Reliability Engineer or similar role in a production environment
• Applied AWS/Cloud Certification (AWS Cloud Architect, DevOps/SysOps) including experience with ASG, Fargate, Lambda, Aurora DB, DynamoDB, ALB/NLB
• 5+ years' working experience with CI/CD pipelines (GitLab) and developing infrastructure-as-code (Terraform, Python, Ansible, etc.)
• Applied experience with Linux and Windows platforms, Java EE, JavaScript, Spring, Spring Boot, REST APIs/microservices, shell scripting, Python, PL/SQL, and databases, specifically Oracle
• Working knowledge of observability platforms like Splunk, Dynatrace
• Working experience with designing Observability for enterprise applications
• Solid knowledge of system administration and DevSecOps
• Development experience on both cloud and physical servers
• Understanding of and experience working with business, product, and engineering teams to develop SLIs, SLOs, and SLAs
Other Skills & Experience Desired
• Strong knowledge of Linux/Unix systems and network protocols
• Familiarity with cybersecurity best practices and principles
• Ability to lead triage calls, including working across multiple divisions to resolve issues
Thanks & Regards:-
Md Zahid
Technical Recruiter
Nityo Infotech Corp.
📧 Email: md.z...@nityo.com
🌐 Website: www.nityo.com
🔗 LinkedIn: linkedin.com/in/md-zahid-3241a6290