Role: Site Reliability Engineer
Location: 2320 Ball Drive, St Louis, MO 63146, United States
Length: 6-month contract
No OPT, CPT, or EAD candidates. Local or nearby candidates only.
Description:
Site Reliability Engineering (SRE) is a discipline that combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to Equifax engineering principles.
SRE is also an engineering approach to building and running production systems: we engineer solutions to operational problems. Because SREs are responsible for overall system operation, we use a broad range of tools and approaches to solve a wide set of problems. Our practices include limiting time spent on operational work, running blameless postmortems, and proactively identifying and preventing potential outages.
SRE's culture of diversity, intellectual curiosity, problem solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while also striving to create an environment that provides the support and mentorship needed to learn, grow, and take pride in our work.
Responsibilities:
· Engage in and improve the software development lifecycle, from inception and design through development, deployment, operation, and refinement
· Influence and design infrastructure, architecture, standards, and methods for large-scale systems
· Support services before production through infrastructure design, software platform development, load testing, capacity planning, and launch reviews
· Maintain services during deployment and in production by measuring and monitoring key performance and service level indicators, including availability, latency, and overall system health
· Automate system scalability and continually work to improve system resiliency, performance, and efficiency
· Practice sustainable incident response as part of an on-call rotation and through blameless postmortems
· Remediate tasks within corrective action plans through sustainable, preventative, and automated measures whenever possible
Minimum qualifications:
· BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience
· Experience developing and/or administering software in public cloud infrastructure as a service (IaaS), platform as a service (PaaS), or microservices environments
· Experience monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met
· Experience with languages such as Python, Ruby, Bash, Java, Go, Perl, and JavaScript and/or Node.js
· Demonstrable cross-functional knowledge of systems, storage, networking, security, and databases
· System administration skills, including automation and orchestration of Linux/Windows using Chef, Puppet, Ansible, SaltStack, and/or containers (Docker, Kubernetes, etc.)
· Proficiency with continuous integration and continuous delivery tooling and practices
· Strong analytical and troubleshooting skills
Preferred qualifications:
· Expertise designing, analyzing, and troubleshooting large-scale distributed systems
· Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
· Experience managing infrastructure as code via tools such as Terraform or CloudFormation
· A passion for automation with a desire to eliminate toil whenever possible
· Experience building software or maintaining systems in a highly secure, regulated, or compliant industry
· Experience with and passion for working within a DevOps culture and as part of a team
Thanks,
Rama Krishna Mettapalli
ASAP Solutions Inc
Rmett...@myasap.com