Position: Site Reliability Engineer
Location: Santa Clara, CA
Duration: 1+ year
Rate/Salary: Market Best
Job Description:
Responsibilities:
• Serve as the Development and Operations (DevOps) subject matter expert for a 24x7 SaaS operation
• Work hand-in-hand with microservice software developers, architects, and field integration resources to architect and deliver Ericsson's next-generation TV platforms.
• Contribute to the development of new tools and automation that ensure the service can be optimized and tuned with minimal human intervention.
• Accountable for working upstream with microservice developers on monitoring, tools, and architecture to deliver security, reliability, manageability, and availability at scale
• Serve as the point of escalation and decision maker on the response level for incidents
• Participate in the Core SRE on-call roster and respond with command-and-control incident management during high-priority events while maintaining internal and external SLAs
• Act as Technical Duty Officer, leading the resolution of the most complex service problems, from the network layer to the application, at scale
• Drive Problem Management/Retrospectives (“postmortems”)
• Actively contribute to and maintain our knowledge base
• Analyze trends and make recommendations in the areas of monitoring, incident and change management, cloud orchestration and support.
• Contribute to the future growth of the team by conducting candidate screenings and assessments
• Accountable for deploying services to production environments
Technologies:
• Experience with Docker, SaltStack, and orchestration tools such as Kubernetes
• Knowledge of MongoDB and Cassandra databases, Kafka, and IIS servers on Azure/AWS/OpenStack
• Azure, OpenStack, and AWS concepts and APIs
• Experience designing, setting up, maintaining, and refining (noise reduction, auditing) monitoring tools such as Prometheus, Prometheus exporters, Kibana, Grafana, and Alertmanager
• Demonstrable experience in one or more of: PowerShell, Python, Bash, C#, .NET
• Strong knowledge of TCP/IP networking, DNS, VPNs, HTTP, load balancers (such as NGINX), highly available microservice architecture, and CDNs
• Team Foundation Server/Visual Studio, Atlassian suite (Jira, Confluence), Git
• Analysis of network, performance, and application issues using tcpdump, Fiddler, and Wireshark.
Regards,
Kumar Vinay
vin...@eiferinc.com