Hi All,
Hope you are all doing well.
Role: AWS Observability Engineer
Location: NJ (Onsite)
Duration: Long Term
Please share only profiles with strong observability and monitoring experience.
Job Description:
· Design and develop observability and monitoring solutions by configuring and managing observability tools on AWS.
· Extensive hands-on experience installing the necessary agents on servers, virtual machines, and AWS-supported services, and forwarding logs to a centralized location through configured aggregators.
· Deep understanding of logs, metrics, and tracing services and their capabilities.
· Conduct assessments of the AWS platform and observability, providing detailed comparative analyses, cost metrics, and the advantages of various monitoring tools, and setting benchmarks.
· Develop and maintain documentation processes by creating templates for observability, including assessment checklists, questionnaires, presentations, and proofs of concept, with a hands-on approach.
· Participate in sessions and workshops for clients and internal team members on observability, delivering high-quality presentations.
· Participate in solution design and proposal development activities.
· Proven experience in IT monitoring and observability with a focus on cloud environments.
Requirements:
Bachelor’s degree in Computer Science, Information Technology, or a related field.
At least 10 years of IT experience, with a minimum of 6 years focused on AWS Cloud with an emphasis on observability.
Extensive experience with AWS services and a thorough understanding of cloud compute, storage, networking, security, and database services such as EC2, S3, VPCs, Network Flow Logs, and RDS.
Expertise in AWS monitoring solutions such as Amazon CloudWatch and AWS CloudTrail.
Proficiency in AWS-supported monitoring solutions such as Dynatrace, AppDynamics, Datadog, and Sumo Logic.
Experience monitoring infrastructure, APIs, microservices, JVMs, and real user monitoring (RUM), with the ability to create the necessary dashboards for visualization.
Experience integrating monitoring tools with notification and incident management systems.
Understanding of self-healing concepts and AIOps.
Proficiency in programming languages such as Python, Java, or Go.
Ability to work independently and as part of a team, demonstrating leadership when required.
Relevant cloud certifications are required.