Job Title: Senior Observability Engineer (ESS Platform SME)
Location: McLean, VA & Plano, TX (onsite)
In-person interview (if you are local)
Job Type: C2C or W2
Role Overview:
We are seeking a highly experienced Senior Observability Engineer with deep expertise in ESS (Elastic Stack) to lead and accelerate the development of enterprise-grade observability capabilities across mission-critical applications.
This role requires a hands-on SME who can design, build, and scale observability dashboards, APM, tracing, and monitoring solutions exclusively within ESS. The candidate will play a key role in transforming current monitoring into a proactive, intelligent, and scalable observability ecosystem.
This is a high-impact, fast-paced engagement (target < 6 months) requiring ownership, technical depth, and execution excellence.
Key Responsibilities:
ESS Observability Architecture & Implementation
- Design and implement end-to-end observability solutions using ESS (Elastic Stack).
- Build a centralized observability layer covering all MF applications.
- Ensure block-level aggregation with drill-down to:
  - Application-level metrics
  - APM traces
  - Logs and events
  - Service dependencies
Dashboard Engineering (Critical Priority)
- Develop and scale a large backlog of ESS dashboards, including but not limited to:
  - Cluster Health (OCP/K8s)
  - API & APM Dashboards
  - Service Health & Dependency Monitoring
  - Pod Status / Restart / Scaling Metrics
  - HTTP Status Analytics (200/400/500 trends)
  - Transaction Processing Metrics
  - Infra Metrics (CPU, Memory, Disk, Network)
  - Synthetic Monitoring & Availability
- Build intuitive, drill-down dashboards from MF Block → Service → Application level.
APM, Tracing & Monitoring Expansion
- Expand ESS-based:
  - Application Performance Monitoring (APM)
  - Distributed tracing
  - Real User Monitoring (RUM)
  - Synthetic monitoring
- Enable end-to-end traceability across microservices.
Proactive Observability & Alerting
- Design and implement smart alerting rules:
  - Move from reactive → proactive detection
  - Reduce noise, improve signal quality
- Define SLOs, SLIs, and error budgets
- Enhance anomaly detection and trend analysis
Collaboration & Leadership
- Work closely with:
  - EOT Observability Team
  - Internal CDLs
  - Application teams
- Act as ESS Observability SME
- Provide guidance, standards, and best practices
Required Skills & Experience:
- Strong hands-on experience with ESS (Elastic Stack):
  - Elasticsearch
  - Logstash
  - Kibana
  - Beats / Elastic Agent
  - Elastic APM
- Proven experience building enterprise-scale observability dashboards in ESS
- Deep understanding of:
  - Microservices architecture
  - Kubernetes / OpenShift (OCP)
- Experience with:
  - APM, distributed tracing, logging, metrics correlation
- Ability to design multi-layer observability (infra → platform → app)
Strongly Preferred:
- Experience with:
  - Synthetic monitoring tools integrated with ESS
  - Real User Monitoring (RUM)
  - Service maps and dependency graphs
- Knowledge of:
  - CI/CD observability integration
  - Alerting frameworks within Elastic
- Scripting: Python / Shell / Groovy (nice to have)
Soft Skills:
- Strong ownership mindset
- Ability to work under aggressive timelines
- Excellent problem-solving skills
- Clear communication with technical and non-technical teams
Success Criteria (First 3–6 Months):
- Deliver enterprise-grade ESS observability dashboards
- Achieve full MF application visibility
- Implement end-to-end APM + tracing coverage
- Establish a proactive alerting framework
Additional Notes:
- Candidate must be an ESS expert; experience with alternative tools alone will not be sufficient.
- This is a high-priority, business-critical role with immediate impact expectations.