Please share suitable profiles with me.
Role: Senior AIOps ML Engineer
Location: Woodland Hills, CA
Description:
"Core Responsibilities
Lakehouse Architecture & Data Engineering
• Schema Design: Design and evolve the Lakehouse schema (Delta Lake / Apache Iceberg) for multi-domain observability data at petabyte scale.
• Pipeline Engineering: Build and maintain robust ingestion pipelines from the OTel Collector through Kafka to the Lakehouse, ensuring exactly-once semantics and strict schema enforcement.
• Data Transformation: Implement dbt transformation models to generate mart-ready, denormalized fact and dimension tables for each of the six domains.
• Data Quality Governance: Define and enforce data quality contracts, establishing SLAs for data freshness, completeness, and cardinality budgets per mart.
• Performance Optimization: Optimize query performance using partitioning strategies, Z-ordering, bloom filters, and materialized views tailored for time-series patterns.
ML Model Development & AIOps
• AIOps Modeling: Design, train, and deploy machine learning models for streaming multivariate anomaly detection, root-cause analysis, and incident forecasting across all six mart domains.
• Streaming Inference: Build low-latency streaming inference pipelines (Flink / Spark Streaming) for real-time anomaly scoring on APM, infrastructure, and security signals.
• Log Intelligence: Develop sophisticated log intelligence models—including clustering (DRAIN3 / LogBERT), NLP classification, and error deduplication—over the Log mart.
• Behavioral Analytics: Implement unsupervised and semi-supervised methods for User Experience frustration detection and KPI correlation analysis.
• Feature Store Management: Own the ML feature store, managing feature engineering, versioning, backfill pipelines, and point-in-time correct joins for training datasets.
• Model Lifecycle MLOps: Instrument model performance tracking, including drift detection, accuracy monitoring, and automated retraining triggers.
AIOps Platform & Productionization
• Workflow Orchestration: Design and operate the end-to-end AIOps workflow, spanning signal ingestion, feature computation, model inference, alert routing, and auto-remediation hooks.
• Model Serving Infrastructure: Build high-performance model serving infrastructure—supporting real-time REST/gRPC endpoints and async batch scoring—with strict p99 latency SLOs.
• Incident Tool Integration: Integrate AIOps insights with incident management platforms (PagerDuty, Opsgenie) and internal runbooks to deliver enriched, noise-reduced alerting.
• Business Impact Quantification: Define and publish metrics from the Business KPI mart to quantify the blast radius, revenue loss, and affected user counts for each incident.
Security & Compliance Observability
• Security Mart Collaboration: Partner with the Security team to build the Security mart schema, including threat feed ingestion, UEBA baselines, and CVE correlation pipelines.
• Threat Detection: Train anomalous-access and lateral-movement detection models, tuning precision/recall thresholds in collaboration with the SOC team.
• Compliance & Governance: Ensure all data handling across the marts adheres strictly to data residency requirements, PII masking standards, and audit-log protocols.
Collaboration & Engineering Standards
• Schema Contracts: Define telemetry schema contracts with the OTel Instrumentation team to guarantee high upstream signal quality for downstream ML models.
• Organizational Standards: Author ML platform RFCs and contribute actively to observability data model standards across the broader engineering organization.
• Mentorship & Reviews: Mentor junior ML and data engineers, and conduct rigorous design reviews for new mart schemas and model architectures."
✅ Kafka + Streaming (Flink/Spark)
✅ Lakehouse (Delta / Iceberg)
✅ ML (Anomaly detection + time-series)
✅ Observability (OTel, APM, Logs)
✅ MLOps (feature store, drift, retraining)
✅ SQL + Python (strong)
Skills: AI Agents
Experience Required: 10+ years