"Sr Azure Data Engineer (Databricks)" in Phoenix AZ (Hybrid)

roy garg

Dec 5, 2025, 8:25:59 AM
to roy garg
Hello
Hope you are doing well!
We are looking for a "Sr Azure Data Engineer (Databricks)" in Phoenix, AZ (Hybrid).


Title: Sr Azure Data Engineer

Location: Phoenix, AZ (Hybrid)

Required Skills: Azure Databricks, Python, SQL, ETL, data lakes


Experience Level: 10-12+ years


Senior Data Engineer

We are looking for a seasoned Senior Data Engineer to architect, build, and maintain scalable data ecosystems spanning batch, streaming, and event-driven ingestion, transformation, storage, and delivery. The ideal candidate brings deep expertise in Python and SQL, distributed computing frameworks, lakehouse architectures, and orchestration tooling, along with strong data governance and data engineering fundamentals. You will own end-to-end data workflows, make architectural decisions, and ensure the system’s reliability, performance, and observability.

Core Responsibilities

  • Data Ingestion & Pipeline Architecture
    Design and implement ETL/ELT pipelines for batch, streaming, and event-driven data sources (e.g., API payloads, message queues, CDC); a minimal streaming-ingestion sketch follows this list.
    Leverage distributed compute engines (e.g., Apache Spark, Databricks, or equivalents) to process large-scale data.
  • Storage Layer & Lakehouse Architecture
    Design and maintain a modern data ecosystem using lake/lakehouse patterns: object storage (e.g., S3, Azure Data Lake) with table formats such as Delta Lake, Apache Iceberg, or equivalents.
    Ensure efficient partitioning, compaction, versioning, and time-travel capabilities for large tables (see the Delta maintenance sketch after this list).
  • Orchestration & Workflow Management
    Build workflows via orchestration tools such as Apache Airflow, dbt (for transformation), Prefect, or similar.
    Design workflows with retry logic, alerting, SLA monitoring, and dependency management for batch and streaming jobs (see the Airflow sketch after this list).
  • Data Governance, Quality & Metadata
    Define and enforce schema design, versioning, and evolution best practices.
    Implement metadata management and lineage capture to provide end-to-end traceability of data flow.
    Build a data-quality framework (e.g., checks on completeness, accuracy, freshness, anomaly detection) with automated monitoring (see the data-quality sketch after this list).
    Collaborate with stakeholders on data definitions, data cataloging, and compliance (especially critical in domains like healthcare).
  • System Reliability, Performance & Scalability
    Architect the data platform for high availability, fault tolerance, and efficient compute and storage cost management.
    Optimize job execution: Spark job tuning, caching strategies, shuffle minimization, partitioning optimization, and resource scaling (see the tuning sketch after this list).
    Ensure observability (metrics, logging, job monitoring) and incident management for data pipeline failures.
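
The sketches below illustrate, one per responsibility area, the kind of work described above. First, ingestion: a minimal PySpark Structured Streaming job on Databricks that reads JSON events from a message queue into a Delta table. The broker, topic, payload schema, and storage paths are illustrative assumptions, not details of this role's actual stack.

```python
# Minimal Structured Streaming ingestion sketch (PySpark on Databricks).
# Broker, topic, schema, and paths are placeholders for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("event-ingestion").getOrCreate()

# Expected shape of the incoming JSON payload (hypothetical schema).
payload_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "events")                       # placeholder topic
    .load()
)

parsed = (
    raw.select(from_json(col("value").cast("string"), payload_schema).alias("e"))
    .select("e.*")
)

# Write to a Delta table with checkpointing so the job recovers after failures.
query = (
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "abfss://lake@account.dfs.core.windows.net/_chk/events")
    .outputMode("append")
    .start("abfss://lake@account.dfs.core.windows.net/bronze/events")
)
```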
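For the storage/lakehouse bullet, a short Delta maintenance sketch: a partitioned table, file compaction via OPTIMIZE/ZORDER, and a time-travel query. Table and column names are assumptions for illustration.

```python
# Delta maintenance sketch (Databricks SQL via PySpark); names are illustrative.
spark.sql("""
    CREATE TABLE IF NOT EXISTS bronze.events (
        event_id STRING, event_type STRING, event_ts TIMESTAMP, event_date DATE
    ) USING DELTA
    PARTITIONED BY (event_date)
""")

# Compact small files and co-locate a frequently filtered column.
spark.sql("OPTIMIZE bronze.events ZORDER BY (event_type)")

# Time travel: query an earlier table version for audits or reproducible backfills.
previous = spark.sql("SELECT * FROM bronze.events VERSION AS OF 5")
```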
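For orchestration, a minimal Airflow 2.x DAG showing retries, failure alerting, an SLA on the ingestion task, and a simple dependency. Operator choices, the schedule, and the alert address are placeholders, not a prescribed setup.

```python
# Orchestration sketch: an Airflow 2.x DAG with retries, SLA, and alerting.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def run_ingest(**_):
    ...  # e.g., trigger the ingestion job (Databricks run, API pull, etc.)

def run_transform(**_):
    ...  # run downstream transformations once ingestion lands

default_args = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "email_on_failure": True,
    "email": ["data-alerts@example.com"],   # placeholder alert address
}

with DAG(
    dag_id="daily_events_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule="0 2 * * *",
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = PythonOperator(
        task_id="ingest",
        python_callable=run_ingest,
        sla=timedelta(hours=1),             # alert if ingestion overruns its SLA
    )
    transform = PythonOperator(task_id="transform", python_callable=run_transform)
    ingest >> transform                     # transform waits for ingest to succeed
```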
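For data quality, a small sketch of completeness and freshness checks against a Delta table. The thresholds and table name are assumptions, and a production framework would publish metrics and alerts rather than simply assert.

```python
# Data-quality sketch: completeness and freshness checks on a Delta table.
from datetime import datetime, timedelta
from pyspark.sql import functions as F

df = spark.table("silver.events")          # hypothetical table name
total = df.count()

# Completeness: share of rows missing the primary key.
null_rate = df.filter(F.col("event_id").isNull()).count() / max(total, 1)

# Freshness: most recent event timestamp (assumes the session time zone is UTC).
latest_ts = df.agg(F.max("event_ts").alias("latest")).collect()[0]["latest"]

assert null_rate < 0.01, f"completeness check failed: {null_rate:.2%} null event_ids"
assert latest_ts is not None and datetime.utcnow() - latest_ts < timedelta(hours=2), (
    "freshness check failed: no events in the last 2 hours"
)
```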
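For reliability and performance, a few of the tuning levers named above (shuffle sizing, adaptive execution, broadcast joins, selective caching). The values are illustrative starting points, not recommendations for every workload.

```python
# Performance-tuning sketch: common Spark levers; values are illustrative only.
from pyspark.sql.functions import broadcast

spark.conf.set("spark.sql.shuffle.partitions", "400")   # size shuffles to the data volume
spark.conf.set("spark.sql.adaptive.enabled", "true")    # let AQE coalesce small partitions

facts = spark.table("silver.events")        # large fact table (hypothetical)
dim = spark.table("silver.dim_customer")    # small dimension table (hypothetical)

# Broadcast the small dimension table to avoid shuffling the large fact table.
joined = facts.join(broadcast(dim), "customer_id")

# Cache only when the result is reused several times within the same job.
joined.cache()
joined.count()                              # materialize the cache
```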

Required Skills & Experience

  • Expert proficiency in Python and SQL, including complex queries, window functions, CTEs, stored procedures, and integration with data frameworks (a short example follows this list).
  • Extensive experience with distributed data processing frameworks (Spark, Databricks, Flink, or equivalent) in production at scale.
  • Strong experience building ETL/ELT pipelines for both batch and streaming/event-based ingestion, and handling event-driven architectures.
  • Hands-on experience in designing and managing a data lake or lakehouse architecture, using technologies such as S3/ADLS + Delta Lake/Iceberg or comparable.
  • Solid experience deploying and managing orchestration platforms (Airflow, DBT, Prefect) for scheduling, dependency management, and monitoring of data workflows.
  • Practical experience in implementing governance: schema evolution, metadata capture, lineage tracing, data quality monitoring and alerting.
  • Strong system-design capability: able to independently design a scalable, reliable data platform, make architecture decisions, and operate with minimal supervision.
  • Excellent communication and collaboration skills—able to partner with data scientists, analysts, and business stakeholders to translate data needs into engineering solutions.
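
As a quick illustration of the SQL skills called out above, here is a CTE plus a window function run through Spark SQL to keep only the latest record per key; the table and columns are hypothetical.

```python
# SQL sketch: deduplicate events with a CTE and a window function (Spark SQL).
dedup = spark.sql("""
    WITH ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY event_ts DESC) AS rn
        FROM bronze.events
    )
    SELECT event_id, event_type, event_ts
    FROM ranked
    WHERE rn = 1            -- keep only the latest record per event_id
""")
dedup.write.mode("overwrite").saveAsTable("silver.events")
```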

What You’ll Work On

  • End-to-end data ecosystems powering analytics, real-time dashboards, and ML model inputs.
  • Scalable ingestion of high-volume event streams and raw API payloads into transformable, query-ready tables.
  • A lakehouse architecture that enables both ad-hoc analysis and structured model training.
  • Data-quality, lineage, and governance frameworks that build trust in the data for downstream teams.
  • Full lifecycle: ingestion, transformation, storage, orchestration, monitoring, and delivery.

Thank You

Rahul
