Location: Boston, MA - Hybrid (as needed). The customer is seeking local candidates only (in and around Boston) so that the consultant can visit the office when required.
We are seeking a highly experienced Senior Data Engineer / Data Architect with deep expertise in Databricks, Snowflake, and Azure cloud data platforms. The ideal candidate will have extensive experience designing and implementing scalable data pipelines, Lakehouse architectures, and real-time data processing solutions, particularly in regulated domains such as Life Sciences or Healthcare.
This role requires strong proficiency in Spark (PySpark), Delta Lake, Medallion architecture, and cloud-native data engineering practices, along with a solid background in data warehouse modernization and performance optimization.
________________________________________
Key Responsibilities
• Design and implement end-to-end data engineering pipelines using Azure Databricks, ADLS Gen2, and Snowflake.
• Develop scalable ETL/ELT pipelines using PySpark, Spark SQL, Python, and Talend.
• Build and maintain Lakehouse architecture using Delta Lake and Medallion (Bronze, Silver, Gold) layers (see the first sketch after this list).
• Implement real-time and batch data ingestion pipelines, including streaming ingestion with Spark Structured Streaming.
• Design and enforce data governance, access control, and lineage using Unity Catalog.
• Optimize Spark workloads through partitioning, caching, broadcast joins, and cluster tuning to improve performance and reduce cloud costs (see the tuning sketch after this list).
• Architect and manage CI/CD pipelines using Azure DevOps, Jenkins, and Git for automated deployments.
• Integrate multiple data sources and systems, ensuring high-quality, reliable, and scalable data delivery.
• Collaborate with cross-functional teams including data analysts, scientists, and business stakeholders to support analytics and reporting needs.
• Support data warehouse modernization initiatives, including migration from legacy systems to cloud platforms.
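For illustration, the Medallion-layer work described above often looks something like the minimal PySpark sketch below. Every path, table, and column name is a hypothetical placeholder; on Databricks the spark session is already provided, so the builder block is only needed when running outside the platform.

```python
# Minimal sketch of a Bronze -> Silver Medallion step on Delta Lake.
# All paths, table names, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

# On Databricks the `spark` session already exists; this builder applies only
# outside the platform (assumes the delta-spark package is installed).
spark = (
    SparkSession.builder.appName("medallion-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Bronze: land raw files as-is, capturing ingestion metadata.
raw = spark.read.json("abfss://raw@<storage-account>.dfs.core.windows.net/orders/")
(raw.withColumn("_ingested_at", F.current_timestamp())
    .write.format("delta").mode("append")
    .save("/mnt/lake/bronze/orders"))

# Silver: cleanse, de-duplicate, and conform types for downstream consumers.
bronze = spark.read.format("delta").load("/mnt/lake/bronze/orders")
silver = (bronze
          .dropDuplicates(["order_id"])
          .filter(F.col("order_id").isNotNull())
          .withColumn("order_date", F.to_date("order_date")))
(silver.write.format("delta").mode("overwrite")
       .save("/mnt/lake/silver/orders"))
```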
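The performance-tuning responsibility typically involves decisions like those sketched below. This is a generic example of common levers (broadcast joins, caching, partitioned writes), not a prescription for the actual workloads; the dataset names and partition column are placeholders, and the spark session is reused from the sketch above.

```python
# Sketch of common Spark tuning levers: broadcast joins, selective caching,
# and partitioned writes for file pruning. Names are hypothetical.
from pyspark.sql.functions import broadcast

facts = spark.read.format("delta").load("/mnt/lake/silver/orders")
dims = spark.read.format("delta").load("/mnt/lake/silver/customers")

# Broadcast the small dimension table to avoid a shuffle-heavy sort-merge join.
enriched = facts.join(broadcast(dims), on="customer_id", how="left")

# Cache only when the result is reused by several downstream actions.
enriched.cache()

# Write partitioned by a date column so readers can prune files at query time.
(enriched.write.format("delta")
         .mode("overwrite")
         .partitionBy("order_date")
         .save("/mnt/lake/gold/orders_enriched"))
```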
________________________________________
Required Qualifications
• 10+ years (ideally 15–20+) of experience in data engineering or data architecture.
• Strong expertise in:
o Databricks & Delta Lake
o Snowflake Data Warehouse
o Apache Spark (PySpark, Spark SQL)
• Hands-on experience with Azure Cloud (ADLS Gen2, Azure Databricks, ADF).
• Proficiency in Python and SQL for data engineering.
• Experience with ETL/ELT tools such as Talend or Informatica.
• Strong knowledge of data modeling, CDC (Change Data Capture), and incremental loading techniques (see the sketch after this list).
• Experience working in Linux/Unix environments with shell scripting.
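As context for the CDC and incremental-loading requirement, a common pattern on Delta Lake is an upsert via MERGE. The sketch below is illustrative only; the table paths, key column, and delete flag are hypothetical, and spark refers to the active SparkSession (Databricks-provided or created as in the earlier sketch).

```python
# Illustrative incremental load: merge a batch of change records into a
# Delta target table (paths, key column, and delete flag are hypothetical).
from delta.tables import DeltaTable

changes = spark.read.format("delta").load("/mnt/lake/bronze/orders_cdc")
target = DeltaTable.forPath(spark, "/mnt/lake/silver/orders")

(target.alias("t")
       .merge(changes.alias("c"), "t.order_id = c.order_id")
       .whenMatchedDelete(condition="c._is_deleted = true")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())
```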
________________________________________
Preferred Qualifications
• Knowledge of data governance, compliance, and regulatory standards (e.g., IDMP).
• Exposure to real-time data streaming technologies (Kafka, Kinesis).
• Experience with multi-cloud environments (AWS, GCP).
• Familiarity with workflow orchestration tools such as Airflow or Databricks Workflows.
________________________________________
Key Skills
• Data Engineering & Architecture
• Lakehouse & Data Warehousing
• Spark Performance Optimization
• Cloud Data Platforms (Azure)
• ETL/ELT Pipeline Development
• Data Governance & Security
• CI/CD & DevOps Practices
________________________________________
Education
• Bachelor’s degree in Information Technology, Computer Science, or a related field.
________________________________________
Nice-to-Have Traits
• Strong analytical and problem-solving skills
• Ability to work in enterprise-scale, complex environments
• Experience working with global stakeholders and cross-functional teams
• Leadership capability with mentoring experience