Lead Data Engineer with Databricks and PySpark experience - Minnesota (MN), Onsite, Long Term
With our extensive expertise in application, data, and cloud engineering services, we strive to create groundbreaking solutions that deliver real value to our clients.
Key Responsibilities
Design, develop, and maintain scalable data pipelines using Apache Spark (PySpark and/or Scala)
Build and optimize data workflows on Databricks, including Delta Lake, notebooks, and scheduled jobs
Ingest, transform, and curate large-scale structured and semi-structured datasets
Perform performance tuning and cost optimization of Spark workloads and Databricks clusters
Implement data quality checks, monitoring, and error handling
Collaborate with analytics and business stakeholders to deliver well-modeled, analytics-ready data
Support batch processing and, where applicable, streaming data pipelines
Follow best practices for testing, documentation, security, and version control
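The pipeline, Delta Lake, and data-quality responsibilities above can be sketched in PySpark. This is an illustrative sketch only, not part of the role description: the paths, column names (`event_id`, `event_ts`), and the 1% null threshold are hypothetical, and the quality rule is factored into a plain-Python function so it can be unit-tested without a Spark cluster.

```python
"""Minimal sketch of a PySpark + Delta Lake batch pipeline with a quality gate.

All names (paths, columns, threshold) are hypothetical and for illustration.
"""


def null_ratio_ok(null_count: int, total_count: int, max_ratio: float = 0.01) -> bool:
    """Quality rule: pass when at most `max_ratio` of rows have a null key.

    Kept as plain Python so the rule can be tested without Spark installed.
    """
    if total_count == 0:
        return False  # treat an empty batch as a quality failure
    return (null_count / total_count) <= max_ratio


def run_pipeline(spark, source_path: str, target_path: str) -> None:
    """Ingest raw JSON, curate it, enforce the quality gate, write to Delta."""
    from pyspark.sql import functions as F  # imported here so the module loads without Spark

    raw = spark.read.json(source_path)

    curated = (
        raw
        .withColumn("event_date", F.to_date("event_ts"))  # derive partition column
        .dropDuplicates(["event_id"])                     # basic dedup on the key
    )

    total = curated.count()
    nulls = curated.filter(F.col("event_id").isNull()).count()
    if not null_ratio_ok(nulls, total):
        raise ValueError(f"quality gate failed: {nulls}/{total} rows with null event_id")

    (curated.write
        .format("delta")            # Delta Lake table format
        .mode("overwrite")
        .partitionBy("event_date")  # date partitioning keeps downstream scans cheap
        .save(target_path))
```

In a Databricks notebook the `spark` session is predefined, so this would be invoked as `run_pipeline(spark, "/mnt/raw/events", "/mnt/curated/events")` and scheduled as a job.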
Requirements
Required Qualifications
10+ years of experience in Data Engineering or related roles
Strong hands-on experience with Apache Spark (PySpark or Scala)
Proven experience working in Databricks environments
Strong SQL skills and experience with relational and analytical databases
Experience building and maintaining ETL/ELT pipelines at scale
Familiarity with modern data lake architectures (Delta Lake preferred)
Experience with Git-based version control
Additional Requirements
Must currently be located in Minnesota (MN) or be willing to relocate