Hi,
Please share profiles for the role below, including visa status, location, and a LinkedIn profile link.
LinkedIn profiles should have been created before 2020.
Job Title: Databricks Data Engineer with DevOps Skills
Location: Los Angeles, CA (Hybrid); candidates must be within 100 miles
Hire Type: C2C
Rate: $60/hr C2C (H1B/H4/EAD)
USC/GC on W2 only (no exceptions)
Job Summary
We are looking for an experienced Databricks Data Engineer with strong DevOps expertise to join our data engineering team. The ideal candidate will design, build, and optimize large-scale data pipelines on the Databricks Lakehouse Platform running on AWS, while driving automated CI/CD and deployment practices. This role requires deep expertise in PySpark, SQL, AWS cloud services, and modern DevOps tooling. You will collaborate closely with cross-functional teams to deliver scalable, secure, and high-performance data solutions.
Must Demonstrate – Critical Skills & Architectural Competencies
- Designing and implementing Databricks-based Lakehouse architectures on AWS
- Clear separation of compute vs. serving layers
- Ability to design low-latency data/API access strategies (beyond Spark-only patterns)
- Strong understanding of caching strategies for performance and cost optimization
- Data partitioning, storage optimization, and file layout strategy
- Ability to handle multi-terabyte structured or time-series datasets
- Skill in requirement probing and identifying architectural priorities
- A player-coach mindset — hands-on engineering combined with technical leadership
Key Responsibilities
1. Data Pipeline Development
- Design, build, and maintain scalable ETL/ELT pipelines using Databricks on AWS
- Develop high-performance data processing workflows using PySpark/Spark and SQL
- Integrate data from Amazon S3, relational databases, and semi/unstructured sources
- Implement Delta Lake best practices: schema evolution, ACID transactions, OPTIMIZE, Z-ORDER, partitioning, and file-size tuning (see the sketch after this list)
- Ensure architectures support high-volume, multi-terabyte workloads
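As a concrete illustration of the Delta Lake practices above, here is a minimal PySpark sketch, assuming a Databricks cluster (or a local Spark session with the delta-spark package); the table, column, and property values are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` is predefined

# Illustrative incoming batch (column names are hypothetical).
incoming_df = spark.createDataFrame(
    [("2024-01-01", 42, 19.99)],
    ["event_date", "customer_id", "amount"],
)

# Append with schema evolution enabled, partitioned by date.
(incoming_df.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")   # allow new columns to evolve the table schema
    .partitionBy("event_date")
    .saveAsTable("analytics.events"))

# Compact small files and co-locate rows on a frequently filtered column.
spark.sql("OPTIMIZE analytics.events ZORDER BY (customer_id)")

# Tune the target file size for future writes to this table.
spark.sql("ALTER TABLE analytics.events SET TBLPROPERTIES ('delta.targetFileSize' = '128mb')")
```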
2. DevOps & CI/CD
- Implement CI/CD pipelines for Databricks using Git, GitLab, GitHub Actions, or AWS-native tools
- Build and manage automated deployments using Databricks Asset Bundles
- Manage version control for notebooks, workflows, libraries, and environment configurations
- Automate cluster policies, job creation, environment provisioning, and configuration management (scripted in the sketch below)
- Support infrastructure-as-code via Terraform (preferred) or CloudFormation
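Databricks Asset Bundles themselves are declared in YAML, but the same deployment automation is often scripted from a CI/CD runner with the Databricks SDK for Python. A minimal sketch, assuming `databricks-sdk` is installed and authentication comes from environment variables; the job name, notebook path, and cluster ID are placeholders.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import Task, NotebookTask

# Credentials are resolved from the environment (e.g. DATABRICKS_HOST,
# DATABRICKS_TOKEN), which is how a CI runner is typically configured.
w = WorkspaceClient()

# Register a job that runs a version-controlled notebook.
job = w.jobs.create(
    name="nightly-etl",  # placeholder job name
    tasks=[
        Task(
            task_key="ingest",
            notebook_task=NotebookTask(notebook_path="/Repos/data-eng/etl/ingest"),
            existing_cluster_id="0123-456789-abcdefgh",  # placeholder cluster ID
        )
    ],
)
print(f"Created job {job.job_id}")
```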
3. Collaboration & Business Support
- Work with data analysts and BI teams to prepare curated datasets for reporting and analytics
- Collaborate with product owners, engineering teams, and business partners to translate requirements into scalable implementations
- Document data flows, technical architecture, and DevOps/deployment workflows
4. Performance & Optimization
- Tune Spark clusters, workflows, and queries for cost efficiency and compute performance
- Monitor pipelines, troubleshoot failures, and maintain high reliability
- Implement logging, monitoring, and observability across workflows and jobs
- Apply caching strategies and workload optimization for low-latency consumption patterns (illustrated below)
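To make the caching and workload-optimization points concrete, a minimal PySpark sketch; the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Cache a small, frequently reused dimension table so repeated queries
# read from memory rather than cloud storage.
dim_customers = spark.table("analytics.dim_customers").cache()
dim_customers.count()  # materialize the cache eagerly

# Broadcast the small side of the join to avoid shuffling the large fact table.
orders = spark.table("analytics.fct_orders")
enriched = orders.join(F.broadcast(dim_customers), "customer_id")

enriched.groupBy("customer_segment").agg(F.sum("amount")).show()
```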
5. Governance & Security
- Implement and maintain data governance using Unity Catalog (example after this list)
- Enforce access controls, security policies, and data compliance requirements
- Ensure lineage, quality checks, and auditability across all data flows
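Unity Catalog privileges are typically managed in SQL; a minimal sketch of the kinds of grants involved, with illustrative catalog, schema, and group names.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Give an analyst group read access to a curated schema
# (catalog, schema, and group names are illustrative).
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.curated TO `data_analysts`")
spark.sql("GRANT SELECT ON SCHEMA main.curated TO `data_analysts`")

# Review effective grants for auditability.
spark.sql("SHOW GRANTS ON SCHEMA main.curated").show(truncate=False)
```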
Technical Skills
Databricks (Hands-On Experience Required):
- Delta Lake, Unity Catalog, Lakehouse Architecture
- Delta Live Tables (DLT), Databricks Runtime, table update triggers, Databricks Workflows
Programming & Query Languages:
- PySpark, Apache Spark, Advanced SQL
AWS Cloud Services:
- S3, IAM, Glue/Glue Catalog, Lambda, Secrets Manager
- Kinesis (optional but beneficial)
DevOps & Automation:
- Git/GitLab, CI/CD Pipelines, Databricks Asset Bundles
- Terraform (preferred), CloudFormation
Other:
- Relational databases and data warehouse concepts
Preferred Experience
- Knowledge of streaming technologies such as Structured Streaming / Spark Streaming (a sketch follows this list)
- Experience building real-time or near real-time data pipelines
- Exposure to advanced Databricks runtime configurations and performance tuning
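For the streaming items above, a minimal Structured Streaming sketch that incrementally moves records between Delta tables; the table names and checkpoint path are hypothetical, and the `availableNow` trigger (Spark 3.3+) drains the backlog and then stops, which suits scheduled near-real-time jobs.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incrementally read new rows from a raw Delta table and append them to a
# curated table; the checkpoint tracks progress across runs.
(spark.readStream
    .table("raw.events")                                      # hypothetical source
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events")  # hypothetical path
    .trigger(availableNow=True)
    .toTable("curated.events"))
```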
Certifications (Optional but Preferred)
- Databricks Certified Data Engineer Associate or Professional
- AWS Certified Data Engineer – Associate or AWS Certified Solutions Architect