Job Title: Azure Databricks Consultant (with RAG Experience)
Location: Richardson, TX (Hybrid)
Work Authorization: US Citizens or Green Card Holders Only
Duration: 12+ Months (Contract)
Position Overview
We are seeking a highly skilled Azure Databricks Consultant with strong expertise in PySpark, Kafka, and Azure Cosmos DB, and with hands-on experience implementing RAG (Retrieval-Augmented Generation) pipelines. The ideal candidate will architect, build, and optimize cloud-based data engineering solutions that support enterprise-scale analytics and AI-driven applications. This role requires deep technical capability in streaming ingestion, ETL development, and big data system integration within the Azure ecosystem.
Key Responsibilities
- Design, develop, and optimize scalable ETL and data pipelines in Azure Databricks using PySpark.
- Ingest, process, and transform high-volume streaming data from Kafka for real-time and batch analytics use cases.
- Build integrations with Azure Cosmos DB and write high-performance data flows into it, optimizing query and storage patterns.
- Implement and support RAG-based data processing flows, including embedding generation, vector storage, retrieval optimization, and integration with LLM pipelines.
- Collaborate with engineering, data science, and analytics teams to gather requirements and deliver robust, scalable data solutions.
- Monitor, tune, and manage Databricks clusters to ensure high performance, cost efficiency, and operational reliability.
- Troubleshoot pipeline issues, improve workflow efficiency, and enforce data quality and governance standards.
- Document technical solutions, provide knowledge transfer, and support production deployments.
Required Skills & Experience
- Strong, hands-on experience with Azure Databricks and PySpark for large-scale data engineering and ETL pipelines.
- Expertise with Kafka for streaming ingestion, data processing, and pipeline reliability.
- Proficiency with Azure Cosmos DB, including data modeling, indexing, tuning, and integration with distributed systems.
- Experience implementing RAG (Retrieval-Augmented Generation) workflows: vector storage, embedding pipelines, retrieval optimization, and integration with LLM-based applications.
- Solid understanding of the Azure cloud ecosystem, including networking, storage, compute, and hybrid environments.
- Ability to work in cross-functional teams, troubleshoot complex data issues, and deliver high-quality solutions in fast-paced environments.