AI Data Engineer / Machine Learning Engineer || St. Louis, MO || USC or GC


Adarsh Kumar

Nov 20, 2025, 12:41:36 PM
to Recruiting Simplifies

Role: AI Data Engineer / Machine Learning Engineer

Location: St. Louis, MO

Duration: 6–12 months

Visa: GC/USC

 

 

Data Engineer – AI Systems (Databricks)

 

We’re building intelligent, Databricks-powered AI systems that structure and activate information from diverse enterprise sources (Confluence, OneDrive, PDFs, and more). As a Data Engineer, you’ll design and optimize the data pipelines that transform raw and unstructured content into clean, AI-ready datasets for machine learning and generative AI agents.

You’ll collaborate with a cross-functional team of Machine Learning Engineers, Software Developers, and domain experts to create high-quality data foundations that power Databricks-native AI agents and retrieval systems.

 

Key Responsibilities

·       Develop Scalable Pipelines: Design, build, and maintain high-performance ETL and ELT workflows using Databricks, PySpark, and Delta Lake.

·       Data Integration: Build APIs and connectors to ingest data from collaboration platforms such as Confluence, OneDrive, and other enterprise systems.

·       Unstructured Data Handling: Implement extraction and transformation pipelines for text, PDFs, and scanned documents using Databricks OCR and related tools.

·       Data Modeling: Design Delta Lake and Unity Catalog data models for both structured and vectorized (embedding-based) data stores.

·       Data Quality & Observability: Apply validation, version control, and quality checks to ensure pipeline reliability and data accuracy.

·       Collaboration: Work closely with ML Engineers to prepare datasets for LLM fine-tuning and vector database creation, and with Software Engineers to deliver end-to-end data services.

·       Performance & Automation: Optimize workflows for scale and automation, leveraging Databricks Jobs, Workflows, and CI/CD best practices.
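As an illustration of the unstructured-data work above, here is a minimal, dependency-free sketch of one common preprocessing step: splitting extracted document text into overlapping chunks before embedding. The function name and parameters are hypothetical, not this team's actual pipeline; in practice this would run inside a PySpark job writing to Delta Lake.

```python
def chunk_text(text, chunk_size=400, overlap=50):
    """Split extracted document text into overlapping windows.

    The overlap preserves context across chunk boundaries, which
    typically helps retrieval quality once chunks are embedded and
    stored in a vector index. Illustrative sketch only.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

In a Databricks pipeline this kind of function would usually be wrapped in a UDF or applied per row after OCR/text extraction, with the resulting chunks landing in a Delta table for downstream embedding.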

 

What You Bring

·       Experience with data engineering, ETL development, or data pipeline automation.

·       Proficiency in Python, SQL, and PySpark.

·       Hands-on experience with Databricks, Spark, and Delta Lake.

·       Familiarity with data APIs, JSON, and unstructured data processing (OCR, text extraction).

·       Understanding of data versioning, schema evolution, and data lineage concepts.

·       Interest in AI/ML data pipelines, vector databases, and intelligent data systems.

 

Bonus Skills

·       Experience with vector databases (e.g., Pinecone, Chroma, FAISS) or Databricks’ Vector Search.

·       Exposure to LLM-based architectures, LangChain, or Databricks Mosaic AI.

·       Knowledge of data governance frameworks, Unity Catalog, or access control best practices.

·       Familiarity with REST API development or data synchronization services (e.g., Airbyte, Fivetran, custom connectors).
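To make the vector-database bullet concrete, here is a toy brute-force nearest-neighbour search over small embedding vectors. This is a sketch of the retrieval idea only; systems like Pinecone, FAISS, and Databricks Vector Search replace this linear scan with approximate indexes at scale. All names and the tiny 2-D "embeddings" are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, index, k=2):
    """Rank (doc_id, vector) pairs by similarity to the query vector.

    A real vector database does this over millions of high-dimensional
    embeddings using approximate-nearest-neighbour structures.
    """
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

In a retrieval-augmented pipeline, the query vector would come from the same embedding model used on the document chunks, and the returned `doc_id`s would map back to source content in Confluence, OneDrive, or PDFs.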


Adarsh
