Principal AI Data Architect
(AI-Ready Data Platform · ML/LLMOps · Agentic AI Infrastructure · Governance & Security)
Location: Remote · Contract
ABOUT THE ROLE
We are hiring a Principal AI Data Architect — a hands-on, senior individual
contributor who will design, build, govern, and evolve the single source of
truth that powers every AI initiative in our organisation. This platform will
serve as the foundational nervous system for conversational AI assistants,
dashboard intelligence, autonomous AI agents, RAG-powered applications,
predictive ML models, and any AI product we build today or in the future.
You will architect the system, drive implementation, own the data
contracts that agents and AI applications depend on, enforce security and
access governance for both human and agent consumers, and continuously monitor
and improve the accuracy and reliability of AI outputs that flow from this
platform. Our AI systems will only ever be as good as the data beneath
them, and you are the person who makes that data exceptional.
WHAT YOU'LL OWN
1. AI-Ready Data Platform — The Single Source of Truth
- Architect and own the enterprise AI data platform
— the unified, governed layer that ingests, transforms, stores, and serves
all data consumed by AI systems across the organisation.
- Design multi-domain data models (lakehouse, data
mesh, event-driven) that are structured from day one to serve AI
workloads: clean lineage, versioned schemas, well-documented contracts,
and low-latency serving APIs.
- Own the full data stack: real-time streaming
(Kafka, Spark Structured Streaming), batch processing (Databricks,
PySpark, Delta Lake), cloud storage and compute (AWS, Azure), and data
quality and metadata management (a minimal ingestion sketch follows
this list).
- Ensure this platform is the single, authoritative
data source for all downstream consumers — conversational AI, dashboard
assistants, autonomous agents, ML models, and reporting — eliminating data
silos and conflicting truths.
- Drive modernisation of legacy pipelines (on-prem
ETL, batch DWH) to cloud-native, AI-ready architectures with measurable
improvements in cost, latency, and delivery velocity.
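To make this concrete, here is a minimal sketch of the kind of streaming
ingestion the platform standardises: Kafka into a governed Delta Lake table
via Spark Structured Streaming, with a versioned schema acting as the data
contract. The topic name, paths, broker, and fields are illustrative
assumptions, not references to our actual systems.

```python
# Minimal sketch: Kafka -> Delta Lake bronze ingestion with schema enforcement.
# Requires the spark-sql-kafka connector and Delta Lake; names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("bronze-ingest").getOrCreate()

# Versioned schema: the data contract for this (hypothetical) topic.
event_schema = StructType([
    StructField("event_id", StringType(), nullable=False),
    StructField("entity_id", StringType(), nullable=False),
    StructField("event_ts", TimestampType(), nullable=False),
    StructField("payload", StringType(), nullable=True),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "payments.events")            # hypothetical topic
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Append to a governed Delta table; checkpointing gives exactly-once delivery.
query = (parsed.writeStream
         .format("delta")
         .option("checkpointLocation", "/lake/_checkpoints/payments_events")
         .outputMode("append")
         .start("/lake/bronze/payments_events"))
# Call query.awaitTermination() to keep the stream running in a real job.
```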
2. Semantic Models & Knowledge Layer
- Design the semantic layer that sits above raw
data — business-aligned ontologies, entity relationships, domain
taxonomies, and knowledge graphs — so AI systems understand context, not
just tokens.
- Build and maintain knowledge graphs (Neo4j or
equivalent) that capture relationships between business entities,
policies, KPIs, hierarchies, and domain rules — enabling structured
reasoning alongside unstructured retrieval (see the sketch after this
list).
- Define and govern a feature store and semantic
data contracts that serve both classical ML models and LLM-based
applications from a single, well-versioned, trusted source.
- Own metadata management, data lineage, and audit
trails across the semantic layer — ensuring every AI system can trace its
outputs back to source data with full accountability.
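As an illustration of the knowledge-layer work, the sketch below loads a
business entity and its relationships into Neo4j using idempotent MERGE
statements, so reloads never duplicate nodes. The KPI/Domain/Team model,
URI, and credentials are hypothetical.

```python
# Minimal sketch: upserting business entities and relationships into Neo4j.
# Connection details and the graph model are illustrative assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def upsert_kpi(tx, kpi_name: str, domain: str, owner: str):
    # MERGE is idempotent: re-running the load never duplicates nodes or edges.
    tx.run(
        """
        MERGE (k:KPI {name: $kpi})
        MERGE (d:Domain {name: $domain})
        MERGE (o:Team {name: $owner})
        MERGE (k)-[:BELONGS_TO]->(d)
        MERGE (o)-[:OWNS]->(k)
        """,
        kpi=kpi_name, domain=domain, owner=owner,
    )

with driver.session() as session:
    session.execute_write(upsert_kpi, "monthly_active_users", "Growth", "analytics")
driver.close()
```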
3. RAG, Vector & Retrieval Infrastructure
- Design the retrieval infrastructure that powers
RAG-based AI applications: embedding pipelines, vector stores (Pinecone,
FAISS, ChromaDB, OpenSearch), chunking strategies, and hybrid retrieval
layers combining semantic search with structured queries (a sketch
follows this list).
- Define the data contracts between the AI data
platform and retrieval consumers — ensuring consistent,
freshness-guaranteed, well-indexed data surfaces to RAG pipelines,
conversational AI, and agent tools.
- Architect retrieval systems that balance
precision, recall, latency, and cost — with clear evaluation benchmarks,
not just infrastructure defaults.
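A minimal sketch of the hybrid retrieval pattern described above: dense
vector search with FAISS, narrowed by a structured metadata filter. The
embeddings are random stand-ins and the domain labels are invented; a real
deployment would use the platform's embedding pipeline and a production
vector store.

```python
# Minimal sketch: dense retrieval plus a structured metadata filter.
# Vectors and metadata are synthetic stand-ins for illustration only.
import numpy as np
import faiss

dim, n_docs = 384, 1000
rng = np.random.default_rng(0)
doc_vectors = rng.standard_normal((n_docs, dim)).astype("float32")
faiss.normalize_L2(doc_vectors)  # cosine similarity via inner product
doc_domain = rng.choice(["payments", "risk"], size=n_docs)  # structured metadata

index = faiss.IndexFlatIP(dim)
index.add(doc_vectors)

def hybrid_search(query_vec: np.ndarray, domain: str, k: int = 5):
    """Over-fetch from the vector index, then apply the structured filter."""
    q = query_vec.astype("float32").reshape(1, -1)
    faiss.normalize_L2(q)
    scores, ids = index.search(q, k * 10)  # over-fetch so filtering can't starve k
    hits = [(i, s) for i, s in zip(ids[0], scores[0]) if doc_domain[i] == domain]
    return hits[:k]

print(hybrid_search(rng.standard_normal(dim), "payments"))
```

The over-fetch factor here is a tunable precision/latency trade-off, which
is exactly the kind of evaluation-backed decision this role owns.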
4. ML/LLMOps Infrastructure
- Own the ML and LLMOps data infrastructure:
training data curation pipelines, feature engineering, model registry,
experiment tracking (MLflow), automated evaluation, and production
monitoring.
- Build CI/CD pipelines for AI systems: automated
data validation, model quality gates, deployment automation, rollback
mechanisms, and production health dashboards (see the quality-gate
sketch after this list).
- Design data infrastructure for LLM fine-tuning
workflows — training corpus curation, data quality filtering, RLHF
pipelines, and adapter management — ensuring models trained on this
platform reflect accurate, governed, domain-specific knowledge.
- Establish LLMOps best practices across the
organisation: versioning, A/B evaluation, shadow deployments, and canary
releases for AI model updates.
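The sketch below shows one plausible shape for an automated model quality
gate in CI: log the evaluation to MLflow, then fail the pipeline if the
metric falls below a floor. The metric, threshold, and evaluate_candidate()
helper are illustrative assumptions, not our actual gate.

```python
# Minimal sketch: a CI quality gate that blocks promotion on regression.
# The evaluation harness and threshold are placeholders.
import sys
import mlflow

ACCURACY_FLOOR = 0.92  # assumed SLA; in practice versioned with the model

def evaluate_candidate() -> float:
    """Stand-in for the real evaluation harness (held-out set, judges, etc.)."""
    return 0.95

accuracy = evaluate_candidate()

mlflow.set_experiment("candidate-evaluation")
with mlflow.start_run():
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_param("gate_floor", ACCURACY_FLOOR)

if accuracy < ACCURACY_FLOOR:
    # Non-zero exit fails the CI job, blocking promotion to the model registry.
    sys.exit(f"Quality gate failed: accuracy {accuracy:.3f} < {ACCURACY_FLOOR}")
print("Quality gate passed.")
```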
5. Multi-Consumer AI Serving Architecture
The platform must reliably serve a diverse set of AI consumers. You will
design the serving architecture and data contracts for each:
- Conversational AI Platforms — low-latency,
context-rich data APIs that power chatbots, voice assistants, and
enterprise copilots with accurate, fresh, source-grounded responses (a
serving sketch follows this list).
- Dashboard Assistants & BI Copilots — semantic
query layers and text-to-SQL infrastructure that allow natural language
interfaces to query structured business data accurately and safely.
- Autonomous AI Agents — structured tool APIs,
function-calling schemas, and memory/state data stores that agents depend
on for context retrieval, action execution, and multi-step reasoning.
- Predictive ML Models — feature pipelines,
training datasets, and real-time feature serving for classification,
forecasting, anomaly detection, and propensity models.
- Ad-hoc AI Experimentation — governed sandbox
environments where data scientists and AI engineers can access
production-equivalent data safely for research and prototyping.
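To illustrate the serving contract, here is a minimal FastAPI read path of
the kind all five consumer classes could share. The in-memory store, entity
IDs, and field names are placeholders for an online feature/context store
behind the platform's auth layer.

```python
# Minimal sketch: a versioned, contract-governed context API for AI consumers.
# The dict is a stand-in for a low-latency online store; run with uvicorn.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="ai-data-serving")

# Illustrative data only; real serving reads an online feature/context store.
_CONTEXT = {
    "cust-123": {"segment": "smb", "risk_score": 0.12, "as_of": "2024-01-01"},
}

@app.get("/v1/context/{entity_id}")
async def get_context(entity_id: str):
    """Shared read path for chatbots, BI copilots, agents, and ML models."""
    record = _CONTEXT.get(entity_id)
    if record is None:
        raise HTTPException(status_code=404, detail="unknown entity")
    return {"entity_id": entity_id, "context": record, "contract_version": "1.0.0"}
```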
6. Governance, Security & Access Control
- Design and enforce a comprehensive data
governance model covering both human users and AI agents —
with role-based access control (RBAC), attribute-based policies, and
agent-specific permission scopes that prevent privilege escalation.
- Implement data security controls across the
platform: PII detection and masking, data classification, encryption at
rest and in transit, audit logging, and compliance alignment (SOX, GDPR,
SOC 2, AML/KYC, APAC regulations).
- Define agent data access boundaries — what data
an autonomous agent can read, write, modify, or delete — and enforce those
boundaries at the platform layer, not just at the application layer (see
the sketch after this list).
- Build data contracts and schema governance that
prevent upstream changes from silently breaking downstream AI
applications, with automated breaking-change detection and versioned
migration paths.
- Own regulatory and compliance readiness for all
AI data pipelines — ensuring audit trails, explainability artefacts, and
data provenance are available on demand.
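A minimal sketch of deny-by-default, agent-scoped authorization enforced
at the platform layer, before any data is read or mutated. The scope naming
convention and the agent registry are illustrative assumptions.

```python
# Minimal sketch: agent-scoped access checks enforced at the platform layer.
# Scope strings and the registry contents are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentPrincipal:
    agent_id: str
    scopes: frozenset  # e.g. {"read:payments", "write:drafts"}

REGISTRY = {
    "support-copilot": AgentPrincipal("support-copilot", frozenset({"read:payments"})),
}

def authorize(agent_id: str, action: str, dataset: str) -> None:
    """Deny by default; raise before any data access happens."""
    principal = REGISTRY.get(agent_id)
    required = f"{action}:{dataset}"
    if principal is None or required not in principal.scopes:
        raise PermissionError(f"{agent_id} lacks scope {required}")

authorize("support-copilot", "read", "payments")     # allowed
# authorize("support-copilot", "write", "payments")  # raises PermissionError
```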
7. Agentic Behaviour Observability & Output Accuracy
- Own the observability stack for AI agent
behaviour: instrument agents to capture inputs, retrieved context, tool
calls, reasoning traces, and outputs — creating a complete audit trail of
every agentic action driven by platform data (a tracing sketch follows
this list).
- Design and operate evaluation frameworks that
continuously measure AI output quality: factual accuracy, context
faithfulness, retrieval relevance, hallucination rates, and task
completion success — across all AI consumers of the platform.
- Establish feedback loops between evaluation
signals and platform improvements: when agent outputs degrade, trace the
root cause to data freshness, retrieval failures, schema drift, or model
issues — and own the remediation.
- Define SLAs for AI output quality and data
freshness; build alerting and escalation frameworks that surface
platform-driven AI degradation before end users notice.
- Implement human-in-the-loop review workflows for
high-stakes agent actions — ensuring critical decisions have appropriate
oversight, audit records, and rollback capability.
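The sketch below illustrates the shape of this work: one audit trace per
agentic action, plus a rolling quality metric checked against an assumed
SLA. The trace fields, the grounded flag, and the 2% threshold are
hypothetical choices for illustration.

```python
# Minimal sketch: per-action agent traces and an SLA check on output quality.
# The trace schema and SLA value are illustrative assumptions.
import json
import time
from statistics import mean

def record_trace(agent_id, inputs, retrieved_ids, tool_calls, output, grounded: bool):
    """One audit record per agent action; in production this goes to a log store."""
    trace = {
        "ts": time.time(), "agent_id": agent_id, "inputs": inputs,
        "retrieved_ids": retrieved_ids, "tool_calls": tool_calls,
        "output": output, "grounded": grounded,
    }
    print(json.dumps(trace))  # stand-in for an observability sink
    return trace

HALLUCINATION_SLA = 0.02  # assumed: at most 2% ungrounded outputs

def hallucination_rate(traces) -> float:
    return 1.0 - mean(1.0 if t["grounded"] else 0.0 for t in traces)

traces = [record_trace("bi-copilot", "q3 revenue?", ["doc-7"], [], "...", True)]
if hallucination_rate(traces) > HALLUCINATION_SLA:
    print("ALERT: hallucination rate above SLA; check freshness and retrieval")
```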
8. Architecture Standards & Engineering Enablement
- Define and maintain the reference architecture
for the AI data platform — documenting design patterns, data contracts,
integration standards, and decision records (ADRs) that all engineering
teams follow.
- Establish data engineering standards: pipeline
testing frameworks, code review practices, CI/CD automation,
infrastructure-as-code (Terraform), reusable component libraries, and
observability instrumentation (a pipeline-test sketch follows this list).
- Serve as the senior technical reviewer for all
data system designs that interact with the AI platform — ensuring
consistency, security, and quality across every integration point.
- Run internal architecture workshops, design
reviews, and enablement sessions to embed AI-ready data platform best
practices across data engineering and AI teams.
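As a taste of the pipeline-testing standard, the sketch below unit-tests a
pure transformation with pytest. The pii_mask() function is an illustrative
example, not an existing platform component.

```python
# Minimal sketch: fast unit tests for a pure pipeline transformation (pytest).
# pii_mask() is a hypothetical example of a governed transformation.
import re

def pii_mask(record: dict) -> dict:
    """Mask email local parts before records leave the governed boundary."""
    masked = dict(record)
    if "email" in masked:
        masked["email"] = re.sub(r"[^@]+", "***", masked["email"], count=1)
    return masked

def test_pii_mask_hides_local_part():
    out = pii_mask({"email": "jane.doe@example.com", "amount": 10})
    assert out["email"] == "***@example.com"
    assert out["amount"] == 10  # non-PII fields pass through unchanged

def test_pii_mask_is_pure():
    src = {"email": "a@b.com"}
    pii_mask(src)
    assert src["email"] == "a@b.com"  # input is never mutated
```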
MUST-HAVE EXPERIENCE
- 15+ years of hands-on data engineering and
architecture experience, with 3–5 of those years building production
AI/ML and LLM-era data infrastructure.
- Proven experience designing enterprise-scale AI
data platforms that serve multiple AI consumers — not just one application
or pipeline.
- Deep expertise in lakehouse and data mesh
architectures: Databricks, Delta Lake, PySpark, Kafka, Spark Structured
Streaming, cloud-native data services (AWS, Azure).
- Hands-on experience with vector stores, semantic
models, knowledge graphs, and retrieval infrastructure in production
environments.
- Working knowledge of LLMOps: model serving
pipelines, MLflow, CI/CD for AI, automated evaluation, and production
monitoring.
- Strong background in data governance, security,
and compliance in regulated industries (financial services, payments,
cybersecurity, healthcare).
- Experience defining data access controls for AI
agents and automated systems — not just human users.
TECHNICAL SKILLS
- Expert: Python, SQL, PySpark, Kafka, Databricks,
Delta Lake, Snowflake, AWS (S3, Glue, EKS, Bedrock, Kinesis, Redshift),
Docker, Kubernetes, Terraform, GitHub Actions.
- Strong: LangChain, LlamaIndex, LLM APIs (OpenAI,
AWS Bedrock, Claude, HuggingFace), vector databases (Pinecone, FAISS,
ChromaDB, OpenSearch), knowledge graphs (Neo4j).
- Solid: MLflow, FastAPI, CI/CD pipelines,
observability tooling (CloudWatch, Grafana, or equivalent), data lineage
and metadata management platforms.
THE RIGHT MINDSET
- You think platform-first: every design decision
considers all current and future AI consumers, not just the one use case
in front of you.
- You are as comfortable in a governance design
session as you are debugging a broken embedding pipeline at 11pm.
- You believe agent safety starts with data — and
you design access controls, audit trails, and guardrails before the agents
are built, not after.
- You close the loop: you do not consider a feature
shipped until you have observability, evaluation, and a feedback mechanism
in place.
PREFERRED QUALIFICATIONS
- B.Tech / M.Tech / M.S. in Computer Science,
Information Technology, Data Engineering, or a related quantitative field.
- Experience in presales solutioning for large data
and AI programmes: RFP/RFI responses, SOW shaping, effort estimation,
CXO-level solutioning.
- Prior experience at a global financial
institution (payments, risk, AML, compliance) or enterprise SaaS — where
AI data infrastructure operated under strict regulatory oversight.
- Familiarity with emerging agent infrastructure
standards: MCP (Model Context Protocol), agent memory architectures,
multi-agent coordination frameworks (LangGraph, AutoGen, CrewAI).
- Experience designing observability and evaluation
systems specifically for agentic AI behaviour, not just traditional
software or ML model monitoring.
TECH STACK YOU'LL WORK WITH:
Databricks · Delta Lake · PySpark · Kafka · Spark Structured Streaming ·
Apache NiFi · Snowflake · AWS (S3, Glue, EKS, Bedrock, Kinesis, Redshift,
Lambda) · Azure · Kubernetes · Docker · Terraform · GitHub Actions · Jenkins ·
MLflow · LangChain · LlamaIndex · HuggingFace · OpenAI · AWS Bedrock · Claude ·
Pinecone · FAISS · ChromaDB · OpenSearch · Neo4j · FastAPI · Python · SQL · MCP
· LangGraph · Prompt Engineering · MLOps · CI/CD · Grafana / CloudWatch