Required Skills & Experience
- Cloud‑Native Architecture: Design and operate resilient, scalable Azure cloud‑native platforms aligned to enterprise standards and RUN SLAs
- DevSecOps & GitOps: Implement secure CI/CD and GitOps pipelines with built‑in security, policy enforcement, and automated controls
- Cloud Landing Zone & Policy Management: Operate and govern Azure Landing Zones using Azure Policy, RBAC, guardrails, and compliance automation
- Platform & COE Tooling: Build and support reusable COE accelerators, golden paths, templates, and automation frameworks
- AIOps & Observability: Enable proactive monitoring, logging, alerting, and AIOps‑driven insights for platform reliability and incident reduction
- FinOps: Embed cost governance, tagging, budgets, and optimization practices into platform operations
- Cloud Architecture (RUN‑focused): Translate client‑approved architectures into operable, supportable, and compliant Azure platforms
- Containers & Kubernetes: Design, deploy, and operate container platforms using Kubernetes, AKS, Docker, and Helm
- Infrastructure as Code: Provision and manage Azure infrastructure using Terraform and automated pipelines
- API & Integration Platforms: Design and support secure APIs and integrations using Azure API Management (APIM)
- Event & Streaming Platforms: Support cloud‑native messaging and streaming solutions using Kafka and managed services
- Scripting & Automation: Develop operational automation using Python and platform SDKs
- Agile & ITSM Alignment: Operate within Agile delivery models while supporting ITSM, incident, change, and problem management processes
Certifications:
- Microsoft Certified: Azure Solutions Architect Expert (AZ‑305) - Required
- Microsoft Certified: Azure Administrator Associate (AZ‑104) - Required
- AZ‑400 (DevOps Engineer Expert)
- AZ‑500 (Azure Security Engineer Associate)
- ITIL 4 Foundation
- Terraform Associate
Responsibilities:
Azure Platform RUN Ownership
- Act as L3/L4 escalation point for Azure platform incidents across IaaS, PaaS, landing zones, and Terraform‑based deployments.
- Lead root‑cause analysis (RCA) for P1/P2 incidents and drive permanent fixes through automation and design improvements.
- Ensure platform services meet availability, reliability, and performance SLAs.
Landing Zone & Governance Operations
- Operate and govern Azure Landing Zones, including RBAC models, Azure Policy, network/security baselines, and compliance monitoring.
- Detect and remediate configuration drift using policy‑as‑code and IaC controls.
- Maintain operational RACI alignment across Platform, Security, FinOps, and Network teams.
Infrastructure‑as‑Code & Automation
- Design, maintain, and review Terraform modules, CI/CD pipelines, and reusable “golden paths” used in RUN operations.
- Ensure provisioning, changes, and decommissioning follow approved automated pipelines.
- Perform senior‑level IaC and pipeline conformance reviews.
Service Requests & Change Governance
- Provide architectural oversight for service requests, enhancements, and onboarding of new Azure services.
- Support cloud change governance processes and validate Low‑Level Designs (LLDs) for operational readiness.
- Ensure changes are safe, auditable, and compliant within the managed services model.
Security & Compliance Support
- Implement and operate Azure security controls (Azure Policy, RBAC, Conditional Access, Key Vault).
- Support security incidents, audit evidence requests, and remediation of compliance findings in coordination with Security teams.
FinOps & Continuous Improvement
- Partner with FinOps teams to enforce cost guardrails, tagging standards, and optimization actions.
- Drive continuous service improvement through automation, reliability engineering, and cost efficiency initiatives.
Continuous Service Improvement
- Automation‑Led Optimization: Continuously reduce manual operational effort by automating Azure platform tasks using Python, Azure SDKs, and REST APIs
- Self‑Healing Operations: Implement Agentic AI–driven remediation workflows to auto‑detect, diagnose, and resolve recurring platform issues
- Proactive Incident Reduction: Leverage AIOps and AI‑assisted analytics to identify patterns, predict failures, and prevent incidents before impact
- IaC Drift & Compliance Improvement: Use automation to detect and remediate Terraform drift, configuration non‑compliance, and policy violations
- Operational Observability Enhancement: Improve platform reliability through continuous tuning of logging, metrics, alerts, and telemetry across Azure services
- Agentic Runbook Automation: Convert manual runbooks into agent‑driven workflows for repeatable, zero‑touch execution of common operational tasks
- Cost & Performance Optimization: Drive CSI through FinOps automation, including rightsizing, scheduling, and cost anomaly detection
- API‑First Improvements: Enhance service responsiveness by integrating Azure services using SDK‑based and event‑driven automation
- Intelligent Change Execution: Apply AI‑assisted impact analysis and guardrails to reduce change‑related incidents and improve change success rates
- Continuous Feedback Loop: Use operational data, AI insights, and platform KPIs to prioritize CSI backlog and deliver measurable improvements sprint‑over‑sprint