Hi Folks.
My client is looking for an AI Infrastructure Platform Engineer for a 5-month contract role based in Charlotte, NC (Local Only).
Position: AI Infrastructure Platform Engineer
Location: Charlotte, NC (Local Only)
Duration: 5-Month Contract
In This Role, You Will
- Lead complex infrastructure initiatives supporting Generative AI and Predictive AI platforms from design to production operations.
- Serve as a technical lead for platforms supporting AI/ML model training, inference, and batch workloads.
- Design, build, deploy, and operate OpenShift-based container platforms optimized for high-performance GPU workloads.
- Build, support, and operate a scalable GPU SuperPod architecture with large multi-node GPU clusters.
- Own monitoring, alerting, and observability using Grafana, Splunk, and enterprise telemetry tools.
- Define SLIs/SLOs and build actionable alerts to proactively detect performance, capacity, and resiliency risks.
- Build AI- and agent-based automation tools for self-healing, scaling, diagnostics, and incident remediation.
- Apply AIOps techniques to reduce alert fatigue and improve platform reliability.
- Lead production incident analysis and ensure operational rigor and root-cause prevention.
- Mentor engineers and influence stakeholders across a geographically distributed organization.
Required Qualifications
- 5+ years of infrastructure engineering experience.
- 5+ years troubleshooting complex end-to-end architectures (including CI/CD pipelines).
- 5+ years Linux systems experience.
- 4+ years supporting AI/ML platforms.
- 4+ years of Kubernetes / container platform experience, including production support.
Thanks
Sid