Hi,
Momento USA is a global technology consulting, talent acquisition, and creative development firm that addresses clients' most pressing needs and challenges. We are currently looking for a Sr. AI DevOps Engineer in San Jose, CA (On-site). Please let me know if you are interested.
Position: Sr AI DevOps Engineer
Location: San Jose, CA (On-site); local candidates only
Duration: 6 Months+
Experience Range: 7 to 9 years
- Experience with Generative AI, LLMOps, RAG architectures, and AI platform engineering.
- Knowledge of NVIDIA GPU infrastructure and CUDA-based deployments.
- Experience with Kubernetes-based AI platforms such as Kubeflow and KServe.
IaC
Kubernetes
Docker
Terraform
Ansible
AI/ML
MLOps workflows
Note: We are looking for a candidate with strong hands-on coding skills.
Position Summary
We are seeking an experienced AI DevOps Engineer to
design, implement, and maintain scalable infrastructure and deployment
pipelines for AI/ML applications. The ideal candidate will have strong
expertise in Kubernetes, Docker, Infrastructure as Code (IaC), cloud platforms,
and CI/CD automation. This role will be responsible for enabling reliable,
secure, and efficient deployment of AI/ML workloads across development,
testing, and production environments.
Key Responsibilities
- Design, deploy, and manage cloud-native infrastructure supporting AI/ML applications.
- Build and maintain Kubernetes clusters for scalable container orchestration.
- Develop and manage Docker containers for AI/ML services and microservices.
- Implement Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or Pulumi.
- Create and optimize CI/CD pipelines for automated deployment of AI/ML models and applications.
- Collaborate with Data Scientists, ML Engineers, and Software Developers to operationalize machine learning workflows.
- Monitor system performance, availability, and security across cloud environments.
- Implement logging, monitoring, and observability solutions using tools such as Prometheus, Grafana, ELK, or Datadog.
- Automate infrastructure provisioning, configuration management, and application deployments.
- Manage cloud resources and optimize infrastructure costs.
- Ensure compliance with security best practices and organizational standards.
- Troubleshoot production issues and provide operational support for AI platforms.
Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
- 5+ years of DevOps, Platform Engineering, or Cloud Engineering experience.
- Hands-on experience with Kubernetes administration and container orchestration.
- Strong experience with Docker and containerized application deployment.
- Expertise in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi.
- Experience building and maintaining CI/CD pipelines using Jenkins, GitHub Actions, GitLab CI, Azure DevOps, or similar tools.
- Proficiency in scripting and automation using Python, Bash, or PowerShell.
- Experience with Linux system administration.
- Strong understanding of networking, security, and cloud architecture principles.
Preferred Qualifications
- Experience supporting AI/ML platforms and MLOps workflows.
- Hands-on experience with Kubeflow, MLflow, Airflow, or similar MLOps tools.
- Experience deploying Large Language Models (LLMs), Generative AI applications, or AI inference workloads.
- Knowledge of GPU-enabled Kubernetes environments and AI infrastructure.
- Experience with vector databases and AI-serving platforms.
- Relevant cloud certifications (AWS, Azure, or GCP).
- Kubernetes certifications (CKA, CKAD, or CKS).
Technical Skills
Containerization & Orchestration
- Docker
- Kubernetes
- Helm
- OpenShift (preferred)
Infrastructure as Code
- Terraform
- CloudFormation
- Pulumi
- Ansible
Cloud Platforms
- AWS
- Microsoft Azure
- Google Cloud Platform (GCP)
CI/CD & Automation
- Jenkins
- GitHub Actions
- GitLab CI/CD
- Azure DevOps
Monitoring & Logging
- Prometheus
- Grafana
- ELK Stack
- Datadog
Programming/Scripting