Position: Site Reliability Engineer (SRE), ML Platform
Location: Austin, TX and Sunnyvale, CA (Onsite)
Visa: USC/H4EAD/H1B
Job Summary:
For this role, we are looking for an MLOps Engineer with experience in Kubernetes and Python.
Experience Required:
- 6+ years of experience in MLOps, with strong knowledge of Kubernetes, Python, MongoDB, and AWS
Technical Skills:
- Python, Kubernetes, MongoDB, microservices, AWS
- Apache Solr
- ML operations, CI/CD pipelines, LLMs
- Good understanding of Apache Solr
- Proficiency in Linux administration
- Knowledge of ML models and LLMs
- Ability to understand the tools used by data scientists, and experience with software development and test automation
- Ability to design and implement cloud solutions and to build MLOps pipelines on cloud platforms (AWS, Microsoft Azure, or GCP)
Qualifications:
- Experience working with cloud computing and database systems
- Experience building custom integrations between cloud-based systems using APIs
- Experience developing and maintaining ML systems built with open-source tools
- Experience with MLOps frameworks such as Kubeflow, MLflow, DataRobot, and Airflow, as well as with Docker and Kubernetes
- Experience developing containers and working with Kubernetes in cloud computing environments
- Familiarity with one or more data-oriented workflow orchestration frameworks (Kubeflow, Airflow, Argo, etc.)
- Ability to translate business needs into technical requirements
- Strong understanding of software testing, benchmarking, and continuous integration
- Exposure to machine learning methodology and best practices
- Good communication skills and the ability to work in a team