Title: Site Reliability Engineer (SRE) – ML Platform
Location: Austin, TX or Sunnyvale, CA
Duration: Long Term
Note: Focus is 60% SRE and 40% MLOps.
Job Description:
- Continuous deployment using GitHub Actions, Flux, and Kustomize
- Design and implement cloud solutions; build MLOps on AWS
- Data science model containerization and deployment using Docker, vLLM, and Kubernetes
- Communicate with a team of data scientists, data engineers, and architects; document the processes
- Develop and deploy scalable tools and services for our clients to handle machine learning training and inference
- Knowledge of ML models and LLMs
Qualifications:
- 6+ years of experience in MLOps with strong knowledge of Kubernetes, Python, MongoDB, and AWS
- Good understanding of Apache Solr
| Skill Area | Includes | Weight (%) |
| --- | --- | --- |
| Platform Reliability & Containerization | Kubernetes, Docker, Microservices, Linux | 30% |
| MLOps & AWS Cloud | Model deployment, versioning, monitoring, AWS (SageMaker, S3, Lambda, EKS) | 25% |
| CI/CD & GitOps | GitHub Actions, Flux | 15% |
| Monitoring & Observability | Splunk, Grafana, Prometheus, performance tracking | 15% |
| Integration & Collaboration | Python scripting, API integrations, Apache Solr, LLM awareness, teamwork with data scientists & engineers | 15% |
- Proficient with Linux administration
- Knowledge of ML models and LLMs
- Ability to understand tools used by data scientists, and experience with software development and test automation
- Ability to design and implement cloud solutions and to build MLOps pipelines on cloud solutions (AWS)
- Experience working with cloud computing and database systems
- Experience building custom integrations between cloud-based systems using APIs
- Experience developing and maintaining ML systems built with open-source tools
- Experience with MLOps frameworks such as Kubeflow, MLflow, DataRobot, Airflow, etc.; experience with Docker and Kubernetes
- Experience developing containers and Kubernetes in cloud computing environments
- Familiarity with one or more data-oriented workflow orchestration frameworks (Kubeflow, Airflow, Argo, etc.)
- Ability to translate business needs into technical requirements
- Strong understanding of software testing, benchmarking, and continuous integration
- Exposure to machine learning methodology and best practices
- Good communication skills and ability to work in a team
Share resumes and the details below to my official email id sek...@transreach.com only:
- Legal Name (First/Last):
- Phone (Primary and Secondary):
- Candidate Email:
- Current Location (City, State):
- Work Authorization / Visa Status:
- Interview Availability:
- LinkedIn URL:
- Education Details (Bachelors/Masters, University Name, Location, Year of Graduation):
- Availability Once Confirmed:
- Total Years of Work Experience in USA:
- Overall Years of Work Experience:
- Open to Relocate (Yes/No):
- Expected Hourly Bill Rate on C2C: