Site Reliability Engineer (SRE) – ML Platform – Austin, TX or Sunnyvale, CA

Shekhar K

Aug 8, 2025, 10:07:08 AM
to Shekhar K
Title: Site Reliability Engineer (SRE) – ML Platform

Location: Austin, TX or Sunnyvale, CA

Duration: Long Term

 

Note: The focus is 60% SRE and 40% MLOps.

 

Job Description:

  • Continuous deployment using GitHub Actions, Flux, and Kustomize
  • Design and implement cloud solutions and build MLOps pipelines on AWS
  • Containerize and deploy data science models using Docker, vLLM, and Kubernetes
  • Communicate with a team of data scientists, data engineers, and architects, and document the processes
  • Develop and deploy scalable tools and services that let our clients handle machine learning training and inference
  • Knowledge of ML models and LLMs

Qualifications:

  • 6+ years of experience in MLOps with strong knowledge of Kubernetes, Python, MongoDB, and AWS
  • Good understanding of Apache Solr

Skill Area                                Weight (%)   Includes
Platform Reliability & Containerization   30           Kubernetes, Docker, Microservices, Linux
MLOps & AWS Cloud                         25           Model deployment, versioning, monitoring, AWS (SageMaker, S3, Lambda, EKS)
CI/CD & GitOps                            15           GitHub Actions, Flux
Monitoring & Observability                15           Splunk, Grafana, Prometheus, performance tracking
Integration & Collaboration               15           Python scripting, API integrations, Apache Solr, LLM awareness, teamwork with data scientists & engineers

 

  • Proficient with Linux administration
  • Knowledge of ML models and LLMs
  • Ability to understand tools used by data scientists; experience with software development and test automation
  • Ability to design and implement cloud solutions and build MLOps pipelines on AWS
  • Experience working with cloud computing and database systems
  • Experience building custom integrations between cloud-based systems using APIs
  • Experience developing and maintaining ML systems built with open-source tools
  • Experience with MLOps frameworks such as Kubeflow, MLflow, DataRobot, and Airflow, along with Docker and Kubernetes
  • Experience developing containers and Kubernetes in cloud computing environments
  • Familiarity with one or more data-oriented workflow orchestration frameworks (Kubeflow, Airflow, Argo, etc.)
  • Ability to translate business needs to technical requirements
  • Strong understanding of software testing, benchmarking, and continuous integration
  • Exposure to machine learning methodology and best practices
  • Good communication skills and ability to work in a team

 

Please send resumes and the details below to my official email address only: sek...@transreach.com

 

  • Legal Name (First/Last):
  • Phone (Primary and Secondary):
  • Candidate Email:
  • Current Location (City, State):
  • Work Authorization / Visa Status:
  • Interview Availability:
  • LinkedIn URL:
  • Education Details (Bachelor's / Master's, University Name, Location, Year of Graduation):
  • Availability Once Confirmed:
  • Total Years of Work Experience in the USA:
  • Overall Years of Work Experience:
  • Open to Relocate (Yes / No):
  • Expected Hourly Bill Rate on C2C:



Thanks & Regards,

Shekhar
Talent Acquisition Group
  
197 Route 18 South, #3000 East Wing, East Brunswick, NJ 08816