Hi Vendors,
Note: Must be local to Austin, TX or Sunnyvale, CA
Job Title: Senior AI/ML LLM Quality Engineer
Location: Austin, TX and Sunnyvale, CA
Duration: Contract
Role Description:
Must Have Skills:
1. Strong experience with Python scripting, REST APIs, and YAML
2. Hands-on experience testing Gen AI/ML products and evaluating LLMs in a large-scale enterprise environment
3. Experience with LLM testing tools such as LangSmith and Promptfoo
4. Strong understanding of LLM behavior
5. Proficiency with PyTest, Selenium, or similar test frameworks
6. Strong experience with test automation, including the ability to guide customers on relevant technology and automation strategy
Nice To Have Skills:
1. Experience with testing frameworks
2. Experience testing RAG and LLM agent systems
3. Familiarity with LangChain, LlamaIndex, or Haystack
4. Knowledge of AI/ML model evaluation metrics
5. Experience with red teaming (a plus, not mandatory)
6. Familiarity with the AWS cloud platform and MLOps tooling (e.g., MLflow)
Technical/Functional Skills and Key Responsibilities:
• Support testing and validation of Large Language Model (LLM)-powered applications.
• Help implement test strategies, execute evaluation workflows, and assist in model performance validation across diverse generative AI use cases.
• Design and execute test cases for Gen AI / ML features and user workflows
• Develop automated test frameworks to evaluate LLM outputs for accuracy, bias and safety
• Perform manual and automated test execution on APIs and LLM-integrated user interfaces.
• Conduct end-to-end testing of integrated generative AI solutions, including APIs, data pipelines, and user interfaces
• Collaborate with ML engineers to validate fine-tuned models and optimize prompts for target scenarios
• Analyze model failures, edge cases, and adversarial inputs to identify risks and improvement areas
• Benchmark LLM performance against industry standards and product-specific KPIs
• Strong analytical skills for dissecting model behavior, statistical performance, and failure modes
• Collaborate with product managers to convert requirements into test cases and test data
• Write automation scripts to simulate user behavior and backend interactions
• Document test plans, test reports, and AI evaluation metrics