Hi Vendors,
Note: Must be local to Austin, TX or Sunnyvale, CA
Job Title: Senior AI/ML LLM Quality Engineer
Location: Austin, TX and Sunnyvale, CA
Duration: Contract
Role Description:
Must Have Skills:
1. Strong experience with Python scripting, REST APIs, and YAML
2. Hands-on experience testing Gen AI/ML products and evaluating LLMs in a large-scale enterprise environment
3. Experience with LLM testing tools such as LangSmith and Promptfoo
4. Strong understanding of LLM behavior
5. Proficiency with PyTest, Selenium, or similar test frameworks
6. Strong experience with test automation, including the ability to guide customers on relevant technology and automation strategy
Nice To Have Skills:
1. Experience with testing frameworks
2. Experience testing RAG and LLM agent systems
3. Familiarity with LangChain, LlamaIndex, or Haystack
4. Knowledge of AI/ML model evaluation metrics
5. Experience with red teaming (a plus, not mandatory)
6. Familiarity with the AWS cloud platform and MLOps tooling (e.g., MLflow)
Technical/Functional Skills and Key Responsibilities:
• Support testing and validation of Large Language Model (LLM)-powered applications.
• Help implement test strategies, execute evaluation workflows, and assist in model performance validation across diverse generative AI use cases.
• Design and execute test cases for Gen AI / ML features and user workflows
• Develop automated test frameworks to evaluate LLM outputs for accuracy, bias and safety
• Perform manual and automated test execution on APIs and LLM-integrated user interfaces.
• Conduct end-to-end testing of integrated generative AI solutions, including APIs, data pipelines, and user interfaces
• Collaborate with ML engineers to validate fine-tuned models and optimize prompts for target scenarios
• Analyze model failures, edge cases, and adversarial inputs to identify risks and improvement areas
• Benchmark LLM performance against industry standards and product-specific KPIs
• Strong analytical skills for dissecting model behavior, statistical performance, and failure modes
• Collaborate with product managers to convert requirements into test cases and test data
• Write automation scripts to simulate user behavior and backend interactions
• Document test plans, test reports, and AI evaluation metrics