About the Team

The Forward Deployed Team is our primary customer-facing unit, acting as the crucial bridge between our core product and our enterprise clients. This fast-moving team manages the end-to-end lifecycle of enterprise conversational AI deployments, from pre-sales Statements of Work (SOWs) through go-live.

We maintain daily, high-touch interactions with customers to communicate workflow progress, manage User Acceptance Testing (UAT) deadlines, integrate client feedback, and ensure that deployments are executed rapidly while maintaining rigorous quality standards.

Ellipsis Health is located in the San Francisco Bay Area, but we are open to remote candidates within the United States.

About the Role

As a Forward Deployed QA Engineer, you will occupy a critical, high-impact role dedicated to ensuring the reliability, stability, and quality of our core conversational AI product, Sage, across diverse client workflows.

This role bridges the gap between Quality Assurance, AI Engineering, and Production Operations. You will focus heavily on automated testing, prompt engineering validation, and rapid root cause analysis (RCA) of Large Language Model (LLM)-driven behaviors in fast-paced, real-world deployments.

Responsibilities:

Workflow Mapping & Test Case Generation: Deeply analyze assigned client workflows to design robust, comprehensive positive and negative test cases that safeguard system stability.
AI-Driven Test Automation: Build and execute automated test scenarios by configuring shadow agents.
Prompt Evaluation & Optimization: Apply a strong understanding of prompt design to draft, refine, and evaluate prompts used within the testing framework to accurately simulate user behaviors and edge cases.

End-to-End Testing Execution: Strategically deploy specific testing methodologies, including Sanity, Smoke, Regression, and Functional testing, and determine the exact environment (staging, pre-production, production) and timing for each execution.
Deployment Cadence & Cross-Functional Collaboration: Partner closely with engineering teams during release cycles to proactively identify, triage, and unblock technical roadblocks, ensuring the product is continuously deployment-ready.

Daily LLM Defect RCA: Perform rigorous, daily root cause analysis on LLM-specific failures inherent to generative AI, including hallucinations, high latency, and logic deviations.
Live Production Call Debugging: Investigate live customer calls and production incidents in real time to unblock critical production use cases.
Audio & Transcription Validation: Query and analyze historical call transcripts, system behaviors, and audio data pipelines to pinpoint where a conversational workflow broke down.
Speech-to-Speech (S2S) Pipeline Monitoring: Monitor and evaluate the end-to-end voice AI pipeline. This involves analyzing Automatic Speech Recognition (ASR) accuracy, managing audio-to-text latency issues, and understanding general Speech-to-Speech mechanics alongside the stability of the core Knowledge Base feeding the AI.

Advanced Evaluation Frameworks: Maintain a strong conceptual understanding of advanced LLM evaluation paradigms and tools, such as LLM-as-a-judge, to remain aware of how AI response quality and accuracy are programmatically graded at scale.
Telephony & Call Flow Awareness: Possess a foundational understanding of real-world call management and telephony routing concepts, including how the system is expected to navigate warm transfers, blind transfers, and voicemail detection workflows.

Qualifications:

Experience in QA Engineering: Strong background in software quality assurance, with a proven track record of designing, executing, and managing end-to-end test strategies (Smoke, Sanity, Regression, Functional).
LLM & Generative AI Expertise: Hands-on experience or deep technical familiarity with troubleshooting LLM behaviors, diagnosing hallucinations, managing latency, and using LLM call-tracing tools.
Technical & Logging Proficiency: Ability to comfortably write SQL queries (specifically PostgreSQL) to pull data logs and navigate cloud infrastructure logs (such as GCP) to perform rapid root-cause analysis.
Voice & Conversational AI Domain Knowledge: Foundational understanding of Speech-to-Speech pipelines, including Automatic Speech Recognition (ASR), audio-to-text workflows, and core knowledge base integrations.
Telephony Foundations: Basic familiarity with enterprise telephony routing, call management mechanics (warm/blind transfers), and voicemail detection systems.
Client-Facing Capability: Strong communication skills and the professional agility required to manage UAT timelines, coordinate with client stakeholders, and support rapid production deployments.

Salary and Benefits

We offer a competitive salary and benefits, including 401(k) matching; health, vision, and dental insurance; and very flexible paid time off.

The typical salary range for this role is $120,000 to $140,000 USD, depending on skills, qualifications, and relevant experience.

Background Checks

As a health technology company, we reserve the right to run background checks on candidates to whom we extend offers, in compliance with applicable laws. We evaluate candidates holistically and comply with all “ban the box” regulations.

Assistance

If you have a disability or require accommodations during the application or recruitment process, please contact [email protected].

Senior QA Engineer, Forward Deployed