Principal AI Engineer

Stryker
Bengaluru, India
Posted 5d ago
Full-time · Senior

Description

Work Flexibility: Hybrid or Onsite

Vocera, now part of Stryker, is seeking a visionary and hands-on Principal Engineer – AI Test, Evaluation & Data Architecture to define and lead the enterprise-wide strategy for AI validation, model evaluation, and data governance across our speech and GenAI platforms.

This role serves as the AI Quality Architect for real-time speech systems, NLP pipelines, and LLM-powered applications deployed in mission-critical healthcare environments. You will establish scalable evaluation frameworks, design AI testing platforms, define data governance standards, and ensure production reliability of AI systems at scale.

This is a high-impact architectural leadership role requiring deep expertise in LLM validation, RAG evaluation, speech benchmarking, automation, MLOps, and AI lifecycle governance.

What You Will Do

Enterprise AI Evaluation Architecture
- Define and own the end-to-end AI evaluation architecture across speech, NLP, and GenAI platforms.
- Establish standardized evaluation frameworks for:
  - ASR systems (WER, latency, robustness, domain adaptation)
  - NLP systems (intent accuracy, entity F1, confusion analysis)
  - LLM systems (hallucination rate, groundedness, factual accuracy, consistency, safety)
- Define measurable AI quality SLAs and release gating criteria.
- Architect benchmarking standards across model versions, prompt changes, and retrieval updates.
- Institutionalize regression evaluation pipelines for all AI releases.

LLM & RAG Reliability Strategy
- Architect validation frameworks for:
  - RAG-based systems
  - Prompt orchestration workflows
  - Multi-agent or multi-model AI pipelines
- Define groundedness measurement strategies for enterprise RAG.
- Establish adversarial testing, stress testing, and edge-case validation frameworks.
- Implement hallucination detection standards and mitigation measurement.
- Drive responsible AI practices, including bias detection and safety validation.

AI Testing Platform & Automation Architecture
- Design and lead implementation of a scalable AI testing platform that includes:
  - Offline evaluation pipelines
  - Golden-dataset-driven regression systems
  - Synthetic data generation frameworks
  - Online A/B testing and shadow deployment strategies
- Integrate AI validation workflows into CI/CD and MLOps pipelines.
- Define drift detection and performance degradation monitoring strategies.
- Establish real-time observability dashboards for AI quality metrics.

AI Data Governance & Lifecycle Management
- Define an enterprise-wide data governance strategy for AI systems, including:
  - Data collection and curation standards
  - Annotation workflows and validation
  - Dataset versioning and reproducibility
  - Traceability across model iterations
- Establish gold datasets for:
  - Speech systems
  - NLP pipelines
  - Clinical and conversational workflows
- Drive continuous learning loops between production telemetry and training data.
- Ensure compliance with healthcare data privacy and regulatory standards.

Speech & Domain-Specific AI Validation
- Define evaluation strategies for:
  - Accent variability
  - Noisy clinical environments
  - Domain-specific vocabulary adaptation
- Establish measurable latency and reliability benchmarks for real-time AI systems.
- Lead failure mode analysis and systemic AI quality improvements.

Technical Leadership & Organizational Influence
- Serve as the principal authority on AI testing and evaluation strategy.
- Influence architecture decisions alongside Principal AI Architects and platform leaders.
- Mentor senior engineers in AI validation, benchmarking, and data governance practices.
- Drive AI quality maturity across multiple pods and engineering teams.
- Partner with Product and Executive stakeholders to align AI quality metrics with business outcomes.
- Shape the long-term AI reliability roadmap for the organization.

Required Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, AI, or a related field.
- 13+ years of experience in software engineering, AI engineering, or AI validation roles.
- 5+ years of hands-on experience with LLM, RAG, NLP, or speech-based AI platforms.
- Proven experience designing AI evaluation or testing frameworks at scale.
- Strong expertise in:
  - Hallucination detection
  - Golden dataset regression strategies
  - Adversarial and edge-case testing
  - Prompt validation and benchmarking
- Strong proficiency in Python and data analysis for AI evaluation.
- Experience building automated AI validation pipelines integrated with CI/CD.
- Strong understanding of system design and distributed architecture.
- Experience leading cross-team technical initiatives.

Preferred / Strongly Desired Qualifications

AI & GenAI
- Experience architecting evaluation frameworks for production RAG systems.
- Familiarity with semantic search validation and retrieval benchmarking.
- Experience designing LLM guardrails and structured output validation.
- Knowledge of Responsible AI, fairness evaluation, and compliance auditing.

Speech & Voice Systems
- Experience evaluating ASR/TTS systems in production environments.
- Strong understanding of speech benchmarking metrics and domain adaptation strategies.

Cloud & Platform
- Experience with Azure ML, Azure OpenAI, and Azure AI Search.
- Familiarity with MLOps and model lifecycle automation.
- Experience designing scalable evaluation infrastructure in cloud-native environments.

Travel Percentage: 10%
Stryker

Medical Devices

Location: CA - San Jose
Open Jobs: 1528