Principal AI Engineer

Stryker
Bengaluru, India
Posted 5d ago
Full-time · Senior

Description

Work Flexibility: Hybrid or Onsite

Vocera, now part of Stryker, is seeking a visionary and hands-on Principal Engineer – AI Test, Evaluation & Data Architecture to define and lead the enterprise-wide strategy for AI validation, model evaluation, and data governance across our speech and GenAI platforms.

This role serves as the AI Quality Architect for real-time speech systems, NLP pipelines, and LLM-powered applications deployed in mission-critical healthcare environments. You will establish scalable evaluation frameworks, design AI testing platforms, define data governance standards, and ensure production reliability of AI systems at scale.

This is a high-impact architectural leadership role requiring deep expertise in LLM validation, RAG evaluation, speech benchmarking, automation, MLOps, and AI lifecycle governance.

What You Will Do

Enterprise AI Evaluation Architecture
- Define and own the end-to-end AI evaluation architecture across speech, NLP, and GenAI platforms.
- Establish standardized evaluation frameworks for:
  - ASR systems (WER, latency, robustness, domain adaptation)
  - NLP systems (intent accuracy, entity F1, confusion analysis)
  - LLM systems (hallucination rate, groundedness, factual accuracy, consistency, safety)
- Define measurable AI quality SLAs and release gating criteria.
- Architect benchmarking standards across model versions, prompt changes, and retrieval updates.
- Institutionalize regression evaluation pipelines for all AI releases.

LLM & RAG Reliability Strategy
- Architect validation frameworks for:
  - RAG-based systems
  - Prompt orchestration workflows
  - Multi-agent or multi-model AI pipelines
- Define groundedness measurement strategies for enterprise RAG.
- Establish adversarial testing, stress testing, and edge-case validation frameworks.
- Implement hallucination detection standards and mitigation measurement.
- Drive responsible AI practices, including bias detection and safety validation.

AI Testing Platform & Automation Architecture
- Design and lead implementation of a scalable AI testing platform that includes:
  - Offline evaluation pipelines
  - Golden-dataset-driven regression systems
  - Synthetic data generation frameworks
  - Online A/B testing and shadow deployment strategies
- Integrate AI validation workflows into CI/CD and MLOps pipelines.
- Define drift detection and performance degradation monitoring strategies.
- Establish real-time observability dashboards for AI quality metrics.

AI Data Governance & Lifecycle Management
- Define an enterprise-wide data governance strategy for AI systems, including:
  - Data collection and curation standards
  - Annotation workflows and validation
  - Dataset versioning and reproducibility
  - Traceability across model iterations
- Establish gold datasets for:
  - Speech systems
  - NLP pipelines
  - Clinical and conversational workflows
- Drive continuous learning loops between production telemetry and training data.
- Ensure compliance with healthcare data privacy and regulatory standards.

Speech & Domain-Specific AI Validation
- Define evaluation strategies for:
  - Accent variability
  - Noisy clinical environments
  - Domain-specific vocabulary adaptation
- Establish measurable latency and reliability benchmarks for real-time AI systems.
- Lead failure mode analysis and systemic AI quality improvements.

Technical Leadership & Organizational Influence
- Serve as the principal authority on AI testing and evaluation strategy.
- Influence architecture decisions alongside Principal AI Architects and platform leaders.
- Mentor senior engineers in AI validation, benchmarking, and data governance practices.
- Drive AI quality maturity across multiple pods and engineering teams.
- Partner with Product and Executive stakeholders to align AI quality metrics with business outcomes.
- Shape the long-term AI reliability roadmap for the organization.

Required Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, AI, or a related field.
- 13+ years of experience in software engineering, AI engineering, or AI validation roles.
- 5+ years of hands-on experience with LLM, RAG, NLP, or speech-based AI platforms.
- Proven experience designing AI evaluation or testing frameworks at scale.
- Strong expertise in:
  - Hallucination detection
  - Golden dataset regression strategies
  - Adversarial and edge-case testing
  - Prompt validation and benchmarking
- Strong proficiency in Python and data analysis for AI evaluation.
- Experience building automated AI validation pipelines integrated with CI/CD.
- Strong understanding of system design and distributed architecture.
- Experience leading cross-team technical initiatives.

Preferred / Strongly Desired Qualifications

AI & GenAI
- Experience architecting evaluation frameworks for production RAG systems.
- Familiarity with semantic search validation and retrieval benchmarking.
- Experience designing LLM guardrails and structured output validation.
- Knowledge of Responsible AI, fairness evaluation, and compliance auditing.

Speech & Voice Systems
- Experience evaluating ASR/TTS systems in production environments.
- Strong understanding of speech benchmarking metrics and domain adaptation strategies.

Cloud & Platform
- Experience with Azure ML, Azure OpenAI, and Azure AI Search.
- Familiarity with MLOps and model lifecycle automation.
- Experience designing scalable evaluation infrastructure in cloud-native environments.

Travel Percentage: 10%
Stryker

Medical Devices

Location: CA - San Jose
Open Jobs: 1528