Director PCS Cloud Operations SRE

Full-time, Executive

Description

Job Description Summary

The Director – Cloud Operations provides leadership, innovation, and oversight for SRE and CloudOps across PCS. The role establishes the operating foundations, metrics, and automation needed to run mission‑critical, greenfield applications with high reliability and security, and is accountable for meeting product SLAs while scaling Cloud Operations and institutionalizing modern SRE practices in close partnership with product, platform, and security teams.

Job Description

Essential Responsibilities

Serve as the functional leader for the PCS Digital Cloud Operations team. Define the operating model, governance, and KPIs; drive automation and observability; and ensure secure, reliable deployments across environments with continuous improvement and tight collaboration with security. This role reports to the VP of Engineering – PCS Apps & Platform. Key responsibilities include:

  • Own Cloud Operations for PCS cloud applications; stand up and scale CloudOps capabilities to support multiple products while adhering to committed SLAs.
  • Institutionalize SRE practices: implement SLI/SLO/SLA frameworks, error budgets, incident/post‑mortem processes, and reliability runbooks; champion automation to reduce toil and improve service health and monitoring.
  • Build end‑to‑end observability (APM/RUM, logs, metrics, traces, health dashboards, proactive alerting) and evolve toward auto‑healing and AIOps for anomaly detection and closed‑loop remediation.
  • Drive change, incident, and problem management with clear RACI and stakeholder communications; reduce MTTR through streamlined L1–L4 escalation.
  • Establish and test DR/BCP posture; conduct AWS Well‑Architected and operational readiness reviews for services (AWS‑first, with multi‑cloud considerations as needed).
  • Lead FinOps practices: cost allocation and accountability, right‑sizing, savings plans/reserved instances, spend governance, and unit‑economics optimization.
  • Evolve the operating model in partnership with platform and application teams; standardize CI/CD templates and “everything‑as‑code” for speed and repeatability.
  • Build and develop a high‑performing team: hire, coach, and grow CloudOps/SRE talent and the next set of leaders; uphold high standards for quality and customer satisfaction.

Core KPIs & outcome metrics:

  • Service availability versus SLA/SLO and error‑budget burn rate.
  • MTTD/MTTR and incident recurrence; % incidents with post‑mortems completed.
  • Change failure rate and lead time for changes in production deployments.
  • % automated runbooks/toil reduction; % services with complete SLI/SLO coverage.

Basic Qualifications

  • Bachelor’s degree in computer science or another STEM field.
  • A minimum of 10 years of experience leading technical teams in complex, fast‑paced environments, including 5+ years in Cloud Ops and SRE leadership roles.
  • Proven expertise in the areas of DevSecOps, Day‑2 Ops, APM/RUM, and Cloud Operations.
  • Proficiency building and operating services on public cloud (AWS‑first) with CI/CD and Infrastructure‑as‑Code (e.g., Terraform/CloudFormation).
  • Track record establishing SLIs/SLOs/SLAs, observability, and incident/change management at scale.
  • Strong leadership and team management skills, with the ability to inspire and motivate a team of engineers.
  • Excellent project management skills, with the ability to manage multiple complex projects simultaneously.
  • In-depth knowledge of SaaS technologies, cloud computing, and medical device development processes.

Desired Characteristics

Technical competencies:

  • Experience scaling CloudOps/SRE for multiple products and customer deployments.
  • Deep fluency in SLI/SLO/SLA design, error budgets, runbooks, and auto‑healing patterns.
  • Strong AWS architecture and operations; Well‑Architected reviews; capacity and cost optimization (FinOps).
  • Modern observability (APM/RUM/logs/metrics/traces) and AIOps for predictive analytics/anomaly detection.
  • Security by design (DevSecOps, policy‑as‑code) and DR/BCP planning/testing.

Leadership competencies:

  • Clear, decisive communicator able to influence across product, platform, and security stakeholders.
  • Builder‑coach mindset: hire, mentor, and grow managers and ICs; create leaders of leaders.
  • Change agent who challenges the status quo while maintaining high standards for quality and customer satisfaction.
  • Operates with ownership, bias for action, and strong judgment in an ambiguous, high‑growth environment.

Top 5 Critical Competencies & Skills

  • SRE & Reliability Leadership — SLI/SLO/SLA management, error budgets, disciplined post‑mortems.
  • Cloud Operations at Scale (AWS‑first) — operational readiness, DR/BCP, change/incident/problem management, and Well‑Architected operations.
  • Observability & AIOps — end‑to‑end telemetry, APM/RUM, automated remediation to reduce MTTR and toil.
  • DevSecOps & Policy‑as‑Code — secure‑by‑default pipelines and vulnerability management with measurable SLAs.
  • FinOps & Cost Governance — cost allocation, right‑sizing, and spend optimization to improve unit economics while scaling.

Additional Information

Relocation Assistance Provided: No

IND19-01-Bengaluru-EPIP 122 (Phase II)
