Site Reliability Engineer III
Full-time | Senior | Engineering | PhD
Market Rate: Chemical Engineers
25th percentile: $92K
Median: $112K
75th percentile: $139K
BLS 2024 data (national)
Description
<h2><b>Career Category</b></h2>Engineering<h2><b>Job Description</b></h2><h1>Position Overview</h1><p>The GCF5 Site Reliability Engineer is the senior technical leader for the HPC Enablement pillar. They define and socialize operational standards and patterns, lead multi-team delivery, mentor GCF4 engineers, and translate researcher needs into scalable compute enablement designs. They own pillar-level reliability, performance, cost efficiency, and SLA/SLO outcomes, and influence engineering quality across teams.</p><p>This role reports to the GCF7 leader and partners closely with peer GCF5 domain leads across SCIP to ensure cohesive, scalable platform evolution.</p><h1>Core Responsibilities</h1><ul><li>Own the compute reliability and enablement roadmap within SCIP.</li><li>Define onboarding playbooks and golden paths for HPC workloads.</li><li>Establish containerization and reproducible runtime standards.</li><li>Optimize scheduler configuration and resource allocation policies.</li><li>Conduct workload profiling and performance tuning.</li><li>Define and manage SLOs, reliability standards, and operational guardrails.</li><li>Lead incident response and reduce recurring failures.</li><li>Mentor engineers and elevate reliability practices.</li><li>Partner with scientific teams to translate compute requirements into scalable infrastructure patterns.</li></ul><h1>Core Competencies</h1><ul><li>Deep expertise in HPC Enablement, with evidence of standard-setting and reuse.</li><li>Systems design at HPC scale, grounded in performance, security, and observability fundamentals.</li><li>Product/engineering thinking: roadmapping, prioritization, and outcome-oriented delivery.</li><li>Stakeholder influence across science, engineering, and governance forums; crisp written and verbal communication.</li></ul><h1>Core Success Measures</h1><ul><li>Improvement in HPC job success rate.</li><li>Reduction in MTTR for compute incidents.</li><li>Performance improvements relative to baseline.</li><li>Time-to-onboard for new scientific workloads.</li><li>Improvement in cost-per-compute-hour efficiency.</li><li>Reduction in operational toil via automation.</li></ul><h1>Key Relationships</h1><ul><li>Collaborates with the GCF6 Group Lead and cross-functional leaders (R&D/PD/Dev).</li><li>Mentors and develops GCF4 Data and Software Engineers; partners with platform, data, ML, and research teams.</li><li>Interfaces with governance (architecture, security, compliance) and vendor/partner teams.</li></ul><h1>Decision Authority</h1><ul><li>Approve designs within the pillar; define and waive standards/patterns with rationale.</li><li>Recommend buy-vs-build decisions; commit pillar resources to meet SLAs/SLOs; escalate risks.</li><li>Prioritize the pillar backlog and roadmap in alignment with strategy and OKRs.</li></ul><h1>Qualifications</h1><p>Basic Qualifications:</p><ul><li>BS + 8 years, MS + 6 years, or PhD in CS, Engineering, or a Data discipline.</li><li>Demonstrated production delivery experience in HPC at scale.</li><li>Demonstrated literacy in a relevant scientific domain (e.g., biology, chemistry, therapeutic discovery).</li></ul><p>Preferred Qualifications:</p><ul><li>Depth in HPC Enablement.</li><li>Kubernetes and continuous integration/continuous delivery (CI/CD) at scale; observability, performance tuning, and security-by-design.</li><li>Evidence of standard-setting and cross-team influence; mentoring experience.</li></ul>
Amgen
BIOTECHNOLOGY
Small Molecules, Biologics
Location: Thousand Oaks, CA
Employees: 27,000
Open Jobs: 1,215
Oncology, Cardiovascular, Bone Health, Immunology, Neuroscience
Pipeline
Physician Survey: N/A
Peds Metabolic Syndrome in Psoriasis: N/A
Persistence With Prolia® (Denosumab) in Postmenopausal Women With Osteoporosis: N/A
TAP® Micro Select Device: N/A
ENBREL®: N/A