Data Scientist

Roche · Pune
8h ago
Full-time

Description

At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted, and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop, and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.

The Position

The Opportunity

We are looking for a highly skilled Data Scientist to join our team. The ideal candidate will have expertise in machine learning, statistical modeling, and data analytics to extract valuable insights from diverse datasets. 

In this role, you'll be responsible for the end-to-end development of predictive models as well as ML/AI and GenAI solutions, from initial concept and prototyping to production and ongoing operations. As a pivotal contributor to the digital transformation of laboratory solutions, this position will partner with cross-functional teams and product leaders to shape digital roadmaps and deliver impactful product innovations.

Key Responsibilities

  • Data Analysis & Modeling: Collect, clean, and preprocess structured and unstructured data from various sources. Perform exploratory data analysis to uncover trends and patterns. Develop predictive and prescriptive models using machine learning and statistical techniques.

  • Algorithm Development: Design, develop, customize, optimize, and fine-tune algorithms tailored to specific use cases such as anomaly detection, predictive modeling, time-series forecasting, recommendation systems, text generation, summarization, information extraction, chatbots, AI agents, code generation, document analysis, sentiment analysis, and data analysis.

  • Application Development: Collaborate with developers and stakeholders to integrate statistical models, LLMs, and classical AI techniques into end-user applications, focusing on user experience and real-time performance.

  • End-to-End Pipeline Development: Build and maintain production-ready end-to-end pipelines, including data ingestion, preprocessing, training, evaluation, deployment, and monitoring. Automate workflows using DevOps / MLOps best practices to ensure scalability and efficiency.

  • Scalable Model Deployment: Collaborate with other developers to deploy models at scale, using cloud-based infrastructure (AWS, Azure).

  • Monitoring and Maintenance: Implement continuous monitoring and refinement strategies for deployed models, using feedback loops and techniques such as incremental fine-tuning to ensure ongoing accuracy and reliability; address drift and bias as they arise.

  • Software Development: Apply software development best practices, including writing unit tests, configuring CI/CD pipelines, containerizing applications, prompt engineering, and setting up APIs. Ensure robust logging, experiment tracking, and model monitoring.

Qualifications

Candidate Qualifications & Experience

  • Master’s degree in data science, computer science, statistics, or a related quantitative field.

  • 5+ years of industry experience in data science or a related role.

  • 2+ years of industry experience in AI/ML engineering, with exposure to both classical machine learning methods and language model-based applications.

  • Technical Skills: Advanced proficiency in Python, data analytics, statistical modeling and ML/AI. Experience with leading ML/AI frameworks and tools, with hands-on experience in designing and implementing end-to-end GenAI pipelines.

  • DevOps / MLOps Knowledge: Strong understanding of DevOps / MLOps tools and practices, including version control, CI/CD pipelines, containerization, orchestration, Infrastructure as Code, and automated deployment.

  • Deployment: Experience deploying statistical models, LLMs, and other AI models on cloud platforms (AWS, Azure) for robust and scalable productization.

  • Data Engineering: Expertise in working with structured and unstructured data, including data cleaning and feature engineering, and with data stores such as vector, relational, and NoSQL databases and data lakes accessed through APIs.

  • Transferable Skills: Effective communication in English, a problem-solving and innovative mindset, adaptability, curiosity, and responsibility.

Additional Desired Candidate Qualifications & Experience

  • Advanced degree (Ph.D. or related) ideally with specialization in AI.

  • Experience in healthcare, diagnostics, or other highly regulated industries.

  • Knowledge of the clinical laboratory domain, or related domains like medicine, biology, biochemistry, biostatistics, or biophysics.

  • Hands-on exposure to user research or design-thinking approaches to improve data tool usability and effectiveness.

  • Knowledge of data privacy principles and regulations.

  • Experience working in an Agile environment (e.g., SAFe).

Who we are

A healthier future drives us to innovate. Together, more than 100,000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.


Let’s build a healthier future, together.

Roche is an Equal Opportunity Employer.
