Jupyter Notebooks in production

How we scaled Jupyter Notebooks for business impact, turning prototyping tools into production powerhouses.

05/12/2025

Our Data Science Platforms team built a scalable, production-ready Jupyter Notebooks platform that empowers analytical scientists to deploy their work without code rewrites or engineering handovers.

This innovative solution has reduced time-to-market (TTM) from weeks to hours, improved data reliability and enabled over 50 analytical scientists to deliver faster, more scalable business insights.

What are Jupyter Notebooks?

Jupyter Notebook is an open-source, interactive web application that lets users create and share documents combining live code, equations, visualisations and narrative text.

Renowned for their adaptability and user-friendly interface, Jupyter Notebooks have become a cornerstone in data science, machine learning, research and education.

Why we needed Jupyter Notebooks in production

As Capitec’s data science and machine learning community grew across departments, so did the complexity and scale of the problems we were solving. Jupyter Notebooks quickly became the preferred development environment for our data scientists, credit decisioning scientists and machine learning engineers (collectively referred to as analytical scientists) due to their intuitive nature. Their flexibility, interactivity and ease of use made them ideal for rapid experimentation, prototyping and insight generation.

But with that growth came a bottleneck.

While Notebooks are great for development, they’re not designed for enterprise-scale production. We’d develop high-value models and workflows in Notebooks, but moving them into production meant facing friction, rework and risk.

The transition from prototype to production required reworking code into scripts, involving engineering teams, and navigating multiple layers of infrastructure. This slowed our TTM and created dependencies that clashed with our need for agility.

In some cases, critical Notebooks were being manually executed in development environments, posing clear operational risks and making it hard to ensure consistency, monitoring and governance.

We needed a solution that:

Let analytical scientists move from idea to production without translation or delay
Maintained enterprise-grade governance, monitoring and scalability
Reduced operational risk and supported Capitec’s growth trajectory

What makes our approach unique

While several leading companies, like Netflix, have successfully integrated Jupyter Notebooks into production, our approach stands out in 3 ways:

Scientist-first design: We didn’t force data scientists to learn a new system. We built a platform around their existing workflows, then layered governance, automation and observability on top
Reusable and modular architecture: One pipeline supports multiple use cases, with minimal custom provisioning. This lowers infrastructure overhead and accelerates adoption
Seamless test-to-prod path: By ensuring similarity between the development and production environments, we removed one of the biggest sources of deployment friction

Our solution: Operationalising Jupyter Notebooks

To speed up the journey from development to production without compromising on control, our Data Science Platforms team reimagined Jupyter Notebooks as production-grade, governed assets.

The goal was to enable analytical scientists to deploy and manage their work independently, with minimal engineering support, while ensuring scalability, reliability and compliance.

Here's a summary of how we did it:

A user-first production framework

Our solution had to meet enterprise standards but feel familiar and intuitive to our analytical scientists.

We developed a modular pipeline architecture that runs Notebooks in production on scalable cloud infrastructure, using familiar development tools like AWS SageMaker Studio and AWS SageMaker Processing Jobs. This reduces friction by allowing scientists to build and test in the same environment they’ll use in production.

Each Notebook runs as a standalone job or part of a sequence of Notebooks, enabling teams to break complex logic into smaller, manageable components.
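To make the idea of chaining Notebooks concrete, here is a minimal sketch of a sequence runner. It is an illustration only, not our platform's code: `NotebookStep`, `run_sequence` and the `execute` callable are hypothetical names, and the executor is passed in so the same ordering logic could sit in front of any job runner.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class NotebookStep:
    """One Notebook in a sequence (hypothetical structure for illustration)."""
    name: str
    path: str
    parameters: Dict[str, object] = field(default_factory=dict)


def run_sequence(
    steps: List[NotebookStep],
    execute: Callable[[NotebookStep], None],
) -> List[Tuple[str, str]]:
    """Run steps in order and return a (name, status) pair per attempted step.

    The sequence stops at the first failure, so later Notebooks never run
    against incomplete upstream output.
    """
    results: List[Tuple[str, str]] = []
    for step in steps:
        try:
            execute(step)  # assumed to raise an exception when the Notebook fails
        except Exception as exc:
            results.append((step.name, f"failed: {exc}"))
            break
        results.append((step.name, "succeeded"))
    return results
```

A single-element list covers the standalone case, so one code path can serve both standalone jobs and multi-Notebook sequences.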

Streamlined version control and deployment

All Notebooks are version-controlled in GitHub, with structured continuous integration/continuous deployment (CI/CD) pipelines to manage reviews, approvals and releases.

Instead of rewriting code into scripts just to meet process requirements, we extended GitHub’s native Notebook support, allowing everything to happen in one central platform, streamlining code reviews and collaboration.

Scalability and governed data access

Each Notebook runs in a fully scalable environment, adapting to the processing needs of tasks like feature generation, model scoring or performance monitoring.

Jobs read from and write to our Enterprise Data Warehouse (EDW) and Feature Platform using a custom Python utility library, enforcing strong data governance through existing metadata, lineage and quality checks.

Built-in observability and monitoring

Every executed Notebook, successful or failed, is stored for debugging and traceability. Execution logs are captured through CloudWatch, and real-time notifications are sent to Microsoft Teams, PagerDuty or email. This gives operational stakeholders full visibility into what’s running, when it’s running and how it’s performing.

And thanks to Papermill, we can inject runtime variables into any Notebook, making it easy to rerun jobs with different inputs or promote across environments without manual changes.
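As a rough sketch of what parameter injection with Papermill can look like, assuming a Notebook that has a cell tagged `parameters`: the helper names, the `runs/...` folder layout and the `environment` parameter below are illustrative assumptions, not our actual conventions. Only `papermill.execute_notebook` is a real library call.

```python
from pathlib import Path, PurePosixPath
from typing import Dict


def output_path_for(notebook: str, environment: str, run_id: str) -> str:
    """Build a per-run output path so each executed copy is kept separately.

    The runs/<environment>/<notebook>/<run_id>.ipynb layout is hypothetical.
    """
    stem = PurePosixPath(notebook).stem
    return f"runs/{environment}/{stem}/{run_id}.ipynb"


def run_notebook(
    notebook: str,
    environment: str,
    run_id: str,
    parameters: Dict[str, object],
) -> str:
    """Execute a Notebook with injected runtime variables and return the output path."""
    import papermill as pm  # imported lazily; requires `pip install papermill`

    output_path = output_path_for(notebook, environment, run_id)
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)

    # Papermill adds a new cell after the one tagged "parameters", so these
    # values override the defaults defined in the Notebook itself.
    pm.execute_notebook(
        notebook,
        output_path,
        parameters={"environment": environment, **parameters},
    )
    return output_path
```

Under these assumptions, a call such as `run_notebook("notebooks/score_model.ipynb", "prod", "2025-01-31", {"score_date": "2025-01-31"})` would rerun the same Notebook with new inputs, with the source file left untouched.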

The impact

Today, over 50 analytical scientists across Capitec use Jupyter Notebooks in their day-to-day work. With our new platform, they can move from prototype to production in hours instead of weeks, without sacrificing control or compliance.

We’ve reduced TTM for analytical models, improved the reliability of data products and given analytical scientists the autonomy to own their solutions from start to finish.

By turning Notebooks into first-class production assets, we’ve reached a new level of scale and impact for data science at Capitec.

Written by: Daniel Nieuwenhuizen, Technical Lead – Modelling Platform

Project engineers:

Koeshen Moodley, Machine Learning Engineer
Jacobie Mouton, Machine Learning Engineer
Daniel Nieuwenhuizen, Technical Lead – Modelling Platform
