Putting machine learning (ML) models in production is often treated as an operational challenge, tackled only after all the hard work of training and optimizing the model is done. In contrast, serverless ML starts with a minimal model, along with the operational feature pipeline(s) and inference pipeline. The feature and inference pipelines are needed to ensure a reliable supply of data to the model, for both training and predictions. In this article, we show that writing feature and inference pipelines should not be hard. If you don't have to configure and build the MLOps infrastructure yourself, getting to a minimal viable production model within a couple of weeks should be feasible for most models - as it was for the more than 90% of students who built successful ML systems in our scalable ML course at KTH, where projects took, on average, 2 weeks to complete.
One of the best practices in systems software development is to get to a working MVP (minimum viable product) as soon as possible. For machine learning, the MVP is a model that makes predictions on new data and publishes those predictions to either users or downstream services.
But many data scientists, who work mostly in notebooks, find it hard to even imagine getting to a working ML system. One answer is to decompose that bigger problem into separate, manageable programs that together make up your working ML system.
All ML systems can be decomposed into 3 pipelines:

- a feature pipeline, which transforms raw data into the features (and labels) used for both training and inference;
- a training pipeline, which takes features (and labels) as input and outputs a trained model;
- an inference pipeline, which takes features and a trained model as input and outputs predictions.
These pipelines have a shared data layer (a feature store and a model registry), which means each pipeline can run independently, at its own cadence. For example, new data might arrive once per hour, so you run the feature pipeline once per hour. The inference pipeline might be a batch job that runs once per day, so you schedule it to run once per day. The training pipeline might run on demand - for example, when your model is stale or when you have more or better data to train your model with.
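To make this decomposition concrete, here is a minimal sketch in Python. It assumes the simplest possible shared data layer - a Parquet file standing in for the feature store and a pickled model file standing in for the model registry. The file names, the `target` label column, and the feature transformation are all illustrative, not a prescribed implementation.

```python
import joblib
import pandas as pd
from xgboost import XGBClassifier

FEATURE_STORE = "features.parquet"  # stand-in for a real feature store
MODEL_REGISTRY = "model.pkl"        # stand-in for a real model registry

def feature_pipeline(raw: pd.DataFrame) -> None:
    """Run hourly: turn newly arrived raw data into features and append them."""
    features = raw.dropna()  # illustrative feature engineering
    try:
        features = pd.concat([pd.read_parquet(FEATURE_STORE), features],
                             ignore_index=True)
    except FileNotFoundError:
        pass  # first run: the feature store is still empty
    features.to_parquet(FEATURE_STORE)

def training_pipeline(label: str = "target") -> None:
    """Run on demand: read features, train a model, publish it to the registry."""
    df = pd.read_parquet(FEATURE_STORE)
    model = XGBClassifier().fit(df.drop(columns=[label]), df[label])
    joblib.dump(model, MODEL_REGISTRY)

def batch_inference_pipeline(label: str = "target") -> pd.DataFrame:
    """Run daily: score the latest features with the latest published model."""
    df = pd.read_parquet(FEATURE_STORE)
    model = joblib.load(MODEL_REGISTRY)
    df["prediction"] = model.predict(df.drop(columns=[label]))
    return df
```

Because each pipeline only reads from and writes to the shared data layer, each function can be deployed and scheduled on its own - as an hourly cron job, a daily batch job, or an on-demand run - without any pipeline needing to know the others exist.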
Note that there is no single “ML Pipeline”. There are 3 pipelines that, working together, make up your ML system. So, when people say “this ML pipeline”, ask them: is it a feature pipeline, a training pipeline, or an inference pipeline? Some people might like to couple them together into a single monolithic pipeline, but if you want to write production systems, we strongly recommend against it!
In the KTH course ID2223 on scalable machine learning, students start by building complete ML systems. The first lab introduced the concepts of a feature pipeline, a training pipeline, and a batch inference pipeline using the well-known Titanic survival dataset. Students implemented a synthetic passenger generator function as a feature pipeline, so that new data would keep being created, and a dashboard in Gradio on Hugging Face Spaces as the inference pipeline (to show predictions of whether each new passenger survived or not). The students also implemented a UI in Gradio to monitor the performance of their model. Model training was typically done in a Colab/Jupyter notebook with an XGBoost classifier.
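To give a feel for how little code the lab requires, here is a rough sketch of its moving parts: a synthetic passenger generator as the feature pipeline's data source, an XGBoost training step, and a Gradio UI for predictions. The feature names, value ranges, and random labels below are illustrative assumptions to keep the sketch self-contained - the students trained on the real Titanic data.

```python
import random
import pandas as pd
import gradio as gr
from xgboost import XGBClassifier

def generate_passenger() -> dict:
    """Feature pipeline helper: one synthetic Titanic passenger.
    Feature names and value ranges are plausible guesses, not the
    exact logic the students used."""
    pclass = random.choice([1, 2, 3])
    return {
        "pclass": pclass,
        "sex": random.randint(0, 1),                 # 0 = male, 1 = female
        "age": round(random.uniform(1.0, 80.0), 1),
        "fare": round(random.uniform(5.0, 100.0) * (4 - pclass), 2),
    }

def synthetic_feature_pipeline(n: int = 10) -> pd.DataFrame:
    """Each run produces a small batch of new passengers to predict on."""
    return pd.DataFrame([generate_passenger() for _ in range(n)])

# Training notebook: fit a survival classifier. Labels here are random
# purely so the sketch runs end to end; the lab used real Titanic labels.
features = synthetic_feature_pipeline(200)
labels = pd.Series([random.randint(0, 1) for _ in range(200)])
model = XGBClassifier(n_estimators=100, max_depth=4).fit(features, labels)

# Inference dashboard: a Gradio UI showing whether a passenger survives.
def predict_survival(pclass, sex, age, fare):
    row = pd.DataFrame([{"pclass": int(pclass), "sex": int(sex),
                         "age": age, "fare": fare}])
    return "survived" if model.predict(row)[0] == 1 else "did not survive"

demo = gr.Interface(
    fn=predict_survival,
    inputs=[gr.Number(label="pclass"), gr.Number(label="sex (0=male, 1=female)"),
            gr.Number(label="age"), gr.Number(label="fare")],
    outputs="text",
)
demo.launch()
```

Each piece is small enough to live in its own notebook or script, which is what made it realistic for students to ship a working system in their first lab.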
So, students started by building a complete ML system - they put their models to work in their very first lab. In fact, more than 90% of students succeeded in putting their first model into production. For many of them, it was their first exposure to ML (they came from a software engineering background), while many others were well versed in ML theory, but not in practice.
After the first 2 labs, the students were confident in building a complete ML system with a feature pipeline and a UI. They then undertook a project in which they identified their own prediction problem and dataset, and built an ML system that adds value by making predictions for users to consume.
A list of selected student projects is available here. The table below shows a few of them, along with how they decomposed their ML systems into feature, training, and inference pipelines.
There is no such thing as a single machine learning pipeline - there are feature pipelines, training pipelines, and inference pipelines. If you structure your ML systems this way, you too will be able to quickly build an end-to-end working ML system that can then be iteratively improved. The next step after getting to a working ML system is to follow best practices for testing and versioning your ML assets: your features and your models. In future posts, we will write more about the MLOps principles of testing and versioning that you should follow - without first having to learn how to install and manage MLOps infrastructure.