Serverless machine learning solves the problem of how to build and operate supervised machine learning systems in Python without having to first learn how to install, configure, and operate complex computational and data storage infrastructure.
Improved infrastructural support for the serverless orchestration of feature pipelines, training pipelines, and inference pipelines enables Python developers to build and operate ML systems without first having to become experts in either Kubernetes or cloud infrastructure.
Serverless machine learning (ML) is a new category of loosely coupled serverless services that provide the operational layer (compute and storage) for AI-enabled products and services. Serverless compute services orchestrate and run the feature pipelines, training pipelines, and batch inference pipelines. The outputs of these ML pipelines include reusable features, training data, models, and prediction logs, and a serverless feature/model store manages state in serverless ML. Finally, serverless model serving provides support for online models that are accessible via network endpoints.
Machine learning systems implement a data flow of processing and storage steps, starting from input data to features to trained models, finishing with a prediction service (that uses the model and inference data) and a monitoring service with prediction logs for observability and debugging.
In a serverless ML system, the machine learning pipeline stages are refactored into independent feature engineering, training, and inference pipelines. These pipelines communicate by storing their outputs and reading their inputs in a feature store or model registry. Even prediction logs (features and predicted labels) can be stored back in the feature store to enable the monitoring of models for correctness, performance, and observability.
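As a sketch of this decoupling, the snippet below shows a feature pipeline writing features to a feature store and a training pipeline reading them back as training data. It assumes the Hopsworks Python API; the `iris` feature group and feature view names are illustrative, not prescribed.

```python
import hopsworks
import pandas as pd

project = hopsworks.login()          # authenticates, e.g., via HOPSWORKS_API_KEY
fs = project.get_feature_store()

# --- Feature pipeline: write engineered features to the feature store ---
features_df = pd.DataFrame({
    "flower_id": [1],
    "sepal_length": [5.1], "sepal_width": [3.5],
    "petal_length": [1.4], "petal_width": [0.2],
    "variety": ["Setosa"],
})
iris_fg = fs.get_or_create_feature_group(
    name="iris", version=1, primary_key=["flower_id"],
    description="Iris flower measurements",
)
iris_fg.insert(features_df)

# --- Training pipeline: read the same features back as training data ---
feature_view = fs.get_or_create_feature_view(
    name="iris", version=1, query=iris_fg.select_all(), labels=["variety"],
)
X_train, X_test, y_train, y_test = feature_view.train_test_split(test_size=0.2)
```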
Serverless ML is not for everybody, yet. Many serverless services do not have a managed cloud offering, where the service can be deployed and managed inside the customer’s cloud account. For this reason, some enterprises will eschew using SaaS services that manage sensitive data about customers.
You don't need to learn Kubernetes or cloud infrastructure to put ML in production. However, you do need to know both the principles of MLOps and how to apply them to operate an ML system in production.
The key principles of MLOps are the versioning and testing of ML assets. The two most important assets are (1) models and (2) data (features).
It is widely known that you should version your ML models, so that you can perform A/B tests of those models, helping you figure out if the new model you trained should replace the old one or not. In operational ML systems, models typically require historical data (e.g., about a user) or context data (what is trending). So, your versioned models also need versioned data (features).
In fact, there is a hierarchy of dependencies between data, models, and the ML applications that use the models. The data we use to train models and make predictions with models is called features. If you don't test your features, it is hard to trust the models that are built on those features. So, you should test your features with data validation and unit tests for feature logic, as in the sketch below.
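Here is a minimal sketch of both kinds of feature tests: a unit test for a piece of feature logic and a data validation check that fails the feature pipeline on bad input. The `petal_area` feature and the value ranges are illustrative assumptions.

```python
import pandas as pd

def petal_area(petal_length: pd.Series, petal_width: pd.Series) -> pd.Series:
    """Illustrative feature logic: approximate petal area."""
    return petal_length * petal_width

def test_petal_area():
    # Unit test for the feature logic itself (run with pytest)
    df = pd.DataFrame({"petal_length": [1.4], "petal_width": [0.2]})
    area = petal_area(df["petal_length"], df["petal_width"])
    assert abs(area.iloc[0] - 0.28) < 1e-9

def validate_features(df: pd.DataFrame) -> None:
    # Data validation: stop the feature pipeline if the input data looks wrong
    assert df["sepal_length"].between(0, 10).all(), "sepal_length out of range"
    assert df["petal_width"].notna().all(), "missing petal_width values"
```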
Similarly, ML-enabled applications build on models that should be tested for bias and poor performance. So you should have model validation tests that must pass before a model can be deployed to production in an A/B test.
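A minimal sketch of such a validation gate follows. The accuracy threshold, the per-class recall check as a crude bias test, and the `register_and_deploy` step are illustrative assumptions, not a prescribed workflow.

```python
from sklearn.metrics import accuracy_score, recall_score

ACCURACY_THRESHOLD = 0.9   # illustrative threshold
MIN_CLASS_RECALL = 0.8     # no class should be served much worse than others

def validate_model(model, X_test, y_test) -> bool:
    """Return True only if the candidate model is good enough to deploy."""
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    # Per-class recall as a simple check that no class is badly underserved
    per_class_recall = recall_score(y_test, y_pred, average=None)
    return accuracy >= ACCURACY_THRESHOLD and per_class_recall.min() >= MIN_CLASS_RECALL

# Gate deployment (e.g., into an A/B test) on the validation tests:
# if validate_model(model, X_test, y_test):
#     register_and_deploy(model)   # hypothetical deployment step
```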
We run a free online course on Serverless ML. In the course we build both analytical ML systems and interactive ML systems.
The analytical ML systems are typically a dashboard that is updated with new predictions once per day or hour. Example dashboards built in the course include predicting surf height at a beach in Ireland, predicting Bitcoin sentiment based on recent tweets, air quality predictions for Poland, and predicting electricity demand/prices for the coming 24 hours. The simplest of these systems, built around the classic Iris flower dataset, runs a feature pipeline once per day to synthetically generate a new Iris flower and write it to the feature store. A batch inference pipeline, which also runs once per day but just after the feature pipeline, reads the single flower added that day and downloads the Iris model, trained to predict the variety of Iris flower from four input features: sepal length, sepal width, petal length, and petal width. The model's prediction is written to an online dashboard, and the actual flower (the outcome) is read from the feature store and published to the same dashboard, so you can see whether the model predicted correctly.
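Below is a minimal sketch of what such a batch inference pipeline can look like, assuming the Hopsworks Python API; the `iris` feature view and model names and the pickle filename are illustrative.

```python
import hopsworks
import joblib

project = hopsworks.login()
fs = project.get_feature_store()
mr = project.get_model_registry()

# Download the trained model from the model registry
model_meta = mr.get_model(name="iris", version=1)
model_dir = model_meta.download()
model = joblib.load(model_dir + "/iris_model.pkl")

# Read the latest feature data via the feature view and predict
feature_view = fs.get_feature_view(name="iris", version=1)
batch_df = feature_view.get_batch_data()   # e.g., filter to today's new rows
predictions = model.predict(batch_df)

# Write predictions (the prediction log) back for the dashboard and monitoring,
# e.g., monitor_fg.insert(predictions_df)
```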
The interactive ML systems are typically a Gradio or Streamlit UI (hosted on Hugging Face Spaces or Streamlit Cloud) that works with a model either hosted on or downloaded from Hopsworks. They typically take user input, join it with historical features from the Hopsworks Feature Store, and produce predictions in the UI. Examples built in the course include predicting the house price for a given address in Stockholm, recommending songs for a given Spotify playlist, removing vocals from a song/sound file, and predicting whether a post to Reddit will be liked or not.
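As an illustration, a minimal Gradio UI for the Iris model might look like the sketch below. The `predict_iris` function and the locally loaded model file are assumptions for the example; in the course systems the model would be downloaded from Hopsworks, and user input could additionally be joined with historical features.

```python
import gradio as gr
import joblib

model = joblib.load("iris_model.pkl")  # e.g., previously downloaded from Hopsworks

def predict_iris(sepal_length, sepal_width, petal_length, petal_width):
    # Build a single-row feature vector from the user's input and predict
    features = [[sepal_length, sepal_width, petal_length, petal_width]]
    return str(model.predict(features)[0])

demo = gr.Interface(
    fn=predict_iris,
    inputs=[gr.Number(label=name) for name in
            ["sepal length (cm)", "sepal width (cm)",
             "petal length (cm)", "petal width (cm)"]],
    outputs=gr.Textbox(label="predicted variety"),
)
demo.launch()
```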
All of the above systems also include a monitoring dashboard for evaluating model performance, helping to surface errors and bugs and to decide when to retrain models.
There are many SaaS platforms for machine learning that can be considered building blocks for serverless ML systems. In the figure below, we loosely categorize them into the raw data layer (data warehouses and object stores), the compute layer for orchestrated pipelines (feature and inference pipelines), the ML development services for model training and experiment management, and the state layer for features and models, as well as model serving and model monitoring.
With Serverless Machine Learning, Data Scientists can move beyond Jupyter notebooks and just training models to building fully fledged prediction services that use ML models. With Serverless ML, all that is needed is Python skills to build interactive, self-monitoring ML systems.