Challenge 5: Make it work and make it scale

Introduction

Having a model is only the first step; we can now use it to make predictions. This is typically called inferencing (or scoring) and can be done in two ways:

Online, with an HTTP endpoint that generates predictions for incoming data in real time.
In batch, by running the model on a large set of files or a database table.

From this challenge onwards you’ll have the option to either do online inferencing or batch inferencing. Please choose your path:

Online Inferencing
Batch Inferencing

Online Inferencing

Description

So, you’ve chosen online inferencing. To serve predictions online, the model has to be deployed to an endpoint. Luckily, Agent Platform provides exactly what we need: a managed service for serving predictions, called Endpoints.

Create a new Agent Platform Endpoint and deploy the freshly trained model. Use the smallest machine type but make sure that it can scale to more than 1 node by configuring autoscaling. Stick to the defaults for everything else.

Note
The deployment of the model will take ~10 minutes to complete.

Warning
The Qwiklab environment we’re using has a quota on endpoint throughput of 30K requests per minute. Do not exceed it.

Success Criteria

The model has been deployed to an endpoint and can serve requests.
Show that the Endpoint has scaled to more than 1 instance under load.
No code was modified.

Tips

Read the requirements for Autoscaling before you complete your configuration.
Verify that you’re getting predictions from the endpoint (for example using cURL) before generating load.
To generate load you can use any tool you want, but we recommend oha, run from Cloud Shell or your notebook environment. You can download the latest version from here (you’ll need the oha-linux-amd64 version for either environment).
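Before generating load, it can help to translate the per-minute quota from the warning above into a per-second rate for the load generator. The following is a minimal sketch of that arithmetic. The function names and the 80% safety margin are illustrative assumptions, not part of the challenge, and you should check your load tool's help output for how it limits the request rate.

```python
# Convert a per-minute request quota into a per-second rate for a load generator.
# The 80% default margin is an arbitrary illustrative choice, not a requirement.

QUOTA_PER_MINUTE = 30_000  # endpoint throughput quota stated in the warning


def max_queries_per_second(quota_per_minute: int, safety_percent: int = 80) -> int:
    """Return a per-second rate that uses at most safety_percent of the quota."""
    if not 0 < safety_percent <= 100:
        raise ValueError("safety_percent must be between 1 and 100")
    # Integer arithmetic avoids floating point rounding surprises.
    return quota_per_minute * safety_percent // (60 * 100)


def total_requests(qps: int, duration_seconds: int) -> int:
    """Return how many requests a run at a fixed rate sends in total."""
    return qps * duration_seconds


if __name__ == "__main__":
    qps = max_queries_per_second(QUOTA_PER_MINUTE)
    print(f"Target rate: {qps} requests/second")
    print(f"Requests in a 5 minute run: {total_requests(qps, 300)}")
```

At the full quota this gives 500 requests per second, so a sustained rate somewhat below that leaves headroom while still putting enough load on the endpoint to trigger autoscaling.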

Learning Resources

Documentation on Online Predictions deployment
More info on the request data format. Remember that we’ve used the scikit-learn framework to train our model.
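As a rough illustration of the request data format, the sketch below builds a JSON body for a model that takes plain numeric feature rows. It assumes the endpoint expects a top-level `instances` list with one inner list per row, which is our understanding of the format for scikit-learn models; confirm this against the documentation linked above. The helper name and the example values are hypothetical.

```python
import json


def build_prediction_request(rows):
    """Wrap feature rows in a JSON body with an "instances" list (assumed format)."""
    if not rows:
        raise ValueError("at least one instance is required")
    width = len(rows[0])
    for row in rows:
        # Every instance must have the same number of features as the model expects.
        if len(row) != width:
            raise ValueError("all instances must have the same number of features")
    return json.dumps({"instances": [list(row) for row in rows]})


if __name__ == "__main__":
    # Made-up feature values, only to show the shape of the payload.
    body = build_prediction_request([[1, 2.5, 0], [3, 4.0, 1]])
    print(body)
```

The resulting string can be saved to a file and sent as the body of a POST request, both for the initial cURL check and for the load test.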

Batch Inferencing

Description

So, you’ve chosen the batch inferencing path. We’re going to use Agent Platform Batch Inference to get predictions for data in a BigQuery table. First, create a new table with at most 10K rows that will be used for generating the predictions. Once the table is created, create a new Batch Inference job that uses the previously created model, with that table as the input and a new BigQuery table as the output. Choose a small machine type and 2 compute nodes. Don’t turn on Model Monitoring yet, as that’s for the next challenge.

Note
The batch inferencing will take roughly 10 minutes. Most of that is the overhead of starting the cluster, so increasing the number of instances won’t help with the small table we’re using.

Success Criteria

There’s a properly structured input table in BigQuery with at most 10K rows.
There’s a successful Batch Inference job.
There are predictions in a new BigQuery table.
No code was modified.

Tips

The pipeline that we’ve used in the previous challenge contains a task to prepare the data using BigQuery, have a look at that for inspiration.
As output you can also choose a BigQuery dataset, in which case a new output table with the right schema will be created for you automatically.
Make sure that the input table has exactly the same number of input columns as the model requires. Remember that training needs extra data which is not an input for the model at inferencing time ;)
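One way to approach the input table is a single BigQuery statement that copies a source table, drops the training-only columns and caps the row count. The sketch below only builds that SQL string. The table names and the `target` column are hypothetical placeholders, so substitute the names from your own project and from the data preparation task in the pipeline.

```python
def build_input_table_query(source_table, target_table, training_only_columns, max_rows=10_000):
    """Build a BigQuery statement that creates a capped, inference-ready copy of a table."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    if training_only_columns:
        # SELECT * EXCEPT (...) keeps every column apart from the listed ones.
        excluded = ", ".join(f"`{column}`" for column in training_only_columns)
        select_list = f"* EXCEPT ({excluded})"
    else:
        select_list = "*"
    return (
        f"CREATE OR REPLACE TABLE `{target_table}` AS\n"
        f"SELECT {select_list}\n"
        f"FROM `{source_table}`\n"
        f"LIMIT {max_rows}"
    )


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    print(
        build_input_table_query(
            source_table="my-project.my_dataset.prepared_data",
            target_table="my-project.my_dataset.batch_input",
            training_only_columns=["target"],
        )
    )
```

The printed statement can be pasted into the BigQuery console. Once it has run, check the schema of the new table against the inputs the model expects before starting the Batch Inference job.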

Learning Resources

Creating BigQuery datasets
Creating BigQuery tables
BigQuery public datasets
Agent Platform Batch Predictions
