Deploying a Custom Model on Vertex AI

Mehar Ramzan
8 min read

Vertex AI is a machine learning platform provided by Google Cloud. It is designed to help organizations build, deploy, and manage machine learning models and pipelines. Vertex AI offers a wide range of tools and services to simplify the end-to-end machine learning workflow.

Here are the steps to deploy a custom model on Vertex AI:

  1. Create FastAPI

  2. Create and upload a docker image

  3. Deploy Model to the Endpoint

  4. Call Endpoint

Prerequisites:

Before we get started, please make sure you meet the following requirements:

  1. A basic understanding of machine learning and Python.

  2. gcloud and docker installed on your system.

1. Create FastAPI

Here is a simple FastAPI app with two endpoints:

  • /health (HTTP GET):

    This endpoint is a health check, and when accessed, it returns a JSON response with a health status of "Server is up" and an HTTP status code of 200.

  • /predict (HTTP POST):

    The '/predict' endpoint is designed to handle HTTP POST requests and takes user-specific data in the request body. Specifically, it extracts the user's name from the request, personalizes a greeting message, and returns it as part of the response.

      import json
      import time
      from fastapi import FastAPI, Request, Response
    
      APP = FastAPI()
    
      @APP.get('/health')
      def get_health():
          return Response(
              content=json.dumps({'health_status': 'Server is up'}),
              status_code=200
          )
    
      def _predict(request_body):
          instances = request_body['instances']
          name = instances[0]['name'] 
          message = f"Hello {name}, have a good day"
          return message
    
      @APP.post('/predict')
      async def predict(request: Request):
          tic = time.time()
          try:
              body = await request.json()
              message = _predict(body)
              response = {
                  'predictions': [message],
                  'time_taken': time.time() - tic
              }
              return Response(
                  status_code=200,
                  content=json.dumps(response)
              )
          except Exception as e:
              response = {
                  'predictions': [{
                      'error': str(e),
                      'time_taken': time.time() - tic
                  }]
              }
              return Response(
                  status_code=500,
                  content=json.dumps(response)
              )
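
    You can sanity-check the app locally before building the container. Here is a minimal sketch, assuming the code above is saved as app.py in the current directory and that fastapi and uvicorn are installed:

      # run_local.py -- a quick, hypothetical helper for local testing
      import uvicorn

      if __name__ == "__main__":
          # Serve the FastAPI app defined in app.py on the same port the container will use
          uvicorn.run("app:APP", host="0.0.0.0", port=8080)

    With the server running, http://localhost:8080/health should return the health status.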
    

    Request and Response Format:
    Both the request and the response must follow a predefined format so that Vertex AI can communicate with your container's endpoints.

    • Request format :

      The request sent to the Vertex AI endpoint should conform to the following structure, where 'instances' is a list of dictionaries with key-value pairs:

        {
            "instances": [
                {
                    "key": "value"
                }
            ]
        }
      
    • Response format :

      The response returned by the Vertex AI endpoint follows a consistent structure, with 'predictions' containing the output data:

        {
            "predictions": [
                {
                    "key": "value"
                }
            ]
        }
      

2. Create, Test and Upload Docker Container Image

Docker is a platform for creating lightweight, portable containers that package applications and their dependencies, ensuring consistent and efficient deployment across different environments. It simplifies software development, enhances scalability, and is a cornerstone of modern containerization and cloud computing.

  1. Create Dockerfile

    A Dockerfile is a script used to create a Docker container image. It specifies the base image, installs dependencies, copies files, sets environment variables, and defines how the container should run.

     # Pull a base image.
     FROM python:3.8
    
     # Install requirements
     RUN pip install fastapi uvicorn
    
     # Copy model files
     COPY . .
    
     EXPOSE 8080
     CMD ["uvicorn", "app:APP", "--host", "0.0.0.0", "--port", "8080"]
    

    FROM python:3.8: This line specifies the base image for your Docker container. In this case, it's using an official Python 3.8 image as the starting point. The base image provides the foundational operating system and environment for your application.

    RUN pip install fastapi uvicorn: This line instructs Docker to run a command inside the container. It uses the pip package manager to install the required Python packages, FastAPI, and Uvicorn. These packages are necessary for your application to run.

    COPY . .: This command copies the files from your local directory into the container. The first dot (.) represents the source directory on your local machine, and the second dot (.) represents the destination directory within the container. This is how you include your application code and any other necessary files in the container image.

    EXPOSE 8080: This line informs Docker that the container will listen on port 8080. It doesn't publish the port to the host machine; it's just a way to document which ports the container will use. To map this port to the host, you'd do that when running the container with the -p or -P option.

    CMD ["uvicorn", "app:APP", "--host", "0.0.0.0", "--port", "8080"]: This specifies the command that will be executed when the container starts. In this case, it's running Uvicorn to serve your FastAPI application. It tells Uvicorn to listen on all available network interfaces (0.0.0.0) at port 8080.

  2. Build and Test Docker Container

    • Build a Docker Container:

      To build a Docker container using the Dockerfile (assuming your Dockerfile is in the current directory), run the following command:

        docker build -t your-container-name:tag .
      

      For Example:

        docker build -t test:v1 .
      
    • Run the Docker Container:

      To run the Docker container, use the following command:

        docker run -it -p host-port:container-port your-container-name:tag
      

      For Example:

        docker run -it -p 8080:8080 test:v1
      
    • Test Docker Container:

      When the Docker container is running, open another terminal and execute the following curl command:

                      curl --request POST \
                      --url http://localhost:8080/predict \
                      --header 'Content-Type: application/json' \
                      --data '{
                        "instances": [
                          {
                            "name": "ramzan"
                          }
                        ]
                      }'
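
      If you prefer Python over curl, the same request can be sent with the requests library. A minimal sketch (it assumes the requests package is installed and the container from the previous step is still running on localhost:8080):

        # test_local.py -- hypothetical local test against the running container
        import requests

        payload = {"instances": [{"name": "ramzan"}]}
        resp = requests.post("http://localhost:8080/predict", json=payload, timeout=10)

        print(resp.status_code)  # expected: 200
        print(resp.json())       # expected: {'predictions': ['Hello ramzan, have a good day'], 'time_taken': ...}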
      
    • Upload Docker Container Image
      Tag your Docker container:

        docker tag your-container-name:tag gcr.io/project_id/your-container-name:tag
      

      Push your Docker container image to the Container Registry (you may need to run gcloud auth configure-docker first so Docker can authenticate with gcr.io):

        docker push gcr.io/project_id/your-container-name:tag
      

3. Deploy Model to the Endpoint

  • Upload Model to Model Registry

      gcloud beta ai models upload \
        --region=$REGION \
        --display-name=$MODEL_NAME \
        --container-image-uri=gcr.io/$PROJECT_ID/$IMAGE_NAME:tag \
        --container-ports=8080 \
        --container-health-route=/health \
        --container-predict-route=/predict
    
    • gcloud beta ai models upload: This is the main command to upload a custom model to the Vertex AI Model Registry.

    • --region=$REGION: Specifies the Google Cloud region where you want to deploy the model. Replace $REGION with the desired region, such as us-central1.

    • --display-name=$MODEL_NAME: Sets the display name for the model. Replace $MODEL_NAME with the name you want to give to your model.

    • --container-image-uri=gcr.io/$PROJECT_ID/$IMAGE_NAME:tag: Specifies the container image URI for your model. Replace $PROJECT_ID with your Google Cloud project ID, $IMAGE_NAME with the name of your container image, and tag with the version or tag of the image you want to use.

    • --container-ports=8080: Specifies the port on which your model's service will listen. In this case, it's set to port 8080.

    • --container-health-route=/health: Defines the route where the health check endpoint is available in your container. The health check route is typically used to determine the health of the service.

    • --container-predict-route=/predict: Specifies the route where prediction requests will be sent to your model. In this example, prediction requests are expected to be sent to the /predict route.

  • Get the Model ID

      gcloud beta ai models list \
        --region=$REGION \
        --filter=display_name=$MODEL_NAME


    This command returns details such as the model_id and display_name. Make sure to use the same region you used in the previous command when uploading the model to the Model Registry.

  • Create Endpoint

      gcloud beta ai endpoints create \
        --region=$REGION \
        --display-name=$ENDPOINT_NAME
    

    This command creates the model endpoint:

    • --region=$REGION: Specifies the region where the endpoint should be created.

    • --display-name=$ENDPOINT_NAME: Assigns a display name to the created endpoint.

    • Get the endpoint ID by running this command:
      gcloud ai endpoints list --region=$REGION

  • Deploy Model to the Endpoint

    Use the following command to deploy the model to the endpoint:

      # If you don't want GPUs
      gcloud beta ai endpoints deploy-model $ENDPOINT_ID \
        --region=$REGION \
        --model=$MODEL_ID \
        --display-name=$DEPLOYED_MODEL_NAME \
        --machine-type=n1-standard-4 \
        --min-replica-count=1 \
        --max-replica-count=2 \
        --traffic-split=0=100
    
      # If you want GPUs
      gcloud beta ai endpoints deploy-model $ENDPOINT_ID \
        --region=$REGION \
        --model=$MODEL_ID \
        --display-name=$DEPLOYED_MODEL_NAME \
        --machine-type=n1-standard-4 \
        --accelerator=count=1,type=nvidia-tesla-t4 \
        --min-replica-count=1 \
        --max-replica-count=2 \
        --traffic-split=0=100
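
    If you prefer Python over the gcloud CLI, the same flow can be scripted with the google-cloud-aiplatform SDK. This is a rough sketch that mirrors the commands above; the project ID, region, image URI, and display names are placeholders you would replace with your own values:

      # deploy_vertex.py -- sketch of step 3 using the Vertex AI Python SDK
      from google.cloud import aiplatform

      # Placeholder values -- replace with your own project, region, image, and names
      aiplatform.init(project="your-project-id", location="us-central1")

      # Upload the custom container as a model (like `gcloud beta ai models upload`)
      model = aiplatform.Model.upload(
          display_name="your-model-name",
          serving_container_image_uri="gcr.io/your-project-id/your-container-name:tag",
          serving_container_ports=[8080],
          serving_container_health_route="/health",
          serving_container_predict_route="/predict",
      )

      # Create an endpoint (like `gcloud beta ai endpoints create`)
      endpoint = aiplatform.Endpoint.create(display_name="your-endpoint-name")

      # Deploy the model to the endpoint (like `gcloud beta ai endpoints deploy-model`);
      # for a GPU, add accelerator_type="NVIDIA_TESLA_T4", accelerator_count=1
      model.deploy(
          endpoint=endpoint,
          deployed_model_display_name="your-deployed-model-name",
          machine_type="n1-standard-4",
          min_replica_count=1,
          max_replica_count=2,
          traffic_percentage=100,
      )

      print(model.resource_name)     # full model resource name
      print(endpoint.resource_name)  # full endpoint resource name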
    

4. Call Endpoint

Replace ${PROJECT_ID} and ${ENDPOINT_ID} with your project ID and endpoint ID, and point ${INPUT_DATA_FILE} at a JSON file that follows the request format described earlier:

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/${ENDPOINT_ID}:predict \
-d "@${INPUT_DATA_FILE}"

Example

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/${ENDPOINT_ID}:predict \
-d '{
          "instances": [
            {
              "name": "ramzan"
            }
          ]
        }'
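
If you would rather call the endpoint from Python, here is a minimal sketch using the Vertex AI SDK (it assumes google-cloud-aiplatform is installed and that you are authenticated, for example via gcloud auth application-default login):

# call_endpoint.py -- sketch of calling the deployed endpoint with the Python SDK
from google.cloud import aiplatform

# Placeholder values -- replace with your own project, region, and endpoint ID
aiplatform.init(project="your-project-id", location="us-central1")
endpoint = aiplatform.Endpoint("your-endpoint-id")

response = endpoint.predict(instances=[{"name": "ramzan"}])
print(response.predictions)  # e.g. ['Hello ramzan, have a good day']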

Conclusion

Summing up our journey, we've walked through the process of deploying a custom machine learning model on Google's Vertex AI platform.

Starting with FastAPI, we constructed a virtual gateway for our model, defining endpoints to check its health and receive personalized predictions. These endpoints act like doors, ensuring smooth communication between users and our model's capabilities.

Docker played a pivotal role as a containerization tool. It bundled our model and its necessities into a portable unit, ensuring consistency and hassle-free deployment across various environments. The Dockerfile acted as a blueprint, orchestrating the assembly of our model's environment and dependencies.

Thorough testing validated the functionality and readiness of our Docker container. This culminated in its upload to Google's Container Registry, granting accessibility and scalability within the Vertex AI infrastructure.

The deployment phase involved meticulous orchestration using Google Cloud's command-line interface. From registering our model in the Model Registry to crafting endpoints and deploying to the Vertex AI environment, each step was pivotal in establishing a functional and deployable model.

Whether leveraging GPU acceleration or standard compute resources, the deployment to the endpoint marked the convergence of our efforts. Our model found its place within the Vertex AI ecosystem, poised for inference and seamless interaction.

Finally, the endpoint's functionality was demonstrated via a straightforward cURL command, affirming its readiness to process user-specific data and deliver personalized responses.

In essence, this deployment journey embodies the fusion of sophisticated technology and methodical implementation. Empowered by a suite of tools and guided by a systematic approach, it showcases the potential for innovation and efficiency within the machine-learning realm, fostering an environment ripe for transformative advancements.
