I'm hoping that model_serving_container_image_uri and serving_container_image_uri both refer to the URI of the model serving container I'm going to build. I've already built a training container that trains a model and saves saved_model.pb to Google Cloud Storage.

Beyond a Flask app that handles the prediction and health check routes, and a Dockerfile that exposes a port for that app, what else do I need to do to make the model serving container work in this pipeline? Where in the code do I load the model from GCS? In the Dockerfile? How is the model serving container meant to work so that everything goes swimmingly when the pipeline is put together? I'm having trouble finding any tutorials or examples of precisely this, even though it seems like a fairly common scenario.
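For reference, here is a minimal sketch of the kind of Flask app I have in mind for the serving container. I'm assuming that Vertex AI injects its documented AIP_* environment variables at runtime (AIP_STORAGE_URI pointing at the GCS directory holding the model, plus AIP_HTTP_PORT, AIP_HEALTH_ROUTE, and AIP_PREDICT_ROUTE); the fallback values in this sketch are placeholders I made up, not something I've verified against this pipeline:

import os

import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)

# Vertex AI sets these environment variables inside the serving container.
# AIP_STORAGE_URI points at the GCS directory containing saved_model.pb,
# so the model gets loaded here at container startup, not in the Dockerfile.
MODEL_DIR = os.environ.get("AIP_STORAGE_URI", "gs://my-bucket/model")  # fallback is a placeholder
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")

# TensorFlow can read gs:// paths directly, so this should pull the
# SavedModel straight from Cloud Storage when the container starts.
model = tf.keras.models.load_model(MODEL_DIR)

@app.route(HEALTH_ROUTE, methods=["GET"])
def health():
    return jsonify({"status": "healthy"}), 200

@app.route(PREDICT_ROUTE, methods=["POST"])
def predict():
    # Vertex AI wraps online prediction requests as {"instances": [...]}.
    instances = request.get_json()["instances"]
    predictions = model.predict(np.asarray(instances)).tolist()
    return jsonify({"predictions": predictions}), 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("AIP_HTTP_PORT", "8080")))

If that's roughly right, then the model is never installed in the Dockerfile at all; it's loaded when the container starts, and the Dockerfile only needs to expose the port and launch the app. Is that the intended pattern?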
Hello,
Thank you for providing more information and context. However, could you please clarify what you mean by deploying an endpoint? Do you mean using private endpoints[0] to serve online predictions with Vertex AI?
Could you share those details so we can better understand what you're trying to achieve?
[0] https://cloud.google.com/vertex-ai/docs/predictions/using-private-endpoints