What is the Vertex AI Model Garden - a practical example: Deploying Llama 3.1 and Claude 3.5 Sonnet


NOTE: This blog is an introduction to the Google Vertex AI Model Garden. We are not affiliated with Google. This article is also not to be understood as a review, but as informational material based on our experience using the platform.

Google Cloud's Vertex AI Model Garden is an established addition to their AI toolkit, designed to streamline the process of deploying pre-trained AI models. It's part of the broader Vertex AI ecosystem and aims to simplify the often complex task of integrating AI capabilities into applications.

The Model Garden is Google's answer to the growing demand for accessible AI model deployment. It's particularly interesting because it builds on Google's quite strong Google Cloud Platform and offers a remarkable collection of pretrained models out of the box - from state-of-the-art LLMs like Claude 3.5 Sonnet, to multimodal models like CLIP and image generation models like Stable Diffusion.

In this article, we'll take a practical look at what Vertex AI Model Garden offers. We'll explore how it can potentially speed up the development of AI-powered applications, and we'll walk through a hands-on example of selecting and deploying a model for a specific task. Whether you're a seasoned AI developer or just starting to explore the field, this guide should give you a clear picture of how Model Garden fits into the AI development landscape.

What is the Google Vertex AI Model Garden

Google Vertex AI Model Garden is a comprehensive platform within Google Cloud's Vertex AI suite, designed to streamline the process of working with AI models. It offers a huge collection of pre-trained models from Google, OpenAI, Mistral, Anthropic, Meta and select open-source communities, aiming to simplify the integration of AI capabilities into various applications.

Summary: Features of the Vertex AI Model Garden

Model Selection: The platform provides a range of AI models, categorized as foundation models, fine-tunable models, and task-specific solutions. Users can browse these models based on their intended use case.

Customization Options: Vertex AI Model Garden allows for some level of model customization. Users can fine-tune certain models using their own data, which can be useful for adapting models to specific tasks.

Deployment Process: The platform integrates with Google Cloud services, which can streamline the process of deploying models as API endpoints. This integration is intended to work within existing Google Cloud infrastructure.

Open Source Inclusion: Along with Google's proprietary models, the platform includes some open-source models. This broadens the range of available options for users.

Interface: Vertex AI Model Garden has a user interface designed to help users navigate and manage models. It includes filtering options to help users find models that match their needs. Deploying a model is designed to be done with a few clicks.

In essence, Google Vertex AI Model Garden aims to be a significant step towards making AI more accessible and manageable. It offers a solid foundation for businesses and developers looking to use AI without having to invest in model operations.

Vertex AI Model Garden

Why use the Vertex AI Model Garden to deploy models?

Before diving into the nitty-gritty details of how to actually use the Model Garden, let's first consider why anyone would use it.

  1. Huge Model Repository: Vertex AI Model Garden provides a vast collection of models from Google, from other commercial model providers like OpenAI and Mistral, and from the open-source community. This repository includes foundation models, fine-tunable models, and task-specific solutions, covering a wide range of applications and industries. The availability of such a diverse set of models in one place simplifies the process of finding the right model. Furthermore, it drastically reduces the complexity of your deployment pipeline: you have one single place from which to invoke all the models you need.

  2. Ease of Customization and Fine-Tuning: One of the standout features of Vertex AI Model Garden is its support for model customization and fine-tuning. Users can adapt pre-trained models to their specific requirements using AI Studio, Vertex AI API, or custom notebooks. This flexibility allows for potentially significant increases in model performance in specific use-cases, without the need for extensive machine learning expertise.

  3. Seamless Integration with Google Cloud Services: Vertex AI Model Garden integrates seamlessly with Google Cloud’s suite of services. This integration allows users to deploy models quickly and efficiently as API endpoints, making use of the robust infrastructure and scalability of Google Cloud. Additionally, these models can scale to meet demand while maintaining performance and reliability.

  4. Simplified Deployment Process: The platform offers a user-friendly interface that simplifies the deployment process, making it accessible even to those with minimal AI experience. With features like one-click deployment and pre-configured settings for various models, users can deploy AI models with ease, reducing the time and resources typically required for such tasks.

  5. Enhanced Security and Compliance: Deploying models within the Vertex AI ecosystem ensures that they benefit from Google Cloud’s advanced security features and compliance certifications. This is important for organizations that handle sensitive data and must adhere to strict regulatory requirements. The secure environment provided by Google Cloud helps safeguard data and models throughout the deployment lifecycle.

    This is one of the major benefits of the Model Garden. Users get Llama, Claude, Gemini, CLIP, etc. models all from one provider - Google. They only need to trust one company - and don't need to perform due diligence on every model provider they use.

  6. Potential cost savings: As the whole deployment pipeline is provided by the Model Garden infrastructure, the costs of maintaining the "auxiliary infrastructure" around model deployment are reduced. Furthermore, the Model Garden provides an API service deployment option for some models which uses simple pay-per-use pricing - this might significantly reduce costs for some users.

In layman's terms, the Vertex AI Model Garden is a one-stop shop for AI model deployments. It's great that so many models are provided in a single place, and it drastically reduces the effort required for security, data privacy and compliance.

How to use the Vertex AI Model Garden to deploy models

Ok, enough theory, let's get started. There are two ways to deploy models using the Vertex AI Model Garden.

First, the more traditional way, and the way most models are available: deploying the model on a virtual machine. In this deployment model, you select the model you want to deploy. Upon clicking 'Deploy', the Model Garden automatically creates a virtual machine as well as an API endpoint serving the model. Most of the Llama models as well as the Mistral models are served this way.

It's worth noting that you are not responsible for maintaining the virtual machine - that's all handled by Google. Nevertheless, you are paying for a full virtual machine on a per-hour basis.

The second - and arguably more convenient - way is to use the Model Garden's API service deployments. In this deployment model, you simply activate the model you want to use. Google provides an API token for you, which you can then use to send requests to the model endpoint. You pay on a per-request basis.
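To make the trade-off between the two pricing models concrete, here is a small back-of-the-envelope sketch. All numbers (hourly VM rate, per-request price, request volume) are made-up placeholders, not actual Google Cloud prices - check the pricing pages for real figures:

```python
# Rough break-even sketch: per-hour VM deployment vs. pay-per-request API service.
# All prices below are hypothetical placeholders, not real Google Cloud prices.

VM_PRICE_PER_HOUR = 3.00      # hypothetical: GPU-backed VM, billed while deployed
API_PRICE_PER_REQUEST = 0.01  # hypothetical: pay-per-use API service

def monthly_vm_cost(hours: float = 730.0) -> float:
    """Cost of keeping the VM deployment up for a month (~730 hours)."""
    return VM_PRICE_PER_HOUR * hours

def monthly_api_cost(requests_per_month: int) -> float:
    """Cost of the API service for a given request volume."""
    return API_PRICE_PER_REQUEST * requests_per_month

def break_even_requests() -> int:
    """Request volume at which the always-on VM becomes the cheaper option."""
    return int(monthly_vm_cost() / API_PRICE_PER_REQUEST)

if __name__ == "__main__":
    print(f"VM:  ${monthly_vm_cost():.2f}/month regardless of traffic")
    print(f"API: ${monthly_api_cost(10_000):.2f}/month at 10k requests")
    print(f"Break-even at ~{break_even_requests():,} requests/month")
```

With these placeholder numbers, low-traffic workloads clearly favor the API service, while sustained high-volume traffic can make the dedicated VM cheaper.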

Deploy Claude 3.5 Sonnet on the Google Vertex AI Model Garden

Let's now start with our first model deployment via the Model Garden.

As the latest version of Claude, 3.5 Sonnet, is getting rave reviews, let's use this model as an example.

  1. Navigate to the Vertex AI Model Garden.

  2. Use the search box to look for the model you want to deploy - Claude 3.5 in our example case.

    Search for Claude 3.5 Sonnet

  3. In the next screen, click on "Enable" to enable this model. This is only required the first time you deploy this specific model.

    Enable Claude 3.5 Sonnet

  4. For some models, Google requires us to fill out a form to get access to the model. This is because Google simply acts as a hosting provider for the third-party companies. Anthropic, for example, requires Google to let Anthropic approve usage of Claude 3.5 on a per-user basis. So, fill out the form if required and wait until you are confirmed.

  5. Upon completing the form, you'll get pricing information. Accept the terms and conditions and click "Agree". Note that you don't have to pay anything just yet - only when using the model.

    Claude 3.5 Sonnet Pricing

  6. Now, in the final screen, click on "Manage on Vertex AI" - and that's all there is to it.

We successfully deployed a model endpoint for using Claude 3.5 Sonnet via Google Cloud. In a technical sense, we did not really deploy infrastructure - from what we can see, we simply got access to an endpoint that Google already serves.

In the next screen, Vertex AI provides you with information on how to use this model endpoint. For example, using Python you can simply create chat inferences:

from anthropic import AnthropicVertex

LOCATION = "europe-west4"  # choose a region where Claude 3.5 Sonnet is available, e.g. "us-east5"

client = AnthropicVertex(region=LOCATION, project_id="PROJECT_ID")

message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Send me a recipe for banana bread.",
        }
    ],
    model="claude-3-5-sonnet@20240620",
)
print(message.model_dump_json(indent=2))

Using Llama 3.1 on the Model Garden API service

Using the Llama 3.1 API service on the Model Garden works in a similar fashion - and is even easier.

Navigate to the Vertex AI Model Garden Llama 3.1 API Service. Click on "Enable" - and that's it.

Again, similar to the Claude 3.5 Sonnet model, you'll get access to the model endpoint. You can then use this endpoint to send requests to the model. The Model Garden does a good job of clearly describing how to use this endpoint.

Using the Llama 3.1 API Service
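As a rough sketch of what such a request can look like: the Llama 3.1 API service exposes an OpenAI-compatible chat completions endpoint. The project ID, region and model name below are placeholders and assumptions - copy the exact values shown on your own Model Garden screen:

```python
# Sketch: calling the Llama 3.1 API service through its OpenAI-compatible
# chat completions endpoint. Project ID, region and model name are
# placeholders - verify them against your Model Garden screen.

def build_url(project_id: str, region: str) -> str:
    """Assemble the chat completions URL for the API service endpoint."""
    return (
        f"https://{region}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{region}/endpoints/openapi/chat/completions"
    )

def build_payload(model: str, prompt: str, max_tokens: int = 512) -> dict:
    """OpenAI-style chat payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

if __name__ == "__main__":
    # Third-party imports kept here so the helpers above stay importable
    # without the Google SDK installed.
    import google.auth
    import google.auth.transport.requests
    import requests

    # Fetch an access token from the ambient gcloud credentials.
    credentials, project_id = google.auth.default()
    credentials.refresh(google.auth.transport.requests.Request())

    response = requests.post(
        build_url(project_id, "us-central1"),
        headers={"Authorization": f"Bearer {credentials.token}"},
        json=build_payload(
            "meta/llama3-405b-instruct-maas",  # model name as listed (verify)
            "Send me a recipe for banana bread.",
        ),
    )
    print(response.json()["choices"][0]["message"]["content"])
```

Because the endpoint speaks the OpenAI chat format, existing OpenAI-compatible client code can usually be pointed at it with minimal changes.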

Deploy a CLIP model on the Google Vertex AI Model Garden

Now that we've explored two of the more prominent model deployments via the API service, let's deploy a model on a virtual machine.

  1. Navigate to the Vertex AI Model Garden.

  2. Search for the CLIP model. CLIP is a model capable of classifying images without first needing to train on labeled data.

    Search for CLIP

  3. Click on "Deploy". A side-panel will open up.

  4. In the side-panel:

    • provide a name for your model and model endpoint
    • select the virtual machine size. You mainly choose how many and which GPUs you want to use. The Model Garden only offers a curated list of sizes to choose from, so no matter what you select, you can be sure the model is going to work on that machine.

    Deploy CLIP

  5. After clicking "Deploy" the resources will be provisioned. This can take a while.

After the resources are provisioned, head over to "My endpoints". At the top right, select the "Region" you chose during model deployment. There you'll find your deployed model endpoint.

Model Garden deployed endpoint

Click on the model name to get more information about the deployed model and how to use it.
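Querying such a deployed endpoint might look like the following sketch. The endpoint ID, project, region and - importantly - the instance schema are assumptions; check the sample request shown on your endpoint's detail page for the exact format your deployment expects:

```python
# Sketch: sending a zero-shot classification request to a CLIP endpoint
# deployed via the Model Garden. Endpoint ID, project, region and the
# instance schema are assumptions - verify against your endpoint page.
import base64

def build_instance(image_bytes: bytes, labels: list[str]) -> dict:
    """Base64-encode the image and join the candidate labels into a
    single instance dict (hypothetical schema - verify for your server)."""
    return {
        "image": base64.b64encode(image_bytes).decode("utf-8"),
        "text": ",".join(labels),
    }

if __name__ == "__main__":
    # Lazy import so the helper above stays usable without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project="your-project-id", location="us-central1")  # placeholders
    endpoint = aiplatform.Endpoint("your-endpoint-id")  # from "My endpoints"

    with open("cat.jpg", "rb") as f:
        instance = build_instance(f.read(), ["a cat", "a dog", "a car"])

    prediction = endpoint.predict(instances=[instance])
    print(prediction.predictions)
```

The same `aiplatform.Endpoint(...).predict(...)` pattern applies to any model deployed on a virtual machine through the Model Garden; only the instance schema differs per model.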

NOTE: In this deployment method, the Model Garden explicitly sets up a virtual machine to run your model on. This provides the highest level of control and probably the best compliance story - however, it also means you pay per hour. So, no matter whether you actually use the model or not, you'll be billed for as long as the deployment is active.

Deploying models from Hugging Face on the Google Vertex AI Model Garden

As our last example, we'll deploy a model from the Hugging Face model hub.

Yes that's right, the Model Garden also supports deploying models directly from Hugging Face - making the pool of supported models almost endless.

  1. On the very top of the Model Garden interface, click on "Deploy from Hugging Face". A side panel will open up.

  2. Enter the Hugging Face model name, your Hugging Face API token and specify the size of the virtual machine you want to use.

    Deploy from Hugging Face

  3. Click on "Deploy" - and once again that's all that needs to be done.

If you don't have a Hugging Face API token yet, you can create one in your Hugging Face account settings under "Access Tokens".

Conclusion

The Google Vertex AI Model Garden represents a good step forward in democratizing access to advanced AI models and streamlining their deployment process. As we've explored in this article, the platform offers a great combination of features that make it an attractive option for businesses and developers looking to use AI without the complexities traditionally associated with model deployment and management.

Key takeaways from our exploration of the Vertex AI Model Garden include:

  • Simplified Access: The platform provides a centralized repository of diverse AI models, ranging from state-of-the-art language models like Claude 3.5 Sonnet and Llama 3.1 to specialized models like CLIP, all accessible through a unified interface.

  • Flexible Deployment Options: Users can choose between API service deployments for models like Claude 3.5 and Llama 3.1, offering pay-per-use pricing, or virtual machine deployments for models like CLIP, providing more control and potentially better compliance options.

  • Security and Compliance: By consolidating multiple models under Google Cloud's infrastructure, the Model Garden simplifies the security and compliance landscape for organizations working with AI.

  • User-Friendly Interface: Even for complex models, the deployment process is simplified to a few clicks, making advanced AI accessible to a broader range of users.

By lowering the barriers to entry and simplifying the deployment process, Google is providing an intriguing environment for AI usage.

While the platform is not without its considerations – such as potential lock-in to the Google Cloud ecosystem and the need for careful management of virtual machine deployments – the benefits it offers in terms of accessibility, variety, and ease of use make it a compelling option for many use cases.

Further reading

------------------

Interested in how to train your very own Large Language Model?

We prepared a well-researched guide on how to use the latest advancements in Open Source technology to fine-tune your own LLM. This has many advantages, like:

  • Cost control
  • Data privacy
  • Excellent performance - adjusted specifically for your intended use

More information on our managed RAG solution? To Pondhouse AI

More tips and tricks on how to work with AI? To our Blog