How to Deploy LLaVA on Amazon SageMaker for Real-Time Image Analysis

Introduction

  • LLaVA (Large Language and Vision Assistant) is an open-source large multimodal model (LMM), developed through a collaboration between Microsoft and several research institutions. One of its strengths is image analysis: it can infer the cultural and contextual background of an image. This article outlines the steps to deploy the LLaVA-v1.5-7B model as an endpoint on Amazon SageMaker.

Why LLaVA?

  • The vision-integrated GPT-4o declines requests for detailed analysis of individuals in images, as the refusal messages below show. In contrast, LLaVA provides a specific analysis of the person in the image.
[1] I'm unable to provide the detailed analysis you requested.
[2] Unfortunately, without the ability to see the specific facial features, a detailed analysis of the face is not possible.
[3] I cannot provide a detailed description of the facial features and expressions in the image.

How to Use the LLaVA Model

  • Use a cloud provider that hosts the LLaVA model behind an API, such as OpenRouter or Fireworks AI. This is the fastest and most convenient method: it requires no infrastructure setup, and you pay only for what you use.

  • Deploy the LLaVA model directly as an endpoint on Amazon SageMaker. This option is ideal when the security of input and output data is crucial.

Prerequisites

  • Hugging Face account

  • Hugging Face access token with Write permissions

  • AWS account

  • Amazon SageMaker IAM Role

Generating a Hugging Face Access Token

  • To download repositories from Hugging Face, create an access token. This guide creates one with Write permissions, although Read access is enough for downloading.
Hugging Face
→ [Settings]
→ [Access Tokens]
→ [New token]
# Create a new access token
→ Name: {your-access-token-name}
→ Type: [Write]
→ [Generate a token]

Preparing for Amazon SageMaker Deployment

  • To create an endpoint on Amazon SageMaker, download the Hugging Face repository, compress it into model.tar.gz, and upload the archive to Amazon S3.
# Install Hugging Face Hub
$ pip install huggingface_hub

# Login to Hugging Face Hub
$ huggingface-cli login
Enter your token (input will not be visible): {your-hugging-face-access-token}

# Install Git LFS; --skip-smudge makes the clone fetch only LFS pointer files
$ sudo apt install git-lfs
$ git lfs install --skip-smudge

# Clone the LLaVA repository (using a fork modified for SageMaker deployment)
$ git clone https://huggingface.co/anymodality/llava-v1.5-7b
$ cd llava-v1.5-7b

# Download the model weights, then restore the default Git LFS filters
$ git lfs pull
$ git lfs install --force

# Compress the model into model.tar.gz
$ tar -czvf model.tar.gz *

# Upload model.tar.gz to S3
$ aws s3 cp model.tar.gz s3://{your-s3-bucket}/llava-v1.5-7b
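
  • The packaging step above can also be scripted. The following is a minimal sketch using only Python's standard tarfile module; the function name build_model_archive is hypothetical and not part of the original walkthrough. It aims to mirror `tar -czvf model.tar.gz *`: hidden entries such as .git are skipped (as the shell glob would skip them), and files are stored without a leading directory component.

```python
import tarfile
from pathlib import Path


def build_model_archive(repo_dir: str, archive_path: str) -> list[str]:
    """Pack the top-level, non-hidden entries of repo_dir into a gzip tarball.

    Returns the names of the top-level entries that were added.
    """
    repo = Path(repo_dir)
    archive = Path(archive_path).resolve()
    added = []
    with tarfile.open(archive, "w:gz") as tar:
        for entry in sorted(repo.iterdir()):
            # Skip dot-entries (e.g. .git) and the archive itself
            # in case it is written inside repo_dir.
            if entry.name.startswith(".") or entry.resolve() == archive:
                continue
            # Directories are added recursively; arcname keeps paths relative.
            tar.add(entry, arcname=entry.name)
            added.append(entry.name)
    return added


# Example call (hypothetical paths, adjust to your clone location):
# build_model_archive("llava-v1.5-7b", "llava-v1.5-7b/model.tar.gz")
```

  • Keeping the entries at the root of the archive matters because the archive is unpacked as-is; a nested top-level folder would change every path inside it.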

Creating an Amazon SageMaker Endpoint

  • Once the model is uploaded to Amazon S3, create an endpoint on Amazon SageMaker. You can choose the instance type and the initial instance count based on your project requirements.
# Write the Amazon SageMaker endpoint creation script
$ nano deploy.py
from sagemaker.huggingface.model import HuggingFaceModel

# Point the model at the archive uploaded to S3
huggingface_model = HuggingFaceModel(
    model_data='s3://{your-s3-bucket}/llava-v1.5-7b',
    role='{your-sagemaker-iam-role-name}',
    transformers_version="4.28.1",
    pytorch_version="2.0.0",
    py_version='py310',
    model_server_workers=1  # one worker process per instance
)

# Create the endpoint; adjust the instance type and count to your workload
predictor = huggingface_model.deploy(
    endpoint_name='{your-sagemaker-endpoint-name}',
    initial_instance_count=1,
    instance_type="ml.g5.xlarge"
)

# Run the Amazon SageMaker endpoint creation script
$ python3 deploy.py
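
  • The deploy() call normally blocks until the endpoint is ready, but it can be useful to check readiness from a separate process before sending traffic. The sketch below polls the endpoint status through a client object passed in by the caller, assumed here to be boto3.client("sagemaker"); the helper name wait_until_in_service and its default intervals are hypothetical. boto3 also ships a built-in "endpoint_in_service" waiter that may serve the same purpose.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30,
                          timeout_seconds=1800, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService.

    Raises RuntimeError if the endpoint reports Failed, and TimeoutError
    if it is still not ready after roughly timeout_seconds.
    """
    waited = 0
    while True:
        status = sm_client.describe_endpoint(
            EndpointName=endpoint_name
        )["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Endpoint {endpoint_name} still {status} after {waited}s"
            )
        sleep(poll_seconds)
        waited += poll_seconds


# Example call (assumes boto3 is installed and AWS credentials are configured):
# import boto3
# wait_until_in_service(boto3.client("sagemaker"), "{your-sagemaker-endpoint-name}")
```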

build.gradle.kts

  • Once the endpoint is created, you can call it from your application. Add the following dependency to the build.gradle.kts file at the root of your Kotlin project.
dependencies {
    implementation("software.amazon.awssdk:sagemakerruntime:2.26.3")
}

Invoking the Amazon SageMaker Endpoint

  • You can call the created endpoint from Kotlin code as shown below.
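import software.amazon.awssdk.core.SdkBytes
import software.amazon.awssdk.services.sagemakerruntime.SageMakerRuntimeClient
import software.amazon.awssdk.services.sagemakerruntime.model.InvokeEndpointRequest
import software.amazon.awssdk.services.sagemakerruntime.model.InvokeEndpointResponse
import software.amazon.awssdk.services.sagemakerruntime.model.ModelErrorException

fun main() {
    // Create the request payload with the question and image URL
    val requestPayload = """
        {
            "question":"Describe the image in detail.",
            "image":"{your-image-url}"
        }
    """.trimIndent()

    // Region and credentials are resolved from the default provider chain
    val sagemakerRuntimeClient = SageMakerRuntimeClient.builder()
        .build()

    val invokeEndpointRequest = InvokeEndpointRequest.builder()
        .endpointName("{your-sagemaker-endpoint-name}")
        .contentType("application/json")
        .body(SdkBytes.fromUtf8String(requestPayload))
        .build()

    // Invoke the SageMaker endpoint; a model-side error is logged and yields null
    val response: InvokeEndpointResponse? = try {
        sagemakerRuntimeClient.invokeEndpoint(invokeEndpointRequest)
    } catch (ex: ModelErrorException) {
        println(ex.originalMessage())
        null
    }

    // Print the response from the SageMaker endpoint
    println(response?.body()?.asUtf8String())
}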
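
  • For quick testing outside a Kotlin project, the same request can be sketched in Python. This is an illustrative equivalent rather than part of the original walkthrough: the helper names build_payload and ask_llava are hypothetical, and the client passed in is assumed to be boto3.client("sagemaker-runtime"). The payload keys "question" and "image" follow the Kotlin example above.

```python
import json


def build_payload(question: str, image_url: str) -> bytes:
    """Serialize the request body; json.dumps escapes quotes in the question."""
    return json.dumps({"question": question, "image": image_url}).encode("utf-8")


def ask_llava(runtime_client, endpoint_name: str, question: str, image_url: str) -> str:
    """Send one question about one image and return the raw response text."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(question, image_url),
    )
    # The response body is a stream; read it fully and decode as UTF-8.
    return response["Body"].read().decode("utf-8")


# Example call (assumes boto3 is installed and AWS credentials are configured):
# import boto3
# client = boto3.client("sagemaker-runtime")
# print(ask_llava(client, "{your-sagemaker-endpoint-name}",
#                 "Describe the image in detail.", "{your-image-url}"))
```

  • Building the body with json.dumps instead of a hand-written string avoids malformed JSON when the question contains quotes or other special characters.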
