By Endri Veizaj

Monitoring LLMs With Grafana and CloudWatch



In the realm of cutting-edge technology and Generative AI, Large Language Models (LLMs) are the driving force behind remarkable advancements. These language models, such as ChatGPT and its successors, possess the ability to understand and generate human-like text on a scale never seen before. However, with great power comes the need for diligent monitoring.

Picture this scenario: you're utilizing an LLM to power your applications, enabling them to generate human-like text, answer questions, or even compose creative content. The performance of your LLM is pivotal, not just for user satisfaction but also for ensuring that the generated content aligns with your application's objectives. But how do you keep tabs on the intricate workings of an LLM at scale?


In this blog post, we dive into the world of monitoring LLMs, and we do so with a formidable trio: Grafana, AWS CloudWatch, and Lambda. These tools, when orchestrated strategically, enable you not only to generate LLM metrics but also to log them efficiently for analysis and visualisation.


Let's embark on the journey of LLM monitoring and harness the potential of Lambda, AWS CloudWatch, and Grafana to elevate your LLM-based applications.


LLM Metrics - Toxicity and Sentiment Analysis:


Monitoring Large Language Models (LLMs) involves capturing metrics that provide insight into various aspects of their performance. In our setup, we are particularly interested in two key metrics: Toxicity and Sentiment.


  • Toxicity Analysis with martin-ha/toxic-comment-model: Toxicity analysis is a crucial aspect of content moderation and of ensuring a safe and respectful user experience. To accomplish this, we employ the martin-ha/toxic-comment-model, a Hugging Face model that excels at identifying toxic comments within text.

  • Sentiment Analysis with the VADER Lexicon from NLTK: Understanding the sentiment of the text generated by our LLM is equally important, and for this we rely on the VADER lexicon from the Natural Language Toolkit (NLTK).


Both of these models are stored in an Amazon S3 bucket, eliminating the need to download them from the internet at runtime. This reduces latency and keeps all operations within the AWS network, optimising data transfer and resource utilisation. The sentiment and toxicity scores are logged in AWS CloudWatch, providing a centralised repository for monitoring and historical analysis.
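To make the publishing step concrete, here is a minimal sketch of how the Lambda function might ship the two scores to CloudWatch, assuming they have already been computed by the toxicity and sentiment models. The namespace, metric names, and helper names below are our own illustrative choices, not part of any AWS API:

```python
def build_metric_data(toxicity, sentiment, namespace="LLM/Monitoring"):
    """Shape the two scores into the kwargs that CloudWatch's
    PutMetricData API expects. The namespace and metric names here
    are illustrative choices, not AWS-prescribed values."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {"MetricName": "Toxicity", "Value": toxicity, "Unit": "None"},
            {"MetricName": "Sentiment", "Value": sentiment, "Unit": "None"},
        ],
    }


def score_log_lines(toxicity, sentiment):
    """Render each score as its own single-key line so a CloudWatch
    Logs Insights `parse` glob can pull the value back out later."""
    return [str({"toxicity": toxicity}), str({"Sentiment": sentiment})]


# Inside the Lambda handler, after running the two S3-hosted models,
# the publishing step would look roughly like:
#   cloudwatch = boto3.client("cloudwatch")
#   cloudwatch.put_metric_data(**build_metric_data(tox, sent))
#   for line in score_log_lines(tox, sent):
#       print(line)  # print() output lands in the function's log group
```

Emitting each score both as a custom metric and as a plain log line gives us two options later: native CloudWatch metric graphs, and ad-hoc Logs Insights queries over the raw log stream.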

AWS Architecture

Deploying Grafana in ECS from DockerHub:


Grafana is a powerful tool for visualizing and monitoring metrics, and deploying it in Amazon Elastic Container Service (ECS) can provide scalability and flexibility for your monitoring needs. Here's a step-by-step guide on how to set up Grafana using the official DockerHub image in ECS:


Step 1: Prepare Your ECS Cluster:


Before deploying Grafana, make sure you have an ECS cluster set up and ready. If you haven't created one yet, you can do so through the AWS Management Console.


ECS Cluster Configuration

Step 2: Create a Task Definition:


In ECS, a task definition defines the containers that will run as part of your application. To create a task definition for Grafana:

  1. Navigate to the ECS service in the AWS Management Console

  2. Click "Task Definitions" on the left sidebar

  3. Click the "Create new Task Definition" button

  4. Select the "Fargate" launch type.

  5. Configure the container settings:

    1. Name: Enter a name for your Grafana Container

    2. Image: Use the official Grafana image from DockerHub (e.g., `grafana/grafana`)

    3. Port Mapping: Expose the port Grafana uses (usually 3000)

  6. In the "Task Role" section, create or select an ECS task role with the permissions Grafana needs, including read access to CloudWatch. This role is what allows the Grafana container to query CloudWatch for metrics and logs.
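Put together, a minimal Fargate task definition matching the settings above might look like this (the account ID, role names, and CPU/memory sizes are placeholders; adjust them to your environment):

```json
{
  "family": "grafana",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "taskRoleArn": "arn:aws:iam::123456789012:role/grafana-cloudwatch-read",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "grafana",
      "image": "grafana/grafana",
      "essential": true,
      "portMappings": [{ "containerPort": 3000, "protocol": "tcp" }]
    }
  ]
}
```

Note the distinction between the two roles: the execution role lets ECS pull the image and write container logs, while the task role is the one Grafana itself assumes to read from CloudWatch.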


ECS Task Definition

Step 3: Configure your ECS Service:

  1. In the ECS service dashboard, click "Clusters" and select your cluster.

  2. Click "Create Service"

  3. Choose your task definition created in the previous step.

  4. Configure your network settings, including VPC, subnets, and security groups
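The same four steps can also be scripted with the AWS CLI; a sketch along these lines (the cluster name, subnet, and security-group IDs are placeholders):

```shell
aws ecs create-service \
  --cluster my-grafana-cluster \
  --service-name grafana \
  --task-definition grafana \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration \
    'awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],securityGroups=[sg-0123456789abcdef0],assignPublicIp=ENABLED}'
```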

Step 4: Access Grafana:


Once your service is up and running, you can access Grafana by finding the public IP of your ECS task. Grafana typically runs on port 3000, so you can access it by navigating to `http://<public-ip>:3000` in your web browser.


Step 5: Set up Grafana:

  1. When you first access Grafana, you'll be prompted to log in. The default username and password are "admin/admin". Be sure to change the password immediately.

  2. Configure a data source, such as AWS CloudWatch, to start collecting and visualizing your metrics.

  3. Create dashboards and panels to display the data you're monitoring.
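Instead of clicking through the UI, the CloudWatch data source can also be provisioned from a file baked into the image or mounted into the container. A minimal sketch, assuming the ECS task role supplies the credentials (the region below is a placeholder):

```yaml
# /etc/grafana/provisioning/datasources/cloudwatch.yaml
apiVersion: 1
datasources:
  - name: CloudWatch
    type: cloudwatch
    jsonData:
      authType: default         # use the credentials chain, i.e. the ECS task role
      defaultRegion: eu-west-1  # placeholder: use your own region
```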


Creating a CloudWatch Logs Insights Query:


Here's an example query that extracts relevant data from your CloudWatch Logs for analysis in Grafana.


fields @timestamp, @message
| parse @message "{'toxicity': *}" as toxicity
| parse @message "{'Sentiment': *}" as sentiment
| sort @timestamp desc
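The `parse` clauses in the query use a glob: the literal text around the `*` anchors the match, and `*` captures the value in between. A rough Python simulation of that behaviour (our own helper, not an AWS API) makes the mechanics explicit:

```python
import re


def parse_glob(message, pattern):
    """Rough simulation of a Logs Insights `parse` with a single `*`
    glob: the text around `*` is matched literally anywhere in the
    message, and `*` captures the value (None if there is no match)."""
    prefix, suffix = pattern.split("*")
    m = re.search(re.escape(prefix) + "(.*?)" + re.escape(suffix), message)
    return m.group(1) if m else None
```

So a log line like `{'toxicity': 0.93}` yields the string `"0.93"` for the toxicity field, while messages that don't contain the pattern simply leave the field empty.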


Grafana Dashboard for Toxicity and Sentiment Analysis for our LLM Models (example)

Summary:


In the realm of Generative AI, monitoring Large Language Models (LLMs) has become essential. This blog post has explored the world of LLM monitoring using powerful tools like Grafana, AWS CloudWatch, and Lambda.


We've also seen the advantages of deploying models in Amazon S3, which keeps data within the AWS network for optimal performance. Additionally, deploying Grafana in Amazon Elastic Container Service (ECS) using the official DockerHub image has allowed us to create dynamic dashboards for real-time monitoring.




