Cloudflare AI Gateway

A common problem in AI adoption is that organizations cannot monitor or manage which AI applications their users access, or what data is being fed into the underlying models.

To help customers get a handle on this, Cloudflare developed AI Gateway. By connecting applications to a Cloudflare AI Gateway, customers can gather analytics on how those applications are used, log prompts and responses, and control how the applications scale through caching, rate limiting, request retries, dynamic model routing, and prompt guardrails.

As of today AI Gateway supports the following models:

How does it work? In the example below, an AI Gateway has already been created. When a gateway is created, it is assigned a unique URL, and all requests from the user are routed through that URL before being sent to the model. Here, the LLM request is sent through the gateway to the Llama 2 7B chat model (llama-2-7b-chat-int8) hosted on Cloudflare Workers AI:

curl --location 'https://gateway.ai.cloudflare.com/v1/430470778f0d9dbd89d3d69637b2ae9d/imageaigateway/workers-ai/@cf/meta/llama-2-7b-chat-int8' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data '{ "prompt": "What are all products and services does Cloudflare provides?" }'
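The same request can be assembled programmatically. The following is a minimal Python sketch, not an official client: the function name `build_gateway_request` and its parameter names are hypothetical, and the `Bearer` prefix on the token is an assumption, since the curl example redacts the Authorization value. The URL layout is taken directly from the curl command above.

```python
import json
import urllib.request

GATEWAY_BASE = "https://gateway.ai.cloudflare.com/v1"


def build_gateway_request(account_id, gateway_id, model, prompt, api_token):
    """Build (but do not send) a POST request to a Workers AI model via AI Gateway.

    The path mirrors the curl example: <base>/<account>/<gateway>/workers-ai/<model>.
    """
    url = f"{GATEWAY_BASE}/{account_id}/{gateway_id}/workers-ai/{model}"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: a bearer-style API token; the source redacts this value.
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately from sending it keeps the URL construction easy to inspect and test.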

The response in JSON format is shown below:

{
  "result": {
    "response": "What a great question! Cloudflare is a fantastic company that offers a wide range of products and services to help businesses and individuals protect, accelerate, and secure their online presence. Here's a comprehensive list of what they provide:\n\n**Security**\n\n1. **DDoS Protection**: Cloudflare's DDoS Protection helps prevent and mitigate Distributed Denial of Service (DDoS) attacks, ensuring your website remains online and available to users.\n2. **Web Application Firewall (WAF)**: Their WAF provides advanced threat detection and protection against common web attacks, such as SQL injection and cross-site scripting (XSS).\n3. **SSL/TLS Encryption**: Cloudflare offers free SSL/TLS encryption, allowing you to secure your website with a trusted certificate and protect user data.\n4. **Bot Protection**: This service helps identify and block malicious bot traffic, preventing scraping, spam, and other unwanted activities.\n\n**Performance & Optimization**\n\n1. **Content Delivery Network (CDN)**: Cloudflare's CDN distributes your content across the globe, reducing latency, improving page load times, and enhancing user experience.\n2. **Caching**: Their caching solutions store frequently accessed content in edge locations, reducing the load on your origin servers and improving page load times.\n3. **Image",
    "usage": {
      "prompt_tokens": 55,
      "completion_tokens": 256,
      "total_tokens": 311
    }
  },
  "success": true,
  "errors": [],
  "messages": []
}
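To make the response envelope concrete, here is a small Python sketch that extracts the generated text and the token counts from a response shaped like the one above. The helper name `summarize_response` is hypothetical; the field names (`result`, `response`, `usage`, `success`, `errors`) are the ones visible in the sample.

```python
import json


def summarize_response(raw_json):
    """Extract the model text and token usage from a Workers AI style response."""
    payload = json.loads(raw_json)
    if not payload.get("success"):
        # Surface whatever the API reported instead of failing on a missing key.
        raise RuntimeError(f"request failed: {payload.get('errors')}")
    result = payload["result"]
    usage = result.get("usage", {})
    return {
        "text": result["response"],
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```

For the sample response, the three counters are 55, 256, and 311, so `total_tokens` is the sum of the prompt and completion counts.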

AI Gateway can then report on each request forwarded through it, providing visibility into the prompt and response, token usage, location, and model usage.

AI Gateway also provides usage analytics data:

If you are familiar with the caching and rate limiting available through Cloudflare's reverse proxy services, those same features are also available through AI Gateway. Caching avoids repeated API calls when multiple users submit the same prompt, and rate limiting caps excessive API queries; together they help keep model usage from causing large cost overruns.
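The cost effect of these two features can be illustrated with a toy proxy. This is a conceptual Python sketch only, not how AI Gateway is implemented or configured: the class `CachingRateLimitedProxy`, the fixed-window limiter, and the in-memory cache are all assumptions chosen for brevity.

```python
import hashlib
import json


class CachingRateLimitedProxy:
    """Toy stand-in for a gateway: fixed-window rate limit plus a prompt cache."""

    def __init__(self, call_model, max_requests, window_seconds):
        self.call_model = call_model          # function(model, prompt) -> str
        self.max_requests = max_requests      # requests allowed per window
        self.window_seconds = window_seconds
        self.cache = {}
        self.window_start = None
        self.window_count = 0

    def _cache_key(self, model, prompt):
        raw = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def _allow(self, now):
        # Start a new window when none exists or the current one has expired.
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.window_count = 0
        if self.window_count >= self.max_requests:
            return False
        self.window_count += 1
        return True

    def handle(self, model, prompt, now):
        # In this sketch, cached hits still count against the rate limit.
        if not self._allow(now):
            return {"status": 429, "body": None, "cached": False}
        key = self._cache_key(model, prompt)
        if key in self.cache:
            return {"status": 200, "body": self.cache[key], "cached": True}
        body = self.call_model(model, prompt)
        self.cache[key] = body
        return {"status": 200, "body": body, "cached": False}
```

With this shape, two users sending the same prompt trigger only one upstream model call, and a burst beyond `max_requests` is rejected before it reaches the model at all.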

Another feature of Cloudflare AI Gateway is dynamic routing for model selection and fallback. Dynamic routing flows can be created through a visual interface or a JSON-based configuration. Instead of hard-coding a single model, with dynamic routing you define a small flow that evaluates conditions, enforces quotas, and chooses models with fallbacks. The flow can be iterated on without touching application code. With dynamic routing, you can implement advanced use cases such as:

Directing different user segments (for example, paid and unpaid users) to different models
Restricting each user, project, or team with budget or rate limits
A/B and gradual rollouts
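The idea of a flow that evaluates a condition, enforces a quota, and picks a model with fallbacks can be sketched in a few lines of Python. This is an illustration of the concept only: the `ROUTES` table, the model names, the token budgets, and the `route_request` function are hypothetical and do not reflect AI Gateway's actual JSON configuration schema.

```python
# Hypothetical route table: segment -> ordered model candidates and a per-user budget.
ROUTES = {
    "paid": {"models": ["large-model", "small-model"], "budget_tokens": 100_000},
    "free": {"models": ["small-model"], "budget_tokens": 10_000},
}


def route_request(segment, user_id, prompt, tokens_used, call_model, routes=ROUTES):
    """Pick a model for the request, enforcing a budget and falling back on errors."""
    # Condition: choose the rule for this segment (unknown segments are treated as free).
    rule = routes.get(segment, routes["free"])

    # Quota: reject users who have already spent their token budget.
    if tokens_used.get(user_id, 0) >= rule["budget_tokens"]:
        return {"status": "rejected", "model": None, "output": None}

    # Fallback: try each candidate in order until one succeeds.
    for model in rule["models"]:
        try:
            output = call_model(model, prompt)
        except RuntimeError:
            continue
        return {"status": "ok", "model": model, "output": output}

    return {"status": "failed", "model": None, "output": None}
```

Because the routing rules live in data rather than in the calling code, changing a segment's model list or budget does not require touching the application, which is the property the text above describes.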

Lastly, AI Gateway can help keep users' AI interactions secure and lower-risk with Guardrails. Guardrails work by:

Intercepting interactions: AI Gateway proxies requests and responses, sitting between the user and the AI model.
Inspecting content:
    User prompts: AI Gateway checks prompts against safety parameters (for example, violence, hate, or sexual content). Based on your settings, prompts can be flagged or blocked before reaching the model.
    Model responses: Once processed, the AI model response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user.
Applying actions: Depending on your configuration, flagged content is logged for review, while blocked content is prevented from proceeding.
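The intercept, inspect, and act steps above can be sketched as a small Python wrapper around a model call. Everything here is an assumption made for illustration: the keyword-based `keyword_classifier` stands in for a real content-safety model, and the policy layout and function names are hypothetical rather than AI Gateway's configuration format.

```python
def keyword_classifier(text):
    """Hypothetical stand-in for a safety classifier: returns matched categories."""
    keywords = {"violence": ["attack"], "hate": ["slur"]}
    lowered = text.lower()
    return [cat for cat, words in keywords.items() if any(w in lowered for w in words)]


def apply_policy(stage, text, classify, policy, log):
    """Log flagged or blocked categories for this stage; return True if blocked."""
    blocked = False
    for category in classify(text):
        action = policy.get(stage, {}).get(category, "ignore")
        if action in ("flag", "block"):
            log.append({"stage": stage, "category": category, "action": action})
        if action == "block":
            blocked = True
    return blocked


def guarded_call(prompt, call_model, classify, policy, log):
    """Proxy a model call, inspecting the prompt first and the response second."""
    if apply_policy("prompt", prompt, classify, policy, log):
        return {"blocked": True, "stage": "prompt", "response": None}
    response = call_model(prompt)
    if apply_policy("response", response, classify, policy, log):
        return {"blocked": True, "stage": "response", "response": None}
    return {"blocked": False, "stage": None, "response": response}
```

A policy such as `{"prompt": {"violence": "block", "hate": "flag"}, "response": {"violence": "block"}}` would stop a violent prompt before the model is called, while a prompt flagged for hate would still be forwarded and recorded in `log` for review. In this sketch, blocked events are also written to the log.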