Public endpoints

Runpod public endpoints provide instant access to state-of-the-art AI models through simple API calls, with an API playground available through the Runpod Hub.

Available models

The following models are currently available:

Model	Description	Endpoint URL
Flux Dev	Offers exceptional prompt adherence, high visual fidelity, and rich image detail.	`https://api.runpod.ai/v2/black-forest-labs-flux-1-dev/`
Flux Schnell	Fastest and most lightweight FLUX model, ideal for local development, prototyping, and personal use.	`https://api.runpod.ai/v2/black-forest-labs-flux-1-schnell/`
Qwen3 32B AWQ	The latest LLM in the Qwen series, offering advancements in reasoning, instruction-following, agent capabilities, and multilingual support.	`https://api.runpod.ai/v2/qwen3-32b-awq/`

Public endpoint playground

The public endpoint playground provides a streamlined way to discover and experiment with AI models. The playground offers:

Interactive parameter adjustment: Modify prompts, dimensions, and model settings in real-time.
Instant preview: Generate images directly in the browser.
Cost estimation: See estimated costs before running generation.
API code generation: Create working code examples for your applications.

Access the playground

Navigate to the Runpod Hub in the console.
Select the Public endpoints section.
Browse the available models and select one that fits your needs.

Test a model

To test a model in the playground:

Select a model from the Runpod Hub.
Under Input, enter a prompt in the text box.
Enter a negative prompt if needed. Negative prompts tell the model what to exclude from the output.
Under Additional settings, you can adjust the seed, aspect ratio, number of inference steps, guidance scale, and output format.
Click Run to start generating.

Under Result, you can use the dropdown menu to show either a preview of the output, or the raw JSON.

Create a code example

After inputting parameters using the playground, you can automatically generate an API request to use in your application.

Select the API tab in the UI (above the Input field).
Using the dropdown menu, select the programming language (Python, JavaScript, cURL, etc.) and POST command you want to use (/run or /runsync).
Click the Copy icon to copy the code to your clipboard.

Make API requests to public endpoints

You can make API requests to public endpoints using any HTTP client. The endpoint URL is specific to the model you want to use. All requests require authentication using your Runpod API key, passed in the Authorization header. You can find and create API keys in the Runpod console under Settings > API Keys.

To learn more about the difference between synchronous and asynchronous requests, see Endpoint operations.
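
As a minimal sketch of the request structure described above, the following Python helper assembles the URL, headers, and JSON body for a public endpoint request without sending it. The helper name `build_request` and the `RUNPOD_API_KEY` environment variable are assumptions made for illustration; neither is part of a Runpod SDK.

```python
import json
import os


def build_request(endpoint_id: str, operation: str, payload: dict, api_key: str) -> dict:
    """Assemble the parts of a public endpoint request (hypothetical helper)."""
    if operation not in ("run", "runsync"):
        raise ValueError("operation must be 'run' or 'runsync'")
    return {
        # The endpoint URL is specific to the model, followed by the operation.
        "url": f"https://api.runpod.ai/v2/{endpoint_id}/{operation}",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        # Model parameters are wrapped in a top-level "input" object.
        "body": json.dumps({"input": payload}),
    }


# Assumption: the API key is stored in an environment variable of this name.
request = build_request(
    "black-forest-labs-flux-1-dev",
    "runsync",
    {"prompt": "A serene mountain landscape at sunset"},
    os.environ.get("RUNPOD_API_KEY", ""),
)
print(request["url"])
```

The returned pieces can be passed to any HTTP client, which keeps the sketch independent of a particular library.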

Synchronous request example

Here’s an example of a synchronous request to Flux Dev using the /runsync endpoint:

curl

curl -X POST "https://api.runpod.ai/v2/black-forest-labs-flux-1-dev/runsync" \
  -H "Authorization: Bearer RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "A serene mountain landscape at sunset",
      "width": 1024,
      "height": 1024,
      "num_inference_steps": 20,
      "guidance": 7.5
    }
  }'

Asynchronous request example

Here’s an example of an asynchronous request to Flux Dev using the /run endpoint:

curl

curl -X POST "https://api.runpod.ai/v2/black-forest-labs-flux-1-dev/run" \
  -H "Authorization: Bearer RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "A futuristic cityscape with flying cars",
      "width": 1024,
      "height": 1024,
      "num_inference_steps": 50,
      "guidance": 8.0
    }
  }'

You can check the status and retrieve results using the /status endpoint, replacing {job-id} with the job ID returned from the /run request:

curl

curl -X GET "https://api.runpod.ai/v2/black-forest-labs-flux-1-dev/status/{job-id}" \
  -H "Authorization: Bearer RUNPOD_API_KEY"
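
Checking the status of an asynchronous job usually means polling until it finishes. The sketch below shows one way to do that in Python. The function `wait_for_job` and its parameters are hypothetical, and only the `COMPLETED` status appears on this page; the other terminal status names are assumptions to verify against the Endpoint operations documentation.

```python
import time
from typing import Callable

# Only "COMPLETED" is shown on this page; the remaining names are assumptions.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Call fetch_status repeatedly until the job reaches a terminal status."""
    for _ in range(max_attempts):
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        # Wait before asking again so the API is not flooded with requests.
        time.sleep(interval_seconds)
    raise TimeoutError(f"job not finished after {max_attempts} status checks")
```

Here `fetch_status` is any zero-argument callable that returns the parsed JSON from the /status request, for example a lambda wrapping `requests.get(status_url, headers=headers).json()`. Passing the fetch step in as a callable keeps the polling logic independent of the HTTP client.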

Response format

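All endpoints return the same top-level JSON fields; the contents of output depend on the model. Here is an example response from an image generation request: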

{
  "delayTime": 17,
  "executionTime": 3986,
  "id": "sync-0965434e-ff63-4a1c-a9f9-5b705f66e176-u2",
  "output": {
    "cost": 0.02097152,
    "image_url": "https://image.runpod.ai/6/6/mCwUZlep6S/453ad7b7-67c6-43a1-8348-3ad3428ef97a.png"
  },
  "status": "COMPLETED",
  "workerId": "oqk7ao1uomckye"
}
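
To illustrate how the response above might be consumed, here is a small Python sketch that checks the status and pulls out the image URL and cost. The function name `extract_image_result` is hypothetical, and the field names are taken from the sample response rather than from a formal schema.

```python
def extract_image_result(response: dict) -> tuple[str, float]:
    """Return (image_url, cost) from a completed image generation response."""
    status = response.get("status")
    if status != "COMPLETED":
        raise RuntimeError(f"job did not complete: status={status!r}")
    output = response["output"]
    return output["image_url"], output["cost"]


# Trimmed copy of the sample response shown above.
sample = {
    "id": "sync-0965434e-ff63-4a1c-a9f9-5b705f66e176-u2",
    "output": {
        "cost": 0.02097152,
        "image_url": "https://image.runpod.ai/6/6/mCwUZlep6S/453ad7b7-67c6-43a1-8348-3ad3428ef97a.png",
    },
    "status": "COMPLETED",
}

image_url, cost = extract_image_result(sample)
print(image_url, cost)
```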

Model-specific parameters

Each endpoint accepts a different set of parameters to control the generation process.

Flux Dev

Flux Dev is optimized for high-quality, detailed image generation. The model accepts several parameters to control the generation process:

{
  "input": {
    "prompt": "A serene mountain landscape at sunset",
    "negative_prompt": "Snow",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 20,
    "guidance": 7.5,
    "seed": 42,
    "image_format": "png"
  }
}

Parameter	Type	Required	Default	Range	Description
`prompt`	string	Yes	-	-	Text description of the desired image.
`negative_prompt`	string	No	-	-	Elements to exclude from the image.
`width`	integer	No	1024	256-1536	Image width in pixels. Must be divisible by 64.
`height`	integer	No	1024	256-1536	Image height in pixels. Must be divisible by 64.
`num_inference_steps`	integer	No	28	1-50	Number of denoising steps.
`guidance`	float	No	7.5	0.0-10.0	How closely to follow the prompt.
`seed`	integer	No	-1	-	Provide a seed for reproducible results. The default value (-1) will generate a random seed.
`image_format`	string	No	"jpeg"	"png" or "jpeg"	Output format.
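
The ranges in the table above can be checked on the client before a request is sent. The following Python sketch is one way to do that. The function `validate_flux_dev_input` is hypothetical, and treating an empty prompt as invalid is an assumption; the endpoint may apply different or additional checks.

```python
def validate_flux_dev_input(params: dict) -> list[str]:
    """Return a list of problems found in a Flux Dev input dict (empty if none)."""
    errors = []
    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("prompt is required and must be a non-empty string")
    # Width and height share the same range and divisibility rule.
    for name in ("width", "height"):
        value = params.get(name, 1024)
        if not 256 <= value <= 1536 or value % 64 != 0:
            errors.append(f"{name} must be 256-1536 and divisible by 64")
    if not 1 <= params.get("num_inference_steps", 28) <= 50:
        errors.append("num_inference_steps must be 1-50")
    if not 0.0 <= params.get("guidance", 7.5) <= 10.0:
        errors.append("guidance must be 0.0-10.0")
    if params.get("image_format", "jpeg") not in ("png", "jpeg"):
        errors.append('image_format must be "png" or "jpeg"')
    return errors


print(validate_flux_dev_input({"prompt": "A serene mountain landscape", "width": 1000}))
```

Missing optional parameters fall back to the defaults listed in the table, so only values that are actually out of range are reported.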

Flux Schnell

Flux Schnell is optimized for speed and real-time applications:

{
  "input": {
    "prompt": "A quick sketch of a mountain",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 4,
    "guidance": 1.0,
    "seed": 123
  }
}

Parameter	Type	Required	Default	Range	Description
`prompt`	string	Yes	-	-	Text description of the desired image.
`negative_prompt`	string	No	-	-	Elements to exclude from the image.
`width`	integer	No	1024	256-1536	Image width in pixels. Must be divisible by 64.
`height`	integer	No	1024	256-1536	Image height in pixels. Must be divisible by 64.
`num_inference_steps`	integer	No	4	1-8	Number of denoising steps.
`guidance`	float	No	7.5	0.0-10.0	How closely to follow the prompt.
`seed`	integer	No	-1	-	Provide a seed for reproducible results. The default value (-1) will generate a random seed.
`image_format`	string	No	"jpeg"	"png" or "jpeg"	Output format.

Flux Schnell is optimized for speed and works best with lower step counts. Using higher values may not improve quality significantly.

vLLM endpoints

The following vLLM models are available:

Model	Endpoint URL	OpenAI API Model Name
Qwen3 32B AWQ	`https://api.runpod.ai/v2/qwen3-32b-awq/`	`Qwen/Qwen3-32B-AWQ`

To learn more about sending requests to vLLM public endpoints, see Send vLLM requests.

OpenAI API code example

Use the OpenAI API Model Name in the table above as the model name parameter. Here’s an example of how to use the Qwen3 32B AWQ model with the OpenAI API:

import os

from openai import OpenAI

PUBLIC_ENDPOINT_ID = "qwen3-32b-awq"
model_name = "Qwen/Qwen3-32B-AWQ"

# Assumes your Runpod API key is set in the RUNPOD_API_KEY environment variable.
client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    base_url=f"https://api.runpod.ai/v2/{PUBLIC_ENDPOINT_ID}/openai/v1",
)
messages = [
    {
        "role": "system",
        "content": "You are a pirate chatbot who always responds in pirate speak!",
    },
    {
        "role": "user",
        "content": "Give me a short introduction to LLMs.",
    },
]

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    max_tokens=525,
)

# Print the generated reply.
print(response.choices[0].message.content)

To learn more about sending requests to vLLM public endpoints with the OpenAI-compatible API, see OpenAI API compatibility.

OpenAI API streaming example

You can stream responses from the OpenAI API using the stream and stream_options parameters:

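import os

from openai import OpenAI

PUBLIC_ENDPOINT_ID = "qwen3-32b-awq"
model_name = "Qwen/Qwen3-32B-AWQ"

# Assumes your Runpod API key is set in the RUNPOD_API_KEY environment variable.
client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    base_url=f"https://api.runpod.ai/v2/{PUBLIC_ENDPOINT_ID}/openai/v1",
)
messages = [
    {
        "role": "system",
        "content": "You are a pirate chatbot who always responds in pirate speak!",
    },
    {
        "role": "user",
        "content": "Give me a short introduction to LLMs.",
    },
]

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    max_tokens=525,
    stream_options={"include_usage": True},
    stream=True,
)

# Print each piece of text as it arrives.
for chunk in response:
    # Skip chunks with no choices (such as a usage-only chunk) or no text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)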

stream_options={"include_usage": True} is required for streaming to work with vLLM public endpoints.

vLLM Response format

{
  "delayTime": 25,
  "executionTime": 3153,
  "id": "sync-0f3288b5-58e8-46fd-ba73-53945f5e8982-u2",
  "output": [
    {
      "choices": [
        {
          "tokens": [
            "Large Language Models (LLMs) are AI systems trained to predict and understand human language. They learn patterns from vast amounts of text data, enabling them to generate responses, answer questions, and complete tasks in natural language. Key characteristics of LLMs include:\n1. Language Understanding\n- Can analyze and comprehend language structure, context, and nuances\n- Process both inputs and outputs in natural human language\n\n2. Pattern Recognition\n- Learn common phrases and relationships"
          ]
        }
      ],
      "cost": 0.0001,
      "usage": {
        "input": 10,
        "output": 100
      }
    }
  ],
  "status": "COMPLETED",
  "workerId": "pkej0t9bbyjrgy"
}
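
As a sketch of how the response above could be unpacked, the following Python function joins the generated tokens and totals the usage counts. The function name `extract_text_and_usage` is hypothetical, the field names come from the sample response, and summing usage across multiple output items is an assumption.

```python
def extract_text_and_usage(response: dict) -> tuple[str, dict]:
    """Join the generated tokens and return them with the total usage counts."""
    parts = []
    usage = {"input": 0, "output": 0}
    for item in response["output"]:
        for choice in item["choices"]:
            parts.extend(choice["tokens"])
        # Assumption: each output item reports its own usage, so add them up.
        for key in usage:
            usage[key] += item["usage"][key]
    return "".join(parts), usage


# Shortened version of the sample response shown above.
sample = {
    "output": [
        {
            "choices": [{"tokens": ["Large Language Models (LLMs) are AI systems."]}],
            "cost": 0.0001,
            "usage": {"input": 10, "output": 100},
        }
    ],
    "status": "COMPLETED",
}

text, usage = extract_text_and_usage(sample)
print(text)
print(usage)
```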

Python example

Here is an example Python API request to Flux Dev using the /run endpoint:

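import requests

# Replace RUNPOD_API_KEY with your actual API key.
headers = {"Content-Type": "application/json", "Authorization": "Bearer RUNPOD_API_KEY"}

data = {
    "input": {
        "prompt": "A serene mountain landscape at sunset",
        "image_format": "png",
        "num_inference_steps": 25,
        "guidance": 7,
        "seed": 50,
        "width": 1024,
        "height": 1024,
    }
}

response = requests.post(
    "https://api.runpod.ai/v2/black-forest-labs-flux-1-dev/run",
    headers=headers,
    json=data,
)

# Raise an error for HTTP failures, then show the JSON reply,
# which includes the job ID to use with the /status endpoint.
response.raise_for_status()
print(response.json())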

You can generate public endpoint API requests for Python and other programming languages using the public endpoint playground.

JavaScript/TypeScript integration with Vercel AI SDK

For JavaScript and TypeScript projects, you can use the @runpod/ai-sdk-provider package to integrate Runpod’s public endpoints with the Vercel AI SDK. Run this command to install the package:

npm install @runpod/ai-sdk-provider ai

To call a public endpoint for text generation:

import { runpod } from '@runpod/ai-sdk-provider';
import { generateText } from 'ai';

const { text } = await generateText({
  model: runpod('qwen3-32b-awq'),
  prompt: 'Write a Python function that sorts a list:',
});

For image generation:

import { runpod } from '@runpod/ai-sdk-provider';
import { experimental_generateImage as generateImage } from 'ai';

const { image } = await generateImage({
  model: runpod.imageModel('flux/flux-dev'),
  prompt: 'A serene mountain landscape at sunset',
  aspectRatio: '4:3',
});

For comprehensive documentation and examples, see the Node package documentation.

Pricing

Public endpoints use transparent, usage-based pricing:

Model	Price	Billing unit
Flux Dev	$0.02	Per megapixel
Flux Schnell	$0.0024	Per megapixel

Pricing is calculated based on the actual output resolution. You will not be charged for failed generations.

Pricing examples

Below are some pricing examples that show how you can estimate costs for different image sizes:

512×512 image (about 0.26 megapixels)
- Flux Dev: (512 * 512 / 1,000,000) * $0.02 = $0.00524288
- Flux Schnell: (512 * 512 / 1,000,000) * $0.0024 = $0.0006291456
1024×1024 image (about 1.05 megapixels)
- Flux Dev: (1024 * 1024 / 1,000,000) * $0.02 = $0.02097152
- Flux Schnell: (1024 * 1024 / 1,000,000) * $0.0024 = $0.0025165824

Runpod’s billing system rounds up after the first 10 decimal places.
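
The arithmetic in the pricing examples can be reproduced with a short Python sketch. The function `estimate_cost` and the model keys `flux-dev` and `flux-schnell` are hypothetical labels, and rounding up to 10 decimal places is one reading of the billing note above, not a confirmed description of the billing system.

```python
from decimal import Decimal, ROUND_CEILING

# Per-megapixel prices from the pricing table; the keys are illustrative labels.
PRICE_PER_MEGAPIXEL = {
    "flux-dev": Decimal("0.02"),
    "flux-schnell": Decimal("0.0024"),
}


def estimate_cost(model: str, width: int, height: int) -> Decimal:
    """Estimate the cost in USD of one image, rounded up to 10 decimal places."""
    megapixels = Decimal(width * height) / Decimal(1_000_000)
    cost = megapixels * PRICE_PER_MEGAPIXEL[model]
    return cost.quantize(Decimal("0.0000000001"), rounding=ROUND_CEILING)


print(estimate_cost("flux-dev", 1024, 1024))     # 0.0209715200
print(estimate_cost("flux-schnell", 512, 512))   # 0.0006291456
```

Using `Decimal` instead of floats keeps the results exact, so they match the worked examples digit for digit.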

Best practices

The following practices can help you get better results and better performance from public endpoints.

Prompt engineering

Be specific: detailed prompts generally produce better results. Include style modifiers such as art styles, camera angles, or lighting conditions. For Flux Dev, use negative prompts to exclude unwanted elements from your images. For example: “A professional portrait of a woman in business attire, studio lighting, high quality, detailed, corporate headshot style.”

Performance optimization

Choose the right model for your needs: use Flux Schnell when you need speed, and Flux Dev when you need higher quality. Standard dimensions like 1024×1024 render fastest, so use them unless you need a specific aspect ratio. For multiple images, use the asynchronous /run endpoint to batch your requests. To avoid regenerating identical prompts, consider caching results by storing the generated images.
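
One way to cache results is to key stored images by the endpoint and the exact input parameters. The Python sketch below is a minimal in-memory version. The names `cache_key` and `ImageCache` are hypothetical, and the `generate` callable stands in for whatever code sends the request and returns a reference to the stored image.

```python
import hashlib
import json
from typing import Callable


def cache_key(endpoint_id: str, params: dict) -> str:
    """Build a stable key from the endpoint ID and the input parameters."""
    # Sorting keys makes the result independent of dict ordering.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{endpoint_id}:{canonical}".encode()).hexdigest()


class ImageCache:
    """In-memory cache that calls generate only for inputs it has not seen."""

    def __init__(self, generate: Callable[[str, dict], str]):
        self._generate = generate
        self._store: dict[str, str] = {}

    def get(self, endpoint_id: str, params: dict) -> str:
        key = cache_key(endpoint_id, params)
        if key not in self._store:
            self._store[key] = self._generate(endpoint_id, params)
        return self._store[key]
```

Because the key covers every parameter, changing the seed or dimensions produces a new entry, while repeating an identical request reuses the stored image.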
