The private model repository feature is currently in beta. Please join our Discord if you’d like to provide feedback.
This guide provides an overview of how to use cached models with your Serverless endpoints, and instructions for managing private models with the Runpod CLI. The Runpod model repository allows you to upload your models directly to the Runpod ecosystem. By pre-caching models on our infrastructure, you can significantly reduce worker start times, lower costs, and improve the reliability of your Serverless endpoints.

Overview

Using cached models provides several key advantages:
  • Faster cold start times: Public models or private models stored in the repository are pre-cached on Runpod’s infrastructure, eliminating the need for workers to download them from external sources like Hugging Face.
  • Reduced costs: You aren’t billed for worker time while your model is being downloaded. This is especially impactful for large models that can take several minutes to load.
  • Centralized model management: Manage all your models directly within Runpod without needing to switch between platforms like Hugging Face or other model repositories.
  • Accelerated deployment: Deploy pre-cached models instantly without waiting for external downloads or transfers.
  • Version control: Store and manage different versions of the same model, allowing you to easily switch between versions for testing or production deployments.
  • Smaller container images: By decoupling models from your container image, you can create smaller, more focused images that contain only your serving logic.

Public vs. private models

There are two types of cached models:
  • Public models are popular models that Runpod has pre-cached for all users. These models appear automatically in your model selection dropdown and don’t require any upload process. You can start using them immediately when creating or updating endpoints.
  • Private models are models you upload to the repository using the Runpod CLI (runpodctl). Once uploaded, these models appear in the model selection dropdown alongside public models, giving you the same performance benefits while maintaining control over your proprietary or customized models.

How it works

When you select a model during Serverless endpoint creation, Runpod automatically tries to start your workers on hosts that already contain your selected model. If no pre-cached host machines are available, the system delays starting your workers until the model download completes, ensuring you still won’t be charged for the download time. The private model repository feature is available at no additional cost during the beta launch period.
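The placement behavior described above can be pictured with a small shell sketch. This is an illustrative model only, not Runpod's actual scheduler: the place_worker function and the "host:cached" argument format are hypothetical names invented for this example.

```shell
#!/bin/sh
# Illustrative model only; not Runpod's actual scheduler.
# Each argument is a hypothetical "host:cached" pair, where cached is yes or no.
place_worker() {
    for entry in "$@"; do
        host="${entry%%:*}"      # text before the first colon
        cached="${entry##*:}"    # text after the last colon
        if [ "$cached" = "yes" ]; then
            # Preferred path: a host that already holds the model.
            echo "start worker on $host immediately (model already cached)"
            return 0
        fi
    done
    # Fallback path: the worker start is delayed until the download finishes,
    # and the download time is not billed.
    echo "no cached host: download model first, then start worker (download time not billed)"
}

place_worker host-a:no host-b:yes
place_worker host-a:no host-c:no
```

The first call picks the host that already has the model; the second falls through to the delayed start because no host in the list has it cached.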

Manage private models

Before you begin, make sure you’ve installed the Runpod CLI (runpodctl) and configured it with your API key.

Upload a model

You can upload any model from the Hugging Face Model Hub to the Runpod repository using its model identifier. To do so, run the following command:
runpodctl create model \
    --provider huggingface \
    --name YOUR_MODEL_NAME
Replace YOUR_MODEL_NAME with the model identifier from Hugging Face. For example, to upload the stable-diffusion-xl-refiner-1.0 model, run:
runpodctl create model \
    --provider huggingface \
    --name stabilityai/stable-diffusion-xl-refiner-1.0
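If you need to upload several models, the same command can be wrapped in a loop. The following is a minimal sketch: upload_models and the DRY_RUN variable are hypothetical names introduced for this example, and the sketch defaults to printing the commands instead of running them so you can review them first.

```shell
#!/bin/sh
# upload_models and DRY_RUN are hypothetical names used for this sketch.
# With DRY_RUN unset or set to 1, the commands are printed, not executed.
upload_models() {
    for model in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "runpodctl create model --provider huggingface --name $model"
        else
            runpodctl create model --provider huggingface --name "$model"
        fi
    done
}

# Preview the commands for two Hugging Face model identifiers.
upload_models stabilityai/stable-diffusion-xl-refiner-1.0 lodestones/Chroma
```

To perform the uploads for real, set DRY_RUN=0 in your shell before calling upload_models.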

List your models

To see a list of all models you’ve uploaded to the repository, run the following command:
runpodctl get models
This will display all the models in your repository, allowing you to confirm successful uploads and check for duplicates. You should see output similar to the following:
ID         NAME              SOURCE          STATUS        SIZE(GB)  VERSION(SHORT)
mdl_123    custom-llama-v1   HUGGING_FACE    READY         24.7      9f1c2ab
mdl_456    llama31-8b        HUGGING_FACE    DOWNLOADING   -         -
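To check in a script whether a particular model has finished downloading, you can filter this output by model name. The sketch below assumes the column order shown above (ID, NAME, SOURCE, STATUS, and so on) and that model names contain no spaces; model_status is a hypothetical helper, and the actual output format may change while the feature is in beta.

```shell
#!/bin/sh
# model_status is a hypothetical helper for this sketch.
# It reads `runpodctl get models` output on standard input and prints the
# STATUS column (4th field) of the row whose NAME (2nd field) matches $1.
model_status() {
    awk -v name="$1" 'NR > 1 && $2 == name { print $4 }'
}

# Demo on the sample output from this guide, so the sketch runs without the CLI.
sample_output='ID         NAME              SOURCE          STATUS        SIZE(GB)  VERSION(SHORT)
mdl_123    custom-llama-v1   HUGGING_FACE    READY         24.7      9f1c2ab
mdl_456    llama31-8b        HUGGING_FACE    DOWNLOADING   -         -'

printf '%s\n' "$sample_output" | model_status custom-llama-v1
printf '%s\n' "$sample_output" | model_status llama31-8b
```

Against a live repository, you would pipe the CLI output in directly, for example: runpodctl get models | model_status llama31-8b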

Get model details

To get detailed information about a specific model, run:
runpodctl get model YOUR_MODEL_ID
Replace YOUR_MODEL_ID with the ID of your uploaded model. For example, running runpodctl get model 4oqrsweux0fkcp would return output similar to the following:
provider:     huggingface
name:         stabilityai/stable-diffusion-xl-refiner-1.0
createdDate:  2023-08-03T22:31:36.289Z
storagePath:  /stabilityai-stable-diffusion-xl-refiner-1.0/
id:           4oqrsweux0fkcp
bucketId:     pllmb-staging-cloud
regionSpecs:
- regionName:    Staging
  bucketName:    pllmb-staging-cloud
  multiplier:    8
  maxQuantity:   30
  maxIncrement:  5
  amount:        22
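If you only need one value from this output, such as the storage path, you can extract it with standard text tools. The sketch below assumes the "key: value" layout shown above and handles top-level keys only (not the nested regionSpecs entries); model_field is a hypothetical helper introduced for this example.

```shell
#!/bin/sh
# model_field is a hypothetical helper for this sketch.
# It reads `runpodctl get model` output on standard input and prints the
# value of the top-level key named in $1.
model_field() {
    sed -n "s/^$1:[[:space:]]*//p"
}

# Demo on an excerpt of the sample output above, so the sketch runs without the CLI.
details='provider:     huggingface
name:         stabilityai/stable-diffusion-xl-refiner-1.0
createdDate:  2023-08-03T22:31:36.289Z
storagePath:  /stabilityai-stable-diffusion-xl-refiner-1.0/
id:           4oqrsweux0fkcp'

printf '%s\n' "$details" | model_field storagePath
printf '%s\n' "$details" | model_field createdDate
```

Against a live repository, you would pipe the CLI output in directly, for example: runpodctl get model YOUR_MODEL_ID | model_field storagePath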

Remove a model

When you no longer need a model uploaded to the private repository, you can remove it using runpodctl. This cleans up your repository list and frees up storage space. To remove a model, run the following command:
runpodctl remove model \
    --provider huggingface \
    --name lodestones/Chroma
Before removing a model, ensure that none of your active endpoints are using it.

Use models in Serverless

When creating a new Serverless endpoint or updating an existing one, you can select models from your private model repository. To select a model from the repository, follow these steps:
  1. Navigate to the Serverless section of the Runpod console.
  2. Click New Endpoint, or edit an existing endpoint.
  3. In the Endpoint Configuration step, scroll down to Model (optional) and click the dropdown. Your uploaded models will be listed under Organization Repository.
  4. Select your model from the list.
  5. Enter a Hugging Face token if you’re using a gated model.
  6. Complete your endpoint configuration and click Deploy Endpoint.