The private model repository feature is currently in beta. Please join our Discord if you’d like to provide feedback.
Overview
Using cached models provides several key advantages:
- Faster cold start times: Public models or private models stored in the repository are pre-cached on Runpod’s infrastructure, eliminating the need for workers to download them from external sources like Hugging Face.
- Reduced costs: You aren’t billed for worker time while your model is being downloaded. This is especially impactful for large models that can take several minutes to load.
- Centralized model management: Manage all your models directly within Runpod without needing to switch between platforms like Hugging Face or other model repositories.
- Accelerated deployment: Deploy pre-cached models instantly without waiting for external downloads or transfers.
- Version control: Store and manage different versions of the same model, allowing you to easily switch between versions for testing or production deployments.
- Smaller container images: By decoupling models from your container image, you can create smaller, more focused images that contain only your serving logic.
Public vs. private models
There are two types of cached models:
Public models are popular models that Runpod has pre-cached for all users. These models appear automatically in your model selection dropdown and don’t require any upload process. You can start using them immediately when creating or updating endpoints.
Private models are models you upload to the repository using the Runpod CLI (runpodctl). Once uploaded, these models appear in the model selection dropdown alongside public models, giving you the same performance benefits while maintaining control over your proprietary or customized models.
How it works
When you select a model during Serverless endpoint creation, Runpod automatically tries to start your workers on hosts that already contain your selected model. If no pre-cached host machines are available, the system delays starting your workers until the model download completes, ensuring you still won’t be charged for the download time. The private model repository feature is available at no additional cost during the beta launch period.
Manage private models
Make sure you’ve installed the Runpod CLI and configured it with your API key.
Upload a model
You can upload any model from the Hugging Face Model Hub to the Runpod repository using its model identifier. To upload a model from Hugging Face, run the following command, replacing YOUR_MODEL_NAME with the model identifier from Hugging Face.
For example, to upload the stable-diffusion-xl-refiner-1.0 model, run:
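The command snippet itself is elided on this page, so the following is a sketch only; the exact subcommand and flags may differ, so check the runpodctl help output for the authoritative syntax:

```shell
# Upload a Hugging Face model to the Runpod repository by its identifier
# (assumed command shape; verify against `runpodctl help`).
runpodctl create model stabilityai/stable-diffusion-xl-refiner-1.0
```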
List your models
To see a list of all models you’ve uploaded to the repository, run the following command:
Get model details
To get detailed information about a specific model, run the following command, replacing YOUR_MODEL_ID with the ID of your uploaded model.
For example, running runpodctl get model 4oqrsweux0fkcp
on the example output above would return:
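The command snippets and example output are elided on this page. As a sketch, assuming the listing command mirrors the confirmed runpodctl get model <id> form shown above (the plural listing form is an assumption):

```shell
# List all models uploaded to your private repository (assumed plural form).
runpodctl get models

# Show detailed information for one model by its ID, as referenced above.
runpodctl get model 4oqrsweux0fkcp
```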
Remove a model
When you no longer need a model uploaded to the private repository, you can remove it using runpodctl. This cleans up your repository list and frees up storage space.
To remove a model, run the following command:
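The removal command itself is elided here; as a sketch, assuming it mirrors the other model subcommands above:

```shell
# Remove a model from your private repository by its ID
# (assumed command shape; YOUR_MODEL_ID is a placeholder).
runpodctl remove model YOUR_MODEL_ID
```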
Before removing a model, ensure that none of your active endpoints are using it.
Use models in Serverless
When creating a new Serverless endpoint or updating an existing one, you can select models from your private model repository. To select a model from the repository, follow these steps:
- Navigate to the Serverless section of the Runpod console.
- Click New Endpoint, or edit an existing endpoint.
- In the Endpoint Configuration step, scroll down to Model (optional) and click the dropdown. Your uploaded models will be listed under Organization Repository.
- Select your model from the list.
- Enter a Hugging Face token if you’re using a gated model.
- Complete your endpoint configuration and click Deploy Endpoint.