Model Ranking Insights#
Overview#
Model Ranking Insights (MRI) lets you compare and rank AI models by their performance on specific tasks. You create a standardized evaluation environment in which multiple models are tested against the same prompts and ranked based on human feedback.
How to use MRI#
1. Leaderboard Creation#
You start by creating a leaderboard with specific settings:
- Name: Identifies your leaderboard in the overview
- Instruction: The criteria upon which labelers choose the better model
- Prompts: A set of registered prompts that will be used by all models for media generation
- Show Prompt: Whether to display the prompt to evaluators. Showing the prompt adds complexity and cost, so enable it only when labelers need the prompt to follow the instruction (e.g., prompt alignment).
2. Model Evaluation#
Once your leaderboard is set up, you evaluate a model by providing:
- Media: Images, videos, or audio files generated by your model
- Prompts: Each media file must be paired with the exact prompt used to generate it
All prompts must come from the leaderboard's registered prompt set (available through the prompts attribute of the leaderboard).
Note: You are not limited to one media per prompt; you can create multiple variations based on the same prompt by adding the same prompt multiple times.
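The pairing rules above can be checked locally before anything is submitted. The following is a minimal sketch, not part of the Rapidata SDK: `validate_submission` is a hypothetical helper, and the file names are placeholders. It assumes you already have the registered prompts as a list of strings, for example from the prompts attribute of the leaderboard.

```python
from collections import Counter


def validate_submission(media, prompts, registered_prompts):
    """Check a media/prompt pairing before submitting it (hypothetical helper)."""
    # Every media file needs exactly one prompt at the same list position.
    if len(media) != len(prompts):
        raise ValueError(
            f"Got {len(media)} media files but {len(prompts)} prompts; "
            "they must pair up one-to-one."
        )
    # Every prompt must belong to the leaderboard's registered prompt set.
    registered = set(registered_prompts)
    unknown = sorted({p for p in prompts if p not in registered})
    if unknown:
        raise ValueError(f"Prompts not registered on the leaderboard: {unknown}")
    # Repeating a prompt is allowed; report how many media files each one has.
    return Counter(prompts)


registered = [
    "A serene mountain landscape at sunset",
    "A futuristic city with flying cars",
]
counts = validate_submission(
    media=["sunset_a.jpg", "sunset_b.jpg", "city.jpg"],  # placeholder file names
    prompts=[registered[0], registered[0], registered[1]],
    registered_prompts=registered,
)
print(counts)
```

Running a check like this first surfaces a mistyped prompt or a mismatched list length as a local error instead of a failed upload.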
3. Matchmaking and Ranking#
MRI creates fair comparisons by:
- Prompt-based matching: Only media generated from the same prompt are compared against each other
- Mixed evaluation: New models are matched up with existing models to maximize the information gained
- User-driven assessment: Human evaluators compare model outputs based on the instruction to determine rankings
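To make the prompt-based matching rule concrete, here is an illustrative sketch that enumerates candidate comparisons between a new model and existing models. It is not Rapidata's actual matchmaking: the function names, the data layout, and the short prompt labels are assumptions, and the step that selects pairs to maximize information gained is not modeled at all.

```python
from collections import defaultdict
from itertools import product


def group_by_prompt(entries):
    """Group (prompt, media) pairs into {prompt: [media, ...]}."""
    grouped = defaultdict(list)
    for prompt, media in entries:
        grouped[prompt].append(media)
    return grouped


def build_matchups(new_entries, existing_models):
    """List candidate comparisons for a new model (hypothetical helper).

    Only media generated from the same prompt are ever paired.
    """
    new_by_prompt = group_by_prompt(new_entries)
    matchups = []
    for model_name, entries in existing_models.items():
        old_by_prompt = group_by_prompt(entries)
        for prompt, new_media in new_by_prompt.items():
            # Pair each new output with each existing output for this prompt.
            for new_item, old_item in product(new_media, old_by_prompt.get(prompt, [])):
                matchups.append((prompt, new_item, old_item, model_name))
    return matchups


new_entries = [("sunset", "new_sunset.jpg"), ("city", "new_city.jpg")]
existing_models = {
    "ModelA": [("sunset", "a_sunset.jpg"), ("city", "a_city.jpg")],
    "ModelB": [("sunset", "b_sunset.jpg")],
}
matchups = build_matchups(new_entries, existing_models)
for matchup in matchups:
    print(matchup)
```

In this sketch, ModelB has no output for the "city" prompt, so no comparison against ModelB is generated for it.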
4. Results and Visibility#
Your leaderboard results are:
- Directly viewable on the Rapidata dashboard at app.rapidata.ai/mri/leaderboards
- Continuously updated as new models are added and evaluated
- A source of deeper insight into model performance over time
Getting Started#
Creating a Leaderboard#
Use the RapidataClient to authenticate yourself and create a new leaderboard:
from rapidata import RapidataClient

# Initialize the client
# Running this the first time will open a browser window and ask you to log in
client = RapidataClient()

# Create a new leaderboard
leaderboard = client.mri.create_new_leaderboard(
    name="AI Art Competition",
    instruction="Which image do you prefer?",
    prompts=[
        "A serene mountain landscape at sunset",
        "A futuristic city with flying cars",
        "A portrait of a wise old wizard",
    ],
    show_prompt=False,
)
Retrieving Existing Leaderboards#
You can retrieve leaderboards by ID or search for them:
# Get a specific leaderboard by ID
leaderboard = client.mri.get_leaderboard_by_id("leaderboard_id_here")

# Find leaderboards by name
recent_leaderboards = client.mri.find_leaderboards(
    name="AI Art",
    amount=5,
)
Evaluating Models#
Add your model's outputs to the leaderboard:
# Evaluate a model
leaderboard.evaluate_model(
    name="MyAIModel_v2.1",
    media=[
        "path/to/mountain_sunset.jpg",
        "path/to/futuristic_city.jpg",
        "path/to/wizard_portrait.jpg",
    ],
    prompts=[
        "A serene mountain landscape at sunset",
        "A futuristic city with flying cars",
        "A portrait of a wise old wizard",
    ],
)
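Because media and prompts are parallel lists, submitting several variations for one prompt means repeating that prompt once per file. The sketch below shows one way to build the two lists from a mapping. `flatten_variations` is a hypothetical helper, not part of the Rapidata SDK, and the file paths are placeholders.

```python
def flatten_variations(variations):
    """Turn {prompt: [media paths]} into parallel media and prompts lists."""
    media, prompts = [], []
    for prompt, files in variations.items():
        for path in files:
            media.append(path)
            prompts.append(prompt)  # repeat the prompt once per variation
    return media, prompts


variations = {
    "A serene mountain landscape at sunset": [
        "path/to/mountain_sunset_1.jpg",
        "path/to/mountain_sunset_2.jpg",
    ],
    "A futuristic city with flying cars": ["path/to/futuristic_city.jpg"],
}
media, prompts = flatten_variations(variations)

# With a leaderboard object from the steps above, the lists could be passed on:
# leaderboard.evaluate_model(name="MyAIModel_v2.1", media=media, prompts=prompts)
```

Keeping the variations in a mapping makes it harder for the two lists to drift out of alignment as files are added or removed.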