Model Ranking Insights#
Overview#
Model Ranking Insights (MRI) provides a powerful way to compare and rank different AI models based on their performance on specific tasks. It lets you create standardized evaluation environments where multiple models are tested against each other and ranked based on human feedback.
How to use MRI#
1. Benchmark Creation#
You start by creating a benchmark with specific settings:
- Name: Identifies your benchmark in the overview
- Identifiers: A list of identifiers that will be registered and used to identify the uploaded media in the evaluation step.
- Prompts: A list of prompts that will be shown on the matchup for leaderboards where the `show_prompt` attribute is set to `True`.
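A minimal sketch of the corresponding call, assuming `client` is an authenticated `RapidataClient` (the full example appears under Getting Started below; names and values are placeholders):
benchmark = client.mri.create_new_benchmark(
    name="My Benchmark",                                  # shown in the overview
    prompts=["A red sports car", "A calm forest lake"],   # shown when show_prompt is True
    identifiers=["car", "lake"]                           # reused when uploading media
)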
2. Leaderboard Creation#
Once your benchmark is set up, you can create leaderboards for it.
- Name: Identifies your leaderboard in the overview
- Instruction: The criterion labelers use to choose the better model
- Show Prompt: Whether to display the prompt to evaluators. Including this option adds complexity and cost, so it is advised to only include it in settings where the prompt is necessary for the labelers to follow the instruction (e.g., prompt alignment).
Note: You can find all leaderboards for a benchmark by using the `leaderboards` attribute of the benchmark.
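For example, assuming `benchmark` is a benchmark object obtained as shown under Getting Started below:
# List all leaderboards that belong to this benchmark
for leaderboard in benchmark.leaderboards:
    print(leaderboard)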
3. Model Evaluation#
Once your benchmark and leaderboard are set up, you can evaluate models by providing the following:
- Media: Images, videos, or audio files generated by your model
- Identifiers: Each media file must be paired with an identifier
All identifiers must be from the benchmark's registered identifier set (available through the `identifiers` attribute of the benchmark).
Note: You are not limited to one media file per identifier; you can supply the same identifier multiple times.
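For instance, two different generations for the same identifier can be submitted in a single `evaluate_model` call (the call is shown in full under Getting Started; file paths here are placeholders):
benchmark.evaluate_model(
    name="MyAIModel_v2.1",
    media=[
        "path/to/mountain_sunset_a.jpg",
        "path/to/mountain_sunset_b.jpg"
    ],
    identifiers=["id1", "id1"]  # the same identifier supplied twice
)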
4. Matchmaking and Ranking#
MRI creates fair comparisons by:
- Identifier-based matching: Only media with the same identifier are compared against each other
- Mixed evaluation: New models are matched up with existing models to maximize the information gained
- User-driven assessment: Human evaluators compare model outputs based on the instruction to determine rankings
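The pairing itself happens on Rapidata's side. Purely as an illustration of identifier-based matching, and not the actual matchmaking implementation, matchups could be formed like this:
from itertools import combinations

# Illustrative only: pair up media from different models that share an identifier
submissions = [
    ("ModelA", "id1", "a_mountain.jpg"),
    ("ModelB", "id1", "b_mountain.jpg"),
    ("ModelA", "id2", "a_city.jpg"),
    ("ModelB", "id2", "b_city.jpg"),
]

matchups = [
    (left, right)
    for left, right in combinations(submissions, 2)
    if left[1] == right[1] and left[0] != right[0]  # same identifier, different models
]

for left, right in matchups:
    print(f"{left[0]} vs {right[0]} on identifier '{left[1]}'")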
5. Results and Visibility#
Your leaderboard results are:
- Directly viewable on the Rapidata dashboard at app.rapidata.ai/mri/leaderboards
- Continuously updated as new models are added and evaluated
- A source of deeper insights into model performance over time
Getting Started#
Creating a Benchmark#
Use the `RapidataClient` to authenticate yourself and create a new benchmark:
from rapidata import RapidataClient

# Initialize the client
# Running this the first time will open a browser window and ask you to login
client = RapidataClient()

# Create a new benchmark
benchmark = client.mri.create_new_benchmark(
    name="AI Art Competition",
    prompts=[
        "A serene mountain landscape at sunset",
        "A futuristic city with flying cars",
        "A portrait of a wise old wizard"
    ],
    identifiers=[
        "id1",
        "id2",
        "id3"
    ]
)
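The registered identifiers stay attached to the benchmark and can be inspected through its `identifiers` attribute (they are needed again in the Model Evaluation step):
# Double-check which identifiers are registered on the benchmark
print(benchmark.identifiers)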
Creating a Leaderboard on a Benchmark#
# Create a new leaderboard on a benchmark
leaderboard = benchmark.create_leaderboard(
    name="Realism",
    instruction="Which image is more realistic?",
    show_prompt=False
)
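A benchmark can carry more than one leaderboard. As a sketch, a prompt-alignment leaderboard, where labelers need to see the prompt to follow the instruction, could be added with the same call:
# Add a second leaderboard that shows the prompt, since labelers
# need it to judge prompt alignment
alignment_leaderboard = benchmark.create_leaderboard(
    name="Prompt Alignment",
    instruction="Which image follows the prompt more closely?",
    show_prompt=True
)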
Retrieving Existing Benchmarks#
You can retrieve benchmarks by ID or search for them:
# Get a specific benchmark by ID
benchmark = client.mri.get_benchmark_by_id("benchmark_id_here")

# Find benchmarks by name
recent_benchmarks = client.mri.find_benchmarks(
    name="AI Art",
    amount=10
)
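Assuming `find_benchmarks` returns a list of benchmark objects, you can pick one of the results to continue working with:
# Continue with the first match, if any were found
if recent_benchmarks:
    benchmark = recent_benchmarks[0]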
Evaluating Models#
Add your model's outputs to the benchmark:
# Evaluate a model
benchmark.evaluate_model(
    name="MyAIModel_v2.1",
    media=[
        "path/to/mountain_sunset.jpg",
        "path/to/futuristic_city.jpg",
        "path/to/wizard_portrait.jpg"
    ],
    identifiers=[
        "id1",
        "id2",
        "id3"
    ]
)
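Additional models are submitted with further `evaluate_model` calls; new entries are then matched against the models already on the benchmark (file paths below are placeholders):
# Submit a second model so it can be compared against existing entries
benchmark.evaluate_model(
    name="MyAIModel_v3.0",
    media=[
        "path/to/v3/mountain_sunset.jpg",
        "path/to/v3/futuristic_city.jpg",
        "path/to/v3/wizard_portrait.jpg"
    ],
    identifiers=["id1", "id2", "id3"]
)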
Retrieving Results#
# Get the leaderboard
leaderboard = benchmark.leaderboards[0]
# Get the standings
standings = leaderboard.get_standings() # Returns a pandas dataframe
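Because `get_standings()` returns a pandas DataFrame, the usual pandas tooling applies; for example (the exact columns depend on the leaderboard):
# Inspect the first rows and export a copy for later analysis
print(standings.head())
standings.to_csv("standings.csv", index=False)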