Rapidata benchmark
RapidataBenchmark #
An instance of a Rapidata benchmark.
Used to interact with a specific benchmark in the Rapidata system, such as retrieving prompts and evaluating models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name that will be used to identify the benchmark on the overview. | required |
id | str | The id of the benchmark. | required |
openapi_service | OpenAPIService | The OpenAPI service used to interact with the Rapidata API. | required |
Source code in src/rapidata/rapidata_client/benchmark/rapidata_benchmark.py
prompts property #
Returns the prompts that are registered for the benchmark.
prompt_assets property #
Returns the prompt assets that are registered for the benchmark.
leaderboards property #
leaderboards: list[RapidataLeaderboard]
Returns the leaderboards that are registered for the benchmark.
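For illustration, a minimal sketch of inspecting these properties. It assumes a RapidataBenchmark instance is already available as `benchmark`, obtained through the Rapidata client; how the instance is obtained is not covered on this page.

```python
# Assumes `benchmark` is an existing RapidataBenchmark instance obtained
# through the Rapidata client (retrieval call not shown here).
print(benchmark.prompts)        # prompts registered for the benchmark
print(benchmark.prompt_assets)  # prompt assets registered for the benchmark
print(benchmark.leaderboards)   # list[RapidataLeaderboard]
```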
add_prompt #
add_prompt(
    identifier: str,
    prompt: str | None = None,
    asset: str | None = None,
    tags: Optional[list[str]] = None,
)
Adds a prompt to the benchmark.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
identifier | str | The identifier of the prompt/asset/tags that will be used to match up with the media. | required |
prompt | str \| None | The prompt that will be used to evaluate the model. | None |
asset | str \| None | The asset that will be used to evaluate the model. Provided as a link to the asset. | None |
tags | Optional[list[str]] | The tags can be used to filter the leaderboard results. They will NOT be shown to the users. | None |
Source code in src/rapidata/rapidata_client/benchmark/rapidata_benchmark.py
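A hedged usage sketch of add_prompt; the identifier, prompt text, and tags below are illustrative values, not fixed API strings.

```python
# Registers a prompt under an identifier; media passed to evaluate_model
# later is matched to this prompt via the same identifier.
benchmark.add_prompt(
    identifier="sunset_beach_001",                 # illustrative identifier
    prompt="A photorealistic sunset over a beach",
    tags=["landscape", "photorealism"],            # not shown to users
)
```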
create_leaderboard #
create_leaderboard(
    name: str,
    instruction: str,
    show_prompt: bool = False,
    show_prompt_asset: bool = False,
    inverse_ranking: bool = False,
    level_of_detail: Literal["low", "medium", "high", "very high"] = "low",
    min_responses_per_matchup: int = 3,
) -> RapidataLeaderboard
Creates a new leaderboard for the benchmark.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the leaderboard (not shown to the users). | required |
instruction | str | The instruction decides how the models will be evaluated. | required |
show_prompt | bool | Whether to show the prompt to the users. | False |
show_prompt_asset | bool | Whether to show the prompt asset to the users (only works if the prompt asset is a URL). | False |
inverse_ranking | bool | Whether to invert the ranking of the leaderboard (for inverted questions, e.g. "Which video is worse?"). | False |
level_of_detail | Literal['low', 'medium', 'high', 'very high'] | The level of detail of the leaderboard. This will affect how many comparisons are done per model evaluation. | 'low' |
min_responses_per_matchup | int | The minimum number of responses per matchup required to be considered for the leaderboard. | 3 |
Source code in src/rapidata/rapidata_client/benchmark/rapidata_benchmark.py
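A hedged usage sketch of create_leaderboard; the name and instruction are illustrative values.

```python
# Creates a leaderboard on the benchmark using only the parameters
# documented in the table above.
leaderboard = benchmark.create_leaderboard(
    name="photorealism-ranking",                      # internal name, not shown to users
    instruction="Which image looks more realistic?",  # decides how models are compared
    show_prompt=True,                                 # annotators see the prompt
    level_of_detail="medium",                         # more comparisons per evaluation
)
```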
evaluate_model #
Evaluates a model on the benchmark across all leaderboards.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the model. | required |
media | list[str] | The generated images/videos that will be used to evaluate the model. | required |
identifiers | list[str] | The identifiers that correspond to the media. The order of the identifiers must match the order of the media. The identifiers that are used must be registered for the benchmark. To see the registered identifiers, use the identifiers property. | required |
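A hedged usage sketch of evaluate_model, using the keyword names from the parameter table above (the full signature is not reproduced on this page); the model name and media path are illustrative.

```python
# The i-th identifier must correspond to the i-th media item, and every
# identifier must already be registered on the benchmark via add_prompt.
benchmark.evaluate_model(
    name="my-image-model-v2",
    media=["outputs/sunset_beach_001.png"],
    identifiers=["sunset_beach_001"],
)
```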