Job Definition Parameter Reference

This guide provides a comprehensive reference for all parameters available when creating job definitions in the Rapidata Python SDK.

Overview

When creating a job definition, you'll use parameters to control:

What data is shown to labelers (datapoints, contexts)
How many responses you need (responses_per_datapoint)
How tasks are displayed (settings)
Quality assurance (confidence_threshold, quorum_threshold)

Core Parameters

These parameters are required or commonly used across all job types.

`name`

Property	Value
Type	`str`
Required	Yes

A descriptive name for your job definition. Used to identify the job in the Rapidata Dashboard and when retrieving jobs programmatically. This name is not shown to labelers.

name="Image Quality Rating v2 - January Batch"

`instruction`

Property	Value
Type	`str`
Required	Yes

The task instruction shown to labelers. This should clearly explain what action they need to take.

Best Practices:

Be specific and unambiguous
Use action verbs ("Select", "Choose", "Identify")
For comparisons, use comparative language ("Which looks better?")
See Human Prompting for detailed guidance

instruction="Which image follows the prompt more accurately?"

`datapoints`

Property	Value
Type	`list[str]` or `list[list[str]]`
Required	Yes

The data to be labeled. The format depends on the job type:

Job Type	Format	Description
Classification	`list[str]`	Single items to classify
Compare	`list[list[str]]`	Pairs of items (exactly 2 per inner list)
Locate	`list[str]`	Single items to locate within
Draw	`list[str]`	Single items to draw on
Select Words	`list[str]`	Single items, each paired with a sentence from `sentences`
Free Text	`list[str]`	Single items to answer about
Ranking	`list[list[str]]`	Independent rankings (each inner list is one set to rank)

Supported Formats:

Public URLs (https://...)
Local file paths (will be uploaded automatically)

# Classification - list of single items
datapoints=["https://example.com/img1.jpg", "https://example.com/img2.jpg"]

# Compare - list of pairs
datapoints=[
    ["https://example.com/a1.jpg", "https://example.com/b1.jpg"],
    ["https://example.com/a2.jpg", "https://example.com/b2.jpg"],
]
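Because the expected shape of `datapoints` depends on the job type, a quick local check can catch mistakes before you call the SDK. The sketch below encodes the rules from the format table: a flat list of strings for single-item job types, exactly two items per inner list for Compare, and one list per ranking for Ranking. `check_datapoint_shape` and the job-type keys are hypothetical names, not part of the Rapidata SDK, and requiring at least two items per ranking is an assumption.

```python
# Hypothetical pre-flight check; not part of the Rapidata SDK.
SINGLE_ITEM_JOBS = {"classification", "locate", "draw", "select_words", "free_text"}


def check_datapoint_shape(job_type: str, datapoints: list) -> None:
    """Raise ValueError if `datapoints` does not match the shape for `job_type`."""
    if job_type in SINGLE_ITEM_JOBS:
        # One string (URL, local path, or raw text) per datapoint.
        for i, item in enumerate(datapoints):
            if not isinstance(item, str):
                raise ValueError(f"datapoints[{i}] must be a str for {job_type}")
    elif job_type == "compare":
        # Exactly two items per inner list.
        for i, pair in enumerate(datapoints):
            if not (isinstance(pair, list) and len(pair) == 2):
                raise ValueError(f"datapoints[{i}] must be a list of exactly 2 items")
    elif job_type == "ranking":
        # Each inner list is one independent set to rank; requiring at least
        # two items (so a pairwise matchup is possible) is an assumption.
        for i, group in enumerate(datapoints):
            if not (isinstance(group, list) and len(group) >= 2):
                raise ValueError(f"datapoints[{i}] must be a list of at least 2 items")
    else:
        raise ValueError(f"unknown job type: {job_type!r}")


# Both calls pass silently because the shapes match the table.
check_datapoint_shape("classification", ["https://example.com/img1.jpg", "https://example.com/img2.jpg"])
check_datapoint_shape("compare", [["https://example.com/a1.jpg", "https://example.com/b1.jpg"]])
```

Passing a three-item inner list for a Compare job, for example, raises a `ValueError` instead of failing later at upload time.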

`responses_per_datapoint`

Property	Value
Type	`int`
Required	No
Default	`10`

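The minimum number of responses to collect for each datapoint. The actual number may slightly exceed this because several labelers can answer the same datapoint concurrently. When early stopping is enabled (confidence_threshold or quorum_threshold), this value instead acts as the maximum number of responses.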

Best Practices:

Use 15-25 for ambiguous or subjective tasks
Use 5-10 for clear-cut decisions

responses_per_datapoint=15

Data Type

`data_type`

Property	Value
Type	`Literal["media", "text"]`
Required	No
Default	`"media"`

Specifies how datapoints should be interpreted and displayed.

Value	Description
`"media"`	Datapoints are URLs or paths to images, videos, or audio files
`"text"`	Datapoints are raw text strings to be displayed directly

# Comparing two text responses
job_definition = client.job.create_compare_job_definition(
    name="LLM Response Comparison",
    instruction="Which response is more helpful?",
    datapoints=[
        ["Response A text here...", "Response B text here..."],
    ],
    data_type="text",
)

Context Parameters

Context parameters allow you to provide additional information alongside each datapoint.

`contexts`

Property	Value
Type	`Optional[list[str]]`
Required	No
Default	`None`

Text context shown alongside each datapoint. Commonly used to provide prompts, descriptions, or additional instructions specific to each item.

Constraints: If provided, must have the same length as datapoints.

datapoints=["image1.jpg", "image2.jpg"],
contexts=["A cat sitting on a red couch", "A blue car in the rain"]

Length limit: A context may be at most 400 characters; the backend rejects longer ones. If a context exceeds the limit, a warning is logged at creation time. Enable automatic shortening (see below) to have over-long contexts trimmed for you.
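To find problem contexts before creating the job definition, you can run a small local check. The sketch below enforces the two rules stated above: `contexts` must have the same length as `datapoints`, and no context may exceed 400 characters. `check_contexts` and `MAX_CONTEXT_CHARS` are hypothetical names, not SDK functions, and counting characters with Python's `len()` (Unicode code points) is an assumption about how the backend counts.

```python
# Hypothetical pre-flight check for `contexts`; not part of the Rapidata SDK.
MAX_CONTEXT_CHARS = 400  # limit stated in this reference


def check_contexts(datapoints: list, contexts: list[str]) -> list[int]:
    """Validate contexts and return the indices of contexts over the limit.

    Raises ValueError if `datapoints` and `contexts` differ in length.
    Characters are counted with len(), i.e. Unicode code points (an
    assumption about how the backend counts).
    """
    if len(contexts) != len(datapoints):
        raise ValueError(
            f"contexts has {len(contexts)} entries but datapoints has {len(datapoints)}"
        )
    return [i for i, ctx in enumerate(contexts) if len(ctx) > MAX_CONTEXT_CHARS]


datapoints = ["image1.jpg", "image2.jpg"]
contexts = ["A cat sitting on a red couch", "x" * 450]
too_long = check_contexts(datapoints, contexts)
print(too_long)  # [1]
```

The returned indices point at contexts the backend would reject unless automatic shortening is enabled.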

Automatic Shortening

Set rapidata_config.upload.autoShortenContext = True to have any context longer than the 400-character limit shortened automatically before upload. Shortening is tuned to the instruction, so only the part relevant to the question is kept. When left at its default (False), an over-long context is left unchanged and a warning is logged explaining that the backend would reject it.

from rapidata import rapidata_config

rapidata_config.upload.autoShortenContext = True

job_definition = client.job.create_classification_job_definition(
    name="Outfit check",
    instruction="Does the main character wear the right clothing?",
    answer_options=["Yes", "No"],
    datapoints=["scene.jpg"],
    contexts=["<a very long, detailed beach-scene description ...>"],
)

You can also shorten contexts directly via the client, without creating a job definition:

short = client.context.shorten_context(
    context="<a very long description ...>",
    question="Does the main character wear the right clothing?",
)

# Or a batch of (context, question) pairs in one call:
shortened = client.context.shorten_contexts([
    (context_a, question_a),
    (context_b, question_b),
])

`media_contexts`

Property	Value
Type	`Optional[list[list[str]]]`
Required	No
Default	`None`

Images (URLs or local paths) shown as reference context alongside each datapoint. Useful when labelers need one or more reference images, such as the original of an edited image, to evaluate the item.

Constraints: If provided, must have the same length as datapoints. Each entry is itself a list of image URLs / paths. Use a single-element inner list for one image per datapoint, or multiple entries to display several images.

# One reference image per datapoint (each inner list has one entry)
datapoints=["edited1.jpg", "edited2.jpg"],
media_contexts=[["original1.jpg"], ["original2.jpg"]]

# Multiple reference images per datapoint
datapoints=["edited1.jpg", "edited2.jpg"],
media_contexts=[
    ["original1_a.jpg", "original1_b.jpg"],
    ["original2_a.jpg", "original2_b.jpg"],
]
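If your reference images are in a flat list with exactly one image per datapoint, each entry has to be wrapped in its own single-element list to match the nested shape. The sketch below does that and enforces the same-length rule. `to_media_contexts` is a hypothetical helper, not an SDK function; it passes entries that are already lists through unchanged.

```python
# Hypothetical helper; not part of the Rapidata SDK.
def to_media_contexts(datapoints: list[str], references: list) -> list[list[str]]:
    """Return `references` in the list-of-lists shape media_contexts expects.

    A plain string (one URL or path) is wrapped into a one-element list;
    a list of URLs/paths is copied unchanged.
    """
    if len(references) != len(datapoints):
        raise ValueError("media_contexts must have the same length as datapoints")
    return [[ref] if isinstance(ref, str) else list(ref) for ref in references]


media_contexts = to_media_contexts(["edited1.jpg", "edited2.jpg"],
                                   ["original1.jpg", "original2.jpg"])
print(media_contexts)  # [['original1.jpg'], ['original2.jpg']]
```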

Quality Control Parameters

`confidence_threshold`

Property	Value
Type	`Optional[float]`
Required	No
Default	`None`
Range	`0.0` to `1.0` (typically `0.99` to `0.999`)

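Enables early stopping when a specified confidence level is reached. The system stops collecting responses for a datapoint once the confidence in one answer reaches this threshold, reducing costs while maintaining quality. In that case responses_per_datapoint acts as the maximum number of responses.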

How It Works: Uses labeler trust scores (userScore) to calculate statistical confidence for each category.

Related: Confidence Stopping

job_definition = client.job.create_classification_job_definition(
    name="Cat or Dog with Early Stopping",
    instruction="What animal is in this image?",
    answer_options=["Cat", "Dog"],
    datapoints=["pet1.jpg", "pet2.jpg"],
    responses_per_datapoint=50,  # Maximum responses
    confidence_threshold=0.99,   # Stop at 99% confidence
)
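To build intuition for how trust-weighted responses can reach a threshold like 0.99, here is a toy Bayesian model. It is not Rapidata's actual formula. It assumes a uniform prior over the answer options, reads each labeler's userScore as the probability that their answer is correct, and spreads wrong answers evenly over the other options. `category_confidence` and `should_stop` are hypothetical names.

```python
import math


def category_confidence(votes: list[tuple[str, float]], options: list[str]) -> dict[str, float]:
    """Posterior probability that each option is correct, under a toy model.

    votes: (chosen_option, user_score) pairs, where user_score in (0, 1) is
    read as the probability that this labeler answers correctly (assumption).
    """
    k = len(options)
    log_post = {opt: 0.0 for opt in options}  # uniform prior
    for chosen, score in votes:
        for opt in options:
            # Likelihood of this vote if `opt` were the true answer.
            p = score if chosen == opt else (1.0 - score) / (k - 1)
            log_post[opt] += math.log(p)
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    weights = {opt: math.exp(lp - m) for opt, lp in log_post.items()}
    total = sum(weights.values())
    return {opt: w / total for opt, w in weights.items()}


def should_stop(votes, options, confidence_threshold: float, max_responses: int) -> bool:
    """Stop once any option reaches the threshold or the response cap is hit."""
    if len(votes) >= max_responses:
        return True
    return max(category_confidence(votes, options).values()) >= confidence_threshold


options = ["Cat", "Dog"]
two_votes = [("Cat", 0.9), ("Cat", 0.9)]
three_votes = two_votes + [("Cat", 0.9)]
print(round(category_confidence(two_votes, options)["Cat"], 4))  # 0.9878
print(should_stop(two_votes, options, 0.99, 50))                 # False
print(should_stop(three_votes, options, 0.99, 50))               # True
```

In this toy model, two agreeing labelers with a score of 0.9 fall just short of 0.99, and a third agreeing response crosses it, which illustrates why early stopping often uses far fewer responses than the cap.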

`quorum_threshold`

Property	Value
Type	`Optional[int]`
Required	No
Default	`None`

Enables early stopping when a specified number of responses agree on the same answer. The system stops collecting responses for a datapoint as soon as one of the following happens: quorum is reached, quorum becomes mathematically impossible, or responses_per_datapoint responses have been collected.

Cannot be used together with confidence_threshold.

Related: Early Stopping

job_definition = client.job.create_classification_job_definition(
    name="Cat or Dog with Quorum Stopping",
    instruction="What animal is in this image?",
    answer_options=["Cat", "Dog"],
    datapoints=["pet1.jpg", "pet2.jpg"],
    responses_per_datapoint=10,  # Maximum responses
    quorum_threshold=7,          # Stop when 7 agree
)
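The stopping rules for quorum_threshold can be written as a small function. This is a sketch of the documented behavior, not SDK code; `quorum_status` and `check_early_stopping` are hypothetical names. Quorum becomes impossible when even the leading answer, if it received every remaining response, would still fall short of the threshold.

```python
from collections import Counter
from typing import Optional


def check_early_stopping(confidence_threshold: Optional[float] = None,
                         quorum_threshold: Optional[int] = None) -> None:
    """Reject the documented invalid combination of stopping parameters."""
    if confidence_threshold is not None and quorum_threshold is not None:
        raise ValueError("use confidence_threshold or quorum_threshold, not both")


def quorum_status(answers: list[str], quorum_threshold: int,
                  responses_per_datapoint: int) -> str:
    """Classify one datapoint as "quorum", "max_responses", "impossible", or "continue"."""
    counts = Counter(answers)
    top = max(counts.values(), default=0)  # responses for the leading answer
    remaining = responses_per_datapoint - len(answers)
    if top >= quorum_threshold:
        return "quorum"         # enough responses agree on one answer
    if remaining <= 0:
        return "max_responses"  # response cap reached without quorum
    if top + remaining < quorum_threshold:
        return "impossible"     # even the leader can no longer reach quorum
    return "continue"


print(quorum_status(["Cat"] * 7, 7, 10))                # quorum
print(quorum_status(["Cat"] * 4 + ["Dog"] * 4, 7, 10))  # impossible
print(quorum_status(["Cat"] * 5 + ["Dog"], 7, 10))      # continue
```

With quorum_threshold=7 and responses_per_datapoint=10, a 4/4 split after eight responses stops early: at most two more responses remain, so no answer can reach seven.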

Settings

The `settings` parameter lets you customize how tasks are displayed.

Property	Value
Type	`Sequence[RapidataSetting]`
Required	No
Default	`[]`

Commonly Used Settings

`NoShuffleSetting()`

Keeps answer options in the order you specified. By default, options are randomized to reduce bias. Use this for Likert scales or any ordered options.

from rapidata import NoShuffleSetting

job_definition = client.job.create_classification_job_definition(
    name="Image Quality Rating",
    instruction="Rate the quality of this image",
    answer_options=["1: Poor", "2: Fair", "3: Good", "4: Excellent"],
    datapoints=["image.jpg"],
    settings=[NoShuffleSetting()]
)

`MarkdownSetting()`

Enables limited markdown rendering for text datapoints. Useful when comparing formatted text like LLM outputs.

from rapidata import MarkdownSetting

job_definition = client.job.create_compare_job_definition(
    name="LLM Response Comparison",
    instruction="Which response is better formatted?",
    datapoints=[["**Bold** and _italic_", "Plain text only"]],
    data_type="text",
    settings=[MarkdownSetting()]
)

`AllowNeitherBothSetting()`

For Compare jobs, allows labelers to select "Neither" or "Both" instead of forcing a choice.

from rapidata import AllowNeitherBothSetting

job_definition = client.job.create_compare_job_definition(
    name="Image Quality Comparison",
    instruction="Which image is higher quality?",
    datapoints=[["img_a.jpg", "img_b.jpg"]],
    settings=[AllowNeitherBothSetting()]
)

Job-Specific Parameters

Classification Job

Parameter	Type	Description
`answer_options`	`list[str]`	List of categories to classify into

job_definition = client.job.create_classification_job_definition(
    name="Animal Classification",
    instruction="What animal is in the image?",
    answer_options=["Cat", "Dog", "Bird", "Other"],
    datapoints=["image1.jpg", "image2.jpg"],
)

Compare Job

Parameter	Type	Description
`a_b_names`	`Optional[list[str]]`	Custom labels for the two options (list of exactly 2 strings)

job_definition = client.job.create_compare_job_definition(
    name="Model Comparison",
    instruction="Which image is better?",
    datapoints=[["model_a.jpg", "model_b.jpg"]],
    a_b_names=["Flux", "Midjourney"],  # Results will show these names
)

Locate Job

Locate has no job-specific parameters — it uses only the core parameters. The instruction describes what labelers should locate, and each response is the set of points they tapped on the datapoint.

job_definition = client.job.create_locate_job_definition(
    name="Artifact Detection",
    instruction="Tap on any visual glitches or errors in the image.",
    datapoints=["image1.jpg", "image2.jpg"],
)

Draw Job

Draw has no job-specific parameters — it uses only the core parameters. The instruction describes what labelers should draw, and each response is the set of lines they drew on the datapoint.

job_definition = client.job.create_draw_job_definition(
    name="Object Marking",
    instruction="Color in all the blue books",
    datapoints=["image1.jpg", "image2.jpg"],
)

Select Words Job

Parameter	Type	Description
`sentences`	`list[str]`	One sentence per datapoint; each sentence is split on spaces into the words the labeler can select (must have the same length as `datapoints`)

job_definition = client.job.create_select_words_job_definition(
    name="Prompt Alignment",
    instruction="Select the words that are not depicted in the image.",
    datapoints=["image1.jpg", "image2.jpg"],
    sentences=["A cat on a red couch", "A blue car in the rain"],
)
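Because each sentence is split on spaces, you can preview locally which units labelers will be able to select. The sketch below uses Python's `str.split()`, which splits on runs of whitespace; whether the backend tokenizes exactly this way (for example with repeated spaces or punctuation) is an assumption. `preview_selectable_words` is a hypothetical helper, not an SDK function.

```python
# Hypothetical preview helper; assumes words are separated by whitespace,
# which may not match the backend's exact tokenization.
def preview_selectable_words(datapoints: list[str], sentences: list[str]) -> list[list[str]]:
    """Return the selectable words for each datapoint's sentence."""
    if len(sentences) != len(datapoints):
        raise ValueError("sentences must have the same length as datapoints")
    return [sentence.split() for sentence in sentences]


words = preview_selectable_words(
    ["image1.jpg", "image2.jpg"],
    ["A cat on a red couch", "A blue car in the rain"],
)
print(words[0])  # ['A', 'cat', 'on', 'a', 'red', 'couch']
```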

Free Text Job

Free Text has no job-specific parameters — it uses only the core parameters. The instruction is the question labelers answer, and each response is the text they typed.

job_definition = client.job.create_free_text_job_definition(
    name="Prompt Collection",
    instruction="What would you like to ask an AI?",
    datapoints=["image1.jpg"],
)

Ranking Job

Parameter	Type	Description
`comparison_budget_per_ranking`	`int`	Number of pairwise matchups collected per ranking (per inner list of `datapoints`)
`responses_per_comparison`	`int`	Number of responses collected per matchup (default `1`) — replaces `responses_per_datapoint`
`random_comparisons_ratio`	`float`	Ratio of random matchups to total matchups (default `0.5`); the rest are close matchups between similarly-rated datapoints

job_definition = client.job.create_ranking_job_definition(
    name="Image Ranking",
    instruction="Which image looks better?",
    datapoints=[["img1.jpg", "img2.jpg", "img3.jpg"]],
    comparison_budget_per_ranking=50,
)
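To see how the three Ranking parameters interact, here is a sketch that plans the matchups for one ranking. It is an illustration under assumptions, not Rapidata's scheduler: `plan_matchups` is hypothetical, the number of random matchups is taken as `int(budget * random_comparisons_ratio)`, and "similarly-rated" is approximated by pairing items that are adjacent when sorted by a current rating. Each planned matchup then collects responses_per_comparison responses.

```python
import random


def plan_matchups(items: list[str], ratings: dict[str, float],
                  comparison_budget_per_ranking: int,
                  random_comparisons_ratio: float = 0.5,
                  seed: int = 0) -> list[tuple[str, str]]:
    """Plan pairwise matchups for one ranking (one inner list of datapoints).

    Illustrative only: assumes at least two items and a rating per item.
    """
    rng = random.Random(seed)
    n_random = int(comparison_budget_per_ranking * random_comparisons_ratio)
    n_close = comparison_budget_per_ranking - n_random

    # Random matchups: two distinct items drawn uniformly at random.
    matchups = [tuple(rng.sample(items, 2)) for _ in range(n_random)]

    # Close matchups: neighbours in the current rating order, cycled as needed.
    ordered = sorted(items, key=lambda item: ratings[item])
    neighbours = list(zip(ordered, ordered[1:]))
    matchups += [neighbours[i % len(neighbours)] for i in range(n_close)]
    return matchups


# Made-up ratings, purely for illustration.
ratings = {"img1.jpg": 1010.0, "img2.jpg": 990.0, "img3.jpg": 1000.0}
matchups = plan_matchups(["img1.jpg", "img2.jpg", "img3.jpg"], ratings,
                         comparison_budget_per_ranking=50)
responses_per_comparison = 1  # the documented default
print(len(matchups))                             # 50
print(len(matchups) * responses_per_comparison)  # 50 responses for this ranking
print(matchups[25])                              # ('img2.jpg', 'img3.jpg')
```

With the default ratio of 0.5, a budget of 50 splits into 25 random and 25 close matchups, and the total number of responses per ranking scales as comparison_budget_per_ranking × responses_per_comparison.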

Parameter Availability Matrix

Parameter	Classification	Compare	Locate	Draw	Select Words	Free Text	Ranking
`name`
`instruction`
`datapoints`
`responses_per_datapoint`
`data_type`
`contexts`
`media_contexts`
`confidence_threshold`
`quorum_threshold`
`settings`
`answer_options`
`a_b_names`
`sentences`
`comparison_budget_per_ranking`
`responses_per_comparison`
`random_comparisons_ratio`