Job Definition Parameter Reference#

This guide provides a comprehensive reference for all parameters available when creating job definitions in the Rapidata Python SDK.

Overview#

When creating a job definition, you'll use parameters to control:

  • What data is shown to labelers (datapoints, contexts)
  • How many responses you need (responses_per_datapoint)
  • How tasks are displayed (settings)
  • Quality assurance (confidence_threshold, quorum_threshold)
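
For orientation, here is a minimal end-to-end sketch that combines these parameters in one classification job. It assumes the client is constructed as `RapidataClient()`; the image URL is a placeholder, and each parameter is covered in detail in the sections below.

```python
from rapidata import RapidataClient, NoShuffleSetting

client = RapidataClient()  # assumes default client construction

job_definition = client.job.create_classification_job_definition(
    name="Image Quality Rating - Quick Start",  # internal label, not shown to labelers
    instruction="Rate the quality of this image",  # shown to labelers
    answer_options=["1: Poor", "2: Fair", "3: Good", "4: Excellent"],
    datapoints=["https://example.com/img1.jpg"],  # placeholder URL
    responses_per_datapoint=15,  # responses to collect per datapoint
    settings=[NoShuffleSetting()],  # keep the ordered scale in place
)
```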

Core Parameters#

These parameters are required or commonly used across all job types.

name#

| Property | Value |
| --- | --- |
| Type | `str` |
| Required | Yes |

A descriptive name for your job definition. Used to identify the job in the Rapidata Dashboard and when retrieving jobs programmatically. This name is not shown to labelers.

```python
name="Image Quality Rating v2 - January Batch"
```

instruction#

| Property | Value |
| --- | --- |
| Type | `str` |
| Required | Yes |

The task instruction shown to labelers. This should clearly explain what action they need to take.

Best Practices:

  • Be specific and unambiguous
  • Use action verbs ("Select", "Choose", "Identify")
  • For comparisons, use comparative language ("Which looks better?")
  • See Human Prompting for detailed guidance

```python
instruction="Which image follows the prompt more accurately?"
```

datapoints#

| Property | Value |
| --- | --- |
| Type | `list[str]` or `list[list[str]]` |
| Required | Yes |

The data to be labeled. The format depends on the job type:

| Job Type | Format | Description |
| --- | --- | --- |
| Classification | `list[str]` | Single items to classify |
| Compare | `list[list[str]]` | Pairs of items (exactly 2 per inner list) |

Supported Formats:

  • Public URLs (https://...)
  • Local file paths (will be uploaded automatically)

```python
# Classification - list of single items
datapoints=["https://example.com/img1.jpg", "https://example.com/img2.jpg"]

# Compare - list of pairs
datapoints=[
    ["https://example.com/a1.jpg", "https://example.com/b1.jpg"],
    ["https://example.com/a2.jpg", "https://example.com/b2.jpg"],
]
```

responses_per_datapoint#

| Property | Value |
| --- | --- |
| Type | `int` |
| Required | No |
| Default | `10` |

The minimum number of responses to collect for each datapoint. The actual number may slightly exceed this due to concurrent labelers. When an early-stopping parameter (confidence_threshold or quorum_threshold) is set, this value instead acts as the maximum number of responses.

Best Practices:

  • Use 15-25 for ambiguous or subjective tasks
  • Use 5-10 for clear-cut decisions

```python
responses_per_datapoint=15
```

Data Type#

data_type#

| Property | Value |
| --- | --- |
| Type | `Literal["media", "text"]` |
| Required | No |
| Default | `"media"` |

Specifies how datapoints should be interpreted and displayed.

| Value | Description |
| --- | --- |
| `"media"` | Datapoints are URLs or paths to images, videos, or audio files |
| `"text"` | Datapoints are raw text strings to be displayed directly |

```python
# Comparing two text responses
job_definition = client.job.create_compare_job_definition(
    name="LLM Response Comparison",
    instruction="Which response is more helpful?",
    datapoints=[
        ["Response A text here...", "Response B text here..."],
    ],
    data_type="text",
)
```

Context Parameters#

Context parameters allow you to provide additional information alongside each datapoint.

contexts#

| Property | Value |
| --- | --- |
| Type | `Optional[list[str]]` |
| Required | No |
| Default | `None` |

Text context shown alongside each datapoint. Commonly used to provide prompts, descriptions, or additional instructions specific to each item.

Constraints: If provided, must have the same length as datapoints.

```python
datapoints=["image1.jpg", "image2.jpg"],
contexts=["A cat sitting on a red couch", "A blue car in the rain"]
```

media_contexts#

| Property | Value |
| --- | --- |
| Type | `Optional[list[str]]` |
| Required | No |
| Default | `None` |

Media URLs shown as reference context alongside each datapoint. Useful when you need to show a reference image or video alongside the item being evaluated.

Constraints: If provided, must have the same length as datapoints.

```python
# Show original image as context while evaluating edited versions
datapoints=["edited1.jpg", "edited2.jpg"],
media_contexts=["original1.jpg", "original2.jpg"]
```

Quality Control Parameters#

confidence_threshold#

| Property | Value |
| --- | --- |
| Type | `Optional[float]` |
| Required | No |
| Default | `None` |
| Range | 0.0 to 1.0 (typically 0.99 to 0.999) |

Enables early stopping when a specified confidence level is reached. The system stops collecting responses once consensus is achieved, reducing costs while maintaining quality.

How It Works: Uses labeler trust scores (userScore) to calculate statistical confidence for each category.

Related: Confidence Stopping

```python
job_definition = client.job.create_classification_job_definition(
    name="Cat or Dog with Early Stopping",
    instruction="What animal is in this image?",
    answer_options=["Cat", "Dog"],
    datapoints=["pet1.jpg", "pet2.jpg"],
    responses_per_datapoint=50,  # Maximum responses
    confidence_threshold=0.99,   # Stop at 99% confidence
)
```
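
The exact statistic is internal to Rapidata, but the intuition can be sketched with a toy trust-weighted vote share (an illustration only, not the actual algorithm; `weighted_confidence` and the sample scores below are hypothetical):

```python
# Toy illustration of trust-weighted confidence - NOT Rapidata's actual statistic.
def weighted_confidence(responses: list[tuple[str, float]]) -> dict[str, float]:
    """responses: (answer, userScore) pairs collected so far."""
    totals: dict[str, float] = {}
    for answer, score in responses:
        totals[answer] = totals.get(answer, 0.0) + score
    total_weight = sum(totals.values())
    return {answer: weight / total_weight for answer, weight in totals.items()}

# Three high-trust labelers say "Cat", one low-trust labeler says "Dog".
confidence = weighted_confidence([("Cat", 0.9), ("Cat", 0.85), ("Dog", 0.3), ("Cat", 0.95)])
should_stop = max(confidence.values()) >= 0.99  # compare against confidence_threshold
```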

quorum_threshold#

| Property | Value |
| --- | --- |
| Type | `Optional[int]` |
| Required | No |
| Default | `None` |

Enables early stopping when a specified number of responses agree on the same answer. The system stops collecting responses once quorum is reached, once quorum becomes mathematically impossible to reach, or after responses_per_datapoint responses have been collected, whichever comes first.

Cannot be used together with confidence_threshold.

Related: Early Stopping

```python
job_definition = client.job.create_classification_job_definition(
    name="Cat or Dog with Quorum Stopping",
    instruction="What animal is in this image?",
    answer_options=["Cat", "Dog"],
    datapoints=["pet1.jpg", "pet2.jpg"],
    responses_per_datapoint=10,  # Maximum responses
    quorum_threshold=7,          # Stop when 7 agree
)
```
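
The three stopping conditions can be made concrete with a short re-implementation of the decision logic (illustrative only; `quorum_decision` is not part of the SDK):

```python
# Illustrative re-implementation of the quorum stopping rule - not SDK code.
def quorum_decision(counts: dict[str, int], quorum: int, max_responses: int) -> str:
    """counts: responses per answer option collected so far."""
    collected = sum(counts.values())
    remaining = max_responses - collected
    if any(votes >= quorum for votes in counts.values()):
        return "stop: quorum reached"
    if all(votes + remaining < quorum for votes in counts.values()):
        return "stop: quorum mathematically impossible"
    if collected >= max_responses:
        return "stop: responses_per_datapoint reached"
    return "continue collecting"

# With quorum_threshold=7 and responses_per_datapoint=10:
quorum_decision({"Cat": 7, "Dog": 1}, quorum=7, max_responses=10)  # quorum reached
quorum_decision({"Cat": 4, "Dog": 4}, quorum=7, max_responses=10)  # impossible: only 2 votes left
```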

Settings#

Settings allow you to customize how tasks are displayed.

| Property | Value |
| --- | --- |
| Type | `Sequence[RapidataSetting]` |
| Required | No |
| Default | `[]` |

Commonly Used Settings#

NoShuffleSetting()#

Keeps answer options in the order you specified. By default, options are randomized to reduce bias. Use this for Likert scales or any ordered options.

```python
from rapidata import NoShuffleSetting

job_definition = client.job.create_classification_job_definition(
    name="Image Quality Rating",
    instruction="Rate the quality of this image",
    answer_options=["1: Poor", "2: Fair", "3: Good", "4: Excellent"],
    datapoints=["image.jpg"],
    settings=[NoShuffleSetting()]
)
```

MarkdownSetting()#

Enables limited markdown rendering for text datapoints. Useful when comparing formatted text like LLM outputs.

```python
from rapidata import MarkdownSetting

job_definition = client.job.create_compare_job_definition(
    name="LLM Response Comparison",
    instruction="Which response is better formatted?",
    datapoints=[["**Bold** and _italic_", "Plain text only"]],
    data_type="text",
    settings=[MarkdownSetting()]
)
```

AllowNeitherBothSetting()#

For Compare jobs, allows labelers to select "Neither" or "Both" instead of forcing a choice.

```python
from rapidata import AllowNeitherBothSetting

job_definition = client.job.create_compare_job_definition(
    name="Image Quality Comparison",
    instruction="Which image is higher quality?",
    datapoints=[["img_a.jpg", "img_b.jpg"]],
    settings=[AllowNeitherBothSetting()]
)
```

Job-Specific Parameters#

Classification Job#

| Parameter | Type | Description |
| --- | --- | --- |
| `answer_options` | `list[str]` | List of categories to classify into |

```python
job_definition = client.job.create_classification_job_definition(
    name="Animal Classification",
    instruction="What animal is in the image?",
    answer_options=["Cat", "Dog", "Bird", "Other"],
    datapoints=["image1.jpg", "image2.jpg"],
)
```

Compare Job#

| Parameter | Type | Description |
| --- | --- | --- |
| `a_b_names` | `Optional[list[str]]` | Custom labels for the two options (list of exactly 2 strings) |

```python
job_definition = client.job.create_compare_job_definition(
    name="Model Comparison",
    instruction="Which image is better?",
    datapoints=[["model_a.jpg", "model_b.jpg"]],
    a_b_names=["Flux", "Midjourney"],  # Results will show these names
)
```

Parameter Availability Matrix#

| Parameter | Classification | Compare |
| --- | --- | --- |
| `name` | ✅ | ✅ |
| `instruction` | ✅ | ✅ |
| `datapoints` | ✅ | ✅ |
| `responses_per_datapoint` | ✅ | ✅ |
| `data_type` | ✅ | ✅ |
| `contexts` | ✅ | ✅ |
| `media_contexts` | ✅ | ✅ |
| `confidence_threshold` | ✅ | ✅ |
| `quorum_threshold` | ✅ | ✅ |
| `settings` | ✅ | ✅ |
| `answer_options` | ✅ | ❌ |
| `a_b_names` | ❌ | ✅ |