Quickstart Guide

Get real humans to label your data. This guide shows you how to create a labeling job using the Rapidata API.

The workflow consists of three main concepts:

Audience: A group of labelers who will work on your tasks
Job Definition: The configuration for your labeling task (instruction, datapoints, settings)
Job: A running labeling task assigned to an audience

Installation

Install Rapidata using pip:

pip install -U rapidata

Usage

All operations are managed through the RapidataClient.

Create a client as follows:

from rapidata import RapidataClient

client = RapidataClient() # (1)!

(1) The first time you run this on a machine, it opens a browser window for you to log in. Your credentials are then saved to ~/.config/rapidata/credentials.json so you don't have to log in again.

Alternatively, authenticate with a client ID and secret from Rapidata Settings:

from rapidata import RapidataClient
client = RapidataClient(client_id="Your client ID", client_secret="Your client secret")
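To keep the secret out of source control, the credentials can be read from the environment before the client is constructed. The following is a minimal sketch: the variable names RAPIDATA_CLIENT_ID and RAPIDATA_CLIENT_SECRET and the helpers load_credentials and make_client are assumptions made for this example, not names defined by the Rapidata SDK.

```python
import os
from typing import Mapping


def load_credentials(env: Mapping[str, str] = os.environ) -> dict:
    """Return keyword arguments for RapidataClient, read from environment variables.

    The variable names are hypothetical, chosen for this sketch.
    """
    try:
        return {
            "client_id": env["RAPIDATA_CLIENT_ID"],
            "client_secret": env["RAPIDATA_CLIENT_SECRET"],
        }
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing.args[0]}") from None


def make_client():
    """Build a client from environment credentials (requires the rapidata package)."""
    # Imported lazily so load_credentials stays usable without the SDK installed.
    from rapidata import RapidataClient

    return RapidataClient(**load_credentials())
```

Calling make_client() then behaves like the explicit client_id/client_secret call above, and a missing variable fails early with a readable message.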

Step 1: Get an Audience

The simplest way to get started is with a curated audience:

audience = client.audience.get_audience_by_id("aud_MU1GZYoESyO") # (1)!

(1) Curated audiences are pre-existing pools of labelers trained on a specific type of task; the ID used here belongs to the Alignment audience. You can browse the curated audiences and copy their IDs from the Rapidata Dashboard.

Note

The curated audience gets you started quickly, but results may be less accurate than a custom audience trained with examples specific to your task. For higher quality, see Custom Audiences.

Step 2: Create a Job Definition

A job definition configures what you want labeled:

job_definition = client.job.create_compare_job_definition(
    name="Example Image Prompt Alignment",
    instruction="Which image matches the description better?", # (1)!
    datapoints=[ # (2)!
        ["https://assets.rapidata.ai/midjourney-5.2_37_3.jpg",
         "https://assets.rapidata.ai/flux-1-pro_37_0.jpg"]
    ],
    contexts=["A small blue book sitting on a large red book."] # (3)!
)

(1) The instruction shown to labelers. It should be clear and unambiguous.
(2) For compare jobs, each datapoint is a pair of items. Supports URLs, local paths, or text.
(3) Optional text context shown alongside each datapoint (the list must match the length of datapoints).
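Since contexts must have the same length as datapoints, one way to avoid a mismatch is to build both lists from a single list of records. This is a minimal sketch; the Comparison type and build_compare_inputs helper are hypothetical names introduced for illustration and are not part of the Rapidata SDK.

```python
from typing import Iterable, NamedTuple


class Comparison(NamedTuple):
    """One compare task: a prompt and the two items to compare (hypothetical helper type)."""
    prompt: str
    item_a: str
    item_b: str


def build_compare_inputs(rows: Iterable[Comparison]) -> tuple[list[list[str]], list[str]]:
    """Split rows into `datapoints` (pairs) and `contexts` (one prompt per pair).

    Building both lists from the same rows guarantees equal lengths.
    """
    datapoints: list[list[str]] = []
    contexts: list[str] = []
    for row in rows:
        if not (row.item_a and row.item_b):
            raise ValueError(f"Both items are required for prompt {row.prompt!r}")
        datapoints.append([row.item_a, row.item_b])
        contexts.append(row.prompt)
    return datapoints, contexts


rows = [
    Comparison(
        prompt="A small blue book sitting on a large red book.",
        item_a="https://assets.rapidata.ai/midjourney-5.2_37_3.jpg",
        item_b="https://assets.rapidata.ai/flux-1-pro_37_0.jpg",
    ),
]
datapoints, contexts = build_compare_inputs(rows)
```

The resulting datapoints and contexts can be passed unchanged to create_compare_job_definition as the datapoints and contexts arguments.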

Tip

If some datapoints fail to upload, a FailedUploadException will be raised. Learn how to handle this in the Error Handling Guide.

For a detailed explanation of all available parameters (including name, instruction, datapoints, contexts, quality control options, and more), see the Job Definition Parameters Reference.

Step 3: Preview the Job Definition

Before running your job, preview it to see exactly what labelers will see:

job_definition.preview() # (1)!

(1) Opens your browser, where you can review and adjust the job configuration.

Step 4: Run and Get Results

job = audience.assign_job(job_definition) # (1)!
job.display_progress_bar()
results = job.get_results() # (2)!

(1) Assigns the job definition to the audience and starts collecting responses.
(2) Blocks until the job is complete and returns the results. You can also monitor progress on the Rapidata Dashboard.
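Because get_results() blocks until the job completes, it can be worth writing the returned object to disk as soon as it arrives. The sketch below assumes the results behave like a JSON-serializable mapping; this guide does not specify the type, so treat that as an assumption and check the Understanding the Results guide. save_results is a hypothetical helper, not an SDK function.

```python
import json
from pathlib import Path
from typing import Any


def save_results(results: Any, path: str) -> Path:
    """Write results to a JSON file and return the file path.

    Assumes `results` is dict-like and JSON-serializable; values that json
    cannot encode natively are written as their string form.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    with target.open("w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, default=str)
    return target
```

For example, save_results(results, "output/alignment_results.json") would store the results next to the script that produced them.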

To understand the results format, see the Understanding the Results guide.

Retrieve Existing Resources

Find Audiences

# Find audiences by name
audiences = client.audience.find_audiences("alignment")

# Get a specific audience by ID
audience = client.audience.get_audience_by_id("audience_id")

Find Job Definitions

# Find job definitions by name
job_definitions = client.job.find_job_definitions("Example Image Prompt Alignment")

# Get a specific job definition by ID
job_definition = client.job.get_job_definition_by_id("job_definition_id")

Find Jobs

# Find jobs by name
jobs = client.job.find_jobs("Example Image Prompt Alignment")

# Get a specific job by ID
job = client.job.get_job_by_id("job_id")

# Find jobs for a specific audience
audience = client.audience.get_audience_by_id("audience_id")
jobs = audience.find_jobs("Prompt Alignment")

Note

The find_* methods can be called without the name parameter to return the most recent resources.
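As an illustration of calling the find_* methods without a name, the sketch below gathers the most recent resources of each kind in one place. list_recent is a hypothetical helper; it relies only on the three find_* methods shown in this section and on the behavior described in the note.

```python
def list_recent(client):
    """Return the most recent audiences, job definitions, and jobs.

    `client` is expected to be a RapidataClient. Each find_* method is called
    without a name, which per the note returns the most recent resources.
    """
    return {
        "audiences": client.audience.find_audiences(),
        "job_definitions": client.job.find_job_definitions(),
        "jobs": client.job.find_jobs(),
    }
```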

Complete Example

Here's the full workflow using the curated alignment audience:

from rapidata import RapidataClient

client = RapidataClient()

audience = client.audience.get_audience_by_id("aud_MU1GZYoESyO")

job_definition = client.job.create_compare_job_definition(
    name="Example Image Prompt Alignment",
    instruction="Which image matches the description better?",
    datapoints=[
        ["https://assets.rapidata.ai/midjourney-5.2_37_3.jpg",
         "https://assets.rapidata.ai/flux-1-pro_37_0.jpg"]
    ],
    contexts=["A small blue book sitting on a large red book."]
)

job_definition.preview() # (1)!

job = audience.assign_job(job_definition)
job.display_progress_bar()
results = job.get_results()
print(results)

(1) Optional: opens a browser preview of what labelers will see.
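Because the preview step opens a browser, it may be convenient to make it opt-in when the workflow runs unattended. The following sketch wraps the same calls as the complete example in a function; run_compare_job and its preview flag are hypothetical names for this illustration, and the length check mirrors the requirement that contexts match datapoints.

```python
def run_compare_job(client, audience_id, name, instruction, datapoints, contexts, preview=False):
    """Create a compare job definition, optionally preview it, run it, and return the results.

    Uses only the calls shown in the complete example; `client` is a RapidataClient.
    """
    if len(contexts) != len(datapoints):
        raise ValueError("contexts must have the same length as datapoints")

    audience = client.audience.get_audience_by_id(audience_id)
    job_definition = client.job.create_compare_job_definition(
        name=name,
        instruction=instruction,
        datapoints=datapoints,
        contexts=contexts,
    )
    if preview:
        job_definition.preview()  # opens a browser, so leave it off in unattended runs

    job = audience.assign_job(job_definition)
    job.display_progress_bar()
    return job.get_results()  # blocks until the job is complete
```

With a real client, run_compare_job(client, "aud_MU1GZYoESyO", ...) reproduces the example above, minus the preview unless preview=True is passed.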

Next Steps

Create Custom Audiences for higher quality results
Learn about Classification Jobs for categorizing data
Understand the Results Format
Configure Early Stopping based on confidence thresholds
Let your AI agent write the integration code for you — one-line install for Claude Code, Cursor, Copilot, and many more