Custom Audiences

Custom audiences let you train labelers with qualification examples specific to your task, which results in higher-quality labels.

Audience Types

Audience Type	Speed	Quality	Best For
Global	Fastest	Baseline	Quick prototyping, simple tasks
Curated	Fast	Good	Tasks with a known domain (e.g. prompt alignment)
Custom	Slower initial setup	Highest	Production workloads, nuanced tasks

The global audience is the broadest pool of labelers, ready to work on any task immediately.

A curated audience is a pre-existing pool of labelers trained on a specific type of task. It offers better quality than the global audience without requiring any setup.

A custom audience filters labelers through qualification examples before they can work on your data. Only labelers who demonstrate they understand your tasks will be included, leading to the most accurate results.

Note

You can see the curated audiences along with your own in the Rapidata Dashboard.

Creating a Custom Audience

Step 1: Create the Audience

from rapidata import RapidataClient

client = RapidataClient()
audience = client.audience.create_audience(name="Custom Prompt Alignment Audience") # (1)!

(1) Creates a new, empty audience. Labelers join by passing the qualification examples you add next.

Step 2: Add Qualification Examples

Qualification examples are questions with known correct answers. Labelers must answer these correctly to join your audience.

Review your qualification examples carefully

Every qualification example with its associated truth must be manually and thoroughly reviewed before use. If an example has a wrong or ambiguous truth value, the qualification process will filter out good labelers who answer correctly while letting through bad labelers who happen to match the incorrect answer — completely inverting your quality control. Always verify that each example has a clear, unambiguous correct answer.
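To make the inversion concrete, here is a toy model in plain Python. It does not call the Rapidata API: the helper `passes_qualification`, the "left"/"right" answers, and the rule that all four answers must match are assumptions made purely for illustration.

```python
# Toy model: how a single wrong truth value affects qualification.
# Everything here is illustrative; it does not call the Rapidata API.

def passes_qualification(answers, truths, required_correct):
    """Return True if enough answers match the recorded truths."""
    correct = sum(a == t for a, t in zip(answers, truths))
    return correct >= required_correct

# What is actually correct for four compare examples.
real_answers = ["left", "left", "right", "left"]

# The truths as entered, with a mistake on the third example.
recorded_truths = ["left", "left", "left", "left"]

careful_labeler = list(real_answers)   # answers every example correctly
always_left_labeler = ["left"] * 4     # ignores the content entirely

# Assumption for this sketch: all four answers must match to qualify.
careful_ok = passes_qualification(careful_labeler, recorded_truths, required_correct=4)
lazy_ok = passes_qualification(always_left_labeler, recorded_truths, required_correct=4)

print(careful_ok)  # False
print(lazy_ok)     # True
```

Under these assumptions, the labeler who answers correctly is rejected and the one who never looks at the images is admitted, which is the failure mode the warning describes.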

DATAPOINTS = [
    ["https://assets.rapidata.ai/flux_sign_diffusion.jpg", "https://assets.rapidata.ai/mj_sign_diffusion.jpg"],
    ["https://assets.rapidata.ai/flux_duck.jpg", "https://assets.rapidata.ai/mj_duck.jpg"],
    ["https://assets.rapidata.ai/flux_book.jpg", "https://assets.rapidata.ai/mj_book.jpg"],
    ["https://assets.rapidata.ai/flux_flower.jpg", "https://assets.rapidata.ai/mj_flower.jpg"],
    ["https://assets.rapidata.ai/flux_store_front.jpg", "https://assets.rapidata.ai/mj_store_front.jpg"],
    ["https://assets.rapidata.ai/flux_hand.jpg", "https://assets.rapidata.ai/mj_hand.jpg"],
    ["https://assets.rapidata.ai/flux_traffic_lights.jpg", "https://assets.rapidata.ai/mj_traffic_lights.jpg"],
    ["https://assets.rapidata.ai/flux_plane.jpg", "https://assets.rapidata.ai/mj_plane.jpg"],
]
PROMPTS = [
    "A sign that says 'Diffusion'.",
    "A psychedelic duck with glasses",
    "A small blue book sitting on a large red book.",
    "A yellow flower sticking out of a bright green pot.",
    "A store front with 'hello world' written on it.",
    "A yellow hand on a black stone.",
    "A green, yellow and red traffic light.",
    "A plane flying over a person.",
]

for prompt, datapoint in zip(PROMPTS, DATAPOINTS):
    audience.add_compare_example(
        instruction="Which image follows the prompt more accurately?",
        datapoint=datapoint, # (1)!
        truth=datapoint[0], # (2)!
        context=prompt # (3)!
    )

(1) datapoint: the items to compare, given as a list of URLs, local paths, or text strings.
(2) truth: the correct answer, which must match one of the datapoint items exactly.
(3) context: additional context shown alongside the comparison (optional).
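Because the truth must match one of the datapoint items exactly, a typo in a URL is easy to miss. The following is a minimal pre-flight check in plain Python that you could run before calling add_compare_example. The helper name `validate_compare_examples` is hypothetical and not part of the Rapidata SDK, and it assumes pairwise comparisons as in the example above.

```python
def validate_compare_examples(prompts, datapoints, truths):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # zip() silently truncates to the shortest input, so check lengths explicitly.
    if not (len(prompts) == len(datapoints) == len(truths)):
        problems.append(
            f"length mismatch: {len(prompts)} prompts, "
            f"{len(datapoints)} datapoints, {len(truths)} truths"
        )
    for i, (pair, truth) in enumerate(zip(datapoints, truths)):
        if len(pair) != 2:
            problems.append(f"example {i}: expected 2 items, got {len(pair)}")
        if len(set(pair)) != len(pair):
            problems.append(f"example {i}: duplicate items")
        if truth not in pair:
            problems.append(f"example {i}: truth {truth!r} is not one of the items")
    return problems


datapoints = [
    ["https://assets.rapidata.ai/flux_book.jpg", "https://assets.rapidata.ai/mj_book.jpg"],
    ["https://assets.rapidata.ai/flux_duck.jpg", "https://assets.rapidata.ai/mj_duck.jpg"],
]
prompts = [
    "A small blue book sitting on a large red book.",
    "A psychedelic duck with glasses",
]

good_truths = [pair[0] for pair in datapoints]
# Deliberate typo (.jpeg instead of .jpg) to show what the check reports.
bad_truths = [datapoints[0][0], "https://assets.rapidata.ai/flux_duck.jpeg"]

good_report = validate_compare_examples(prompts, datapoints, good_truths)
bad_report = validate_compare_examples(prompts, datapoints, bad_truths)
print(good_report)  # []
print(bad_report)
```

A check like this only catches mechanical mistakes. It cannot tell whether the chosen truth is actually the better answer, so it does not replace the manual review described above.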

Note

In practice you'd want to add more examples to the audience to improve the quality of the results.

Step 3: Create and Assign a Job

Once your audience is set up, create a job definition and assign it to the audience:

job_definition = client.job.create_compare_job_definition(
    name="Prompt Alignment Job",
    instruction="Which image follows the prompt more accurately?",
    datapoints=[
        ["https://assets.rapidata.ai/flux_book.jpg",
         "https://assets.rapidata.ai/mj_book.jpg"]
    ],
    contexts=["A small blue book sitting on a large red book."]
)

job_definition.preview()

job = audience.assign_job(job_definition)
job.display_progress_bar()
results = job.get_results()
print(results)

Complete Example

Here's the full workflow — creating a custom audience, adding qualification examples, and running a labeling job:

from rapidata import RapidataClient

client = RapidataClient()

audience = client.audience.create_audience(name="Custom Prompt Alignment Audience")

DATAPOINTS = [
    ["https://assets.rapidata.ai/flux_sign_diffusion.jpg", "https://assets.rapidata.ai/mj_sign_diffusion.jpg"],
    ["https://assets.rapidata.ai/flux_duck.jpg", "https://assets.rapidata.ai/mj_duck.jpg"],
    ["https://assets.rapidata.ai/flux_book.jpg", "https://assets.rapidata.ai/mj_book.jpg"],
    ["https://assets.rapidata.ai/flux_flower.jpg", "https://assets.rapidata.ai/mj_flower.jpg"],
    ["https://assets.rapidata.ai/flux_store_front.jpg", "https://assets.rapidata.ai/mj_store_front.jpg"],
    ["https://assets.rapidata.ai/flux_hand.jpg", "https://assets.rapidata.ai/mj_hand.jpg"],
    ["https://assets.rapidata.ai/flux_traffic_lights.jpg", "https://assets.rapidata.ai/mj_traffic_lights.jpg"],
    ["https://assets.rapidata.ai/flux_plane.jpg", "https://assets.rapidata.ai/mj_plane.jpg"],
]
PROMPTS = [
    "A sign that says 'Diffusion'.",
    "A psychedelic duck with glasses",
    "A small blue book sitting on a large red book.",
    "A yellow flower sticking out of a bright green pot.",
    "A store front with 'hello world' written on it.",
    "A yellow hand on a black stone.",
    "A green, yellow and red traffic light.",
    "A plane flying over a person.",
]

for prompt, datapoint in zip(PROMPTS, DATAPOINTS):
    audience.add_compare_example(
        instruction="Which image follows the prompt more accurately?",
        datapoint=datapoint,
        truth=datapoint[0],
        context=prompt
    )

job_definition = client.job.create_compare_job_definition(
    name="Prompt Alignment Job",
    instruction="Which image follows the prompt more accurately?",
    datapoints=[
        ["https://assets.rapidata.ai/flux_book.jpg",
         "https://assets.rapidata.ai/mj_book.jpg"]
    ],
    contexts=["A small blue book sitting on a large red book."]
)

job_definition.preview()

job = audience.assign_job(job_definition)
job.display_progress_bar()
results = job.get_results()
print(results)

Matching the Job UI with Settings

Qualification examples default to the standard UI for their task type. If your job uses settings to change how the task is rendered (e.g. NoShuffleSetting to keep answer options in order, AllowNeitherBothSetting to add an "Unsure" button), pass the same settings to the example so the labeler qualifies on the exact UI they will later see.

from rapidata import NoShuffleSetting

audience.add_classification_example(
    instruction="How well does the image match the description?",
    answer_options=[
        "1: Not at all",
        "2: A little",
        "3: Moderately",
        "4: Very well",
        "5: Perfectly",
    ],
    datapoint="https://assets.rapidata.ai/email-4o.png",
    truth=["5: Perfectly", "4: Very well"],
    context="A laptop screen with clearly readable text, addressed to the marketing team.",
    settings=[NoShuffleSetting()], # (1)!
)

(1) Applies the setting as a feature flag on this single example. Use the same RapidataSetting subclasses you would pass to settings= on a job or order (e.g. NoShuffleSetting, MarkdownSetting, AllowNeitherBothSetting, ComparePanoramaSetting). All add_*_example methods accept settings.

Reusing Audiences

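Once created, an audience can be reused for multiple jobs. Look it up by name or fetch it directly by id:

# Search for existing audiences by name.
audiences = client.audience.find_audiences("Custom Prompt Alignment Audience")
# Or fetch a single audience directly if you already know its id.
audience = client.audience.get_audience_by_id("audience_id")

# new_job_definition is any job definition, created as in Step 3.
job = audience.assign_job(new_job_definition)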

Filtered Audiences

A filtered audience is a lightweight subset of an existing audience's qualified labelers — derived by applying filters on top of the base audience. No new qualification or recruiting takes place; the filtered audience reuses the same pool. Use it when you want to target a specific slice (e.g. by country or language) of an audience that you have already trained.

Deriving a filtered audience with `.filter()`

Call .filter(...) on any RapidataAudience with a list of one or more filters. The call returns a RapidataFilteredAudience — a slim handle that reuses the base audience's qualified pool. Multiple filters in the list are combined with logical AND.

from rapidata import CountryFilter, LanguageFilter

base = client.audience.get_audience_by_id("audience_id")

us_english_speakers = base.filter([
    CountryFilter(["US"]),
    LanguageFilter(["en"]),
])

job = us_english_speakers.assign_job(new_job_definition)

The returned RapidataFilteredAudience exposes only the operations that make sense for a filtered view: assign_job, find_jobs, delete, and use as audience_id on leaderboard creation. It deliberately does not offer add_classification_example, update_filters, or further nested .filter(...) calls. Those would either mutate the base audience's qualification pool, which the filtered view shares, or chain filters in a way that is better expressed as a single combined filter on the base.

Supported filters

Filter	Targets labelers by
`CountryFilter`	ISO-3166 country code (e.g. `["US", "CA"]`)
`LanguageFilter`	Spoken / device language (e.g. `["en", "de"]`)
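Since CountryFilter takes ISO-3166 country codes, it can help to normalize user-supplied codes before building the filter. The sketch below is plain Python and only checks the shape of each code (two letters, uppercased, as in the `["US", "CA"]` example). The helper `normalize_country_codes` is hypothetical, and whether the SDK itself accepts lowercase input is not something this page states.

```python
import re


def normalize_country_codes(codes):
    """Uppercase each code and check that it looks like a two-letter country code."""
    normalized = [code.strip().upper() for code in codes]
    bad = [code for code in normalized if not re.fullmatch(r"[A-Z]{2}", code)]
    if bad:
        raise ValueError(f"not two-letter country codes: {bad}")
    return normalized


codes = normalize_country_codes([" us", "ca"])
print(codes)  # ['US', 'CA']
```

This is a shape check only. It does not verify that a code is actually assigned in ISO-3166, so a value such as "ZZ" would pass.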

Combining filters

The list form combines filters with logical AND. For anything richer, build a single top-level filter explicitly with AndFilter / OrFilter / NotFilter, or use the equivalent & / | / ~ operators:

from rapidata import CountryFilter, LanguageFilter

# "US or Canadian labelers, but not French speakers"
audience_slice = base.filter([
    (CountryFilter(["US"]) | CountryFilter(["CA"]))
    & ~LanguageFilter(["fr"]),
])
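The semantics of the list form and of the & / | / ~ operators can be illustrated with a small toy model in plain Python. None of the names below (`Labeler`, `Predicate`, `country_in`, `language_in`, `apply_filters`) belong to the Rapidata SDK; they are stand-ins that only mimic the boolean behaviour described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeler:
    country: str
    language: str


class Predicate:
    """Toy filter: wraps a function Labeler -> bool and supports &, | and ~."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, labeler):
        return self.fn(labeler)

    def __and__(self, other):
        return Predicate(lambda labeler: self(labeler) and other(labeler))

    def __or__(self, other):
        return Predicate(lambda labeler: self(labeler) or other(labeler))

    def __invert__(self):
        return Predicate(lambda labeler: not self(labeler))


def country_in(codes):
    return Predicate(lambda labeler: labeler.country in codes)


def language_in(codes):
    return Predicate(lambda labeler: labeler.language in codes)


def apply_filters(labelers, filters):
    """Treat a list of filters as a logical AND, mirroring the list form."""
    return [lab for lab in labelers if all(f(lab) for f in filters)]


pool = [
    Labeler("US", "en"),
    Labeler("CA", "fr"),
    Labeler("CA", "en"),
    Labeler("DE", "de"),
]

# "US or Canadian labelers, but not French speakers"
combined = (country_in(["US"]) | country_in(["CA"])) & ~language_in(["fr"])
selected = apply_filters(pool, [combined])
print(selected)

# List form: both conditions must hold (logical AND).
canadian_english = apply_filters(pool, [country_in(["CA"]), language_in(["en"])])
print(canadian_english)
```

The parentheses around the | expression matter. In Python, & binds more tightly than |, so without them the expression would parse as "US, or (Canadian and not French-speaking)", which would also admit French-speaking labelers in the US.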

Using a filtered audience with a leaderboard

RapidataFilteredAudience is a valid audience_id anywhere a regular audience id is accepted, including benchmark.create_leaderboard. Pass the object directly — no need to read .id yourself:

us_english = base.filter([
    CountryFilter(["US"]),
    LanguageFilter(["en"]),
])

leaderboard = benchmark.create_leaderboard(
    name="Realism (US, English)",
    instruction="Which image is more realistic?",
    audience_id=us_english, # (1)!
)

(1) Accepts an id string, a RapidataAudience, or a RapidataFilteredAudience. Defaults to the global audience when omitted.

Next Steps

Learn about Classification Jobs for categorizing data
Understand the Results Format
Configure Early Stopping based on confidence thresholds