Early Stopping#
To improve the efficiency and cost-effectiveness of your data labeling tasks, Rapidata offers Early Stopping features that automatically stop collecting responses for a datapoint once a stopping condition is met, saving time and resources without compromising quality.
There are two early stopping strategies:
- Confidence Stopping: Stops when a statistical confidence threshold is reached, using labeler trust scores.
- Quorum Stopping: Stops when a fixed number of responses agree on the same answer.
You can use one or the other, but not both at the same time.
Why Use Early Stopping?#
In traditional data labeling workflows, you might request a fixed number of responses per datapoint to ensure accuracy. However, once a consensus is reached, continuing to collect more responses becomes redundant and incurs unnecessary costs.
Early Stopping addresses this by:
- Reducing Costs: Stop collecting responses when sufficient agreement is achieved.
- Improving Efficiency: Accelerate the labeling process by focusing resources where they are most needed.
- Maintaining Quality: Ensure that each datapoint meets your specified stopping condition before stopping.
Confidence Stopping#
How it Works#
Confidence Stopping uses the trustworthiness of labelers, quantified through their userScores, to calculate a confidence level for each category of a given datapoint.
Confidence Calculation#
- UserScores: Each labeler has a userScore between 0 and 1, representing their reliability.
- Aggregated Confidence: By combining the userScores of labelers who selected a particular category, the system computes the probability that this category is the correct one.
- Threshold Comparison: If the calculated confidence exceeds your specified threshold, the system stops collecting further responses for that datapoint.
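The exact formula is not spelled out here. As a rough illustration of the three steps above, the following sketch assumes a simple Bayesian model: a labeler with userScore s picks the correct one of k categories with probability 1/k + s(1 - 1/k), mistakes are spread evenly over the remaining categories, and all categories start out equally likely. These assumptions, and the helper names confidence_per_category and should_stop, are illustrative and not part of the Rapidata API.

```python
from math import prod


def confidence_per_category(votes, categories):
    """Estimate the probability that each category is the correct one.

    votes: list of (selected_category, user_score) tuples.
    ASSUMPTION: a labeler with score s is correct with probability
    1/k + s * (1 - 1/k); this is an illustrative model only.
    """
    k = len(categories)
    likelihood = {}
    for candidate in categories:
        terms = []
        for selected, score in votes:
            # Clamp so a perfect score never yields a zero likelihood.
            score = min(max(score, 0.0), 1.0 - 1e-9)
            p_correct = 1 / k + score * (1 - 1 / k)
            p_wrong = (1 - p_correct) / (k - 1)
            terms.append(p_correct if selected == candidate else p_wrong)
        likelihood[candidate] = prod(terms)
    total = sum(likelihood.values())
    # Normalizing with a uniform prior gives a posterior per category.
    return {c: likelihood[c] / total for c in categories}


def should_stop(confidences, confidence_threshold):
    """Stop once the most likely category exceeds the threshold."""
    return max(confidences.values()) > confidence_threshold


votes = [("Dog", 0.3), ("Dog", 0.8486), ("Dog", 0.4469), ("Dog", 0.4911)]
conf = confidence_per_category(votes, ["Dog", "Cat"])
print(conf, should_stop(conf, 0.99))
```

Under these assumptions, the four userScores from the example results further down this page (all voting Dog) give roughly 0.9943 for Dog and cross a 0.99 threshold on the fourth response, not the third. Agreement on a single example does not confirm that this is the formula used in production.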
Understanding the Confidence Threshold#
We've created a plot based on empirical data aided by simulations to give you an estimate of the number of responses required to reach a certain confidence level.
There are a few things to keep in mind when interpreting the results:
- Unambiguous Scenario: The graph represents an ideal situation, such as the example below, where there is no ambiguity about which category is correct. A counter-example would be a subjective task like "Which image do you prefer?", where there is no clear correct answer.
- Real-World Variability: Actual required responses may vary based on task complexity.
- Guidance Tool: Use the graph as a reference to set realistic expectations for your jobs.
- Response Overflow: The number of responses per datapoint may exceed the specified amount due to multiple users answering simultaneously.
Note
The Early Stopping feature is supported for the Classification and Comparison workflows. The number of categories is the number of options in the Classification task. For the Comparison task, the number of categories is always 2.
Using Confidence Stopping in Your Job#
You simply add the confidence_threshold parameter when creating the job definition.
Example: Classification Job with Confidence Stopping#
from rapidata import RapidataClient

client = RapidataClient()

audience = client.audience.create_audience(name="Animal Classification Audience")
audience.add_classification_example(
    instruction="What do you see in the image?",
    answer_options=["Cat", "Dog"],
    datapoint="https://assets.rapidata.ai/cat.jpeg",
    truth=["Cat"]
)

job_definition = client.job.create_classification_job_definition(
    name="Test Classification with Early Stopping",
    instruction="What do you see in the image?",
    answer_options=["Cat", "Dog"],
    datapoints=["https://assets.rapidata.ai/dog.jpeg"],
    responses_per_datapoint=50,  # (1)!
    confidence_threshold=0.99,  # (2)!
)
job_definition.preview()

job = audience.assign_job(job_definition)
job.display_progress_bar()
results = job.get_results()
print(results)
- (1) Sets the maximum number of responses per datapoint.
- (2) Stops collecting once 99% confidence is reached. For clear-cut tasks like this, expect roughly 4 responses.
When to Use Confidence Stopping#
We recommend using Confidence Stopping when:
- Cost Efficiency: You want to optimize costs by reducing the number of responses per datapoint.
- Clear Correct Answer: The task has a clear correct answer, and you're not interested in a distribution.
Analyzing Confidence Stopping Results#
When using Confidence Stopping, the results will additionally include a confidencePerCategory field for each datapoint. This field shows the confidence level for each of the categories in the task.
Example:
{
    "info": {
        "createdAt": "2099-12-30T00:00:00.000000+00:00",
        "version": "3.0.0"
    },
    "results": {
        "globalAggregatedData": {
            "Dog": 4,
            "Cat": 0
        },
        "data": [
            {
                "originalFileName": "dog.jpeg",
                "aggregatedResults": {
                    "Dog": 4,
                    "Cat": 0
                },
                "aggregatedResultsRatios": {
                    "Dog": 1.0,
                    "Cat": 0.0
                },
                "summedUserScores": {
                    "Dog": 2.0865,
                    "Cat": 0.0
                },
                "summedUserScoresRatios": {
                    "Dog": 1.0,
                    "Cat": 0.0
                },
                # this only appears when using early stopping
                "confidencePerCategory": {
                    "Dog": 0.9943,
                    "Cat": 0.0057
                },
                "detailedResults": [
                    {
                        "selectedCategory": "Dog",
                        "userDetails": {
                            "country": "PT",
                            "language": "pt",
                            "userScore": 0.3
                        }
                    },
                    {
                        "selectedCategory": "Dog",
                        "userDetails": {
                            "country": "RS",
                            "language": "sr",
                            "userScore": 0.8486
                        }
                    },
                    {
                        "selectedCategory": "Dog",
                        "userDetails": {
                            "country": "SG",
                            "language": "en",
                            "userScore": 0.4469
                        }
                    },
                    {
                        "selectedCategory": "Dog",
                        "userDetails": {
                            "country": "IN",
                            "language": "en",
                            "userScore": 0.4911
                        }
                    }
                ]
            }
        ]
    }
}
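A minimal sketch of how the confidencePerCategory field might be consumed, assuming the results are available as a plain Python dict shaped like the JSON above (whether job.get_results() returns exactly this structure is an assumption). The helper names most_confident_category and flag_low_confidence are hypothetical. The second helper lists datapoints whose best category stays below a chosen threshold, which could happen if a datapoint hits the responses_per_datapoint cap first.

```python
def most_confident_category(datapoint):
    """Return (category, confidence) for one datapoint, or None if absent."""
    confidences = datapoint.get("confidencePerCategory")
    if not confidences:
        # The field only appears when early stopping was used.
        return None
    category = max(confidences, key=confidences.get)
    return category, confidences[category]


def flag_low_confidence(results, threshold):
    """List file names whose top confidence is missing or below threshold."""
    flagged = []
    for datapoint in results["results"]["data"]:
        best = most_confident_category(datapoint)
        if best is None or best[1] < threshold:
            flagged.append(datapoint["originalFileName"])
    return flagged


# Trimmed-down stand-in for the example results above.
example = {
    "results": {
        "data": [
            {
                "originalFileName": "dog.jpeg",
                "confidencePerCategory": {"Dog": 0.9943, "Cat": 0.0057},
            }
        ]
    }
}
print(most_confident_category(example["results"]["data"][0]))
print(flag_low_confidence(example, threshold=0.99))
```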
Quorum Stopping#
How it Works#
Quorum Stopping uses a simple vote-counting approach. A task is completed when:
- A minimum number of responses (quorum_threshold) agree on the same answer, OR
- Quorum becomes mathematically impossible to reach, OR
- The maximum number of votes (responses_per_datapoint) is reached.
For example, with quorum_threshold=7 and responses_per_datapoint=10:
- The task completes when 7 responses agree (quorum reached).
- The task completes when both options have 4+ responses (quorum is impossible since neither can reach 7 out of 10).
- The task completes after 10 total votes if neither condition is met.
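The three conditions above can be sketched as a small vote-counting function. This illustrates the rule as described, not Rapidata's implementation; the function name quorum_status and its return values are hypothetical.

```python
def quorum_status(counts, quorum_threshold, responses_per_datapoint):
    """Classify the state of a datapoint from its per-answer vote counts.

    counts: dict mapping each answer option to its number of votes.
    """
    total = sum(counts.values())
    leading = max(counts.values(), default=0)
    remaining = responses_per_datapoint - total

    if leading >= quorum_threshold:
        return "quorum_reached"
    if remaining <= 0:
        return "max_responses_reached"
    # Even if every remaining vote went to the leading answer,
    # it could not reach the quorum anymore.
    if leading + remaining < quorum_threshold:
        return "quorum_impossible"
    return "continue"


for counts in ({"Dog": 7, "Cat": 0}, {"Dog": 4, "Cat": 4}, {"Dog": 4, "Cat": 3}):
    print(counts, quorum_status(counts, quorum_threshold=7, responses_per_datapoint=10))
```

With quorum_threshold=7 and responses_per_datapoint=10, a 4 to 4 split leaves only 2 votes, so neither answer can get past 6 and collection stops. A 4 to 3 split still allows the leading answer to reach 7, so collection continues.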
Note
Quorum Stopping is supported for the Classification and Comparison workflows, just like Confidence Stopping.
Using Quorum Stopping in Your Job#
You add the quorum_threshold parameter when creating the job definition.
Example: Classification Job with Quorum Stopping#
from rapidata import RapidataClient

client = RapidataClient()

audience = client.audience.create_audience(name="Animal Classification Audience")
audience.add_classification_example(
    instruction="What do you see in the image?",
    answer_options=["Cat", "Dog"],
    datapoint="https://assets.rapidata.ai/cat.jpeg",
    truth=["Cat"]
)

job_definition = client.job.create_classification_job_definition(
    name="Test Classification with Quorum Stopping",
    instruction="What do you see in the image?",
    answer_options=["Cat", "Dog"],
    datapoints=["https://assets.rapidata.ai/dog.jpeg"],
    responses_per_datapoint=10,  # (1)!
    quorum_threshold=7,  # (2)!
)
job_definition.preview()

job = audience.assign_job(job_definition)
job.display_progress_bar()
results = job.get_results()
print(results)
- (1) Sets the maximum number of responses per datapoint.
- (2) Stops collecting once 7 responses agree on the same answer.
When to Use Quorum Stopping#
Quorum Stopping is a good choice when:
- Simplicity: You want a straightforward stopping rule based on raw vote counts rather than statistical confidence.
- Predictable Costs: You want to set an upper bound on responses while still allowing early termination.
- Clear Correct Answer: The task has a clear correct answer, and you expect most labelers to agree.