nucleus.validate

Model CI Python Library.

EvaluationCriterion

An Evaluation Criterion is defined as an evaluation function, threshold, and comparator.

ScenarioTest

A Scenario Test combines a slice and at least one evaluation criterion. A ScenarioTest is not created through the default constructor but by following the instructions shown in Validate.

Validate

Model CI Python Client extension.

class nucleus.validate.EvaluationCriterion(**data)

An Evaluation Criterion is defined as an evaluation function, threshold, and comparator. It describes how an evaluation function's score is compared against a threshold to decide whether a test passes.

Notes

To define the evaluation criteria for a scenario test, we've created some syntactic sugar that makes the definition look closer to an actual function call. It also hides implementation details of our data model that are unclear from a UX perspective.

Instead of defining criteria like this:

from nucleus.validate.data_transfer_objects.eval_function import (
    EvaluationCriterion,
    ThresholdComparison,
)

criteria = [
    EvaluationCriterion(
        eval_function_id="ef_c6m1khygqk400918ays0",  # bbox_recall
        threshold_comparison=ThresholdComparison.GREATER_THAN,
        threshold=0.5,
    ),
]

we define it like this:

bbox_recall = client.validate.eval_functions.bbox_recall
criteria = [
    bbox_recall() > 0.5
]

The chosen method allows us to document the available evaluation functions in an IDE-friendly fashion and hides away details like internal IDs ("ef_...").

The actual EvaluationCriterion is created by overloading the comparison operators for the base class of an evaluation function. Instead of the comparison returning a bool, we’ve made it create an EvaluationCriterion with the correct signature to send over the wire to our API.
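The mechanism above can be sketched in plain Python. The stub classes below are illustrative only, not the real nucleus.validate implementations:

```python
# Minimal sketch of the operator-overloading pattern described above.
# EvalFunctionStub and EvaluationCriterionStub are illustrative stand-ins,
# not the real nucleus.validate classes.
from dataclasses import dataclass


@dataclass
class EvaluationCriterionStub:
    eval_function_id: str
    threshold_comparison: str
    threshold: float


class EvalFunctionStub:
    """Stands in for the base class of an evaluation function."""

    def __init__(self, eval_function_id):
        self.eval_function_id = eval_function_id
        self.eval_func_arguments = {}

    def __call__(self, **eval_func_arguments):
        # The real SDK records constructor arguments at call time;
        # returning self lets a comparison follow the call.
        self.eval_func_arguments = eval_func_arguments
        return self

    def __gt__(self, threshold):
        # Instead of returning a bool, the comparison builds a criterion
        # carrying the function ID, comparator, and threshold.
        return EvaluationCriterionStub(
            eval_function_id=self.eval_function_id,
            threshold_comparison="greater_than",
            threshold=threshold,
        )


bbox_recall = EvalFunctionStub("ef_c6m1khygqk400918ays0")
criteria = [bbox_recall() > 0.5]
```

Because `__gt__` returns a criterion object rather than a bool, the expression `bbox_recall() > 0.5` evaluates to a fully-specified criterion ready to send over the wire.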

Parameters:
  • eval_function_id (str) – ID of evaluation function

  • threshold_comparison (ThresholdComparison) – comparator for evaluation, e.g. threshold=0.5 with threshold_comparison=ThresholdComparison.GREATER_THAN implies that a test only passes if score > 0.5.

  • threshold (float) – numerical threshold that together with threshold comparison, defines success criteria for test evaluation.

  • eval_func_arguments – Arguments to pass to the eval function constructor

  • data (Any)

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

class nucleus.validate.ScenarioTest

A Scenario Test combines a slice and at least one evaluation criterion. A ScenarioTest is not created through the default constructor but by following the instructions shown in Validate. This ScenarioTest class simply streamlines interacting with scenario tests from this SDK.

id

The ID of the scenario test.

Type:

str

connection

The connection to Nucleus API.

Type:

Connection

name

The name of the scenario test.

Type:

str

slice_id

The ID of the associated Nucleus slice.

Type:

str

add_eval_function(eval_function)

Creates and adds a new evaluation metric to the ScenarioTest.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
scenario_test = client.validate.create_scenario_test(
    "sample_scenario_test", "slc_bx86ea222a6g057x4380"
)

e = client.validate.eval_functions
# Assuming a user would like to add all available public evaluation functions as criteria
scenario_test.add_eval_function(
    e.bbox_iou
)
scenario_test.add_eval_function(
    e.bbox_map
)
scenario_test.add_eval_function(
    e.bbox_precision
)
scenario_test.add_eval_function(
    e.bbox_recall
)
Parameters:

eval_function (nucleus.validate.eval_functions.available_eval_functions.EvalFunction) – EvalFunction

Raises:

NucleusAPIError – If adding this function would mix external and non-external functions on the scenario test, which is not permitted.

Returns:

The created ScenarioTestMetric object.

Return type:

nucleus.validate.scenario_test_metric.ScenarioTestMetric

get_eval_functions()

Retrieves all criteria of the ScenarioTest.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
scenario_test = client.validate.scenario_tests[0]

scenario_test.get_eval_functions()
Returns:

A list of ScenarioTestMetric objects.

Return type:

List[nucleus.validate.scenario_test_metric.ScenarioTestMetric]

get_eval_history()

Retrieves evaluation history for ScenarioTest.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
scenario_test = client.validate.scenario_tests[0]

scenario_test.get_eval_history()
Returns:

A list of ScenarioTestEvaluation objects.

Return type:

List[nucleus.validate.scenario_test_evaluation.ScenarioTestEvaluation]

get_items(level=EntityLevel.ITEM)

Gets items within a scenario test at a given level, returning a list of Track, DatasetItem, or Scene objects.

Parameters:

level (nucleus.validate.constants.EntityLevel) – EntityLevel

Returns:

A list of Track, DatasetItem, or Scene objects, depending on the requested level.

Return type:

Union[List[nucleus.track.Track], List[nucleus.dataset_item.DatasetItem], List[nucleus.scene.Scene]]
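A usage sketch in the style of the other examples in this document; the EntityLevel member names (ITEM, SCENE) are inferred from the return types above and should be checked against nucleus.validate.constants:

```python
import nucleus
from nucleus.validate.constants import EntityLevel

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
scenario_test = client.validate.scenario_tests[0]

# Fetch the dataset items covered by the scenario test (the default level) ...
items = scenario_test.get_items(level=EntityLevel.ITEM)
# ... or the full scenes, for scene-level scenario tests.
scenes = scenario_test.get_items(level=EntityLevel.SCENE)
```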

set_baseline_model(model_id)

Sets a new baseline model for the ScenarioTest. In order to be eligible to be a baseline, this scenario test must have been evaluated using that model. The baseline model’s performance is used as the threshold for all metrics against which other models are compared.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
scenario_test = client.validate.scenario_tests[0]

scenario_test.set_baseline_model('my_baseline_model_id')

Returns:

A list of ScenarioTestEvaluation objects.

Parameters:

model_id (str)

class nucleus.validate.Validate(api_key, endpoint, extra_headers=None)

Model CI Python Client extension.

Parameters:
  • api_key (Optional[str])

  • endpoint (str)

  • extra_headers (Optional[dict])

create_external_eval_function(name, level=EntityLevel.ITEM)

Creates a new external evaluation function. This external function can be used to upload evaluation results with functions defined and computed by the customer, without having to share the source code of the respective function.

Parameters:
  • name (str) – unique name of evaluation function

  • level (nucleus.validate.constants.EntityLevel) – level at which the eval function is run, defaults to EntityLevel.ITEM.

Raises:
  • NucleusAPIError – If the creation of the function fails on the server side.

  • ValidationError – If the evaluation name is not well defined.

Returns:

Created EvalFunctionEntry object.

Return type:

nucleus.validate.data_transfer_objects.eval_function.EvalFunctionEntry
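A usage sketch in the style of the other examples in this document; the function name "my_custom_metric" is a placeholder:

```python
import nucleus
from nucleus.validate.constants import EntityLevel

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")

# Register an external eval function; its results are computed and uploaded
# by the caller rather than evaluated by Nucleus.
external_fn = client.validate.create_external_eval_function(
    name="my_custom_metric",
    level=EntityLevel.ITEM,
)
```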

create_scenario_test(name, slice_id, evaluation_functions)

Creates a new Scenario Test from an existing Nucleus Slice:

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")

scenario_test = client.validate.create_scenario_test(
    name="sample_scenario_test",
    slice_id="YOUR_SLICE_ID",
    evaluation_functions=[client.validate.eval_functions.bbox_iou()]
)
Parameters:
  • name (str) – unique name of test

  • slice_id (str) – ID of the (pre-defined) slice of items to evaluate the test on.

  • evaluation_functions (List[nucleus.validate.eval_functions.base_eval_function.EvalFunctionConfig]) – Defines the evaluation metrics for the test. Each entry is created from an element of the list of available eval functions. See eval_functions.

Returns:

Created ScenarioTest object.

Return type:

nucleus.validate.scenario_test.ScenarioTest

delete_scenario_test(scenario_test_id)

Deletes a Scenario Test.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
scenario_test = client.validate.scenario_tests[0]

success = client.validate.delete_scenario_test(scenario_test.id)
Parameters:

scenario_test_id (str) – unique ID of scenario test

Returns:

Whether deletion was successful.

Return type:

bool

evaluate_model_on_scenario_tests(model_id, scenario_test_names)

Evaluates the given model on the specified Scenario Tests.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
model = client.list_models()[0]
scenario_test = client.validate.create_scenario_test(
    "sample_scenario_test", "slc_bx86ea222a6g057x4380"
)

job = client.validate.evaluate_model_on_scenario_tests(
    model_id=model.id,
    scenario_test_names=["sample_scenario_test"],
)
job.sleep_until_complete()  # Optional: blocks until the job completes, reporting status along the way.
Parameters:
  • model_id (str) – ID of model to evaluate

  • scenario_test_names (List[str]) – names of the scenario tests to evaluate the model on

Returns:

AsyncJob object of evaluation job

Return type:

nucleus.async_job.AsyncJob