nucleus.validate
================

.. py:module:: nucleus.validate

.. autoapi-nested-parse::

   Model CI Python Library.

.. autoapisummary::

   nucleus.validate.EvaluationCriterion
   nucleus.validate.ScenarioTest
   nucleus.validate.Validate

.. py:class:: EvaluationCriterion(**data)

   An Evaluation Criterion is defined as an evaluation function, threshold, and comparator.
   It describes how to apply an evaluation function.

   .. rubric:: Notes

   To define the evaluation criteria for a scenario test, we've created some syntactic sugar
   that makes a criterion look closer to an actual function call. It also hides implementation
   details of our data model that are not clear from a UX perspective.

   Instead of defining criteria like this::

       from nucleus.validate.data_transfer_objects.eval_function import (
           EvaluationCriterion,
           ThresholdComparison,
       )

       criteria = [
           EvaluationCriterion(
               eval_function_id="ef_c6m1khygqk400918ays0",  # bbox_recall
               threshold_comparison=ThresholdComparison.GREATER_THAN,
               threshold=0.5,
           ),
       ]

   we define it like this::

       bbox_recall = client.validate.eval_functions.bbox_recall
       criteria = [
           bbox_recall() > 0.5
       ]

   The chosen method allows us to document the available evaluation functions in an
   IDE-friendly fashion and hides away details like internal IDs (`"ef_...."`).

   The actual `EvaluationCriterion` is created by overloading the comparison operators for the
   base class of an evaluation function. Instead of the comparison returning a bool, we've made
   it create an `EvaluationCriterion` with the correct signature to send over the wire to our API.

   :param eval_function_id: ID of evaluation function
   :type eval_function_id: str
   :param threshold_comparison: comparator for evaluation, e.g. ``threshold=0.5`` with a greater-than comparison means that a test passes only if score > 0.5.
   :type threshold_comparison: :class:`ThresholdComparison`
   :param threshold: numerical threshold that, together with the threshold comparison, defines the success criteria for test evaluation.
   :type threshold: float
   :param eval_func_arguments: Arguments to pass to the eval function constructor

   Create a new model by parsing and validating input data from keyword arguments.

   Raises ValidationError if the input data cannot be parsed to form a valid model.

.. py:class:: ScenarioTest

   A Scenario Test combines a slice and at least one evaluation criterion. A :class:`ScenarioTest`
   is not created through the default constructor but by following the instructions shown in
   :class:`Validate`. This :class:`ScenarioTest` class only simplifies the interaction with the
   scenario tests from this SDK.

   .. attribute:: id

      The ID of the scenario test.

      :type: str

   .. attribute:: connection

      The connection to Nucleus API.

      :type: Connection

   .. attribute:: name

      The name of the scenario test.

      :type: str

   .. attribute:: slice_id

      The ID of the associated Nucleus slice.

      :type: str

   .. py:method:: add_eval_function(eval_function)

      Creates and adds a new evaluation metric to the :class:`ScenarioTest`.

      ::

          import nucleus
          client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
          e = client.validate.eval_functions
          # create_scenario_test requires at least one evaluation function
          scenario_test = client.validate.create_scenario_test(
              "sample_scenario_test",
              "slc_bx86ea222a6g057x4380",
              evaluation_functions=[e.bbox_iou()],
          )

          # Assuming a user would like to add the remaining public evaluation functions as criteria
          scenario_test.add_eval_function(
              e.bbox_map
          )
          scenario_test.add_eval_function(
              e.bbox_precision
          )
          scenario_test.add_eval_function(
              e.bbox_recall
          )

      :param eval_function: the :class:`EvalFunction` to add

      :raises NucleusAPIError: If adding this function would mix external and non-external functions in the scenario test, which is not permitted.

      :returns: The created ScenarioTestMetric object.

   .. py:method:: get_eval_functions()

      Retrieves all criteria of the :class:`ScenarioTest`.

      ::

          import nucleus
          client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
          scenario_test = client.validate.scenario_tests[0]

          scenario_test.get_eval_functions()

      :returns: A list of ScenarioTestMetric objects.

   .. py:method:: get_eval_history()

      Retrieves evaluation history for :class:`ScenarioTest`.

      ::

          import nucleus
          client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
          scenario_test = client.validate.scenario_tests[0]

          scenario_test.get_eval_history()

      :returns: A list of :class:`ScenarioTestEvaluation` objects.

   .. py:method:: get_items(level = EntityLevel.ITEM)

      Gets items within a scenario test at a given level, returning a list of Track, DatasetItem, or Scene objects.

      :param level: :class:`EntityLevel`

      :returns: A list of Track, DatasetItem, or Scene objects, depending on ``level``.

   .. py:method:: set_baseline_model(model_id)

      Sets a new baseline model for the :class:`ScenarioTest`. To be eligible as a baseline,
      the model must already have been evaluated on this scenario test. The baseline model's
      performance is used as the threshold for all metrics against which other models are compared.

      ::

          import nucleus
          client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
          scenario_test = client.validate.scenario_tests[0]

          scenario_test.set_baseline_model('my_baseline_model_id')

      :returns: A list of :class:`ScenarioTestEvaluation` objects.

.. py:class:: Validate(api_key, endpoint, extra_headers = None)

   Model CI Python Client extension.

   .. py:method:: create_external_eval_function(name, level = EntityLevel.ITEM)

      Creates a new external evaluation function. An external function lets customers upload
      evaluation results computed by functions they define themselves, without having to share
      the source code of those functions.

      :param name: unique name of evaluation function
      :param level: level at which the eval function is run, defaults to EntityLevel.ITEM.

      :raises NucleusAPIError: If the creation of the function fails on the server side.
      :raises ValidationError: If the evaluation name is not well defined.

      :returns: Created EvalFunctionConfig object.

   .. py:method:: create_scenario_test(name, slice_id, evaluation_functions)

      Creates a new Scenario Test from an existing Nucleus :class:`Slice`.

      ::

          import nucleus
          client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")

          scenario_test = client.validate.create_scenario_test(
              name="sample_scenario_test",
              slice_id="YOUR_SLICE_ID",
              evaluation_functions=[client.validate.eval_functions.bbox_iou()]
          )

      :param name: unique name of test
      :param slice_id: ID of the (pre-defined) slice of items to evaluate the test on.
      :param evaluation_functions: list of :class:`EvalFunctionEntry` objects, each defining an evaluation metric for the test. Each entry is created from one of the available eval functions. See :class:`eval_functions`.

      :returns: Created ScenarioTest object.

   .. py:method:: delete_scenario_test(scenario_test_id)

      Deletes a Scenario Test.

      ::

          import nucleus
          client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
          scenario_test = client.validate.scenario_tests[0]

          success = client.validate.delete_scenario_test(scenario_test.id)

      :param scenario_test_id: unique ID of scenario test

      :returns: Whether deletion was successful.

   .. py:method:: evaluate_model_on_scenario_tests(model_id, scenario_test_names)

      Evaluates the given model on the specified Scenario Tests.

      ::

          import nucleus
          client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
          model = client.list_models()[0]
          # create_scenario_test requires at least one evaluation function
          scenario_test = client.validate.create_scenario_test(
              "sample_scenario_test",
              "slc_bx86ea222a6g057x4380",
              evaluation_functions=[client.validate.eval_functions.bbox_iou()],
          )

          job = client.validate.evaluate_model_on_scenario_tests(
              model_id=model.id,
              scenario_test_names=["sample_scenario_test"],
          )
          job.sleep_until_complete()  # Not required. Blocks and prints status updates until the job finishes.

      :param model_id: ID of model to evaluate
      :param scenario_test_names: list of names of the scenario tests to evaluate

      :returns: AsyncJob object of evaluation job
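
As a rough illustration of the operator overloading described in the notes for :class:`EvaluationCriterion`, the following self-contained sketch shows how a comparison such as ``bbox_recall() > 0.5`` can build a criterion object instead of returning a bool. It does not use the real SDK classes: ``EvalFunctionSketch``, the ``passes`` helper, the ``LESS_THAN`` member, and the enum string values are assumptions made purely for illustration.

```python
import operator
from dataclasses import dataclass
from enum import Enum


class ThresholdComparison(str, Enum):
    """Stand-in enum; only GREATER_THAN appears in the documentation above."""

    GREATER_THAN = "greater_than"  # string value is an assumption
    LESS_THAN = "less_than"        # assumed member, for illustration only


# Maps each comparison to the Python operator used when checking a score.
_COMPARATORS = {
    ThresholdComparison.GREATER_THAN: operator.gt,
    ThresholdComparison.LESS_THAN: operator.lt,
}


@dataclass(frozen=True)
class EvaluationCriterion:
    """Stand-in for the SDK class, holding the three documented fields."""

    eval_function_id: str
    threshold_comparison: ThresholdComparison
    threshold: float

    def passes(self, score: float) -> bool:
        # Hypothetical helper: applies the stored comparison to a score.
        return _COMPARATORS[self.threshold_comparison](score, self.threshold)


class EvalFunctionSketch:
    """Hypothetical base class whose comparisons return criteria, not bools."""

    def __init__(self, eval_function_id: str) -> None:
        self.eval_function_id = eval_function_id

    def __call__(self) -> "EvalFunctionSketch":
        # Calling the function object keeps the "function call" look.
        return self

    def __gt__(self, threshold: float) -> EvaluationCriterion:
        return EvaluationCriterion(
            self.eval_function_id, ThresholdComparison.GREATER_THAN, float(threshold)
        )

    def __lt__(self, threshold: float) -> EvaluationCriterion:
        return EvaluationCriterion(
            self.eval_function_id, ThresholdComparison.LESS_THAN, float(threshold)
        )


# The ID is the bbox_recall example ID quoted in the notes.
bbox_recall = EvalFunctionSketch("ef_c6m1khygqk400918ays0")
criteria = [bbox_recall() > 0.5]
```

In this sketch the internal ID lives only inside the function object, so calling code never has to spell it out, which is the motivation the notes give for the syntactic sugar.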