nucleus.metrics.categorization_metrics
CategorizationF1 – Evaluation method that matches categories and returns a CategorizationF1Result that aggregates to the F1 score.

CategorizationMetric – Abstract class for metrics related to Categorization.

CategorizationResult – Base MetricResult class.
 class nucleus.metrics.categorization_metrics.CategorizationF1(confidence_threshold=0.0, f1_method='macro', annotation_filters=None, prediction_filters=None)
Evaluation method that matches categories and returns a CategorizationF1Result that aggregates to the F1 score
 Parameters:
confidence_threshold (float) – minimum confidence threshold for predictions to be taken into account for evaluation. Must be in [0, 1]. Default 0.0
f1_method (str) – {'micro', 'macro', 'samples', 'weighted', 'binary'}, default='macro'
This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
'binary' – Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.
'micro' – Calculate metrics globally by counting the total true positives, false negatives and false positives.
'macro' – Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
'weighted' – Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.
'samples' – Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score()).
annotation_filters (Optional[Union[nucleus.metrics.filtering.ListOfOrAndFilters, nucleus.metrics.filtering.ListOfAndFilters]]) –
Filter predicates. Allowed formats are: ListOfAndFilters, where each Filter forms a chain of AND predicates,
or
ListOfOrAndFilters, where Filters are expressed in disjunctive normal form (DNF), like [[MetadataFilter("short_haired", "==", True), FieldFilter("label", "in", ["cat", "dog"])], …]. DNF allows arbitrary boolean logical combinations of single-field predicates. Each innermost structure describes a single column predicate. Each inner list of predicates is interpreted as a conjunction (AND), forming a more selective, multiple-field predicate. Finally, the outermost list combines these filters as a disjunction (OR).
prediction_filters (Optional[Union[nucleus.metrics.filtering.ListOfOrAndFilters, nucleus.metrics.filtering.ListOfAndFilters]]) –
Filter predicates. Allowed formats are: ListOfAndFilters, where each Filter forms a chain of AND predicates,
or
ListOfOrAndFilters, where Filters are expressed in disjunctive normal form (DNF), like [[MetadataFilter("short_haired", "==", True), FieldFilter("label", "in", ["cat", "dog"])], …]. DNF allows arbitrary boolean logical combinations of single-field predicates. Each innermost structure describes a single column predicate. Each inner list of predicates is interpreted as a conjunction (AND), forming a more selective, multiple-field predicate. Finally, the outermost list combines these filters as a disjunction (OR).
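As a rough illustration of how the averaging modes listed for f1_method differ, here is a minimal pure-Python sketch for single-label predictions. It is not the library's implementation: the helper names (per_label_counts, f1_from_counts, f1) are made up for this example, and only the 'micro', 'macro' and 'weighted' modes are covered.

```python
from collections import Counter


def per_label_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted label was wrong
            fn[truth] += 1  # the true label was missed
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); taken as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1(y_true, y_pred, method="macro"):
    """Hypothetical helper: F1 under one of three averaging modes."""
    tp, fp, fn = per_label_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    if method == "micro":
        # Pool the counts over all labels, then compute a single F1.
        return f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    per_label = {lab: f1_from_counts(tp[lab], fp[lab], fn[lab]) for lab in labels}
    if method == "macro":
        # Unweighted mean of the per-label scores.
        return sum(per_label.values()) / len(labels)
    if method == "weighted":
        # Mean of the per-label scores weighted by support (true instances).
        support = Counter(y_true)
        return sum(per_label[lab] * support[lab] for lab in labels) / len(y_true)
    raise ValueError(f"unsupported method: {method!r}")


y_true = ["cat", "cat", "cat", "dog"]
y_pred = ["cat", "cat", "dog", "dog"]
micro = f1(y_true, y_pred, "micro")        # 0.75
macro = f1(y_true, y_pred, "macro")        # 11/15, about 0.733
weighted = f1(y_true, y_pred, "weighted")  # 23/30, about 0.767
```

With three "cat" items and one "dog" item, the weighted score leans toward the "cat" F1 (0.8), while the macro score treats both labels equally.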
 aggregate_score(results)
A metric must define how to aggregate results from single items to a single ScalarResult.
For example, to calculate an R2 score with sklearn, you could define a custom metric result class:
    class R2Result(MetricResult):
        y_true: float
        y_pred: float
and then define an aggregate_score method:
    def aggregate_score(self, results: List[MetricResult]) -> ScalarResult:
        y_trues = []
        y_preds = []
        for result in results:
            y_trues.append(result.y_true)
            y_preds.append(result.y_pred)
        r2_score = sklearn.metrics.r2_score(y_trues, y_preds)
        return ScalarResult(r2_score)
 Parameters:
results (List[CategorizationResult])
 Return type:
ScalarResult
 call_metric(annotations, predictions)
A metric must override this method and return a metric result, given annotations and predictions.
 Parameters:
annotations (nucleus.annotation.AnnotationList)
predictions (nucleus.prediction.PredictionList)
 Return type:
 eval(annotations, predictions)
Note: this eval function is somewhat unusual. It only matches each annotation to a label; the actual metric computation happens in the aggregate step, since an F1 score only makes sense on a collection.
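The split between per-item matching and collection-level scoring can be sketched as follows. This is an illustrative assumption, not the nucleus implementation: PairResult, eval_item and aggregate_macro_f1 are hypothetical names, and labels are plain strings rather than annotation or prediction objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairResult:
    """Hypothetical per-item result: it only stores the matched labels."""
    annotation_label: str
    prediction_label: str


def eval_item(annotation_label: str, prediction_label: str) -> PairResult:
    # Per-item step: no score is computed here, the pair is just recorded.
    return PairResult(annotation_label, prediction_label)


def aggregate_macro_f1(results: List[PairResult]) -> float:
    # Collection-level step: macro F1 over all recorded pairs.
    labels = sorted(
        {r.annotation_label for r in results} | {r.prediction_label for r in results}
    )
    scores = []
    for label in labels:
        tp = sum(r.annotation_label == label and r.prediction_label == label for r in results)
        fp = sum(r.annotation_label != label and r.prediction_label == label for r in results)
        fn = sum(r.annotation_label == label and r.prediction_label != label for r in results)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


pairs = [("cat", "cat"), ("dog", "cat"), ("dog", "dog")]
results = [eval_item(truth, pred) for truth, pred in pairs]
score = aggregate_macro_f1(results)  # both labels score 2/3, so the mean is 2/3
```

A single PairResult carries no meaningful F1 on its own, which is why the score is only produced once all items have been collected.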
 Parameters:
annotations (List[nucleus.annotation.CategoryAnnotation])
predictions (List[nucleus.prediction.CategoryPrediction])
 Return type:
 class nucleus.metrics.categorization_metrics.CategorizationMetric(confidence_threshold=0.0, annotation_filters=None, prediction_filters=None)
Abstract class for metrics related to Categorization
The CategorizationMetric class automatically filters incoming annotations and predictions, keeping only categorization annotations. It also filters out predictions whose confidence is less than the provided confidence_threshold.
Initializes CategorizationMetric abstract object.
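The confidence filtering described above amounts to something like the following minimal sketch. The Pred class and filter_by_confidence function are hypothetical stand-ins introduced for illustration; the real prediction objects and the library's internal filtering code may look different.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pred:
    """Hypothetical stand-in for a category prediction."""
    label: str
    confidence: float


def filter_by_confidence(predictions: List[Pred], confidence_threshold: float = 0.0) -> List[Pred]:
    # The documented range for the threshold is [0, 1].
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be in [0, 1]")
    # Drop predictions whose confidence is less than the threshold.
    return [p for p in predictions if p.confidence >= confidence_threshold]


preds = [Pred("cat", 0.9), Pred("dog", 0.3), Pred("bird", 0.5)]
kept = filter_by_confidence(preds, confidence_threshold=0.5)
kept_labels = [p.label for p in kept]
```

With the default threshold of 0.0, every prediction with a non-negative confidence is kept.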
 Parameters:
confidence_threshold (float) – minimum confidence threshold for predictions to be taken into account for evaluation. Must be in [0, 1]. Default 0.0
annotation_filters (Optional[Union[nucleus.metrics.filtering.ListOfOrAndFilters, nucleus.metrics.filtering.ListOfAndFilters]]) –
Filter predicates. Allowed formats are: ListOfAndFilters, where each Filter forms a chain of AND predicates,
or
ListOfOrAndFilters, where Filters are expressed in disjunctive normal form (DNF), like [[MetadataFilter("short_haired", "==", True), FieldFilter("label", "in", ["cat", "dog"])], …]. DNF allows arbitrary boolean logical combinations of single-field predicates. Each innermost structure describes a single column predicate. Each inner list of predicates is interpreted as a conjunction (AND), forming a more selective, multiple-field predicate. Finally, the outermost list combines these filters as a disjunction (OR).
prediction_filters (Optional[Union[nucleus.metrics.filtering.ListOfOrAndFilters, nucleus.metrics.filtering.ListOfAndFilters]]) –
Filter predicates. Allowed formats are: ListOfAndFilters, where each Filter forms a chain of AND predicates,
or
ListOfOrAndFilters, where Filters are expressed in disjunctive normal form (DNF), like [[MetadataFilter("short_haired", "==", True), FieldFilter("label", "in", ["cat", "dog"])], …]. DNF allows arbitrary boolean logical combinations of single-field predicates. Each innermost structure describes a single column predicate. Each inner list of predicates is interpreted as a conjunction (AND), forming a more selective, multiple-field predicate. Finally, the outermost list combines these filters as a disjunction (OR).
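To make the DNF semantics concrete, here is a minimal sketch of how a list of OR-ed AND-chains could be evaluated against an item. It deliberately avoids the nucleus filter classes: predicates are modelled as hypothetical (field, op, value) tuples, items as plain dicts, and the names OPS, matches_and and matches_dnf are assumptions made for this example.

```python
import operator
from typing import Any, Dict, List, Tuple

# Hypothetical predicate shape for this sketch: (field, op, value).
Predicate = Tuple[str, str, Any]

OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "in": lambda left, right: left in right,
    "not in": lambda left, right: left not in right,
}


def matches_and(item: Dict[str, Any], predicates: List[Predicate]) -> bool:
    # Inner list: every predicate must hold (conjunction).
    return all(OPS[op](item.get(field), value) for field, op, value in predicates)


def matches_dnf(item: Dict[str, Any], or_branches: List[List[Predicate]]) -> bool:
    # Outer list: at least one AND-chain must hold (disjunction).
    return any(matches_and(item, branch) for branch in or_branches)


# (short_haired == True AND label in [cat, dog]) OR (label == bird)
dnf = [
    [("short_haired", "==", True), ("label", "in", ["cat", "dog"])],
    [("label", "==", "bird")],
]

items = [
    {"label": "cat", "short_haired": True},
    {"label": "cat", "short_haired": False},
    {"label": "bird", "short_haired": False},
]
selected = [item["label"] for item in items if matches_dnf(item, dnf)]
```

The long-haired cat is rejected because it fails the first AND-chain and does not match the second; the bird is accepted through the second chain alone.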
 abstract aggregate_score(results)
A metric must define how to aggregate results from single items to a single ScalarResult.
For example, to calculate an R2 score with sklearn, you could define a custom metric result class:
    class R2Result(MetricResult):
        y_true: float
        y_pred: float
and then define an aggregate_score method:
    def aggregate_score(self, results: List[MetricResult]) -> ScalarResult:
        y_trues = []
        y_preds = []
        for result in results:
            y_trues.append(result.y_true)
            y_preds.append(result.y_pred)
        r2_score = sklearn.metrics.r2_score(y_trues, y_preds)
        return ScalarResult(r2_score)
 Parameters:
results (List[CategorizationResult])
 Return type:
ScalarResult
 call_metric(annotations, predictions)
A metric must override this method and return a metric result, given annotations and predictions.
 Parameters:
annotations (nucleus.annotation.AnnotationList)
predictions (nucleus.prediction.PredictionList)
 Return type:
 class nucleus.metrics.categorization_metrics.CategorizationResult
Base MetricResult class