nucleus#

Nucleus Python SDK.

AsyncJob

Object used to check the status or errors of a long running asynchronous operation.

BoxAnnotation

A bounding box annotation.

BoxPrediction

Prediction of a bounding box.

CameraParams

Camera position/heading used to record the image.

CategoryAnnotation

A category annotation.

CategoryPrediction

A prediction of a category.

CuboidAnnotation

A 3D Cuboid annotation.

CuboidPrediction

A prediction of a 3D cuboid.

Dataset

Datasets are collections of your data that can be associated with models.

DatasetInfo

High-level Dataset information.

DatasetItem

A dataset item is an image or pointcloud that has associated metadata.

Frame

Collection of sensor data pertaining to a single time step.

Keypoint

A 2D point that has an additional visibility flag.

KeypointsAnnotation

A keypoints annotation containing a list of keypoints and their structure.

KeypointsPrediction

Prediction of keypoints.

LidarScene

Sequence of lidar pointcloud and camera images over time.

LineAnnotation

A polyline annotation consisting of an ordered list of 2D points.

LinePrediction

Prediction of a line.

Model

A model that can be used to upload predictions to a dataset.

NucleusClient

Client to interact with the Nucleus API via Python SDK.

Point

A point in 2D space.

Point3D

A point in 3D space.

PolygonAnnotation

A polygon annotation consisting of an ordered list of 2D points.

PolygonPrediction

Prediction of a polygon.

Quaternion

Quaternion objects are used to represent rotation.

SceneCategoryAnnotation

A scene category annotation.

SceneCategoryPrediction

A prediction of a category for a scene.

Segment

Segment represents either a class or an instance depending on the task type.

SegmentationAnnotation

A segmentation mask on a 2D image.

SegmentationPrediction

Predicted segmentation mask on a 2D image.

Slice

A Slice represents a subset of DatasetItems in your Dataset.

VideoScene

Video or sequence of images over time.

class nucleus.AsyncJob#

Object used to check the status or errors of a long running asynchronous operation.

import nucleus

client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)
dataset = client.get_dataset("ds_bwkezj6g5c4g05gqp1eg")

# When kicking off an asynchronous job, store the return value as a variable
job = dataset.append(items=YOUR_DATASET_ITEMS, asynchronous=True)

# Poll for status or errors
print(job.status())
print(job.errors())

# Block until job finishes
job.sleep_until_complete()

errors()#

Fetches a list of the latest errors generated by the asynchronous job.

Useful for debugging failed or partially successful jobs.

Returns

A list of strings containing the 10,000 most recently generated errors.

[
    '{"annotation":{"label":"car","type":"box","geometry":{"x":50,"y":60,"width":70,"height":80},"referenceId":"bad_ref_id","annotationId":"attempted_annot_upload","metadata":{}},"error":"Item with id bad_ref_id does not exist."}'
]

Return type

List[str]
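Each entry in the returned list is itself a JSON-encoded string, so it can be parsed for programmatic handling. A minimal sketch, using the sample error shown above:

```python
import json

# An entry in the format returned by job.errors()
raw_error = (
    '{"annotation":{"label":"car","type":"box",'
    '"geometry":{"x":50,"y":60,"width":70,"height":80},'
    '"referenceId":"bad_ref_id","annotationId":"attempted_annot_upload",'
    '"metadata":{}},"error":"Item with id bad_ref_id does not exist."}'
)

entry = json.loads(raw_error)          # each error string is valid JSON
failed_ref = entry["annotation"]["referenceId"]
message = entry["error"]
```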

sleep_until_complete(verbose_std_out=True)#

Blocks until the job completes or errors.

Parameters

verbose_std_out (Optional[bool]) – Whether or not to verbosely log while sleeping. Defaults to True.

status()#

Fetches status of the job and an informative message on job progress.

Returns

A dict of the job ID, status (one of Running, Completed, or Errored), an informative message on the job progress, and number of both completed and total steps.

{
    "job_id": "job_c19xcf9mkws46gah0000",
    "status": "Completed",
    "message": "Job completed successfully.",
    "job_progress": "0.33",
    "completed_steps": "1",
    "total_steps:": "3",
}

Return type

Dict[str, str]

class nucleus.BoxAnnotation#

A bounding box annotation.

from nucleus import BoxAnnotation

box = BoxAnnotation(
    label="car",
    x=0,
    y=0,
    width=10,
    height=10,
    reference_id="image_1",
    annotation_id="image_1_car_box_1",
    metadata={"vehicle_color": "red"},
    embedding_vector=[0.1423, 1.432, ..., 3.829],
    track_reference_id="car_a",
)
Parameters
  • label (str) – The label for this annotation.

  • x (Union[float, int]) – The distance, in pixels, between the left border of the bounding box and the left border of the image.

  • y (Union[float, int]) – The distance, in pixels, between the top border of the bounding box and the top border of the image.

  • width (Union[float, int]) – The width in pixels of the annotation.

  • height (Union[float, int]) – The height in pixels of the annotation.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and overwritten if update=True for dataset.annotate. If no annotation ID is passed, one will be automatically generated using the label, x, y, width, and height, so that you can make inserts idempotently as identical boxes will be ignored.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

    Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as { “lat”: 52.5, “lon”: 13.3, … }.

  • embedding_vector – Custom embedding vector for this object annotation. If any custom object embeddings have been uploaded previously to this dataset, this vector must match the dimensions of the previously ingested vectors.

  • track_reference_id – A unique string to identify the annotation as part of a group. For instance, multiple “car” annotations across several dataset items may have the same track_reference_id such as “red_car”.
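The idempotency behavior above relies on the annotation id being a deterministic function of the box's label and geometry. A hypothetical illustration of how such an id could be derived (the SDK's actual auto-generated id format is an internal detail):

```python
import hashlib

def derive_annotation_id(label, x, y, width, height):
    # Hypothetical: hash the label and geometry so that re-uploading an
    # identical box yields the same id (and is therefore ignored on ingest).
    key = f"{label}|{x}|{y}|{width}|{height}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]

# Identical boxes map to identical ids, so repeated inserts are idempotent
id_a = derive_annotation_id("car", 0, 0, 10, 10)
id_b = derive_annotation_id("car", 0, 0, 10, 10)
id_c = derive_annotation_id("car", 0, 0, 10, 11)  # different height, new id
```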

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this defaults to returning False. Subclasses with local files should override this method (though overriding it alone is not sufficient to enable local file upload).

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.BoxPrediction(label, x, y, width, height, reference_id, confidence=None, annotation_id=None, metadata=None, class_pdf=None, embedding_vector=None, track_reference_id=None)#

Prediction of a bounding box.

Parameters
  • label (str) – The label for this annotation (e.g. car, pedestrian, bicycle)

  • x (Union[float, int]) – The distance, in pixels, between the left border of the bounding box and the left border of the image.

  • y (Union[float, int]) – The distance, in pixels, between the top border of the bounding box and the top border of the image.

  • width (Union[float, int]) – The width in pixels of the annotation.

  • height (Union[float, int]) – The height in pixels of the annotation.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • confidence (Optional[float]) – 0-1 indicating the confidence of the prediction.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate. If no annotation ID is passed, one will be automatically generated using the label, x, y, width, and height, so that you can make inserts idempotently, as identical boxes will be ignored.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

    Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as { “lat”: 52.5, “lon”: 13.3, … }.

  • class_pdf (Optional[Dict]) – An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain.

  • embedding_vector (Optional[List]) – Custom embedding vector for this object annotation. If any custom object embeddings have been uploaded previously to this dataset, this vector must match the dimensions of the previously ingested vectors.

  • track_reference_id (Optional[str]) – A unique string to identify the prediction as part of a group. For instance, multiple “car” predictions across several dataset items may have the same track_reference_id such as “red_car”.
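The class_pdf constraints described above (each value between 0 and 1, summing to 1) can be checked before upload. A small illustrative helper, not part of the SDK:

```python
def is_valid_class_pdf(class_pdf, tol=1e-6):
    # Validate a class probability distribution: every probability must lie
    # in [0, 1] and the values must sum to 1 (within floating-point tolerance).
    values = class_pdf.values()
    return all(0.0 <= v <= 1.0 for v in values) and abs(sum(values) - 1.0) <= tol

ok = is_valid_class_pdf({"car": 0.7, "bus": 0.2, "bicycle": 0.1})
bad = is_valid_class_pdf({"car": 0.7, "bus": 0.7})  # sums to 1.4
```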

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this defaults to returning False. Subclasses with local files should override this method (though overriding it alone is not sufficient to enable local file upload).

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.CameraParams#

Camera position/heading used to record the image.

Parameters
  • position (Point3D) – World-normalized position of the camera

  • heading (Quaternion) – Vector4 indicating the quaternion of the camera direction; note that the z-axis of the camera frame represents the camera’s optical axis. See Heading Examples.

  • fx (float) – Focal length in x direction (in pixels).

  • fy (float) – Focal length in y direction (in pixels).

  • cx (float) – Principal point x value.

  • cy (float) – Principal point y value.
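With these intrinsics, a point in the camera frame can be projected to pixel coordinates using the standard pinhole model. A sketch assuming the usual convention, where the optical axis is +z:

```python
def project_to_pixel(x, y, z, fx, fy, cx, cy):
    # Pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy
    if z <= 0:
        raise ValueError("point is behind the camera")
    return fx * x / z + cx, fy * y / z + cy

# A point 1 m right and 2 m ahead, with fx=fy=1000 px and principal
# point (960, 540) -- illustrative values, not from any real calibration
u, v = project_to_pixel(1.0, 0.0, 2.0, 1000.0, 1000.0, 960.0, 540.0)
# u = 1000 * 0.5 + 960 = 1460.0, v = 540.0
```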

classmethod from_json(payload)#

Instantiates camera params object from schematized JSON dict payload.

Parameters

payload (Dict[str, Any]) –

to_payload()#

Serializes camera params object to schematized JSON dict.

Return type

dict

class nucleus.CategoryAnnotation#

A category annotation.

from nucleus import CategoryAnnotation

category = CategoryAnnotation(
    label="dress",
    reference_id="image_1",
    taxonomy_name="clothing_type",
    metadata={"dress_color": "navy"},
    track_reference_id="blue_and_black_dress",
)
Parameters
  • label (str) – The label for this annotation.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • taxonomy_name (Optional[str]) – The name of the taxonomy this annotation conforms to. See Dataset.add_taxonomy().

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • track_reference_id – A unique string to identify the annotation as part of a group. For instance, multiple “car” annotations across several dataset items may have the same track_reference_id such as “red_car”.

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this defaults to returning False. Subclasses with local files should override this method (though overriding it alone is not sufficient to enable local file upload).

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.CategoryPrediction(label, reference_id, taxonomy_name=None, confidence=None, metadata=None, class_pdf=None, track_reference_id=None)#

A prediction of a category.

Parameters
  • label (str) – The label for this annotation (e.g. car, pedestrian, bicycle).

  • reference_id (str) – The reference ID of the image you wish to apply this annotation to.

  • taxonomy_name (Optional[str]) – The name of the taxonomy this annotation conforms to. See Dataset.add_taxonomy().

  • confidence (Optional[float]) – 0-1 indicating the confidence of the prediction.

  • class_pdf (Optional[Dict]) – An optional complete class probability distribution on this prediction. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • track_reference_id (Optional[str]) – A unique string to identify the prediction as part of a group. For instance, multiple “car” predictions across several dataset items may have the same track_reference_id such as “red_car”.

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this defaults to returning False. Subclasses with local files should override this method (though overriding it alone is not sufficient to enable local file upload).

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.CuboidAnnotation#

A 3D Cuboid annotation.

from nucleus import CuboidAnnotation, Point3D

cuboid = CuboidAnnotation(
    label="car",
    position=Point3D(100, 100, 10),
    dimensions=Point3D(5, 10, 5),
    yaw=0,
    reference_id="pointcloud_1",
    annotation_id="pointcloud_1_car_cuboid_1",
    metadata={"vehicle_color": "green"},
    track_reference_id="red_car",
)
Parameters
  • label (str) – The label for this annotation.

  • position (Point3D) – The point at the center of the cuboid

  • dimensions (Point3D) – The length (x), width (y), and height (z) of the cuboid

  • yaw (float) – The rotation, in radians, about the Z axis of the cuboid

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • track_reference_id – A unique string to identify the annotation as part of a group. For instance, multiple “car” annotations across several dataset items may have the same track_reference_id such as “red_car”.
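The position, dimensions, and yaw above fully determine the cuboid's footprint on the ground plane. An illustrative computation of its four ground-plane corners (not an SDK function):

```python
import math

def cuboid_footprint(cx, cy, length, width, yaw):
    # Rotate the axis-aligned footprint corners of a cuboid centered at
    # (cx, cy), with length along x and width along y, by yaw radians
    # about the z axis.
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx, dy in [(1, 1), (1, -1), (-1, -1), (-1, 1)]:
        lx, ly = dx * length / 2, dy * width / 2
        corners.append((cx + lx * c - ly * s, cy + lx * s + ly * c))
    return corners

# With yaw=0 the corners stay axis-aligned around the center
corners = cuboid_footprint(100.0, 100.0, 5.0, 10.0, 0.0)
```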

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this defaults to returning False. Subclasses with local files should override this method (though overriding it alone is not sufficient to enable local file upload).

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.CuboidPrediction(label, position, dimensions, yaw, reference_id, confidence=None, annotation_id=None, metadata=None, class_pdf=None, track_reference_id=None)#

A prediction of a 3D cuboid.

Parameters
  • label (str) – The label for this annotation (e.g. car, pedestrian, bicycle)

  • position (Point3D) – The point at the center of the cuboid

  • dimensions (Point3D) – The length (x), width (y), and height (z) of the cuboid

  • yaw (float) – The rotation, in radians, about the Z axis of the cuboid

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • confidence (Optional[float]) – 0-1 indicating the confidence of the prediction.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • class_pdf (Optional[Dict]) – An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain.

  • track_reference_id (Optional[str]) – A unique string to identify the prediction as part of a group. For instance, multiple “car” predictions across several dataset items may have the same track_reference_id such as “red_car”.

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this defaults to returning False. Subclasses with local files should override this method (though overriding it alone is not sufficient to enable local file upload).

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.Dataset(dataset_id, client, name=None)#

Datasets are collections of your data that can be associated with models.

You can append DatasetItems or Scenes with metadata to your dataset, annotate it with ground truth, and upload model predictions to evaluate and compare model performance on your data.

Make sure that the dataset is set up correctly to support the required data type (see the code sample below).

Datasets cannot be instantiated directly and instead must be created via API endpoint using NucleusClient.create_dataset(), or in the dashboard.

import nucleus

client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)

# Create new dataset supporting DatasetItems
dataset = client.create_dataset(YOUR_DATASET_NAME, is_scene=False)

# OR create new dataset supporting LidarScenes
dataset = client.create_dataset(YOUR_DATASET_NAME, is_scene=True)

# Or, retrieve existing dataset by ID
# This ID can be fetched using client.list_datasets() or from a dashboard URL
existing_dataset = client.get_dataset("YOUR_DATASET_ID")
Parameters

client (nucleus.NucleusClient) –

add_taxonomy(taxonomy_name, taxonomy_type, labels, update=False)#

Creates a new taxonomy.

At the moment we only support taxonomies for category annotations and predictions.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset("ds_bwkezj6g5c4g05gqp1eg")

response = dataset.add_taxonomy(
    taxonomy_name="clothing_type",
    taxonomy_type="category",
    labels=["shirt", "trousers", "dress"],
    update=False
)
Parameters
  • taxonomy_name (str) – The name of the taxonomy. Taxonomy names must be unique within a dataset.

  • taxonomy_type (str) – The type of this taxonomy as a string literal. Currently, the only supported taxonomy type is “category.”

  • labels (List[str]) – The list of possible labels for the taxonomy.

  • update (bool) – Whether or not to update taxonomy labels on taxonomy name collision. Default is False. Note that taxonomy labels will not be deleted on update, they can only be appended.

Returns

Returns a response with dataset_id, taxonomy_name, and status of the add taxonomy operation.

{
    "dataset_id": str,
    "taxonomy_name": str,
    "status": "Taxonomy created"
}

annotate(annotations, update=DEFAULT_ANNOTATION_UPDATE_MODE, batch_size=5000, asynchronous=False, remote_files_per_upload_request=20, local_files_per_upload_request=10)#

Uploads ground truth annotations to the dataset.

Adding ground truth to your dataset in Nucleus allows you to visualize annotations, query dataset items based on the annotations they contain, and evaluate models by comparing their predictions to ground truth.

Nucleus supports Box, Polygon, Cuboid, Segmentation, and Category annotations. Cuboid annotations can only be uploaded to a pointcloud DatasetItem.

When uploading an annotation, you need to specify which item you are annotating via the reference_id you provided when uploading the image or pointcloud.

Ground truth uploads can be made idempotent by specifying an optional annotation_id for each annotation. This id should be unique within the dataset_item so that (reference_id, annotation_id) is unique within the dataset.

See SegmentationAnnotation for specific requirements to upload segmentation annotations.

For ingesting large annotation payloads, see the Guide for Large Ingestions.

Parameters
  • annotations (Sequence[Annotation]) – List of annotation objects to upload.

  • update (bool) – Whether to ignore or overwrite metadata for conflicting annotations.

  • batch_size (int) – Number of annotations processed in each concurrent batch. Default is 5000. If you get timeouts when uploading geometric annotations, you can try lowering this batch size.

  • asynchronous (bool) – Whether or not to process the upload asynchronously (and return an AsyncJob object). Default is False.

  • remote_files_per_upload_request (int) – Number of remote files to upload in each request. Segmentations have either local or remote files; if you are getting timeouts while uploading segmentations with remote URLs, you should lower this value from its default of 20.

  • local_files_per_upload_request (int) – Number of local files to upload in each request. Segmentations have either local or remote files; if you are getting timeouts while uploading segmentations with local files, you should lower this value from its default of 10. The maximum is 10.

Returns

If synchronous, payload describing the upload result:

{
    "dataset_id": str,
    "annotations_processed": int
}

Otherwise, returns an AsyncJob object.

Return type

Union[Dict[str, Any], nucleus.async_job.AsyncJob]

append(items, update=False, batch_size=20, asynchronous=False, local_files_per_upload_request=10)#

Appends items or scenes to a dataset.

Note

Datasets can only accept one of DatasetItems or Scenes, never both.

This behavior is set during Dataset creation with the is_scene flag.

import nucleus

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset("YOUR_DATASET_ID")

local_item = nucleus.DatasetItem(
  image_location="./1.jpg",
  reference_id="image_1",
  metadata={"key": "value"}
)
remote_item = nucleus.DatasetItem(
  image_location="s3://your-bucket/2.jpg",
  reference_id="image_2",
  metadata={"key": "value"}
)

# default is synchronous upload
sync_response = dataset.append(items=[local_item])

# async jobs have higher throughput but can be more difficult to debug
async_job = dataset.append(
  items=[remote_item], # all items must be remote for async
  asynchronous=True
)
print(async_job.status())

A Dataset can be populated with labeled and unlabeled data. Using Nucleus, you can filter down the data inside your dataset using custom metadata about your images.

For instance, your local dataset may contain Sunny, Foggy, and Rainy folders of images. All of these images can be uploaded into a single Nucleus Dataset, with (queryable) metadata like {"weather": "Sunny"}.

To update an item’s metadata, you can re-ingest the same items with the update argument set to True. Existing metadata will be overwritten for DatasetItems in the payload that share a reference_id with a previously uploaded DatasetItem. To retrieve your existing reference_ids, use Dataset.items().

# overwrite metadata by reuploading the item
remote_item.metadata["weather"] = "Sunny"

async_job_2 = dataset.append(
  items=[remote_item],
  update=True,
  asynchronous=True
)
Parameters
  • items (Union[Sequence[nucleus.dataset_item.DatasetItem], Sequence[nucleus.scene.LidarScene], Sequence[nucleus.scene.VideoScene]]) – List of items or scenes to upload.

  • batch_size (int) – Size of the batch for larger uploads. Default is 20. This is for items that have a remote URL and do not require a local upload. If you get timeouts for uploading remote urls, try decreasing this.

  • update (bool) – Whether or not to overwrite metadata on reference ID collision. Default is False.

  • asynchronous (bool) – Whether or not to process the upload asynchronously (and return an AsyncJob object). This is required when uploading scenes. Default is False.

  • local_files_per_upload_request (int) – Optional; default is 10. We recommend lowering this if you encounter timeouts.

Returns

For scenes

If synchronous, returns a payload describing the upload result:

{
    "dataset_id: str,
    "new_items": int,
    "updated_items": int,
    "ignored_items": int,
    "upload_errors": int
}

Otherwise, returns an AsyncJob object.

For images

If synchronous, returns nucleus.upload_response.UploadResponse; otherwise, returns an AsyncJob.

Return type

Union[Dict[Any, Any], nucleus.async_job.AsyncJob, nucleus.upload_response.UploadResponse]
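Client-side, the payload is split into batches (batch_size for remote items, local_files_per_upload_request for local files) before being sent. A simplified sketch of that chunking, illustrative rather than the SDK's exact implementation:

```python
def chunk(items, batch_size):
    # Split a list into consecutive batches of at most batch_size elements,
    # mirroring how uploads are divided into separate requests.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# e.g. 45 remote items with the default batch_size of 20
batches = chunk(list(range(45)), 20)
# -> three requests of sizes 20, 20, and 5
```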

autotag_items(autotag_name, for_scores_greater_than=0)#

Fetches the autotag’s items above the score threshold, sorted by descending score.

Parameters
  • autotag_name – The user-defined name of the autotag.

  • for_scores_greater_than (Optional[int]) – Score threshold between -1 and 1 above which to include autotag items.

Returns

List of autotagged items above the given score threshold, sorted by descending score, and autotag info, packaged into a dict as follows:

{
    "autotagItems": List[{
        ref_id: str,
        score: float,
        model_prediction_annotation_id: str | None
        ground_truth_annotation_id: str | None,
    }],
    "autotag": {
        id: str,
        name: str,
        status: "started" | "completed",
        autotag_level: "Image" | "Object"
    }
}

Note: model_prediction_annotation_id and ground_truth_annotation_id are only relevant for object autotags.

autotag_training_items(autotag_name)#

Fetches items that were manually selected during refinement of the autotag.

Parameters

autotag_name – The user-defined name of the autotag.

Returns

List of user-selected positives and autotag info, packaged into a dict as follows:

{
    "autotagPositiveTrainingItems": List[{
        ref_id: str,
        model_prediction_annotation_id: str | None,
        ground_truth_annotation_id: str | None,
    }],
    "autotag": {
        id: str,
        name: str,
        status: "started" | "completed",
        autotag_level: "Image" | "Object"
    }
}

Note: model_prediction_annotation_id and ground_truth_annotation_id are only relevant for object autotags.

build_slice(name, sample_size, sample_method, filters=None)#

Build a slice using Nucleus’ Smart Sample tool, which allows slices to be built based on sampling criteria and filters.

Parameters
  • name (str) – Name for the slice being created. Must be unique per dataset.

  • sample_size (int) – Size of the slice to create. Capped by the size of the dataset and the applied filters.

  • sample_method (Union[str, nucleus.slice.SliceBuilderMethods]) – How to sample the dataset, currently supports ‘Random’ and ‘Uniqueness’

  • filters (Optional[nucleus.slice.SliceBuilderFilters]) – Apply filters to only sample from an existing slice or autotag

Return type

Union[str, Tuple[nucleus.async_job.AsyncJob, str], dict]

Examples

from nucleus.slice import SliceBuilderFilters, SliceBuilderMethods, SliceBuilderFilterAutotag

# random slice
job = dataset.build_slice("RandomSlice", 20, SliceBuilderMethods.RANDOM)

# slice with filters
filters = SliceBuilderFilters(
    slice_id="<some slice id>",
    autotag=SliceBuilderFilterAutotag("tag_cd41jhjdqyti07h8m1n1", [-0.5, 0.5])
)
job = dataset.build_slice("NewSlice", 20, SliceBuilderMethods.RANDOM, filters)

Returns: An async job

calculate_evaluation_metrics(model, options=None)#

Starts computation of evaluation metrics for a model on the dataset.

To update matches and metrics calculated for a model on a given dataset you can call this endpoint. This is required in order to sort by IOU, view false positives/false negatives, and view model insights.

You can add predictions from a model to a dataset after running the calculation of the metrics. However, the calculation of metrics will have to be retriggered for the new predictions to be matched with ground truth and appear as false positives/negatives, or for the new predictions effect on metrics to be reflected in model run insights.

During IoU calculation, bounding box Predictions are compared to GroundTruth using a greedy matching algorithm that matches prediction and ground truth boxes with the highest IoUs first. By default the matching algorithm is class-agnostic: it will greedily create matches regardless of the class labels.

The algorithm can be tuned to classify true positives between certain classes, but not others. This is useful if the labels in your ground truth do not match the exact strings of your model predictions, or if you want to associate multiple predictions with one ground truth label, or multiple ground truth labels with one prediction. To recompute metrics based on different matching, you can re-commit the run with new request parameters.

import nucleus

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset(dataset_id="YOUR_DATASET_ID")

model = client.get_model(model_id="YOUR_MODEL_PRJ_ID")

# Compute all evaluation metrics including IOU-based matching:
dataset.calculate_evaluation_metrics(model)

# Match car and bus bounding boxes (for IOU computation)
# Otherwise enforce that class labels must match
dataset.calculate_evaluation_metrics(model, options={
  'allowed_label_matches': [
    {
      'ground_truth_label': 'car',
      'model_prediction_label': 'bus'
    },
    {
      'ground_truth_label': 'bus',
      'model_prediction_label': 'car'
    }
  ]
})
Parameters
  • model (Model) – The model object for which to calculate metrics.

  • options (dict) –

    Dictionary of specific options to configure metrics calculation.

    class_agnostic

    Whether ground truth and prediction classes can differ when being matched for evaluation metrics. Default is True.

    allowed_label_matches

    Pairs of ground truth and prediction classes that should be considered matchable when computing metrics. If supplied, class_agnostic must be False.

    {
        "class_agnostic": bool,
        "allowed_label_matches": List[{
            "ground_truth_label": str,
            "model_prediction_label": str
        }]
    }
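The greedy matching described above can be sketched in plain Python. This is a simplified illustration of the matching semantics, not the actual Nucleus implementation; the box format (x, y, width, height) and tie-breaking behavior are assumptions:

```python
def iou(a, b):
    # Boxes as (x, y, width, height); IoU = intersection area / union area.
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def greedy_match(preds, gts, class_agnostic=True, allowed=()):
    # Sort all (prediction, ground truth) pairs by IoU, highest first,
    # then greedily take pairs whose members are still unmatched.
    allowed = {(g, p) for g, p in allowed}
    pairs = sorted(
        ((iou(p["box"], g["box"]), i, j)
         for i, p in enumerate(preds)
         for j, g in enumerate(gts)),
        reverse=True,
    )
    matches, used_p, used_g = [], set(), set()
    for score, i, j in pairs:
        if score <= 0 or i in used_p or j in used_g:
            continue
        if not class_agnostic:
            same = preds[i]["label"] == gts[j]["label"]
            if not same and (gts[j]["label"], preds[i]["label"]) not in allowed:
                continue
        matches.append((i, j, score))
        used_p.add(i)
        used_g.add(j)
    return matches
```

With this sketch, a "bus" prediction over a "car" ground truth box matches when class-agnostic, is rejected when classes must agree, and matches again once the ("car", "bus") pair is allowed.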
    

create_custom_index(embeddings_urls, embedding_dim)#

Processes user-provided embeddings for the dataset to use with autotag and simsearch.

import nucleus

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset("YOUR_DATASET_ID")

all_embeddings = {
    "reference_id_0": [0.1, 0.2, 0.3],
    "reference_id_1": [0.4, 0.5, 0.6],
    ...
    "reference_id_10000": [0.7, 0.8, 0.9]
} # sharded and uploaded to s3 with the two below URLs

embeddings_url_1 = "s3://dataset/embeddings_map_1.json"
embeddings_url_2 = "s3://dataset/embeddings_map_2.json"

response = dataset.create_custom_index(
    embeddings_urls=[embeddings_url_1, embeddings_url_2],
    embedding_dim=3
)
Parameters
  • embeddings_urls (List[str]) – List of URLs, each pointing to a JSON file mapping reference_id -> embedding vector. Each embedding JSON must contain fewer than 5000 rows.

  • embedding_dim (int) – The dimension of the embedding vectors. Must be consistent across all embedding vectors in the index.

Returns

Asynchronous job object to track processing status.

Return type

AsyncJob
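Since each embedding file must contain fewer than 5000 rows, a large mapping can be sharded into multiple JSON files before uploading them to cloud storage. A minimal local sketch (the file naming and `rows_per_file` parameter are illustrative, not part of the SDK):

```python
import json
import os

def shard_embeddings(embeddings, out_dir, rows_per_file=4999):
    # Split a {reference_id: vector} mapping into JSON files,
    # each holding fewer than 5000 rows, and return the file paths.
    os.makedirs(out_dir, exist_ok=True)
    items = list(embeddings.items())
    paths = []
    for start in range(0, len(items), rows_per_file):
        path = os.path.join(out_dir, f"embeddings_map_{start // rows_per_file}.json")
        with open(path, "w") as f:
            json.dump(dict(items[start : start + rows_per_file]), f)
        paths.append(path)
    return paths
```

The resulting files would then be uploaded to a location such as S3, and their URLs passed to create_custom_index.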

create_image_index()#

Creates or updates image index by generating embeddings for images that do not already have embeddings.

The embeddings are used for autotag and similarity search.

This endpoint is limited to indexing up to 2 million images at a time; the job will fail for payloads that exceed this limit.

Returns

Asynchronous job object to track processing status.

Return type

AsyncJob

create_object_index(model_run_id=None, gt_only=None)#

Creates or updates object index by generating embeddings for objects that do not already have embeddings.

These embeddings are used for autotag and similarity search. This endpoint only supports indexing objects sourced from the predictions of a specific model or the ground truth annotations of the dataset.

This endpoint is idempotent. If this endpoint is called again for a model whose predictions were indexed in the past, the previously indexed predictions will not have new embeddings recomputed. The same is true for ground truth annotations.

Note that this means if you update a prediction or ground truth bounding box that already has an associated embedding, the embedding will not be updated, even with another call to this endpoint. For now, we recommend deleting the prediction or ground truth annotation and re-inserting it to force generation of a new embedding.

This endpoint is limited to generating embeddings for 3 million objects at a time and the job will fail for payloads that exceed this limit.

Parameters
  • model_run_id (str) –

    The ID of the model whose predictions should be indexed. Default is None, but must be supplied in the absence of gt_only.

  • gt_only (bool) – Whether to only generate embeddings for the ground truth annotations of the dataset. Default is None, but must be supplied in the absence of model_run_id.

Returns

Asynchronous job object to track processing status.

Return type

AsyncJob

create_slice(name, reference_ids)#

Creates a Slice of dataset items within a dataset.

Parameters
  • name (str) – A human-readable name for the slice.

  • reference_ids (List[str]) – List of reference IDs of dataset items to add to the slice.

Returns

The newly constructed slice item.

Return type

Slice

delete_annotations(reference_ids=None, keep_history=True)#

Deletes all annotations associated with the specified item reference IDs.

Parameters
  • reference_ids (list) – List of user-defined reference IDs of the dataset items from which to delete annotations. Defaults to an empty list.

  • keep_history (bool) – Whether to preserve version history. We recommend skipping this parameter and using the default value of True.

Returns

Empty payload response.

Return type

AsyncJob

delete_custom_index(image=True)#

Deletes the custom index uploaded to the dataset.

Returns

Payload containing information that can be used to track the job’s status:

{
    "dataset_id": str,
    "job_id": str,
    "message": str
}

Parameters

image (bool) –

delete_item(reference_id)#

Deletes an item from the dataset by item reference ID.

All annotations and predictions associated with the item will be deleted as well.

Parameters

reference_id (str) – The user-defined reference ID of the item to delete.

Returns

Payload to indicate deletion invocation.

Return type

dict

delete_scene(reference_id)#

Deletes a scene from the Dataset by scene reference ID.

All items, annotations, and predictions associated with the scene will be deleted as well.

Parameters

reference_id (str) – The user-defined reference ID of the scene to delete.

delete_taxonomy(taxonomy_name)#

Deletes the given taxonomy.

All annotations and predictions associated with the taxonomy will be deleted as well.

Parameters

taxonomy_name (str) – The name of the taxonomy.

Returns

Returns a response with dataset_id, taxonomy_name, and status of the delete taxonomy operation.

{
    "dataset_id": str,
    "taxonomy_name": str,
    "status": "Taxonomy successfully deleted"
}

delete_tracks(track_reference_ids)#

Deletes a list of tracks from the dataset, thereby unlinking their annotation and prediction instances.

Parameters
  • track_reference_ids (List[str]) – A list of reference IDs for tracks to delete.

Return type

None

export_embeddings()#

Fetches a pd.DataFrame-ready list of dataset embeddings.

Returns

A list, where each item is a dict with two keys representing a row in the dataset:

List[{
    "reference_id": str,
    "embedding_vector": List[float]
}]

Return type

List[Dict[str, Union[str, List[float]]]]

export_predictions(model)#

Fetches all predictions of a model that were uploaded to the dataset.

Parameters

model (Model) – The model whose predictions to retrieve.

Returns

List of prediction objects from the model.

Return type

List[Union[ BoxPrediction, PolygonPrediction, CuboidPrediction, SegmentationPrediction, CategoryPrediction, KeypointsPrediction ]]

export_scale_task_info()#

Fetches info for all linked Scale tasks of items/scenes in the dataset.

Returns

A list of dicts, each with two keys, respectively mapping to items/scenes and info on their corresponding Scale tasks within the dataset:

List[{
    "item" | "scene": Union[:class:`DatasetItem`, :class:`Scene`],
    "scale_task_info": {
        "task_id": str,
        "task_status": str,
        "task_audit_status": str,
        "task_audit_review_comment": Optional[str],
        "project_name": str,
        "batch": str,
        "created_at": str,
        "completed_at": Optional[str]
    }[]
}]

get_image_indexing_status()#

Gets the primary image index progress for the dataset.

Returns

Response payload:

{
    "embedding_count": int,
    "image_count": int,
    "percent_indexed": float,
    "additional_context": str
}

get_object_indexing_status(model_run_id=None)#

Gets the primary object index progress of the dataset. If model_run_id is not specified, this endpoint will retrieve the indexing progress of the ground truth objects.

Returns

Response payload:

{
    "embedding_count": int,
    "object_count": int,
    "percent_indexed": float,
    "additional_context": str
}

get_scene(reference_id)#

Fetches a single scene in the dataset by its reference ID.

Parameters

reference_id (str) – The user-defined reference ID of the scene to fetch.

Returns

A scene object containing frames, which in turn contain pointcloud or image items.

Return type

Scene

get_scene_from_item_ref_id(item_reference_id)#

Given a dataset item reference ID, find the Scene it belongs to.

Parameters

item_reference_id (str) –

Return type

Optional[nucleus.scene.Scene]

get_slices(name=None, slice_type=None)#

Get a list of slices by name or underlying slice type.

Parameters
  • name (Optional[str]) – Name of the desired slice to look up.

  • slice_type (Optional[Union[str, nucleus.slice.SliceType]]) – Type of slice to look up. This can be one of (‘dataset_item’, ‘object’, ‘scene’)

Raises

NotFound if no slice(s) were found with the given criteria

Returns

The Nucleus slices as a list of objects.

Return type

List[Slice]

ground_truth_loc(reference_id, annotation_id)#

Fetches a single ground truth annotation by ID.

Parameters
  • reference_id (str) – User-defined reference ID of the dataset item associated with the ground truth annotation.

  • annotation_id (str) – User-defined ID of the ground truth annotation.

Returns

Ground truth annotation object with the specified annotation ID.

Return type

Union[ BoxAnnotation, LineAnnotation, PolygonAnnotation, KeypointsAnnotation, CuboidAnnotation, SegmentationAnnotation, CategoryAnnotation ]

iloc(i)#

Fetches dataset item and associated annotations by absolute numerical index.

Parameters

i (int) – Absolute numerical index of the dataset item within the dataset.

Returns

Payload describing the dataset item and associated annotations:

{
    "item": DatasetItem
    "annotations": {
        "box": Optional[List[BoxAnnotation]],
        "cuboid": Optional[List[CuboidAnnotation]],
        "line": Optional[List[LineAnnotation]],
        "polygon": Optional[List[PolygonAnnotation]],
        "keypoints": Optional[List[KeypointsAnnotation]],
        "segmentation": Optional[List[SegmentationAnnotation]],
        "category": Optional[List[CategoryAnnotation]],
    }
}

Return type

dict

info()#

Fetches information about the dataset.

Returns

Information about the dataset including its Scale-generated ID, name, length, associated Models, Slices, and more.

Return type

DatasetInfo

ingest_tasks(task_ids)#

Ingest specific tasks from an existing Scale or Rapid project into the dataset.

Note: if you would like to create a new Dataset from an existing Scale labeling project, use NucleusClient.create_dataset_from_project().

For more info, see our Ingest From Labeling Guide.

Parameters

task_ids (List[str]) – List of task IDs to ingest.

Returns

Payload describing the asynchronous upload result:

{
    "ingested_tasks": int,
    "ignored_tasks": int,
    "pending_tasks": int
}

Return type

dict

items_and_annotation_generator()#

Provides a generator of all DatasetItems and Annotations in the dataset.

Returns

Generator where each element is a dict containing the DatasetItem and all of its associated Annotations, grouped by type.

Iterable[{
    "item": DatasetItem,
    "annotations": {
        "box": List[BoxAnnotation],
        "polygon": List[PolygonAnnotation],
        "cuboid": List[CuboidAnnotation],
        "line": Optional[List[LineAnnotation]],
        "segmentation": List[SegmentationAnnotation],
        "category": List[CategoryAnnotation],
        "keypoints": List[KeypointsAnnotation],
    }
}]

Return type

Iterable[Dict[str, Union[nucleus.dataset_item.DatasetItem, Dict[str, List[nucleus.annotation.Annotation]]]]]

items_and_annotations()#

Returns a list of all DatasetItems and Annotations in this dataset.

Returns

A list of dicts, each with two keys representing a row in the dataset:

List[{
    "item": DatasetItem,
    "annotations": {
        "box": Optional[List[BoxAnnotation]],
        "cuboid": Optional[List[CuboidAnnotation]],
        "line": Optional[List[LineAnnotation]],
        "polygon": Optional[List[PolygonAnnotation]],
        "segmentation": Optional[List[SegmentationAnnotation]],
        "category": Optional[List[CategoryAnnotation]],
        "keypoints": Optional[List[KeypointsAnnotation]],
    }
}]

Return type

List[Dict[str, Union[nucleus.dataset_item.DatasetItem, Dict[str, List[nucleus.annotation.Annotation]]]]]

items_generator(page_size=100000)#

Generator yielding all dataset items in the dataset.

collected_ref_ids = []
for item in dataset.items_generator():
    print(f"Exporting item: {item.reference_id}")
    collected_ref_ids.append(item.reference_id)
Parameters

page_size (int, optional) – Number of items to return per page. If you are experiencing timeouts while using this generator, you can try lowering the page size.

Yields

DatasetItem – A single DatasetItem object.

Return type

Iterable[nucleus.dataset_item.DatasetItem]

jobs(job_types=None, from_date=None, to_date=None, limit=JOB_REQ_LIMIT, show_completed=False, stats_only=False)#

Fetch jobs pertaining to this particular dataset.

Parameters
  • job_types (Optional[List[nucleus.job.CustomerJobTypes]]) – Filter on a set of job types; if None, fetch all types, e.g. [‘uploadDatasetItems’]

  • from_date (Optional[Union[str, datetime.datetime]]) – beginning of date range, as a string ‘YYYY-MM-DD’ or datetime object. For example: ‘2021-11-05’, parser.parse(‘Nov 5 2021’), or datetime(2021,11,5)

  • to_date (Optional[Union[str, datetime.datetime]]) – end of date range

  • limit (int) – Number of results to fetch, up to a maximum of 50,000.

  • show_completed (bool) – Whether to also fetch jobs with Completed status. Default is False.

  • stats_only (bool) – Whether to return an overview of jobs instead of a list of job objects.

list_autotags()#

Fetches all autotags of the dataset.

Returns

List of autotag payloads:

List[{
    "id": str,
    "name": str,
    "status": "completed" | "pending",
    "autotag_level": "Image" | "Object"
}]

loc(dataset_item_id)#

Fetches a dataset item and associated annotations by Nucleus-generated ID.

Parameters

dataset_item_id (str) – Nucleus-generated dataset item ID (starts with di_). This can be retrieved via Dataset.items() or a Nucleus dashboard URL.

Returns

Payload containing the dataset item and associated annotations:

{
    "item": DatasetItem
    "annotations": {
        "box": Optional[List[BoxAnnotation]],
        "cuboid": Optional[List[CuboidAnnotation]],
        "line": Optional[List[LineAnnotation]],
        "polygon": Optional[List[PolygonAnnotation]],
        "keypoints": Optional[List[KeypointsAnnotation]],
        "segmentation": Optional[List[SegmentationAnnotation]],
        "category": Optional[List[CategoryAnnotation]],
    }
}

Return type

dict

prediction_loc(model, reference_id, annotation_id)#

Fetches a single model prediction by ID.

Parameters
  • model (Model) – Model object from which to fetch the prediction.

  • reference_id (str) – User-defined reference ID of the dataset item associated with the model prediction.

  • annotation_id (str) – User-defined ID of the model prediction.

Returns

Model prediction object with the specified annotation ID.

Return type

Union[ BoxPrediction, PolygonPrediction, CuboidPrediction, SegmentationPrediction, CategoryPrediction, KeypointsPrediction ]

predictions_iloc(model, index)#

Fetches all predictions of a dataset item by its absolute index.

Parameters
  • model (Model) – Model object from which to fetch the prediction.

  • index (int) – Absolute index of the dataset item within the dataset.

Returns

Dictionary mapping prediction type to a list of such prediction objects from the given model:

{
    "box": List[BoxPrediction],
    "polygon": List[PolygonPrediction],
    "cuboid": List[CuboidPrediction],
    "segmentation": List[SegmentationPrediction],
    "category": List[CategoryPrediction],
    "keypoints": List[KeypointsPrediction],
}

Return type

List[Union[ BoxPrediction, PolygonPrediction, CuboidPrediction, SegmentationPrediction, CategoryPrediction, KeypointsPrediction ]]

predictions_refloc(model, reference_id)#

Fetches all predictions of a dataset item by its reference ID.

Parameters
  • model (Model) – Model object from which to fetch the prediction.

  • reference_id (str) – User-defined ID of the dataset item from which to fetch all predictions.

Returns

Dictionary mapping prediction type to a list of such prediction objects from the given model:

{
    "box": List[BoxPrediction],
    "polygon": List[PolygonPrediction],
    "cuboid": List[CuboidPrediction],
    "segmentation": List[SegmentationPrediction],
    "category": List[CategoryPrediction],
    "keypoints": List[KeypointsPrediction],
}

Return type

List[Union[ BoxPrediction, PolygonPrediction, CuboidPrediction, SegmentationPrediction, CategoryPrediction, KeypointsPrediction ]]

query_items(query)#

Fetches all DatasetItems that pertain to a given structured query.

Parameters

query (str) – Structured query compatible with the Nucleus query language.

Returns

A list of DatasetItem query results.

Return type

Iterable[nucleus.dataset_item.DatasetItem]

refloc(reference_id)#

Fetches a dataset item and associated annotations by reference ID.

Parameters

reference_id (str) – User-defined reference ID of the dataset item.

Returns

Payload containing the dataset item and associated annotations:

{
    "item": DatasetItem
    "annotations": {
        "box": Optional[List[BoxAnnotation]],
        "cuboid": Optional[List[CuboidAnnotation]],
        "line": Optional[List[LineAnnotation]],
        "polygon": Optional[List[PolygonAnnotation]],
        "keypoints": Option[List[KeypointsAnnotation]],
        "segmentation": Optional[List[SegmentationAnnotation]],
        "category": Optional[List[CategoryAnnotation]],
    }
}

Return type

dict

set_continuous_indexing(enable=True)#

Toggle whether embeddings are automatically generated for new data.

Sets continuous indexing for a given dataset, which will automatically generate embeddings for use with autotag whenever new images are uploaded.

Parameters

enable (bool) – Whether to enable or disable continuous indexing. Default is True.

Returns

Response payload:

{
    "dataset_id": str,
    "message": str,
    "backfill_job": AsyncJob
}

set_primary_index(image=True, custom=False)#

Sets the primary index used for Autotag and Similarity Search on this dataset.

Parameters
  • image (bool) – Whether to configure the primary index for images or objects. Default is True (set primary image index).

  • custom (bool) – Whether to set the primary index to use custom or Nucleus-generated embeddings. Default is False (use Nucleus-generated embeddings as the primary index).

Returns

{
    "success": bool
}

update_autotag(autotag_id)#

Rerun autotag inference on all items in the dataset.

Currently this endpoint does not skip items that have already been inferenced, though this improvement is planned. For now, you can only have one job running at a time, so please await the result using job.sleep_until_complete() before launching another job.

Parameters

autotag_id (str) – ID of the autotag to re-inference. You can retrieve the ID you want with list_autotags(), or from its URL in the “Manage Autotags” page in the dashboard.

Returns

Asynchronous job object to track processing status.

Return type

AsyncJob

update_item_metadata(mapping, asynchronous=False)#

Update (merge) dataset item metadata for each reference_id given in the mapping. The backend will join the specified mapping metadata to the existing metadata. If there is a key-collision, the value given in the mapping will take precedence.

This method can also be used to update the camera_params for a particular set of items; just specify the key camera_params in the metadata for each reference_id, along with all the necessary fields.

Parameters
  • mapping (Dict[str, dict]) – key-value pair of <reference_id>: <metadata>

  • asynchronous (bool) – if True, run the update as a background job

Examples

>>> mapping = {"item_ref_1": {"new_key": "foo"}, "item_ref_2": {"some_value": 123, "camera_params": {...}}}
>>> dataset.update_item_metadata(mapping)
Returns

A dictionary outlining success or failures.

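The merge semantics described above can be illustrated with plain dicts. This is a sketch of the documented behavior, not the backend implementation:

```python
def merge_metadata(existing, update):
    # Shallow merge: keys from `update` win on collision,
    # keys present only in `existing` are preserved.
    merged = dict(existing)
    merged.update(update)
    return merged

# Colliding key "weather" takes the new value; "camera_params" survives.
existing = {"weather": "sunny", "camera_params": {"fx": 1000}}
update = {"weather": "rainy", "new_key": "foo"}
merged = merge_metadata(existing, update)
```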

update_scene_metadata(mapping, asynchronous=False)#

Update (merge) scene metadata for each reference_id given in the mapping. The backend will join the specified mapping metadata to the existing metadata. If there is a key-collision, the value given in the mapping will take precedence.

Parameters
  • mapping (Dict[str, dict]) – key-value pair of <reference_id>: <metadata>

  • asynchronous (bool) – if True, run the update as a background job

Examples

>>> mapping = {"scene_ref_1": {"new_key": "foo"}, "scene_ref_2": {"some_value": 123}}
>>> dataset.update_scene_metadata(mapping)
Returns

A dictionary outlining success or failures.


upload_predictions(model, predictions, update=False, asynchronous=False, batch_size=5000, remote_files_per_upload_request=20, local_files_per_upload_request=10)#

Uploads predictions and associates them with an existing Model.

Adding predictions to your dataset in Nucleus allows you to visualize discrepancies against ground truth, query dataset items based on the predictions they contain, and evaluate your models by comparing their predictions to ground truth.

Nucleus supports Box, Polygon, Cuboid, Segmentation, Category, and SceneCategory predictions. Cuboid predictions can only be uploaded to a pointcloud DatasetItem.

When uploading a prediction, you need to specify which item you are annotating via the reference_id you provided when uploading the image or pointcloud.

Prediction uploads can be made idempotent by specifying an optional annotation_id for each prediction. This ID should be unique within the dataset_item so that (reference_id, annotation_id) is unique within the dataset.

See SegmentationPrediction for specific requirements to upload segmentation predictions.

For ingesting large prediction payloads, see the Guide for Large Ingestions.

Parameters
  • model (Model) – The model with which to associate the predictions. The Nucleus-generated model ID (starts with prj_) can be retrieved via list_models() or a Nucleus dashboard URL.

  • predictions (List[Union[ BoxPrediction, PolygonPrediction, CuboidPrediction, SegmentationPrediction, CategoryPrediction, SceneCategoryPrediction ]]) – List of prediction objects to upload.

  • update (bool) – Whether or not to overwrite metadata or ignore on reference ID collision. Default is False.

  • asynchronous (bool) – Whether or not to process the upload asynchronously (and return an AsyncJob object). Default is False.

  • batch_size (int) – Number of predictions processed in each concurrent batch. Default is 5000. If you get timeouts when uploading geometric predictions, you can try lowering this batch size. This is only relevant for asynchronous=False.

  • remote_files_per_upload_request (int) – Number of remote files to upload in each request. Segmentations have either local or remote files; if you are getting timeouts while uploading segmentations with remote URLs, you should lower this value from its default of 20. This is only relevant for asynchronous=False.

  • local_files_per_upload_request (int) – Number of local files to upload in each request. Segmentations have either local or remote files; if you are getting timeouts while uploading segmentations with local files, you should lower this value from its default of 10. The maximum is 10. This is only relevant for asynchronous=False.

Returns

Payload describing the synchronous upload:

{
    "dataset_id": str,
    "model_run_id": str,
    "predictions_processed": int,
    "predictions_ignored": int,
}

class nucleus.DatasetInfo#

High-level Dataset information

dataset_id#

Nucleus-generated dataset ID

name#

User-defined name of dataset

length#

Number of DatasetItems in the Dataset

model_run_ids#

(deprecated)

slice_ids#

List of Slice IDs associated with the Dataset

annotation_metadata_schema#

Dict defining annotation-level metadata schema.

item_metadata_schema#

Dict defining item metadata schema.

class nucleus.DatasetItem#

A dataset item is an image or pointcloud that has associated metadata.

Note: for 3D data, please include a CameraParams object under a key named “camera_params” within the metadata dictionary. This will allow for projecting 3D annotations to any image within a scene.

Parameters
  • image_location (Optional[str]) – Required if pointcloud_location is not present: The location containing the image for the given row of data. This can be a local path, or a remote URL. Remote formats supported include any URL (http:// or https://) or URIs for AWS S3, Azure, or GCS (i.e. s3://, gcs://).

  • pointcloud_location (Optional[str]) – Required if image_location is not present: The remote URL containing the pointcloud JSON. Remote formats supported include any URL (http:// or https://) or URIs for AWS S3, Azure, or GCS (i.e. s3://, gcs://).

  • reference_id (Optional[str]) – A user-specified identifier to reference the item.

  • metadata (Optional[dict]) –

    Extra information about the particular dataset item. ints, floats, string values will be made searchable in the query bar by the key in this dict. For example, {"animal": "dog"} will become searchable via metadata.animal = "dog".

    Categorical data can be passed as a string and will be treated categorically by Nucleus if there are less than 250 unique values in the dataset. This means histograms of values in the “Insights” section and autocomplete within the query bar.

    Numerical metadata will generate histograms in the “Insights” section, allow for sorting the results of any query, and can be used with the modulo operator. For example: metadata.frame_number % 5 = 0

    All other types of metadata will be visible from the dataset item detail view.

    It is important that string and numerical metadata fields are consistent - if a metadata field has a string value, then all metadata fields with the same key should also have string values, and vice versa for numerical metadata. If conflicting types are found, Nucleus will return an error during upload!

    The recommended way of adding or updating existing metadata is to re-run the ingestion (dataset.append) with update=True, which will replace any existing metadata with whatever your new ingestion run uses. This will delete any metadata keys that are not present in the new ingestion run. We have a cache based on image_location that will skip the need for a re-upload of the images, so your second ingestion will be faster than your first.

    For 3D (sensor fusion) data, it is highly recommended to include camera intrinsics in the metadata of your camera image items. Nucleus requires these intrinsics to create visualizations such as cuboid projections. Refer to our guide to uploading 3D data for more info.

    Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as { “lat”: 52.5, “lon”: 13.3, … }.

    Context Attachments may be provided to display the attachments side by side with the dataset item in the Detail View by specifying { “context_attachments”: [ { “attachment”: ‘https://example.com/1’ }, { “attachment”: ‘https://example.com/2’ }, … ] }.

  • upload_to_scale (Optional[bool]) –

    Set this to false in order to use privacy mode.

    Setting this to false means the actual data within the item will not be uploaded to Scale, meaning that you can send in links that are only accessible to certain users and not to Scale. Skipping upload to Scale is currently only implemented for images.
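The modulo query mentioned in the metadata notes above (metadata.frame_number % 5 = 0) behaves like the following filter. This is a plain-Python illustration of the query semantics, not the server-side query engine:

```python
# Hypothetical items with a numerical `frame_number` metadata field.
items = [
    {"reference_id": f"frame_{i}", "metadata": {"frame_number": i}}
    for i in range(12)
]

# Equivalent of the Nucleus query `metadata.frame_number % 5 = 0`:
# keep every fifth frame.
sampled = [it for it in items if it["metadata"]["frame_number"] % 5 == 0]
```

This kind of query is handy for downsampling dense video frames before review.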

classmethod from_json(payload)#

Instantiates dataset item object from schematized JSON dict payload.

Parameters

payload (dict) –

to_json()#

Serializes dataset item object to schematized JSON string.

Return type

str

to_payload(is_scene=False)#

Serializes dataset item object to schematized JSON dict.

Return type

dict

class nucleus.Frame(**kwargs)#

Collection of sensor data pertaining to a single time step.

For 3D data, each Frame houses a sensor-to-data mapping and must have exactly one pointcloud with any number of camera images.

Parameters

**kwargs (Dict[str, DatasetItem]) – Mappings from sensor name to dataset item. Each frame of a lidar scene must contain exactly one pointcloud and any number of images (e.g. from different angles).

Refer to our guide to uploading 3D data for more info!

add_item(item, sensor_name)#

Adds DatasetItem object to frame as sensor data.

Parameters
  • item (DatasetItem) – Pointcloud or camera image item to add.

  • sensor_name (str) – Name of the sensor, e.g. “lidar” or “front_cam.”

Return type

None

classmethod from_json(payload)#

Instantiates frame object from schematized JSON dict payload.

Parameters

payload (dict) –

get_item(sensor_name)#

Fetches the DatasetItem object associated with the given sensor.

Parameters

sensor_name (str) – Name of the sensor, e.g. “lidar” or “front_cam.”

Returns

DatasetItem object pertaining to the sensor.

Return type

DatasetItem

get_items()#

Fetches all items in the frame.

Returns

List of all DatasetItem objects in the frame.

Return type

List[DatasetItem]

get_sensors()#

Fetches all sensor names of the frame.

Returns

List of all sensor names of the frame.

Return type

List[str]

to_payload()#

Serializes frame object to schematized JSON dict.

Return type

dict

class nucleus.Keypoint#

A 2D point that has an additional visibility flag.

Keypoints are intended to be part of a larger collection and connected via a pre-defined skeleton. A keypoint in this skeleton may be visible or not visible, and may also be unlabeled. For this reason the x and y coordinates are optional: when omitted, the keypoint is assumed to be not visible and is not shown as part of the combined label.

Parameters
  • x (Optional[float]) – The x coordinate of the point.

  • y (Optional[float]) – The y coordinate of the point.

  • visible (bool) – The visibility of the point.

class nucleus.KeypointsAnnotation#

A keypoints annotation containing a list of keypoints and the structure of those keypoints: the naming of each point and the skeleton that connects those keypoints.

from nucleus import KeypointsAnnotation

keypoints = KeypointsAnnotation(
    label="face",
    keypoints=[Keypoint(100, 100), Keypoint(120, 120), Keypoint(visible=False), Keypoint(0, 0)],
    names=["point1", "point2", "point3", "point4"],
    skeleton=[[0, 1], [1, 2], [1, 3], [2, 3]],
    reference_id="image_2",
    annotation_id="image_2_face_keypoints_1",
    metadata={"face_direction": "forward"},
    track_reference_id="face_1",
)
Parameters
  • label (str) – The label for this annotation.

  • keypoints (List[Keypoint]) – The list of keypoints objects.

  • names (List[str]) – A list that corresponds to the names of each keypoint.

  • skeleton (List[List[int]]) – A list of 2-length lists indicating a beginning and ending index for each line segment in the skeleton of this keypoint label.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • track_reference_id – A unique string to identify the annotation as part of a group. For instance, multiple “car” annotations across several dataset items may have the same track_reference_id such as “red_car”.
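The relationship between keypoints, names, and skeleton can be checked with a small validator before constructing the annotation. This is an illustrative sketch; Nucleus performs its own validation server-side:

```python
def validate_skeleton(keypoints, names, skeleton):
    # Every keypoint needs a name, and every skeleton edge must be a
    # 2-length list of valid indices into the keypoints list.
    if len(keypoints) != len(names):
        return False
    n = len(keypoints)
    return all(
        len(edge) == 2 and all(0 <= idx < n for idx in edge)
        for edge in skeleton
    )
```

For the face example above, the four keypoints, four names, and edges [[0, 1], [1, 2], [1, 3], [2, 3]] pass this check; an edge referencing index 4 would not.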

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this defaults to returning False. Subclasses with local files should override this method (though overriding it alone is not sufficient to enable local file upload).

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.KeypointsPrediction(label, keypoints, names, skeleton, reference_id, confidence=None, annotation_id=None, metadata=None, class_pdf=None, track_reference_id=None)#

Prediction of keypoints.

Parameters
  • label (str) – The label for this annotation (e.g. car, pedestrian, bicycle).

  • keypoints (List[Keypoint]) – The list of keypoints objects.

  • names (List[str]) – A list that corresponds to the names of each keypoint.

  • skeleton (List[List[int]]) – A list of 2-length lists indicating a beginning and ending index for each line segment in the skeleton of this keypoint label.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • confidence (Optional[float]) – 0-1 indicating the confidence of the prediction.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • class_pdf (Optional[Dict]) – An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain.

  • track_reference_id (Optional[str]) – A unique string to identify the prediction as part of a group. For instance, multiple “car” predictions across several dataset items may have the same track_reference_id such as “red_car”.
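
The names and skeleton parameters index into the keypoints list, which is easy to get wrong. A quick consistency check, using purely illustrative names, coordinates, and indices (no SDK call involved):

```python
# Consistency check for keypoints/names/skeleton before constructing a
# KeypointsPrediction. All names, coordinates, and indices are illustrative.
keypoints = [(110, 120, True), (130, 125, True), (120, 140, False)]  # (x, y, visible)
names = ["left_eye", "right_eye", "nose"]
# Each skeleton edge is a [start_index, end_index] pair into `keypoints`.
skeleton = [[0, 2], [1, 2]]

assert len(names) == len(keypoints), "one name per keypoint"
assert all(
    len(edge) == 2 and all(0 <= i < len(keypoints) for i in edge)
    for edge in skeleton
), "skeleton indices must address the keypoints list"
```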

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.)

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.LidarScene#

Sequence of lidar pointcloud and camera images over time.

Nucleus 3D datasets are comprised of LidarScenes, which are sequences of lidar pointclouds and camera images over time. These sequences are in turn comprised of Frames.

By organizing data across multiple sensors over time, LidarScenes make it easier to interpret pointclouds, allowing you to see objects move over time by clicking through frames and providing context in the form of corresponding images.

You can think of scenes and frames as nested groupings of sensor data across time:

  • LidarScene for a given location
    • Frame at timestep 0
      • DatasetItem of pointcloud

      • DatasetItem of front camera image

      • DatasetItem of rear camera image

    • Frame at timestep 1
  • LidarScene for another location

LidarScenes are uploaded to a Dataset with any accompanying metadata. Frames do not accept metadata, but each of its constituent DatasetItems does.

Note: Uploads with a different number of frames/items will error out (only on the scenes that now differ). Existing scenes are expected to retain the same structure, i.e. the same number of frames and the same items per frame. If a scene definition is changed (for example, additional frames added), the update operation will be ignored. If you would like to alter the structure of a scene, please delete the scene and re-upload.

Parameters
  • reference_id (str) – User-specified identifier to reference the scene.

  • frames (Optional[List[Frame]]) – List of frames to be a part of the scene. A scene can be created before frames or items have been added to it, but must be non-empty when uploading to a Dataset.

  • metadata (Optional[Dict]) –

    Optional metadata to include with the scene.

    Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as { “lat”: 52.5, “lon”: 13.3, … }.

    Context Attachments may be provided to display the attachments side by side with the dataset item in the Detail View by specifying { “context_attachments”: [ { “attachment”: ‘https://example.com/1’ }, { “attachment”: ‘https://example.com/2’ }, … ] }.

Refer to our guide to uploading 3D data for more info!
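
The scene/frame/item nesting described above can be sketched with plain data, independent of the SDK (the sensor names and URLs below are hypothetical):

```python
# Plain-data sketch of the scene -> frame -> sensor-item nesting described
# above. Sensor names and URLs are hypothetical, not SDK objects.
scene = {
    0: {  # Frame at timestep 0
        "lidar": "s3://example-bucket/scene1/frame0.json",
        "front_cam": "s3://example-bucket/scene1/frame0_front.jpeg",
    },
    1: {  # Frame at timestep 1
        "lidar": "s3://example-bucket/scene1/frame1.json",
        "front_cam": "s3://example-bucket/scene1/frame1_front.jpeg",
    },
}

# Per the structure note above, every frame should carry the same sensors.
sensors = {name for frame in scene.values() for name in frame}
assert all(set(frame) == sensors for frame in scene.values())
```

With the SDK, each leaf would instead be a DatasetItem added via add_item(index, sensor_name, item) on a LidarScene.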

add_frame(frame, index, update=False)#

Adds frame to scene at the specified index.

Parameters
  • frame (Frame) – Frame object to add.

  • index (int) – Serial index at which to add the frame.

  • update (bool) – Whether to overwrite the frame at the specified index, if it exists. Default is False.

Return type

None

add_item(index, sensor_name, item)#

Adds DatasetItem to the specified frame as sensor data.

Parameters
  • index (int) – Serial index of the frame to which to add the item.

  • item (DatasetItem) – Pointcloud or camera image item to add.

  • sensor_name (str) – Name of the sensor, e.g. “lidar” or “front_cam.”

Return type

None

classmethod from_json(payload, client=None, skip_validate=False)#

Instantiates scene object from schematized JSON dict payload.

Parameters

payload (dict) –

get_frame(index)#

Fetches the Frame object at the specified index.

Parameters

index (int) – Serial index for which to retrieve the Frame.

Returns

Frame object at the specified index.

Return type

Frame

get_frames()#

Fetches a sorted list of Frames of the scene.

Returns

List of Frames, sorted by index ascending.

Return type

List[Frame]

get_item(index, sensor_name)#

Fetches the DatasetItem object of the given frame and sensor.

Parameters
  • index (int) – Serial index of the frame from which to fetch the item.

  • sensor_name (str) – Name of the sensor, e.g. “lidar” or “front_cam.”

Returns

DatasetItem object of the frame and sensor.

Return type

DatasetItem

get_items()#

Fetches all items in the scene.

Returns

Unordered list of all DatasetItem objects in the scene.

Return type

List[DatasetItem]

get_items_from_sensor(sensor_name)#

Fetches all DatasetItem objects of the given sensor.

Parameters

sensor_name (str) – Name of the sensor, e.g. “lidar” or “front_cam.”

Returns

List of DatasetItem objects associated with the specified sensor.

Return type

List[DatasetItem]

get_sensors()#

Fetches all sensor names of the scene.

Returns

List of all sensor names associated with frames in the scene.

Return type

List[str]

info()#

Fetches information about the scene.

Returns

Payload containing:

{
    "reference_id": str,
    "length": int,
    "num_sensors": int
}

to_json()#

Serializes scene object to schematized JSON string.

Return type

str

to_payload()#

Serializes scene object to schematized JSON dict.

Return type

dict

class nucleus.LineAnnotation#

A polyline annotation consisting of an ordered list of 2D points. A LineAnnotation differs from a PolygonAnnotation by not forming a closed loop, and by having zero area.

from nucleus import LineAnnotation

line = LineAnnotation(
    label="face",
    vertices=[Point(100, 100), Point(200, 300), Point(300, 200)],
    reference_id="person_image_1",
    annotation_id="person_image_1_line_1",
    metadata={"camera_mode": "portrait"},
    track_reference_id="face_human",
)
Parameters
  • label (str) – The label for this annotation.

  • vertices (List[Point]) – The list of points making up the line.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • track_reference_id – A unique string to identify the annotation as part of a group. For instance, multiple “car” annotations across several dataset items may have the same track_reference_id such as “red_car”.

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.)

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.LinePrediction(label, vertices, reference_id, confidence=None, annotation_id=None, metadata=None, class_pdf=None, track_reference_id=None)#

Prediction of a line.

Parameters
  • label (str) – The label for this prediction (e.g. car, pedestrian, bicycle).

  • vertices (List[Point]) – The list of points making up the line.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • confidence (Optional[float]) – 0-1 indicating the confidence of the prediction.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this prediction. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • class_pdf (Optional[Dict]) – An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain.

  • track_reference_id (Optional[str]) – A unique string to identify the prediction as part of a group. For instance, multiple “car” predictions across several dataset items may have the same track_reference_id such as “red_car”.
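
The class_pdf parameter must form a complete probability distribution, as described above; a small sanity check (the labels are illustrative):

```python
# Validate a class_pdf before attaching it to a prediction: every value
# must lie in [0, 1] and the values must sum to 1. Labels are illustrative.
class_pdf = {"car": 0.7, "pedestrian": 0.2, "bicycle": 0.1}

assert all(0.0 <= p <= 1.0 for p in class_pdf.values())
assert abs(sum(class_pdf.values()) - 1.0) < 1e-9
```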

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.)

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.Model(model_id, name, reference_id, metadata, client, bundle_name=None, tags=None)#

A model that can be used to upload predictions to a dataset.

By uploading model predictions to Nucleus, you can compare your predictions to ground truth annotations and discover problems with your Models or Dataset.

You can also upload predictions for unannotated images, letting you query them based on model predictions. This can help you prioritize which unlabeled data to label next.

Within Nucleus, Models work in the following way:

  1. You first create a Model. You can do this just once and reuse the model on multiple datasets.

  2. You then upload predictions to a dataset.

  3. Trigger calculation of metrics in order to view model debugging insights.

The steps above will allow you to visualize model performance within Nucleus, or compare multiple models that have been run on the same Dataset.

Note that you can always add more predictions to a dataset, but then you will need to re-run the calculation of metrics in order to have them be correct.

import nucleus

client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)
dataset = client.get_dataset(YOUR_DATASET_ID)

prediction_1 = nucleus.BoxPrediction(
    label="label",
    x=0,
    y=0,
    width=10,
    height=10,
    reference_id="1",
    confidence=0.9,
    class_pdf={"label": 0.9, "other_label": 0.1},
)
prediction_2 = nucleus.BoxPrediction(
    label="label",
    x=0,
    y=0,
    width=10,
    height=10,
    reference_id="2",
    confidence=0.2,
    class_pdf={"label": 0.2, "other_label": 0.8},
)

model = client.create_model(
    name="My Model", reference_id="My-CNN", metadata={"timestamp": "121012401"}
)

# For small ingestions, we recommend synchronous ingestion
response = dataset.upload_predictions(model, [prediction_1, prediction_2])

# For large ingestions, we recommend asynchronous ingestion
job = dataset.upload_predictions(
    model, [prediction_1, prediction_2], asynchronous=True
)
# Check current status
job.status()
# Sleep until ingestion is done
job.sleep_until_complete()
# Check errors
job.errors()

dataset.calculate_evaluation_metrics(model)

Models cannot be instantiated directly and instead must be created via API endpoint, using NucleusClient.create_model().

add_tags(tags)#

Tag the model with custom tag names.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
model = client.list_models()[0]

model.add_tags(["tag_A", "tag_B"])
Parameters

tags (List[str]) – list of tag names

evaluate(scenario_test_names)#

Evaluates this model on the specified scenario tests.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
model = client.list_models()[0]
scenario_test = client.validate.create_scenario_test(
    "sample_scenario_test", "YOUR_SLICE_ID"
)

model.evaluate(["sample_scenario_test"])
Parameters

scenario_test_names (List[str]) – list of scenario test names to evaluate

Returns

AsyncJob object of evaluation job

Return type

nucleus.async_job.AsyncJob

classmethod from_json(payload, client)#

Instantiates model object from schematized JSON dict payload.

Parameters

payload (dict) –

remove_tags(tags)#

Remove tag(s) from the model.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
model = client.list_models()[0]

model.remove_tags(["tag_x"])
Parameters

tags (List[str]) – list of tag names to remove

run(dataset_id, slice_id)#

Runs inference on the bundle associated with the model on the dataset.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
model = client.list_models()[0]

model.run("ds_123456")
Parameters
  • dataset_id (str) – The ID of the dataset to run inference on.

  • job_id – The ID of the AsyncJob used to track job progress.

  • slice_id (Optional[str]) – The ID of the slice of the dataset to run inference on.

Return type

str

class nucleus.NucleusClient(api_key=None, use_notebook=False, endpoint=None)#

Client to interact with the Nucleus API via Python SDK.

Parameters
  • api_key (Optional[str]) – Follow this guide to retrieve your API keys.

  • use_notebook (bool) – Whether the client is being used in a notebook (toggles tqdm style). Default is False.

  • endpoint (str) – Base URL of the API. Default is Nucleus’s current production API.

append_to_slice(slice_id, reference_ids, dataset_id)#

Appends dataset items or scenes to an existing slice.

Parameters
  • slice_id (str) – Nucleus-generated slice ID (starts with slc_). This can be retrieved via Dataset.slices() or a Nucleus dashboard URL.

  • reference_ids (List[str]) – List of user-defined reference IDs of dataset items or scenes to append to the slice.

  • dataset_id (str) – ID of dataset this slice belongs to.

Returns

Empty payload response.

Return type

dict

create_dataset(name, is_scene=None, item_metadata_schema=None, annotation_metadata_schema=None)#

Creates a new, empty dataset.

Make sure that the dataset is created for the data type you would like to support. Be sure to set the is_scene parameter correctly.

Parameters
  • name (str) – A human-readable name for the dataset.

  • is_scene (Optional[bool]) – Whether the dataset contains strictly scenes or items. This value is immutable. Default is False (dataset of items).

  • item_metadata_schema (Optional[Dict]) – Dict defining item-level metadata schema. See below.

  • annotation_metadata_schema (Optional[Dict]) –

    Dict defining annotation-level metadata schema.

    Metadata schemas must be structured as follows:

    {
        "field_name": {
            "type": "category" | "number" | "text" | "json"
            "choices": List[str] | None
            "description": str | None
        },
        ...
    }
    

Returns

The newly created Nucleus dataset as an object.

Return type

Dataset
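
A sketch of a metadata schema that follows the structure above; the field names and choices are illustrative, not prescribed by the API:

```python
# Illustrative metadata schema following the documented structure.
# Field names and choices are hypothetical.
annotation_metadata_schema = {
    "occluded": {
        "type": "category",
        "choices": ["none", "partial", "full"],
        "description": "Degree of occlusion",
    },
    "speed_kph": {
        "type": "number",
        "choices": None,
        "description": None,
    },
}

# Every field's type must be one of the documented options.
valid_types = {"category", "number", "text", "json"}
assert all(f["type"] in valid_types for f in annotation_metadata_schema.values())
```

Such a dict would then be passed as the annotation_metadata_schema (or item_metadata_schema) argument to create_dataset.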

create_dataset_from_project(project_id, last_n_tasks=None, name=None)#

Create a new dataset from an existing Scale or Rapid project.

If you already have Annotation, SegmentAnnotation, VideoAnnotation, Categorization, PolygonAnnotation, ImageAnnotation, DocumentTranscription, LidarLinking, LidarAnnotation, or VideoboxAnnotation projects with Scale, use this endpoint to import your project directly into Nucleus.

This endpoint is asynchronous because there can be delays when the number of tasks is larger than 1000. As a result, the endpoint returns an instance of AsyncJob.

Parameters
  • project_id (str) – The ID of the Scale/Rapid project (retrievable from URL).

  • last_n_tasks (int) – If supplied, only pull in this number of the most recent tasks. By default the endpoint will pull in all eligible tasks.

  • name (str) – The name for your new Nucleus dataset. By default the endpoint will use the project’s name.

Returns

The newly created Nucleus dataset as an object.

Return type

Dataset

create_launch_model(name, reference_id, bundle_args, metadata=None)#

Adds a Model to Nucleus, as well as a Launch bundle from a given function.

Parameters
  • name (str) – A human-readable name for the model.

  • reference_id (str) – Unique, user-controlled ID for the model. This can be used, for example, to link to an external storage of models which may have its own id scheme.

  • bundle_args (Dict[str, Any]) – Dict for kwargs for the creation of a Launch bundle, more details on the keys below.

  • metadata (Optional[Dict]) – An arbitrary dictionary of additional data about this model that can be stored and retrieved. For example, you can store information about the hyperparameters used in training this model.

Returns

The newly created model as an object.

Return type

Model

Details on bundle_args:

Grabs an s3 signed url and uploads a model bundle to Scale Launch.

A model bundle consists of exactly {predict_fn_or_cls}, {load_predict_fn + model}, or {load_predict_fn + load_model_fn}. Pre/post-processing code can be included inside load_predict_fn/model or in predict_fn_or_cls call.

Parameters
  • model_bundle_name – Name of model bundle you want to create. This acts as a unique identifier.

  • predict_fn_or_cls – Function or a Callable class that runs end-to-end (pre/post processing and model inference) on the call. I.e. predict_fn_or_cls(REQUEST) -> RESPONSE.

  • model – Typically a trained Neural Network, e.g. a Pytorch module

  • load_predict_fn – Function that when called with model, returns a function that carries out inference I.e. load_predict_fn(model) -> func; func(REQUEST) -> RESPONSE

  • load_model_fn – Function that when run, loads a model, e.g. a Pytorch module I.e. load_predict_fn(load_model_fn()) -> func; func(REQUEST) -> RESPONSE

  • bundle_url – Only for self-hosted mode. Desired location of bundle. Overrides any value given by self.bundle_location_fn.

  • requirements – A list of python package requirements, e.g. [“tensorflow==2.3.0”, “tensorflow-hub==0.11.0”]. If no list has been passed, will default to the currently imported list of packages.

  • app_config – Either a Dictionary that represents a YAML file contents or a local path to a YAML file.

  • env_params – A dictionary that dictates environment information, e.g. the use of pytorch or tensorflow and which cuda/cudnn versions to use. Specifically, the dictionary should contain the following keys: “framework_type”: either “tensorflow” or “pytorch”; “pytorch_version”: version of pytorch, e.g. “1.5.1”, “1.7.0”, etc. (only applicable if framework_type is pytorch); “cuda_version”: version of cuda used, e.g. “11.0”; “cudnn_version”: version of cudnn used, e.g. “cudnn8-devel”; “tensorflow_version”: version of tensorflow, e.g. “2.3.0” (only applicable if framework_type is tensorflow).

  • globals_copy – Dictionary of the global symbol table. Normally provided by globals() built-in function.

  • name (str) –

  • reference_id (str) –

  • bundle_args (Dict[str, Any]) –

  • metadata (Optional[Dict]) –

Return type

model.Model
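
Of the bundle_args keys, env_params is the easiest to get wrong; a sketch of the two shapes implied by the key descriptions above (version strings are examples only):

```python
# Illustrative env_params dicts for the two supported framework_types,
# following the key descriptions above. Version strings are examples only.
env_params_pytorch = {
    "framework_type": "pytorch",
    "pytorch_version": "1.7.0",
    "cuda_version": "11.0",
    "cudnn_version": "cudnn8-devel",
}
env_params_tensorflow = {
    "framework_type": "tensorflow",
    "tensorflow_version": "2.3.0",
}

for params in (env_params_pytorch, env_params_tensorflow):
    assert params["framework_type"] in ("pytorch", "tensorflow")
```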

create_launch_model_from_dir(name, reference_id, bundle_from_dir_args, metadata=None)#

Adds a Model to Nucleus, as well as a Launch bundle from a directory.

Parameters
  • name (str) – A human-readable name for the model.

  • reference_id (str) – Unique, user-controlled ID for the model. This can be used, for example, to link to an external storage of models which may have its own id scheme.

  • bundle_from_dir_args (Dict[str, Any]) – Dict for kwargs for the creation of a bundle from directory, more details on the keys below.

  • metadata (Optional[Dict]) – An arbitrary dictionary of additional data about this model that can be stored and retrieved. For example, you can store information about the hyperparameters used in training this model.

Returns

The newly created model as an object.

Return type

Model

Details on bundle_from_dir_args: Packages up code from one or more local filesystem folders and uploads them as a bundle to Scale Launch. In this mode, a bundle is just local code instead of a serialized object.

For example, if you have a directory structure like so, and your current working directory is also my_root:

my_root/
    my_module1/
        __init__.py
        ...files and directories
        my_inference_file.py
    my_module2/
        __init__.py
        ...files and directories

then calling create_model_bundle_from_dirs with base_paths=[“my_module1”, “my_module2”] essentially creates a zip file without the root directory, e.g.:

my_module1/
    __init__.py
    ...files and directories
    my_inference_file.py
my_module2/
    __init__.py
    ...files and directories

and these contents will be unzipped relative to the server side PYTHONPATH. Bear these points in mind when referencing Python module paths for this bundle. For instance, if my_inference_file.py has def f(…) as the desired inference loading function, then the load_predict_fn_module_path argument should be my_module1.my_inference_file.f.

Keys for bundle_from_dir_args:

  • model_bundle_name: Name of model bundle you want to create. This acts as a unique identifier.

  • base_paths: The paths on the local filesystem where the bundle code lives.

  • requirements_path: A path on the local filesystem where a requirements.txt file lives.

  • env_params: A dictionary that dictates environment information, e.g. the use of pytorch or tensorflow and which cuda/cudnn versions to use. Specifically, the dictionary should contain the following keys: “framework_type”: either “tensorflow” or “pytorch”; “pytorch_version”: version of pytorch, e.g. “1.5.1”, “1.7.0”, etc. (only applicable if framework_type is pytorch); “cuda_version”: version of cuda used, e.g. “11.0”; “cudnn_version”: version of cudnn used, e.g. “cudnn8-devel”; “tensorflow_version”: version of tensorflow, e.g. “2.3.0” (only applicable if framework_type is tensorflow).

  • load_predict_fn_module_path: A python module path for a function that, when called with the output of load_model_fn_module_path, returns a function that carries out inference.

  • load_model_fn_module_path: A python module path for a function that returns a model. The output feeds into the function located at load_predict_fn_module_path.

  • app_config: Either a Dictionary that represents a YAML file contents or a local path to a YAML file.
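
Putting the keys above together, a hypothetical bundle_from_dir_args for the my_root layout discussed earlier:

```python
# Hypothetical bundle_from_dir_args mirroring the keys documented above.
# All names, paths, and module paths are illustrative.
bundle_from_dir_args = {
    "model_bundle_name": "my-bundle-v1",
    "base_paths": ["my_module1", "my_module2"],
    "requirements_path": "requirements.txt",
    "env_params": {
        "framework_type": "pytorch",
        "pytorch_version": "1.7.0",
        "cuda_version": "11.0",
        "cudnn_version": "cudnn8-devel",
    },
    # Module path to the inference-loading function inside the bundle.
    "load_predict_fn_module_path": "my_module1.my_inference_file.f",
}

assert {"model_bundle_name", "base_paths", "env_params"} <= set(bundle_from_dir_args)
```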

create_model(name, reference_id, metadata=None, bundle_name=None, tags=None)#

Adds a Model to Nucleus.

Parameters
  • name (str) – A human-readable name for the model.

  • reference_id (str) – Unique, user-controlled ID for the model. This can be used, for example, to link to an external storage of models which may have its own id scheme.

  • metadata (Optional[Dict]) – An arbitrary dictionary of additional data about this model that can be stored and retrieved. For example, you can store information about the hyperparameters used in training this model.

  • bundle_name (Optional[str]) – Optional name of bundle attached to this model

  • tags (Optional[List[str]]) – Optional list of tags to attach to this model

Returns

The newly created model as an object.

Return type

Model

delete_autotag(autotag_id)#

Deletes an autotag by ID.

Parameters

autotag_id (str) – Nucleus-generated autotag ID (starts with tag_). This can be retrieved via list_autotags() or a Nucleus dashboard URL.

Returns

Empty payload response.

Return type

dict

delete_dataset(dataset_id)#

Deletes a dataset by ID.

All items, annotations, and predictions associated with the dataset will be deleted as well. Note that if this dataset is linked to a Scale or Rapid labeling project, the project itself will not be deleted.

Parameters

dataset_id (str) – The ID of the dataset to delete.

Returns

Payload to indicate deletion invocation.

Return type

dict

delete_model(model_id)#

Deletes a model by ID.

Parameters

model_id (str) – Nucleus-generated model ID (starts with prj_). This can be retrieved via list_models() or a Nucleus dashboard URL.

Returns

Empty payload response.

Return type

dict

delete_slice(slice_id)#

Deletes slice from Nucleus.

Parameters

slice_id (str) – Nucleus-generated slice ID (starts with slc_). This can be retrieved via Dataset.slices() or a Nucleus dashboard URL.

Returns

Empty payload response.

Return type

dict

get_autotag_refinement_metrics(autotag_id)#

Retrieves refinement metrics for an autotag by ID.

Parameters

autotag_id (str) – Nucleus-generated autotag ID (starts with tag_). This can be retrieved via list_autotags() or a Nucleus dashboard URL.

Returns

Response payload:

{
    "total_refinement_steps": int
    "average_positives_selected_per_refinement": int
    "average_ms_taken_in_refinement": float
}

Return type

dict

get_dataset(dataset_id)#

Fetches a dataset by its ID.

Parameters

dataset_id (str) – The ID of the dataset to fetch.

Returns

The Nucleus dataset as an object.

Return type

Dataset

get_job(job_id)#

Fetches an asynchronous job by its ID.

Parameters

job_id (str) – The ID of the job to fetch.

Returns

The Nucleus async job as an object.

Return type

AsyncJob

get_model(model_id=None, model_run_id=None)#

Fetches a model by its ID.

Parameters
  • model_id (str) – You can pass either a model ID (starts with prj_) or a model run ID (starts with run_) This can be retrieved via list_models() or a Nucleus dashboard URL. Model run IDs result from the application of a model to a dataset.

  • model_run_id (str) –

    You can pass either a model ID (starts with prj_), or a model run ID (starts with run_) This can be retrieved via list_models() or a Nucleus dashboard URL. Model run IDs result from the application of a model to a dataset.

    In the future, we plan to hide model_run_ids fully from users.

Returns

The Nucleus model as an object.

Return type

Model

get_slice(slice_id)#

Returns a slice object by Nucleus-generated ID.

Parameters

slice_id (str) – Nucleus-generated slice ID (starts with slc_). This can be retrieved via Dataset.slices() or a Nucleus dashboard URL.

Returns

The Nucleus slice as an object.

Return type

Slice

list_jobs(show_completed=False, from_date=None, to_date=None, job_types=None, limit=None, dataset_id=None, date_limit=None)#

Fetches all of your running jobs in Nucleus.

Parameters
  • job_types (Optional[List[job.CustomerJobTypes]]) – Filter on set of job types, if None, fetch all types

  • from_date (Optional[Union[str, datetime.datetime]]) – beginning of date range filter

  • to_date (Optional[Union[str, datetime.datetime]]) – end of date range filter

  • limit (Optional[int]) – number of results to fetch, max 50_000

  • show_completed (bool) – whether to include jobs with Completed status. Default is False.

  • stats_only – return overview of jobs, instead of a list of job objects

  • dataset_id (Optional[str]) – filter on a particular dataset

  • date_limit (Optional[str]) – Deprecated, do not use.

Returns

List of running asynchronous jobs associated with the client API key.

Return type

List[async_job.AsyncJob]

make_request(payload, route, requests_command=requests.post, return_raw_response=False)#

Makes a request to a Nucleus API endpoint.

Logs a warning if not successful.

Parameters
  • payload (Optional[dict]) – Given request payload.

  • route (str) – Route for the request.

  • requests_command – requests.post, requests.get, or requests.delete.

  • return_raw_response (bool) – return the request’s response object entirely

Returns

Response payload as JSON dict or request object.

Return type

Union[dict, Any]

class nucleus.Point#

A point in 2D space.

Parameters
  • x (float) – The x coordinate of the point.

  • y (float) – The y coordinate of the point.

class nucleus.Point3D#

A point in 3D space.

Parameters
  • x (float) – The x coordinate of the point.

  • y (float) – The y coordinate of the point.

  • z (float) – The z coordinate of the point.

class nucleus.PolygonAnnotation#

A polygon annotation consisting of an ordered list of 2D points.

from nucleus import PolygonAnnotation

polygon = PolygonAnnotation(
    label="bus",
    vertices=[Point(100, 100), Point(150, 200), Point(200, 100)],
    reference_id="image_2",
    annotation_id="image_2_bus_polygon_1",
    metadata={"vehicle_color": "yellow"},
    embedding_vector=[0.1423, 1.432, ..., 3.829],
    track_reference_id="school_bus",
)
Parameters
  • label (str) – The label for this annotation.

  • vertices (List[Point]) – The list of points making up the polygon.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • embedding_vector – Custom embedding vector for this object annotation. If any custom object embeddings have been uploaded previously to this dataset, this vector must match the dimensions of the previously ingested vectors.

  • track_reference_id – A unique string to identify the annotation as part of a group. For instance, multiple “car” annotations across several dataset items may have the same track_reference_id such as “red_car”.

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.)

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.PolygonPrediction(label, vertices, reference_id, confidence=None, annotation_id=None, metadata=None, class_pdf=None, embedding_vector=None, track_reference_id=None)#

Prediction of a polygon.

Parameters
  • label (str) – The label for this annotation (e.g. car, pedestrian, bicycle).

  • vertices (List[Point]) – The list of points making up the polygon.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • confidence (Optional[float]) – 0-1 indicating the confidence of the prediction.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats, and ints are best supported by the querying and insights features within Nucleus. For more details see our metadata guide.

  • class_pdf (Optional[Dict]) – An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and the values should sum to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain.

  • embedding_vector (Optional[list]) – Custom embedding vector for this object annotation. If any custom object embeddings have been uploaded previously to this dataset, this vector must match the dimensions of the previously ingested vectors.

  • track_reference_id (Optional[str]) – A unique string to identify the prediction as part of a group. For instance, multiple “car” predictions across several dataset items may have the same track_reference_id such as “red_car”.
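The class_pdf parameter above suggests computing entropy to surface places where the model is uncertain. A minimal sketch in plain Python (the entropy helper and the label names are illustrative, not part of the SDK):

```python
import math

# Shannon entropy of a {label: probability} distribution such as class_pdf.
# A flatter distribution yields higher entropy, i.e. a less confident model.
def entropy(class_pdf):
    return -sum(p * math.log(p) for p in class_pdf.values() if p > 0)

confident = {"car": 0.98, "truck": 0.02}
uncertain = {"car": 0.50, "truck": 0.50}
assert entropy(uncertain) > entropy(confident)
```

Sorting predictions by this value is one way to pick which items to review first.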

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so by default this returns False. Subclasses with local files should override this method (though overriding it alone is not sufficient to enable local file upload).

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.Quaternion#

Quaternion objects are used to represent rotation.

We use the Hamilton/right-handed quaternion convention, where

i^2 = j^2 = k^2 = ijk = -1

The quaternion represented by the tuple (x, y, z, w) is equal to w + x*i + y*j + z*k.

Parameters
  • x (float) – The x value.

  • y (float) – The y value.

  • z (float) – The z value.

  • w (float) – The w value.
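The (x, y, z, w) = w + x*i + y*j + z*k convention above can be checked with a small, self-contained Hamilton product (plain Python, not the SDK's Quaternion API):

```python
# Hamilton product of quaternions stored as (x, y, z, w) tuples,
# where the tuple represents w + x*i + y*j + z*k.
def qmul(a, b):
    x1, y1, z1, w1 = a
    x2, y2, z2, w2 = b
    return (
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,  # i component
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,  # j component
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,  # k component
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,  # real (w) component
    )

i, j, k = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)
assert qmul(i, i) == (0, 0, 0, -1)  # i^2 = -1
assert qmul(i, j) == k              # ij = k, the right-handed convention
```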

classmethod from_json(payload)#

Instantiates quaternion object from schematized JSON dict payload.

Parameters

payload (Dict[str, float]) –

to_payload()#

Serializes quaternion object to schematized JSON dict.

Return type

dict

class nucleus.SceneCategoryAnnotation#

A scene category annotation.

from nucleus import SceneCategoryAnnotation

category = SceneCategoryAnnotation(
    label="running",
    reference_id="scene_1",
    taxonomy_name="action",
    metadata={
        "weather": "clear",
    },
)
Parameters
  • label (str) – The label for this annotation.

  • reference_id (str) – User-defined ID of the scene to which to apply this annotation.

  • taxonomy_name (Optional[str]) – The name of the taxonomy this annotation conforms to. See Dataset.add_taxonomy().

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats, and ints are best supported by the querying and insights features within Nucleus. For more details see our metadata guide.

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so by default this returns False. Subclasses with local files should override this method (though overriding it alone is not sufficient to enable local file upload).

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.SceneCategoryPrediction(label, reference_id, taxonomy_name=None, confidence=None, metadata=None)#

A prediction of a category for a scene.

from nucleus import SceneCategoryPrediction

category = SceneCategoryPrediction(
    label="running",
    reference_id="scene_1",
    taxonomy_name="action",
    confidence=0.83,
    metadata={
        "weather": "clear",
    },
)
Parameters
  • label (str) – The label for this annotation (e.g. action, subject, scenario).

  • reference_id (str) – The reference ID of the scene you wish to apply this annotation to.

  • taxonomy_name (Optional[str]) – The name of the taxonomy this annotation conforms to. See Dataset.add_taxonomy().

  • confidence (Optional[float]) – 0-1 indicating the confidence of the prediction.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats, and ints are best supported by the querying and insights features within Nucleus. For more details see our metadata guide.

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so by default this returns False. Subclasses with local files should override this method (though overriding it alone is not sufficient to enable local file upload).

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.Segment#

Segment represents either a class or an instance depending on the task type.

For semantic segmentation, this object should store the mapping between a single class index and the string label.

For instance segmentation, you can use this class to store the label of a single instance, whose extent in the image is represented by the value of index.

In both cases, additional metadata can be attached to the segment.

Parameters
  • label (str) – The label name of the class for the class or instance represented by index in the associated mask.

  • index (int) – The integer pixel value in the mask this mapping refers to.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this segment. Strings, floats, and ints are best supported by the querying and insights features within Nucleus. For more details see our metadata guide.

class nucleus.SegmentationAnnotation#

A segmentation mask on a 2D image.

When uploading a mask annotation, Nucleus expects the mask file to be in PNG format with each pixel being a 0-255 uint8. Currently, Nucleus only supports uploading masks from URL.

Nucleus automatically enforces the constraint that each DatasetItem can have at most one ground truth segmentation mask. As a consequence, if during upload a duplicate mask is detected for a given image, by default it will be ignored. You can change this behavior by setting update = True, which will replace the existing segmentation mask with the new mask.

from nucleus import SegmentationAnnotation

segmentation = SegmentationAnnotation(
    mask_url="s3://your-bucket-name/segmentation-masks/image_2_mask_id_1.png",
    annotations=[
        Segment(label="grass", index=1),
        Segment(label="road", index=2),
        Segment(label="bus", index=3, metadata={"vehicle_color": "yellow"}),
        Segment(label="tree", index=4)
    ],
    reference_id="image_2",
    annotation_id="image_2_mask_1",
)
Parameters
  • mask_url (str) –

    A URL pointing to the segmentation mask, which must be accessible to Scale. This “URL” can also be a path to a local file. The mask is an HxW uint8 array saved in PNG format, with each pixel value ranging from [0, N), where N is the number of possible classes (for semantic segmentation) or instances (for instance segmentation).

    The height and width of the mask must be the same as the original image. One example for semantic segmentation: the mask is 0 for pixels where there is background, 1 where there is a car, and 2 where there is a pedestrian.

    Another example for instance segmentation: the mask is 0 for one car, 1 for another car, 2 for a motorcycle, and 3 for another motorcycle. The class name for each value in the mask is stored in the list of Segment objects passed as “annotations”.

  • annotations (List[Segment]) – The list of mappings between the integer values contained in mask_url and string class labels. In the semantic segmentation example above, these would map 0 to background, 1 to car, and 2 to pedestrian. In the instance segmentation example above, 0 and 1 would both map to car, and 2 and 3 would both map to motorcycle.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – For segmentation annotations, this value is ignored because there can only be one segmentation annotation per dataset item. Therefore regardless of annotation ID, if there is an existing segmentation on a dataset item, it will be ignored unless update=True is passed to Dataset.annotate(), in which case it will be overwritten. Storing a custom ID here may be useful in order to tie this annotation to an external database, and its value will be returned for any export.
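A hypothetical pre-upload sanity check (plain Python, not SDK code): every pixel value used in a mask should have a matching Segment-style index-to-label mapping, mirroring the annotations list described above. The mask and mapping here are purely illustrative:

```python
# A tiny stand-in for an HxW mask array; real masks are uint8 PNGs.
mask = [
    [1, 1, 2],
    [1, 3, 2],
    [4, 4, 2],
]
# Stand-in for the Segment list: pixel value -> class label.
index_to_label = {1: "grass", 2: "road", 3: "bus", 4: "tree"}

# Collect every pixel value actually present and flag unmapped ones.
used = {px for row in mask for px in row}
missing = used - set(index_to_label)
assert not missing, f"mask values without a Segment mapping: {missing}"
```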

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Check if the mask url is local and needs to be uploaded.

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.SegmentationPrediction#

Predicted segmentation mask on a 2D image.

from nucleus import SegmentationPrediction

segmentation = SegmentationPrediction(
    mask_url="s3://your-bucket-name/pred-seg-masks/image_2_pred_mask_id_1.png",
    annotations=[
        Segment(label="grass", index=1),
        Segment(label="road", index=2),
        Segment(label="bus", index=3, metadata={"vehicle_color": "yellow"}),
        Segment(label="tree", index=4)
    ],
    reference_id="image_2",
    annotation_id="image_2_pred_mask_1",
)
Parameters
  • mask_url (str) –

    A URL pointing to the segmentation prediction mask, which must be accessible to Scale. This “URL” can also be a path to a local file. The mask is an HxW uint8 array saved in PNG format, with each pixel value ranging from [0, N), where N is the number of possible classes (for semantic segmentation) or instances (for instance segmentation).

    The height and width of the mask must be the same as the original image. One example for semantic segmentation: the mask is 0 for pixels where there is background, 1 where there is a car, and 2 where there is a pedestrian.

    Another example for instance segmentation: the mask is 0 for one car, 1 for another car, 2 for a motorcycle, and 3 for another motorcycle. The class name for each value in the mask is stored in the list of Segment objects passed as “annotations”.

  • annotations (List[Segment]) – The list of mappings between the integer values contained in mask_url and string class labels. In the semantic segmentation example above, these would map 0 to background, 1 to car, and 2 to pedestrian. In the instance segmentation example above, 0 and 1 would both map to car, and 2 and 3 would both map to motorcycle.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – For segmentation predictions, this value is ignored because there can only be one segmentation prediction per dataset item. Therefore regardless of annotation ID, if there is an existing segmentation on a dataset item, it will be ignored unless update=True is passed to Dataset.annotate(), in which case it will be overwritten. Storing a custom ID here may be useful in order to tie this annotation to an external database, and its value will be returned for any export.

classmethod from_json(payload)#

Instantiates annotation object from schematized JSON dict payload.

Parameters

payload (dict) –

has_local_files_to_upload()#

Check if the mask url is local and needs to be uploaded.

Return type

bool

to_json()#

Serializes annotation object to schematized JSON string.

Return type

str

to_payload()#

Serializes annotation object to schematized JSON dict.

Return type

dict

class nucleus.Slice(slice_id, client)#

A Slice represents a subset of DatasetItems in your Dataset.

Slices are subsets of your Dataset that unlock curation and exploration workflows. Instead of thinking of your Datasets as collections of data, it is useful to think about them as a collection of Slices. For instance, your dataset may contain different weather scenarios, traffic conditions, or highway types.

Perhaps your Models perform poorly on foggy weather scenarios; it is then useful to slice your dataset into a “foggy” slice, and fine-tune model performance on this slice until it reaches the performance you desire.

Slices cannot be instantiated directly and instead must be created in the dashboard, or via API endpoint using Dataset.create_slice().

import nucleus

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset("YOUR_DATASET_ID")

ref_ids = ["interesting_item_1", "interesting_item_2"]
slice = dataset.create_slice(name="interesting", reference_ids=ref_ids)
Parameters
  • slice_id (str) –

  • client (NucleusClient) –

append(reference_ids=None)#

Appends existing DatasetItems from a Dataset to a Slice.

The endpoint expects a list of DatasetItem reference IDs which are set at upload time.

Parameters

reference_ids (List[str]) – List of user-defined reference IDs of dataset items or scenes to append to the slice.

Returns

Dict of the slice_id and the newly appended IDs.

{
    "slice_id": str,
    "new_items": List[str]
}

Return type

dict

dataset_items()#

Fetch all DatasetItems contained in the Slice.

We recommend using Slice.items_generator() if the Slice has more than 200k items.

Returns: list of DatasetItem objects

export_embeddings()#

Fetches a pd.DataFrame-ready list of slice embeddings.

Returns

A list where each element is a columnar mapping:

List[{
    "reference_id": str,
    "embedding_vector": List[float]
}]

Return type

List[Dict[str, Union[str, List[float]]]]

export_predictions(model)#

Provides a list of all DatasetItems and Predictions in the Slice for the given Model.

Parameters

model (Model) – The Model object representing the model for which to export predictions.

Returns

List where each element is a dict containing the DatasetItem and all of its associated Predictions, grouped by type (e.g. box).

List[{
    "item": DatasetItem,
    "predictions": {
        "box": List[BoxAnnotation],
        "polygon": List[PolygonAnnotation],
        "cuboid": List[CuboidAnnotation],
        "segmentation": List[SegmentationAnnotation],
        "category": List[CategoryAnnotation],
    }
}]

Return type

List[Dict[str, Union[nucleus.dataset_item.DatasetItem, Dict[str, List[nucleus.annotation.Annotation]]]]]
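The grouped-by-type shape shown above can be sketched with plain dicts standing in for DatasetItem and Prediction objects (all names here are illustrative, not SDK objects):

```python
from collections import defaultdict

# Fake flat prediction records, one per predicted object.
preds = [
    {"reference_id": "image_1", "type": "box"},
    {"reference_id": "image_1", "type": "polygon"},
    {"reference_id": "image_2", "type": "box"},
]

# Bucket predictions per item, then per geometry type, mirroring the
# {"item": ..., "predictions": {"box": [...], ...}} structure above.
grouped = defaultdict(lambda: defaultdict(list))
for p in preds:
    grouped[p["reference_id"]][p["type"]].append(p)

assert len(grouped["image_1"]["box"]) == 1
assert len(grouped["image_1"]["polygon"]) == 1
```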

export_predictions_generator(model)#

Provides a generator of all DatasetItems and Predictions in the Slice for the given Model.

Parameters

model (Model) – The Model object representing the model for which to export predictions.

Returns

Iterable where each element is a dict containing the DatasetItem and all of its associated Predictions, grouped by type (e.g. box).

Iterable[{
    "item": DatasetItem,
    "predictions": {
        "box": List[BoxAnnotation],
        "polygon": List[PolygonAnnotation],
        "cuboid": List[CuboidAnnotation],
        "segmentation": List[SegmentationAnnotation],
        "category": List[CategoryAnnotation],
    }
}]

Return type

Iterable[Dict[str, Union[nucleus.dataset_item.DatasetItem, Dict[str, List[nucleus.annotation.Annotation]]]]]

export_raw_items()#

Fetches a list of accessible URLs for each item in the Slice.

Returns

List where each element is a dict containing a DatasetItem and its accessible (signed) Scale URL.

List[{
    "id": str,
    "ref_id": str,
    "metadata": Dict[str, Union[str, int]],
    "original_url": str,
    "scale_url": str
}]

Return type

List[Dict[str, str]]

export_scale_task_info()#

Fetches info for all linked Scale tasks of items/scenes in the slice.

Returns

A list of dicts, each with two keys, respectively mapping to items/scenes and info on their corresponding Scale tasks within the dataset:

List[{
    "item" | "scene": Union[DatasetItem, Scene],
    "scale_task_info": {
        "task_id": str,
        "task_status": str,
        "task_audit_status": str,
        "task_audit_review_comment": Optional[str],
        "project_name": str,
        "batch": str,
        "created_at": str,
        "completed_at": Optional[str]
    }
}]

info()#

Retrieves the name, slice_id, and dataset_id of the Slice.

Returns

A dict mapping keys to the corresponding info retrieved.

{
    "name": Union[str, int],
    "slice_id": str,
    "dataset_id": str,
}

Return type

dict

items_and_annotation_generator()#

Provides a generator of all DatasetItems and Annotations in the slice.

Returns

Generator where each element is a dict containing the DatasetItem and all of its associated Annotations, grouped by type (e.g. box).

Iterable[{
    "item": DatasetItem,
    "annotations": {
        "box": List[BoxAnnotation],
        "polygon": List[PolygonAnnotation],
        "cuboid": List[CuboidAnnotation],
        "line": List[LineAnnotation],
        "segmentation": List[SegmentationAnnotation],
        "category": List[CategoryAnnotation],
        "keypoints": List[KeypointsAnnotation],
    }
}]

Return type

Iterable[Dict[str, Union[nucleus.dataset_item.DatasetItem, Dict[str, List[nucleus.annotation.Annotation]]]]]

items_and_annotations()#

Provides a list of all DatasetItems and Annotations in the Slice.

Returns

List where each element is a dict containing the DatasetItem and all of its associated Annotations, grouped by type (e.g. box).

List[{
    "item": DatasetItem,
    "annotations": {
        "box": List[BoxAnnotation],
        "polygon": List[PolygonAnnotation],
        "cuboid": List[CuboidAnnotation],
        "line": List[LineAnnotation],
        "segmentation": List[SegmentationAnnotation],
        "category": List[CategoryAnnotation],
        "keypoints": List[KeypointsAnnotation],
    }
}]

Return type

List[Dict[str, Union[nucleus.dataset_item.DatasetItem, Dict[str, List[nucleus.annotation.Annotation]]]]]

items_generator(page_size=100000)#

Generator yielding all DatasetItems in the Slice.

collected_ref_ids = []
for item in slice.items_generator():
    print(f"Exporting item: {item.reference_id}")
    collected_ref_ids.append(item.reference_id)
Parameters

page_size (int, optional) – Number of items to return per page. If you are experiencing timeouts while using this generator, you can try lowering the page size.

Yields

an iterable of DatasetItem objects.
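The page-by-page behavior described above can be sketched as a generic paginated generator (illustrative only, not the SDK internals; the fetch function and data are made up):

```python
# Generic paginated generator: pull pages from a fetch function until the
# backend signals there are no more (token is None).
def paginated(fetch_page, page_size=2):
    token = None
    while True:
        items, token = fetch_page(token, page_size)
        yield from items
        if token is None:
            break

# Fake backend holding 5 items, served in pages; token is just an offset.
DATA = list(range(5))
def fake_fetch(token, page_size):
    start = token or 0
    page = DATA[start:start + page_size]
    nxt = start + page_size if start + page_size < len(DATA) else None
    return page, nxt

assert list(paginated(fake_fetch)) == [0, 1, 2, 3, 4]
```

Lowering page_size, as the docs suggest for timeouts, only changes how many round trips are made, not what is yielded.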

send_to_labeling(project_id)#

Send items in the Slice as tasks to a Scale labeling project.

This endpoint submits the items of the Slice as tasks to a pre-existing Scale Annotation project uniquely identified by project_id. Only projects of type General Image Annotation are currently supported. Additionally, for task submission to succeed, the project must have task instructions and geometries configured as project-level parameters. To create a project or set project parameters, you must use the Scale Annotation API, which is documented here: Scale Annotation API Documentation. When the newly created annotation tasks are annotated, the annotations will be automatically reflected in the Nucleus platform.

For self-serve projects, users can choose to submit the slice as a calibration batch, which is recommended for brand-new labeling projects. For more information about calibration batches, please reference Overview of Self Serve Workflow. Note: a batch can be either a calibration batch or a self label batch, but not both.

Note: Nucleus only supports bounding box, polygon, and line annotations. If the project parameters specify any other geometries (ellipses or points), those objects will be annotated, but they will not be reflected in Nucleus.

Parameters

project_id (str) – Scale-defined ID of the target annotation project.

class nucleus.VideoScene#

Video or sequence of images over time.

Nucleus video datasets are made up of VideoScenes. A scene can consist of a single video, or of a sequence of DatasetItems that are equivalent to frames.

VideoScenes are uploaded to a Dataset with any accompanying metadata. Each DatasetItem representing a frame also accepts metadata.

Note: Updates with different items will error out (only on scenes that now differ). Existing videos are expected to retain the same frames, and only metadata can be updated. If a video definition is changed (for example, additional frames are added), the update operation will be ignored. If you would like to alter the structure of a video scene, please delete the scene and re-upload.

Parameters
  • reference_id (str) – User-specified identifier to reference the scene.

  • frame_rate (Optional[int]) – Required if uploading items. Frame rate of the video.

  • video_location (Optional[str]) – Required if not uploading items. The remote URL containing the video MP4. Supported remote formats include any URL (http:// or https://) or URIs for AWS S3, Azure, or GCS (e.g. s3://, gcs://).

  • items (Optional[List[DatasetItem]]) – Required if not uploading video_location. List of items representing frames, to be a part of the scene. A scene can be created before items have been added to it, but must be non-empty when uploading to a Dataset. A video scene can contain a maximum of 3000 items.

  • metadata (Optional[Dict]) –

    Optional metadata to include with the scene.

    Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as { “lat”: 52.5, “lon”: 13.3, … }.

    Context Attachments may be provided to display the attachments side by side with the dataset item in the Detail View by specifying { “context_attachments”: [ { “attachment”: ‘https://example.com/1’ }, { “attachment”: ‘https://example.com/2’ }, … ] }.

  • upload_to_scale (Optional[bool]) –

    Set this to false in order to use privacy mode. If using privacy mode you must upload both a video_location and items to the VideoScene.

    Setting this to false means the actual data within the video scene will not be uploaded to Scale, meaning that you can send in links that are accessible only to certain users, and not to Scale.

Refer to our guide to uploading video data for more info!

add_item(item, index=None, update=False)#

Adds DatasetItem to the specified index for videos uploaded as an array of images.

Parameters
  • item (DatasetItem) – Video item to add.

  • index (int) – Serial index at which to add the item.

  • update (bool) – Whether to overwrite the item at the specified index, if it exists. Default is False.

Return type

None

classmethod from_json(payload, client=None)#

Instantiates scene object from schematized JSON dict payload.

Parameters
  • payload (dict) –

  • client (Optional[NucleusClient]) –

get_item(index)#

Fetches the DatasetItem at the specified index for videos uploaded as an array of images.

Parameters

index (int) – Serial index for which to retrieve the DatasetItem.

Returns

DatasetItem at the specified index.

Return type

DatasetItem

get_items()#

Fetches a sorted list of DatasetItems of the scene for videos uploaded as an array of images.

Returns

List of DatasetItems, sorted by index ascending.

Return type

List[DatasetItem]
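The index/update semantics of add_item and the sorted ordering of get_items can be sketched with a plain dict standing in for the scene (hypothetical, not SDK internals; how the real SDK handles an occupied index may differ):

```python
# Stand-in for the scene's frame storage: index -> item.
frames = {}

def add_item(item, index, update=False):
    # Keep the existing frame at this index unless update=True.
    if index in frames and not update:
        return
    frames[index] = item

def get_items():
    # Mirrors get_items(): frames sorted by index, ascending.
    return [frames[i] for i in sorted(frames)]

add_item("frame_b", 1)
add_item("frame_a", 0)
add_item("frame_dup", 0)              # ignored: index 0 exists, update=False
add_item("frame_a2", 0, update=True)  # overwrites index 0
assert get_items() == ["frame_a2", "frame_b"]
```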

info()#

Fetches information about the video scene.

Returns

Payload containing:

{
    "reference_id": str,
    "length": Optional[int],
    "frame_rate": int,
    "video_url": Optional[str],
}

to_json()#

Serializes scene object to schematized JSON string.

Return type

str

to_payload()#

Serializes scene object to schematized JSON dict.

Return type

dict