nucleus

Nucleus Python SDK.

AsyncJob

Object used to check the status or errors of a long running asynchronous operation.

BoxAnnotation

A bounding box annotation.

BoxPrediction

Prediction of a bounding box.

CameraParams

Camera position/heading used to record the image.

CategoryAnnotation

A category annotation.

CategoryPrediction

A prediction of a category.

CuboidAnnotation

A 3D Cuboid annotation.

CuboidPrediction

A prediction of a 3D cuboid.

Dataset

Datasets are collections of your data that can be associated with models.

DatasetInfo

High-level Dataset information.

DatasetItem

A dataset item is an image or pointcloud that has associated metadata.

EmbeddingsExportJob

Object used to check the status or errors of a long running asynchronous operation.

Frame

Collection of sensor data pertaining to a single time step.

Keypoint

A 2D point that has an additional visibility flag.

KeypointsAnnotation

A keypoints annotation containing a list of keypoints and the structure

KeypointsPrediction

Prediction of keypoints.

LidarPoint

A Lidar point in 3D space and intensity.

LidarScene

Sequence of lidar pointcloud and camera images over time.

LineAnnotation

A polyline annotation consisting of an ordered list of 2D points.

LinePrediction

Prediction of a line.

Model

A model that can be used to upload predictions to a dataset.

NucleusClient

Client to interact with the Nucleus API via Python SDK.

Point

A point in 2D space.

Point3D

A point in 3D space.

PolygonAnnotation

A polygon annotation consisting of an ordered list of 2D points.

PolygonPrediction

Prediction of a polygon.

Quaternion

Quaternion objects are used to represent rotation.

SceneCategoryAnnotation

A scene category annotation.

SceneCategoryPrediction

A prediction of a category for a scene.

Segment

Segment represents either a class or an instance depending on the task type.

SegmentationAnnotation

A segmentation mask on a 2D image.

SegmentationPrediction

Predicted segmentation mask on a 2D image.

Slice

A Slice represents a subset of DatasetItems in your Dataset.

VideoScene

Video or sequence of images over time.

class nucleus.AsyncJob

Object used to check the status or errors of a long running asynchronous operation.

import nucleus

client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)
dataset = client.get_dataset("ds_bwkezj6g5c4g05gqp1eg")

# When kicking off an asynchronous job, store the return value as a variable
job = dataset.append(items=YOUR_DATASET_ITEMS, asynchronous=True)

# Poll for status or errors
print(job.status())
print(job.errors())

# Block until job finishes
job.sleep_until_complete()
errors()

Fetches a list of the latest errors generated by the asynchronous job.

Useful for debugging failed or partially successful jobs.

Returns:

A list of strings containing the 10,000 most recently generated errors.

[
    '{"annotation":{"label":"car","type":"box","geometry":{"x":50,"y":60,"width":70,"height":80},"referenceId":"bad_ref_id","annotationId":"attempted_annot_upload","metadata":{}},"error":"Item with id bad_ref_id does not exist."}'
]

Return type:

List[str]
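Each entry in the example above is a JSON-encoded string, so a long error list can be summarized by decoding it. The following is a minimal sketch, assuming every entry decodes to an object with an "error" key as in the sample; group_errors is a hypothetical helper, not part of the SDK.

```python
import json
from collections import Counter
from typing import List


def group_errors(errors: List[str]) -> Counter:
    """Count error messages; entries that are not JSON objects are counted verbatim."""
    counts: Counter = Counter()
    for raw in errors:
        try:
            decoded = json.loads(raw)
        except json.JSONDecodeError:
            counts[raw] += 1
            continue
        if isinstance(decoded, dict) and "error" in decoded:
            counts[decoded["error"]] += 1
        else:
            counts[raw] += 1
    return counts


# Illustrative input shaped like the strings returned by job.errors()
sample = [
    '{"annotation": {"referenceId": "bad_ref_id"}, "error": "Item with id bad_ref_id does not exist."}',
    "plain text error",
]
error_counts = group_errors(sample)
```

In practice the input would be the list returned by job.errors(); the most frequent message is usually the best place to start debugging.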

classmethod from_id(job_id, client)

Creates a job instance from a specific job Id.

Parameters:
  • job_id (str) – Defines the job Id

  • client (NucleusClient) – The client to use for the request.

Returns:

The specific AsyncJob (or inherited) instance.

sleep_until_complete(verbose_std_out=True)

Blocks until the job completes or errors.

Parameters:

verbose_std_out (Optional[bool]) – Whether or not to verbosely log while sleeping. Defaults to True.

status()

Fetches status of the job and an informative message on job progress.

Returns:

A dict of the job ID, status (one of Running, Completed, or Errored), an informative message on the job progress, and number of both completed and total steps.

{
    "job_id": "job_c19xcf9mkws46gah0000",
    "status": "Completed",
    "message": "Job completed successfully.",
    "job_progress": "0.33",
    "completed_steps": "1",
    "total_steps": "3",
}

Return type:

Dict[str, str]
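Because the return type is Dict[str, str], the step counters need converting before any arithmetic. The sketch below assumes they arrive as numeric strings under completed_steps and total_steps; describe_status is a hypothetical helper, not an SDK function.

```python
from typing import Dict


def describe_status(status: Dict[str, str]) -> str:
    """Format a status payload into a single human-readable line."""
    done = int(status.get("completed_steps", "0"))
    total = int(status.get("total_steps", "0"))
    # Guard against a missing or zero step count.
    percent = 100.0 * done / total if total else 0.0
    return f'{status["job_id"]}: {status["status"]} ({done}/{total} steps, {percent:.0f}%)'


# Illustrative payload, not real API output
example = {
    "job_id": "job_c19xcf9mkws46gah0000",
    "status": "Running",
    "completed_steps": "1",
    "total_steps": "3",
}
line = describe_status(example)
```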

class nucleus.BoxAnnotation

A bounding box annotation.

from nucleus import BoxAnnotation

box = BoxAnnotation(
    label="car",
    x=0,
    y=0,
    width=10,
    height=10,
    reference_id="image_1",
    annotation_id="image_1_car_box_1",
    metadata={"vehicle_color": "red"},
    embedding_vector=[0.1423, 1.432, ..., 3.829],
    track_reference_id="car_a",
)
Parameters:
  • label (str) – The label for this annotation.

  • x (Union[float, int]) – The distance, in pixels, between the left border of the bounding box and the left border of the image.

  • y (Union[float, int]) – The distance, in pixels, between the top border of the bounding box and the top border of the image.

  • width (Union[float, int]) – The width in pixels of the annotation.

  • height (Union[float, int]) – The height in pixels of the annotation.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and overwritten if update=True for dataset.annotate. If no annotation ID is passed, one will be automatically generated using the label, x, y, width, and height, so that you can make inserts idempotently as identical boxes will be ignored.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

    Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as { "lat": 52.5, "lon": 13.3, ... }.

  • embedding_vector – Custom embedding vector for this object annotation. If any custom object embeddings have been uploaded previously to this dataset, this vector must match the dimensions of the previously ingested vectors.

  • track_reference_id – A unique string to identify the annotation as part of a group. For instance, multiple “car” annotations across several dataset items may have the same track_reference_id such as “red_car”.

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this method returns False by default. A subclass that has local files should override it, although overriding alone is not sufficient to make local file upload work.

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.BoxPrediction(label, x, y, width, height, reference_id, confidence=None, annotation_id=None, metadata=None, class_pdf=None, embedding_vector=None, track_reference_id=None)

Prediction of a bounding box.

Parameters:
  • label (str) – The label for this annotation (e.g. car, pedestrian, bicycle)

  • x (Union[float, int]) – The distance, in pixels, between the left border of the bounding box and the left border of the image.

  • y (Union[float, int]) – The distance, in pixels, between the top border of the bounding box and the top border of the image.

  • width (Union[float, int]) – The width in pixels of the annotation.

  • height (Union[float, int]) – The height in pixels of the annotation.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • confidence (Optional[float]) – 0-1 indicating the confidence of the prediction.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate. If no annotation ID is passed, one will be automatically generated using the label, x, y, width, and height, so that you can make inserts idempotently and identical boxes will be ignored.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

    Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as { "lat": 52.5, "lon": 13.3, ... }.

  • class_pdf (Optional[Dict]) – An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain.

  • embedding_vector (Optional[List]) – Custom embedding vector for this object annotation. If any custom object embeddings have been uploaded previously to this dataset, this vector must match the dimensions of the previously ingested vectors.

  • track_reference_id (Optional[str]) – A unique string to identify the prediction as part of a group. For instance, multiple “car” predictions across several dataset items may have the same track_reference_id such as “red_car”.
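The class_pdf constraints and the entropy use case described above can be checked locally before upload. This is a minimal sketch in plain Python; validate_class_pdf and entropy are hypothetical helpers, and the tolerance of 1e-6 is an assumption rather than a documented SDK threshold.

```python
import math
from typing import Dict


def validate_class_pdf(class_pdf: Dict[str, float], tol: float = 1e-6) -> None:
    """Raise ValueError unless all values lie in [0, 1] and sum to 1 within tol."""
    for label, p in class_pdf.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {label!r} is outside [0, 1]: {p}")
    total = sum(class_pdf.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"probabilities sum to {total}, expected 1")


def entropy(class_pdf: Dict[str, float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in class_pdf.values() if p > 0)


# Illustrative distributions, not real model output
confident = {"car": 0.98, "bus": 0.01, "truck": 0.01}
uncertain = {"car": 0.34, "bus": 0.33, "truck": 0.33}
for pdf in (confident, uncertain):
    validate_class_pdf(pdf)
```

A prediction whose distribution is close to uniform has higher entropy, which is the signal used to surface uncertain predictions.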

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this method returns False by default. A subclass that has local files should override it, although overriding alone is not sufficient to make local file upload work.

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.CameraParams

Camera position/heading used to record the image.

Parameters:
  • position (Point3D) – World-normalized position of the camera

  • heading (Quaternion) – Vector4 indicating the quaternion of the camera direction; note that the z-axis of the camera frame represents the camera’s optical axis. See Heading Examples.

  • fx (float) – Focal length in x direction (in pixels).

  • fy (float) – Focal length in y direction (in pixels).

  • cx (float) – Principal point x value.

  • cy (float) – Principal point y value.
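To show how fx, fy, cx, and cy relate a 3D point to a pixel, here is a sketch of the standard pinhole projection. It assumes the point is already expressed in the camera frame with z along the optical axis, and it ignores lens distortion; project_to_pixel is a hypothetical helper, not part of the SDK.

```python
from typing import Tuple


def project_to_pixel(
    point_camera: Tuple[float, float, float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Tuple[float, float]:
    """Project a 3D point in the camera frame (z along the optical axis) to pixels."""
    x, y, z = point_camera
    if z <= 0:
        raise ValueError("point is behind or at the camera center")
    # Perspective divide, then scale by focal length and shift by principal point.
    return (fx * x / z + cx, fy * y / z + cy)


# Illustrative intrinsics for a 1280x720 image
u, v = project_to_pixel((1.0, 0.5, 2.0), fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
```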

classmethod from_json(payload)

Instantiates camera params object from schematized JSON dict payload.

Parameters:

payload (Dict[str, Any])

to_payload()

Serializes camera params object to schematized JSON dict.

Return type:

dict

class nucleus.CategoryAnnotation

A category annotation.

from nucleus import CategoryAnnotation

category = CategoryAnnotation(
    label="dress",
    reference_id="image_1",
    taxonomy_name="clothing_type",
    metadata={"dress_color": "navy"},
    track_reference_id="blue_and_black_dress",
)
Parameters:
  • label (str) – The label for this annotation.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • taxonomy_name (Optional[str]) – The name of the taxonomy this annotation conforms to. See Dataset.add_taxonomy().

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • track_reference_id – A unique string to identify the annotation as part of a group. For instance, multiple “car” annotations across several dataset items may have the same track_reference_id such as “red_car”.

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this method returns False by default. A subclass that has local files should override it, although overriding alone is not sufficient to make local file upload work.

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.CategoryPrediction(label, reference_id, taxonomy_name=None, confidence=None, metadata=None, class_pdf=None, track_reference_id=None)

A prediction of a category.

Parameters:
  • label (str) – The label for this annotation (e.g. car, pedestrian, bicycle).

  • reference_id (str) – The reference ID of the image you wish to apply this annotation to.

  • taxonomy_name (Optional[str]) – The name of the taxonomy this annotation conforms to. See Dataset.add_taxonomy().

  • confidence (Optional[float]) – 0-1 indicating the confidence of the prediction.

  • class_pdf (Optional[Dict]) – An optional complete class probability distribution on this prediction. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • track_reference_id (Optional[str]) – A unique string to identify the prediction as part of a group. For instance, multiple “car” predictions across several dataset items may have the same track_reference_id such as “red_car”.

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this method returns False by default. A subclass that has local files should override it, although overriding alone is not sufficient to make local file upload work.

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.CuboidAnnotation

A 3D Cuboid annotation.

from nucleus import CuboidAnnotation, Point3D

cuboid = CuboidAnnotation(
    label="car",
    position=Point3D(100, 100, 10),
    dimensions=Point3D(5, 10, 5),
    yaw=0,
    reference_id="pointcloud_1",
    annotation_id="pointcloud_1_car_cuboid_1",
    metadata={"vehicle_color": "green"},
    track_reference_id="red_car",
)
Parameters:
  • label (str) – The label for this annotation.

  • position (Point3D) – The point at the center of the cuboid

  • dimensions (Point3D) – The length (x), width (y), and height (z) of the cuboid

  • yaw (float) – The rotation, in radians, about the Z axis of the cuboid

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • track_reference_id – A unique string to identify the annotation as part of a group. For instance, multiple “car” annotations across several dataset items may have the same track_reference_id such as “red_car”.
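The position, dimensions, and yaw parameters together determine the cuboid's footprint on the ground plane. The sketch below computes the four footprint corners, assuming that positive yaw is counterclockwise when viewed from +Z and that the length (x) dimension lies along the heading at yaw 0. Both conventions are assumptions made for illustration; cuboid_footprint is a hypothetical helper.

```python
import math
from typing import List, Tuple


def cuboid_footprint(
    center: Tuple[float, float], length: float, width: float, yaw: float
) -> List[Tuple[float, float]]:
    """Return the four ground-plane corners of a cuboid rotated by yaw about Z."""
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    half_l, half_w = length / 2.0, width / 2.0
    local_offsets = (
        (half_l, half_w),
        (-half_l, half_w),
        (-half_l, -half_w),
        (half_l, -half_w),
    )
    corners = []
    for dx, dy in local_offsets:
        # Rotate the local offset by yaw, then translate to the center.
        corners.append(
            (center[0] + dx * cos_y - dy * sin_y, center[1] + dx * sin_y + dy * cos_y)
        )
    return corners


unrotated = cuboid_footprint((100.0, 100.0), length=5.0, width=10.0, yaw=0.0)
quarter_turn = cuboid_footprint((100.0, 100.0), length=5.0, width=10.0, yaw=math.pi / 2)
```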

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this method returns False by default. A subclass that has local files should override it, although overriding alone is not sufficient to make local file upload work.

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.CuboidPrediction(label, position, dimensions, yaw, reference_id, confidence=None, annotation_id=None, metadata=None, class_pdf=None, track_reference_id=None)

A prediction of a 3D cuboid.

Parameters:
  • label (str) – The label for this annotation (e.g. car, pedestrian, bicycle)

  • position (Point3D) – The point at the center of the cuboid

  • dimensions (Point3D) – The length (x), width (y), and height (z) of the cuboid

  • yaw (float) – The rotation, in radians, about the Z axis of the cuboid

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • confidence (Optional[float]) – 0-1 indicating the confidence of the prediction.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • class_pdf (Optional[Dict]) – An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain.

  • track_reference_id (Optional[str]) – A unique string to identify the prediction as part of a group. For instance, multiple “car” predictions across several dataset items may have the same track_reference_id such as “red_car”.

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this method returns False by default. A subclass that has local files should override it, although overriding alone is not sufficient to make local file upload work.

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.Dataset(dataset_id, client, name=None, is_scene=None, use_privacy_mode=None)

Datasets are collections of your data that can be associated with models.

You can append DatasetItems or Scenes with metadata to your dataset, annotate it with ground truth, and upload model predictions to evaluate and compare model performance on your data.

Make sure the dataset is set up to support the required data type (see the code sample below).

Datasets cannot be instantiated directly and instead must be created via API endpoint using NucleusClient.create_dataset(), or in the dashboard.

import nucleus

client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)

# Create new dataset supporting DatasetItems
dataset = client.create_dataset(YOUR_DATASET_NAME, is_scene=False)

# OR create new dataset supporting LidarScenes
dataset = client.create_dataset(YOUR_DATASET_NAME, is_scene=True)

# Or, retrieve existing dataset by ID
# This ID can be fetched using client.list_datasets() or from a dashboard URL
existing_dataset = client.get_dataset("YOUR_DATASET_ID")
Parameters:

client (nucleus.NucleusClient)

add_items_from_dir(dirname=None, existing_dirname=None, privacy_mode_proxy='', allowed_file_types=('png', 'jpg', 'jpeg'), skip_size_warning=False, update_items=False)

Updates the dataset by recursively crawling a directory. A DatasetItem is created for each unique image found. Existing items are skipped or updated depending on the update_items parameter.

Parameters:
  • dirname (Optional[str]) – Where to look for image files, recursively

  • existing_dirname (Optional[str]) – Already validated dirname

  • privacy_mode_proxy (str) – Endpoint that serves image files for privacy mode, ignore if not using privacy mode. The proxy should work based on the relative path of the images in the directory.

  • allowed_file_types (Tuple[str, Ellipsis]) – Which file type extensions to search for, e.g. ('jpg', 'png').

  • skip_size_warning (bool) – If False, it will throw an error if the script globs more than 500 images. This is a safety check in case the dirname has a typo, and grabs too much data.

  • update_items (bool) – Whether to update items in existing dataset
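The crawl and the 500-image safety check described above can be pictured with a small standalone sketch. It does not call the SDK: find_images and max_images are hypothetical names, and the case-insensitive extension match is an assumption about behavior the parameter list does not specify.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def find_images(
    dirname: str,
    allowed_file_types: Tuple[str, ...] = ("png", "jpg", "jpeg"),
    skip_size_warning: bool = False,
    max_images: int = 500,
) -> List[str]:
    """Recursively collect image paths relative to dirname, sorted for stable output."""
    root = Path(dirname)
    suffixes = {"." + ext.lower().lstrip(".") for ext in allowed_file_types}
    found = sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in suffixes
    )
    # Safety check in case dirname has a typo and matches too much data.
    if not skip_size_warning and len(found) > max_images:
        raise ValueError(
            f"found {len(found)} images under {dirname!r}; "
            "pass skip_size_warning=True if this is intended"
        )
    return found


# Build a throwaway directory tree to demonstrate the crawl.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sunny").mkdir()
    (Path(tmp) / "sunny" / "a.JPG").touch()
    (Path(tmp) / "b.png").touch()
    (Path(tmp) / "notes.txt").touch()
    images = find_images(tmp)
```

The relative paths matter because, per the privacy_mode_proxy description, a proxy is expected to serve images based on their path relative to the directory.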

add_taxonomy(taxonomy_name, taxonomy_type, labels, update=False)

Creates a new taxonomy.

At the moment we only support taxonomies for category annotations and predictions.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset("ds_bwkezj6g5c4g05gqp1eg")

response = dataset.add_taxonomy(
    taxonomy_name="clothing_type",
    taxonomy_type="category",
    labels=["shirt", "trousers", "dress"],
    update=False
)
Parameters:
  • taxonomy_name (str) – The name of the taxonomy. Taxonomy names must be unique within a dataset.

  • taxonomy_type (str) – The type of this taxonomy as a string literal. Currently, the only supported taxonomy type is "category".

  • labels (List[str]) – The list of possible labels for the taxonomy.

  • update (bool) – Whether or not to update taxonomy labels on taxonomy name collision. Default is False. Note that taxonomy labels will not be deleted on update, they can only be appended.

Returns:

Returns a response with dataset_id, taxonomy_name, and status of the add taxonomy operation.

{
    "dataset_id": str,
    "taxonomy_name": str,
    "status": "Taxonomy created"
}

annotate(annotations, update=DEFAULT_ANNOTATION_UPDATE_MODE, batch_size=5000, asynchronous=False, remote_files_per_upload_request=20, local_files_per_upload_request=10)

Uploads ground truth annotations to the dataset.

Adding ground truth to your dataset in Nucleus allows you to visualize annotations, query dataset items based on the annotations they contain, and evaluate models by comparing their predictions to ground truth.

Nucleus supports Box, Polygon, Cuboid, Segmentation, and Category annotations. Cuboid annotations can only be uploaded to a pointcloud DatasetItem.

When uploading an annotation, you need to specify which item you are annotating via the reference_id you provided when uploading the image or pointcloud.

Ground truth uploads can be made idempotent by specifying an optional annotation_id for each annotation. This id should be unique within the dataset_item so that (reference_id, annotation_id) is unique within the dataset.

See SegmentationAnnotation for specific requirements to upload segmentation annotations.

For ingesting large annotation payloads, see the Guide for Large Ingestions.

Parameters:
  • annotations (Sequence[Annotation]) – List of annotation objects to upload.

  • update (bool) – Whether to ignore or overwrite metadata for conflicting annotations.

  • batch_size (int) – Number of annotations processed in each concurrent batch. Default is 5000. If you get timeouts when uploading geometric annotations, you can try lowering this batch size.

  • asynchronous (bool) – Whether or not to process the upload asynchronously (and return an AsyncJob object). Default is False.

  • remote_files_per_upload_request (int) – Number of remote files to upload in each request. Segmentations have either local or remote files. If you get timeouts while uploading segmentations with remote URLs, lower this value from its default of 20.

  • local_files_per_upload_request (int) – Number of local files to upload in each request. Segmentations have either local or remote files. If you get timeouts while uploading segmentations with local files, lower this value from its default of 10. The maximum is 10.

Returns:

If synchronous, payload describing the upload result:

{
    "dataset_id": str,
    "annotations_processed": int
}

Otherwise, returns an AsyncJob object.

Return type:

Union[Dict[str, Any], nucleus.async_job.AsyncJob]
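To picture what batch_size means for a large upload, the sketch below splits a sequence into consecutive batches. It only illustrates the chunking arithmetic and makes no claim about how the SDK schedules or sends the batches; batched is a hypothetical helper.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 12,000 placeholder annotations with the documented default batch size of 5000
batches = list(batched(list(range(12000)), 5000))
batch_lengths = [len(batch) for batch in batches]
```

Lowering batch_size produces more, smaller batches, which is why it is the suggested remedy for timeouts on geometric annotations.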

append(items, update=False, batch_size=20, asynchronous=False, local_files_per_upload_request=10)

Appends items or scenes to a dataset.

Note

Datasets can only accept one of DatasetItems or Scenes, never both.

This behavior is set during Dataset creation with the is_scene flag.

import nucleus

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset("YOUR_DATASET_ID")

local_item = nucleus.DatasetItem(
  image_location="./1.jpg",
  reference_id="image_1",
  metadata={"key": "value"}
)
remote_item = nucleus.DatasetItem(
  image_location="s3://your-bucket/2.jpg",
  reference_id="image_2",
  metadata={"key": "value"}
)

# default is synchronous upload
sync_response = dataset.append(items=[local_item])

# async jobs have higher throughput but can be more difficult to debug
async_job = dataset.append(
  items=[remote_item], # all items must be remote for async
  asynchronous=True
)
print(async_job.status())

A Dataset can be populated with labeled and unlabeled data. Using Nucleus, you can filter down the data inside your dataset using custom metadata about your images.

For instance, your local dataset may contain Sunny, Foggy, and Rainy folders of images. All of these images can be uploaded into a single Nucleus Dataset, with (queryable) metadata like {"weather": "Sunny"}.

To update an item’s metadata, you can re-ingest the same items with the update argument set to true. Existing metadata will be overwritten for DatasetItems in the payload that share a reference_id with a previously uploaded DatasetItem. To retrieve your existing reference_ids, use Dataset.items().

# overwrite metadata by reuploading the item
remote_item.metadata["weather"] = "Sunny"

async_job_2 = dataset.append(
  items=[remote_item],
  update=True,
  asynchronous=True
)
Parameters:
  • items (Union[Sequence[nucleus.dataset_item.DatasetItem], Sequence[nucleus.scene.LidarScene], Sequence[nucleus.scene.VideoScene]]) – List of items or scenes to upload.

  • batch_size (int) – Size of the batch for larger uploads. Default is 20. This is for items that have a remote URL and do not require a local upload. If you get timeouts for uploading remote urls, try decreasing this.

  • update (bool) – Whether or not to overwrite metadata on reference ID collision. Default is False.

  • asynchronous (bool) – Whether or not to process the upload asynchronously (and return an AsyncJob object). This is required when uploading scenes. Default is False.

  • files_per_upload_request – Optional; default is 10. We recommend lowering this if you encounter timeouts.

  • local_files_per_upload_request (int) – Optional; default is 10. We recommend lowering this if you encounter timeouts.

Returns:

For scenes

If synchronous, returns a payload describing the upload result:

{
    "dataset_id": str,
    "new_items": int,
    "updated_items": int,
    "ignored_items": int,
    "upload_errors": int
}

Otherwise, returns an AsyncJob object.

For images

If synchronous, returns a nucleus.upload_response.UploadResponse; otherwise, returns an AsyncJob.

Return type:

Union[Dict[Any, Any], nucleus.async_job.AsyncJob, nucleus.upload_response.UploadResponse]

autotag_items(autotag_name, for_scores_greater_than=0)

Fetches the autotag’s items above the score threshold, sorted by descending score.

Parameters:
  • autotag_name – The user-defined name of the autotag.

  • for_scores_greater_than (Optional[int]) – Score threshold between -1 and 1 above which to include autotag items.

Returns:

List of autotagged items above the given score threshold, sorted by descending score, and autotag info, packaged into a dict as follows:

{
    "autotagItems": List[{
        ref_id: str,
        score: float,
        model_prediction_annotation_id: str | None,
        ground_truth_annotation_id: str | None,
    }],
    "autotag": {
        id: str,
        name: str,
        status: "started" | "completed",
        autotag_level: "Image" | "Object"
    }
}

Note model_prediction_annotation_id and ground_truth_annotation_id are only relevant for object autotags.
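A response of the shape shown above can be post-processed with ordinary dict and list operations. The sketch below picks the highest-scoring reference IDs from a hand-written sample response; top_reference_ids is a hypothetical helper, and the sample values are made up for illustration.

```python
from typing import Any, Dict, List


def top_reference_ids(response: Dict[str, Any], min_score: float, limit: int) -> List[str]:
    """Return up to `limit` ref_ids whose score exceeds min_score, best first."""
    items = [item for item in response.get("autotagItems", []) if item["score"] > min_score]
    # Sort explicitly rather than relying on the order of the response.
    items.sort(key=lambda item: item["score"], reverse=True)
    return [item["ref_id"] for item in items[:limit]]


# Illustrative response, not real API output
sample_response = {
    "autotagItems": [
        {"ref_id": "image_2", "score": 0.41},
        {"ref_id": "image_1", "score": 0.93},
        {"ref_id": "image_3", "score": -0.20},
    ],
    "autotag": {
        "id": "tag_example",
        "name": "example",
        "status": "completed",
        "autotag_level": "Image",
    },
}
best = top_reference_ids(sample_response, min_score=0.0, limit=2)
```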

autotag_training_items(autotag_name)

Fetches items that were manually selected during refinement of the autotag.

Parameters:

autotag_name – The user-defined name of the autotag.

Returns:

List of user-selected positives and autotag info, packaged into a dict as follows:

{
    "autotagPositiveTrainingItems": List[{
        ref_id: str,
        model_prediction_annotation_id: str | None,
        ground_truth_annotation_id: str | None,
    }],
    "autotag": {
        id: str,
        name: str,
        status: "started" | "completed",
        autotag_level: "Image" | "Object"
    }
}

Note model_prediction_annotation_id and ground_truth_annotation_id are only relevant for object autotags.

build_slice(name, sample_size, sample_method, filters=None)

Builds a slice using Nucleus' Smart Sample tool, which allows slices to be built from sampling criteria and filters.

Parameters:
  • name (str) – Name for the slice being created. Must be unique per dataset.

  • sample_size (int) – Size of the slice to create. Capped by the size of the dataset and the applied filters.

  • sample_method (Union[str, nucleus.slice.SliceBuilderMethods]) – How to sample the dataset, currently supports ‘Random’ and ‘Uniqueness’

  • filters (Optional[nucleus.slice.SliceBuilderFilters]) – Apply filters to only sample from an existing slice or autotag

Return type:

Union[str, Tuple[nucleus.async_job.AsyncJob, str], dict]

Examples

from nucleus.slice import SliceBuilderFilters, SliceBuilderMethods, SliceBuilderFilterAutotag

# random slice
job = dataset.build_slice("RandomSlice", 20, SliceBuilderMethods.RANDOM)

# slice with filters
filters = SliceBuilderFilters(
    slice_id="<some slice id>",
    autotag=SliceBuilderFilterAutotag("tag_cd41jhjdqyti07h8m1n1", [-0.5, 0.5])
)
job = dataset.build_slice("NewSlice", 20, SliceBuilderMethods.RANDOM, filters)

Returns:

An async job.

calculate_evaluation_metrics(model, options=None)

Starts computation of evaluation metrics for a model on the dataset.

To update matches and metrics calculated for a model on a given dataset you can call this endpoint. This is required in order to sort by IOU, view false positives/false negatives, and view model insights.

You can add predictions from a model to a dataset after running the calculation of the metrics. However, the calculation of metrics will have to be retriggered for the new predictions to be matched with ground truth and appear as false positives/negatives, or for the new predictions' effect on metrics to be reflected in model run insights.

During IoU calculation, bounding box predictions are compared to ground truth using a greedy matching algorithm that matches the prediction and ground truth boxes with the highest IoUs first. By default the matching algorithm is class-agnostic: it greedily creates matches regardless of the class labels.

The algorithm can be tuned to classify true positives between certain classes, but not others. This is useful if the labels in your ground truth do not match the exact strings of your model predictions, or if you want to associate multiple predictions with one ground truth label, or multiple ground truth labels with one prediction. To recompute metrics based on different matching, you can re-commit the run with new request parameters.

import nucleus

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset(dataset_id="YOUR_DATASET_ID")

model = client.get_model(model_id="YOUR_MODEL_PRJ_ID")

# Compute all evaluation metrics including IOU-based matching:
dataset.calculate_evaluation_metrics(model)

# Match car and bus bounding boxes (for IOU computation)
# Otherwise enforce that class labels must match
dataset.calculate_evaluation_metrics(model, options={
  'allowed_label_matches': [
    {
      'ground_truth_label': 'car',
      'model_prediction_label': 'bus'
    },
    {
      'ground_truth_label': 'bus',
      'model_prediction_label': 'car'
    }
  ]
})
Parameters:
  • model (Model) – The model object for which to calculate metrics.

  • options (Optional[dict]) –

    Dictionary of specific options to configure metrics calculation.

    class_agnostic

    Whether ground truth and prediction classes can differ when being matched for evaluation metrics. Default is True.

    allowed_label_matches

    Pairs of ground truth and prediction classes that should be considered matchable when computing metrics. If supplied, class_agnostic must be False.

    {
        "class_agnostic": bool,
        "allowed_label_matches": List[{
            "ground_truth_label": str,
            "model_prediction_label": str
        }]
    }
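The greedy matching described above can be illustrated with a small standalone sketch. This is not the Nucleus implementation: the `(x, y, width, height)` box layout, the helper names `iou` and `greedy_match`, and the `iou_threshold` cutoff are assumptions made for illustration. It only shows how pairs with the highest IoU are claimed first, and how `class_agnostic` and `allowed_label_matches` change which pairs are eligible.

```python
from typing import List, Optional, Set, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height); an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_match(
    ground_truth: List[Tuple[str, Box]],
    predictions: List[Tuple[str, Box]],
    class_agnostic: bool = True,
    allowed_label_matches: Optional[Set[Tuple[str, str]]] = None,
    iou_threshold: float = 0.0,  # hypothetical cutoff, not taken from the docs
) -> List[Tuple[int, int, float]]:
    """Return (gt_index, pred_index, iou) triples, highest IoU first."""
    allowed = allowed_label_matches or set()
    candidates = []
    for gi, (g_label, g_box) in enumerate(ground_truth):
        for pi, (p_label, p_box) in enumerate(predictions):
            labels_ok = (
                class_agnostic
                or g_label == p_label
                or (g_label, p_label) in allowed
            )
            score = iou(g_box, p_box)
            if labels_ok and score > iou_threshold:
                candidates.append((score, gi, pi))
    # Greedy step: walk the candidate pairs from highest to lowest IoU and
    # keep a pair only if neither box has been matched already.
    candidates.sort(key=lambda c: c[0], reverse=True)
    used_gt, used_pred, matches = set(), set(), []
    for score, gi, pi in candidates:
        if gi not in used_gt and pi not in used_pred:
            used_gt.add(gi)
            used_pred.add(pi)
            matches.append((gi, pi, score))
    return matches
```

With `class_agnostic=False` and no allowed pairs, a `car` ground truth box is never matched to a `bus` prediction; passing `{("car", "bus")}` as `allowed_label_matches` makes that pair eligible again.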
    

create_custom_index(embeddings_urls, embedding_dim)

Processes user-provided embeddings for the dataset to use with autotag and simsearch.

import nucleus

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset("YOUR_DATASET_ID")

all_embeddings = {
    "reference_id_0": [0.1, 0.2, 0.3],
    "reference_id_1": [0.4, 0.5, 0.6],
    ...
    "reference_id_10000": [0.7, 0.8, 0.9]
} # sharded and uploaded to s3 with the two below URLs

embeddings_url_1 = "s3://dataset/embeddings_map_1.json"
embeddings_url_2 = "s3://dataset/embeddings_map_2.json"

response = dataset.create_custom_index(
    embeddings_urls=[embeddings_url_1, embeddings_url_2],
    embedding_dim=3
)
Parameters:
  • embeddings_urls (List[str]) – List of URLs, each pointing to a JSON file that maps reference_id -> embedding vector. Each embedding JSON must contain fewer than 5000 rows.

  • embedding_dim (int) – The dimension of the embedding vectors. Must be consistent across all embedding vectors in the index.

Returns:

Asynchronous job object to track processing status.

Return type:

AsyncJob
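Because each embedding JSON must stay under 5000 rows and every vector must share the same dimension, it can help to shard and validate the mapping before uploading. The sketch below is one way to do that with the standard library only. The helper name `shard_embeddings` and the constant `MAX_ROWS_PER_SHARD` are hypothetical, and uploading the resulting JSON strings to storage is left out.

```python
import json
from typing import Dict, Iterator, List

MAX_ROWS_PER_SHARD = 4999  # each JSON must contain fewer than 5000 rows


def shard_embeddings(
    embeddings: Dict[str, List[float]], embedding_dim: int
) -> Iterator[str]:
    """Yield JSON strings, each mapping reference_id -> embedding vector."""
    # Reject the whole mapping if any vector has the wrong dimension.
    for ref_id, vector in embeddings.items():
        if len(vector) != embedding_dim:
            raise ValueError(
                f"{ref_id} has dimension {len(vector)}, expected {embedding_dim}"
            )
    items = list(embeddings.items())
    for start in range(0, len(items), MAX_ROWS_PER_SHARD):
        yield json.dumps(dict(items[start:start + MAX_ROWS_PER_SHARD]))
```

Each yielded string would be written to its own file, and the resulting URLs passed together as `embeddings_urls`.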

create_image_index()

Creates or updates image index by generating embeddings for images that do not already have embeddings.

The embeddings are used for autotag and similarity search.

This endpoint is limited to indexing up to 2 million images at a time, and the job will fail for payloads that exceed this limit.

Returns:

Asynchronous job object to track processing status.

Return type:

AsyncJob

create_object_index(model_run_id=None, gt_only=None)

Creates or updates object index by generating embeddings for objects that do not already have embeddings.

These embeddings are used for autotag and similarity search. This endpoint only supports indexing objects sourced from the predictions of a specific model or the ground truth annotations of the dataset.

This endpoint is idempotent. If this endpoint is called again for a model whose predictions were indexed in the past, the previously indexed predictions will not have new embeddings recomputed. The same is true for ground truth annotations.

Note that this means if you update a prediction or ground truth bounding box that already has an associated embedding, the embedding will not be updated, even with another call to this endpoint. For now, we recommend deleting the prediction or ground truth annotation and re-inserting it to force generation of a new embedding.

This endpoint is limited to generating embeddings for 3 million objects at a time and the job will fail for payloads that exceed this limit.

Parameters:
  • model_run_id (Optional[str]) –

    The ID of the model whose predictions should be indexed. Default is None, but must be supplied in the absence of gt_only.

  • gt_only (Optional[bool]) – Whether to only generate embeddings for the ground truth annotations of the dataset. Default is None, but must be supplied in the absence of model_run_id.

Returns:

Asynchronous job object to track processing status.

Return type:

AsyncJob

create_slice(name, reference_ids=None)

Creates a Slice of dataset items within a dataset.

Parameters:
  • name (str) – A human-readable name for the slice.

  • reference_ids (Optional[List[str]]) – List of reference IDs of dataset items to add to the slice. Cannot exceed 10,000 items. If left unspecified, an empty slice will be created.

Returns:

The newly constructed slice item.

Return type:

Slice

Raises:

BadRequest – If length of reference_ids is too large (> 10,000 items)
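Since a request with more than 10,000 reference IDs raises BadRequest, a longer list has to be split before it is sent. A minimal sketch of that chunking step follows. The helper name `chunk_reference_ids` and the constant are hypothetical, and the code does not call the SDK.

```python
from typing import Iterator, List

MAX_SLICE_REQUEST_ITEMS = 10_000  # limit stated for create_slice


def chunk_reference_ids(
    reference_ids: List[str], size: int = MAX_SLICE_REQUEST_ITEMS
) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` reference IDs."""
    for start in range(0, len(reference_ids), size):
        yield reference_ids[start:start + size]
```

The first chunk fits in a single create_slice call. Adding the remaining chunks to the same slice would need follow-up calls that this section does not describe.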

create_slice_by_ids(name, dataset_item_ids=None, scene_ids=None, annotation_ids=None, prediction_ids=None)

Creates a Slice of dataset items, scenes, annotations, or predictions within a dataset by their IDs.

Note

Dataset item, scene, and object (annotation or prediction) IDs may not be mixed. However, when creating an object slice, both annotation and prediction IDs may be supplied.

Parameters:
  • name (str) – A human-readable name for the slice.

  • dataset_item_ids (Optional[List[str]]) – List of internal IDs of dataset items to add to the slice.

  • scene_ids (Optional[List[str]]) – List of internal IDs of scenes to add to the slice.

  • annotation_ids (Optional[List[str]]) – List of internal IDs of Annotations to add to the slice.

  • prediction_ids (Optional[List[str]]) – List of internal IDs of Predictions to add to the slice.

Returns:

The newly constructed slice item.

Return type:

Slice

delete_annotations(reference_ids=None, keep_history=True)

Deletes all annotations associated with the specified item reference IDs.

Parameters:
  • reference_ids (Optional[list]) – List of user-defined reference IDs of the dataset items from which to delete annotations. Defaults to an empty list.

  • keep_history (bool) – Whether to preserve version history. We recommend skipping this parameter and using the default value of True.

Returns:

Empty payload response.

Return type:

AsyncJob

delete_custom_index(image=True)

Deletes the custom index uploaded to the dataset.

Returns:

Payload containing information that can be used to track the job’s status:

{
    "dataset_id": str,
    "job_id": str,
    "message": str
}

Parameters:

image (bool)

delete_item(reference_id)

Deletes an item from the dataset by item reference ID.

All annotations and predictions associated with the item will be deleted as well.

Parameters:

reference_id (str) – The user-defined reference ID of the item to delete.

Returns:

Payload to indicate deletion invocation.

Return type:

dict

delete_scene(reference_id)

Deletes a scene from the Dataset by scene reference ID.

All items, annotations, and predictions associated with the scene will be deleted as well.

Parameters:

reference_id (str) – The user-defined reference ID of the scene to delete.

delete_taxonomy(taxonomy_name)

Deletes the given taxonomy.

All annotations and predictions associated with the taxonomy will be deleted as well.

Parameters:

taxonomy_name (str) – The name of the taxonomy.

Returns:

Returns a response with dataset_id, taxonomy_name, and status of the delete taxonomy operation.

{
    "dataset_id": str,
    "taxonomy_name": str,
    "status": "Taxonomy successfully deleted"
}

delete_tracks(track_reference_ids)

Deletes a list of tracks from the dataset, thereby unlinking their annotation and prediction instances.

Parameters:
  • track_reference_ids (List[str]) – A list of reference IDs for tracks to delete.

Return type:

None

export_embeddings(asynchronous=True)

Fetches a pd.DataFrame-ready list of dataset embeddings.

Parameters:

asynchronous (bool) – Whether or not to process the export asynchronously (and return an EmbeddingsExportJob object). Default is True.

Returns:

If synchronous, a list where each item is a dict with two keys representing a row in the dataset:

List[{
    "reference_id": str,
    "embedding_vector": List[float]
}]

Otherwise, returns an EmbeddingsExportJob object.

Return type:

Union[List[Dict[str, Union[str, List[float]]]], nucleus.async_job.EmbeddingsExportJob]
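When the export runs synchronously, the documented list of rows can be re-indexed by reference ID for quick lookups. The sketch below assumes rows shaped exactly like the payload above. The helper name `embeddings_by_reference_id` is hypothetical and plain dicts stand in for the SDK result.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, List[float]]]


def embeddings_by_reference_id(rows: List[Row]) -> Dict[str, List[float]]:
    """Map each reference_id to its embedding vector."""
    return {row["reference_id"]: row["embedding_vector"] for row in rows}
```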

export_predictions(model)

Fetches all predictions of a model that were uploaded to the dataset.

Parameters:

model (Model) – The model whose predictions to retrieve.

Returns:

List of prediction objects from the model.

Return type:

List[Union[ BoxPrediction, PolygonPrediction, CuboidPrediction, SegmentationPrediction, CategoryPrediction, KeypointsPrediction, ]]

export_scale_task_info()

Fetches info for all linked Scale tasks of items/scenes in the dataset.

Returns:

A list of dicts, each with two keys, respectively mapping to items/scenes and info on their corresponding Scale tasks within the dataset:

List[{
    "item" | "scene": Union[DatasetItem, Scene],
    "scale_task_info": {
        "task_id": str,
        "task_status": str,
        "task_audit_status": str,
        "task_audit_review_comment": Optional[str],
        "project_name": str,
        "batch": str,
        "created_at": str,
        "completed_at": Optional[str]
    }[]
}]

get_image_indexing_status()

Gets the primary image index progress for the dataset.

Returns:

Response payload:

{
    "embedding_count": int,
    "image_count": int,
    "percent_indexed": float,
    "additional_context": str
}

get_object_indexing_status(model_run_id=None)

Gets the primary object index progress of the dataset. If model_run_id is not specified, this endpoint will retrieve the indexing progress of the ground truth objects.

Returns:

Response payload:

{
    "embedding_count": int,
    "object_count": int,
    "percent_indexed": float,
    "additional_context": str
}

get_scene(reference_id)

Fetches a single scene in the dataset by its reference ID.

Parameters:

reference_id (str) – The user-defined reference ID of the scene to fetch.

Returns:

A scene object containing frames, which in turn contain pointcloud or image items.

Return type:

Scene

get_scene_from_item_ref_id(item_reference_id)

Given a dataset item reference ID, find the Scene it belongs to.

Parameters:

item_reference_id (str)

Return type:

Optional[nucleus.scene.Scene]

get_slices(name=None, slice_type=None)

Gets a list of slices by name or underlying slice type.

Parameters:
  • name (Optional[str]) – Name of the desired slice to look up.

  • slice_type (Optional[Union[str, nucleus.slice.SliceType]]) – Type of slice to look up. This can be one of ('dataset_item', 'object', 'scene').

Raises:

NotFound – If no slices were found with the given criteria.

Returns:

The Nucleus slice as an object.

Return type:

Slice

ground_truth_loc(reference_id, annotation_id)

Fetches a single ground truth annotation by ID.

Parameters:
  • reference_id (str) – User-defined reference ID of the dataset item associated with the ground truth annotation.

  • annotation_id (str) – User-defined ID of the ground truth annotation.

Returns:

Ground truth annotation object with the specified annotation ID.

Return type:

Union[ BoxAnnotation, LineAnnotation, PolygonAnnotation, KeypointsAnnotation, CuboidAnnotation, SegmentationAnnotation, CategoryAnnotation ]

iloc(i)

Fetches dataset item and associated annotations by absolute numerical index.

Parameters:

i (int) – Absolute numerical index of the dataset item within the dataset.

Returns:

Payload describing the dataset item and associated annotations:

{
    "item": DatasetItem,
    "annotations": {
        "box": Optional[List[BoxAnnotation]],
        "cuboid": Optional[List[CuboidAnnotation]],
        "line": Optional[List[LineAnnotation]],
        "polygon": Optional[List[PolygonAnnotation]],
        "keypoints": Optional[List[KeypointsAnnotation]],
        "segmentation": Optional[List[SegmentationAnnotation]],
        "category": Optional[List[CategoryAnnotation]],
    }
}

Return type:

dict

info()

Fetches information about the dataset.

Returns:

Information about the dataset including its Scale-generated ID, name, length, associated Models, Slices, and more.

Return type:

DatasetInfo

ingest_tasks(task_ids)

Ingest specific tasks from an existing Scale or Rapid project into the dataset.

Note: if you would like to create a new Dataset from an existing Scale labeling project, use NucleusClient.create_dataset_from_project().

For more info, see our Ingest From Labeling Guide.

Parameters:

task_ids (List[str]) – List of task IDs to ingest.

Returns:

Payload describing the asynchronous upload result:

{
    "ingested_tasks": int,
    "ignored_tasks": int,
    "pending_tasks": int
}

Return type:

dict

items_and_annotation_chip_generator(chip_size, stride_size, cache_directory, query=None, num_processes=0)

Provides a generator of chips for all DatasetItems and BoxAnnotations in the dataset.

A chip is an image created by tiling a source image.

Parameters:
  • chip_size (int) – The size of the image chip

  • stride_size (int) – The distance to move when creating the next image chip. When stride is equal to chip size, there will be no overlap. When stride is equal to half the chip size, there will be 50 percent overlap.

  • cache_directory (str) – The s3 or local directory to store the image and annotations of a chip. s3 directories must be in the format s3://s3-bucket/s3-key

  • query (Optional[str]) – Structured query compatible with the Nucleus query language.

  • num_processes (int) – The number of worker processes to use to chip and upload images. If unset, no parallel processing will occur.

Returns:

Generator where each element is a dict containing the location of the image chip (jpeg) and its annotations (json).

Iterable[{
    "image_location": str,
    "annotation_location": str
}]

Return type:

Iterable[Dict[str, str]]
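The relationship between chip_size and stride_size described above can be checked with a short tiling sketch. It only computes where chips would start. It assumes square chips and keeps only chips that fit entirely inside the image, which may differ from how the SDK treats image edges. The helper name `chip_origins` is hypothetical.

```python
from typing import List, Tuple


def chip_origins(
    width: int, height: int, chip_size: int, stride_size: int
) -> List[Tuple[int, int]]:
    """Top-left (x, y) corners of chips that fit entirely inside the image."""
    if chip_size <= 0 or stride_size <= 0:
        raise ValueError("chip_size and stride_size must be positive")
    # The ranges are empty when the image is smaller than one chip.
    xs = range(0, width - chip_size + 1, stride_size)
    ys = range(0, height - chip_size + 1, stride_size)
    return [(x, y) for y in ys for x in xs]
```

For a 100x50 image, a chip size of 50 with a stride of 50 gives two non-overlapping chips, while a stride of 25 gives three chips that each overlap their neighbor by 50 percent.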

items_and_annotation_generator(query=None, use_mirrored_images=False)

Provides a generator of all DatasetItems and Annotations in the dataset.

Parameters:
  • query (Optional[str]) –

    Structured query compatible with the Nucleus query language.

  • use_mirrored_images (bool) – If True, returns the location of the mirrored image hosted in Scale S3. Useful when the original image is no longer available.

Returns:

Generator where each element is a dict containing the DatasetItem and all of its associated Annotations, grouped by type.

Iterable[{
    "item": DatasetItem,
    "annotations": {
        "box": List[BoxAnnotation],
        "polygon": List[PolygonAnnotation],
        "cuboid": List[CuboidAnnotation],
        "line": Optional[List[LineAnnotation]],
        "segmentation": List[SegmentationAnnotation],
        "category": List[CategoryAnnotation],
        "keypoints": List[KeypointsAnnotation],
    }
}]

Return type:

Iterable[Dict[str, Union[nucleus.dataset_item.DatasetItem, Dict[str, List[nucleus.annotation.Annotation]]]]]

items_and_annotations()

Returns a list of all DatasetItems and Annotations in this dataset.

Returns:

A list of dicts, each with two keys representing a row in the dataset:

List[{
    "item": DatasetItem,
    "annotations": {
        "box": Optional[List[BoxAnnotation]],
        "cuboid": Optional[List[CuboidAnnotation]],
        "line": Optional[List[LineAnnotation]],
        "polygon": Optional[List[PolygonAnnotation]],
        "segmentation": Optional[List[SegmentationAnnotation]],
        "category": Optional[List[CategoryAnnotation]],
        "keypoints": Optional[List[KeypointsAnnotation]],
    }
}]

Return type:

List[Dict[str, Union[nucleus.dataset_item.DatasetItem, Dict[str, List[nucleus.annotation.Annotation]]]]]
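Because every annotation type in the payload above is Optional, code that walks the result should tolerate None values. A small sketch that counts annotations per type follows. It uses plain dicts as stand-ins for the rows, and the helper name `count_annotations_by_type` is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def count_annotations_by_type(rows: List[Dict[str, Any]]) -> Counter:
    """Count annotations per type across rows, treating None as empty."""
    counts: Counter = Counter()
    for row in rows:
        for annotation_type, annotations in row["annotations"].items():
            counts[annotation_type] += len(annotations or [])
    return counts
```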

items_generator(page_size=100000)

Generator yielding all dataset items in the dataset.

collected_ref_ids = []
for item in dataset.items_generator():
    print(f"Exporting item: {item.reference_id}")
    collected_ref_ids.append(item.reference_id)
Parameters:

page_size (int, optional) – Number of items to return per page. If you are experiencing timeouts while using this generator, you can try lowering the page size.

Yields:

DatasetItem – A single DatasetItem object.

Return type:

Iterable[nucleus.dataset_item.DatasetItem]

jobs(job_types=None, from_date=None, to_date=None, limit=JOB_REQ_LIMIT, show_completed=False, stats_only=False)

Fetch jobs pertaining to this particular dataset.

Parameters:
  • job_types (Optional[List[nucleus.job.CustomerJobTypes]]) – Filter on a set of job types. If None, fetch all types. For example: ['uploadDatasetItems'].

  • from_date (Optional[Union[str, datetime.datetime]]) – Beginning of the date range, as a string 'YYYY-MM-DD' or a datetime object. For example: '2021-11-05', parser.parse('Nov 5 2021'), or datetime(2021,11,5).

  • to_date (Optional[Union[str, datetime.datetime]]) – End of the date range.

  • limit (int) – Number of results to fetch, max 50_000.

  • show_completed (bool) – Whether to fetch jobs with Completed status. With the default of False, completed jobs are not fetched.

  • stats_only (bool) – Return an overview of jobs instead of a list of job objects.

list_autotags()

Fetches all autotags of the dataset.

Returns:

List of autotag payloads:

List[{
    "id": str,
    "name": str,
    "status": "completed" | "pending",
    "autotag_level": "Image" | "Object"
}]

loc(dataset_item_id)

Fetches a dataset item and associated annotations by Nucleus-generated ID.

Parameters:

dataset_item_id (str) – Nucleus-generated dataset item ID (starts with di_). This can be retrieved via Dataset.items() or a Nucleus dashboard URL.

Returns:

Payload containing the dataset item and associated annotations:

{
    "item": DatasetItem,
    "annotations": {
        "box": Optional[List[BoxAnnotation]],
        "cuboid": Optional[List[CuboidAnnotation]],
        "line": Optional[List[LineAnnotation]],
        "polygon": Optional[List[PolygonAnnotation]],
        "keypoints": Optional[List[KeypointsAnnotation]],
        "segmentation": Optional[List[SegmentationAnnotation]],
        "category": Optional[List[CategoryAnnotation]],
    }
}

Return type:

dict

prediction_loc(model, reference_id, annotation_id)

Fetches a single model prediction by ID.

Parameters:
  • model (Model) – Model object from which to fetch the prediction.

  • reference_id (str) – User-defined reference ID of the dataset item associated with the model prediction.

  • annotation_id (str) – User-defined ID of the model prediction.

Returns:

Model prediction object with the specified annotation ID.

Return type:

Union[ BoxPrediction, PolygonPrediction, CuboidPrediction, SegmentationPrediction, CategoryPrediction, KeypointsPrediction ]

predictions_iloc(model, index)

Fetches all predictions of a dataset item by its absolute index.

Parameters:
  • model (Model) – Model object from which to fetch the prediction.

  • index (int) – Absolute index of the dataset item within the dataset.

Returns:

Dictionary mapping prediction type to a list of such prediction objects from the given model:

{
    "box": List[BoxPrediction],
    "polygon": List[PolygonPrediction],
    "cuboid": List[CuboidPrediction],
    "segmentation": List[SegmentationPrediction],
    "category": List[CategoryPrediction],
    "keypoints": List[KeypointsPrediction],
}

Return type:

List[Union[ BoxPrediction, PolygonPrediction, CuboidPrediction, SegmentationPrediction, CategoryPrediction, KeypointsPrediction, ]]

predictions_refloc(model, reference_id)

Fetches all predictions of a dataset item by its reference ID.

Parameters:
  • model (Model) – Model object from which to fetch the prediction.

  • reference_id (str) – User-defined ID of the dataset item from which to fetch all predictions.

Returns:

Dictionary mapping prediction type to a list of such prediction objects from the given model:

{
    "box": List[BoxPrediction],
    "polygon": List[PolygonPrediction],
    "cuboid": List[CuboidPrediction],
    "segmentation": List[SegmentationPrediction],
    "category": List[CategoryPrediction],
    "keypoints": List[KeypointsPrediction],
}

Return type:

List[Union[ BoxPrediction, PolygonPrediction, CuboidPrediction, SegmentationPrediction, CategoryPrediction, KeypointsPrediction, ]]

query_items(query)

Fetches all DatasetItems that pertain to a given structured query.

Parameters:

query (str) –

Structured query compatible with the Nucleus query language.

Returns:

A list of DatasetItem query results.

Return type:

Iterable[nucleus.dataset_item.DatasetItem]

query_objects(query, query_type, model_run_id=None)

Fetches all objects in the dataset that pertain to a given structured query. The results are either Predictions, Annotations, or Evaluation Matches, based on the query_type input parameter.

Parameters:
  • query (str) –

    Structured query compatible with the Nucleus query language.

  • query_type (ObjectQueryType) – Defines the type of the object to query.

  • model_run_id (Optional[str])

Returns:

An iterable of either Predictions, Annotations, or Evaluation Matches

Return type:

Iterable[Union[nucleus.annotation.Annotation, nucleus.prediction.Prediction, nucleus.evaluation_match.EvaluationMatch]]

query_scenes(query)

Fetches all Scenes that pertain to a given structured query.

Parameters:

query (str) –

Structured query compatible with the Nucleus query language.

Returns:

A list of Scene query results.

Return type:

Iterable[nucleus.scene.Scene]

refloc(reference_id)

Fetches a dataset item and associated annotations by reference ID.

Parameters:

reference_id (str) – User-defined reference ID of the dataset item.

Returns:

Payload containing the dataset item and associated annotations:

{
    "item": DatasetItem,
    "annotations": {
        "box": Optional[List[BoxAnnotation]],
        "cuboid": Optional[List[CuboidAnnotation]],
        "line": Optional[List[LineAnnotation]],
        "polygon": Optional[List[PolygonAnnotation]],
        "keypoints": Optional[List[KeypointsAnnotation]],
        "segmentation": Optional[List[SegmentationAnnotation]],
        "category": Optional[List[CategoryAnnotation]],
    }
}

Return type:

dict

scene_and_annotation_generator(page_size=10)

Provides a generator of all Scenes and Annotations in the dataset grouped by scene.

Parameters:

page_size (int) – Number of scenes to fetch per page. Default is 10.

Returns:

Generator where each element is a nested dict containing scene and annotation information of the dataset structured as a JSON.

Track grouping is slightly more complicated and is done in the following order:

1. If track_id is defined in the annotations table, use it as the track ID.
2. If track_reference_id is defined in the metadata field of the annotations table, use it as the track ID.
3. If track_id is defined in the metadata field of the annotations table, use it as the track ID.
4. If track_id is not defined and annotation_id is defined, use annotation_id as the track ID.
5. If annotation_id grouping is unsuccessful, such that the annotation ID is unique across all frames in the scene for all annotations, throw an error that the annotation format is incompatible.
6. If there is no track or annotation ID, throw an error that the annotation format is incompatible.

If you use the generator and discover that no scenes were generated, check the response error message for more information. It is likely that the annotations are not in the correct format.

Iterable[{
    "scene": {
        "id": str,
        "reference_id": str,
        "metadata": Dict[str, Any],
        "type": str,
        "fileLocation": str,
    },
    "annotations": {
        "{trackId}": {
            "label": str,
            "name": str,
            "frames": List[{
                "left": int,
                "top": int,
                "width": int,
                "height": int,
                "key": str, # frame key
                "metadata": Dict[str, Any]
            }]
        }
    }
}]

This is similar to how the Scale API returns task data.
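The track grouping order described for this generator can be sketched as a small resolver. This is an illustration rather than the backend logic: annotations are modeled as flat dicts with optional "track_id", "annotation_id", and "metadata" keys, which is an assumed layout, and the helper names are hypothetical. Rule 5 is read here as "every annotation_id used as a fallback appears only once".

```python
from typing import Any, Dict, List, Tuple


def resolve_track_id(annotation: Dict[str, Any]) -> Tuple[str, bool]:
    """Return (track_id, came_from_annotation_id) following rules 1-4 and 6."""
    metadata = annotation.get("metadata") or {}
    for candidate in (
        annotation.get("track_id"),          # rule 1: annotations table
        metadata.get("track_reference_id"),  # rule 2: metadata field
        metadata.get("track_id"),            # rule 3: metadata field
    ):
        if candidate is not None:
            return candidate, False
    annotation_id = annotation.get("annotation_id")  # rule 4: fallback
    if annotation_id is not None:
        return annotation_id, True
    # Rule 6: nothing usable to group on.
    raise ValueError("incompatible annotation format: no track or annotation id")


def group_by_track(
    annotations: List[Dict[str, Any]]
) -> Dict[str, List[Dict[str, Any]]]:
    """Group one scene's annotations by their resolved track ID."""
    groups: Dict[str, List[Dict[str, Any]]] = {}
    fallback_ids = set()
    for annotation in annotations:
        track_id, from_annotation_id = resolve_track_id(annotation)
        if from_annotation_id:
            fallback_ids.add(track_id)
        groups.setdefault(track_id, []).append(annotation)
    # Rule 5: annotation_id grouping failed if no fallback ID ever repeats.
    if fallback_ids and all(len(groups[t]) == 1 for t in fallback_ids):
        raise ValueError("incompatible annotation format: annotation ids are unique")
    return groups
```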

set_continuous_indexing(enable=True)

Toggle whether embeddings are automatically generated for new data.

Sets continuous indexing for a given dataset, which will automatically generate embeddings for use with autotag whenever new images are uploaded.

Parameters:

enable (bool) – Whether to enable or disable continuous indexing. Default is True.

Returns:

Response payload:

{
    "dataset_id": str,
    "message": str,
    "backfill_job": AsyncJob,
}

set_primary_index(image=True, custom=False)

Sets the primary index used for Autotag and Similarity Search on this dataset.

Parameters:
  • image (bool) – Whether to configure the primary index for images or objects. Default is True (set primary image index).

  • custom (bool) – Whether to set the primary index to use custom or Nucleus-generated embeddings. Default is False (use Nucleus-generated embeddings as the primary index).

Returns:

{
    "success": bool,
}

update_autotag(autotag_id)

Rerun autotag inference on all items in the dataset.

Currently this endpoint does not try to skip already inferenced items, but this improvement is planned for the future. This means that for now, you can only have one job running at a time, so please await the result using job.sleep_until_complete() before launching another job.

Parameters:

autotag_id (str) – ID of the autotag to re-inference. You can retrieve the ID you want with list_autotags(), or from its URL in the “Manage Autotags” page in the dashboard.

Returns:

Asynchronous job object to track processing status.

Return type:

AsyncJob

update_item_metadata(mapping, asynchronous=False)

Update (merge) dataset item metadata for each reference_id given in the mapping. The backend will join the specified mapping metadata to the existing metadata. If there is a key-collision, the value given in the mapping will take precedence.

This method may also be used to update the camera_params for a particular set of items. Just specify the key camera_params in the metadata for each reference_id along with all the necessary fields.

Parameters:
  • mapping (Dict[str, dict]) – key-value pair of <reference_id>: <metadata>

  • asynchronous (bool) – if True, run the update as a background job

Examples

>>> mapping = {"item_ref_1": {"new_key": "foo"}, "item_ref_2": {"some_value": 123, "camera_params": {...}}}
>>> dataset.update_item_metadata(mapping)
Returns:

A dictionary outlining success or failures.
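The merge semantics described for this method (existing keys are kept, and on a key collision the value from the mapping wins) can be modeled locally as follows. This is only a sketch of the stated behavior using a top-level merge. Whether the backend merges nested values is not stated here, so that is left out, and the helper name `merge_metadata` is hypothetical.

```python
from typing import Any, Dict


def merge_metadata(existing: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return existing metadata with the update applied; update wins on collisions."""
    merged = dict(existing)  # copy so the caller's dict is not mutated
    merged.update(update)
    return merged
```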

update_scene_metadata(mapping, asynchronous=False)

Update (merge) scene metadata for each reference_id given in the mapping. The backend will join the specified mapping metadata to the existing metadata. If there is a key-collision, the value given in the mapping will take precedence.

Parameters:
  • mapping (Dict[str, dict]) – key-value pair of <reference_id>: <metadata>

  • asynchronous (bool) – if True, run the update as a background job

Examples

>>> mapping = {"scene_ref_1": {"new_key": "foo"}, "scene_ref_2": {"some_value": 123}}
>>> dataset.update_scene_metadata(mapping)
Returns:

A dictionary outlining success or failures.

upload_lidar_semseg_predictions(model, pointcloud_ref_id, predictions_s3_path)

Upload Lidar Semantic Segmentation predictions for a given point-cloud.

Assuming a point-cloud with only 4 points (three labeled as Car, one labeled as Person), the contents of the predictions s3 object should be formatted as such:

{
    "objects": [
        { "label": "Car", "index": 1},
        { "label": "Person", "index": 2}
    ],
    "point_objects": [1, 1, 1, 2],
    "point_confidence": [0.5, 0.9, 0.9, 0.3]
}

The points in "point_objects" should be in the same order as the points that were originally uploaded to Scale.

Parameters:
  • model (Model) – Nucleus model used to store these predictions

  • pointcloud_ref_id (str) – The reference ID of the pointcloud to which these predictions belong

  • predictions_s3_path (str) – S3 path to where the predictions are stored
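A predictions object in the format shown above can be assembled from per-point labels and confidences. The sketch below assumes that object indices start at 1 and are assigned in order of first appearance, as in the four-point example. The helper name `build_semseg_payload` is hypothetical, and writing the result to S3 is left out.

```python
from typing import Any, Dict, List


def build_semseg_payload(
    point_labels: List[str], point_confidence: List[float]
) -> Dict[str, Any]:
    """Build the predictions object from per-point labels, in upload order."""
    if len(point_labels) != len(point_confidence):
        raise ValueError("one confidence value is expected per point")
    label_to_index: Dict[str, int] = {}
    point_objects = []
    for label in point_labels:
        # A new label gets the next index (1-based); a known label reuses its index.
        index = label_to_index.setdefault(label, len(label_to_index) + 1)
        point_objects.append(index)
    return {
        "objects": [
            {"label": label, "index": index}
            for label, index in label_to_index.items()
        ],
        "point_objects": point_objects,
        "point_confidence": list(point_confidence),
    }
```

Serializing the returned dict with `json.dumps` yields the contents expected at `predictions_s3_path`.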

upload_predictions(model, predictions, update=False, asynchronous=False, batch_size=5000, remote_files_per_upload_request=20, local_files_per_upload_request=10, trained_slice_id=None)

Uploads predictions and associates them with an existing Model.

Adding predictions to your dataset in Nucleus allows you to visualize discrepancies against ground truth, query dataset items based on the predictions they contain, and evaluate your models by comparing their predictions to ground truth.

Nucleus supports Box, Polygon, Cuboid, Segmentation, Category, and Scene Category predictions. Cuboid predictions can only be uploaded to a pointcloud DatasetItem.

When uploading a prediction, you need to specify which item you are annotating via the reference_id you provided when uploading the image or pointcloud.

Prediction uploads can be made idempotent by specifying an optional annotation_id for each prediction. This ID should be unique within the dataset item so that (reference_id, annotation_id) is unique within the dataset.

See SegmentationPrediction for specific requirements to upload segmentation predictions.

For ingesting large prediction payloads, see the Guide for Large Ingestions.

Parameters:
  • model (Model) – Nucleus-generated model ID (starts with prj_). This can be retrieved via list_models() or a Nucleus dashboard URL.

  • predictions (List[Union[ BoxPrediction, PolygonPrediction, CuboidPrediction, SegmentationPrediction, CategoryPrediction, SceneCategoryPrediction ]]) – List of prediction objects to upload.

  • update (bool) – Whether or not to overwrite metadata or ignore on reference ID collision. Default is False.

  • asynchronous (bool) – Whether or not to process the upload asynchronously (and return an AsyncJob object). Default is False.

  • batch_size (int) – Number of predictions processed in each concurrent batch. Default is 5000. If you get timeouts when uploading geometric predictions, you can try lowering this batch size. This is only relevant for asynchronous=False.

  • remote_files_per_upload_request (int) – Number of remote files to upload in each request. Segmentations have either local or remote files. If you are getting timeouts while uploading segmentations with remote URLs, lower this value from its default of 20. This is only relevant for asynchronous=False.

  • local_files_per_upload_request (int) – Number of local files to upload in each request. Segmentations have either local or remote files. If you are getting timeouts while uploading segmentations with local files, lower this value from its default of 10. The maximum is 10. This is only relevant for asynchronous=False.

  • trained_slice_id (Optional[str]) – Nucleus-generated slice ID (starts with slc_) which was used to train the model.

Returns:

Payload describing the synchronous upload:

{
    "dataset_id": str,
    "model_run_id": str,
    "predictions_processed": int,
    "predictions_ignored": int,
}
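Since idempotent uploads rely on (reference_id, annotation_id) being unique within the dataset, it can be useful to check a batch for repeated pairs before uploading. A minimal sketch follows. It works on plain tuples instead of prediction objects, skips entries without an annotation_id, and the helper name `duplicate_prediction_keys` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def duplicate_prediction_keys(
    keys: Iterable[Tuple[str, Optional[str]]]
) -> List[Tuple[str, str]]:
    """Return (reference_id, annotation_id) pairs that occur more than once."""
    # Predictions without an annotation_id cannot collide on this key.
    counts = Counter(key for key in keys if key[1] is not None)
    return sorted(key for key, count in counts.items() if count > 1)
```

An empty result means every supplied pair is unique within the batch. It says nothing about pairs that already exist in the dataset.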

class nucleus.DatasetInfo(**data)

High-level Dataset information

Parameters:

data (Any)

dataset_id

Nucleus-generated dataset ID

name

User-defined name of dataset

length

Number of DatasetItems in the Dataset

model_run_ids

(deprecated)

slice_ids

List of Slice IDs associated with the Dataset

annotation_metadata_schema

Dict defining annotation-level metadata schema.

item_metadata_schema

Dict defining item metadata schema.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

classmethod construct(_fields_set=None, **values)

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set since it adds all passed values.

Parameters:
  • _fields_set (Optional[pydantic.v1.typing.SetStr])

  • values (Any)

Return type:

Model

copy(*, include=None, exclude=None, update=None, deep=False)

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters:
  • include (Optional[Union[pydantic.v1.typing.AbstractSetIntStr, pydantic.v1.typing.MappingIntStrAny]]) – fields to include in new model

  • exclude (Optional[Union[pydantic.v1.typing.AbstractSetIntStr, pydantic.v1.typing.MappingIntStrAny]]) – fields to exclude from new model, as with values this takes precedence over include

  • update (Optional[pydantic.v1.typing.DictStrAny]) – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep (bool) – set to True to make a deep copy of the model

Returns:

new model instance

Return type:

Model

dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:
  • include (Optional[Union[pydantic.v1.typing.AbstractSetIntStr, pydantic.v1.typing.MappingIntStrAny]])

  • exclude (Optional[Union[pydantic.v1.typing.AbstractSetIntStr, pydantic.v1.typing.MappingIntStrAny]])

  • by_alias (bool)

  • skip_defaults (Optional[bool])

  • exclude_unset (bool)

  • exclude_defaults (bool)

  • exclude_none (bool)

Return type:

pydantic.v1.typing.DictStrAny

json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

Parameters:
  • include (Optional[Union[pydantic.v1.typing.AbstractSetIntStr, pydantic.v1.typing.MappingIntStrAny]])

  • exclude (Optional[Union[pydantic.v1.typing.AbstractSetIntStr, pydantic.v1.typing.MappingIntStrAny]])

  • by_alias (bool)

  • skip_defaults (Optional[bool])

  • exclude_unset (bool)

  • exclude_defaults (bool)

  • exclude_none (bool)

  • encoder (Optional[Callable[[Any], Any]])

  • models_as_dict (bool)

  • dumps_kwargs (Any)

Return type:

str

classmethod model_construct(_fields_set=None, **values)

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == ‘allow’, then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == ‘ignore’ (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == ‘forbid’ does not result in an error if extra values are passed, but they will be ignored.

Parameters:
  • _fields_set (set[str] | None) – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute. Otherwise, the field names from the values argument will be used.

  • values (Any) – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

Return type:

typing_extensions.Self

model_copy(*, update=None, deep=False)

Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#model_copy

Returns a copy of the model.

Parameters:
  • update (dict[str, Any] | None) – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.

  • deep (bool) – Set to True to make a deep copy of the model.

Returns:

New model instance.

Return type:

typing_extensions.Self

model_dump(*, mode='python', include=None, exclude=None, context=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False, round_trip=False, warnings=True, serialize_as_any=False)

Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#modelmodel_dump

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:
  • mode (Literal['json', 'python'] | str) – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.

  • include (IncEx | None) – A set of fields to include in the output.

  • exclude (IncEx | None) – A set of fields to exclude from the output.

  • context (Any | None) – Additional context to pass to the serializer.

  • by_alias (bool) – Whether to use the field’s alias in the dictionary key if defined.

  • exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.

  • exclude_defaults (bool) – Whether to exclude fields that are set to their default value.

  • exclude_none (bool) – Whether to exclude fields that have a value of None.

  • round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].

  • warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError.

  • serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

Return type:

dict[str, Any]

model_dump_json(*, indent=None, include=None, exclude=None, context=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False, round_trip=False, warnings=True, serialize_as_any=False)

Usage docs: https://docs.pydantic.dev/2.9/concepts/serialization/#modelmodel_dump_json

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:
  • indent (int | None) – Indentation to use in the JSON output. If None is passed, the output will be compact.

  • include (IncEx | None) – Field(s) to include in the JSON output.

  • exclude (IncEx | None) – Field(s) to exclude from the JSON output.

  • context (Any | None) – Additional context to pass to the serializer.

  • by_alias (bool) – Whether to serialize using field aliases.

  • exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.

  • exclude_defaults (bool) – Whether to exclude fields that are set to their default value.

  • exclude_none (bool) – Whether to exclude fields that have a value of None.

  • round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].

  • warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError.

  • serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Return type:

str

classmethod model_json_schema(by_alias=True, ref_template=DEFAULT_REF_TEMPLATE, schema_generator=GenerateJsonSchema, mode='validation')

Generates a JSON schema for a model class.

Parameters:
  • by_alias (bool) – Whether to use attribute aliases or not.

  • ref_template (str) – The reference template.

  • schema_generator (type[pydantic.json_schema.GenerateJsonSchema]) – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications

  • mode (pydantic.json_schema.JsonSchemaMode) – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

Return type:

dict[str, Any]

classmethod model_parametrized_name(params)

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:

params (tuple[type[Any], Ellipsis]) – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.

Returns:

String representing the new class where params are passed to cls as type variables.

Raises:

TypeError – Raised when trying to generate concrete names for non-generic models.

Return type:

str

model_post_init(__context)

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

Parameters:

__context (Any)

Return type:

None

classmethod model_rebuild(*, force=False, raise_errors=True, _parent_namespace_depth=2, _types_namespace=None)

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:
  • force (bool) – Whether to force the rebuilding of the model schema, defaults to False.

  • raise_errors (bool) – Whether to raise errors, defaults to True.

  • _parent_namespace_depth (int) – The depth level of the parent namespace, defaults to 2.

  • _types_namespace (dict[str, Any] | None) – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Return type:

bool | None

classmethod model_validate(obj, *, strict=None, from_attributes=None, context=None)

Validate a pydantic model instance.

Parameters:
  • obj (Any) – The object to validate.

  • strict (bool | None) – Whether to enforce types strictly.

  • from_attributes (bool | None) – Whether to extract data from object attributes.

  • context (Any | None) – Additional context to pass to the validator.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Return type:

typing_extensions.Self

classmethod model_validate_json(json_data, *, strict=None, context=None)

Usage docs: https://docs.pydantic.dev/2.9/concepts/json/#json-parsing

Validate the given JSON data against the Pydantic model.

Parameters:
  • json_data (str | bytes | bytearray) – The JSON data to validate.

  • strict (bool | None) – Whether to enforce types strictly.

  • context (Any | None) – Extra variables to pass to the validator.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

Return type:

typing_extensions.Self

classmethod model_validate_strings(obj, *, strict=None, context=None)

Validate the given object with string data against the Pydantic model.

Parameters:
  • obj (Any) – The object containing string data to validate.

  • strict (bool | None) – Whether to enforce types strictly.

  • context (Any | None) – Extra variables to pass to the validator.

Returns:

The validated Pydantic model.

Return type:

typing_extensions.Self

classmethod update_forward_refs(**localns)

Try to update ForwardRefs on fields based on this Model, globalns and localns.

Parameters:

localns (Any)

Return type:

None

class nucleus.DatasetItem

A dataset item is an image or pointcloud that has associated metadata.

Note: for 3D data, please include a CameraParams object under a key named “camera_params” within the metadata dictionary. This will allow for projecting 3D annotations to any image within a scene.

Parameters:
  • image_location (Optional[str]) – Required if pointcloud_location is not present: The location containing the image for the given row of data. This can be a local path or a remote URL. Remote formats supported include any URL (http:// or https://) or URIs for AWS S3, Azure, or GCS (e.g. s3://, gcs://).

  • pointcloud_location (Optional[str]) – Required if image_location is not present: The remote URL containing the pointcloud JSON. Remote formats supported include any URL (http:// or https://) or URIs for AWS S3, Azure, or GCS (e.g. s3://, gcs://).

  • reference_id (Optional[str]) – A user-specified identifier to reference the item.

  • metadata (Optional[dict]) –

    Extra information about the particular dataset item. Int, float, and string values are made searchable in the query bar by their key in this dict. For example, {"animal": "dog"} becomes searchable via metadata.animal = "dog".

    Categorical data can be passed as a string and will be treated categorically by Nucleus if there are fewer than 250 unique values in the dataset. This enables histograms of values in the “Insights” section and autocomplete within the query bar.

    Numerical metadata will generate histograms in the “Insights” section, allow for sorting the results of any query, and can be used with the modulo operator. For example: metadata.frame_number % 5 = 0

    All other types of metadata will be visible from the dataset item detail view.

    It is important that string and numerical metadata fields are consistent - if a metadata field has a string value, then all metadata fields with the same key should also have string values, and vice versa for numerical metadata. If conflicting types are found, Nucleus will return an error during upload!

    The recommended way of adding or updating existing metadata is to re-run the ingestion (dataset.append) with update=True, which will replace any existing metadata with whatever your new ingestion run uses. This will delete any metadata keys that are not present in the new ingestion run. We have a cache based on image_location that will skip the need for a re-upload of the images, so your second ingestion will be faster than your first.

    For 3D (sensor fusion) data, it is highly recommended to include camera intrinsics in the metadata of your camera image items. Nucleus requires these intrinsics to create visualizations such as cuboid projections. Refer to our guide to uploading 3D data for more info.

    Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as { "lat": 52.5, "lon": 13.3, … }.

    Context Attachments may be provided to display the attachments side by side with the dataset item in the Detail View by specifying { "context_attachments": [ { "attachment": "https://example.com/1" }, { "attachment": "https://example.com/2" }, … ] }.
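
Because conflicting metadata types cause an error during upload, it can help to check for them locally first. The following is a minimal sketch that works on plain metadata dicts and does not call the Nucleus API; find_conflicting_metadata_keys is a hypothetical helper, not part of the SDK.

```python
from collections import defaultdict
from numbers import Number
from typing import Any, Dict, List, Set


def find_conflicting_metadata_keys(
    metadata_list: List[Dict[str, Any]]
) -> Dict[str, Set[str]]:
    """Return keys whose values mix string and numerical types across items."""
    kinds_by_key: Dict[str, Set[str]] = defaultdict(set)
    for metadata in metadata_list:
        for key, value in metadata.items():
            if isinstance(value, str):
                kinds_by_key[key].add("string")
            elif isinstance(value, Number) and not isinstance(value, bool):
                kinds_by_key[key].add("number")
            # Other types (dicts, lists, bools, None) are not checked here.
    return {key: kinds for key, kinds in kinds_by_key.items() if len(kinds) > 1}


items_metadata = [
    {"animal": "dog", "frame_number": 5},
    {"animal": "cat", "frame_number": "10"},  # string where a number is expected
]
print(find_conflicting_metadata_keys(items_metadata))
```

Any key reported by this check should be converted to a single type before calling dataset.append.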

classmethod from_json(payload)

Instantiates dataset item object from schematized JSON dict payload.

Parameters:

payload (dict)

to_json()

Serializes dataset item object to schematized JSON string.

Return type:

str

to_payload(is_scene=False)

Serializes dataset item object to schematized JSON dict.

Return type:

dict

class nucleus.EmbeddingsExportJob

Object used to check the status or errors of a long running asynchronous operation.

import nucleus

client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)
dataset = client.get_dataset("ds_bwkezj6g5c4g05gqp1eg")

# When kicking off an asynchronous job, store the return value as a variable
job = dataset.append(items=YOUR_DATASET_ITEMS, asynchronous=True)

# Poll for status or errors
print(job.status())
print(job.errors())

# Block until job finishes
job.sleep_until_complete()

errors()

Fetches a list of the latest errors generated by the asynchronous job.

Useful for debugging failed or partially successful jobs.

Returns:

A list of strings containing the 10,000 most recently generated errors.

[
    '{"annotation":{"label":"car","type":"box","geometry":{"x":50,"y":60,"width":70,"height":80},"referenceId":"bad_ref_id","annotationId":"attempted_annot_upload","metadata":{}},"error":"Item with id bad_ref_id does not exist."}'
]

Return type:

List[str]
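
With up to 10,000 error strings, a summary is often easier to read than the raw list. The sketch below assumes each entry is a JSON string with an "error" key, as in the example above, and falls back to the raw string otherwise; summarize_errors is a hypothetical helper, not part of the SDK.

```python
import json
from collections import Counter
from typing import List


def summarize_errors(errors: List[str]) -> Counter:
    """Count error messages, falling back to the raw string when parsing fails."""
    counts: Counter = Counter()
    for raw in errors:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            counts[raw] += 1
            continue
        if isinstance(parsed, dict) and "error" in parsed:
            counts[str(parsed["error"])] += 1
        else:
            counts[raw] += 1
    return counts


# Stand-ins for the strings returned by job.errors().
sample_errors = [
    '{"annotation":{"referenceId":"bad_ref_id"},"error":"Item with id bad_ref_id does not exist."}',
    "not a JSON string",
]
for message, count in summarize_errors(sample_errors).most_common():
    print(count, message)
```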

classmethod from_id(job_id, client)

Creates a job instance from a specific job ID.

Parameters:
  • job_id (str) – The ID of the job.

  • client (NucleusClient) – The client to use for the request.

Returns:

The specific AsyncJob (or inherited) instance.

result_urls(wait_for_completion=True)

Gets a list of signed Scale URLs for each embedding batch.

Parameters:

wait_for_completion – Defines whether the call shall wait for the job to complete. Defaults to True

Returns:

A list of signed Scale URLs which contain batches of embeddings.

The files contain a JSON array of embedding records with the following schema:

[
    {
        "reference_id": str,
        "embedding_vector": List[float]
    }
]

Return type:

List[str]
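
Once a batch file has been downloaded, its records can be indexed by reference ID. The sketch below parses the text of one batch that follows the schema above; it does not download anything, and parse_embedding_batch is a hypothetical helper, not part of the SDK.

```python
import json
from typing import Dict, List


def parse_embedding_batch(batch_text: str) -> Dict[str, List[float]]:
    """Map each reference_id to its embedding vector for one batch file."""
    records = json.loads(batch_text)
    return {
        record["reference_id"]: [float(x) for x in record["embedding_vector"]]
        for record in records
    }


# Stand-in for the contents of one downloaded batch file.
sample_batch = '[{"reference_id": "image_1", "embedding_vector": [0.1, 0.2, 0.3]}]'
embeddings = parse_embedding_batch(sample_batch)
print(len(embeddings["image_1"]))
```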

sleep_until_complete(verbose_std_out=True)

Blocks until the job completes or errors.

Parameters:

verbose_std_out (Optional[bool]) – Whether or not to verbosely log while sleeping. Defaults to True.

status()

Fetches status of the job and an informative message on job progress.

Returns:

A dict of the job ID, status (one of Running, Completed, or Errored), an informative message on the job progress, and number of both completed and total steps.

{
    "job_id": "job_c19xcf9mkws46gah0000",
    "status": "Completed",
    "message": "Job completed successfully.",
    "job_progress": "0.33",
    "completed_steps": "1",
    "total_steps": "3",
}

Return type:

Dict[str, str]
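
sleep_until_complete() blocks without a time limit. If a limit is needed, status() can be polled directly. The following is a minimal sketch under the assumption that "status" is one of Running, Completed, or Errored as described above; wait_for_job and FakeJob are hypothetical names used only for illustration.

```python
import time
from typing import Any, Dict


def wait_for_job(
    job: Any, timeout_s: float = 600.0, poll_interval_s: float = 5.0
) -> Dict[str, str]:
    """Poll job.status() until it leaves the Running state or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = job.status()
        if status.get("status") != "Running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job {status.get('job_id')} still running after {timeout_s} seconds"
            )
        time.sleep(poll_interval_s)


class FakeJob:
    """Stand-in for a real job: reports Running twice, then Completed."""

    def __init__(self) -> None:
        self._calls = 0

    def status(self) -> Dict[str, str]:
        self._calls += 1
        state = "Running" if self._calls < 3 else "Completed"
        return {"job_id": "job_example", "status": state}


final_status = wait_for_job(FakeJob(), timeout_s=5.0, poll_interval_s=0.01)
print(final_status["status"])
```

Any object with a compatible status() method can be passed in place of FakeJob.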

class nucleus.Frame(**kwargs)

Collection of sensor data pertaining to a single time step.

For 3D data, each Frame houses a sensor-to-data mapping and must have exactly one pointcloud with any number of camera images.

Parameters:

**kwargs (Dict[str, DatasetItem]) – Mappings from sensor name to dataset item. Each frame of a lidar scene must contain exactly one pointcloud and any number of images (e.g. from different angles).

Refer to our guide to uploading 3D data for more info!
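
The "exactly one pointcloud" rule can be checked before building a Frame. The sketch below uses plain dicts as stand-ins for DatasetItem objects, keyed by sensor name; it does not use the SDK, and count_pointclouds and is_valid_lidar_frame are hypothetical helpers.

```python
from typing import Dict, Optional

# Plain-dict stand-in for a DatasetItem: only the two location fields matter here.
ItemLike = Dict[str, Optional[str]]


def count_pointclouds(frame_items: Dict[str, ItemLike]) -> int:
    """Count the items in a frame that carry a pointcloud_location."""
    return sum(1 for item in frame_items.values() if item.get("pointcloud_location"))


def is_valid_lidar_frame(frame_items: Dict[str, ItemLike]) -> bool:
    """A frame is valid when it holds exactly one pointcloud; images are unrestricted."""
    return count_pointclouds(frame_items) == 1


frame_items = {
    "lidar": {"pointcloud_location": "s3://example-bucket/frame_0.json"},
    "front_cam": {"image_location": "s3://example-bucket/front_0.jpg"},
    "rear_cam": {"image_location": "s3://example-bucket/rear_0.jpg"},
}
print(is_valid_lidar_frame(frame_items))
```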

add_item(item, sensor_name)

Adds DatasetItem object to frame as sensor data.

Parameters:
  • item (DatasetItem) – Pointcloud or camera image item to add.

  • sensor_name (str) – Name of the sensor, e.g. “lidar” or “front_cam”.

Return type:

None

classmethod from_json(payload)

Instantiates frame object from schematized JSON dict payload.

Parameters:

payload (dict)

get_item(sensor_name)

Fetches the DatasetItem object associated with the given sensor.

Parameters:

sensor_name (str) – Name of the sensor, e.g. “lidar” or “front_cam”.

Returns:

DatasetItem object pertaining to the sensor.

Return type:

DatasetItem

get_items()

Fetches all items in the frame.

Returns:

List of all DatasetItem objects in the frame.

Return type:

List[DatasetItem]

get_sensors()

Fetches all sensor names of the frame.

Returns:

List of all sensor names of the frame.

Return type:

List[str]

to_payload()

Serializes frame object to schematized JSON dict.

Return type:

dict

class nucleus.Keypoint

A 2D point that has an additional visibility flag.

Keypoints are intended to be part of a larger collection, and connected via a pre-defined skeleton. A keypoint in this skeleton may be visible or not-visible, and may be unlabeled and not visible. Because of this, the x, y coordinates may be optional, assuming that the keypoint is not visible, and would not be shown as part of the combined label.

Parameters:
  • x (Optional[float]) – The x coordinate of the point.

  • y (Optional[float]) – The y coordinate of the point.

  • visible (bool) – The visibility of the point.

class nucleus.KeypointsAnnotation

A keypoints annotation containing a list of keypoints and the structure of those keypoints: the naming of each point and the skeleton that connects those keypoints.

from nucleus import Keypoint, KeypointsAnnotation

keypoints = KeypointsAnnotation(
    label="face",
    keypoints=[Keypoint(100, 100), Keypoint(120, 120), Keypoint(visible=False), Keypoint(0, 0)],
    names=["point1", "point2", "point3", "point4"],
    skeleton=[[0, 1], [1, 2], [1, 3], [2, 3]],
    reference_id="image_2",
    annotation_id="image_2_face_keypoints_1",
    metadata={"face_direction": "forward"},
    track_reference_id="face_1",
)

Parameters:
  • label (str) – The label for this annotation.

  • keypoints (List[Keypoint]) – The list of keypoints objects.

  • names (List[str]) – A list that corresponds to the names of each keypoint.

  • skeleton (Optional[List[List[int]]]) – A list of 2-length lists indicating a beginning and ending index for each line segment in the skeleton of this keypoint label.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • track_reference_id (Optional[str]) – A unique string to identify the annotation as part of a group. For instance, multiple “car” annotations across several dataset items may have the same track_reference_id such as “red_car”.
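
A skeleton is only meaningful if every index refers to an existing keypoint. The sketch below assumes the indices are zero-based positions into the names list, which is consistent with the example above (indices 0 to 3 for four keypoints); invalid_skeleton_edges is a hypothetical helper, not part of the SDK.

```python
from typing import List, Sequence


def invalid_skeleton_edges(
    names: Sequence[str], skeleton: Sequence[Sequence[int]]
) -> List[Sequence[int]]:
    """Return skeleton edges that are not a pair of valid keypoint indices."""
    bad = []
    for edge in skeleton:
        is_pair = len(edge) == 2
        in_range = all(isinstance(i, int) and 0 <= i < len(names) for i in edge)
        if not (is_pair and in_range):
            bad.append(edge)
    return bad


names = ["point1", "point2", "point3", "point4"]
skeleton = [[0, 1], [1, 2], [1, 3], [2, 3]]
print(invalid_skeleton_edges(names, skeleton))
```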

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.)

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.KeypointsPrediction(label, keypoints, names, skeleton, reference_id, confidence=None, annotation_id=None, metadata=None, class_pdf=None, track_reference_id=None)

Prediction of keypoints.

Parameters:
  • label (str) – The label for this annotation (e.g. car, pedestrian, bicycle).

  • keypoints (List[Keypoint]) – The list of keypoints objects.

  • names (List[str]) – A list that corresponds to the names of each keypoint.

  • skeleton (List[List[int]]) – A list of 2-length lists indicating a beginning and ending index for each line segment in the skeleton of this keypoint label.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • confidence (Optional[float]) – A value between 0 and 1 indicating the confidence of the prediction.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • class_pdf (Optional[Dict]) – An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain.

  • track_reference_id (Optional[str]) – A unique string to identify the prediction as part of a group. For instance, multiple “car” predictions across several dataset items may have the same track_reference_id such as “red_car”.
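
The class_pdf description mentions computing entropy to find uncertain predictions. The following is a minimal sketch of Shannon entropy (in bits) over a class_pdf dict; class_pdf_entropy is a hypothetical helper, and how Nucleus itself computes entropy is not specified here.

```python
import math
from typing import Dict


def class_pdf_entropy(class_pdf: Dict[str, float]) -> float:
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in class_pdf.values() if p > 0)


uncertain = {"left_hand": 0.5, "right_hand": 0.5}
confident = {"left_hand": 0.95, "right_hand": 0.05}

# A higher entropy marks the prediction the model is less sure about.
print(class_pdf_entropy(uncertain) > class_pdf_entropy(confident))
```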

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.)

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.LidarPoint

A Lidar point in 3D space and intensity.

Parameters:
  • x (float) – The x coordinate of the point.

  • y (float) – The y coordinate of the point.

  • z (float) – The z coordinate of the point.

  • i (float) – The intensity value returned by the lidar scan point.

class nucleus.LidarScene

Sequence of lidar pointcloud and camera images over time.

Nucleus 3D datasets consist of LidarScenes, which are sequences of lidar pointclouds and camera images over time. These sequences in turn consist of Frames.

By organizing data across multiple sensors over time, LidarScenes make it easier to interpret pointclouds, allowing you to see objects move over time by clicking through frames and providing context in the form of corresponding images.

You can think of scenes and frames as nested groupings of sensor data across time:

  • LidarScene for a given location
    • Frame at timestep 0
      • DatasetItem of pointcloud

      • DatasetItem of front camera image

      • DatasetItem of rear camera image

    • Frame at timestep 1
  • LidarScene for another location

LidarScenes are uploaded to a Dataset with any accompanying metadata. Frames do not accept metadata, but each of its constituent DatasetItems does.

Note: Uploads with a different number of frames/items will error out (only on scenes that now differ). Existing scenes are expected to retain the same structure, i.e. the same number of frames, and same items per frame. If a scene definition is changed (for example, additional frames added) the update operation will be ignored. If you would like to alter the structure of a scene, please delete the scene and re-upload.

Parameters:
  • reference_id (str) – User-specified identifier to reference the scene.

  • frames (Optional[List[Frame]]) – List of frames to be a part of the scene. A scene can be created before frames or items have been added to it, but must be non-empty when uploading to a Dataset.

  • metadata (Optional[Dict]) –

    Optional metadata to include with the scene.

    Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as { “lat”: 52.5, “lon”: 13.3, … }.

    Context Attachments may be provided to display the attachments side by side with the dataset item in the Detail View by specifying { “context_attachments”: [ { “attachment”: ‘https://example.com/1’ }, { “attachment”: ‘https://example.com/2’ }, … ] }.

Refer to our guide to uploading 3D data for more info!
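
Since an update is ignored when a scene's structure changes, it can be useful to compare structures before re-uploading. The sketch below represents a scene as a list of frames, each a list of sensor names, and treats "same items per frame" as "same sensor names per frame"; that interpretation and the helper has_same_structure are assumptions, not SDK behavior.

```python
from typing import Sequence


def has_same_structure(
    existing: Sequence[Sequence[str]], updated: Sequence[Sequence[str]]
) -> bool:
    """True when both scenes have the same frame count and the same sensors per frame."""
    if len(existing) != len(updated):
        return False
    return all(sorted(old) == sorted(new) for old, new in zip(existing, updated))


existing_scene = [["lidar", "front_cam"], ["lidar", "front_cam"]]
extra_frame = existing_scene + [["lidar", "front_cam"]]

print(has_same_structure(existing_scene, existing_scene))
print(has_same_structure(existing_scene, extra_frame))
```

If the structures differ, the documented path is to delete the scene and re-upload it.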

add_frame(frame, index, update=False)

Adds frame to scene at the specified index.

Parameters:
  • frame (Frame) – Frame object to add.

  • index (int) – Serial index at which to add the frame.

  • update (bool) – Whether to overwrite the frame at the specified index, if it exists. Default is False.

Return type:

None

add_item(index, sensor_name, item)

Adds DatasetItem to the specified frame as sensor data.

Parameters:
  • index (int) – Serial index of the frame to which to add the item.

  • item (DatasetItem) – Pointcloud or camera image item to add.

  • sensor_name (str) – Name of the sensor, e.g. “lidar” or “front_cam”.

Return type:

None

classmethod from_json(payload, client=None, skip_validate=False)

Instantiates scene object from schematized JSON dict payload.

get_frame(index)

Fetches the Frame object at the specified index.

Parameters:

index (int) – Serial index for which to retrieve the Frame.

Returns:

Frame object at the specified index.

Return type:

Frame

get_frames()

Fetches a sorted list of Frames of the scene.

Returns:

List of Frames, sorted by index ascending.

Return type:

List[Frame]

get_item(index, sensor_name)

Fetches the DatasetItem object of the given frame and sensor.

Parameters:
  • index (int) – Serial index of the frame from which to fetch the item.

  • sensor_name (str) – Name of the sensor, e.g. “lidar” or “front_cam”.

Returns:

DatasetItem object of the frame and sensor.

Return type:

DatasetItem

get_items()

Fetches all items in the scene.

Returns:

Unordered list of all DatasetItem objects in the scene.

Return type:

List[DatasetItem]

get_items_from_sensor(sensor_name)

Fetches all DatasetItem objects of the given sensor.

Parameters:

sensor_name (str) – Name of the sensor, e.g. “lidar” or “front_cam”.

Returns:

List of DatasetItem objects associated with the specified sensor.

Return type:

List[DatasetItem]

get_sensors()

Fetches all sensor names of the scene.

Returns:

List of all sensor names associated with frames in the scene.

Return type:

List[str]

info()

Fetches information about the scene.

Returns:

Payload containing:

{
    "reference_id": str,
    "length": int,
    "num_sensors": int
}

to_json()

Serializes scene object to schematized JSON string.

Return type:

str

to_payload()

Serializes scene object to schematized JSON dict.

Return type:

dict

class nucleus.LineAnnotation

A polyline annotation consisting of an ordered list of 2D points. A LineAnnotation differs from a PolygonAnnotation by not forming a closed loop, and by having zero area.

from nucleus import LineAnnotation, Point

line = LineAnnotation(
    label="face",
    vertices=[Point(100, 100), Point(200, 300), Point(300, 200)],
    reference_id="person_image_1",
    annotation_id="person_image_1_line_1",
    metadata={"camera_mode": "portrait"},
    track_reference_id="face_human",
)

Parameters:
  • label (str) – The label for this annotation.

  • vertices (List[Point]) – The list of points making up the line.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • track_reference_id (Optional[str]) – A unique string to identify the annotation as part of a group. For instance, multiple “car” annotations across several dataset items may have the same track_reference_id such as “red_car”.

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.)

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.LinePrediction(label, vertices, reference_id, confidence=None, annotation_id=None, metadata=None, class_pdf=None, track_reference_id=None)

Prediction of a line.

Parameters:
  • label (str) – The label for this prediction (e.g. car, pedestrian, bicycle).

  • vertices (List[Point]) – The list of points making up the line.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • confidence (Optional[float]) – A value between 0 and 1 indicating the confidence of the prediction.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this prediction. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • class_pdf (Optional[Dict]) – An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain.

  • track_reference_id (Optional[str]) – A unique string to identify the prediction as part of a group. For instance, multiple “car” predictions across several dataset items may have the same track_reference_id such as “red_car”.
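The class_pdf constraints (values in [0, 1], summing to 1) and the entropy use case mentioned above can be sketched in plain Python. This is only an illustration: validate_class_pdf and entropy are hypothetical helpers, not part of the SDK, and the tolerance is an assumption.

```python
import math
from typing import Dict


def validate_class_pdf(class_pdf: Dict[str, float], tol: float = 1e-6) -> None:
    """Raise ValueError unless every value is in [0, 1] and the values sum to 1."""
    for label, p in class_pdf.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {label!r} is outside [0, 1]: {p}")
    total = sum(class_pdf.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"probabilities sum to {total}, expected 1")


def entropy(class_pdf: Dict[str, float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in class_pdf.values() if p > 0)


confident = {"car": 0.9, "truck": 0.1}
uncertain = {"car": 0.5, "truck": 0.5}

validate_class_pdf(confident)
validate_class_pdf(uncertain)

# The more evenly spread distribution has the higher entropy.
print(entropy(confident) < entropy(uncertain))
```

Sorting predictions by a score like this is one possible way to surface the items where a model is least certain.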

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.)

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.Model(model_id, name, reference_id, metadata, client, bundle_name=None, tags=None, trained_slice_ids=None)

A model that can be used to upload predictions to a dataset.

By uploading model predictions to Nucleus, you can compare your predictions to ground truth annotations and discover problems with your Models or Dataset.

You can also upload predictions for unannotated images, letting you query them based on model predictions. This can help you prioritize which unlabeled data to label next.

Within Nucleus, Models work in the following way:

  1. You first create a Model. You can do this just once and reuse the model on multiple datasets.

  2. You then upload predictions to a dataset.

  3. Trigger calculation of metrics in order to view model debugging insights.

The steps above will allow you to visualize model performance within Nucleus, or compare multiple models that have been run on the same Dataset.

Note that you can always add more predictions to a dataset, but then you will need to re-run the calculation of metrics in order to have them be correct.

import nucleus

client = nucleus.NucleusClient(YOUR_SCALE_API_KEY)
dataset = client.get_dataset(YOUR_DATASET_ID)

prediction_1 = nucleus.BoxPrediction(
    label="label",
    x=0,
    y=0,
    width=10,
    height=10,
    reference_id="1",
    confidence=0.9,
    class_pdf={"label": 0.9, "other_label": 0.1},
)
prediction_2 = nucleus.BoxPrediction(
    label="label",
    x=0,
    y=0,
    width=10,
    height=10,
    reference_id="2",
    confidence=0.2,
    class_pdf={"label": 0.2, "other_label": 0.8},
)

model = client.create_model(
    name="My Model", reference_id="My-CNN", metadata={"timestamp": "121012401"}
)

# For small ingestions, we recommend synchronous ingestion
response = dataset.upload_predictions(model, [prediction_1, prediction_2])

# For large ingestions, we recommend asynchronous ingestion
job = dataset.upload_predictions(
    model, [prediction_1, prediction_2], asynchronous=True
)
# Check current status
job.status()
# Sleep until ingestion is done
job.sleep_until_complete()
# Check errors
job.errors()

dataset.calculate_evaluation_metrics(model)

Models cannot be instantiated directly and instead must be created via API endpoint, using NucleusClient.create_model().

add_tags(tags)

Tag the model with custom tag names.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
model = client.list_models()[0]

model.add_tags(["tag_A", "tag_B"])
Parameters:

tags (List[str]) – list of tag names

add_trained_slice_ids(slice_ids)

Add trained slice id(s) to the model.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
model = client.list_models()[0]

model.add_trained_slice_ids(["slc_...", "slc_..."])
Parameters:

slice_ids (List[str]) – list of trained slice ids

evaluate(scenario_test_names)

Evaluates this model on the specified scenario tests.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
model = client.list_models()[0]
scenario_test = client.validate.create_scenario_test(
    "sample_scenario_test", "YOUR_SLICE_ID"
)

model.evaluate(["sample_scenario_test"])
Parameters:

scenario_test_names (List[str]) – list of unit tests to evaluate

Returns:

AsyncJob object of evaluation job

Return type:

nucleus.async_job.AsyncJob

classmethod from_json(payload, client)

Instantiates model object from schematized JSON dict payload.

Parameters:

payload (dict)

remove_tags(tags)

Remove tag(s) from the model.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
model = client.list_models()[0]

model.remove_tags(["tag_x"])
Parameters:

tags (List[str]) – list of tag names to remove

remove_trained_slice_ids(slide_ids)

Remove trained slice id(s) from the model.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
model = client.list_models()[0]

model.remove_trained_slice_ids(["slc_...", "slc_..."])
Parameters:
  • slide_ids (List[str]) – list of trained slice ids to remove

run(dataset_id, model_run_name, slice_id)

Runs inference on the bundle associated with the model on the dataset.

import nucleus
client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
model = client.list_models()[0]

model.run("ds_123456", "YOUR_MODEL_RUN_NAME", None)
Parameters:
  • dataset_id (str) – The ID of the dataset to run inference on.

  • model_run_name (str) – The name of the model run.

  • slice_id (Optional[str]) – The ID of the slice of the dataset to run inference on.

Returns:

The ID of the AsyncJob used to track job progress.

Return type:

str

class nucleus.NucleusClient(api_key=None, use_notebook=False, endpoint=None)

Client to interact with the Nucleus API via Python SDK.

Parameters:
  • api_key (Optional[str]) – Follow this guide to retrieve your API keys.

  • use_notebook (bool) – Whether the client is being used in a notebook (toggles tqdm style). Default is False.

  • endpoint (Optional[str]) – Base URL of the API. Default is Nucleus’s current production API.

append_to_slice(slice_id, reference_ids, dataset_id)

Appends dataset items or scenes to an existing slice.

Parameters:
  • slice_id (str) – Nucleus-generated slice ID (starts with slc_). This can be retrieved via Dataset.slices() or a Nucleus dashboard URL.

  • reference_ids (List[str]) – List of user-defined reference IDs of dataset items or scenes to append to the slice.

  • dataset_id (str) – ID of dataset this slice belongs to.

Returns:

Empty payload response.

Return type:

dict

create_dataset(name, is_scene=None, use_privacy_mode=False, item_metadata_schema=None, annotation_metadata_schema=None)

Creates a new, empty dataset.

Make sure that the dataset is created for the data type you would like to support. Be sure to set the is_scene parameter correctly.

Parameters:
  • name (str) – A human-readable name for the dataset.

  • is_scene (Optional[bool]) – Whether the dataset contains strictly scenes or items. This value is immutable. Default is False (dataset of items).

  • use_privacy_mode (bool) – Whether to enable privacy mode, in which the images of this dataset are not uploaded to Scale. If set to True, you will have to adjust your file access policy with Scale.

  • item_metadata_schema (Optional[Dict]) – Dict defining item-level metadata schema. See below.

  • annotation_metadata_schema (Optional[Dict]) –

    Dict defining annotation-level metadata schema.

    Metadata schemas must be structured as follows:

    {
        "field_name": {
            "type": "category" | "number" | "text" | "json",
            "choices": List[str] | None,
            "description": str | None,
        },
        ...
    }
    

Returns:

The newly created Nucleus dataset as an object.

Return type:

Dataset
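As a rough illustration of the metadata schema shape documented for create_dataset, the sketch below builds one schema dict and checks it locally. The helper check_metadata_schema and the field names are hypothetical, and treating a missing "choices" or "description" key as None is an assumption; the checks Nucleus itself performs may differ.

```python
from typing import Any, Dict

ALLOWED_TYPES = {"category", "number", "text", "json"}


def check_metadata_schema(schema: Dict[str, Dict[str, Any]]) -> None:
    """Raise ValueError if the schema does not follow the documented shape."""
    for field_name, spec in schema.items():
        if spec.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"{field_name}: unsupported type {spec.get('type')!r}")
        choices = spec.get("choices")
        if choices is not None and not all(isinstance(c, str) for c in choices):
            raise ValueError(f"{field_name}: choices must be a list of strings or None")
        description = spec.get("description")
        if description is not None and not isinstance(description, str):
            raise ValueError(f"{field_name}: description must be a string or None")


# Hypothetical item-level schema with one categorical and one numeric field.
item_metadata_schema = {
    "weather": {
        "type": "category",
        "choices": ["clear", "rain", "snow"],
        "description": "Weather at capture time",
    },
    "speed_kph": {"type": "number", "choices": None, "description": None},
}

check_metadata_schema(item_metadata_schema)
```

A dict like item_metadata_schema would then be passed as the item_metadata_schema argument of create_dataset.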

create_dataset_from_dir(dirname, dataset_name=None, use_privacy_mode=False, privacy_mode_proxy='', allowed_file_types=('png', 'jpg', 'jpeg'), skip_size_warning=False)

Create a dataset by recursively crawling through a directory. A DatasetItem will be created for each unique image found.

Parameters:
  • dirname (str) – Where to look for image files, recursively

  • dataset_name (Optional[str]) – If none is given, the parent folder name is used

  • use_privacy_mode (bool) – Whether the dataset should be treated as a privacy-mode dataset.

  • privacy_mode_proxy (str) – Endpoint that serves image files for privacy mode, ignore if not using privacy mode. The proxy should work based on the relative path of the images in the directory.

  • allowed_file_types (Tuple[str, Ellipsis]) – Which file type extensions to search for, e.g. ('jpg', 'png')

  • skip_size_warning (bool) – If False, it will throw an error if the script globs more than 500 images. This is a safety check in case the dirname has a typo, and grabs too much data.

Return type:

dataset.Dataset
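The crawling behaviour described for create_dataset_from_dir (recursive search, extension filter, 500-image safety check) might look roughly like the sketch below. find_images is a hypothetical helper written for illustration, not the SDK's implementation; matching extensions case-insensitively and returning paths relative to dirname are assumptions.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def find_images(
    dirname: str,
    allowed_file_types: Tuple[str, ...] = ("png", "jpg", "jpeg"),
    skip_size_warning: bool = False,
    max_images: int = 500,
) -> List[str]:
    """Return sorted paths, relative to dirname, of files with an allowed extension."""
    root = Path(dirname)
    if not root.is_dir():
        raise ValueError(f"not a directory: {dirname}")
    allowed = {ext.lower() for ext in allowed_file_types}
    found = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lstrip(".").lower() in allowed
    )
    # Safety check: a typo in dirname could otherwise pull in far too much data.
    if not skip_size_warning and len(found) > max_images:
        raise ValueError(
            f"found {len(found)} images; pass skip_size_warning=True to proceed"
        )
    return found


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "a.png").touch()
    (Path(tmp) / "nested" / "b.JPG").touch()
    (Path(tmp) / "notes.txt").touch()
    print(find_images(tmp))
```

Relative paths are used here because the privacy mode proxy is described as working from the relative path of each image in the directory.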

create_dataset_from_project(project_id, last_n_tasks=None, name=None)

Create a new dataset from an existing Scale or Rapid project.

If you already have Annotation, SegmentAnnotation, VideoAnnotation, Categorization, PolygonAnnotation, ImageAnnotation, DocumentTranscription, LidarLinking, LidarAnnotation, or VideoboxAnnotation projects with Scale, use this endpoint to import your project directly into Nucleus.

This endpoint is asynchronous because there can be delays when the number of tasks is larger than 1000. As a result, the endpoint returns an instance of AsyncJob.

Parameters:
  • project_id (str) – The ID of the Scale/Rapid project (retrievable from URL).

  • last_n_tasks (Optional[int]) – If supplied, only pull in this number of the most recent tasks. By default the endpoint will pull in all eligible tasks.

  • name (Optional[str]) – The name for your new Nucleus dataset. By default the endpoint will use the project’s name.

Returns:

The newly created Nucleus dataset as an object.

Return type:

Dataset

create_launch_model(name, reference_id, bundle_args, metadata=None, trained_slice_ids=None)

Adds a Model to Nucleus, as well as a Launch bundle from a given function.

Parameters:
  • name (str) – A human-readable name for the model.

  • reference_id (str) – Unique, user-controlled ID for the model. This can be used, for example, to link to an external storage of models which may have its own id scheme.

  • bundle_args (Dict[str, Any]) – Dict for kwargs for the creation of a Launch bundle, more details on the keys below.

  • metadata (Optional[Dict]) – An arbitrary dictionary of additional data about this model that can be stored and retrieved. For example, you can store information about the hyperparameters used in training this model.

  • trained_slice_ids (Optional[List[str]])

Returns:

The newly created model as an object.

Return type:

Model

Details on bundle_args:

Grabs an S3 signed URL and uploads a model bundle to Scale Launch.

A model bundle consists of exactly {predict_fn_or_cls}, {load_predict_fn + model}, or {load_predict_fn + load_model_fn}. Pre/post-processing code can be included inside load_predict_fn/model or in the predict_fn_or_cls call. Note: the exact parameters used will depend on the version of the Launch client used. For example, if you are on Launch client version 0.x, you will use env_params; otherwise you will use pytorch_image_tag and tensorflow_version.

Parameters:
  • model_bundle_name – Name of model bundle you want to create. This acts as a unique identifier.

  • predict_fn_or_cls – Function or a Callable class that runs end-to-end (pre/post processing and model inference) on the call. I.e. predict_fn_or_cls(REQUEST) -> RESPONSE.

  • model – Typically a trained Neural Network, e.g. a Pytorch module

  • load_predict_fn – Function that when called with model, returns a function that carries out inference I.e. load_predict_fn(model) -> func; func(REQUEST) -> RESPONSE

  • load_model_fn – Function that when run, loads a model, e.g. a Pytorch module I.e. load_predict_fn(load_model_fn()) -> func; func(REQUEST) -> RESPONSE

  • bundle_url – Only for self-hosted mode. Desired location of bundle. Overrides any value given by self.bundle_location_fn

  • requirements – A list of python package requirements, e.g. [“tensorflow==2.3.0”, “tensorflow-hub==0.11.0”]. If no list has been passed, will default to the currently imported list of packages.

  • app_config – Either a Dictionary that represents a YAML file contents or a local path to a YAML file.

  • env_params –

    Only for launch v0. A dictionary that dictates environment information, e.g. the use of pytorch or tensorflow, and which cuda/cudnn versions to use. Specifically, the dictionary should contain the following keys:

    “framework_type”: either “tensorflow” or “pytorch”.

    “pytorch_version”: Version of pytorch, e.g. “1.5.1”, “1.7.0”, etc. Only applicable if framework_type is pytorch.

    “cuda_version”: Version of cuda used, e.g. “11.0”.

    “cudnn_version”: Version of cudnn used, e.g. “cudnn8-devel”.

    “tensorflow_version”: Version of tensorflow, e.g. “2.3.0”. Only applicable if framework_type is tensorflow.

  • globals_copy – Dictionary of the global symbol table. Normally provided by globals() built-in function.

  • pytorch_image_tag – Only for launch v1, and if you want to use pytorch framework type. The tag of the pytorch docker image you want to use, e.g. 1.11.0-cuda11.3-cudnn8-runtime

  • tensorflow_version – Only for launch v1, and if you want to use tensorflow. Version of tensorflow, e.g. “2.3.0”.

Return type:

model.Model
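To make the three bundle compositions described under bundle_args more concrete, here is a toy sketch of how predict_fn_or_cls, load_predict_fn, model, and load_model_fn relate to each other. The MeanModel class and the request/response shapes are invented for illustration only; a real bundle would wrap a trained network and whatever request format your deployment expects.

```python
from typing import Callable, Dict, List

Request = Dict[str, List[float]]   # hypothetical request shape
Response = Dict[str, float]        # hypothetical response shape


class MeanModel:
    """Toy stand-in for a trained model (e.g. a PyTorch module)."""

    def __call__(self, values: List[float]) -> float:
        return sum(values) / len(values)


def load_model_fn() -> MeanModel:
    """Loads the model; a real bundle might read weights from disk here."""
    return MeanModel()


def load_predict_fn(model: MeanModel) -> Callable[[Request], Response]:
    """Wraps the model with pre/post-processing and returns the predict function."""

    def predict(request: Request) -> Response:
        values = request["values"]        # pre-processing: unpack the request
        return {"score": model(values)}   # post-processing: wrap the raw output

    return predict


# {load_predict_fn + model}: the caller supplies an already-loaded model.
predict_from_model = load_predict_fn(MeanModel())

# {load_predict_fn + load_model_fn}: the model is loaded by a function.
predict_from_loader = load_predict_fn(load_model_fn())

# {predict_fn_or_cls}: a single callable that runs end to end.
predict_fn_or_cls = predict_from_loader

print(predict_fn_or_cls({"values": [1.0, 2.0, 3.0]}))
```

In each case the end result is the same kind of object: a callable that maps a request to a response.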

create_launch_model_from_dir(name, reference_id, bundle_from_dir_args, metadata=None, trained_slice_ids=None)

Adds a Model to Nucleus, as well as a Launch bundle from a directory.

Parameters:
  • name (str) – A human-readable name for the model.

  • reference_id (str) – Unique, user-controlled ID for the model. This can be used, for example, to link to an external storage of models which may have its own id scheme.

  • bundle_from_dir_args (Dict[str, Any]) – Dict for kwargs for the creation of a bundle from directory, more details on the keys below.

  • metadata (Optional[Dict]) – An arbitrary dictionary of additional data about this model that can be stored and retrieved. For example, you can store information about the hyperparameters used in training this model.

  • trained_slice_ids (Optional[List[str]])

Returns:

The newly created model as an object.

Return type:

Model

Details on bundle_from_dir_args:

Packages up code from one or more local filesystem folders and uploads them as a bundle to Scale Launch. In this mode, a bundle is just local code instead of a serialized object.

For example, if you have a directory structure like so, and your current working directory is also my_root:

```
my_root/
    my_module1/
        __init__.py
        …files and directories
        my_inference_file.py
    my_module2/
        __init__.py
        …files and directories
```

then calling create_model_bundle_from_dirs with base_paths=["my_module1", "my_module2"] essentially creates a zip file without the root directory, e.g.:

```
my_module1/
    __init__.py
    …files and directories
    my_inference_file.py
my_module2/
    __init__.py
    …files and directories
```

and these contents will be unzipped relative to the server side PYTHONPATH. Bear these points in mind when referencing Python module paths for this bundle. For instance, if my_inference_file.py has def f(…) as the desired inference loading function, then the load_predict_fn_module_path argument should be my_module1.my_inference_file.f.
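One way to picture how a dotted path such as my_module1.my_inference_file.f resolves once the bundle root is on PYTHONPATH is the sketch below. It recreates a minimal version of the layout in a temporary directory and imports the function with importlib. resolve_module_path is a hypothetical helper and the body of f is invented; how Launch resolves these paths on the server is not specified here.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def resolve_module_path(dotted_path: str):
    """Split 'pkg.module.attr' into module and attribute, import, return the attribute."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


with tempfile.TemporaryDirectory() as root:
    package = Path(root) / "my_module1"
    package.mkdir()
    (package / "__init__.py").write_text("")
    (package / "my_inference_file.py").write_text(
        "def f(model):\n    return lambda request: model(request)\n"
    )

    # The bundle contents sit directly under a PYTHONPATH entry, so the
    # temporary root plays that role here.
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        load_predict_fn = resolve_module_path("my_module1.my_inference_file.f")
    finally:
        sys.path.remove(root)

# The toy "model" doubles its input; f wraps it into a predict function.
predict = load_predict_fn(lambda request: request * 2)
print(predict(21))
```

Because the root directory itself is not part of the zip, the path starts at my_module1 rather than my_root.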

Note: the exact keys for bundle_from_dir_args used will depend on the version of the Launch client used. For example, if you are on Launch client version 0.x, you will use env_params; otherwise you will use pytorch_image_tag and tensorflow_version.

Keys for bundle_from_dir_args:

  • model_bundle_name – Name of model bundle you want to create. This acts as a unique identifier.

  • base_paths – The paths on the local filesystem where the bundle code lives.

  • requirements_path – A path on the local filesystem where a requirements.txt file lives.

  • env_params –

    Only for launch v0. A dictionary that dictates environment information, e.g. the use of pytorch or tensorflow, and which cuda/cudnn versions to use. Specifically, the dictionary should contain the following keys:

    “framework_type”: either “tensorflow” or “pytorch”.

    “pytorch_version”: Version of pytorch, e.g. “1.5.1”, “1.7.0”, etc. Only applicable if framework_type is pytorch.

    “cuda_version”: Version of cuda used, e.g. “11.0”.

    “cudnn_version”: Version of cudnn used, e.g. “cudnn8-devel”.

    “tensorflow_version”: Version of tensorflow, e.g. “2.3.0”. Only applicable if framework_type is tensorflow.

  • load_predict_fn_module_path – A python module path for a function that, when called with the output of load_model_fn_module_path, returns a function that carries out inference.

  • load_model_fn_module_path – A python module path for a function that returns a model. The output feeds into the function located at load_predict_fn_module_path.

  • app_config – Either a Dictionary that represents a YAML file contents or a local path to a YAML file.

  • pytorch_image_tag – Only for launch v1, and if you want to use pytorch framework type. The tag of the pytorch docker image you want to use, e.g. 1.11.0-cuda11.3-cudnn8-runtime

  • tensorflow_version – Only for launch v1, and if you want to use tensorflow. Version of tensorflow, e.g. “2.3.0”.

create_model(name, reference_id, metadata=None, bundle_name=None, tags=None, trained_slice_ids=None)

Adds a Model to Nucleus.

Parameters:
  • name (str) – A human-readable name for the model.

  • reference_id (str) – Unique, user-controlled ID for the model. This can be used, for example, to link to an external storage of models which may have its own id scheme.

  • metadata (Optional[Dict]) – An arbitrary dictionary of additional data about this model that can be stored and retrieved. For example, you can store information about the hyperparameters used in training this model.

  • bundle_name (Optional[str]) – Optional name of bundle attached to this model

  • tags (Optional[List[str]]) – Optional list of tags to attach to this model

  • trained_slice_ids (Optional[List[str]])

Returns:

The newly created model as an object.

Return type:

Model

delete_autotag(autotag_id)

Deletes an autotag by ID.

Parameters:

autotag_id (str) – Nucleus-generated autotag ID (starts with tag_). This can be retrieved via list_autotags() or a Nucleus dashboard URL.

Returns:

Empty payload response.

Return type:

dict

delete_dataset(dataset_id)

Deletes a dataset by ID.

All items, annotations, and predictions associated with the dataset will be deleted as well. Note that if this dataset is linked to a Scale or Rapid labeling project, the project itself will not be deleted.

Parameters:

dataset_id (str) – The ID of the dataset to delete.

Returns:

Payload to indicate deletion invocation.

Return type:

dict

delete_model(model_id)

Deletes a model by ID.

Parameters:

model_id (str) – Nucleus-generated model ID (starts with prj_). This can be retrieved via list_models() or a Nucleus dashboard URL.

Returns:

Empty payload response.

Return type:

dict

delete_slice(slice_id)

Deletes slice from Nucleus.

Parameters:

slice_id (str) – Nucleus-generated slice ID (starts with slc_). This can be retrieved via Dataset.slices() or a Nucleus dashboard URL.

Returns:

Empty payload response.

Return type:

dict

download_pointcloud_task(task_id, frame_num)

Download the lidar point cloud data for a given task and frame number.

Parameters:
  • task_id (str) – download point cloud for this particular task

  • frame_num (int) – download point cloud for this particular frame

Returns:

List of Point3D objects

Return type:

List[Union[annotation.Point3D, annotation.LidarPoint]]

download_pointcloud_tasks(task_ids, frame_num)

Download the lidar point cloud data for a given set of tasks and frame number.

Parameters:
  • task_ids (List[str]) – list of task ids to fetch data from

  • frame_num (int) – download point cloud for this particular frame

Returns:

A dictionary from task_id to list of Point3D objects

Return type:

Dict[str, List[Union[annotation.Point3D, annotation.LidarPoint]]]

get_autotag_refinement_metrics(autotag_id)

Retrieves refinement metrics for an autotag by ID.

Parameters:

autotag_id (str) – Nucleus-generated autotag ID (starts with tag_). This can be retrieved via list_autotags() or a Nucleus dashboard URL.

Returns:

Response payload:

{
    "total_refinement_steps": int,
    "average_positives_selected_per_refinement": int,
    "average_ms_taken_in_refinement": float,
}

Return type:

dict

get_dataset(dataset_id)

Fetches a dataset by its ID.

Parameters:

dataset_id (str) – The ID of the dataset to fetch.

Returns:

The Nucleus dataset as an object.

Return type:

Dataset

get_job(job_id)

Fetches an async job by its ID.

Parameters:

job_id (str) – The ID of the job to fetch.

Returns:

The Nucleus async job as an object.

Return type:

AsyncJob

get_model(model_id=None, model_run_id=None)

Fetches a model by its ID.

Parameters:
  • model_id (Optional[str]) – You can pass either a model ID (starts with prj_) or a model run ID (starts with run_). This can be retrieved via list_models() or a Nucleus dashboard URL. Model run IDs result from the application of a model to a dataset.

  • model_run_id (Optional[str]) –

    You can pass either a model ID (starts with prj_) or a model run ID (starts with run_). This can be retrieved via list_models() or a Nucleus dashboard URL. Model run IDs result from the application of a model to a dataset.

    In the future, we plan to hide model_run_ids fully from users.

Returns:

The Nucleus model as an object.

Return type:

Model
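Since get_model accepts either a model ID or a model run ID, a small helper can choose the matching keyword argument from the ID prefix. The sketch below uses only the prefixes stated in this reference (slc_, tag_, prj_, run_); describe_id and get_model_kwargs are hypothetical helpers, and the example IDs are made up.

```python
from typing import Optional

# Prefixes as stated in the parameter descriptions of this reference.
ID_PREFIXES = {
    "slc_": "slice",
    "tag_": "autotag",
    "prj_": "model",
    "run_": "model run",
}


def describe_id(nucleus_id: str) -> Optional[str]:
    """Return the kind of object an ID refers to, or None if the prefix is not listed."""
    for prefix, kind in ID_PREFIXES.items():
        if nucleus_id.startswith(prefix):
            return kind
    return None


def get_model_kwargs(nucleus_id: str) -> dict:
    """Pick the get_model keyword argument that matches the ID's prefix."""
    kind = describe_id(nucleus_id)
    if kind == "model":
        return {"model_id": nucleus_id}
    if kind == "model run":
        return {"model_run_id": nucleus_id}
    raise ValueError(f"expected a prj_ or run_ ID, got {nucleus_id!r}")


print(get_model_kwargs("prj_abc123"))
print(get_model_kwargs("run_abc123"))
```

With a client in hand, the result could be unpacked as client.get_model(**get_model_kwargs(some_id)).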

get_slice(slice_id)

Returns a slice object by Nucleus-generated ID.

Parameters:

slice_id (str) – Nucleus-generated slice ID (starts with slc_). This can be retrieved via Dataset.slices() or a Nucleus dashboard URL.

Returns:

The Nucleus slice as an object.

Return type:

Slice

list_jobs(show_completed=False, from_date=None, to_date=None, job_types=None, limit=None, dataset_id=None, date_limit=None)

Fetches all of your running jobs in Nucleus.

Parameters:
  • job_types (Optional[List[job.CustomerJobTypes]]) – Filter on set of job types, if None, fetch all types

  • from_date (Optional[Union[str, datetime.datetime]]) – beginning of date range filter

  • to_date (Optional[Union[str, datetime.datetime]]) – end of date range filter

  • limit (Optional[int]) – number of results to fetch, max 50_000

  • show_completed (bool) – Whether to include jobs with Completed status. Default is False.

  • stats_only – return overview of jobs, instead of a list of job objects

  • dataset_id (Optional[str]) – filter on a particular dataset

  • date_limit (Optional[str]) – Deprecated, do not use.

Returns:

List of running asynchronous jobs associated with the client API key.

Return type:

List[async_job.AsyncJob]

make_request(payload, route, requests_command=requests.post, return_raw_response=False)

Makes a request to a Nucleus API endpoint.

Logs a warning if not successful.

Parameters:
  • payload (Optional[dict]) – Given request payload.

  • route (str) – Route for the request.

  • requests_command – requests.post, requests.get, or requests.delete.

  • return_raw_response (bool) – Whether to return the raw response object instead of the parsed JSON payload.

Returns:

Response payload as JSON dict or request object.

Return type:

Union[dict, Any]

static valid_dirname(dirname)

Validates that the directory exists.

Parameters:

dirname – Path of directory

Returns:

Existing directory path

Return type:

str

class nucleus.Point

A point in 2D space.

Parameters:
  • x (float) – The x coordinate of the point.

  • y (float) – The y coordinate of the point.

class nucleus.Point3D

A point in 3D space.

Parameters:
  • x (float) – The x coordinate of the point.

  • y (float) – The y coordinate of the point.

  • z (float) – The z coordinate of the point.

class nucleus.PolygonAnnotation

A polygon annotation consisting of an ordered list of 2D points.

from nucleus import Point, PolygonAnnotation

polygon = PolygonAnnotation(
    label="bus",
    vertices=[Point(100, 100), Point(150, 200), Point(200, 100)],
    reference_id="image_2",
    annotation_id="image_2_bus_polygon_1",
    metadata={"vehicle_color": "yellow"},
    embedding_vector=[0.1423, 1.432, ..., 3.829],
    track_reference_id="school_bus",
)
Parameters:
  • label (str) – The label for this annotation.

  • vertices (List[Point]) – The list of points making up the polygon.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • embedding_vector – Custom embedding vector for this object annotation. If any custom object embeddings have been uploaded previously to this dataset, this vector must match the dimensions of the previously ingested vectors.

  • track_reference_id – A unique string to identify the annotation as part of a group. For instance, multiple “car” annotations across several dataset items may have the same track_reference_id such as “red_car”.

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.)

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.PolygonPrediction(label, vertices, reference_id, confidence=None, annotation_id=None, metadata=None, class_pdf=None, embedding_vector=None, track_reference_id=None)

Prediction of a polygon.

Parameters:
  • label (str) – The label for this annotation (e.g. car, pedestrian, bicycle).

  • vertices (List[Point]) – The list of points making up the polygon.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • confidence (Optional[float]) – 0-1 indicating the confidence of the prediction.

  • annotation_id (Optional[str]) – The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

  • class_pdf (Optional[Dict]) – An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain.

  • embedding_vector (Optional[list]) – Custom embedding vector for this object annotation. If any custom object embeddings have been uploaded previously to this dataset, this vector must match the dimensions of the previously ingested vectors.

  • track_reference_id (Optional[str]) – A unique string to identify the prediction as part of a group. For instance, multiple “car” predictions across several dataset items may have the same track_reference_id such as “red_car”.

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.)

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.Quaternion

Quaternion objects are used to represent rotation.

We use the Hamilton/right-handed quaternion convention, where

i^2 = j^2 = k^2 = ijk = -1

The quaternion represented by the tuple (x, y, z, w) is equal to w + x*i + y*j + z*k.

Parameters:
  • x (float) – The x value.

  • y (float) – The y value.

  • z (float) – The z value.

  • w (float) – The w value.
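The Hamilton convention and the (x, y, z, w) ordering stated above can be checked with a short sketch. It uses plain tuples rather than the Quaternion class, and hamilton_product is a hypothetical helper written for illustration.

```python
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (x, y, z, w), meaning w + x*i + y*j + z*k


def hamilton_product(a: Quat, b: Quat) -> Quat:
    """Multiply two quaternions given as (x, y, z, w) using the Hamilton convention."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )


i = (1.0, 0.0, 0.0, 0.0)
j = (0.0, 1.0, 0.0, 0.0)
k = (0.0, 0.0, 1.0, 0.0)
minus_one = (0.0, 0.0, 0.0, -1.0)

# i^2 = j^2 = k^2 = ijk = -1
print(hamilton_product(i, i) == minus_one)
print(hamilton_product(j, j) == minus_one)
print(hamilton_product(k, k) == minus_one)
print(hamilton_product(hamilton_product(i, j), k) == minus_one)
```

Under this convention i*j equals k, while j*i equals -k, which is what makes the product order matter when composing rotations.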

classmethod from_json(payload)

Instantiates quaternion object from schematized JSON dict payload.

Parameters:

payload (Dict[str, float])

to_payload()

Serializes quaternion object to schematized JSON dict.

Return type:

dict

class nucleus.SceneCategoryAnnotation

A scene category annotation.

from nucleus import SceneCategoryAnnotation

category = SceneCategoryAnnotation(
    label="running",
    reference_id="scene_1",
    taxonomy_name="action",
    metadata={
        "weather": "clear",
    },
)
Parameters:
  • label (str) – The label for this annotation.

  • reference_id (str) – User-defined ID of the scene to which to apply this annotation.

  • taxonomy_name (Optional[str]) – The name of the taxonomy this annotation conforms to. See Dataset.add_taxonomy().

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.)

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.SceneCategoryPrediction(label, reference_id, taxonomy_name=None, confidence=None, metadata=None)

A prediction of a category for a scene.

from nucleus import SceneCategoryPrediction

category = SceneCategoryPrediction(
    label="running",
    reference_id="scene_1",
    taxonomy_name="action",
    confidence=0.83,
    metadata={
        "weather": "clear",
    },
)
Parameters:
  • label (str) – The label for this annotation (e.g. action, subject, scenario).

  • reference_id (str) – The reference ID of the scene you wish to apply this annotation to.

  • taxonomy_name (Optional[str]) – The name of the taxonomy this annotation conforms to. See Dataset.add_taxonomy().

  • confidence (Optional[float]) – 0-1 indicating the confidence of the prediction.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Returns True if annotation has local files that need to be uploaded.

Nearly all subclasses have no local files, so this method returns False by default. A subclass that has local files should override this method, although overriding it is not by itself sufficient to make local file upload work.

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.Segment

Segment represents either a class or an instance depending on the task type.

For semantic segmentation, this object should store the mapping between a single class index and the string label.

For instance segmentation, you can use this class to store the label of a single instance, whose extent in the image is represented by the value of index.

In both cases, additional metadata can be attached to the segment.

Parameters:
  • label (str) – The label name of the class for the class or instance represented by index in the associated mask.

  • index (int) – The integer pixel value in the mask this mapping refers to.

  • metadata (Optional[Dict]) –

    Arbitrary key/value dictionary of info to attach to this segment. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our metadata guide.

class nucleus.SegmentationAnnotation

A segmentation mask on a 2D image.

When uploading a mask annotation, Nucleus expects the mask file to be in PNG format with each pixel being a 0-255 uint8. Currently, Nucleus only supports uploading masks from URL.

Nucleus automatically enforces the constraint that each DatasetItem can have at most one ground truth segmentation mask. As a consequence, if a duplicate mask is detected for a given image during upload, it is ignored by default. You can change this behavior by passing update=True, which replaces the existing segmentation mask with the new mask.

from nucleus import Segment, SegmentationAnnotation

segmentation = SegmentationAnnotation(
    mask_url="s3://your-bucket-name/segmentation-masks/image_2_mask_id_1.png",
    annotations=[
        # index is the integer pixel value in the mask that the label refers to
        Segment(label="grass", index=1),
        Segment(label="road", index=2),
        Segment(label="bus", index=3, metadata={"vehicle_color": "yellow"}),
        Segment(label="tree", index=4),
    ],
    reference_id="image_2",
    annotation_id="image_2_mask_1",
)
Parameters:
  • mask_url (str) –

    A URL pointing to the segmentation prediction mask which is accessible to Scale. This “URL” can also be a path to a local file. The mask is an HxW int8 array saved in PNG format, with each pixel value ranging from [0, N), where N is the number of possible classes (for semantic segmentation) or instances (for instance segmentation).

    The height and width of the mask must be the same as the original image. One example for semantic segmentation: the mask is 0 for pixels where there is background, 1 where there is a car, and 2 where there is a pedestrian.

    Another example for instance segmentation: the mask is 0 for one car, 1 for another car, 2 for a motorcycle and 3 for another motorcycle. The class name for each value in the mask is stored in the list of Segment objects passed for annotations.

  • annotations (List[Segment]) – The list of mappings between the integer values contained in the mask at mask_url and string class labels. In the semantic segmentation example above, these would map 0 to background, 1 to car and 2 to pedestrian. In the instance segmentation example above, 0 and 1 would both be mapped to car, and 2 and 3 would both be mapped to motorcycle.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – For segmentation annotations, this value is ignored because there can only be one segmentation annotation per dataset item. Therefore regardless of annotation ID, if there is an existing segmentation on a dataset item, it will be ignored unless update=True is passed to Dataset.annotate(), in which case it will be overwritten. Storing a custom ID here may be useful in order to tie this annotation to an external database, and its value will be returned for any export.
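The mask_url and annotations descriptions above imply that every pixel value in the mask should have a matching Segment index. The following is a minimal sketch of that consistency check. It assumes the mask has already been decoded into nested lists of ints. SegmentSpec is a hypothetical stand-in that mirrors only the label and index fields of Segment, so the example runs without the SDK or an imaging library.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SegmentSpec:
    """Stand-in for nucleus.Segment with only the label and index fields."""

    label: str
    index: int


def unmapped_mask_values(mask: List[List[int]], segments: List[SegmentSpec]) -> Set[int]:
    """Return pixel values present in the mask that no segment index covers."""
    mapped_indices = {segment.index for segment in segments}
    values_in_mask = {value for row in mask for value in row}
    return values_in_mask - mapped_indices


# Semantic segmentation example from the text: 0 = background, 1 = car, 2 = pedestrian.
segments = [
    SegmentSpec(label="background", index=0),
    SegmentSpec(label="car", index=1),
    SegmentSpec(label="pedestrian", index=2),
]
mask = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [0, 2, 0, 3],  # 3 has no matching segment, so it is reported below
]
print(unmapped_mask_values(mask, segments))  # {3}
```

An empty result means every value in the mask has a label. For instance segmentation the same check applies, with several indices sharing one label.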

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Check if the mask url is local and needs to be uploaded.

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.SegmentationPrediction

Predicted segmentation mask on a 2D image.

from nucleus import Segment, SegmentationPrediction

segmentation = SegmentationPrediction(
    mask_url="s3://your-bucket-name/pred-seg-masks/image_2_pred_mask_id_1.png",
    annotations=[
        # index is the integer pixel value in the mask that the label refers to
        Segment(label="grass", index=1),
        Segment(label="road", index=2),
        Segment(label="bus", index=3, metadata={"vehicle_color": "yellow"}),
        Segment(label="tree", index=4),
    ],
    reference_id="image_2",
    annotation_id="image_2_pred_mask_1",
)
Parameters:
  • mask_url (str) –

    A URL pointing to the segmentation prediction mask which is accessible to Scale. This “URL” can also be a path to a local file. The mask is an HxW int8 array saved in PNG format, with each pixel value ranging from [0, N), where N is the number of possible classes (for semantic segmentation) or instances (for instance segmentation).

    The height and width of the mask must be the same as the original image. One example for semantic segmentation: the mask is 0 for pixels where there is background, 1 where there is a car, and 2 where there is a pedestrian.

    Another example for instance segmentation: the mask is 0 for one car, 1 for another car, 2 for a motorcycle and 3 for another motorcycle. The class name for each value in the mask is stored in the list of Segment objects passed for annotations.

  • annotations (List[Segment]) – The list of mappings between the integer values contained in the mask at mask_url and string class labels. In the semantic segmentation example above, these would map 0 to background, 1 to car and 2 to pedestrian. In the instance segmentation example above, 0 and 1 would both be mapped to car, and 2 and 3 would both be mapped to motorcycle.

  • reference_id (str) – User-defined ID of the image to which to apply this annotation.

  • annotation_id (Optional[str]) – For segmentation predictions, this value is ignored because there can only be one segmentation prediction per dataset item. Therefore regardless of annotation ID, if there is an existing segmentation on a dataset item, it will be ignored unless update=True is passed to Dataset.annotate(), in which case it will be overwritten. Storing a custom ID here may be useful in order to tie this annotation to an external database, and its value will be returned for any export.

classmethod from_json(payload)

Instantiates annotation object from schematized JSON dict payload.

Parameters:

payload (dict)

has_local_files_to_upload()

Check if the mask url is local and needs to be uploaded.

Return type:

bool

to_json()

Serializes annotation object to schematized JSON string.

Return type:

str

to_payload()

Serializes annotation object to schematized JSON dict.

Return type:

dict

class nucleus.Slice(slice_id, client)

A Slice represents a subset of DatasetItems in your Dataset.

Slices are subsets of your Dataset that unlock curation and exploration workflows. Instead of thinking of your Datasets as collections of data, it is useful to think about them as a collection of Slices. For instance, your dataset may contain different weather scenarios, traffic conditions, or highway types.

Perhaps your Models perform poorly on foggy weather scenarios; it is then useful to slice your dataset into a “foggy” slice, and fine-tune model performance on this slice until it reaches the performance you desire.

Slices cannot be instantiated directly and instead must be created in the dashboard, or via API endpoint using Dataset.create_slice().

import nucleus

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset("YOUR_DATASET_ID")

ref_ids = ["interesting_item_1", "interesting_item_2"]
slice = dataset.create_slice(name="interesting", reference_ids=ref_ids)
Parameters:

slice_id (str)

add_tags(tags)

Tag a slice with custom tag names.

import nucleus

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
slc = client.get_slice("YOUR_SLICE_ID")

slc.add_tags(["tag_1", "tag_2"])

Parameters:

tags (List[str]) – list of tag names

Return type:

dict

append(reference_ids=None)

Appends existing DatasetItems from a Dataset to a Slice.

The endpoint expects a list of DatasetItem reference IDs which are set at upload time. The length of reference_ids cannot exceed 10,000 items per request.

Parameters:

reference_ids (Optional[List[str]]) – List of user-defined reference IDs of dataset items or scenes to append to the slice.

Returns:

Dict of the slice_id and the newly appended IDs.

{
    "slice_id": str,
    "new_items": List[str]
}

Raises:

BadRequest – If length of reference_ids is too large (> 10,000 items)

Return type:

dict
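Because a single request accepts at most 10,000 reference IDs, a longer list has to be split before calling append. A minimal sketch of the batching step follows. The helper chunk_reference_ids is hypothetical and not part of the SDK, and the commented-out loop assumes slc is an existing Slice.

```python
from typing import Iterator, List, Sequence

MAX_REFERENCE_IDS_PER_REQUEST = 10_000  # limit stated in the docs above


def chunk_reference_ids(
    reference_ids: Sequence[str], chunk_size: int = MAX_REFERENCE_IDS_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most chunk_size reference IDs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(reference_ids), chunk_size):
        yield list(reference_ids[start : start + chunk_size])


all_ref_ids = [f"item_{i}" for i in range(25_000)]
chunks = list(chunk_reference_ids(all_ref_ids))
print([len(chunk) for chunk in chunks])  # [10000, 10000, 5000]

# With a real Slice object (here called slc), each chunk would be sent separately:
# for chunk in chunk_reference_ids(all_ref_ids):
#     slc.append(reference_ids=chunk)
```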

dataset_items()

Fetch all DatasetItems contained in the Slice.

We recommend using Slice.items_generator() if the Slice has more than 200k items.

Returns:

List of DatasetItem objects.

export_embeddings(asynchronous=True)

Fetches a pd.DataFrame-ready list of slice embeddings.

Parameters:

asynchronous (bool) – Whether or not to process the export asynchronously (and return an EmbeddingsExportJob object). Default is True.

Returns:

If synchronous, a list where each element is a columnar mapping:

List[{
    "reference_id": str,
    "embedding_vector": List[float]
}]

Otherwise, returns an EmbeddingsExportJob object.

Return type:

Union[List[Dict[str, Union[str, List[float]]]], nucleus.async_job.EmbeddingsExportJob]
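As a small illustration of the synchronous return shape, the sketch below indexes the exported rows by reference ID. The rows are hand-written placeholders in the documented shape, and index_embeddings is a hypothetical helper rather than an SDK function.

```python
from typing import Dict, List, Union

EmbeddingRow = Dict[str, Union[str, List[float]]]


def index_embeddings(rows: List[EmbeddingRow]) -> Dict[str, List[float]]:
    """Map each reference_id to its embedding vector."""
    return {row["reference_id"]: row["embedding_vector"] for row in rows}


# Hand-written rows in the documented shape; real rows would come from
# slc.export_embeddings(asynchronous=False).
rows = [
    {"reference_id": "image_1", "embedding_vector": [0.1, 0.2, 0.3]},
    {"reference_id": "image_2", "embedding_vector": [0.4, 0.5, 0.6]},
]
by_ref_id = index_embeddings(rows)
print(by_ref_id["image_2"])  # [0.4, 0.5, 0.6]
```

Because the rows are a list of flat dicts, they can also be passed directly to pd.DataFrame(rows) if pandas is available.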

export_predictions(model)

Provides a list of all DatasetItems and Predictions in the Slice for the given Model.

Parameters:

model (Model) – The Nucleus Model object for which to export predictions.

Returns:

List where each element is a dict containing the DatasetItem and all of its associated Predictions, grouped by type (e.g. box).

List[{
    "item": DatasetItem,
    "predictions": {
        "box": List[BoxAnnotation],
        "polygon": List[PolygonAnnotation],
        "cuboid": List[CuboidAnnotation],
        "segmentation": List[SegmentationAnnotation],
        "category": List[CategoryAnnotation],
    }
}]

Return type:

List[Dict[str, Union[nucleus.dataset_item.DatasetItem, Dict[str, List[nucleus.annotation.Annotation]]]]]
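To show how the nested return value can be consumed, here is a minimal sketch that totals predictions per type across all exported rows. Placeholder strings stand in for DatasetItem and prediction objects, and count_predictions_by_type is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, List


def count_predictions_by_type(rows: List[Dict[str, Any]]) -> Counter:
    """Sum the number of predictions of each type across all exported rows."""
    counts: Counter = Counter()
    for row in rows:
        for prediction_type, predictions in row["predictions"].items():
            counts[prediction_type] += len(predictions)
    return counts


# Placeholder strings stand in for DatasetItem and prediction objects.
rows = [
    {"item": "item_1", "predictions": {"box": ["b1", "b2"], "polygon": []}},
    {"item": "item_2", "predictions": {"box": ["b3"], "category": ["c1"]}},
]
counts = count_predictions_by_type(rows)
print(counts["box"], counts["category"])  # 3 1
```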

export_predictions_generator(model)

Provides a generator of all DatasetItems and Predictions in the Slice for the given Model.

Parameters:

model (Model) – The Nucleus Model object for which to export predictions.

Returns:

Iterable where each element is a dict containing the DatasetItem and all of its associated Predictions, grouped by type (e.g. box).

Iterable[{
    "item": DatasetItem,
    "predictions": {
        "box": List[BoxAnnotation],
        "polygon": List[PolygonAnnotation],
        "cuboid": List[CuboidAnnotation],
        "segmentation": List[SegmentationAnnotation],
        "category": List[CategoryAnnotation],
    }
}]

Return type:

Iterable[Dict[str, Union[nucleus.dataset_item.DatasetItem, Dict[str, List[nucleus.annotation.Annotation]]]]]

export_raw_items()

Fetches a list of accessible URLs for each item in the Slice.

Returns:

List where each element is a dict containing a DatasetItem and its accessible (signed) Scale URL.

List[{
    "id": str,
    "ref_id": str,
    "metadata": Dict[str, Union[str, int]],
    "original_url": str,
    "scale_url": str
}]

Return type:

List[Dict[str, str]]

export_raw_json()

Exports object slices in a raw JSON format. Note that it currently does not support item-level slices.

For each object or match in an object slice, this method exports the following information:

  • The item that contains the object.

  • The prediction and/or annotation (both, if the slice is based on IOU matches).

  • If the object is part of a scene, it includes scene-level attributes in the export.

Returns:

An iterable where each element is a dictionary containing JSON-formatted data.

List[{
    "item": DatasetItem (as JSON),
    "annotation": BoxAnnotation/CuboidAnnotation (as JSON),
    "prediction": BoxPrediction/CuboidPrediction (as JSON),
    "scene": Scene (as JSON)
}]

Return type:

List[Union[nucleus.dataset_item.DatasetItem, nucleus.annotation.Annotation, nucleus.prediction.Prediction, nucleus.scene.Scene]]

export_scale_task_info()

Fetches info for all linked Scale tasks of items/scenes in the slice.

Returns:

A list of dicts, each with two keys, respectively mapping to items/scenes and info on their corresponding Scale tasks within the dataset:

List[{
    "item" | "scene": Union[DatasetItem, Scene],
    "scale_task_info": {
        "task_id": str,
        "task_status": str,
        "task_audit_status": str,
        "task_audit_review_comment": Optional[str],
        "project_name": str,
        "batch": str,
        "created_at": str,
        "completed_at": Optional[str]
    }
}]

info()

Retrieves the name, slice_id, and dataset_id of the Slice.

Returns:

A dict mapping keys to the corresponding info retrieved.

{
    "name": Union[str, int],
    "slice_id": str,
    "dataset_id": str,
    "type": str,
    "pending_job_count": int,
    "created_at": datetime,
    "description": Union[str, None],
    "tags":
}

Return type:

dict

items_and_annotation_generator()

Provides a generator of all DatasetItems and Annotations in the slice.

Returns:

Generator where each element is a dict containing the DatasetItem and all of its associated Annotations, grouped by type (e.g. box).

Iterable[{
    "item": DatasetItem,
    "annotations": {
        "box": List[BoxAnnotation],
        "polygon": List[PolygonAnnotation],
        "cuboid": List[CuboidAnnotation],
        "line": List[LineAnnotation],
        "segmentation": List[SegmentationAnnotation],
        "category": List[CategoryAnnotation],
        "keypoints": List[KeypointsAnnotation],
    }
}]

Return type:

Iterable[Dict[str, Union[nucleus.dataset_item.DatasetItem, Dict[str, List[nucleus.annotation.Annotation]]]]]

items_and_annotations()

Provides a list of all DatasetItems and Annotations in the Slice.

Returns:

List where each element is a dict containing the DatasetItem and all of its associated Annotations, grouped by type (e.g. box).

List[{
    "item": DatasetItem,
    "annotations": {
        "box": List[BoxAnnotation],
        "polygon": List[PolygonAnnotation],
        "cuboid": List[CuboidAnnotation],
        "line": List[LineAnnotation],
        "segmentation": List[SegmentationAnnotation],
        "category": List[CategoryAnnotation],
        "keypoints": List[KeypointsAnnotation],
    }
}]

Return type:

List[Dict[str, Union[nucleus.dataset_item.DatasetItem, Dict[str, List[nucleus.annotation.Annotation]]]]]

items_generator(page_size=100000)

Generator yielding all dataset items in the Slice.

# slc is an existing Slice, e.g. obtained via client.get_slice("YOUR_SLICE_ID")
collected_ref_ids = []
for item in slc.items_generator():
    print(f"Exporting item: {item.reference_id}")
    collected_ref_ids.append(item.reference_id)
Parameters:

page_size (int, optional) – Number of items to return per page. If you are experiencing timeouts while using this generator, you can try lowering the page size.

Yields:

an iterable of DatasetItem objects.
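To illustrate why lowering page_size can help with timeouts, the sketch below shows a generic paged generator. It is not the SDK's internal implementation: fetch_page is a hypothetical callable standing in for one paginated request, and the backend is an in-memory list.

```python
from typing import Callable, Iterator, List


def paged_generator(
    fetch_page: Callable[[int, int], List[str]], page_size: int
) -> Iterator[str]:
    """Yield items one by one, requesting page_size items per call.

    fetch_page(offset, limit) is a hypothetical callable standing in for a
    paginated request; the SDK may paginate differently internally.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        offset += len(page)


# In-memory stand-in for a remote collection of reference IDs.
backend = [f"item_{i}" for i in range(7)]
calls = []


def fake_fetch(offset: int, limit: int) -> List[str]:
    """Record each request and return the matching slice of the backend."""
    calls.append((offset, limit))
    return backend[offset : offset + limit]


items = list(paged_generator(fake_fetch, page_size=3))
print(items == backend)  # True
print(len(calls))  # 4 requests: three non-empty pages plus one empty page
```

A smaller page size means more requests, each returning less data, which is the trade-off behind the advice above.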

send_to_labeling(project_id)

Send items in the Slice as tasks to a Scale labeling project.

This endpoint submits the items of the Slice as tasks to a pre-existing Scale Annotation project uniquely identified by project_id. Only projects of type General Image Annotation are currently supported. For task submission to succeed, the project must also have task instructions and geometries configured as project-level parameters. To create a project or set project parameters, use the Scale Annotation API (see the Scale Annotation API Documentation). Once the newly created annotation tasks are annotated, the annotations are automatically reflected in the Nucleus platform.

For self-serve projects, users can choose to submit the slice as a calibration batch, which is recommended for brand new labeling projects. For more information about calibration batches, see Overview of Self Serve Workflow. Note: A batch can be either a calibration batch or a self label batch, but not both.

Note: Nucleus only supports bounding box, polygon, and line annotations. If the project parameters specify any other geometries (ellipses or points), those objects will be annotated, but they will not be reflected in Nucleus.

Parameters:

project_id (str) – Scale-defined ID of the target annotation project.

class nucleus.VideoScene

Video or sequence of images over time.

Nucleus video datasets are composed of VideoScenes. Each VideoScene consists of either a single video or a sequence of DatasetItems, which are equivalent to frames.

VideoScenes are uploaded to a Dataset with any accompanying metadata. Each DatasetItem representing a frame also accepts metadata.

Note: Updates with different items will error out (only on scenes that now differ). Existing videos are expected to retain the same frames, and only metadata can be updated. If a video definition is changed (for example, additional frames are added), the update operation will be ignored. If you would like to alter the structure of a video scene, please delete the scene and re-upload it.

Parameters:
  • reference_id (str) – User-specified identifier to reference the scene.

  • frame_rate (Optional[int]) – Required if uploading items. Frame rate of the video.

  • video_location (Optional[str]) – Required if not uploading items. The remote URL containing the video MP4. Remote formats supported include any URL (http:// or https://) or URIs for AWS S3, Azure, or GCS (e.g. s3://, gcs://).

  • items (Optional[List[DatasetItem]]) – Required if not uploading video_location. List of items representing frames, to be a part of the scene. A scene can be created before items have been added to it, but must be non-empty when uploading to a Dataset. A video scene can contain a maximum of 3000 items.

  • metadata (Optional[Dict]) –

    Optional metadata to include with the scene.

    Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as { "lat": 52.5, "lon": 13.3, … }.

    Context Attachments may be provided to display the attachments side by side with the dataset item in the Detail View by specifying { "context_attachments": [ { "attachment": "https://example.com/1" }, { "attachment": "https://example.com/2" }, … ] }.

Refer to our guide to uploading video data for more info!
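The parameter rules above (frame_rate required with items, video_location required without items, at most 3000 items) can be checked before upload. A minimal sketch follows. The function video_scene_argument_errors is a hypothetical pre-flight check, and the SDK may validate differently.

```python
from typing import List, Optional, Sequence

MAX_ITEMS_PER_VIDEO_SCENE = 3000  # limit stated in the parameter list above


def video_scene_argument_errors(
    frame_rate: Optional[int] = None,
    video_location: Optional[str] = None,
    items: Optional[Sequence[object]] = None,
) -> List[str]:
    """Return a list of problems with the arguments, empty if none are found.

    Hypothetical pre-flight check; the SDK may validate differently.
    """
    errors: List[str] = []
    if not items and video_location is None:
        errors.append("provide either video_location or a non-empty items list")
    if items and frame_rate is None:
        errors.append("frame_rate is required when uploading items")
    if items and len(items) > MAX_ITEMS_PER_VIDEO_SCENE:
        errors.append(
            f"a video scene can contain at most {MAX_ITEMS_PER_VIDEO_SCENE} items"
        )
    return errors


# A scene defined by a remote video needs no frame_rate or items.
print(video_scene_argument_errors(video_location="https://example.com/video.mp4"))  # []

# A scene defined by frames is missing its frame_rate here.
print(video_scene_argument_errors(items=["frame_0", "frame_1"]))
```

The check is meant for the moment of upload, since a scene may be created empty and have items added later.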

add_item(item, index=None, update=False)

Adds DatasetItem to the specified index for videos uploaded as an array of images.

Parameters:
  • item (DatasetItem) – Video item to add.

  • index (Optional[int]) – Serial index at which to add the item.

  • update (bool) – Whether to overwrite the item at the specified index, if it exists. Default is False.

Return type:

None

classmethod from_json(payload, client=None)

Instantiates scene object from schematized JSON dict payload.

Parameters:
  • payload (dict)

  • client

get_item(index)

Fetches the DatasetItem at the specified index for videos uploaded as an array of images.

Parameters:

index (int) – Serial index for which to retrieve the DatasetItem.

Returns:

DatasetItem at the specified index.

Return type:

DatasetItem

get_items()

Fetches a sorted list of DatasetItems of the scene for videos uploaded as an array of images.

Returns:

List of DatasetItems, sorted by index ascending.

Return type:

List[DatasetItem]

info()

Fetches information about the video scene.

Returns:

Payload containing:

{
    "reference_id": str,
    "length": Optional[int],
    "frame_rate": int,
    "video_url": Optional[str],
}

to_json()

Serializes scene object to schematized JSON string.

Return type:

str

to_payload()

Serializes scene object to schematized JSON dict.

Return type:

dict