nucleus ============= .. py:module:: nucleus .. autoapi-nested-parse:: Nucleus Python SDK. .. autoapisummary:: nucleus.AsyncJob nucleus.BoxAnnotation nucleus.BoxPrediction nucleus.CameraParams nucleus.CategoryAnnotation nucleus.CategoryPrediction nucleus.CuboidAnnotation nucleus.CuboidPrediction nucleus.Dataset nucleus.DatasetInfo nucleus.DatasetItem nucleus.EmbeddingsExportJob nucleus.Frame nucleus.Keypoint nucleus.KeypointsAnnotation nucleus.KeypointsPrediction nucleus.LidarPoint nucleus.LidarScene nucleus.LineAnnotation nucleus.LinePrediction nucleus.Model nucleus.NucleusClient nucleus.Point nucleus.Point3D nucleus.PolygonAnnotation nucleus.PolygonPrediction nucleus.Quaternion nucleus.SceneCategoryAnnotation nucleus.SceneCategoryPrediction nucleus.Segment nucleus.SegmentationAnnotation nucleus.SegmentationPrediction nucleus.Slice nucleus.VideoScene .. py:class:: AsyncJob Object used to check the status or errors of a long running asynchronous operation. :: import nucleus client = nucleus.NucleusClient(YOUR_SCALE_API_KEY) dataset = client.get_dataset("ds_bwkezj6g5c4g05gqp1eg") # When kicking off an asynchronous job, store the return value as a variable job = dataset.append(items=YOUR_DATASET_ITEMS, asynchronous=True) # Poll for status or errors print(job.status()) print(job.errors()) # Block until job finishes job.sleep_until_complete() .. py:method:: errors() Fetches a list of the latest errors generated by the asynchronous job. Useful for debugging failed or partially successful jobs. :returns: A list of strings containing the 10,000 most recently generated errors. :: [ '{"annotation":{"label":"car","type":"box","geometry":{"x":50,"y":60,"width":70,"height":80},"referenceId":"bad_ref_id","annotationId":"attempted_annot_upload","metadata":{}},"error":"Item with id bad_ref_id does not exist."}' ] .. py:method:: from_id(job_id, client) :classmethod: Creates a job instance from a specific job Id. 
:param job_id: The ID of the job to retrieve. :param client: The client to use for the request. :returns: The corresponding :class:`AsyncJob` (or subclass) instance. .. py:method:: sleep_until_complete(verbose_std_out=True) Blocks until the job completes or errors. :param verbose_std_out: Whether or not to verbosely log while sleeping. Defaults to True. :type verbose_std_out: Optional[bool] .. py:method:: status() Fetches the status of the job and an informative message on job progress. :returns: A dict of the job ID, status (one of Running, Completed, or Errored), an informative message on the job progress, and the number of both completed and total steps. :: { "job_id": "job_c19xcf9mkws46gah0000", "status": "Completed", "message": "Job completed successfully.", "job_progress": "0.33", "completed_steps": "1", "total_steps": "3", } .. py:class:: BoxAnnotation A bounding box annotation. :: from nucleus import BoxAnnotation box = BoxAnnotation( label="car", x=0, y=0, width=10, height=10, reference_id="image_1", annotation_id="image_1_car_box_1", metadata={"vehicle_color": "red"}, embedding_vector=[0.1423, 1.432, ..., 3.829], track_reference_id="car_a", ) :param label: The label for this annotation. :type label: str :param x: The distance, in pixels, between the left border of the bounding box and the left border of the image. :type x: Union[float, int] :param y: The distance, in pixels, between the top border of the bounding box and the top border of the image. :type y: Union[float, int] :param width: The width in pixels of the annotation. :type width: Union[float, int] :param height: The height in pixels of the annotation. :type height: Union[float, int] :param reference_id: User-defined ID of the image to which to apply this annotation. :type reference_id: str :param annotation_id: The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and overwritten if update=True for dataset.annotate. 
If no annotation ID is passed, one will be automatically generated using the label, x, y, width, and height, so that you can make inserts idempotently as identical boxes will be ignored. :type annotation_id: Optional[str] :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as `{ "lat": 52.5, "lon": 13.3, ... }`. :type metadata: Optional[Dict] :param embedding_vector: Custom embedding vector for this object annotation. If any custom object embeddings have been uploaded previously to this dataset, this vector must match the dimensions of the previously ingested vectors. :param track_reference_id: A unique string to identify the annotation as part of a group. For instance, multiple "car" annotations across several dataset items may have the same `track_reference_id` such as "red_car". .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: BoxPrediction(label, x, y, width, height, reference_id, confidence = None, annotation_id = None, metadata = None, class_pdf = None, embedding_vector = None, track_reference_id = None) Prediction of a bounding box. :param label: The label for this annotation (e.g. 
car, pedestrian, bicycle) :type label: str :param x: The distance, in pixels, between the left border of the bounding box and the left border of the image. :type x: Union[float, int] :param y: The distance, in pixels, between the top border of the bounding box and the top border of the image. :type y: Union[float, int] :param width: The width in pixels of the annotation. :type width: Union[float, int] :param height: The height in pixels of the annotation. :type height: Union[float, int] :param reference_id: User-defined ID of the image to which to apply this annotation. :type reference_id: str :param confidence: 0-1 indicating the confidence of the prediction. :param annotation_id: The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate. If no annotation ID is passed, one will be automatically generated using the label, x, y, width, and height, so that you can make inserts idempotently and identical boxes will be ignored. :type annotation_id: Optional[str] :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as `{ "lat": 52.5, "lon": 13.3, ... }`. :type metadata: Optional[Dict] :param class_pdf: An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain. :param embedding_vector: Custom embedding vector for this object annotation. 
If any custom object embeddings have been uploaded previously to this dataset, this vector must match the dimensions of the previously ingested vectors. :type embedding_vector: Optional[List] :param track_reference_id: A unique string to identify the prediction as part of a group. For instance, multiple "car" predictions across several dataset items may have the same `track_reference_id` such as "red_car". .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: CameraParams Camera position/heading used to record the image. :param position: World-normalized position of the camera :type position: :class:`Point3D` :param heading: Vector4 indicating the quaternion of the camera direction; note that the z-axis of the camera frame represents the camera's optical axis. See `Heading Examples `_. :type heading: :class:`Quaternion` :param fx: Focal length in x direction (in pixels). :type fx: float :param fy: Focal length in y direction (in pixels). :type fy: float :param cx: Principal point x value. :type cx: float :param cy: Principal point y value. :type cy: float .. py:method:: from_json(payload) :classmethod: Instantiates camera params object from schematized JSON dict payload. .. py:method:: to_payload() Serializes camera params object to schematized JSON dict. .. py:class:: CategoryAnnotation A category annotation. 
:: from nucleus import CategoryAnnotation category = CategoryAnnotation( label="dress", reference_id="image_1", taxonomy_name="clothing_type", metadata={"dress_color": "navy"}, track_reference_id="blue_and_black_dress", ) :param label: The label for this annotation. :type label: str :param reference_id: User-defined ID of the image to which to apply this annotation. :type reference_id: str :param taxonomy_name: The name of the taxonomy this annotation conforms to. See :meth:`Dataset.add_taxonomy`. :type taxonomy_name: Optional[str] :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. :type metadata: Optional[Dict] :param track_reference_id: A unique string to identify the annotation as part of a group. For instance, multiple "car" annotations across several dataset items may have the same `track_reference_id` such as "red_car". .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: CategoryPrediction(label, reference_id, taxonomy_name = None, confidence = None, metadata = None, class_pdf = None, track_reference_id = None) A prediction of a category. :param label: The label for this annotation (e.g. car, pedestrian, bicycle). :param reference_id: The reference ID of the image you wish to apply this annotation to. 
:param taxonomy_name: The name of the taxonomy this annotation conforms to. See :meth:`Dataset.add_taxonomy`. :param confidence: A value between 0 and 1 indicating the confidence of the prediction. :param class_pdf: An optional complete class probability distribution on this prediction. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain. :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. :param track_reference_id: A unique string to identify the prediction as part of a group. For instance, multiple "car" predictions across several dataset items may have the same `track_reference_id` such as "red_car". .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: CuboidAnnotation A 3D Cuboid annotation. :: from nucleus import CuboidAnnotation, Point3D cuboid = CuboidAnnotation( label="car", position=Point3D(100, 100, 10), dimensions=Point3D(5, 10, 5), yaw=0, reference_id="pointcloud_1", annotation_id="pointcloud_1_car_cuboid_1", metadata={"vehicle_color": "green"}, track_reference_id="red_car", ) :param label: The label for this annotation. 
:type label: str :param position: The point at the center of the cuboid. :type position: :class:`Point3D` :param dimensions: The length (x), width (y), and height (z) of the cuboid. :type dimensions: :class:`Point3D` :param yaw: The rotation, in radians, about the Z axis of the cuboid. :type yaw: float :param reference_id: User-defined ID of the image to which to apply this annotation. :type reference_id: str :param annotation_id: The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate. :type annotation_id: Optional[str] :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. :type metadata: Optional[Dict] :param track_reference_id: A unique string to identify the annotation as part of a group. For instance, multiple "car" annotations across several dataset items may have the same `track_reference_id` such as "red_car". .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: CuboidPrediction(label, position, dimensions, yaw, reference_id, confidence = None, annotation_id = None, metadata = None, class_pdf = None, track_reference_id = None) A prediction of a 3D cuboid. 
:param label: The label for this annotation (e.g. car, pedestrian, bicycle). :type label: str :param position: The point at the center of the cuboid. :type position: :class:`Point3D` :param dimensions: The length (x), width (y), and height (z) of the cuboid. :type dimensions: :class:`Point3D` :param yaw: The rotation, in radians, about the Z axis of the cuboid. :type yaw: float :param reference_id: User-defined ID of the image to which to apply this annotation. :type reference_id: str :param confidence: A value between 0 and 1 indicating the confidence of the prediction. :param annotation_id: The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate. :type annotation_id: Optional[str] :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. :type metadata: Optional[Dict] :param class_pdf: An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain. :param track_reference_id: A unique string to identify the prediction as part of a group. For instance, multiple "car" predictions across several dataset items may have the same `track_reference_id` such as "red_car". .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. 
If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: Dataset(dataset_id, client, name=None, is_scene=None, use_privacy_mode=None) Datasets are collections of your data that can be associated with models. You can append :class:`DatasetItems` or :class:`Scenes` with metadata to your dataset, annotate it with ground truth, and upload model predictions to evaluate and compare model performance on your data. Make sure that the dataset is set up correctly supporting the required datatype (see code sample below). Datasets cannot be instantiated directly and instead must be created via API endpoint using :meth:`NucleusClient.create_dataset`, or in the dashboard. :: import nucleus client = nucleus.NucleusClient(YOUR_SCALE_API_KEY) # Create new dataset supporting DatasetItems dataset = client.create_dataset(YOUR_DATASET_NAME, is_scene=False) # OR create new dataset supporting LidarScenes dataset = client.create_dataset(YOUR_DATASET_NAME, is_scene=True) # Or, retrieve existing dataset by ID # This ID can be fetched using client.list_datasets() or from a dashboard URL existing_dataset = client.get_dataset("YOUR_DATASET_ID") .. py:method:: add_items_from_dir(dirname = None, existing_dirname = None, privacy_mode_proxy = '', allowed_file_types = ('png', 'jpg', 'jpeg'), skip_size_warning = False, update_items = False) Update dataset by recursively crawling through a directory. A DatasetItem will be created for each unique image found. 
The existing items are skipped or updated depending on update_items param :param dirname: Where to look for image files, recursively :param existing_dirname: Already validated dirname :param privacy_mode_proxy: Endpoint that serves image files for privacy mode, ignore if not using privacy mode. The proxy should work based on the relative path of the images in the directory. :param allowed_file_types: Which file type extensions to search for, ie: ('jpg', 'png') :param skip_size_warning: If False, it will throw an error if the script globs more than 500 images. This is a safety check in case the dirname has a typo, and grabs too much data. :param update_items: Whether to update items in existing dataset .. py:method:: add_taxonomy(taxonomy_name, taxonomy_type, labels, update = False) Creates a new taxonomy. At the moment we only support taxonomies for category annotations and predictions. :: import nucleus client = nucleus.NucleusClient("YOUR_SCALE_API_KEY") dataset = client.get_dataset("ds_bwkezj6g5c4g05gqp1eg") response = dataset.add_taxonomy( taxonomy_name="clothing_type", taxonomy_type="category", labels=["shirt", "trousers", "dress"], update=False ) :param taxonomy_name: The name of the taxonomy. Taxonomy names must be unique within a dataset. :param taxonomy_type: The type of this taxonomy as a string literal. Currently, the only supported taxonomy type is "category." :param labels: The list of possible labels for the taxonomy. :param update: Whether or not to update taxonomy labels on taxonomy name collision. Default is False. Note that taxonomy labels will not be deleted on update, they can only be appended. :returns: Returns a response with dataset_id, taxonomy_name, and status of the add taxonomy operation. :: { "dataset_id": str, "taxonomy_name": str, "status": "Taxonomy created" } .. 
py:method:: annotate(annotations, update = DEFAULT_ANNOTATION_UPDATE_MODE, batch_size = 5000, asynchronous = False, remote_files_per_upload_request = 20, local_files_per_upload_request = 10) Uploads ground truth annotations to the dataset. Adding ground truth to your dataset in Nucleus allows you to visualize annotations, query dataset items based on the annotations they contain, and evaluate models by comparing their predictions to ground truth. Nucleus supports :class:`Box`, :class:`Polygon`, :class:`Cuboid`, :class:`Segmentation`, and :class:`Category` annotations. Cuboid annotations can only be uploaded to a :class:`pointcloud DatasetItem`. When uploading an annotation, you need to specify which item you are annotating via the reference_id you provided when uploading the image or pointcloud. Ground truth uploads can be made idempotent by specifying an optional annotation_id for each annotation. This id should be unique within the dataset_item so that (reference_id, annotation_id) is unique within the dataset. See :class:`SegmentationAnnotation` for specific requirements to upload segmentation annotations. For ingesting large annotation payloads, see the `Guide for Large Ingestions `_. :param annotations: List of annotation objects to upload. :type annotations: Sequence[:class:`Annotation`] :param update: Whether to ignore or overwrite metadata for conflicting annotations. :param batch_size: Number of annotations processed in each concurrent batch. Default is 5000. If you get timeouts when uploading geometric annotations, you can try lowering this batch size. :param asynchronous: Whether or not to process the upload asynchronously (and return an :class:`AsyncJob` object). Default is False. :param remote_files_per_upload_request: Number of remote files to upload in each request. 
Segmentations have either local or remote files, if you are getting timeouts while uploading segmentations with remote urls, you should lower this value from its default of 20. :param local_files_per_upload_request: Number of local files to upload in each request. Segmentations have either local or remote files, if you are getting timeouts while uploading segmentations with local files, you should lower this value from its default of 10. The maximum is 10. :returns: If synchronous, payload describing the upload result:: { "dataset_id": str, "annotations_processed": int } Otherwise, returns an :class:`AsyncJob` object. .. py:method:: append(items, update = False, batch_size = 20, asynchronous = False, local_files_per_upload_request = 10) Appends items or scenes to a dataset. .. note:: Datasets can only accept one of DatasetItems or Scenes, never both. This behavior is set during Dataset :meth:`creation ` with the ``is_scene`` flag. :: import nucleus client = nucleus.NucleusClient("YOUR_SCALE_API_KEY") dataset = client.get_dataset("YOUR_DATASET_ID") local_item = nucleus.DatasetItem( image_location="./1.jpg", reference_id="image_1", metadata={"key": "value"} ) remote_item = nucleus.DatasetItem( image_location="s3://your-bucket/2.jpg", reference_id="image_2", metadata={"key": "value"} ) # default is synchronous upload sync_response = dataset.append(items=[local_item]) # async jobs have higher throughput but can be more difficult to debug async_job = dataset.append( items=[remote_item], # all items must be remote for async asynchronous=True ) print(async_job.status()) A :class:`Dataset` can be populated with labeled and unlabeled data. Using Nucleus, you can filter down the data inside your dataset using custom metadata about your images. For instance, your local dataset may contain ``Sunny``, ``Foggy``, and ``Rainy`` folders of images. All of these images can be uploaded into a single Nucleus ``Dataset``, with (queryable) metadata like ``{"weather": "Sunny"}``. 
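The folder-to-metadata idea described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name ``records_from_weather_folders`` and the directory layout (one sub-folder per weather condition) are hypothetical, and the reference ID scheme is an arbitrary choice. Each record uses the same three keyword arguments that the ``DatasetItem`` example above passes (``image_location``, ``reference_id``, ``metadata``).

```python
import tempfile
from pathlib import Path


def records_from_weather_folders(root, extensions=(".jpg", ".jpeg", ".png")):
    """Build one record per image, using the parent folder name as metadata.

    Hypothetical helper: not part of the Nucleus SDK.
    """
    records = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            weather = path.parent.name  # e.g. "Sunny", "Foggy", "Rainy"
            records.append(
                {
                    "image_location": str(path),
                    # Assumed ID scheme: folder name plus file stem.
                    "reference_id": f"{weather}_{path.stem}",
                    "metadata": {"weather": weather},
                }
            )
    return records


# Demo on a throwaway directory tree; non-image files are skipped.
with tempfile.TemporaryDirectory() as root:
    for folder, name in [("Sunny", "1.jpg"), ("Foggy", "2.jpg"), ("Rainy", "notes.txt")]:
        (Path(root) / folder).mkdir(exist_ok=True)
        (Path(root) / folder / name).touch()
    records = records_from_weather_folders(root)

# With the SDK installed, the records could then be turned into items:
#   items = [nucleus.DatasetItem(**record) for record in records]
#   dataset.append(items=items)
```

Keeping the records as plain dicts until the last step makes it easy to inspect or filter them before anything is uploaded.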
To update an item's metadata, you can re-ingest the same items with the ``update`` argument set to ``True``. Existing metadata will be overwritten for ``DatasetItems`` in the payload that share a ``reference_id`` with a previously uploaded ``DatasetItem``. To retrieve your existing ``reference_ids``, use :meth:`Dataset.items`. :: # overwrite metadata by reuploading the item remote_item.metadata["weather"] = "Sunny" async_job_2 = dataset.append( items=[remote_item], update=True, asynchronous=True ) :param items: ( Union[ Sequence[:class:`DatasetItem`], Sequence[:class:`LidarScene`], Sequence[:class:`VideoScene`] ]): List of items or scenes to upload. :param batch_size: Size of the batch for larger uploads. Default is 20. This is for items that have a remote URL and do not require a local upload. If you get timeouts for uploading remote urls, try decreasing this. :param update: Whether or not to overwrite metadata on reference ID collision. Default is False. :param asynchronous: Whether or not to process the upload asynchronously (and return an :class:`AsyncJob` object). This is required when uploading scenes. Default is False. :param files_per_upload_request: Optional; default is 10. We recommend lowering this if you encounter timeouts. :param local_files_per_upload_request: Optional; default is 10. We recommend lowering this if you encounter timeouts. :returns: For scenes: if synchronous, returns a payload describing the upload result:: { "dataset_id": str, "new_items": int, "updated_items": int, "ignored_items": int, "upload_errors": int } Otherwise, returns an :class:`AsyncJob` object. For images: if synchronous, returns :class:`nucleus.upload_response.UploadResponse`; otherwise, returns an :class:`AsyncJob` object. .. py:method:: autotag_items(autotag_name, for_scores_greater_than=0) Fetches the autotag's items above the score threshold, sorted by descending score. :param autotag_name: The user-defined name of the autotag. 
:param for_scores_greater_than: Score threshold between -1 and 1 above which to include autotag items. :type for_scores_greater_than: Optional[int] :returns: List of autotagged items above the given score threshold, sorted by descending score, and autotag info, packaged into a dict as follows:: { "autotagItems": List[{ ref_id: str, score: float, model_prediction_annotation_id: str | None, ground_truth_annotation_id: str | None, }], "autotag": { id: str, name: str, status: "started" | "completed", autotag_level: "Image" | "Object" } } Note ``model_prediction_annotation_id`` and ``ground_truth_annotation_id`` are only relevant for object autotags. .. py:method:: autotag_training_items(autotag_name) Fetches items that were manually selected during refinement of the autotag. :param autotag_name: The user-defined name of the autotag. :returns: List of user-selected positives and autotag info, packaged into a dict as follows:: { "autotagPositiveTrainingItems": List[{ ref_id: str, model_prediction_annotation_id: str | None, ground_truth_annotation_id: str | None, }], "autotag": { id: str, name: str, status: "started" | "completed", autotag_level: "Image" | "Object" } } Note ``model_prediction_annotation_id`` and ``ground_truth_annotation_id`` are only relevant for object autotags. .. py:method:: build_slice(name, sample_size, sample_method, filters = None) Builds a slice using Nucleus' Smart Sample tool, which allows slices to be built based on certain criteria and filters. :param name: Name for the slice being created. Must be unique per dataset. :param sample_size: Size of the slice to create. Capped by the size of the dataset and the applied filters. :param sample_method: How to sample the dataset. Currently supports 'Random' and 'Uniqueness'. :param filters: Apply filters to only sample from an existing slice or autotag. ..
rubric:: Examples from nucleus.slice import SliceBuilderFilters, SliceBuilderMethods, SliceBuilderFilterAutotag # random slice job = dataset.build_slice("RandomSlice", 20, SliceBuilderMethods.RANDOM) # slice with filters filters = SliceBuilderFilters( slice_id="", autotag=SliceBuilderFilterAutotag("tag_cd41jhjdqyti07h8m1n1", [-0.5, 0.5]) ) job = dataset.build_slice("NewSlice", 20, SliceBuilderMethods.RANDOM, filters) Returns: An async job .. py:method:: calculate_evaluation_metrics(model, options = None) Starts computation of evaluation metrics for a model on the dataset. Call this endpoint to update the matches and metrics calculated for a model on a given dataset. This is required in order to sort by IoU, view false positives/false negatives, and view model insights. You can add predictions from a model to a dataset after running the calculation of the metrics. However, the calculation of metrics will have to be retriggered for the new predictions to be matched with ground truth and appear as false positives/negatives, or for the new predictions' effect on metrics to be reflected in model run insights. During IoU calculation, bounding box predictions are compared to ground truth using a greedy matching algorithm that matches the prediction and ground truth boxes with the highest IoUs first. By default the matching algorithm is class-agnostic: it will greedily create matches regardless of the class labels. The algorithm can be tuned to classify true positives between certain classes, but not others. This is useful if the labels in your ground truth do not match the exact strings of your model predictions, or if you want to associate multiple predictions with one ground truth label, or multiple ground truth labels with one prediction. To recompute metrics based on different matching, you can re-commit the run with new request parameters. 
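To make the greedy, class-agnostic matching described above more concrete, here is a minimal pure-Python sketch. It is not the Nucleus implementation: the function names ``iou`` and ``greedy_match``, the ``(x, y, width, height)`` box layout, and the ``iou_threshold`` default of 0.5 are all assumptions chosen for illustration.

```python
from itertools import product


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_match(predictions, ground_truths, iou_threshold=0.5):
    """Pair boxes greedily, highest IoU first; each box is used at most once.

    Labels are ignored, so this corresponds to class-agnostic matching.
    The threshold is an assumed parameter, not a documented Nucleus default.
    """
    candidates = sorted(
        (
            (iou(p, g), pi, gi)
            for (pi, p), (gi, g) in product(
                enumerate(predictions), enumerate(ground_truths)
            )
        ),
        reverse=True,
    )
    used_preds, used_gts, matches = set(), set(), []
    for score, pi, gi in candidates:
        if score < iou_threshold:
            break  # remaining candidates have even lower IoU
        if pi in used_preds or gi in used_gts:
            continue
        used_preds.add(pi)
        used_gts.add(gi)
        matches.append((pi, gi, score))
    return matches


preds = [(0, 0, 10, 10), (20, 20, 10, 10)]
gts = [(1, 1, 10, 10), (50, 50, 10, 10)]
matches = greedy_match(preds, gts)
# Prediction 0 pairs with ground truth 0; the other boxes stay unmatched
# and would count as a false positive and a false negative respectively.
```

Restricting matches to particular label pairs, as ``allowed_label_matches`` does, would amount to filtering the candidate list by label before the greedy pass.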
:: import nucleus client = nucleus.NucleusClient("YOUR_SCALE_API_KEY") dataset = client.get_dataset(dataset_id="YOUR_DATASET_ID") model = client.get_model(model_id="YOUR_MODEL_PRJ_ID") # Compute all evaluation metrics including IOU-based matching: dataset.calculate_evaluation_metrics(model) # Match car and bus bounding boxes (for IOU computation) # Otherwise enforce that class labels must match dataset.calculate_evaluation_metrics(model, options={ 'allowed_label_matches': [ { 'ground_truth_label': 'car', 'model_prediction_label': 'bus' }, { 'ground_truth_label': 'bus', 'model_prediction_label': 'car' } ] }) :param model: The model object for which to calculate metrics. :type model: :class:`Model` :param options: Dictionary of specific options to configure metrics calculation. class_agnostic Whether ground truth and prediction classes can differ when being matched for evaluation metrics. Default is True. allowed_label_matches Pairs of ground truth and prediction classes that should be considered matchable when computing metrics. If supplied, ``class_agnostic`` must be False. :: { "class_agnostic": bool, "allowed_label_matches": List[{ "ground_truth_label": str, "model_prediction_label": str }] } .. py:method:: create_custom_index(embeddings_urls, embedding_dim) Processes user-provided embeddings for the dataset to use with autotag and simsearch. :: import nucleus client = nucleus.NucleusClient("YOUR_SCALE_API_KEY") dataset = client.get_dataset("YOUR_DATASET_ID") all_embeddings = { "reference_id_0": [0.1, 0.2, 0.3], "reference_id_1": [0.4, 0.5, 0.6], ... 
"reference_id_10000": [0.7, 0.8, 0.9] } # sharded and uploaded to s3 with the two below URLs embeddings_url_1 = "s3://dataset/embeddings_map_1.json" embeddings_url_2 = "s3://dataset/embeddings_map_2.json" response = dataset.create_custom_index( embeddings_url=[embeddings_url_1, embeddings_url_2], embedding_dim=3 ) :param embeddings_urls: List of URLs, each of which pointing to a JSON mapping reference_id -> embedding vector. Each embedding JSON must contain <5000 rows. :param embedding_dim: The dimension of the embedding vectors. Must be consistent across all embedding vectors in the index. :returns: Asynchronous job object to track processing status. :rtype: :class:`AsyncJob` .. py:method:: create_image_index() Creates or updates image index by generating embeddings for images that do not already have embeddings. The embeddings are used for autotag and similarity search. This endpoint is limited to index up to 2 million images at a time and the job will fail for payloads that exceed this limit. :returns: Asynchronous job object to track processing status. :rtype: :class:`AsyncJob` .. py:method:: create_object_index(model_run_id = None, gt_only = None) Creates or updates object index by generating embeddings for objects that do not already have embeddings. These embeddings are used for autotag and similarity search. This endpoint only supports indexing objects sourced from the predictions of a specific model or the ground truth annotations of the dataset. This endpoint is idempotent. If this endpoint is called again for a model whose predictions were indexed in the past, the previously indexed predictions will not have new embeddings recomputed. The same is true for ground truth annotations. Note that this means if you change update a prediction or ground truth bounding box that already has an associated embedding, the embedding will not be updated, even with another call to this endpoint. 
For now, we recommend deleting the prediction or ground truth annotation and re-inserting it to force generate a new embedding. This endpoint is limited to generating embeddings for 3 million objects at a time and the job will fail for payloads that exceed this limit. :param model_run_id: The ID of the model whose predictions should be indexed. Default is None, but must be supplied in the absence of ``gt_only``. .. todo :: Deprecate model run :param gt_only: Whether to only generate embeddings for the ground truth annotations of the dataset. Default is None, but must be supplied in the absence of ``model_run_id``. :returns: Asynchronous job object to track processing status. :rtype: :class:`AsyncJob` .. py:method:: create_slice(name, reference_ids = None) Creates a :class:`Slice` of dataset items within a dataset. :param name: A human-readable name for the slice. :param reference_ids: List of reference IDs of dataset items to add to the slice, cannot exceed 10,000 items. Can be left unspecified, and an empty slice will be created. :returns: The newly constructed slice item. :rtype: :class:`Slice` :raises BadRequest: If length of reference_ids is too large (> 10,000 items) .. py:method:: create_slice_by_ids(name, dataset_item_ids = None, scene_ids = None, annotation_ids = None, prediction_ids = None) Creates a :class:`Slice` of dataset items, scenes, annotations, or predictions within a dataset by their IDs. .. note:: Dataset item, scene, and object (annotation or prediction) IDs may not be mixed. However, when creating an object slice, both annotation and prediction IDs may be supplied. :param name: A human-readable name for the slice. 
:param dataset_item_ids: List of internal IDs of dataset items to add to the slice. :param scene_ids: List of internal IDs of scenes to add to the slice. :param annotation_ids: List of internal IDs of Annotations to add to the slice. :param prediction_ids: List of internal IDs of Predictions to add to the slice. :returns: The newly constructed slice item. :rtype: :class:`Slice` .. py:method:: delete_annotations(reference_ids = None, keep_history = True) Deletes all annotations associated with the specified item reference IDs. :param reference_ids: List of user-defined reference IDs of the dataset items from which to delete annotations. Defaults to an empty list. :param keep_history: Whether to preserve version history. We recommend skipping this parameter and using the default value of True. :returns: Empty payload response. :rtype: :class:`AsyncJob` .. py:method:: delete_custom_index(image = True) Deletes the custom index uploaded to the dataset. :returns: Payload containing information that can be used to track the job's status:: { "dataset_id": str, "job_id": str, "message": str } .. py:method:: delete_item(reference_id) Deletes an item from the dataset by item reference ID. All annotations and predictions associated with the item will be deleted as well. :param reference_id: The user-defined reference ID of the item to delete. :returns: Payload to indicate deletion invocation. .. py:method:: delete_scene(reference_id) Deletes a scene from the dataset by scene reference ID. All items, annotations, and predictions associated with the scene will be deleted as well. :param reference_id: The user-defined reference ID of the scene to delete. .. py:method:: delete_taxonomy(taxonomy_name) Deletes the given taxonomy. All annotations and predictions associated with the taxonomy will be deleted as well. :param taxonomy_name: The name of the taxonomy. :returns: A response with dataset_id, taxonomy_name, and status of the delete taxonomy operation.
:: { "dataset_id": str, "taxonomy_name": str, "status": "Taxonomy successfully deleted" } .. py:method:: delete_tracks(track_reference_ids) Deletes a list of tracks from the dataset, thereby unlinking their annotation and prediction instances. :param track_reference_ids: A list of reference IDs for tracks to delete. :type track_reference_ids: List[str] .. py:method:: export_embeddings(asynchronous = True) Fetches a pd.DataFrame-ready list of dataset embeddings. :param asynchronous: Whether or not to process the export asynchronously (and return an :class:`EmbeddingsExportJob` object). Default is True. :returns: If synchronous, a list where each item is a dict with two keys representing a row in the dataset:: List[{ "reference_id": str, "embedding_vector": List[float] }] Otherwise, returns an :class:`EmbeddingsExportJob` object. .. py:method:: export_predictions(model) Fetches all predictions of a model that were uploaded to the dataset. :param model: The model whose predictions to retrieve. :type model: :class:`Model` :returns: List of prediction objects from the model. :rtype: List[Union[ :class:`BoxPrediction`, :class:`PolygonPrediction`, :class:`CuboidPrediction`, :class:`SegmentationPrediction`, :class:`CategoryPrediction`, :class:`KeypointsPrediction`, ]] .. py:method:: export_scale_task_info() Fetches info for all linked Scale tasks of items/scenes in the dataset. :returns: A list of dicts, each with two keys, respectively mapping to items/scenes and info on their corresponding Scale tasks within the dataset:: List[{ "item" | "scene": Union[:class:`DatasetItem`, :class:`Scene`], "scale_task_info": { "task_id": str, "task_status": str, "task_audit_status": str, "task_audit_review_comment": Optional[str], "project_name": str, "batch": str, "created_at": str, "completed_at": Optional[str] }[] }] .. py:method:: get_image_indexing_status() Gets the primary image index progress for the dataset.
:returns: Response payload:: { "embedding_count": int, "image_count": int, "percent_indexed": float, "additional_context": str } .. py:method:: get_object_indexing_status(model_run_id=None) Gets the primary object index progress of the dataset. If model_run_id is not specified, this endpoint will retrieve the indexing progress of the ground truth objects. :returns: Response payload:: { "embedding_count": int, "object_count": int, "percent_indexed": float, "additional_context": str } .. py:method:: get_scene(reference_id) Fetches a single scene in the dataset by its reference ID. :param reference_id: The user-defined reference ID of the scene to fetch. :returns: A scene object containing frames, which in turn contain pointcloud or image items. :rtype: :class:`Scene` .. py:method:: get_scene_from_item_ref_id(item_reference_id) Given a dataset item reference ID, finds the Scene it belongs to. .. py:method:: get_slices(name = None, slice_type = None) Gets a list of slices by name or underlying slice type. :param name: Name of the desired slice to look up. :param slice_type: Type of slice to look up. This can be one of ('dataset_item', 'object', 'scene') :raises NotFound: If no slices were found with the given criteria. :returns: The Nucleus slice as an object. :rtype: :class:`Slice` .. py:method:: ground_truth_loc(reference_id, annotation_id) Fetches a single ground truth annotation by ID. :param reference_id: User-defined reference ID of the dataset item associated with the ground truth annotation. :param annotation_id: User-defined ID of the ground truth annotation. :returns: Ground truth annotation object with the specified annotation ID. :rtype: Union[ :class:`BoxAnnotation`, :class:`LineAnnotation`, :class:`PolygonAnnotation`, :class:`KeypointsAnnotation`, :class:`CuboidAnnotation`, :class:`SegmentationAnnotation`, :class:`CategoryAnnotation` ] .. py:method:: iloc(i) Fetches dataset item and associated annotations by absolute numerical index.
:param i: Absolute numerical index of the dataset item within the dataset. :returns: Payload describing the dataset item and associated annotations:: { "item": DatasetItem, "annotations": { "box": Optional[List[BoxAnnotation]], "cuboid": Optional[List[CuboidAnnotation]], "line": Optional[List[LineAnnotation]], "polygon": Optional[List[PolygonAnnotation]], "keypoints": Optional[List[KeypointsAnnotation]], "segmentation": Optional[List[SegmentationAnnotation]], "category": Optional[List[CategoryAnnotation]], } } .. py:method:: info() Fetches information about the dataset. :returns: Information about the dataset including its Scale-generated ID, name, length, associated Models, Slices, and more. :rtype: :class:`DatasetInfo` .. py:method:: ingest_tasks(task_ids) Ingests specific tasks from an existing Scale or Rapid project into the dataset. Note: if you would like to create a new Dataset from an existing Scale labeling project, use :meth:`NucleusClient.create_dataset_from_project`. For more info, see our `Ingest From Labeling Guide `_. :param task_ids: List of task IDs to ingest. :returns: Payload describing the asynchronous upload result:: { "ingested_tasks": int, "ignored_tasks": int, "pending_tasks": int } .. py:method:: items_and_annotation_chip_generator(chip_size, stride_size, cache_directory, query = None, num_processes = 0) Provides a generator of chips for all DatasetItems and BoxAnnotations in the dataset. A chip is an image created by tiling a source image. :param chip_size: The size of the image chip. :param stride_size: The distance to move when creating the next image chip. When stride is equal to chip size, there will be no overlap. When stride is equal to half the chip size, there will be 50 percent overlap. :param cache_directory: The s3 or local directory to store the image and annotations of a chip. s3 directories must be in the format s3://s3-bucket/s3-key :param query: Structured query compatible with the `Nucleus query language `_.
:param num_processes: The number of worker processes to use to chip and upload images. If unset, no parallel processing will occur. :returns: Generator where each element is a dict containing the location of the image chip (jpeg) and its annotations (json). :: Iterable[{ "image_location": str, "annotation_location": str }] .. py:method:: items_and_annotation_generator(query = None, use_mirrored_images = False) Provides a generator of all DatasetItems and Annotations in the dataset. :param query: Structured query compatible with the `Nucleus query language `_. :param use_mirrored_images: If True, returns the location of the mirrored image hosted in Scale S3. Useful when the original image is no longer available. :returns: Generator where each element is a dict containing the DatasetItem and all of its associated Annotations, grouped by type. :: Iterable[{ "item": DatasetItem, "annotations": { "box": List[BoxAnnotation], "polygon": List[PolygonAnnotation], "cuboid": List[CuboidAnnotation], "line": Optional[List[LineAnnotation]], "segmentation": List[SegmentationAnnotation], "category": List[CategoryAnnotation], "keypoints": List[KeypointsAnnotation], } }] .. py:method:: items_and_annotations() Returns a list of all DatasetItems and Annotations in this dataset. :returns: A list of dicts, each with two keys representing a row in the dataset:: List[{ "item": DatasetItem, "annotations": { "box": Optional[List[BoxAnnotation]], "cuboid": Optional[List[CuboidAnnotation]], "line": Optional[List[LineAnnotation]], "polygon": Optional[List[PolygonAnnotation]], "segmentation": Optional[List[SegmentationAnnotation]], "category": Optional[List[CategoryAnnotation]], "keypoints": Optional[List[KeypointsAnnotation]], } }] .. py:method:: items_generator(page_size=100000) Generator yielding all dataset items in the dataset. 
:: collected_ref_ids = [] for item in dataset.items_generator(): print(f"Exporting item: {item.reference_id}") collected_ref_ids.append(item.reference_id) :param page_size: Number of items to return per page. If you are experiencing timeouts while using this generator, you can try lowering the page size. :type page_size: int, optional :Yields: :class:`DatasetItem` -- A single DatasetItem object. .. py:method:: jobs(job_types = None, from_date = None, to_date = None, limit = JOB_REQ_LIMIT, show_completed = False, stats_only = False) Fetches jobs pertaining to this particular dataset. :param job_types: Filter on a set of job types, e.g. ['uploadDatasetItems']. If None, fetch all types. :param from_date: Beginning of date range, as a string 'YYYY-MM-DD' or datetime object. For example: '2021-11-05', parser.parse('Nov 5 2021'), or datetime(2021,11,5) :param to_date: End of date range. :param limit: Number of results to fetch, max 50_000. :param show_completed: don't fetch jobs with Completed status :param stats_only: Return an overview of jobs instead of a list of job objects. .. py:method:: list_autotags() Fetches all autotags of the dataset. :returns: List of autotag payloads:: List[{ "id": str, "name": str, "status": "completed" | "pending", "autotag_level": "Image" | "Object" }] .. py:method:: loc(dataset_item_id) Fetches a dataset item and associated annotations by Nucleus-generated ID. :param dataset_item_id: Nucleus-generated dataset item ID (starts with ``di_``). This can be retrieved via :meth:`Dataset.items` or a Nucleus dashboard URL. :returns: Payload containing the dataset item and associated annotations:: { "item": DatasetItem, "annotations": { "box": Optional[List[BoxAnnotation]], "cuboid": Optional[List[CuboidAnnotation]], "line": Optional[List[LineAnnotation]], "polygon": Optional[List[PolygonAnnotation]], "keypoints": Optional[List[KeypointsAnnotation]], "segmentation": Optional[List[SegmentationAnnotation]], "category": Optional[List[CategoryAnnotation]], } } .. 
py:method:: prediction_loc(model, reference_id, annotation_id) Fetches a single model prediction by ID. :param model: Model object from which to fetch the prediction. :type model: :class:`Model` :param reference_id: User-defined reference ID of the dataset item associated with the model prediction. :type reference_id: str :param annotation_id: User-defined ID of the model prediction. :type annotation_id: str :returns: Model prediction object with the specified annotation ID. :rtype: Union[ :class:`BoxPrediction`, :class:`PolygonPrediction`, :class:`CuboidPrediction`, :class:`SegmentationPrediction`, :class:`CategoryPrediction`, :class:`KeypointsPrediction` ] .. py:method:: predictions_iloc(model, index) Fetches all predictions of a dataset item by its absolute index. :param model: Model object from which to fetch the prediction. :type model: :class:`Model` :param index: Absolute index of the dataset item within the dataset. :type index: int :returns: Dictionary mapping prediction type to a list of such prediction objects from the given model:: { "box": List[BoxPrediction], "polygon": List[PolygonPrediction], "cuboid": List[CuboidPrediction], "segmentation": List[SegmentationPrediction], "category": List[CategoryPrediction], "keypoints": List[KeypointsPrediction], } :rtype: List[Union[ :class:`BoxPrediction`, :class:`PolygonPrediction`, :class:`CuboidPrediction`, :class:`SegmentationPrediction`, :class:`CategoryPrediction`, :class:`KeypointsPrediction`, ]] .. py:method:: predictions_refloc(model, reference_id) Fetches all predictions of a dataset item by its reference ID. :param model: Model object from which to fetch the prediction. :type model: :class:`Model` :param reference_id: User-defined ID of the dataset item from which to fetch all predictions.
:type reference_id: str :returns: Dictionary mapping prediction type to a list of such prediction objects from the given model:: { "box": List[BoxPrediction], "polygon": List[PolygonPrediction], "cuboid": List[CuboidPrediction], "segmentation": List[SegmentationPrediction], "category": List[CategoryPrediction], "keypoints": List[KeypointsPrediction], } :rtype: List[Union[ :class:`BoxPrediction`, :class:`PolygonPrediction`, :class:`CuboidPrediction`, :class:`SegmentationPrediction`, :class:`CategoryPrediction`, :class:`KeypointsPrediction`, ]] .. py:method:: query_items(query) Fetches all DatasetItems that pertain to a given structured query. :param query: Structured query compatible with the `Nucleus query language `_. :returns: A list of DatasetItem query results. .. py:method:: query_objects(query, query_type, model_run_id = None) Fetches all objects in the dataset that pertain to a given structured query. The results are either Predictions, Annotations, or Evaluation Matches, based on the ``query_type`` input parameter. :param query: Structured query compatible with the `Nucleus query language `_. :param query_type: Defines the type of the object to query. :returns: An iterable of either Predictions, Annotations, or Evaluation Matches. .. py:method:: query_scenes(query) Fetches all Scenes that pertain to a given structured query. :param query: Structured query compatible with the `Nucleus query language `_. :returns: A list of Scene query results. .. py:method:: refloc(reference_id) Fetches a dataset item and associated annotations by reference ID. :param reference_id: User-defined reference ID of the dataset item.
:returns: Payload containing the dataset item and associated annotations:: { "item": DatasetItem, "annotations": { "box": Optional[List[BoxAnnotation]], "cuboid": Optional[List[CuboidAnnotation]], "line": Optional[List[LineAnnotation]], "polygon": Optional[List[PolygonAnnotation]], "keypoints": Optional[List[KeypointsAnnotation]], "segmentation": Optional[List[SegmentationAnnotation]], "category": Optional[List[CategoryAnnotation]], } } .. py:method:: scene_and_annotation_generator(page_size=10) Provides a generator of all DatasetItems and Annotations in the dataset grouped by scene. :returns: Generator where each element is a nested dict (representing a JSON), structured similarly to how the Scale API returns task data:: Iterable[{ "file_location": str, "metadata": Dict[str, Any], "annotations": { "{trackId}": { "label": str, "name": str, "frames": List[{ "left": int, "top": int, "width": int, "height": int, "key": str, # frame key "metadata": Dict[str, Any] }] } } }] .. py:method:: set_continuous_indexing(enable = True) Toggle whether embeddings are automatically generated for new data. Sets continuous indexing for a given dataset, which will automatically generate embeddings for use with autotag whenever new images are uploaded. :param enable: Whether to enable or disable continuous indexing. Default is True. :returns: Response payload:: { "dataset_id": str, "message": str, "backfill_job": AsyncJob, } .. py:method:: set_primary_index(image = True, custom = False) Sets the primary index used for Autotag and Similarity Search on this dataset. :param image: Whether to configure the primary index for images or objects. Default is True (set primary image index). :param custom: Whether to set the primary index to use custom or Nucleus-generated embeddings. Default is False (use Nucleus-generated embeddings as the primary index). :returns: { "success": bool, } .. py:method:: update_autotag(autotag_id) Rerun autotag inference on all items in the dataset.
Currently this endpoint does not try to skip already inferenced items, but this improvement is planned for the future. This means that for now, you can only have one job running at a time, so please await the result using job.sleep_until_complete() before launching another job. :param autotag_id: ID of the autotag to re-inference. You can retrieve the ID you want with :meth:`list_autotags`, or from its URL in the "Manage Autotags" page in the dashboard. :returns: Asynchronous job object to track processing status. :rtype: :class:`AsyncJob` .. py:method:: update_item_metadata(mapping, asynchronous = False) Update (merge) dataset item metadata for each reference_id given in the mapping. The backend will join the specified mapping metadata to the existing metadata. If there is a key-collision, the value given in the mapping will take precedence. This method may also be used to update the `camera_params` for a particular set of items. Just specify the key `camera_params` in the metadata for each reference_id along with all the necessary fields. :param mapping: Key-value pairs mapping each reference ID to its metadata dict. :param asynchronous: If True, run the update as a background job. .. rubric:: Examples >>> mapping = {"item_ref_1": {"new_key": "foo"}, "item_ref_2": {"some_value": 123, "camera_params": {...}}} >>> dataset.update_item_metadata(mapping) :returns: A dictionary outlining success or failures. .. py:method:: update_scene_metadata(mapping, asynchronous = False) Update (merge) scene metadata for each reference_id given in the mapping. The backend will join the specified mapping metadata to the existing metadata. If there is a key-collision, the value given in the mapping will take precedence. :param mapping: Key-value pairs mapping each reference ID to its metadata dict. :param asynchronous: If True, run the update as a background job. .. rubric:: Examples >>> mapping = {"scene_ref_1": {"new_key": "foo"}, "scene_ref_2": {"some_value": 123}} >>> dataset.update_scene_metadata(mapping) :returns: A dictionary outlining success or failures. .. 
py:method:: upload_lidar_semseg_predictions(model, pointcloud_ref_id, predictions_s3_path) Upload Lidar Semantic Segmentation predictions for a given point-cloud. Assuming a point-cloud with only 4 points (three labeled as Car, one labeled as Person), the contents of the predictions s3 object should be formatted as such: .. code-block:: json { "objects": [ { "label": "Car", "index": 1}, { "label": "Person", "index": 2} ], "point_objects": [1, 1, 1, 2], "point_confidence": [0.5, 0.9, 0.9, 0.3] } The entries in `"point_objects"` must be in the same order as the points that were originally uploaded to Scale. :param model: Nucleus model used to store these predictions :type model: :class:`Model` :param pointcloud_ref_id: The reference ID of the pointcloud to which these predictions belong :type pointcloud_ref_id: str :param predictions_s3_path: S3 path to where the predictions are stored :type predictions_s3_path: str .. py:method:: upload_predictions(model, predictions, update = False, asynchronous = False, batch_size = 5000, remote_files_per_upload_request = 20, local_files_per_upload_request = 10, trained_slice_id = None) Uploads predictions and associates them with an existing :class:`Model`. Adding predictions to your dataset in Nucleus allows you to visualize discrepancies against ground truth, query dataset items based on the predictions they contain, and evaluate your models by comparing their predictions to ground truth. Nucleus supports :class:`BoxPrediction`, :class:`PolygonPrediction`, :class:`CuboidPrediction`, :class:`SegmentationPrediction`, :class:`CategoryPrediction`, and :class:`SceneCategoryPrediction` objects. Cuboid predictions can only be uploaded to a pointcloud :class:`DatasetItem`. When uploading a prediction, you need to specify which item you are annotating via the reference_id you provided when uploading the image or pointcloud. Prediction uploads can be made idempotent by specifying an optional annotation_id for each prediction.
This ID should be unique within the dataset item so that (reference_id, annotation_id) is unique within the dataset. See :class:`SegmentationPrediction` for specific requirements to upload segmentation predictions. For ingesting large prediction payloads, see the `Guide for Large Ingestions `_. :param model: Nucleus-generated model ID (starts with ``prj_``). This can be retrieved via :meth:`list_models` or a Nucleus dashboard URL. :type model: :class:`Model` :param predictions: List of prediction objects to upload. :type predictions: List[Union[ :class:`BoxPrediction`, :class:`PolygonPrediction`, :class:`CuboidPrediction`, :class:`SegmentationPrediction`, :class:`CategoryPrediction`, :class:`SceneCategoryPrediction` ]] :param update: Whether or not to overwrite metadata or ignore on reference ID collision. Default is False. :param asynchronous: Whether or not to process the upload asynchronously (and return an :class:`AsyncJob` object). Default is False. :param batch_size: Number of predictions processed in each concurrent batch. Default is 5000. If you get timeouts when uploading geometric predictions, you can try lowering this batch size. This is only relevant for asynchronous=False. :param remote_files_per_upload_request: Number of remote files to upload in each request. Segmentations have either local or remote files. If you are getting timeouts while uploading segmentations with remote URLs, you should lower this value from its default of 20. This is only relevant for asynchronous=False. :param local_files_per_upload_request: Number of local files to upload in each request. Segmentations have either local or remote files. If you are getting timeouts while uploading segmentations with local files, you should lower this value from its default of 10. The maximum is 10. This is only relevant for asynchronous=False. :param trained_slice_id: Nucleus-generated slice ID (starts with ``slc_``) which was used to train the model.
:returns: Payload describing the synchronous upload:: { "dataset_id": str, "model_run_id": str, "predictions_processed": int, "predictions_ignored": int, } .. py:class:: DatasetInfo(**data) High-level :class:`Dataset` information .. attribute:: dataset_id Nucleus-generated dataset ID .. attribute:: name User-defined name of dataset .. attribute:: length Number of :class:`DatasetItem` in :class:`Dataset` .. attribute:: model_run_ids (deprecated) .. attribute:: slice_ids List :class:`Slice` IDs associated with the :class:`Dataset` .. attribute:: annotation_metadata_schema Dict defining annotation-level metadata schema. .. attribute:: item_metadata_schema Dict defining item metadata schema. Create a new model by parsing and validating input data from keyword arguments. Raises ValidationError if the input data cannot be parsed to form a valid model. .. py:method:: construct(_fields_set = None, **values) :classmethod: Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if `Config.extra = 'allow'` was set since it adds all passed values .. py:method:: copy(*, include = None, exclude = None, update = None, deep = False) Duplicate a model, optionally choose which fields to include, exclude and change. :param include: fields to include in new model :param exclude: fields to exclude from new model, as with values this takes precedence over include :param update: values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data :param deep: set to `True` to make a deep copy of the model :return: new model instance .. py:method:: dict(*, include = None, exclude = None, by_alias = False, skip_defaults = None, exclude_unset = False, exclude_defaults = False, exclude_none = False) Generate a dictionary representation of the model, optionally specifying which fields to include or exclude. .. 
py:method:: json(*, include = None, exclude = None, by_alias = False, skip_defaults = None, exclude_unset = False, exclude_defaults = False, exclude_none = False, encoder = None, models_as_dict = True, **dumps_kwargs) Generate a JSON representation of the model, `include` and `exclude` arguments as per `dict()`. `encoder` is an optional function to supply as `default` to json.dumps(), other arguments as per `json.dumps()`. .. py:method:: model_construct(_fields_set = None, **values) :classmethod: Creates a new instance of the `Model` class with validated data. Creates a new model setting `__dict__` and `__pydantic_fields_set__` from trusted or pre-validated data. Default values are respected, but no other validation is performed. .. note:: `model_construct()` generally respects the `model_config.extra` setting on the provided model. That is, if `model_config.extra == 'allow'`, then all extra passed values are added to the model instance's `__dict__` and `__pydantic_extra__` fields. If `model_config.extra == 'ignore'` (the default), then all extra passed values are ignored. Because no validation is performed with a call to `model_construct()`, having `model_config.extra == 'forbid'` does not result in an error if extra values are passed, but they will be ignored. :param _fields_set: The set of field names accepted for the Model instance. :param values: Trusted or pre-validated data dictionary. :returns: A new instance of the `Model` class with validated data. .. py:method:: model_copy(*, update = None, deep = False) Usage docs: https://docs.pydantic.dev/2.8/concepts/serialization/#model_copy Returns a copy of the model. :param update: Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data. :param deep: Set to `True` to make a deep copy of the model. :returns: New model instance. .. 
py:method:: model_dump(*, mode = 'python', include = None, exclude = None, context = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, round_trip = False, warnings = True, serialize_as_any = False) Usage docs: https://docs.pydantic.dev/2.8/concepts/serialization/#modelmodel_dump Generate a dictionary representation of the model, optionally specifying which fields to include or exclude. :param mode: The mode in which `to_python` should run. If mode is 'json', the output will only contain JSON serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects. :param include: A set of fields to include in the output. :param exclude: A set of fields to exclude from the output. :param context: Additional context to pass to the serializer. :param by_alias: Whether to use the field's alias in the dictionary key if defined. :param exclude_unset: Whether to exclude fields that have not been explicitly set. :param exclude_defaults: Whether to exclude fields that are set to their default value. :param exclude_none: Whether to exclude fields that have a value of `None`. :param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T]. :param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a `pydantic_core.PydanticSerializationError`. :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior. :returns: A dictionary representation of the model. .. 
py:method:: model_dump_json(*, indent = None, include = None, exclude = None, context = None, by_alias = False, exclude_unset = False, exclude_defaults = False, exclude_none = False, round_trip = False, warnings = True, serialize_as_any = False) Usage docs: https://docs.pydantic.dev/2.8/concepts/serialization/#modelmodel_dump_json Generates a JSON representation of the model using Pydantic's `to_json` method. :param indent: Indentation to use in the JSON output. If None is passed, the output will be compact. :param include: Field(s) to include in the JSON output. :param exclude: Field(s) to exclude from the JSON output. :param context: Additional context to pass to the serializer. :param by_alias: Whether to serialize using field aliases. :param exclude_unset: Whether to exclude fields that have not been explicitly set. :param exclude_defaults: Whether to exclude fields that are set to their default value. :param exclude_none: Whether to exclude fields that have a value of `None`. :param round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T]. :param warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a `pydantic_core.PydanticSerializationError`. :param serialize_as_any: Whether to serialize fields with duck-typing serialization behavior. :returns: A JSON string representation of the model. .. py:method:: model_json_schema(by_alias = True, ref_template = DEFAULT_REF_TEMPLATE, schema_generator = GenerateJsonSchema, mode = 'validation') :classmethod: Generates a JSON schema for a model class. :param by_alias: Whether to use attribute aliases or not. :param ref_template: The reference template. :param schema_generator: To override the logic used to generate the JSON schema, as a subclass of `GenerateJsonSchema` with your desired modifications. :param mode: The mode in which to generate the schema. 
:returns: The JSON schema for the given model class. .. py:method:: model_parametrized_name(params) :classmethod: Compute the class name for parametrizations of generic classes. This method can be overridden to achieve a custom naming scheme for generic BaseModels. :param params: Tuple of types of the class. Given a generic class `Model` with 2 type variables and a concrete model `Model[str, int]`, the value `(str, int)` would be passed to `params`. :returns: String representing the new class where `params` are passed to `cls` as type variables. :raises TypeError: Raised when trying to generate concrete names for non-generic models. .. py:method:: model_post_init(__context) Override this method to perform additional initialization after `__init__` and `model_construct`. This is useful if you want to do some validation that requires the entire model to be initialized. .. py:method:: model_rebuild(*, force = False, raise_errors = True, _parent_namespace_depth = 2, _types_namespace = None) :classmethod: Try to rebuild the pydantic-core schema for the model. This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails. :param force: Whether to force the rebuilding of the model schema, defaults to `False`. :param raise_errors: Whether to raise errors, defaults to `True`. :param _parent_namespace_depth: The depth level of the parent namespace, defaults to 2. :param _types_namespace: The types namespace, defaults to `None`. :returns: Returns `None` if the schema is already "complete" and rebuilding was not required. If rebuilding _was_ required, returns `True` if rebuilding was successful, otherwise `False`. .. py:method:: model_validate(obj, *, strict = None, from_attributes = None, context = None) :classmethod: Validate a pydantic model instance. :param obj: The object to validate. :param strict: Whether to enforce types strictly. 
:param from_attributes: Whether to extract data from object attributes. :param context: Additional context to pass to the validator. :raises ValidationError: If the object could not be validated. :returns: The validated model instance. .. py:method:: model_validate_json(json_data, *, strict = None, context = None) :classmethod: Usage docs: https://docs.pydantic.dev/2.8/concepts/json/#json-parsing Validate the given JSON data against the Pydantic model. :param json_data: The JSON data to validate. :param strict: Whether to enforce types strictly. :param context: Extra variables to pass to the validator. :returns: The validated Pydantic model. :raises ValueError: If `json_data` is not a JSON string. .. py:method:: model_validate_strings(obj, *, strict = None, context = None) :classmethod: Validate the given object with string data against the Pydantic model. :param obj: The object containing string data to validate. :param strict: Whether to enforce types strictly. :param context: Extra variables to pass to the validator. :returns: The validated Pydantic model. .. py:method:: update_forward_refs(**localns) :classmethod: Try to update ForwardRefs on fields based on this Model, globalns and localns. .. py:class:: DatasetItem A dataset item is an image or pointcloud that has associated metadata. Note: for 3D data, please include a :class:`CameraParams` object under a key named "camera_params" within the metadata dictionary. This will allow for projecting 3D annotations to any image within a scene. :param image_location: Required if pointcloud_location is not present: The location containing the image for the given row of data. This can be a local path, or a remote URL. Remote formats supported include any URL (``http://`` or ``https://``) or URIs for AWS S3, Azure, or GCS (i.e. ``s3://``, ``gcs://``). :type image_location: Optional[str] :param pointcloud_location: Required if image_location is not present: The remote URL containing the pointcloud JSON. 
Remote formats supported include any URL (``http://`` or ``https://``) or URIs for AWS S3, Azure, or GCS (i.e. ``s3://``, ``gcs://``). :type pointcloud_location: Optional[str] :param reference_id: A user-specified identifier to reference the item. :type reference_id: Optional[str] :param metadata: Extra information about the particular dataset item. Int, float, and string values will be made searchable in the query bar by the key in this dict. For example, ``{"animal": "dog"}`` will become searchable via ``metadata.animal = "dog"``. Categorical data can be passed as a string and will be treated categorically by Nucleus if there are fewer than 250 unique values in the dataset. This means histograms of values in the "Insights" section and autocomplete within the query bar. Numerical metadata will generate histograms in the "Insights" section, allow for sorting the results of any query, and can be used with the modulo operator, for example ``metadata.frame_number % 5 = 0``. All other types of metadata will be visible from the dataset item detail view. It is important that string and numerical metadata fields are consistent: if a metadata field has a string value, then all metadata fields with the same key should also have string values, and vice versa for numerical metadata. If conflicting types are found, Nucleus will return an error during upload! The recommended way of adding or updating existing metadata is to re-run the ingestion (dataset.append) with update=True, which will replace any existing metadata with whatever your new ingestion run uses. This will delete any metadata keys that are not present in the new ingestion run. We have a cache based on image_location that will skip the need for a re-upload of the images, so your second ingestion will be faster than your first. For 3D (sensor fusion) data, it is highly recommended to include camera intrinsics in the metadata of your camera image items. 
Nucleus requires these intrinsics to create visualizations such as cuboid projections. Refer to our `guide to uploading 3D data `_ for more info. Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as `{ "lat": 52.5, "lon": 13.3, ... }`. Context Attachments may be provided to display the attachments side by side with the dataset item in the Detail View by specifying `{ "context_attachments": [ { "attachment": 'https://example.com/1' }, { "attachment": 'https://example.com/2' }, ... ] }`. .. todo :: Shorten this once we have a guide migrated for metadata, or maybe link from other places to here. :type metadata: Optional[dict] .. py:method:: from_json(payload) :classmethod: Instantiates dataset item object from schematized JSON dict payload. .. py:method:: to_json() Serializes dataset item object to schematized JSON string. .. py:method:: to_payload(is_scene=False) Serializes dataset item object to schematized JSON dict. .. py:class:: EmbeddingsExportJob Object used to check the status or errors of a long running asynchronous operation. :: import nucleus client = nucleus.NucleusClient(YOUR_SCALE_API_KEY) dataset = client.get_dataset("ds_bwkezj6g5c4g05gqp1eg") # When kicking off an asynchronous job, store the return value as a variable job = dataset.append(items=YOUR_DATASET_ITEMS, asynchronous=True) # Poll for status or errors print(job.status()) print(job.errors()) # Block until job finishes job.sleep_until_complete() .. py:method:: errors() Fetches a list of the latest errors generated by the asynchronous job. Useful for debugging failed or partially successful jobs. :returns: A list of strings containing the 10,000 most recently generated errors. :: [ '{"annotation":{"label":"car","type":"box","geometry":{"x":50,"y":60,"width":70,"height":80},"referenceId":"bad_ref_id","annotationId":"attempted_annot_upload","metadata":{}},"error":"Item with id bad_ref_id does not exist."}' ] .. 
py:method:: from_id(job_id, client) :classmethod: Creates a job instance from a specific job ID. :param job_id: The ID of the job. :param client: The client to use for the request. :returns: The specific AsyncJob (or inherited) instance. .. py:method:: result_urls(wait_for_completion=True) Gets a list of signed Scale URLs for each embedding batch. :param wait_for_completion: Whether the call should wait for the job to complete. Defaults to True. :returns: A list of signed Scale URLs which contain batches of embeddings. The files contain a JSON array of embedding records with the following schema: [{ "reference_id": str, "embedding_vector": List[float] }] .. py:method:: sleep_until_complete(verbose_std_out=True) Blocks until the job completes or errors. :param verbose_std_out: Whether or not to verbosely log while sleeping. Defaults to True. :type verbose_std_out: Optional[bool] .. py:method:: status() Fetches status of the job and an informative message on job progress. :returns: A dict of the job ID, status (one of Running, Completed, or Errored), an informative message on the job progress, and number of both completed and total steps. :: { "job_id": "job_c19xcf9mkws46gah0000", "status": "Completed", "message": "Job completed successfully.", "job_progress": "0.33", "completed_steps": "1", "total_steps": "3", } .. py:class:: Frame(**kwargs) Collection of sensor data pertaining to a single time step. For 3D data, each Frame houses a sensor-to-data mapping and must have exactly one pointcloud with any number of camera images. :param \*\*kwargs: Mappings from sensor name to dataset item. Each frame of a lidar scene must contain exactly one pointcloud and any number of images (e.g. from different angles). :type \*\*kwargs: Dict[str, :class:`DatasetItem`] Refer to our `guide to uploading 3D data `_ for more info! .. py:method:: add_item(item, sensor_name) Adds DatasetItem object to frame as sensor data. :param item: Pointcloud or camera image item to add. 
:type item: :class:`DatasetItem` :param sensor_name: Name of the sensor, e.g. "lidar" or "front_cam". .. py:method:: from_json(payload) :classmethod: Instantiates frame object from schematized JSON dict payload. .. py:method:: get_item(sensor_name) Fetches the DatasetItem object associated with the given sensor. :param sensor_name: Name of the sensor, e.g. "lidar" or "front_cam". :returns: DatasetItem object pertaining to the sensor. :rtype: :class:`DatasetItem` .. py:method:: get_items() Fetches all items in the frame. :returns: List of all DatasetItem objects in the frame. :rtype: List[:class:`DatasetItem`] .. py:method:: get_sensors() Fetches all sensor names of the frame. :returns: List of all sensor names of the frame. .. py:method:: to_payload() Serializes frame object to schematized JSON dict. .. py:class:: Keypoint A 2D point that has an additional visibility flag. Keypoints are intended to be part of a larger collection, and connected via a pre-defined skeleton. A keypoint in this skeleton may be visible or not-visible, and may be unlabeled and not visible. Because of this, the x, y coordinates may be optional, assuming that the keypoint is not visible, and would not be shown as part of the combined label. :param x: The x coordinate of the point. :type x: Optional[float] :param y: The y coordinate of the point. :type y: Optional[float] :param visible: The visibility of the point. :type visible: bool .. py:class:: KeypointsAnnotation A keypoints annotation containing a list of keypoints and the structure of those keypoints: the naming of each point and the skeleton that connects those keypoints. 
:: from nucleus import Keypoint, KeypointsAnnotation keypoints = KeypointsAnnotation( label="face", keypoints=[Keypoint(100, 100), Keypoint(120, 120), Keypoint(visible=False), Keypoint(0, 0)], names=["point1", "point2", "point3", "point4"], skeleton=[[0, 1], [1, 2], [1, 3], [2, 3]], reference_id="image_2", annotation_id="image_2_face_keypoints_1", metadata={"face_direction": "forward"}, track_reference_id="face_1", ) :param label: The label for this annotation. :type label: str :param keypoints: The list of keypoint objects. :type keypoints: List[:class:`Keypoint`] :param names: A list that corresponds to the names of each keypoint. :type names: List[str] :param skeleton: A list of 2-length lists indicating a beginning and ending index for each line segment in the skeleton of this keypoint label. :type skeleton: Optional[List[List[int]]] :param reference_id: User-defined ID of the image to which to apply this annotation. :type reference_id: str :param annotation_id: The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate. :type annotation_id: Optional[str] :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. :type metadata: Optional[Dict] :param track_reference_id: A unique string to identify the annotation as part of a group. For instance, multiple "car" annotations across several dataset items may have the same `track_reference_id` such as "red_car". .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. 
If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: KeypointsPrediction(label, keypoints, names, skeleton, reference_id, confidence = None, annotation_id = None, metadata = None, class_pdf = None, track_reference_id = None) Prediction of keypoints. :param label: The label for this annotation (e.g. car, pedestrian, bicycle). :type label: str :param keypoints: The list of keypoints objects. :type keypoints: List[:class:`Keypoint`] :param names: A list that corresponds to the names of each keypoint. :type names: List[str] :param skeleton: A list of 2-length lists indicating a beginning and ending index for each line segment in the skeleton of this keypoint label. :type skeleton: List[List[int]] :param reference_id: User-defined ID of the image to which to apply this annotation. :type reference_id: str :param confidence: 0-1 indicating the confidence of the prediction. :param annotation_id: The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate. :type annotation_id: Optional[str] :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. :type metadata: Optional[Dict] :param class_pdf: An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain. 
:param track_reference_id: A unique string to identify the prediction as part of a group. For instance, multiple "car" predictions across several dataset items may have the same `track_reference_id` such as "red_car". .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: LidarPoint A Lidar point in 3D space and intensity. :param x: The x coordinate of the point. :type x: float :param y: The y coordinate of the point. :type y: float :param z: The z coordinate of the point. :type z: float :param i: The intensity value returned by the lidar scan point. :type i: float .. py:class:: LidarScene Sequence of lidar pointcloud and camera images over time. Nucleus 3D datasets are comprised of LidarScenes, which are sequences of lidar pointclouds and camera images over time. These sequences are in turn comprised of :class:`Frames `. By organizing data across multiple sensors over time, LidarScenes make it easier to interpret pointclouds, allowing you to see objects move over time by clicking through frames and providing context in the form of corresponding images. You can think of scenes and frames as nested groupings of sensor data across time: * LidarScene for a given location * Frame at timestep 0 * DatasetItem of pointcloud * DatasetItem of front camera image * DatasetItem of rear camera image * Frame at timestep 1 * ... * ... * LidarScene for another location * ... 
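The nested grouping above can be sketched in plain Python. This is only an illustration of the shape of the data: `SketchItem`, `SketchFrame`, `SketchScene` and `build_scene` are hypothetical stand-ins invented for this example, not the `nucleus` classes, and the sensor names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical stand-ins for illustration only; these are NOT
# nucleus.DatasetItem, nucleus.Frame or nucleus.LidarScene.
@dataclass
class SketchItem:
    reference_id: str
    kind: str  # "pointcloud" or "image"


@dataclass
class SketchFrame:
    # Maps a sensor name to the item that sensor captured at this time step.
    items: Dict[str, SketchItem] = field(default_factory=dict)

    def pointcloud_count(self) -> int:
        return sum(1 for item in self.items.values() if item.kind == "pointcloud")


@dataclass
class SketchScene:
    reference_id: str
    frames: List[SketchFrame] = field(default_factory=list)

    def sensors(self) -> List[str]:
        # Union of sensor names across all frames, sorted for stable output.
        return sorted({name for frame in self.frames for name in frame.items})


def build_scene(reference_id: str, num_frames: int) -> SketchScene:
    """Build a scene whose frames each hold one pointcloud and two images."""
    scene = SketchScene(reference_id)
    for t in range(num_frames):
        scene.frames.append(
            SketchFrame(
                items={
                    "lidar": SketchItem(f"{reference_id}_pc_{t}", "pointcloud"),
                    "front_cam": SketchItem(f"{reference_id}_front_{t}", "image"),
                    "rear_cam": SketchItem(f"{reference_id}_rear_{t}", "image"),
                }
            )
        )
    return scene


scene = build_scene("scene_a", 2)
```

Each frame in the sketch keeps exactly one pointcloud alongside its camera images, mirroring the constraint stated for frames of a lidar scene.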
LidarScenes are uploaded to a :class:`Dataset` with any accompanying metadata. Frames do not accept metadata, but each of its constituent :class:`DatasetItems ` does. Note: Uploads with a different number of frames/items will error out (only on scenes that now differ). Existing scenes are expected to retain the same structure, i.e. the same number of frames, and same items per frame. If a scene definition is changed (for example, additional frames added) the update operation will be ignored. If you would like to alter the structure of a scene, please delete the scene and re-upload. :param reference_id: User-specified identifier to reference the scene. :type reference_id: str :param frames: List of frames to be a part of the scene. A scene can be created before frames or items have been added to it, but must be non-empty when uploading to a :class:`Dataset`. :type frames: Optional[List[:class:`Frame`]] :param metadata: Optional metadata to include with the scene. Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as `{ "lat": 52.5, "lon": 13.3, ... }`. Context Attachments may be provided to display the attachments side by side with the dataset item in the Detail View by specifying `{ "context_attachments": [ { "attachment": 'https://example.com/1' }, { "attachment": 'https://example.com/2' }, ... ] }`. :type metadata: Optional[Dict] Refer to our `guide to uploading 3D data `_ for more info! .. py:method:: add_frame(frame, index, update = False) Adds frame to scene at the specified index. :param frame: Frame object to add. :type frame: :class:`Frame` :param index: Serial index at which to add the frame. :param update: Whether to overwrite the frame at the specified index, if it exists. Default is False. .. py:method:: add_item(index, sensor_name, item) Adds DatasetItem to the specified frame as sensor data. :param index: Serial index of the frame to which to add the item. 
:param item: Pointcloud or camera image item to add. :type item: :class:`DatasetItem` :param sensor_name: Name of the sensor, e.g. "lidar" or "front_cam". .. py:method:: from_json(payload, client = None, skip_validate = False) :classmethod: Instantiates scene object from schematized JSON dict payload. .. py:method:: get_frame(index) Fetches the Frame object at the specified index. :param index: Serial index for which to retrieve the Frame. :returns: Frame object at the specified index. :rtype: :class:`Frame` .. py:method:: get_frames() Fetches a sorted list of Frames of the scene. :returns: List of Frames, sorted by index ascending. :rtype: List[:class:`Frame`] .. py:method:: get_item(index, sensor_name) Fetches the DatasetItem object of the given frame and sensor. :param index: Serial index of the frame from which to fetch the item. :param sensor_name: Name of the sensor, e.g. "lidar" or "front_cam". :returns: DatasetItem object of the frame and sensor. :rtype: :class:`DatasetItem` .. py:method:: get_items() Fetches all items in the scene. :returns: Unordered list of all DatasetItem objects in the scene. :rtype: List[:class:`DatasetItem`] .. py:method:: get_items_from_sensor(sensor_name) Fetches all DatasetItem objects of the given sensor. :param sensor_name: Name of the sensor, e.g. "lidar" or "front_cam". :returns: List of DatasetItem objects associated with the specified sensor. :rtype: List[:class:`DatasetItem`] .. py:method:: get_sensors() Fetches all sensor names of the scene. :returns: List of all sensor names associated with frames in the scene. .. py:method:: info() Fetches information about the scene. :returns: Payload containing:: { "reference_id": str, "length": int, "num_sensors": int } .. py:method:: to_json() Serializes scene object to schematized JSON string. .. py:method:: to_payload() Serializes scene object to schematized JSON dict. .. py:class:: LineAnnotation A polyline annotation consisting of an ordered list of 2D points. 
A LineAnnotation differs from a PolygonAnnotation by not forming a closed loop, and by having zero area. :: from nucleus import LineAnnotation, Point line = LineAnnotation( label="face", vertices=[Point(100, 100), Point(200, 300), Point(300, 200)], reference_id="person_image_1", annotation_id="person_image_1_line_1", metadata={"camera_mode": "portrait"}, track_reference_id="face_human", ) :param label: The label for this annotation. :type label: str :param vertices: The list of points making up the line. :type vertices: List[:class:`Point`] :param reference_id: User-defined ID of the image to which to apply this annotation. :type reference_id: str :param annotation_id: The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate. :type annotation_id: Optional[str] :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. :type metadata: Optional[Dict] :param track_reference_id: A unique string to identify the annotation as part of a group. For instance, multiple "car" annotations across several dataset items may have the same `track_reference_id` such as "red_car". .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. 
py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: LinePrediction(label, vertices, reference_id, confidence = None, annotation_id = None, metadata = None, class_pdf = None, track_reference_id = None) Prediction of a line. :param label: The label for this prediction (e.g. car, pedestrian, bicycle). :type label: str :param vertices: The list of points making up the line. :type vertices: List[:class:`Point`] :param reference_id: User-defined ID of the image to which to apply this annotation. :type reference_id: str :param confidence: 0-1 indicating the confidence of the prediction. :param annotation_id: The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate. :type annotation_id: Optional[str] :param metadata: Arbitrary key/value dictionary of info to attach to this prediction. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. :type metadata: Optional[Dict] :param class_pdf: An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain. :param track_reference_id: A unique string to identify the prediction as part of a group. For instance, multiple "car" predictions across several dataset items may have the same `track_reference_id` such as "red_car". .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. 
If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: Model(model_id, name, reference_id, metadata, client, bundle_name=None, tags=None, trained_slice_ids=None) A model that can be used to upload predictions to a dataset. By uploading model predictions to Nucleus, you can compare your predictions to ground truth annotations and discover problems with your Models or :class:`Dataset`. You can also upload predictions for unannotated images, letting you query them based on model predictions. This can help you prioritize which unlabeled data to label next. Within Nucleus, Models work in the following way: 1. You first :meth:`create a Model`. You can do this just once and reuse the model on multiple datasets. 2. You then :meth:`upload predictions ` to a dataset. 3. Trigger :meth:`calculation of metrics ` in order to view model debugging insights. The steps above will allow you to visualize model performance within Nucleus, or compare multiple models that have been run on the same Dataset. Note that you can always add more predictions to a dataset, but you will then need to re-run the calculation of metrics for them to be correct. 
:: import nucleus client = nucleus.NucleusClient(YOUR_SCALE_API_KEY) dataset = client.get_dataset(YOUR_DATASET_ID) prediction_1 = nucleus.BoxPrediction( label="label", x=0, y=0, width=10, height=10, reference_id="1", confidence=0.9, class_pdf={"label": 0.9, "other_label": 0.1}, ) prediction_2 = nucleus.BoxPrediction( label="label", x=0, y=0, width=10, height=10, reference_id="2", confidence=0.2, class_pdf={"label": 0.2, "other_label": 0.8}, ) model = client.create_model( name="My Model", reference_id="My-CNN", metadata={"timestamp": "121012401"} ) # For small ingestions, we recommend synchronous ingestion response = dataset.upload_predictions(model, [prediction_1, prediction_2]) # For large ingestions, we recommend asynchronous ingestion job = dataset.upload_predictions( model, [prediction_1, prediction_2], asynchronous=True ) # Check current status job.status() # Sleep until ingestion is done job.sleep_until_complete() # Check errors job.errors() dataset.calculate_evaluation_metrics(model) Models cannot be instantiated directly and instead must be created via API endpoint, using :meth:`NucleusClient.create_model`. .. py:method:: add_tags(tags) Tag the model with custom tag names. :: import nucleus client = nucleus.NucleusClient("YOUR_SCALE_API_KEY") model = client.list_models()[0] model.add_tags(["tag_A", "tag_B"]) :param tags: list of tag names .. py:method:: add_trained_slice_ids(slice_ids) Add trained slice id(s) to the model. :: import nucleus client = nucleus.NucleusClient("YOUR_SCALE_API_KEY") model = client.list_models()[0] model.add_trained_slice_ids(["slc_...", "slc_..."]) :param slice_ids: list of trained slice ids .. py:method:: evaluate(scenario_test_names) Evaluates this on the specified Unit Tests. 
:: import nucleus client = nucleus.NucleusClient("YOUR_SCALE_API_KEY") model = client.list_models()[0] scenario_test = client.validate.create_scenario_test( "sample_scenario_test", "YOUR_SLICE_ID" ) model.evaluate(["sample_scenario_test"]) :param scenario_test_names: list of unit tests to evaluate :returns: AsyncJob object of evaluation job .. py:method:: from_json(payload, client) :classmethod: Instantiates model object from schematized JSON dict payload. .. py:method:: remove_tags(tags) Remove tag(s) from the model. :: import nucleus client = nucleus.NucleusClient("YOUR_SCALE_API_KEY") model = client.list_models()[0] model.remove_tags(["tag_x"]) :param tags: list of tag names to remove .. py:method:: remove_trained_slice_ids(slide_ids) Remove trained slice id(s) from the model. :: import nucleus client = nucleus.NucleusClient("YOUR_SCALE_API_KEY") model = client.list_models()[0] model.remove_trained_slice_ids(["slc_...", "slc_..."]) :param slice_ids: list of trained slice ids to remove .. py:method:: run(dataset_id, model_run_name, slice_id) Runs inference on the bundle associated with the model on the dataset. :: import nucleus client = nucleus.NucleusClient("YOUR_SCALE_API_KEY") model = client.list_models()[0] model.run("ds_123456") :param dataset_id: The ID of the dataset to run inference on. :param model_run_name: The name of the model run. :param slice_id: The ID of the slice of the dataset to run inference on. :returns: The ID of the :class:`AsyncJob` used to track job progress. :rtype: job_id .. py:class:: NucleusClient(api_key = None, use_notebook = False, endpoint = None) Client to interact with the Nucleus API via Python SDK. :param api_key: Follow `this guide `_ to retrieve your API keys. :param use_notebook: Whether the client is being used in a notebook (toggles tqdm style). Default is ``False``. :param endpoint: Base URL of the API. Default is Nucleus's current production API. .. 
py:method:: append_to_slice(slice_id, reference_ids, dataset_id) Appends dataset items or scenes to an existing slice. :param slice_id: Nucleus-generated slice ID (starts with ``slc_``). This can be retrieved via :meth:`Dataset.slices` or a Nucleus dashboard URL. :param reference_ids: List of user-defined reference IDs of dataset items or scenes to append to the slice. :param dataset_id: ID of dataset this slice belongs to. :returns: Empty payload response. .. py:method:: create_dataset(name, is_scene = None, use_privacy_mode = False, item_metadata_schema = None, annotation_metadata_schema = None) Creates a new, empty dataset. Make sure that the dataset is created for the data type you would like to support. Be sure to set the ``is_scene`` parameter correctly. :param name: A human-readable name for the dataset. :param is_scene: Whether the dataset contains strictly :class:`scenes ` or :class:`items `. This value is immutable. Default is False (dataset of items). :param use_privacy_mode: Whether the images of this dataset should be uploaded to Scale. If set to True, customer will have to adjust their file access policy with Scale. :param item_metadata_schema: Dict defining item-level metadata schema. See below. :param annotation_metadata_schema: Dict defining annotation-level metadata schema. Metadata schemas must be structured as follows:: { "field_name": { "type": "category" | "number" | "text" | "json" "choices": List[str] | None "description": str | None }, ... } :returns: The newly created Nucleus dataset as an object. :rtype: :class:`Dataset` .. py:method:: create_dataset_from_dir(dirname, dataset_name = None, use_privacy_mode = False, privacy_mode_proxy = '', allowed_file_types = ('png', 'jpg', 'jpeg'), skip_size_warning = False) Create a dataset by recursively crawling through a directory. A DatasetItem will be created for each unique image found.
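The recursive crawl can be pictured with a standard-library sketch. This is not the SDK's implementation: ``find_image_files`` and its ``max_files`` parameter are hypothetical names used only for illustration, and the 500-image limit is taken from the ``skip_size_warning`` description of this method.

```python
import os
from typing import List, Sequence


def find_image_files(
    dirname: str,
    allowed_file_types: Sequence[str] = ("png", "jpg", "jpeg"),
    skip_size_warning: bool = False,
    max_files: int = 500,  # hypothetical knob mirroring the documented 500-image check
) -> List[str]:
    """Hypothetical helper: list image paths under ``dirname``, relative to it."""
    if not os.path.isdir(dirname):
        raise ValueError(f"Not a directory: {dirname}")

    # Normalize extensions so that "jpg" and ".JPG" are treated alike.
    extensions = tuple("." + ext.lower().lstrip(".") for ext in allowed_file_types)

    found = []
    for root, _dirs, files in os.walk(dirname):
        for filename in files:
            if filename.lower().endswith(extensions):
                full_path = os.path.join(root, filename)
                # Relative paths are what a privacy-mode proxy would be keyed on.
                found.append(os.path.relpath(full_path, dirname))

    # Safety check against a mistyped directory that matches too many files.
    if not skip_size_warning and len(found) > max_files:
        raise ValueError(
            f"Found {len(found)} images, more than {max_files}; "
            "pass skip_size_warning=True to proceed."
        )
    return sorted(found)
```

Running a check like this before calling ``create_dataset_from_dir`` shows which files the given ``allowed_file_types`` would pick up, without creating anything in Nucleus.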
:param dirname: Where to look for image files, recursively :param dataset_name: If none is given, the parent folder name is used :param use_privacy_mode: Whether the dataset should be created in privacy mode :param privacy_mode_proxy: Endpoint that serves image files for privacy mode, ignore if not using privacy mode. The proxy should work based on the relative path of the images in the directory. :param allowed_file_types: Which file type extensions to search for, e.g. ('jpg', 'png') :param skip_size_warning: If False, it will throw an error if the script globs more than 500 images. This is a safety check in case the dirname has a typo and grabs too much data. .. py:method:: create_dataset_from_project(project_id, last_n_tasks = None, name = None) Create a new dataset from an existing Scale or Rapid project. If you already have Annotation, SegmentAnnotation, VideoAnnotation, Categorization, PolygonAnnotation, ImageAnnotation, DocumentTranscription, LidarLinking, LidarAnnotation, or VideoboxAnnotation projects with Scale, use this endpoint to import your project directly into Nucleus. This endpoint is asynchronous because there can be delays when the number of tasks is larger than 1000. As a result, the endpoint returns an instance of :class:`AsyncJob`. :param project_id: The ID of the Scale/Rapid project (retrievable from URL). :param last_n_tasks: If supplied, only pull in this number of the most recent tasks. By default the endpoint will pull in all eligible tasks. :param name: The name for your new Nucleus dataset. By default the endpoint will use the project's name. :returns: The newly created Nucleus dataset as an object. :rtype: :class:`Dataset` .. py:method:: create_launch_model(name, reference_id, bundle_args, metadata = None, trained_slice_ids = None) Adds a :class:`Model` to Nucleus, as well as a Launch bundle from a given function. :param name: A human-readable name for the model. :param reference_id: Unique, user-controlled ID for the model.
This can be used, for example, to link to an external storage of models which may have its own id scheme. :param bundle_args: Dict for kwargs for the creation of a Launch bundle, more details on the keys below. :param metadata: An arbitrary dictionary of additional data about this model that can be stored and retrieved. For example, you can store information about the hyperparameters used in training this model. :returns: The newly created model as an object. :rtype: :class:`Model` Details on `bundle_args`: Grabs an S3 signed URL and uploads a model bundle to Scale Launch. A model bundle consists of exactly one of {predict_fn_or_cls}, {load_predict_fn + model}, or {load_predict_fn + load_model_fn}. Pre/post-processing code can be included inside load_predict_fn/model or in predict_fn_or_cls call. Note: the exact parameters used will depend on the version of the Launch client used. For example, if you are on Launch client version 0.x, you will use `env_params`, otherwise you will use `pytorch_image_tag` and `tensorflow_version`. :param model_bundle_name: Name of model bundle you want to create. This acts as a unique identifier. :param predict_fn_or_cls: Function or a Callable class that runs end-to-end (pre/post processing and model inference) on the call, i.e. `predict_fn_or_cls(REQUEST) -> RESPONSE`. :param model: Typically a trained Neural Network, e.g. a Pytorch module :param load_predict_fn: Function that, when called with model, returns a function that carries out inference, i.e. `load_predict_fn(model) -> func; func(REQUEST) -> RESPONSE` :param load_model_fn: Function that, when run, loads a model, e.g. a Pytorch module, i.e. `load_predict_fn(load_model_fn()) -> func; func(REQUEST) -> RESPONSE` :param bundle_url: Only for self-hosted mode. Desired location of bundle. Overrides any value given by self.bundle_location_fn. :param requirements: A list of python package requirements, e.g. ["tensorflow==2.3.0", "tensorflow-hub==0.11.0"].
If no list has been passed, will default to the currently imported list of packages. :param app_config: Either a Dictionary that represents a YAML file contents or a local path to a YAML file. :param env_params: Only for launch v0. A dictionary that dictates environment information e.g. the use of pytorch or tensorflow, which cuda/cudnn versions to use. Specifically, the dictionary should contain the following keys: "framework_type": either "tensorflow" or "pytorch". "pytorch_version": Version of pytorch, e.g. "1.5.1", "1.7.0", etc. Only applicable if framework_type is pytorch "cuda_version": Version of cuda used, e.g. "11.0". "cudnn_version": Version of cudnn used, e.g. "cudnn8-devel". "tensorflow_version": Version of tensorflow, e.g. "2.3.0". Only applicable if framework_type is tensorflow :param globals_copy: Dictionary of the global symbol table. Normally provided by `globals()` built-in function. :param pytorch_image_tag: Only for launch v1, and if you want to use pytorch framework type. The tag of the pytorch docker image you want to use, e.g. 1.11.0-cuda11.3-cudnn8-runtime :param tensorflow_version: Only for launch v1, and if you want to use tensorflow. Version of tensorflow, e.g. "2.3.0". .. py:method:: create_launch_model_from_dir(name, reference_id, bundle_from_dir_args, metadata = None, trained_slice_ids = None) Adds a :class:`Model` to Nucleus, as well as a Launch bundle from a directory. :param name: A human-readable name for the model. :param reference_id: Unique, user-controlled ID for the model. This can be used, for example, to link to an external storage of models which may have its own id scheme. :param bundle_from_dir_args: Dict for kwargs for the creation of a bundle from directory, more details on the keys below. :param metadata: An arbitrary dictionary of additional data about this model that can be stored and retrieved. For example, you can store information about the hyperparameters used in training this model.
:returns: The newly created model as an object. :rtype: :class:`Model` Details on `bundle_from_dir_args` Packages up code from one or more local filesystem folders and uploads them as a bundle to Scale Launch. In this mode, a bundle is just local code instead of a serialized object. For example, if you have a directory structure like so, and your current working directory is also `my_root`: ``` my_root/ my_module1/ __init__.py ...files and directories my_inference_file.py my_module2/ __init__.py ...files and directories ``` then calling `create_model_bundle_from_dirs` with `base_paths=["my_module1", "my_module2"]` essentially creates a zip file without the root directory, e.g.: ``` my_module1/ __init__.py ...files and directories my_inference_file.py my_module2/ __init__.py ...files and directories ``` and these contents will be unzipped relative to the server side `PYTHONPATH`. Bear these points in mind when referencing Python module paths for this bundle. For instance, if `my_inference_file.py` has `def f(...)` as the desired inference loading function, then the `load_predict_fn_module_path` argument should be `my_module1.my_inference_file.f`. Note: the exact keys for `bundle_from_dir_args` used will depend on the version of the Launch client used. i.e. if you are on Launch client version 0.x, you will use `env_params`, otherwise you will use `pytorch_image_tag` and `tensorflow_version`. Keys for `bundle_from_dir_args`: model_bundle_name: Name of model bundle you want to create. This acts as a unique identifier. base_paths: The paths on the local filesystem where the bundle code lives. requirements_path: A path on the local filesystem where a requirements.txt file lives. env_params: Only for launch v0. A dictionary that dictates environment information e.g. the use of pytorch or tensorflow, which cuda/cudnn versions to use. Specifically, the dictionary should contain the following keys: "framework_type": either "tensorflow" or "pytorch". 
"pytorch_version": Version of pytorch, e.g. "1.5.1", "1.7.0", etc. Only applicable if framework_type is pytorch "cuda_version": Version of cuda used, e.g. "11.0". "cudnn_version" Version of cudnn used, e.g. "cudnn8-devel". "tensorflow_version": Version of tensorflow, e.g. "2.3.0". Only applicable if framework_type is tensorflow load_predict_fn_module_path: A python module path for a function that, when called with the output of load_model_fn_module_path, returns a function that carries out inference. load_model_fn_module_path: A python module path for a function that returns a model. The output feeds into the function located at load_predict_fn_module_path. app_config: Either a Dictionary that represents a YAML file contents or a local path to a YAML file. pytorch_image_tag: Only for launch v1, and if you want to use pytorch framework type. The tag of the pytorch docker image you want to use, e.g. 1.11.0-cuda11.3-cudnn8-runtime tensorflow_version: Only for launch v1, and if you want to use tensorflow. Version of tensorflow, e.g. "2.3.0". .. py:method:: create_model(name, reference_id, metadata = None, bundle_name = None, tags = None, trained_slice_ids = None) Adds a :class:`Model` to Nucleus. :param name: A human-readable name for the model. :param reference_id: Unique, user-controlled ID for the model. This can be used, for example, to link to an external storage of models which may have its own id scheme. :param metadata: An arbitrary dictionary of additional data about this model that can be stored and retrieved. For example, you can store information about the hyperparameters used in training this model. :param bundle_name: Optional name of bundle attached to this model :param tags: Optional list of tags to attach to this model :returns: The newly created model as an object. :rtype: :class:`Model` .. py:method:: delete_autotag(autotag_id) Deletes an autotag by ID. :param autotag_id: Nucleus-generated autotag ID (starts with ``tag_``). 
This can be retrieved via :meth:`list_autotags` or a Nucleus dashboard URL. :returns: Empty payload response. .. py:method:: delete_dataset(dataset_id) Deletes a dataset by ID. All items, annotations, and predictions associated with the dataset will be deleted as well. Note that if this dataset is linked to a Scale or Rapid labeling project, the project itself will not be deleted. :param dataset_id: The ID of the dataset to delete. :returns: Payload to indicate deletion invocation. .. py:method:: delete_model(model_id) Deletes a model by ID. :param model_id: Nucleus-generated model ID (starts with ``prj_``). This can be retrieved via :meth:`list_models` or a Nucleus dashboard URL. :returns: Empty payload response. .. py:method:: delete_slice(slice_id) Deletes slice from Nucleus. :param slice_id: Nucleus-generated slice ID (starts with ``slc_``). This can be retrieved via :meth:`Dataset.slices` or a Nucleus dashboard URL. :returns: Empty payload response. .. py:method:: download_pointcloud_task(task_id, frame_num) Download the lidar point cloud data for a given task and frame number. :param task_id: download point cloud for this particular task :param frame_num: download point cloud for this particular frame :returns: List of Point3D objects .. py:method:: download_pointcloud_tasks(task_ids, frame_num) Download the lidar point cloud data for a given set of tasks and frame number. :param task_ids: list of task ids to fetch data from :param frame_num: download point cloud for this particular frame :returns: A dictionary from task_id to list of Point3D objects .. py:method:: get_autotag_refinement_metrics(autotag_id) Retrieves refinement metrics for an autotag by ID. :param autotag_id: Nucleus-generated autotag ID (starts with ``tag_``). This can be retrieved via :meth:`list_autotags` or a Nucleus dashboard URL. :returns: Response payload:: { "total_refinement_steps": int "average_positives_selected_per_refinement": int "average_ms_taken_in_refinement": float } ..
py:method:: get_dataset(dataset_id) Fetches a dataset by its ID. :param dataset_id: The ID of the dataset to fetch. :returns: The Nucleus dataset as an object. :rtype: :class:`Dataset` .. py:method:: get_job(job_id) Fetches a job by its ID. :param job_id: The ID of the job to fetch. :returns: The Nucleus async job as an object. :rtype: :class:`AsyncJob` .. py:method:: get_model(model_id = None, model_run_id = None) Fetches a model by its ID. :param model_id: You can pass either a model ID (starts with ``prj_``) or a model run ID (starts with ``run_``). This can be retrieved via :meth:`list_models` or a Nucleus dashboard URL. Model run IDs result from the application of a model to a dataset. :param model_run_id: You can pass either a model ID (starts with ``prj_``) or a model run ID (starts with ``run_``). This can be retrieved via :meth:`list_models` or a Nucleus dashboard URL. Model run IDs result from the application of a model to a dataset. In the future, we plan to hide ``model_run_ids`` fully from users. :returns: The Nucleus model as an object. :rtype: :class:`Model` .. py:method:: get_slice(slice_id) Returns a slice object by Nucleus-generated ID. :param slice_id: Nucleus-generated slice ID (starts with ``slc_``). This can be retrieved via :meth:`Dataset.slices` or a Nucleus dashboard URL. :returns: The Nucleus slice as an object. :rtype: :class:`Slice` .. py:method:: list_jobs(show_completed = False, from_date = None, to_date = None, job_types = None, limit = None, dataset_id = None, date_limit = None) Fetches all of your running jobs in Nucleus.
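One way to use this call is to tally jobs by status. The following is a minimal sketch, assuming that ``show_completed=True`` includes finished jobs and that each returned :class:`AsyncJob` reports a ``"status"`` key from :meth:`AsyncJob.status`; ``count_jobs_by_status`` is a hypothetical helper, not part of the SDK.

```python
from collections import Counter
from typing import Any, Dict, Optional


def count_jobs_by_status(
    client: Any,
    dataset_id: Optional[str] = None,
    limit: int = 100,
) -> Dict[str, int]:
    """Hypothetical helper: count jobs per status string (e.g. "Running")."""
    # Only keyword arguments that appear in the list_jobs signature are used.
    jobs = client.list_jobs(
        show_completed=True,
        dataset_id=dataset_id,
        limit=limit,
    )
    # status() is called once per job, so keep ``limit`` modest.
    return dict(Counter(job.status()["status"] for job in jobs))
```

With a real client this would be called as ``count_jobs_by_status(client, dataset_id="YOUR_DATASET_ID")``.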
:param job_types: Filter on set of job types, if None, fetch all types :param from_date: beginning of date range filter :param to_date: end of date range filter :param limit: number of results to fetch, max 50_000 :param show_completed: if False (the default), don't fetch jobs with Completed status :param stats_only: return overview of jobs, instead of a list of job objects :param dataset_id: filter on a particular dataset :param date_limit: Deprecated, do not use :returns: List of running asynchronous jobs associated with the client API key. :rtype: List[:class:`AsyncJob`] .. py:method:: make_request(payload, route, requests_command=requests.post, return_raw_response = False) Makes a request to a Nucleus API endpoint. Logs a warning if not successful. :param payload: Given request payload. :param route: Route for the request. :param requests_command: ``requests.post``, ``requests.get``, or ``requests.delete``. :param return_raw_response: return the request's response object entirely :returns: Response payload as JSON dict or request object. .. py:method:: valid_dirname(dirname) :staticmethod: Validate directory exists :param dirname: Path of directory :returns: Existing directory path .. py:class:: Point A point in 2D space. :param x: The x coordinate of the point. :type x: float :param y: The y coordinate of the point. :type y: float .. py:class:: Point3D A point in 3D space. :param x: The x coordinate of the point. :type x: float :param y: The y coordinate of the point. :type y: float :param z: The z coordinate of the point. :type z: float .. py:class:: PolygonAnnotation A polygon annotation consisting of an ordered list of 2D points.
:: from nucleus import Point, PolygonAnnotation polygon = PolygonAnnotation( label="bus", vertices=[Point(100, 100), Point(150, 200), Point(200, 100)], reference_id="image_2", annotation_id="image_2_bus_polygon_1", metadata={"vehicle_color": "yellow"}, embedding_vector=[0.1423, 1.432, ..., 3.829], track_reference_id="school_bus", ) :param label: The label for this annotation. :type label: str :param vertices: The list of points making up the polygon. :type vertices: List[:class:`Point`] :param reference_id: User-defined ID of the image to which to apply this annotation. :type reference_id: str :param annotation_id: The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate. :type annotation_id: Optional[str] :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. :type metadata: Optional[Dict] :param embedding_vector: Custom embedding vector for this object annotation. If any custom object embeddings have been uploaded previously to this dataset, this vector must match the dimensions of the previously ingested vectors. :param track_reference_id: A unique string to identify the annotation as part of a group. For instance, multiple "car" annotations across several dataset items may have the same `track_reference_id` such as "red_car". .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false.
If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: PolygonPrediction(label, vertices, reference_id, confidence = None, annotation_id = None, metadata = None, class_pdf = None, embedding_vector = None, track_reference_id = None) Prediction of a polygon. :param label: The label for this annotation (e.g. car, pedestrian, bicycle). :type label: str :param vertices: The list of points making up the polygon. :type vertices: List[:class:`Point`] :param reference_id: User-defined ID of the image to which to apply this annotation. :type reference_id: str :param confidence: 0-1 indicating the confidence of the prediction. :param annotation_id: The annotation ID that uniquely identifies this annotation within its target dataset item. Upon ingest, a matching annotation id will be ignored by default, and updated if update=True for dataset.annotate. :type annotation_id: Optional[str] :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. :type metadata: Optional[Dict] :param class_pdf: An optional complete class probability distribution on this annotation. Each value should be between 0 and 1 (inclusive), and sum up to 1 as a complete distribution. This can be useful for computing entropy to surface places where the model is most uncertain. :param embedding_vector: Custom embedding vector for this object annotation. If any custom object embeddings have been uploaded previously to this dataset, this vector must match the dimensions of the previously ingested vectors. :param track_reference_id: A unique string to identify the prediction as part of a group.
For instance, multiple "car" predictions across several dataset items may have the same `track_reference_id` such as "red_car". .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: Quaternion Quaternion objects are used to represent rotation. We use the Hamilton/right-handed quaternion convention, where :: i^2 = j^2 = k^2 = ijk = -1 The quaternion represented by the tuple ``(x, y, z, w)`` is equal to ``w + x*i + y*j + z*k``. :param x: The x value. :type x: float :param y: The y value. :type y: float :param z: The z value. :type z: float :param w: The w value. :type w: float .. py:method:: from_json(payload) :classmethod: Instantiates quaternion object from schematized JSON dict payload. .. py:method:: to_payload() Serializes quaternion object to schematized JSON dict. .. py:class:: SceneCategoryAnnotation A scene category annotation. :: from nucleus import SceneCategoryAnnotation category = SceneCategoryAnnotation( label="running", reference_id="scene_1", taxonomy_name="action", metadata={ "weather": "clear", }, ) :param label: The label for this annotation. :type label: str :param reference_id: User-defined ID of the scene to which to apply this annotation. :type reference_id: str :param taxonomy_name: The name of the taxonomy this annotation conforms to. See :meth:`Dataset.add_taxonomy`.
:type taxonomy_name: Optional[str] :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: SceneCategoryPrediction(label, reference_id, taxonomy_name = None, confidence = None, metadata = None) A prediction of a category for a scene. :: from nucleus import SceneCategoryPrediction category = SceneCategoryPrediction( label="running", reference_id="scene_1", taxonomy_name="action", confidence=0.83, metadata={ "weather": "clear", }, ) :param label: The label for this annotation (e.g. action, subject, scenario). :param reference_id: The reference ID of the scene you wish to apply this annotation to. :param taxonomy_name: The name of the taxonomy this annotation conforms to. See :meth:`Dataset.add_taxonomy`. :param confidence: 0-1 indicating the confidence of the prediction. :param metadata: Arbitrary key/value dictionary of info to attach to this annotation. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. 
py:method:: has_local_files_to_upload() Returns True if annotation has local files that need to be uploaded. Nearly all subclasses have no local files, so we default this to just return false. If the subclass has local files, it should override this method (but that is not the only thing required to get local upload of files to work.) .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: Segment Segment represents either a class or an instance depending on the task type. For semantic segmentation, this object should store the mapping between a single class index and the string label. For instance segmentation, you can use this class to store the label of a single instance, whose extent in the image is represented by the value of ``index``. In both cases, additional metadata can be attached to the segment. :param label: The label name of the class or instance represented by ``index`` in the associated mask. :type label: str :param index: The integer pixel value in the mask this mapping refers to. :type index: int :param metadata: Arbitrary key/value dictionary of info to attach to this segment. Strings, floats and ints are supported best by querying and insights features within Nucleus. For more details see our `metadata guide `_. :type metadata: Optional[Dict] .. py:class:: SegmentationAnnotation A segmentation mask on a 2D image. When uploading a mask annotation, Nucleus expects the mask file to be in PNG format with each pixel being a 0-255 uint8. Currently, Nucleus only supports uploading masks from URL. Nucleus automatically enforces the constraint that each DatasetItem can have at most one ground truth segmentation mask. As a consequence, if during upload a duplicate mask is detected for a given image, by default it will be ignored.
You can change this behavior by setting ``update = True``, which will replace the existing segmentation mask with the new mask. :: from nucleus import Segment, SegmentationAnnotation segmentation = SegmentationAnnotation( mask_url="s3://your-bucket-name/segmentation-masks/image_2_mask_id_1.png", annotations=[ Segment(label="grass", index=1), Segment(label="road", index=2), Segment(label="bus", index=3, metadata={"vehicle_color": "yellow"}), Segment(label="tree", index=4) ], reference_id="image_2", annotation_id="image_2_mask_1", ) :param mask_url: A URL pointing to the segmentation prediction mask which is accessible to Scale. This "URL" can also be a path to a local file. The mask is an HxW uint8 array saved in PNG format, with each pixel value ranging from [0, N), where N is the number of possible classes (for semantic segmentation) or instances (for instance segmentation). The height and width of the mask must be the same as the original image. One example for semantic segmentation: the mask is 0 for pixels where there is background, 1 where there is a car, and 2 where there is a pedestrian. Another example for instance segmentation: the mask is 0 for one car, 1 for another car, 2 for a motorcycle and 3 for another motorcycle. The class name for each value in the mask is stored in the list of Segment objects passed for "annotations" :type mask_url: str :param annotations: The list of mappings between the integer values contained in mask_url and string class labels. In the semantic segmentation example above these would map 0 to background, 1 to car and 2 to pedestrian. In the instance segmentation example above, 0 and 1 would both be mapped to car, 2 and 3 would both be mapped to motorcycle :type annotations: List[:class:`Segment`] :param reference_id: User-defined ID of the image to which to apply this annotation.
:type reference_id: str :param annotation_id: For segmentation annotations, this value is ignored because there can only be one segmentation annotation per dataset item. Therefore regardless of annotation ID, if there is an existing segmentation on a dataset item, it will be ignored unless update=True is passed to :meth:`Dataset.annotate`, in which case it will be overwritten. Storing a custom ID here may be useful in order to tie this annotation to an external database, and its value will be returned for any export. :type annotation_id: Optional[str] .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Check if the mask url is local and needs to be uploaded. .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: SegmentationPrediction Predicted segmentation mask on a 2D image. :: from nucleus import Segment, SegmentationPrediction segmentation = SegmentationPrediction( mask_url="s3://your-bucket-name/pred-seg-masks/image_2_pred_mask_id_1.png", annotations=[ Segment(label="grass", index=1), Segment(label="road", index=2), Segment(label="bus", index=3, metadata={"vehicle_color": "yellow"}), Segment(label="tree", index=4) ], reference_id="image_2", annotation_id="image_2_pred_mask_1", ) :param mask_url: A URL pointing to the segmentation prediction mask which is accessible to Scale. This "URL" can also be a path to a local file. The mask is an HxW uint8 array saved in PNG format, with each pixel value ranging from [0, N), where N is the number of possible classes (for semantic segmentation) or instances (for instance segmentation). The height and width of the mask must be the same as the original image.
One example for semantic segmentation: the mask is 0 for pixels where there is background, 1 where there is a car, and 2 where there is a pedestrian. Another example for instance segmentation: the mask is 0 for one car, 1 for another car, 2 for a motorcycle and 3 for another motorcycle. The class name for each value in the mask is stored in the list of Segment objects passed for "annotations" :type mask_url: str :param annotations: The list of mappings between the integer values contained in mask_url and string class labels. In the semantic segmentation example above these would map 0 to background, 1 to car and 2 to pedestrian. In the instance segmentation example above, 0 and 1 would both be mapped to car, 2 and 3 would both be mapped to motorcycle :type annotations: List[:class:`Segment`] :param reference_id: User-defined ID of the image to which to apply this annotation. :type reference_id: str :param annotation_id: For segmentation predictions, this value is ignored because there can only be one segmentation prediction per dataset item. Therefore regardless of annotation ID, if there is an existing segmentation on a dataset item, it will be ignored unless update=True is passed to :meth:`Dataset.annotate`, in which case it will be overwritten. Storing a custom ID here may be useful in order to tie this annotation to an external database, and its value will be returned for any export. :type annotation_id: Optional[str] .. py:method:: from_json(payload) :classmethod: Instantiates annotation object from schematized JSON dict payload. .. py:method:: has_local_files_to_upload() Check if the mask url is local and needs to be uploaded. .. py:method:: to_json() Serializes annotation object to schematized JSON string. .. py:method:: to_payload() Serializes annotation object to schematized JSON dict. .. py:class:: Slice(slice_id, client) A Slice represents a subset of DatasetItems in your Dataset.
Slices are subsets of your Dataset that unlock curation and exploration workflows. Instead of thinking of your Datasets as collections of data, it is useful to think about them as a collection of Slices. For instance, your dataset may contain different weather scenarios, traffic conditions, or highway types. Perhaps your Models perform poorly on foggy weather scenarios; it is then useful to slice your dataset into a "foggy" slice, and fine-tune model performance on this slice until it reaches the performance you desire. Slices cannot be instantiated directly and instead must be created in the dashboard, or via the API using :meth:`Dataset.create_slice`. :: import nucleus client = nucleus.NucleusClient("YOUR_SCALE_API_KEY") dataset = client.get_dataset("YOUR_DATASET_ID") ref_ids = ["interesting_item_1", "interesting_item_2"] slc = dataset.create_slice(name="interesting", reference_ids=ref_ids) .. py:method:: add_tags(tags) Tag a slice with custom tag names. :: import nucleus client = nucleus.NucleusClient("YOUR_SCALE_API_KEY") slc = client.get_slice("YOUR_SLICE_ID") slc.add_tags(["tag_1", "tag_2"]) :param tags: List of tag names. .. py:method:: append(reference_ids = None) Appends existing DatasetItems from a Dataset to a Slice. The endpoint expects a list of DatasetItem reference IDs which are set at upload time. The length of reference_ids cannot exceed 10,000 items per request. :param reference_ids: List of user-defined reference IDs of dataset items or scenes to append to the slice. :returns: Dict of the slice_id and the newly appended IDs. :: { "slice_id": str, "new_items": List[str] } :raises BadRequest: If length of reference_ids is too large (> 10,000 items) .. py:method:: dataset_items() Fetch all DatasetItems contained in the Slice. We recommend using :meth:`Slice.items_generator` if the Slice has more than 200k items. :returns: List of DatasetItem objects. ..
py:method:: export_embeddings(asynchronous = True) Fetches a pd.DataFrame-ready list of slice embeddings. :param asynchronous: Whether or not to process the export asynchronously (and return an :class:`EmbeddingsExportJob` object). Default is True. :returns: If synchronous, a list where each element is a columnar mapping:: List[{ "reference_id": str, "embedding_vector": List[float] }] Otherwise, returns an :class:`EmbeddingsExportJob` object. .. py:method:: export_predictions(model) Provides a list of all DatasetItems and Predictions in the Slice for the given Model. :param model: The Nucleus model object for which to export predictions. :type model: Model :returns: List where each element is a dict containing the DatasetItem and all of its associated Predictions, grouped by type (e.g. box). :: List[{ "item": DatasetItem, "predictions": { "box": List[BoxAnnotation], "polygon": List[PolygonAnnotation], "cuboid": List[CuboidAnnotation], "segmentation": List[SegmentationAnnotation], "category": List[CategoryAnnotation], } }] .. py:method:: export_predictions_generator(model) Provides a generator of all DatasetItems and Predictions in the Slice for the given Model. :param model: The Nucleus model object for which to export predictions. :type model: Model :returns: Iterable where each element is a dict containing the DatasetItem and all of its associated Predictions, grouped by type (e.g. box). :: Iterable[{ "item": DatasetItem, "predictions": { "box": List[BoxAnnotation], "polygon": List[PolygonAnnotation], "cuboid": List[CuboidAnnotation], "segmentation": List[SegmentationAnnotation], "category": List[CategoryAnnotation], } }] .. py:method:: export_raw_items() Fetches a list of accessible URLs for each item in the Slice. :returns: List where each element is a dict containing a DatasetItem and its accessible (signed) Scale URL.
:: List[{ "id": str, "ref_id": str, "metadata": Dict[str, Union[str, int]], "original_url": str, "scale_url": str }] .. py:method:: export_raw_json() Exports object slices in a raw JSON format. Note that it currently does not support item-level slices. For each object or match in an object slice, this method exports the following information: - The item that contains the object. - The prediction and/or annotation (both, if the slice is based on IOU matches). - The scene-level attributes, if the object is part of a scene. :returns: An iterable where each element is a dictionary containing JSON-formatted data. :: List[{ "item": DatasetItem (as JSON), "annotation": BoxAnnotation/CuboidAnnotation (as JSON), "prediction": BoxPrediction/CuboidPrediction (as JSON), "scene": Scene (as JSON) }] .. py:method:: export_scale_task_info() Fetches info for all linked Scale tasks of items/scenes in the slice. :returns: A list of dicts, each with two keys, respectively mapping to items/scenes and info on their corresponding Scale tasks within the dataset:: List[{ "item" | "scene": Union[DatasetItem, Scene], "scale_task_info": { "task_id": str, "task_status": str, "task_audit_status": str, "task_audit_review_comment": Optional[str], "project_name": str, "batch": str, "created_at": str, "completed_at": Optional[str] } }] .. py:method:: info() Retrieves metadata about the Slice, including its name, slice_id, and dataset_id. :returns: A dict mapping keys to the corresponding info retrieved. :: { "name": Union[str, int], "slice_id": str, "dataset_id": str, "type": str, "pending_job_count": int, "created_at": datetime, "description": Union[str, None], "tags": } .. py:method:: items_and_annotation_generator() Provides a generator of all DatasetItems and Annotations in the slice. :returns: Generator where each element is a dict containing the DatasetItem and all of its associated Annotations, grouped by type (e.g. box).
:: Iterable[{ "item": DatasetItem, "annotations": { "box": List[BoxAnnotation], "polygon": List[PolygonAnnotation], "cuboid": List[CuboidAnnotation], "line": List[LineAnnotation], "segmentation": List[SegmentationAnnotation], "category": List[CategoryAnnotation], "keypoints": List[KeypointsAnnotation], } }] .. py:method:: items_and_annotations() Provides a list of all DatasetItems and Annotations in the Slice. :returns: List where each element is a dict containing the DatasetItem and all of its associated Annotations, grouped by type (e.g. box). :: List[{ "item": DatasetItem, "annotations": { "box": List[BoxAnnotation], "polygon": List[PolygonAnnotation], "cuboid": List[CuboidAnnotation], "line": List[LineAnnotation], "segmentation": List[SegmentationAnnotation], "category": List[CategoryAnnotation], "keypoints": List[KeypointsAnnotation], } }] .. py:method:: items_generator(page_size=100000) Generator yielding all dataset items in the Slice. :: collected_ref_ids = [] for item in slc.items_generator(): print(f"Exporting item: {item.reference_id}") collected_ref_ids.append(item.reference_id) :param page_size: Number of items to return per page. If you are experiencing timeouts while using this generator, you can try lowering the page size. :type page_size: int, optional :Yields: DatasetItem objects. .. py:method:: send_to_labeling(project_id) Send items in the Slice as tasks to a Scale labeling project. This endpoint submits the items of the Slice as tasks to a pre-existing Scale Annotation project uniquely identified by project_id. Only projects of type General Image Annotation are currently supported. Additionally, in order for task submission to succeed, the project must have task instructions and geometries configured as project-level parameters. In order to create a project or set project parameters, you must use the Scale Annotation API, which is documented in the Scale Annotation API Documentation.
Once the newly created tasks are annotated, the annotations will be automatically reflected in the Nucleus platform. For self-serve projects, users can choose to submit the slice as a calibration batch, which is recommended for brand new labeling projects. For more information about calibration batches, please refer to the Overview of Self Serve Workflow. Note: A batch can be either a calibration batch or a self label batch, but not both. Note: Nucleus only supports bounding box, polygon, and line annotations. If the project parameters specify any other geometries (ellipses or points), those objects will be annotated, but they will not be reflected in Nucleus. :param project_id: Scale-defined ID of the target annotation project. .. todo :: Add the below parameters, if needed. calibration_batch (Optional[bool]): Relevant to Scale Rapid projects only. An optional boolean signaling whether to send as a "calibration batch" for taskers to preliminarily evaluate your project instructions and parameters. self_label_batch (Optional[bool]): Relevant to Scale Rapid projects only. An optional boolean signaling whether to send as a "self-label batch," in which your team can label internally through Scale Rapid. .. py:class:: VideoScene Video or sequence of images over time. Nucleus video datasets are composed of VideoScenes. These can consist of a single video, or a sequence of :class:`DatasetItem` objects, which are equivalent to frames. VideoScenes are uploaded to a :class:`Dataset` with any accompanying metadata. Each :class:`DatasetItem` representing a frame also accepts metadata. Note: Updates with different items will error out (only on scenes that now differ). Existing videos are expected to retain the same frames, and only metadata can be updated. If a video definition is changed (for example, additional frames added) the update operation will be ignored. If you would like to alter the structure of a video scene, please delete the scene and re-upload.
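Since an update is expected to keep a scene's frames unchanged, it can help to compare frame reference IDs locally before re-uploading. The following is a minimal sketch in plain Python and is not part of the Nucleus SDK; the helper name ``frames_unchanged`` and the choice of ordered reference-ID lists as the comparison key are assumptions made for illustration.

```python
from typing import Sequence


def frames_unchanged(existing_ref_ids: Sequence[str], new_ref_ids: Sequence[str]) -> bool:
    """Return True when both scenes list the same frame reference IDs in the same order.

    Hypothetical helper: it only compares IDs locally and makes no API calls.
    """
    return list(existing_ref_ids) == list(new_ref_ids)


existing = ["frame_0", "frame_1", "frame_2"]
same_frames = ["frame_0", "frame_1", "frame_2"]
extra_frame = ["frame_0", "frame_1", "frame_2", "frame_3"]

# Same frames: only the metadata can differ, so an update is reasonable.
print(frames_unchanged(existing, same_frames))
# Frames differ: delete the scene and re-upload instead of updating.
print(frames_unchanged(existing, extra_frame))
```

If the check returns False, the note above suggests deleting the scene and uploading it again rather than attempting an update.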
:param reference_id: User-specified identifier to reference the scene. :type reference_id: str :param frame_rate: Required if uploading items. Frame rate of the video. :type frame_rate: Optional[int] :param video_location: Required if not uploading items. The remote URL containing the video MP4. Remote formats supported include any URL (``http://`` or ``https://``) or URIs for AWS S3, Azure, or GCS (e.g. ``s3://``, ``gcs://``). :type video_location: Optional[str] :param items: Required if not uploading video_location. List of items representing frames, to be a part of the scene. A scene can be created before items have been added to it, but must be non-empty when uploading to a :class:`Dataset`. A video scene can contain a maximum of 3000 items. :type items: Optional[List[:class:`DatasetItem`]] :param metadata: Optional metadata to include with the scene. Coordinate metadata may be provided to enable the Map Chart in the Nucleus Dataset charts page. These values can be specified as ``{ "lat": 52.5, "lon": 13.3, ... }``. Context Attachments may be provided to display the attachments side by side with the dataset item in the Detail View by specifying ``{ "context_attachments": [ { "attachment": 'https://example.com/1' }, { "attachment": 'https://example.com/2' }, ... ] }``. :type metadata: Optional[Dict] Refer to our guide to uploading video data for more info! .. py:method:: add_item(item, index = None, update = False) Adds DatasetItem to the specified index for videos uploaded as an array of images. :param item: Video item to add. :type item: :class:`DatasetItem` :param index: Serial index at which to add the item. :param update: Whether to overwrite the item at the specified index, if it exists. Default is False. .. py:method:: from_json(payload, client = None) :classmethod: Instantiates scene object from schematized JSON dict payload. .. py:method:: get_item(index) Fetches the DatasetItem at the specified index for videos uploaded as an array of images.
:param index: Serial index for which to retrieve the DatasetItem. :returns: DatasetItem at the specified index. :rtype: :class:`DatasetItem` .. py:method:: get_items() Fetches a sorted list of DatasetItems of the scene for videos uploaded as an array of images. :returns: List of DatasetItems, sorted by index ascending. :rtype: List[:class:`DatasetItem`] .. py:method:: info() Fetches information about the video scene. :returns: Payload containing:: { "reference_id": str, "length": Optional[int], "frame_rate": int, "video_url": Optional[str], } .. py:method:: to_json() Serializes scene object to schematized JSON string. .. py:method:: to_payload() Serializes scene object to schematized JSON dict.
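The VideoScene parameter rules described above (a video_location or a non-empty list of items, a frame_rate when uploading items, and at most 3000 items per scene) can be checked locally before upload. This is a minimal sketch in plain Python that does not call the Nucleus SDK; ``check_video_scene_args`` and ``MAX_VIDEO_SCENE_ITEMS`` are hypothetical names introduced for illustration, and the SDK may enforce these rules differently.

```python
from typing import List, Optional, Sequence

# Limit quoted in the VideoScene parameter docs; the constant name is hypothetical.
MAX_VIDEO_SCENE_ITEMS = 3000


def check_video_scene_args(
    frame_rate: Optional[int] = None,
    video_location: Optional[str] = None,
    items: Optional[Sequence[object]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the arguments look uploadable."""
    problems: List[str] = []
    item_count = len(items) if items is not None else 0

    # A scene needs a video file or at least one frame at upload time.
    if video_location is None and item_count == 0:
        problems.append("provide video_location or a non-empty list of items")
    # frame_rate is documented as required when frames are uploaded as items.
    if item_count > 0 and frame_rate is None:
        problems.append("frame_rate is required when uploading items")
    # A video scene is documented to hold at most 3000 items.
    if item_count > MAX_VIDEO_SCENE_ITEMS:
        problems.append(f"{item_count} items exceeds the limit of {MAX_VIDEO_SCENE_ITEMS}")
    return problems


print(check_video_scene_args(video_location="s3://your-bucket-name/video.mp4"))
print(check_video_scene_args(items=["frame_0", "frame_1"]))
print(check_video_scene_args(frame_rate=30, items=["frame"] * 3001))
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem with a scene in one pass.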