fiftyone-dataset-export
Export FiftyOne Datasets
Key Directives
ALWAYS follow these rules:
1. Load and understand the dataset first
set_context(dataset_name="my-dataset")
dataset_summary(name="my-dataset")
2. Confirm export settings with user
Before exporting, present:
- Dataset name and sample count
- Available label fields and their types
- Proposed export format
- Export directory path
3. Match format to label types
Different formats support different label types:
| Format | Label Types |
|---|---|
| COCO | detections, segmentations, keypoints |
| YOLO (v4, v5) | detections |
| VOC | detections |
| CVAT | classifications, detections, polylines, keypoints |
| CSV | all (custom fields) |
| Image Classification Directory Tree | classification |
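The table above can be encoded as a small lookup so a proposed format is checked against the label types in the dataset before exporting. This is a minimal sketch: `FORMAT_LABEL_TYPES` and `compatible_formats` are hypothetical names, and the mapping only mirrors the table rather than querying FiftyOne.

```python
# Hypothetical lookup copied verbatim from the table above; not a FiftyOne API.
FORMAT_LABEL_TYPES = {
    "COCO": {"detections", "segmentations", "keypoints"},
    "YOLO": {"detections"},
    "VOC": {"detections"},
    "CVAT": {"classifications", "detections", "polylines", "keypoints"},
    "Image Classification Directory Tree": {"classification"},
}


def compatible_formats(label_type):
    """Return the formats from the table that accept the given label type."""
    matches = [
        fmt for fmt, types in FORMAT_LABEL_TYPES.items() if label_type in types
    ]
    # The table lists CSV as accepting all (custom) fields, so always include it.
    return sorted(matches + ["CSV"])


print(compatible_formats("keypoints"))
```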
4. Use absolute paths
Always use absolute paths for export directories:
params={
    "export_dir": {"absolute_path": "/path/to/export"}
}
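When the user supplies a relative or `~`-prefixed path, it can be normalized before it goes into `params`. The sketch below uses only the standard library; `export_dir_param` is a hypothetical helper name, and the example path is illustrative.

```python
from pathlib import Path


def export_dir_param(path):
    """Build the export_dir value, resolving the path to an absolute one."""
    absolute = Path(path).expanduser().resolve()
    return {"absolute_path": str(absolute)}


# A relative path is resolved against the current working directory.
params = {"export_dir": export_dir_param("exports/coco")}
print(params)
```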
5. Warn about overwriting
Check whether the export directory exists before exporting. If it does, ask the user whether to overwrite it.
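One way to run this check is sketched below. `needs_overwrite_confirmation` is a hypothetical helper, and treating an existing but empty directory as safe is an assumption of this sketch, not a rule from the export operator.

```python
import tempfile
from pathlib import Path


def needs_overwrite_confirmation(export_dir):
    """Return True when the directory exists and already contains entries."""
    path = Path(export_dir)
    # Assumption: an empty directory cannot lose data, so it needs no prompt.
    return path.is_dir() and any(path.iterdir())


# Demo on a throwaway directory: empty first, then with one file in it.
with tempfile.TemporaryDirectory() as tmp:
    empty_before = needs_overwrite_confirmation(tmp)
    (Path(tmp) / "labels.json").write_text("{}")
    populated_after = needs_overwrite_confirmation(tmp)

print(empty_before, populated_after)
```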
Complete Workflow
Step 1: Load Dataset and Understand Content
# Set context
set_context(dataset_name="my-dataset")
# Get dataset summary to see fields and label types
dataset_summary(name="my-dataset")
Identify:
- Total sample count
- Media type (images, videos, point clouds)
- Available label fields and their types (Detections, Classifications, etc.)
Step 2: Get Export Operator Schema
# Discover export parameters dynamically
get_operator_schema(operator_uri="@voxel51/io/export_samples")
Step 3: Present Export Options to User
Before exporting, confirm with the user:
Dataset: my-dataset (5,000 samples)
Media type: image
Available label fields:
- ground_truth (Detections)
- predictions (Detections)
Export options:
- Format: COCO (recommended for detections)
- Export directory: /path/to/export
- Label field: ground_truth
Proceed with export?
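The confirmation message above can be assembled from the values gathered in Step 1. This is only a formatting sketch: `format_export_summary` and its arguments are hypothetical, and the values passed in would come from `dataset_summary()` in practice.

```python
def format_export_summary(name, sample_count, media_type, label_fields,
                          export_format, export_dir, label_field):
    """Render the confirmation prompt shown to the user before exporting."""
    lines = [
        f"Dataset: {name} ({sample_count:,} samples)",
        f"Media type: {media_type}",
        "Available label fields:",
    ]
    lines += [f"- {field} ({kind})" for field, kind in label_fields.items()]
    lines += [
        "Export options:",
        f"- Format: {export_format}",
        f"- Export directory: {export_dir}",
        f"- Label field: {label_field}",
        "Proceed with export?",
    ]
    return "\n".join(lines)


summary_text = format_export_summary(
    "my-dataset", 5000, "image",
    {"ground_truth": "Detections", "predictions": "Detections"},
    "COCO (recommended for detections)", "/path/to/export", "ground_truth",
)
print(summary_text)
```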
Step 4: Execute Export
Export media and labels:
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "export_dir": {"absolute_path": "/path/to/export"},
        "label_field": "ground_truth"
    }
)
Export labels only (no media copy):
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "LABELS_ONLY",
        "dataset_type": "COCO",
        "labels_path": {"absolute_path": "/path/to/labels.json"},
        "label_field": "ground_truth"
    }
)
Export media only (no labels):
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_ONLY",
        "export_dir": {"absolute_path": "/path/to/media"}
    }
)
Step 5: Verify Export
After export, verify the output:
ls -la /path/to/export
Report exported file count and structure to user.
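Beyond `ls`, a file count per extension gives a quick number to compare against the dataset's sample count. The sketch below uses only the standard library; `summarize_export` is a hypothetical helper, and the demo directory just imitates a small COCO export.

```python
import tempfile
from collections import Counter
from pathlib import Path


def summarize_export(export_dir):
    """Count exported files by extension, including files in subdirectories."""
    files = [p for p in Path(export_dir).rglob("*") if p.is_file()]
    return Counter(p.suffix.lower() or "<no extension>" for p in files)


# Demo on a throwaway directory shaped like a small COCO export.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "image1.jpg").write_bytes(b"")
    (data_dir / "image2.jpg").write_bytes(b"")
    (Path(tmp) / "labels.json").write_text("{}")
    summary = summarize_export(tmp)

print(dict(summary))
```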
Supported Export Formats
Detection Formats
| Format | dataset_type Value | Label Types | Labels-Only |
|---|---|---|---|
| COCO | "COCO" | detections, segmentations, keypoints | Yes |
| YOLOv4 | "YOLOv4" | detections | Yes |
| YOLOv5 | "YOLOv5" | detections | No |
| VOC | "VOC" | detections | Yes |
| KITTI | "KITTI" | detections | Yes |
| CVAT Image | "CVAT Image" | classifications, detections, polylines, keypoints | Yes |
| CVAT Video | "CVAT Video" | frame labels | Yes |
| TF Object Detection | "TF Object Detection" | detections | No |
Classification Formats
| Format | dataset_type Value | Media Type | Labels-Only |
|---|---|---|---|
| Image Classification Directory Tree | "Image Classification Directory Tree" | image | No |
| Video Classification Directory Tree | "Video Classification Directory Tree" | video | No |
| TF Image Classification | "TF Image Classification" | image | No |
Segmentation Formats
| Format | dataset_type Value | Label Types | Labels-Only |
|---|---|---|---|
| Image Segmentation | "Image Segmentation" | segmentation | Yes |
General Formats
| Format | dataset_type Value | Best For | Labels-Only |
|---|---|---|---|
| CSV | "CSV" | Custom fields, spreadsheet analysis | Yes |
| GeoJSON | "GeoJSON" | Geolocation data | Yes |
| FiftyOne Dataset | "FiftyOne Dataset" | Full dataset backup with all metadata | Yes |
Note: Formats with "Labels-Only: No" require export_type: "MEDIA_AND_LABELS" (cannot export labels without media).
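The note above can be enforced before calling the operator. In this sketch the set of formats is copied from the "Labels-Only: No" rows of the tables in this section; `NO_LABELS_ONLY` and `check_export_type` are hypothetical names, not part of the export operator.

```python
# Formats the tables in this section mark as "Labels-Only: No".
NO_LABELS_ONLY = {
    "YOLOv5",
    "TF Object Detection",
    "Image Classification Directory Tree",
    "Video Classification Directory Tree",
    "TF Image Classification",
}


def check_export_type(dataset_type, export_type):
    """Raise if a labels-only export is requested for a format that lacks it."""
    if export_type == "LABELS_ONLY" and dataset_type in NO_LABELS_ONLY:
        raise ValueError(
            f'{dataset_type} cannot export labels without media; '
            'use export_type "MEDIA_AND_LABELS" instead'
        )
    return export_type


print(check_export_type("COCO", "LABELS_ONLY"))
```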
Export Type Options
| export_type Value | Description |
|---|---|
| "MEDIA_AND_LABELS" | Export both media files and labels |
| "LABELS_ONLY" | Export labels only (use labels_path instead of export_dir) |
| "MEDIA_ONLY" | Export media files only (no labels) |
| "FILEPATHS_ONLY" | Export CSV with filepaths only |
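Because the path key depends on the export type, it can help to build `params` in one place. The sketch below covers only the three export types whose parameters appear in Step 4; `build_export_params` is a hypothetical helper, and it deliberately rejects "FILEPATHS_ONLY" because this document does not show which path key that type expects.

```python
def build_export_params(export_type, path, dataset_type=None, label_field=None):
    """Assemble operator params, choosing labels_path or export_dir by type."""
    if export_type == "LABELS_ONLY":
        params = {"export_type": export_type,
                  "labels_path": {"absolute_path": path}}
    elif export_type in ("MEDIA_AND_LABELS", "MEDIA_ONLY"):
        params = {"export_type": export_type,
                  "export_dir": {"absolute_path": path}}
    else:
        # Assumption: other types need parameters not covered by this sketch.
        raise ValueError(f"unsupported export_type in this sketch: {export_type}")
    if export_type != "MEDIA_ONLY":
        params["dataset_type"] = dataset_type
        if label_field is not None:
            params["label_field"] = label_field
    return params


labels_only = build_export_params(
    "LABELS_ONLY", "/path/to/labels.json", "COCO", "ground_truth"
)
print(labels_only)
```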
Target Options
Export from different sources:
| target Value | Description |
|---|---|
| "DATASET" | Export entire dataset (default) |
| "CURRENT_VIEW" | Export current filtered view |
| "SELECTED_SAMPLES" | Export selected samples only |
Common Use Cases
Use Case 1: Export to COCO Format
For training with frameworks that use COCO format:
set_context(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "export_dir": {"absolute_path": "/path/to/coco_export"},
        "label_field": "ground_truth"
    }
)
Output structure:
coco_export/
├── data/
│   ├── image1.jpg
│   └── image2.jpg
└── labels.json
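A quick sanity check on the resulting `labels.json` is to count its top-level COCO arrays. The sketch below relies only on the standard COCO keys (`images`, `annotations`, `categories`); `coco_counts` is a hypothetical helper, and the demo file is hand-written for illustration.

```python
import json
import tempfile
from pathlib import Path


def coco_counts(labels_path):
    """Return the number of images, annotations, and categories in a COCO file."""
    data = json.loads(Path(labels_path).read_text())
    return {
        key: len(data.get(key, []))
        for key in ("images", "annotations", "categories")
    }


# Demo with a tiny hand-written COCO file (illustrative values only).
sample = {
    "images": [{"id": 1, "file_name": "image1.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 10, 10]}
    ],
    "categories": [{"id": 1, "name": "cat"}],
}
with tempfile.TemporaryDirectory() as tmp:
    labels_path = Path(tmp) / "labels.json"
    labels_path.write_text(json.dumps(sample))
    counts = coco_counts(labels_path)

print(counts)
```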
Use Case 2: Export to YOLO Format
For training YOLOv5/v8 models:
set_context(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "YOLOv5",
        "export_dir": {"absolute_path": "/path/to/yolo_export"},
        "label_field": "ground_truth"
    }
)
Output structure:
yolo_export/
├── images/
│   └── train/
│       └── image1.jpg
├── labels/
│   └── train/
│       └── image1.txt
└── dataset.yaml
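Given the layout above, images and label files can be paired by filename stem to spot gaps. This is a sketch under assumptions: `images_missing_labels` is a hypothetical helper, the `train` split name is taken from the tree above, and an image without a label file may simply have no objects rather than indicate a failed export.

```python
import tempfile
from pathlib import Path


def images_missing_labels(export_dir, split="train"):
    """List image filenames in a split that have no matching .txt label file."""
    root = Path(export_dir)
    label_stems = {p.stem for p in (root / "labels" / split).glob("*.txt")}
    images = (p for p in (root / "images" / split).iterdir() if p.is_file())
    return sorted(p.name for p in images if p.stem not in label_stems)


# Demo on a throwaway directory: two images, only one label file.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("images/train", "labels/train"):
        (Path(tmp) / sub).mkdir(parents=True)
    (Path(tmp) / "images/train/image1.jpg").write_bytes(b"")
    (Path(tmp) / "images/train/image2.jpg").write_bytes(b"")
    (Path(tmp) / "labels/train/image1.txt").write_text("0 0.5 0.5 0.2 0.2\n")
    missing = images_missing_labels(tmp)

print(missing)
```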
Use Case 3: Export Filtered View
Export only a subset of samples:
# Set context
set_context(dataset_name="my-dataset")
# Filter samples in the App
set_view(tags=["validated"])
# Export the filtered view
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "target": "CURRENT_VIEW",
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "export_dir": {"absolute_path": "/path/to/validated_export"},
        "label_field": "ground_truth"
    }
)
Use Case 4: Export Labels Only
When media should stay in place:
set_context(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "LABELS_ONLY",
        "dataset_type": "COCO",
        "labels_path": {"absolute_path": "/path/to/annotations.json"},
        "label_field": "ground_truth"
    }
)
Use Case 5: Export for Classification Training
For image classification datasets:
set_context(dataset_name="my-classification-dataset")
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "Image Classification Directory Tree",
        "export_dir": {"absolute_path": "/path/to/classification_export"},
        "label_field": "ground_truth"
    }
)
Output structure:
classification_export/
├── cat/
│   ├── cat1.jpg
│   └── cat2.jpg
└── dog/
    ├── dog1.jpg
    └── dog2.jpg
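With one folder per class, per-class sample counts can be read straight from the directory tree and compared with the label distribution in the dataset. `class_counts` below is a hypothetical helper using only the standard library, and the demo tree is illustrative.

```python
import tempfile
from pathlib import Path


def class_counts(export_dir):
    """Map each class folder name to the number of files it contains."""
    return {
        class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
        for class_dir in sorted(Path(export_dir).iterdir())
        if class_dir.is_dir()
    }


# Demo on a throwaway directory tree with two classes.
with tempfile.TemporaryDirectory() as tmp:
    demo = {"cat": ["cat1.jpg", "cat2.jpg"], "dog": ["dog1.jpg"]}
    for label, names in demo.items():
        class_dir = Path(tmp) / label
        class_dir.mkdir()
        for name in names:
            (class_dir / name).write_bytes(b"")
    counts = class_counts(tmp)

print(counts)
```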
Use Case 6: Export to CSV
For analysis in spreadsheets:
set_context(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "LABELS_ONLY",
        "dataset_type": "CSV",
        "labels_path": {"absolute_path": "/path/to/data.csv"},
        "csv_fields": ["filepath", "ground_truth.detections.label"]
    }
)
Use Case 7: Export FiftyOne Dataset (Full Backup)
For complete dataset backup including all metadata:
set_context(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "FiftyOne Dataset",
        "export_dir": {"absolute_path": "/path/to/backup"}
    }
)
Output structure:
backup/
├── metadata.json
├── samples.json
├── data/
│   └── ...
├── annotations/
├── brain/
└── evaluations/
Python SDK Alternative
For more control, guide users to use the Python SDK directly:
import fiftyone as fo
import fiftyone.types as fot
# Load dataset
dataset = fo.load_dataset("my-dataset")
# Export to COCO format
dataset.export(
    export_dir="/path/to/export",
    dataset_type=fot.COCODetectionDataset,
    label_field="ground_truth",
)
# Export labels only
dataset.export(
    labels_path="/path/to/labels.json",
    dataset_type=fot.COCODetectionDataset,
    label_field="ground_truth",
)
# Export a filtered view
view = dataset.match_tags("validated")
view.export(
    export_dir="/path/to/validated",
    dataset_type=fot.YOLOv5Dataset,
    label_field="ground_truth",
)
Python SDK dataset types:
- fot.COCODetectionDataset - COCO format
- fot.YOLOv4Dataset - YOLOv4 format
- fot.YOLOv5Dataset - YOLOv5 format
- fot.VOCDetectionDataset - Pascal VOC format
- fot.KITTIDetectionDataset - KITTI format
- fot.CVATImageDataset - CVAT image format
- fot.CVATVideoDataset - CVAT video format
- fot.TFObjectDetectionDataset - TensorFlow Object Detection format
- fot.ImageClassificationDirectoryTree - Classification folder structure
- fot.VideoClassificationDirectoryTree - Video classification folders
- fot.TFImageClassificationDataset - TensorFlow classification format
- fot.ImageSegmentationDirectory - Segmentation masks
- fot.CSVDataset - CSV format
- fot.GeoJSONDataset - GeoJSON format
- fot.FiftyOneDataset - Native FiftyOne format
Exporting to Hugging Face Hub
For complete HF Hub export documentation, see HF-HUB-EXPORT.md.
Quick reference:
| Method | Use Case |
|---|---|
| push_to_hub() | Personal accounts, simple upload |
| Manual upload | Organizations, private org repos |
Quick start:
from fiftyone.utils.huggingface import push_to_hub
# Personal account
push_to_hub(dataset, repo_name="my-dataset", private=False)
# With options
push_to_hub(
    dataset,
    repo_name="my-dataset",
    description="My dataset description",
    license="apache-2.0",
    private=True,
)
IMPORTANT: Always generate a dataset card and get the user's approval for it before uploading. See HF-HUB-EXPORT.md for complete documentation, including authentication setup, dataset card workflow, parameters reference, use cases, and troubleshooting.
Troubleshooting
Error: "Export directory already exists"
- Add "overwrite": true to params
- Or specify a different export directory
Error: "Label field not found"
- Use dataset_summary() to see available label fields
- Verify the field name spelling
Error: "Unsupported label type for format"
- Check that the export format supports your label type
- COCO: detections, segmentations, keypoints
- YOLO: detections only
- Classification formats: classification labels only
Error: "Permission denied"
- Verify write permissions for the export directory
- Check that the parent directory exists
Export is slow
- Large datasets take time; consider exporting a view first
- Export to local disk rather than network drives
- For labels only, use the LABELS_ONLY export type
Best Practices
- Understand your data first - Use dataset_summary() to know what fields and label types exist
- Match format to purpose - Use COCO/YOLO for training, CSV for analysis, FiftyOne Dataset for backups
- Confirm with user - Present export settings before executing
- Export filtered views - Only export what's needed rather than entire datasets
- Verify after export - Check exported file counts match expectations
- Use labels_path for LABELS_ONLY - When exporting labels only, use labels_path, not export_dir