Scene-Based Synthetic Dataset Generation (SDG) using Isaac Sim
By Juliana Faux, Andrés Escobar and Santiago Ferreiros
This blog summarizes the workflow for synthetic dataset generation using Isaac Sim. The provided code demonstrates how to automate object movement and rotation within a simulated environment, capture images from a camera, and generate annotations such as bounding boxes. This process is essential for creating synthetic datasets for training and testing computer vision models.
Overview
We are going to perform the following main tasks:
- Scene Setup: Configures the simulation environment and loads the required assets.
- Object Movement and Rotation: Moves an object forward and rotates it around its axis.
- Data Capture: Captures RGB images, depth data, and generates annotations.
- Visualization: Visualizes the annotations by overlaying bounding boxes on the captured images.
Load a USD in Isaac Sim
With Isaac Sim already open, the first step for synthetic dataset generation is to navigate to the Isaac Assets section, where you’ll find several preloaded scenes. Look for an asset named “warehouse_with_forklifts”, as shown in the image below. To open the scene, simply double-click on it.
Now, you need to import another asset called RSD455. This is a camera that must be dragged and positioned inside the forklift, as shown in the image.
The RSD455 camera is embedded inside the forklift so that it captures everything in front of the vehicle, which is exactly the viewpoint needed to start collecting a dataset for training a computer vision model on synthetic data. However, when the camera is first dragged in, it is not aligned or oriented toward the front of the forklift, so it must be positioned manually.
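If you prefer to script this setup instead of using the UI, a minimal sketch is shown below. The asset sub-paths under the Isaac assets root are assumptions here; confirm the exact locations in the Isaac Assets browser before relying on them.

from omni.isaac.core.utils.nucleus import get_assets_root_path
from omni.isaac.core.utils.stage import open_stage, add_reference_to_stage

assets_root = get_assets_root_path()

# Open the warehouse scene (assumed location; verify in the Isaac Assets browser).
open_stage(assets_root + "/Isaac/Environments/Simple_Warehouse/warehouse_with_forklifts.usd")

# Reference the RSD455 camera asset under the forklift (assumed asset path; the
# target prim path matches the camera_path used later in this post).
add_reference_to_stage(
    usd_path=assets_root + "/Isaac/Sensors/Intel/RealSense/rsd455.usd",
    prim_path="/World/warehouse_with_forklifts/Forklift/rsd455",
)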
In the Transform section, make sure the Offset toggle in the right-hand corner is deactivated. If it is activated, the word Offset will appear next to Transform.
To properly orient the camera, you need to enter the following values in the Transform section, specifically under Translate and Orient / Rotate:
Transform | X | Y | Z |
Translate | 0.04012 | -0.64179 | 1.06215 |
Orient / Rotate | 0.0 | 0.0 | -90.0 |
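If you would rather set these values from Python than type them into the Transform panel, a minimal sketch follows. It assumes the dragged camera asset lives at /World/warehouse_with_forklifts/Forklift/rsd455 and exposes xformOp:translate and xformOp:rotateXYZ attributes; if your prim was authored with xformOp:orient (a quaternion) instead, adapt accordingly.

import omni.usd
from pxr import Gf

stage = omni.usd.get_context().get_stage()
cam_prim = stage.GetPrimAtPath("/World/warehouse_with_forklifts/Forklift/rsd455")  # assumed path

# Same values as the table above.
cam_prim.GetAttribute("xformOp:translate").Set(Gf.Vec3d(0.04012, -0.64179, 1.06215))
cam_prim.GetAttribute("xformOp:rotateXYZ").Set(Gf.Vec3f(0.0, 0.0, -90.0))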
To verify the camera’s view, select the camera-shaped icon located in the upper left corner of the scene. Then click on “Camera” and select “Camera_OmniVision_OV9782_Color”. This will allow you to view something similar to the following image:
Usage Instructions
To use this script:
- Prerequisites:
  - Ensure that NVIDIA Isaac Sim and all required modules are properly installed.
  - Verify that the scene contains the objects and cameras specified by prim_path and camera_path.
- Customization:
  - Object and Camera Paths: Adjust prim_path and camera_path to match your scene’s object and camera paths.
  - Initial and Final Positions: Set init_pos and final_pos to define the movement path.
  - Number of Steps: Modify number_of_steps to control the movement granularity.
  - Rotation Parameters: Update rotation_steps_list to specify rotation angles and steps.
- Save Paths:
  - Ensure the directories specified in save_paths exist or update them to valid paths (see the sketch after this list).
  - The notebook saves RGB images, depth images, and annotations to these paths.
- Execution:
  - Run the notebook within the Isaac Sim environment or as an extension.
  - The script may require integration with the Isaac Sim event loop to function correctly.
- Results:
  - Captured images and annotations will be saved in the specified directories.
  - Visualized images with bounding boxes will be available for review.
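A minimal sketch of what save_paths could look like is shown below; the output folder is hypothetical, but the dictionary keys are the ones the functions in this notebook expect.

import os

output_root = "/home/user/isaac_sdg_output"  # hypothetical output folder
save_paths = {
    'rgb': os.path.join(output_root, "rgb", "image_rgb_"),
    'dcam': os.path.join(output_root, "dcam", "image_dcam_"),
    'dplane': os.path.join(output_root, "dplane", "image_dplane_"),
    'bbox_2d': os.path.join(output_root, "bbox_2d", "bbox_2d_loose_"),
    'bboxes': os.path.join(output_root, "bboxes", "bboxes_"),
}

# Each value is a file prefix; make sure its parent directory exists.
for prefix in save_paths.values():
    os.makedirs(os.path.dirname(prefix), exist_ok=True)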
Importing Necessary Modules
Open an Omniverse notebook in JupyterLab (see previous blog); the notebook begins by importing various modules required for the simulation and data processing:
import sys
import os
import asyncio
import json
import numpy as np
import PIL.Image
import cv2
import omni.kit.commands as cmd
from pxr import Gf, Usd, Sdf, UsdGeom
import omni.usd
from omni.kit.viewport.utility import get_active_viewport, capture_viewport_to_file
import omni.replicator.core as rep
from omni.isaac.sensor import Camera
from omni.isaac.core.utils import prims
import omni.kit.app
These modules provide functionalities for:
- Handling 3D geometry and transformations (pxr, omni.usd).
- Image processing and data handling (numpy, PIL.Image, cv2).
- Asynchronous execution (asyncio).
- Interacting with Isaac Sim’s API (omni.* modules).
Auxiliary Functions
Moving Objects
Moves an object to a new specified position:
def move_prim_to_pos(stage, prim_path, new_pos, debug=False):
    prim = stage.GetPrimAtPath(prim_path)
    success = prim.GetAttribute("xformOp:translate").Set(new_pos, 0)
    if debug:
        new_translation = prim.GetAttribute("xformOp:translate").Get(0)
        print("New position:", new_translation)
        print("Success:", success)
    return success
Moves a prim by a specified step vector:
def move_prim_step(stage, prim_path, step=None, debug=False):
    prim = stage.GetPrimAtPath(prim_path)
    old_pos = prim.GetAttribute("xformOp:translate").Get(0)
    if debug:
        print("Previous position:", old_pos)
    new_pos = old_pos + step
    success = prim.GetAttribute("xformOp:translate").Set(new_pos, 0)
    if debug:
        new_translation = prim.GetAttribute("xformOp:translate").Get(0)
        print("New position:", new_translation)
        print("Success:", success)
    return success
Calculates the step size needed to move from point a to point b in N steps:
def step_from_a_to_b(a, b, N):
    return (b - a) / N
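For example, with the start and end positions passed to main() at the end of this post, splitting the motion into 20 steps gives:

from pxr import Gf

a = Gf.Vec3d(-2.28838, -3.61546, 0)   # initial position
b = Gf.Vec3d(-2.28838, 10, 0)         # final position

step = step_from_a_to_b(a, b, 20)
print(step)  # (0, 0.680773, 0)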
Rotating Prims
Rotates a prim by a specified degree vector:
def rotate_prim(stage, prim_path, rotation_deg, debug=False):
    prim = stage.GetPrimAtPath(prim_path)
    xformable = UsdGeom.Xformable(prim)
    xformOps = xformable.GetOrderedXformOps()
    # Search for an existing rotation operation or create a new one
    rotationOp = None
    for op in xformOps:
        if op.GetOpType() == UsdGeom.XformOp.TypeRotateXYZ:
            rotationOp = op
            break
    if rotationOp is None:
        rotationOp = xformable.AddXformOp(
            UsdGeom.XformOp.TypeRotateXYZ, UsdGeom.XformOp.PrecisionFloat)
        if debug:
            print("Created a new rotation XformOp.")
    # Get the current rotation
    current_rot = rotationOp.Get(Usd.TimeCode.Default())
    if current_rot is None:
        current_rot = Gf.Vec3f(0, 0, 0)
    if debug:
        print("Current rotation:", current_rot)
    # Calculate the new cumulative rotation
    new_rot = current_rot + rotation_deg
    rotationOp.Set(new_rot)
    if debug:
        print("New rotation:", new_rot)
Normalizing Depth Data
Normalizes depth data to prepare it for image representation:
def normalize_depth_data(depth_data):
    # Replace infinite or NaN values with zero
    depth_data = np.nan_to_num(depth_data, nan=0.0, posinf=0.0, neginf=0.0)
    depth_min = np.min(depth_data)
    depth_max = np.max(depth_data)
    # Avoid division by zero
    if depth_max == depth_min:
        normalized = np.zeros_like(depth_data)
    else:
        normalized = (depth_data - depth_min) / (depth_max - depth_min)
    normalized = (normalized * 255).astype('uint8')
    return normalized
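As a quick illustration, here is what the normalization does to a small, made-up depth map containing an invalid (infinite) pixel:

import numpy as np

depth = np.array([[1.0, 2.0],
                  [np.inf, 4.0]], dtype=np.float32)  # toy depth values in meters

print(normalize_depth_data(depth))
# [[ 63 127]
#  [  0 255]]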
Scene Setup
Set up the scene by obtaining the stage and loading the specified prim:
def setup_scene(prim_path, camera_path):
    stage = omni.usd.get_context().get_stage()
    object_prim = stage.GetPrimAtPath(prim_path)
    camera_prim = stage.GetPrimAtPath(camera_path)
    return stage, object_prim, camera_prim
Sets the initial position of the object. If no position is provided, it uses the current position or defaults to the origin:
def set_initial_position(stage, prim_path, init_pos):
    if init_pos is None:
        # Get the current position
        prim = stage.GetPrimAtPath(prim_path)
        init_pos = prim.GetAttribute("xformOp:translate").Get(0)
        if init_pos is None:
            init_pos = Gf.Vec3d(0, 0, 0)
    else:
        move_prim_to_pos(stage, prim_path, Gf.Vec3d(*init_pos), debug=False)
    return Gf.Vec3d(*init_pos)
Configuring Annotators
Configures the annotators to capture different types of data such as RGB images, depth data, and bounding boxes:
def configure_annotators(camera_path, resolution=(1024, 1024)):
    render_product = rep.create.render_product(camera_path, resolution)
    rgb = rep.AnnotatorRegistry.get_annotator("rgb")
    distance_to_camera = rep.AnnotatorRegistry.get_annotator("distance_to_camera")
    distance_to_image_plane = rep.AnnotatorRegistry.get_annotator("distance_to_image_plane")
    bbox_2d_loose = rep.AnnotatorRegistry.get_annotator("bounding_box_2d_loose")
    rgb.attach([render_product])
    distance_to_camera.attach([render_product])
    distance_to_image_plane.attach([render_product])
    bbox_2d_loose.attach([render_product])
    return rgb, distance_to_camera, distance_to_image_plane, bbox_2d_loose
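Before running the full movement loop, you can sanity-check the annotators with a single capture. This sketch assumes the camera_path defined later in this post and is meant to be run inside the notebook (where top-level await is available):

camera_path = "/World/warehouse_with_forklifts/Forklift/rsd455/RSD455/Camera_OmniVision_OV9782_Color"
rgb, d_cam, d_plane, bbox = configure_annotators(camera_path)

await rep.orchestrator.step_async()           # render a single frame
print(rgb.get_data().shape)                   # e.g. (1024, 1024, 4), RGBA
print(bbox.get_data()["info"]["idToLabels"])  # semantic labels present in the frame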
Simulation Functions
Running the Simulation
Moves the object along a path and captures data at each step:
async def run_simulation(stage, prim_path, step, number_of_steps,
                         rgb, distance_to_camera, distance_to_image_plane,
                         bbox_2d_loose, save_paths, frame_idx=0):
    # Generate the frames and move the object at each step.
    for i in range(number_of_steps):
        # Move the object.
        move_prim_step(stage, prim_path, step, debug=False)
        # Update the scene.
        for _ in range(10):
            await omni.kit.app.get_app().next_update_async()
        # Run the orchestrator.
        await rep.orchestrator.step_async()
        # Capture the annotators' data.
        rgb_data = rgb.get_data()
        distance_to_camera_data = distance_to_camera.get_data()
        distance_to_image_plane_data = distance_to_image_plane.get_data()
        bbox_2d_loose_data = bbox_2d_loose.get_data()
        # Define the save paths for the images.
        path_to_rgb = save_paths['rgb'] + str(frame_idx) + ".png"
        path_to_dcam = save_paths['dcam'] + str(frame_idx) + ".png"
        path_to_dplane = save_paths['dplane'] + str(frame_idx) + ".png"
        path_to_bbox_2d = save_paths['bbox_2d'] + str(frame_idx) + ".npy"
        path_to_bbox_2d_info = save_paths['bbox_2d'] + str(frame_idx) + "_info.json"
        # Process and save rgb_data.
        rgba_image = PIL.Image.fromarray(rgb_data.astype('uint8'))
        rgb_image = rgba_image.convert('RGB')
        rgb_image.save(path_to_rgb)
        # Process and save distance_to_camera_data.
        dcam_normalized = normalize_depth_data(distance_to_camera_data)
        dcam_image = PIL.Image.fromarray(dcam_normalized, mode='L')
        dcam_image.save(path_to_dcam)
        # Process and save distance_to_image_plane_data.
        dplane_normalized = normalize_depth_data(distance_to_image_plane_data)
        dplane_image = PIL.Image.fromarray(dplane_normalized, mode='L')
        dplane_image.save(path_to_dplane)
        # Save bbox array and labels info.
        with open(path_to_bbox_2d, 'wb') as f:
            np.save(f, bbox_2d_loose_data["data"])
        with open(path_to_bbox_2d_info, "w") as fp:
            json.dump(bbox_2d_loose_data["info"]["idToLabels"], fp)
        # Increment the frame counter.
        frame_idx += 1
    return frame_idx
Rotating and Capturing Data
Rotates the object and captures data at each rotation step:
async def rotate_and_capture(stage, prim_path, rotation_steps, rotation_angle,
                             rgb, distance_to_camera, distance_to_image_plane,
                             bbox_2d_loose, save_paths, frame_idx):
    rotation_step = Gf.Vec3f(0, 0, rotation_angle / rotation_steps)
    for j in range(rotation_steps):
        # Rotate the object.
        rotate_prim(stage, prim_path, rotation_step, debug=False)
        # Update the scene.
        for _ in range(5):
            await omni.kit.app.get_app().next_update_async()
        # Run the orchestrator.
        await rep.orchestrator.step_async()
        # Capture the annotators' data.
        rgb_data = rgb.get_data()
        distance_to_camera_data = distance_to_camera.get_data()
        distance_to_image_plane_data = distance_to_image_plane.get_data()
        bbox_2d_loose_data = bbox_2d_loose.get_data()
        # Define the save paths for the images.
        path_to_rgb = save_paths['rgb'] + str(frame_idx) + ".png"
        path_to_dcam = save_paths['dcam'] + str(frame_idx) + ".png"
        path_to_dplane = save_paths['dplane'] + str(frame_idx) + ".png"
        path_to_bbox_2d = save_paths['bbox_2d'] + str(frame_idx) + ".npy"
        path_to_bbox_2d_info = save_paths['bbox_2d'] + str(frame_idx) + "_info.json"
        # Process and save the images and annotations.
        rgba_image = PIL.Image.fromarray(rgb_data.astype('uint8'))
        rgb_image = rgba_image.convert('RGB')
        rgb_image.save(path_to_rgb)
        dcam_normalized = normalize_depth_data(distance_to_camera_data)
        dcam_image = PIL.Image.fromarray(dcam_normalized, mode='L')
        dcam_image.save(path_to_dcam)
        dplane_normalized = normalize_depth_data(distance_to_image_plane_data)
        dplane_image = PIL.Image.fromarray(dplane_normalized, mode='L')
        dplane_image.save(path_to_dplane)
        # Save bbox array and labels info.
        with open(path_to_bbox_2d, 'wb') as f:
            np.save(f, bbox_2d_loose_data["data"])
        with open(path_to_bbox_2d_info, "w") as fp:
            json.dump(bbox_2d_loose_data["info"]["idToLabels"], fp)
        # Increment the frame counter.
        frame_idx += 1
    return frame_idx
Visualization of Annotations
Overlays bounding boxes on the captured images for visualization:
def visualize_annotations(num_frames, save_paths):
    for i in range(num_frames):
        # Paths to the bbox and label files.
        path_to_bbox_2d = save_paths['bbox_2d'] + f"{i}.npy"
        path_to_bbox_2d_info = save_paths['bbox_2d'] + f"{i}_info.json"
        image_path = save_paths['rgb'] + f"{i}.png"
        # Read the image in BGR format (default in OpenCV).
        image = cv2.imread(image_path)
        # Load bbox data
        bbox_data = np.load(path_to_bbox_2d)
        with open(path_to_bbox_2d_info, "r") as fp:
            id_to_labels = json.load(fp)
        # Convert bbox_data to a 2D array.
        bbox_array = np.array([list(b) for b in bbox_data])
        # Iterate over each bounding box and draw it on the image.
        for bbox in bbox_array:
            obj_id = int(bbox[0])
            xmin, ymin, xmax, ymax = bbox[1:5].astype(int)
            confidence = bbox[5]
            label = id_to_labels.get(str(obj_id), "Unknown")
            # Define the label text.
            text = f"{label}: {confidence:.2f}"
            # Choose a color for the bounding box (BGR).
            color = (0, 255, 0)  # Green
            # Draw the bounding box rectangle.
            cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color, 2)
            # Get the text size.
            (text_width, text_height), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
            # Draw a rectangle behind the text for better readability.
            cv2.rectangle(image, (xmin, ymin - text_height - 4), (xmin + text_width, ymin), color, -1)
            # Place the label text on top of the bounding box.
            cv2.putText(image, text, (xmin, ymin - 2), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1)
        # Save the image with the bounding boxes.
        output_path = save_paths['bboxes'] + f"image_with_bboxes_{i}.png"
        cv2.imwrite(output_path, image)
Main Execution Flow
Coordinates the entire workflow for synthetic dataset generation by setting up the scene, running the simulation, and visualizing the results.
Before defining the main workflow, you first need to define three things:
- The path to the forklift.
- The path to the camera.
- The path where each extracted frame and annotation from Isaac Sim will be saved.
To determine the path to the forklift, click on the prim and look at the properties section at the bottom, where this path is shown. The same process applies to finding the path to the camera. For our example, these paths are as follows:
prim_path = "/World/warehouse_with_forklifts/Forklift"
camera_path = "/World/warehouse_with_forklifts/Forklift/rsd455/RSD455/Camera_OmniVision_OV9782_Color"
your_path = "your_path/"
def main(prim_path="/World/warehouse_with_forklifts/Forklift",
         camera_path="/World/warehouse_with_forklifts/Forklift/rsd455/RSD455/Camera_OmniVision_OV9782_Color",
         init_pos=None, final_pos=None, number_of_steps=20,
         rotation_steps_list=[(10, -90), (20, 180)], save_paths=None):
    # Initialize paths.
    if save_paths is None:
        # Here you must define each path:
        SAVE_PATH_RGB = os.path.join(your_path, "imagen_rgb_")
        SAVE_PATH_DCAM = os.path.join(your_path, "imagen_dcam_")
        SAVE_PATH_DPLANE = os.path.join(your_path, "imagen_dplane_")
        SAVE_PATH_BBOX_2D = os.path.join(your_path, "bbox_2d_loose_")
        SAVE_PATH_BBOXES = os.path.join(your_path, "bboxes")
        os.makedirs(SAVE_PATH_RGB, exist_ok=True)
        os.makedirs(SAVE_PATH_DCAM, exist_ok=True)
        os.makedirs(SAVE_PATH_DPLANE, exist_ok=True)
        os.makedirs(SAVE_PATH_BBOX_2D, exist_ok=True)
        os.makedirs(SAVE_PATH_BBOXES, exist_ok=True)
        save_paths = {
            'rgb': os.path.join(SAVE_PATH_RGB, "image_rgb_"),
            'dcam': os.path.join(SAVE_PATH_DCAM, "image_dcam_"),
            'dplane': os.path.join(SAVE_PATH_DPLANE, "image_dplane_"),
            'bbox_2d': os.path.join(SAVE_PATH_BBOX_2D, "bbox_2d_loose_"),
            'bboxes': os.path.join(SAVE_PATH_BBOXES, "bboxes_")
        }
    # Set up the scene.
    stage = omni.usd.get_context().get_stage()
    # Set the initial position.
    init_pos_vec = set_initial_position(stage, prim_path, init_pos)
    # Select the final position and the number of steps.
    if final_pos is None:
        final_pos_vec = init_pos_vec + Gf.Vec3d(0, 10, 0)
    else:
        final_pos_vec = Gf.Vec3d(*final_pos)
    # Calculate the required step.
    step = step_from_a_to_b(init_pos_vec, final_pos_vec, number_of_steps)
    # Configure the annotators.
    rgb, distance_to_camera, distance_to_image_plane, bbox_2d_loose = configure_annotators(camera_path)

    # Run the simulation.
    async def simulation_sequence():
        frame_idx = 0
        # Linear motion.
        frame_idx = await run_simulation(stage, prim_path, step, number_of_steps,
                                         rgb, distance_to_camera, distance_to_image_plane,
                                         bbox_2d_loose, save_paths, frame_idx)
        # Rotations specified in rotation_steps_list.
        for rotation_steps, rotation_angle in rotation_steps_list:
            frame_idx = await rotate_and_capture(stage, prim_path, rotation_steps, rotation_angle,
                                                 rgb, distance_to_camera, distance_to_image_plane,
                                                 bbox_2d_loose, save_paths, frame_idx)
        return frame_idx

    # Run the asynchronous simulation sequence.
    frame_idx = asyncio.ensure_future(simulation_sequence())
    # Visualize the annotations.
    # Note: ensure_future only schedules the coroutine; make sure the simulation
    # task has finished writing frames before visualizing them.
    visualize_annotations(50, save_paths)
    # Keep the script running to allow asynchronous tasks to complete.
At the end of the notebook, the main function is called with specific initial and final positions:
main(
    init_pos=[-2.28838, -3.61546, 0],
    final_pos=[-2.28838, 10, 0]
)
Once the execution of the main function is complete, the next step is to review the paths where the frames and annotations were saved to ensure everything was executed correctly. Below, we provide an image showing how a frame with the bounding boxes of the objects present in the scene should look.
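As a quick sanity check, you can count how many files were written to each output directory. The folder names below match the defaults created inside main(); adjust them if you passed your own save_paths.

import os

your_path = "your_path/"  # same base folder used by main()
for sub in ["imagen_rgb_", "imagen_dcam_", "imagen_dplane_", "bbox_2d_loose_", "bboxes"]:
    folder = os.path.join(your_path, sub)
    print(sub, "->", len(os.listdir(folder)), "files")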
Conclusion
This notebook automates the generation of synthetic datasets by moving and rotating objects within a simulated environment, capturing images, and generating annotations. This process is valuable for training machine learning models, particularly in scenarios where acquiring real-world data is challenging.
By customizing the movement, rotation, and capture parameters, users can generate diverse datasets tailored to their specific needs. The script serves as a foundation for more complex synthetic data generation workflows in Isaac Sim.
For similar blogs please visit Marvik Blogs.