By Juliana Faux, Andrés Escobar and Santiago Ferreiros

This blog summarizes the workflow for synthetic dataset generation using Isaac Sim. The provided code demonstrates how to automate object movement and rotation within a simulated environment, capture images from a camera, and generate annotations such as bounding boxes. This process is essential for creating synthetic datasets for training and testing computer vision models.

 

Overview 

We are going to perform the following main tasks:

  • Scene Setup: Configures the simulation environment and loads the required assets.
  • Object Movement and Rotation: Moves an object forward and rotates it around its axis.
  • Data Capture: Captures RGB images, depth data, and generates annotations.
  • Visualization: Visualizes the annotations by overlaying bounding boxes on the captured images.

 

Load a USD in Isaac Sim

With Isaac Sim already open, the first step for synthetic dataset generation is to navigate to the Isaac Assets section, where you’ll find several preloaded scenes. Look for an asset named “warehouse_with_forklifts”, as shown in the image below. To open the scene, simply double-click on it.

Now, you need to import another asset called RSD455. This is a camera that must be dragged and positioned inside the forklift, as shown in the image.  

The RSD455 camera is embedded inside the forklift so that it captures everything in front of the vehicle, which is essential for collecting a dataset to train a computer vision model on synthetic data. However, when first embedded, the camera is not aligned or oriented toward the front of the forklift, so it must be positioned manually.

In the Transform section, make sure the Offset toggle in the right-hand corner is deactivated; if it is activated, the word Offset will appear next to Transform.

To properly orient the camera, you need to enter the following values in the Transform section, specifically under Translate and Orient / Rotate:

Transform         X         Y          Z
Translate         0.04012   -0.64179   1.06215
Orient / Rotate   0.0       0.0        -90.0
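
If you prefer to apply these values from the Script Editor instead of the Transform panel, a minimal sketch is shown below. The prim path of the dragged-in RSD455 asset and the presence of the xformOp:translate / xformOp:rotateXYZ attributes are assumptions and may differ in your scene; if they do, the UI steps above are the safer route.

import omni.usd
from pxr import Gf

stage = omni.usd.get_context().get_stage()
# Hypothetical prim path for the imported RSD455 asset; adjust to your scene.
camera_asset = stage.GetPrimAtPath("/World/warehouse_with_forklifts/Forklift/rsd455")

# Apply the Translate and Orient / Rotate values from the table above.
camera_asset.GetAttribute("xformOp:translate").Set(Gf.Vec3d(0.04012, -0.64179, 1.06215))
camera_asset.GetAttribute("xformOp:rotateXYZ").Set(Gf.Vec3f(0.0, 0.0, -90.0))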

To verify the camera’s view, select the camera-shaped icon located in the upper left corner of the viewport. Then click on “Camera” and select “Camera_OmniVision_OV9782_Color”. You should see something similar to the following image:  

Usage Instructions

To use this script:

  1. Prerequisites:
    • Ensure that NVIDIA Isaac Sim and all required modules are properly installed.
    • Verify that the scene contains the objects and cameras specified by prim_path and camera_path.
  2. Customization:
    • Object and Camera Paths: Adjust prim_path and camera_path to match your scene’s object and camera paths.
    • Initial and Final Positions: Set init_pos and final_pos to define the movement path.
    • Number of Steps: Modify number_of_steps to control the movement granularity.
    • Rotation Parameters: Update rotation_steps_list to specify rotation angles and steps.
  3. Save Paths:
    • Ensure the directories specified in save_paths exist or update them to valid paths (see the example dictionary after this list).
    • The notebook saves RGB images, depth images, and annotations to these paths.
  4. Execution:
    • Run the notebook within the Isaac Sim environment or as an extension.
    • The script may require integration with the Isaac Sim event loop to function correctly.
  5. Results:
    • Captured images and annotations will be saved in the specified directories.
    • Visualized images with bounding boxes will be available for review.
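
For reference, save_paths is a dictionary that maps each data type to a filename prefix. A hypothetical example is shown below; the directory layout is an assumption, and the main function further down builds an equivalent dictionary automatically.

save_paths = {
    'rgb': "/home/user/dataset/rgb/image_rgb_",              # RGB frames
    'dcam': "/home/user/dataset/dcam/image_dcam_",            # distance-to-camera depth maps
    'dplane': "/home/user/dataset/dplane/image_dplane_",      # distance-to-image-plane depth maps
    'bbox_2d': "/home/user/dataset/bbox_2d/bbox_2d_loose_",   # bbox arrays and label info
    'bboxes': "/home/user/dataset/bboxes/bboxes_",            # RGB frames with drawn boxes
}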

 

Importing Necessary Modules

Open an Omniverse notebook in JupyterLab (see the previous blog); the notebook begins by importing the modules required for simulation and data processing:

import sys
import os
import asyncio
import json
import numpy as np
import PIL.Image
import cv2
import omni.kit.commands as cmd
from pxr import Gf, Usd, Sdf, UsdGeom
import omni.usd
from omni.kit.viewport.utility import get_active_viewport, capture_viewport_to_file
import omni.replicator.core as rep
from omni.isaac.sensor import Camera
from omni.isaac.core.utils import prims
import omni.kit.app

These modules provide functionalities for:

  • Handling 3D geometry and transformations (pxr, omni.usd).
  • Image processing and data handling (numpy, PIL.Image, cv2).
  • Asynchronous execution (asyncio).
  • Interacting with Isaac Sim’s API (omni.* modules).

 

Auxiliary Functions

Moving Objects

Moves an object to a new specified position:

def move_prim_to_pos(stage, prim_path, new_pos, debug=False):
    prim = stage.GetPrimAtPath(prim_path)
    success = prim.GetAttribute("xformOp:translate").Set(new_pos, 0)
    if debug:
        new_translation = prim.GetAttribute("xformOp:translate").Get(0)
        print("New position:", new_translation)
        print("Success:", success)
    return success

Moves a prim by a specified step vector:

def move_prim_step(stage, prim_path, step=None, debug=False):
    prim = stage.GetPrimAtPath(prim_path)
    old_pos = prim.GetAttribute("xformOp:translate").Get(0)
    if debug:
        print("Previous position:", old_pos)
    new_pos = old_pos + step
    success = prim.GetAttribute("xformOp:translate").Set(new_pos, 0)
    if debug:
        new_translation = prim.GetAttribute("xformOp:translate").Get(0)
        print("New position:", new_translation)
        print("Success:", success)
    return success

Calculates the step size needed to move from point a to point b in N steps:

def step_from_a_to_b(a, b, N):
    return (b - a) / N
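
For example, covering 10 units along the Y axis in 20 steps (the defaults used in main below) yields a step of half a unit per frame:

step = step_from_a_to_b(Gf.Vec3d(0, 0, 0), Gf.Vec3d(0, 10, 0), 20)
print(step)  # (0, 0.5, 0)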

 

Rotating Prims

Rotates a prim by a specified degree vector:

def rotate_prim(stage, prim_path, rotation_deg, debug=False):
    prim = stage.GetPrimAtPath(prim_path)
    xformable = UsdGeom.Xformable(prim)
    xformOps = xformable.GetOrderedXformOps()

    # Search for an existing rotation operation or create a new one
    rotationOp = None
    for op in xformOps:
        if op.GetOpType() == UsdGeom.XformOp.TypeRotateXYZ:
            rotationOp = op
            break
    if rotationOp is None:
        rotationOp = xformable.AddXformOp(
            UsdGeom.XformOp.TypeRotateXYZ, UsdGeom.XformOp.PrecisionFloat)
        if debug:
            print("Created a new rotation XformOp.")

    # Get the current rotation
    current_rot = rotationOp.Get(Usd.TimeCode.Default())
    if current_rot is None:
        current_rot = Gf.Vec3f(0, 0, 0)
    if debug:
        print("Current rotation:", current_rot)

    # Calculate the new cumulative rotation
    new_rot = current_rot + rotation_deg
    rotationOp.Set(new_rot)
    if debug:
        print("New rotation:", new_rot)

 

Normalizing Depth Data

Normalizes depth data to prepare it for image representation:

def normalize_depth_data(depth_data):
    # Replace infinite or NaN values with zero
    depth_data = np.nan_to_num(depth_data, nan=0.0, posinf=0.0, neginf=0.0)
    depth_min = np.min(depth_data)
    depth_max = np.max(depth_data)
    # Avoid division by zero
    if depth_max == depth_min:
        # Flat depth map: return a uint8 image of zeros so PIL can handle it.
        normalized = np.zeros_like(depth_data, dtype='uint8')
    else:
        normalized = (depth_data - depth_min) / (depth_max - depth_min)
        normalized = (normalized * 255).astype('uint8')
    return normalized
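
As a small illustration with a toy depth array, infinite readings are replaced with zero and the remaining values are stretched to the 0–255 range:

depth = np.array([[1.0, 2.0],
                  [np.inf, 4.0]])
print(normalize_depth_data(depth))
# [[ 63 127]
#  [  0 255]]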

 

Scene Setup 

Sets up the scene by obtaining the stage and retrieving the object and camera prims:

def setup_scene(prim_path, camera_path):
    stage = omni.usd.get_context().get_stage()
    object_prim = stage.GetPrimAtPath(prim_path)
    camera_prim = stage.GetPrimAtPath(camera_path)
    return stage, object_prim, camera_prim

Sets the initial position of the object. If no position is provided, it uses the current position or defaults to the origin:

def set_initial_position(stage, prim_path, init_pos):
    if init_pos is None:
        # Get the current position
        prim = stage.GetPrimAtPath(prim_path)
        init_pos = prim.GetAttribute("xformOp:translate").Get(0)
        if init_pos is None:
            init_pos = Gf.Vec3d(0, 0, 0)
    else:
        move_prim_to_pos(stage, prim_path, Gf.Vec3d(*init_pos), debug=False)
    return Gf.Vec3d(*init_pos)

 

Configuring Annotators

Configures the annotators to capture different types of data such as RGB images, depth data, and bounding boxes:

def configure_annotators(camera_path, resolution=(1024, 1024)):
    render_product = rep.create.render_product(camera_path, resolution)
    rgb = rep.AnnotatorRegistry.get_annotator("rgb")
    distance_to_camera = rep.AnnotatorRegistry.get_annotator("distance_to_camera")
    distance_to_image_plane = rep.AnnotatorRegistry.get_annotator("distance_to_image_plane")
    bbox_2d_loose = rep.AnnotatorRegistry.get_annotator("bounding_box_2d_loose")
    
    rgb.attach([render_product])
    distance_to_camera.attach([render_product])
    distance_to_image_plane.attach([render_product])
    bbox_2d_loose.attach([render_product])
    return rgb, distance_to_camera, distance_to_image_plane, bbox_2d_loose
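
The annotators can also be exercised on their own before running the full workflow. A minimal sketch, assuming the camera path used later in this post and a notebook cell that allows top-level await:

camera_path = "/World/warehouse_with_forklifts/Forklift/rsd455/RSD455/Camera_OmniVision_OV9782_Color"
rgb, d_cam, d_plane, bbox = configure_annotators(camera_path)

# Render one frame so the annotators have data to return.
await rep.orchestrator.step_async()

print(rgb.get_data().shape)                    # e.g. (1024, 1024, 4) RGBA image
print(bbox.get_data()["info"]["idToLabels"])   # semantic id -> label mapping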

 

Simulation Functions

Running the Simulation

Moves the object along a path and captures data at each step:

async def run_simulation(stage, 
                         prim_path, 
                         step, 
                         number_of_steps, 
                         rgb, 
                         distance_to_camera, 
                         distance_to_image_plane, 
                         bbox_2d_loose, 
                         save_paths, 
                         frame_idx=0):
    # Generate the frames and move the object at each step.   
    for i in range(number_of_steps):
        # Move the object.   
        move_prim_step(stage, prim_path, step, debug=False)
        
        # Update the scene.   
        for _ in range(10):
            await omni.kit.app.get_app().next_update_async()
        
        # Run the orchestrator.   
        await rep.orchestrator.step_async()
        
        # Capture the annotators' data.   
        rgb_data = rgb.get_data()
        distance_to_camera_data = distance_to_camera.get_data()
        distance_to_image_plane_data = distance_to_image_plane.get_data()
        bbox_2d_loose_data = bbox_2d_loose.get_data()
        
        # Define the save paths for the images.   
        path_to_rgb = save_paths['rgb'] + str(frame_idx) + ".png"
        path_to_dcam = save_paths['dcam'] + str(frame_idx) + ".png"
        path_to_dplane = save_paths['dplane'] + str(frame_idx) + ".png"
        path_to_bbox_2d = save_paths['bbox_2d'] + str(frame_idx) + ".npy"
        path_to_bbox_2d_info = save_paths['bbox_2d'] + str(frame_idx) + "_info.json"
        
        # Process and save rgb_data.   
        rgba_image = PIL.Image.fromarray(rgb_data.astype('uint8'))
        rgb_image = rgba_image.convert('RGB')
        rgb_image.save(path_to_rgb)
        
        # Process and save distance_to_camera_data.  
        dcam_normalized = normalize_depth_data(distance_to_camera_data)
        dcam_image = PIL.Image.fromarray(dcam_normalized, mode='L')
        dcam_image.save(path_to_dcam)
        
        # Process and save distance_to_image_plane_data.   
        dplane_normalized = normalize_depth_data(distance_to_image_plane_data)
        dplane_image = PIL.Image.fromarray(dplane_normalized, mode='L')
        dplane_image.save(path_to_dplane)
        
        # Save bbox array and labels info.   
        with open(path_to_bbox_2d, 'wb') as f:
            np.save(f, bbox_2d_loose_data["data"])
        with open(path_to_bbox_2d_info, "w") as fp:
            json.dump(bbox_2d_loose_data["info"]["idToLabels"], fp)
        
        # Increment the frame counter.   
        frame_idx += 1
    
    return frame_idx

 

Rotating and Capturing Data

Rotates the object and captures data at each rotation step:

async def rotate_and_capture(stage, 
                             prim_path, 
                             rotation_steps, 
                             rotation_angle, 
                             rgb, 
                             distance_to_camera, 
                             distance_to_image_plane, 
                             bbox_2d_loose, 
                             save_paths, 
                             frame_idx):
    rotation_step = Gf.Vec3f(0, 0, rotation_angle / rotation_steps)
    
    for j in range(rotation_steps):
        # Rotate the object.
        rotate_prim(stage, prim_path, rotation_step, debug=False)
        
        # Update the scene. 
        for _ in range(5):
            await omni.kit.app.get_app().next_update_async()
        
        # Run the orchestrator.   
        await rep.orchestrator.step_async()
        
        # Capture the annotators' data.   
        rgb_data = rgb.get_data()
        distance_to_camera_data = distance_to_camera.get_data()
        distance_to_image_plane_data = distance_to_image_plane.get_data()
        bbox_2d_loose_data = bbox_2d_loose.get_data()
        
        # Define the save paths for the images.   
        path_to_rgb = save_paths['rgb'] + str(frame_idx) + ".png"
        path_to_dcam = save_paths['dcam'] + str(frame_idx) + ".png"
        path_to_dplane = save_paths['dplane'] + str(frame_idx) + ".png"
        path_to_bbox_2d = save_paths['bbox_2d'] + str(frame_idx) + ".npy"
        path_to_bbox_2d_info = save_paths['bbox_2d'] + str(frame_idx) + "_info.json"
        
        #   Process and save the images and annotations.
        rgba_image = PIL.Image.fromarray(rgb_data.astype('uint8'))
        rgb_image = rgba_image.convert('RGB')
        rgb_image.save(path_to_rgb)
        
        dcam_normalized = normalize_depth_data(distance_to_camera_data)
        dcam_image = PIL.Image.fromarray(dcam_normalized, mode='L')
        dcam_image.save(path_to_dcam)
        
        dplane_normalized = normalize_depth_data(distance_to_image_plane_data)
        dplane_image = PIL.Image.fromarray(dplane_normalized, mode='L')
        dplane_image.save(path_to_dplane)
        
        # Save bbox array and labels info.   
        with open(path_to_bbox_2d, 'wb') as f:
            np.save(f, bbox_2d_loose_data["data"])
        with open(path_to_bbox_2d_info, "w") as fp:
            json.dump(bbox_2d_loose_data["info"]["idToLabels"], fp)
        
        # Increment the frame counter.   
        frame_idx += 1
    return frame_idx

 

Visualization of Annotations

Overlays bounding boxes on the captured images for visualization:

def visualize_annotations(num_frames, save_paths):
    for i in range(num_frames):
        # Paths to the bbox and label files.  
        path_to_bbox_2d = save_paths['bbox_2d'] + f"{i}.npy"
        path_to_bbox_2d_info = save_paths['bbox_2d'] + f"{i}_info.json"
        image_path = save_paths['rgb'] + f"{i}.png"
        
        # Read the image in BGR format (default in OpenCV).    
        image = cv2.imread(image_path)
        
        # Load bbox data
        bbox_data = np.load(path_to_bbox_2d)
        with open(path_to_bbox_2d_info, "r") as fp:
            id_to_labels = json.load(fp)
        
        # Convert bbox_data to a 2D array.  
        bbox_array = np.array([list(b) for b in bbox_data])
        
        # Iterate over each bounding box and draw it on the image.   
        for bbox in bbox_array:
            obj_id = int(bbox[0])
            xmin, ymin, xmax, ymax = bbox[1:5].astype(int)
            confidence = bbox[5]
            label = id_to_labels.get(str(obj_id), "Unknown")
        
            # Define the label text.   
            text = f"{label}: {confidence:.2f}"
        
            # Choose a color for the bounding box (BGR).   
            color = (0, 255, 0)  # Green
        
            # Draw the bounding box rectangle.   
            cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color, 2)
        
            # Get the text size.   
            (text_width, text_height), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
        
            # Draw a rectangle behind the text for better readability.   
            cv2.rectangle(image, (xmin, ymin - text_height - 4), (xmin + text_width, ymin), color, -1)
        
            # Place the label text on top of the bounding box.   
            cv2.putText(image, text, (xmin, ymin - 2), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1)
        
        # Save the image with the bounding boxes.   
        output_path = save_paths['bboxes'] + f"image_with_bboxes_{i}.png"
        cv2.imwrite(output_path, image)
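
The indexing above assumes each saved row contains the semantic id, the four box coordinates, and a final score-like value, which is what the capture functions store from the bounding_box_2d_loose annotator. A quick way to inspect the actual layout of your own data (assuming the save_paths prefixes defined in main below):

bbox_data = np.load(save_paths['bbox_2d'] + "0.npy")
print(bbox_data.dtype)   # the field names describe the per-box layout
print(bbox_data[0])      # first bounding box of frame 0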

 

Main Execution Flow

Coordinates the entire workflow for synthetic dataset generation by setting up the scene, running the simulation, and visualizing the results.

Before defining the main workflow, you first need to define three things:

  1. The path to the forklift.
  2. The path to the camera.
  3. The path where each frame and annotation extracted from Isaac Sim will be saved.

To determine the path to the forklift, click on the prim; its path is shown in the Property section at the bottom. The same process applies to the camera. For our example, these paths are as follows:

prim_path = "/World/warehouse_with_forklifts/Forklift"

camera_path = "/World/warehouse_with_forklifts/Forklift/rsd455/RSD455/Camera_OmniVision_OV9782_Color"

your_path = "your_path/"

 

def main(prim_path="/World/warehouse_with_forklifts/Forklift",
         camera_path="/World/warehouse_with_forklifts/Forklift/rsd455/RSD455/Camera_OmniVision_OV9782_Color",
         init_pos=None,
         final_pos=None,
         number_of_steps=20,
         rotation_steps_list=[(10, -90), (20, 180)],
         save_paths=None):
    # Initialize paths.   
    if save_paths is None:
        # Here you must define each path:
        SAVE_PATH_RGB = os.path.join(your_path, "image_rgb_")
        SAVE_PATH_DCAM = os.path.join(your_path, "image_dcam_")
        SAVE_PATH_DPLANE = os.path.join(your_path, "image_dplane_")
        SAVE_PATH_BBOX_2D = os.path.join(your_path, "bbox_2d_loose_")
        SAVE_PATH_BBOXES = os.path.join(your_path, "bboxes")

        os.makedirs(SAVE_PATH_RGB, exist_ok=True)
        os.makedirs(SAVE_PATH_DCAM, exist_ok=True)
        os.makedirs(SAVE_PATH_DPLANE, exist_ok=True)
        os.makedirs(SAVE_PATH_BBOX_2D, exist_ok=True)
        os.makedirs(SAVE_PATH_BBOXES, exist_ok=True) 


        save_paths = {
            'rgb': os.path.join(SAVE_PATH_RGB, "image_rgb_"),
            'dcam': os.path.join(SAVE_PATH_DCAM, "image_dcam_"),
            'dplane': os.path.join(SAVE_PATH_DPLANE, "image_dplane_"),
            'bbox_2d': os.path.join(SAVE_PATH_BBOX_2D, "bbox_2d_loose_"),
            'bboxes': os.path.join(SAVE_PATH_BBOXES, "bboxes_")
        }
    
    # Set up the scene.   
    stage = omni.usd.get_context().get_stage()
    
    # Set the initial position.   
    init_pos_vec = set_initial_position(stage, prim_path, init_pos)
    
    # Select the final position and the number of steps.   
    if final_pos is None:
        final_pos_vec = init_pos_vec + Gf.Vec3d(0, 10, 0)
    else:
        final_pos_vec = Gf.Vec3d(*final_pos)
    
    # Calculate the required step.   
    step = step_from_a_to_b(init_pos_vec, final_pos_vec, number_of_steps)
    
    # Configure the annotators.
    rgb, distance_to_camera, distance_to_image_plane, bbox_2d_loose = configure_annotators(camera_path)
    
    # Run the simulation.   
    async def simulation_sequence():
        frame_idx = 0

        # Linear motion.   
        frame_idx = await run_simulation(stage, 
                                         prim_path, 
                                         step, 
                                         number_of_steps, 
                                         rgb, 
                                         distance_to_camera, 
                                         distance_to_image_plane, 
                                         bbox_2d_loose, 
                                         save_paths,
                                         frame_idx)
        
        # Rotations specified in rotation_steps_list.   
        for rotation_steps, rotation_angle in rotation_steps_list:
            frame_idx = await rotate_and_capture(stage, 
                                                 prim_path, 
                                                 rotation_steps, 
                                                 rotation_angle, 
                                                 rgb, 
                                                 distance_to_camera, 
                                                 distance_to_image_plane, 
                                                 bbox_2d_loose, 
                                                 save_paths, 
                                                 frame_idx)
        
        return frame_idx
    
    # Run the asynchronous simulation sequence, then visualize the annotations
    # once every frame has been captured. asyncio.ensure_future only schedules
    # the coroutine; Isaac Sim's event loop must keep running for it to finish.
    async def run_and_visualize():
        total_frames = await simulation_sequence()
        visualize_annotations(total_frames, save_paths)

    asyncio.ensure_future(run_and_visualize())

At the end of the notebook, the main function is called with specific initial and final positions:

main(
    init_pos = [-2.28838, -3.61546, 0],
    final_pos = [-2.28838, 10, 0]
    )

Once the execution of the main function is complete, the next step is to review the paths where the frames and annotations were saved to ensure everything was executed correctly. Below, we provide an image showing how a frame with the bounding boxes of the objects present in the scene should look.
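
A quick sanity check is to load one of the saved annotation files and its label mapping. The paths below are hypothetical relative paths; use the prefixes from your own save_paths dictionary.

import json
import numpy as np

bboxes = np.load("bbox_2d_loose_/bbox_2d_loose_0.npy")
with open("bbox_2d_loose_/bbox_2d_loose_0_info.json") as fp:
    labels = json.load(fp)
print(len(bboxes), "boxes in frame 0;", labels)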

 

Conclusion

This notebook automates the generation of synthetic datasets by moving and rotating objects within a simulated environment, capturing images, and generating annotations. This process is valuable for training machine learning models, particularly in scenarios where acquiring real-world data is challenging.

By customizing the movement, rotation, and capture parameters, users can generate diverse datasets tailored to their specific needs. The script serves as a foundation for more complex synthetic data generation workflows in Isaac Sim.

 

For similar blogs please visit Marvik Blogs.
