Synthetic Data

Although real-world data is realistic, it has several drawbacks:

  • Excessive costs due to vehicle instrumentation, data collection campaigns, and data storage.
  • Handling data privacy, annotating and labelling data, and extracting reliable ground truth can be costly.
  • Data might also be unavailable or incomplete early in development. 

Synthetic data might be useful since it:

  • comes with lower costs.
  • could be available at an early stage of development.
  • can be more complete than real-world data, as it can include corner and edge cases that are difficult to capture in the real world.
  • does not require the expensive annotation and labelling of ground truth data.
  • has no privacy issues.

Despite its advantages, synthetic data is imperfect: given its lower level of realism, it should be used with caution and alongside real-world data.

Generating synthetic data through simulation requires qualified simulation environments and skilled engineers with expertise in computer science, environmental and sensor modelling, and model validation.

For more details, please refer to the following AIthena publications:

  1. The synthetic data set, which contains the ground truth and physics-based sensor outputs: camera (.jpg), radar (.pcd), lidar (.pcd), and ideal depth map, at: Synthetic data set generated using Simcenter Prescan – Vulnerable Road Users in urban driving scenario
  2. The user manual describing the structure and data format of the synthetic data set at: SYNTHETIC DATA DESCRIPTION – Data generated using Simcenter Prescan
  3. A white paper on the benefits, drawbacks, and possible uses of synthetic data at: Supporting automated driving systems development with synthetic data
  4. AIthena news article: Synthetic data – how to use it, what are the benefits and drawbacks?