Robotics & Automation News

Data company Scale releases massive self-driving dataset

Data and artificial intelligence developer Scale has released a huge amount of data relating to self-driving vehicles.

The data provides the annotations for the “nuScenes dataset”, a project led by nuTonomy, an autonomous car developer.

Scale claims this is the “largest open source multi-sensor self-driving dataset available to public”. 

Autonomous vehicle innovators and researchers will get access to nearly 1.4 million camera images, 400,000 lidar (light detection and ranging) sweeps, and 1.1 million 3D boxes.

Scale says the multi-sensor aspect means the data comes from lidar, radar and cameras.

Academic researchers and autonomous vehicle innovators can access the open-sourced dataset at nuScenes.

Scale’s Sensor Fusion Annotation API, which leverages machine learning, statistical modeling, and human labeling to process lidar, radar, and camera sensor data into impeccable ground truth data, played a critical role in the creation of this new standard.

The nuScenes open source dataset is based on lidar point cloud, camera sensor, and radar data sourced from nuTonomy and then labeled through Scale’s sophisticated and thorough processing to deliver data ideal for training autonomous vehicle perception algorithms.

The open source tool made available by nuTonomy and Aptiv surpasses the public KITTI dataset, the Baidu ApolloScape dataset, the Udacity self-driving dataset, and even the more recent Berkeley DeepDrive dataset, which have until now served as the standard for academic and even industry use.

nuScenes provides significantly greater data volume, accuracy, and precision, claims Scale, adding: “The full dataset will include 1,000 twenty-second scenes, nearly 1.4 million camera images, 400,000 lidar sweeps, and 1.1 million 3D boxes.”
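
As a rough back-of-the-envelope check, the quoted totals can be turned into per-second rates. The Python sketch below is illustrative only: the constant and function names are made up for this example and are not part of any nuScenes or Scale tooling, and "nearly 1.4 million" is rounded up to 1.4 million.

```python
# Figures quoted by Scale for the full nuScenes release.
SCENES = 1_000
SCENE_SECONDS = 20
CAMERA_IMAGES = 1_400_000  # "nearly 1.4 million", treated here as an upper bound
LIDAR_SWEEPS = 400_000
BOXES_3D = 1_100_000


def dataset_rates():
    """Derive simple averages from the quoted totals (hypothetical helper)."""
    total_seconds = SCENES * SCENE_SECONDS
    return {
        "total_hours": total_seconds / 3600,
        # Combined across all cameras; the article does not say how many there are.
        "camera_images_per_second": CAMERA_IMAGES / total_seconds,
        "lidar_sweeps_per_second": LIDAR_SWEEPS / total_seconds,
        "boxes_per_scene": BOXES_3D / SCENES,
    }


if __name__ == "__main__":
    for name, value in dataset_rates().items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions, the quoted totals work out to about 5.6 hours of driving, around 20 lidar sweeps per second, about 70 camera images per second across all cameras combined, and an average of 1,100 3D boxes per scene.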

Working on a similar principle to radar, lidar emits pulses of invisible infrared laser light that reflect off surrounding objects, allowing systems to compile three-dimensional point cloud maps of their environments and identify the specific objects within them.
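
To make the idea of a point cloud concrete, here is a minimal Python sketch of how individual lidar returns could be turned into 3D points. It assumes a time-of-flight sensor that reports a round-trip time plus the beam's azimuth and elevation angles. The function names and the input format are assumptions made for illustration, not the nuScenes data format or Scale's pipeline.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def range_from_time_of_flight(round_trip_seconds):
    """Distance to the reflecting object; the pulse travels out and back."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def to_point(range_m, azimuth_rad, elevation_rad):
    """Convert one return from spherical coordinates to Cartesian (x, y, z)."""
    horizontal = range_m * math.cos(elevation_rad)
    x = horizontal * math.cos(azimuth_rad)
    y = horizontal * math.sin(azimuth_rad)
    z = range_m * math.sin(elevation_rad)
    return (x, y, z)


def sweep_to_point_cloud(returns):
    """Build a point cloud from (round_trip_seconds, azimuth, elevation) tuples."""
    return [
        to_point(range_from_time_of_flight(t), azimuth, elevation)
        for t, azimuth, elevation in returns
    ]


if __name__ == "__main__":
    # Two hypothetical returns: straight ahead, and 90 degrees to the left.
    sweep = [(2e-7, 0.0, 0.0), (1e-7, math.pi / 2, 0.0)]
    for point in sweep_to_point_cloud(sweep):
        print(tuple(round(c, 2) for c in point))
```

A full sweep repeats this conversion for every beam direction. The annotated 3D boxes in a dataset such as nuScenes then mark which clusters of those points belong to vehicles, pedestrians or other objects.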

Correctly identifying surrounding objects from lidar data allows autonomous vehicles to anticipate those objects’ behavior – whether they are other vehicles, pedestrians, animals or other obstacles – and to safely navigate around them.

In this pursuit, the quality of a multi-sensor dataset is a critical differentiator that defines an autonomous vehicle’s ability to perceive what is around it and operate safely under real-world and real-time conditions.

Alexandr Wang, CEO, Scale, says: “We’re proud to provide the annotations for nuScenes as the most robust open source multi-sensor self-driving dataset ever released.

“We believe this will be an invaluable resource for researchers developing autonomous vehicle systems, and one that will help to shape and accelerate their production for years to come.”

Oscar Beijbom, machine learning lead at nuTonomy, an Aptiv company, says: “Our partnership with Scale on the production of the annotations for nuScenes is a milestone for AV innovators and the academic community.

“Scale’s outstanding agility, tooling, scalability and quality made them our preferred partner and the natural choice for this project.”

Scale, whose autonomous vehicle customers also include Lyft, General Motors through its Cruise business unit, Zoox, Nuro and many others, recently raised $18 million in Series B funding.