
How deep learning enables autonomous vehicles to understand their environment

Humans are constantly taking in data from the world around them using five primary senses. You hear your phone ring, see a notification on your computer screen or touch something hot.

However, without perception, there’s no way to decipher those inputs and determine what’s relevant: that you should answer the call, that there’s an email to respond to, or that you should pull your hand away before it’s burned.

Now imagine driving on a highway, where a constant stream of information surrounds you. From lane markings and street signs to lane-splitting motorcyclists, merging trucks and traffic jams – the ability to make instant, informed decisions is not just a skill, it’s an imperative. 

Just as perception enables humans to make instant associations and act on them, the ability to extract relevant knowledge from immediate surroundings is a fundamental pillar for the safe operation of an autonomous vehicle.

With the power of perception, a car can detect vehicles ahead using cameras and other sensors, identify whether they pose potential hazards and continuously track their movements.

This capability extends to the 360-degree field around the vehicle, enabling the car to detect and track all moving and static objects as it travels.

Perception is the first stage in the computational pipeline for the safe functioning of a self-driving car. Once the vehicle is able to extract relevant data from the surrounding environment, it can plan the path ahead and actuate, all without human intervention.
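In rough code terms, that pipeline can be pictured as a perceive-plan-actuate loop. The sketch below is purely illustrative: the function and type names are invented for this article and are not part of the Nvidia Drive SDK.

```python
# Illustrative perceive -> plan -> actuate loop; all names are hypothetical,
# not taken from the Nvidia Drive software stack.
from dataclasses import dataclass

@dataclass
class WorldModel:
    obstacles: list       # detected vehicles, pedestrians and other objects
    lane_edges: list      # lane boundary polylines
    drivable_space: list  # free-space polygon around the vehicle

def perceive(sensor_frames) -> WorldModel:
    # Stage 1: extract relevant objects and free space from raw sensor data.
    return WorldModel(obstacles=[], lane_edges=[], drivable_space=[])

def plan(world: WorldModel) -> list:
    # Stage 2: choose a trajectory that stays in drivable space and avoids obstacles.
    return [(0.0, 0.0), (0.0, 5.0)]  # placeholder waypoints (x, y) in meters

def actuate(trajectory: list) -> None:
    # Stage 3: translate the trajectory into steering, throttle and brake commands.
    pass

def drive_step(sensor_frames) -> None:
    actuate(plan(perceive(sensor_frames)))
```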

Finding the signal through the noise

Autonomous vehicle sensors generate massive amounts of data every second. From other cars and pedestrians to street signs and traffic lights, every mile contains indicators for where the self-driving car should and shouldn’t go.

Identifying these indicators and determining which of them matter for safe driving is incredibly complex, requiring a diverse set of deep neural networks working in parallel.

The Nvidia Drive software stack – a primary component of the Nvidia Drive platform – contains libraries, frameworks and source packages that allow the necessary deep neural networks to work together for comprehensive perception.

These networks include DriveNet, which detects obstacles, and OpenRoadNet, which detects drivable space. To plan a path forward, LaneNet detects lane edges and PilotNet detects drivable paths.

Nvidia Drive software enables this integration by building on top of highly optimized and flexible libraries. These diverse networks run simultaneously and can overlap, providing redundancy, a key element of safety.
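One way to picture that redundancy is to run the obstacle and free-space networks on the same camera frame and only treat a region as safe to enter when both outputs agree. The snippet below is a hypothetical cross-check along those lines; the stand-in functions only mimic what networks like DriveNet and OpenRoadNet produce and do not use the actual Drive APIs.

```python
# Hypothetical cross-check between two independent perception networks.
# 'obstacle_net' and 'free_space_net' are placeholders, not real Drive calls.

def obstacle_net(frame):
    # Returns bounding boxes of detected obstacles: (x_min, y_min, x_max, y_max).
    return [(300, 320, 380, 400)]

def free_space_net(frame):
    # Returns a set of image cells the network considers drivable.
    return {(col, row) for col in range(0, 640, 20) for row in range(300, 480, 20)}

def cell_is_safe(cell, boxes):
    # A cell is only trusted as drivable if no obstacle detection claims it.
    cx, cy = cell
    return not any(x0 <= cx <= x1 and y0 <= cy <= y1 for x0, y0, x1, y1 in boxes)

def fused_drivable_space(frame):
    boxes = obstacle_net(frame)    # network 1: obstacles
    free = free_space_net(frame)   # network 2: drivable space
    # Redundancy: both networks must agree before a region is treated as free.
    return {cell for cell in free if cell_is_safe(cell, boxes)}
```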

‘Inherently safe’

In addition to the redundancy within the perception layer, these networks back up the overall function of the vehicle, enhancing safety at every level, according to Nvidia, which claims the system is “inherently safe”.

For example, the car’s high-definition map can indicate an upcoming four-way intersection; when that prior knowledge is paired with real-time sensor data, the perception layer shows the car precisely where to stop and provides a more robust way to pinpoint the car’s location.
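A hedged sketch of that map-plus-perception pairing: the map supplies a prior for where the stop line should be, live perception supplies a measurement, and a simple weighted blend refines both the stop point and the vehicle’s position estimate. The function name and the weights are assumptions made for illustration, not Nvidia’s implementation.

```python
# Illustrative fusion of an HD-map prior with a live detection of a stop line.
# Distances are meters ahead of the vehicle along its lane; weights are made up.

def fuse_stop_line(map_distance, detected_distance,
                   map_weight=0.3, detection_weight=0.7):
    """Blend the map's expected stop-line distance with the perceived one."""
    fused = map_weight * map_distance + detection_weight * detected_distance
    # The residual also indicates how far off the map-based position estimate is,
    # which can be fed back to refine localization.
    localization_correction = detected_distance - map_distance
    return fused, localization_correction

stop_at, correction = fuse_stop_line(map_distance=42.0, detected_distance=40.5)
print(f"Stop in {stop_at:.1f} m; shift position estimate by {correction:.1f} m")
```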

Perception also contributes to the diversity of autonomous vehicle capabilities, enabling the car to interpret the world with a sophistication approaching that of a human driver.

Rather than just identifying obstacles, it can distinguish stationary objects from moving ones and predict their paths.
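As a simple illustration of that distinction, an object tracked over a few frames can be classified as static or moving from its estimated speed, and a moving object’s near-term path can be extrapolated, here with a constant-velocity assumption. This is a generic textbook approach sketched for the article, not the specific method used in the Drive stack.

```python
# Classify a tracked object as static or moving and extrapolate its path.
# Constant-velocity extrapolation is a deliberately simple stand-in for the
# motion models a production stack would use.

def estimate_velocity(track, dt):
    """Average velocity from the last two observed (x, y) positions."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (x1 - x0) / dt, (y1 - y0) / dt

def predict_path(track, dt, horizon_steps, moving_threshold=0.5):
    vx, vy = estimate_velocity(track, dt)
    speed = (vx**2 + vy**2) ** 0.5
    if speed < moving_threshold:             # below ~0.5 m/s: treat as stationary
        return "static", [track[-1]] * horizon_steps
    x, y = track[-1]
    path = [(x + vx * dt * k, y + vy * dt * k) for k in range(1, horizon_steps + 1)]
    return "moving", path

# A pedestrian observed at two positions, 0.1 s apart:
label, path = predict_path([(5.0, 2.0), (5.0, 2.1)], dt=0.1, horizon_steps=5)
print(label, path[:2])
```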

With added software capabilities, like those offered by Nvidia partner Perceptive Automata, the car can even predict human behavior by reading body language and other markers.

This added human behavior perception capability can run simultaneously with other algorithms operating an autonomous vehicle thanks to the computing horsepower from the Nvidia Drive platform.

With this combined hardware and software solution, developers are continuously adding new perception abilities to the car’s self-driving brain.