Robotics & Automation News

Market trends and business perspectives

Insight: How neural networks are powering autonomous delivery

By Tanel Pärnamaa, deep learning engineer at Starship Technologies

Artificial neural networks are one of the main tools used in machine learning to convert unstructured, low-level data into higher-level information. As the ‘neural’ part suggests, they are brain-inspired systems intended to replicate the way that humans learn.

At Starship, we are building a fleet of autonomous delivery robots out of trainable units, mostly neural networks, in which the code is effectively written by the model itself. The robots record and process large sets of data in order to recognise their surroundings and react in real time as each situation requires.

The robots start by gaining a sense of the world through radars, a multitude of cameras and ultrasonic sensors. However, this poses a challenge, because most of the knowledge gained this way is low-level and non-semantic.

For example, a robot may be able to sense that an object (pedestrian, bicycle, animal and so on) is 10 metres away, but without knowing what category the object falls into, it’s much harder for the robot to decide on the best course of action. This is where machine learning through neural networks is extremely useful: it converts this unstructured low-level data into higher-level information.

For robots that drive on pavements/sidewalks and need to cross streets, it is critical to understand the surrounding environment in real time. Take the example above of pedestrians or cyclists; it’s vital not only to be aware of their presence, but also to know what direction they are moving in and how quickly.
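As a rough illustration of what "direction and speed" means in practice, the sketch below estimates both from two timed position readings of the same object. The `Observation` structure and the `estimate_motion` function are hypothetical names invented for this example, and the coordinates are made up; a real system would smooth over many noisy readings rather than rely on two.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Observation:
    """Position of one tracked object at one moment (hypothetical structure)."""
    t: float  # timestamp in seconds
    x: float  # metres to the right of the robot
    y: float  # metres ahead of the robot


def estimate_motion(prev: Observation, curr: Observation) -> Tuple[float, float]:
    """Return (speed in m/s, heading in degrees) between two observations.

    Heading is measured counter-clockwise from the +x axis.
    """
    dt = curr.t - prev.t
    if dt <= 0:
        raise ValueError("observations must be in increasing time order")
    vx = (curr.x - prev.x) / dt
    vy = (curr.y - prev.y) / dt
    return math.hypot(vx, vy), math.degrees(math.atan2(vy, vx))


# A cyclist 10 m ahead moves 3 m to the right and 4 m closer within one second.
speed, heading = estimate_motion(
    Observation(t=0.0, x=0.0, y=10.0),
    Observation(t=1.0, x=3.0, y=6.0),
)
print(f"speed: {speed:.1f} m/s, heading: {heading:.1f} degrees")
```

A negative heading here means the object is moving towards the robot's side of the x axis, which is the kind of information a planner needs before deciding whether to wait or proceed.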

A central component for obtaining this sort of information is an object detection module – a program that takes images as input and returns a list of object boxes. But this in itself is not straightforward, because an image is a large 3D array (height × width × colour channels) of numbers that represent pixel intensities.
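To make the input and output of such a module concrete, here is a minimal sketch of the interface: an image as a height × width × channels array, and a list of boxes coming back. The `Box` fields and the `detect_bright_object` function are assumptions made for illustration; the toy "detector" simply boxes every bright pixel and is a stand-in, not a neural network and not Starship's method.

```python
from dataclasses import dataclass
from typing import List, Tuple

# An image is a height x width x channels array of pixel intensities (0-255).
Image = List[List[List[int]]]


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates, with a class label."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def image_shape(image: Image) -> Tuple[int, int, int]:
    """Return (height, width, channels) of a non-empty image."""
    return len(image), len(image[0]), len(image[0][0])


def detect_bright_object(image: Image, threshold: int = 128) -> List[Box]:
    """Toy stand-in for a detector: one box around all pixels above threshold."""
    hits = [
        (x, y)
        for y, row in enumerate(image)
        for x, pixel in enumerate(row)
        if sum(pixel) / len(pixel) > threshold
    ]
    if not hits:
        return []
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return [Box("object", min(xs), min(ys), max(xs), max(ys))]


# 4 x 6 RGB image: black background with a bright 2 x 2 patch.
image = [[[0, 0, 0] for _ in range(6)] for _ in range(4)]
for y in (1, 2):
    for x in (3, 4):
        image[y][x] = [255, 255, 255]

boxes = detect_bright_object(image)
print(image_shape(image), boxes)
```

Even this tiny image holds 72 numbers; a single camera frame holds millions, which is why hand-written rules over raw pixels quickly become unmanageable.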

These values change significantly in different environments: for example, when an image is taken at night rather than during the day, when the object’s colour or position changes, or when the object itself is obstructed.
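The effect of lighting alone can be shown with a few lines. In this sketch the "night" image is simply the "day" image scaled to 20 per cent brightness, which is a crude assumption (real night images also differ in noise, colour and contrast), and the fixed-threshold rule is a hypothetical hand-written check, not something taken from a real system.

```python
def mean_intensity(image):
    """Average of every channel value in a height x width x channels image."""
    values = [v for row in image for pixel in row for v in pixel]
    return sum(values) / len(values)


def darken(image, percent):
    """Scale every pixel value to the given percentage of its brightness."""
    return [[[v * percent // 100 for v in pixel] for pixel in row] for row in image]


def looks_like_object(image, threshold=128):
    """Hand-written rule: call it an object if the patch is bright enough."""
    return mean_intensity(image) > threshold


# The same made-up 3 x 4 patch of a light-coloured object, by day and by night.
day = [[[200, 180, 160] for _ in range(4)] for _ in range(3)]
night = darken(day, 20)

print(mean_intensity(day), looks_like_object(day))
print(mean_intensity(night), looks_like_object(night))
```

The object has not changed, yet the rule that worked by day fails at night, which is the kind of brittleness that motivates learning the rule from examples instead.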

This means that in some cases teaching is a better solution than programming. As mentioned, at Starship we have a set of trainable units, mostly neural networks, where the code is written by the model itself. The program is represented by a set of weights and we can visualise what each specific neuron is trying to detect.

For example, the first layers of our network activate on simple patterns like horizontal and vertical edges. The next block of layers detects more complex textures, while higher layers detect car parts and full objects.
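To give a feel for what "activating on a vertical edge" means, the sketch below slides a small 3 × 3 filter over a greyscale image, which is the basic operation of a convolutional layer. The Sobel-style kernel is hand-picked for illustration; in a trained network the filter values are learned, and nothing here is taken from Starship's actual model.

```python
from typing import List

Grid = List[List[float]]

# Sobel-style kernel that responds to a dark-to-bright change from left to right.
VERTICAL_EDGE: Grid = [
    [-1.0, 0.0, 1.0],
    [-2.0, 0.0, 2.0],
    [-1.0, 0.0, 1.0],
]


def convolve2d(image: Grid, kernel: Grid) -> Grid:
    """Sliding-window weighted sum without padding (cross-correlation),
    the operation usually called convolution in neural networks."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Greyscale 4 x 6 image: dark left half, bright right half.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
response = convolve2d(image, VERTICAL_EDGE)

for row in response:
    print(row)  # each row is [0.0, 4.0, 4.0, 0.0]
```

The response is zero over the flat regions and large only where the window straddles the edge. Later layers combine many such responses into detectors for textures and, eventually, whole objects.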

Our engineers present the model with examples of what they would like it to predict and ask the network to get better at doing so the next time it sees a similar input. By iteratively changing the weights, the optimisation algorithm searches for programs that predict bounding boxes (imaginary boxes around objects that are being checked for collision) more and more accurately.
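The loop of "show examples, measure the error, nudge the weights" can be sketched with plain gradient descent. This is a deliberately tiny stand-in: a single linear unit with two weights predicting one number (say, a box width) from one input feature, trained on made-up data. A real detector has millions of weights and a more elaborate loss, but the update rule follows the same principle.

```python
# Each example pairs an input feature with the target value the engineers
# want predicted (for instance, a box width in pixels). Made-up numbers.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


def mean_squared_error(w: float, b: float) -> float:
    """Average squared gap between predictions w * x + b and targets."""
    return sum((w * x + b - y) ** 2 for x, y in examples) / len(examples)


w, b = 0.0, 0.0          # the "program" starts out knowing nothing
learning_rate = 0.05     # how large each weight adjustment is
initial_loss = mean_squared_error(w, b)

for step in range(500):
    # Gradient of the mean squared error with respect to each weight.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / len(examples)
    grad_b = sum(2 * (w * x + b - y) for x, y in examples) / len(examples)
    # Move each weight a small step in the direction that lowers the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mean_squared_error(w, b)
print(f"w={w:.3f}, b={b:.3f}, loss {initial_loss:.2f} -> {final_loss:.6f}")
```

The weights end up close to w = 2 and b = 0, the rule hidden in the examples, without anyone having written that rule down explicitly.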

However, when teaching a machine, merely having big data is not enough. The data collected must be rich and varied. For example, sampling images at equal intervals and then annotating them would yield numerous pedestrians and cars, but the model would lack the examples of bicycles, animals or other objects that it needs to detect these categories reliably as well.
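One common way to counter this kind of imbalance, shown here purely as an illustration and not as a description of Starship's pipeline, is to weight each annotated example by the inverse of its class frequency when drawing training batches. The label counts below are invented.

```python
import random
from collections import Counter

# Invented label counts mimicking uniformly sampled street footage.
labels = ["pedestrian"] * 60 + ["car"] * 35 + ["bicycle"] * 4 + ["animal"] * 1
counts = Counter(labels)

# Inverse-frequency weights: every class gets the same total probability mass.
weights = [1.0 / counts[label] for label in labels]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
resampled = rng.choices(labels, weights=weights, k=4000)
resampled_counts = Counter(resampled)

print("original: ", dict(counts))
print("resampled:", dict(resampled_counts))
```

After resampling, each class appears roughly a quarter of the time. Note the limitation: the single animal example is simply repeated many times, so reweighting balances the classes but does not add variety. That still has to come from collecting richer data.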

At the same time, annotating data takes time and resources. Ideally, it’s best to train and enhance models with less data. This is where architecture engineering comes into play, in terms of encoding prior knowledge into the architecture and optimisation processes to reduce the search space to programs that are more likely in the real world.

This is useful in the case of autonomous delivery because the model needs to know whether a robot is on a pavement/sidewalk or crossing a road. When the relevant global context is encoded into the neural network architecture, the model can determine whether to use it or not without having to learn it from scratch each time.
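The article does not say how this context is encoded, so the following is only one possible reading: the context is supplied as an extra input alongside the per-object features, and the weights attached to it decide how much it matters. The context categories, function names and all numbers are hypothetical.

```python
from typing import List

CONTEXTS = ["sidewalk", "road_crossing"]  # hypothetical context categories


def one_hot(context: str) -> List[float]:
    """Encode the context as a vector with a single 1.0 entry."""
    if context not in CONTEXTS:
        raise ValueError(f"unknown context: {context}")
    return [1.0 if context == name else 0.0 for name in CONTEXTS]


def add_global_context(object_features: List[float], context: str) -> List[float]:
    """Append the context encoding so later layers can read it directly."""
    return object_features + one_hot(context)


def linear_score(inputs: List[float], weights: List[float]) -> float:
    """A single linear unit: weighted sum of its inputs."""
    return sum(x * w for x, w in zip(inputs, weights))


features = [0.8, 0.1]  # made-up per-object features
inputs = add_global_context(features, "road_crossing")

# The last two weights belong to the context inputs. Zeros mean the unit
# ignores the context; non-zero values mean it has learned to use it.
ignores_context = [1.0, 1.0, 0.0, 0.0]
uses_context = [1.0, 1.0, 0.0, 0.5]

print(linear_score(inputs, ignores_context))
print(linear_score(inputs, uses_context))
```

Because the context arrives as a ready-made input, training only has to settle on suitable weights for it rather than rediscover from raw pixels whether the robot is on a pavement or a crossing.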

Neural networks empower Starship’s robots to be safe on road crossings by avoiding obstacles like cars, and on sidewalks by understanding all the different directions that humans and other obstacles can choose to go.

To date, the robots have safely travelled over 125,000 miles in more than 20 countries and 100 cities. They are on streets in cities around the world right now on a daily basis, navigating pavements, crossings and pedestrians, without the support of a human pilot or large processing systems, to offer automated deliveries to consumers.