Dynamic vision sensors and event cameras could reduce machine perception’s computing workload
Rewired’s Santiago Tenorio and Jae-Yong Lee explain dynamic vision sensors and event cameras in this exclusive interview
Just about every machine that moves requires some sort of sensing technology to tell it exactly where it is, how it is positioned within its environment, and how close it is to other objects in that environment.
One of the most well-known computational solutions for such sensing systems is called “simultaneous localisation and mapping”, or SLAM, which can be found in things like robotic vacuum cleaners, warehouse robots and other moving vehicles.
In recent years, another technology called “light detection and ranging”, or LiDAR, has also received much attention because it seems to be popular with automakers.
LiDAR is used to tackle a similar problem to SLAM, and although both require some detection componentry – namely, lasers and sensors – each approach is at heart dependent on algorithms.
Industrial robots tend to use vision systems built around cameras with resolutions of about 2 megapixels and fast frame rates, and they too require significant amounts of computing and rely on algorithms.
Sensors relating to the well-known “global positioning system”, or GPS, are also used in many machines, although they’re probably not necessary for most industrial robots, which stay within a relatively small area of a factory.
GPS sensors can, of course, be found in road vehicles of all types as well as in mobile devices such as smartphones, although the latter do not move under their own steam.
Then there are what might be described as conventional cameras, whose sensors capture images in much the same way cameras always have.
The list of sensors and systems used in machines that move – robots, cars or anything else – is quite extensive and growing all the time. But they all rely on a combination of hardware and software.
It’s a big market with a good number of innovations being developed.
There’s been something of a revolution in the sensor market, with prices driven down by massive demand from the smartphone sector.
These days, you can even get sensors for detecting smells – from a company called Aromyx, which claims to be “digitising taste and smell”.
The phrase “there’s an app for that” could easily be “there’s a sensor for that”.
One of the most important new innovations in the field is something called the “dynamic vision sensor”, which forms the basis of another new technology, the “event camera”.
This is according to venture investors Santiago Tenorio and Jae-Yong Lee.
Tenorio is a general partner at Rewired, a robotics-focused venture capital firm. Lee is an investment manager at the same company. Both spoke exclusively to Robotics and Automation News.
Tenorio and Lee are following the progress of the dynamic vision sensors and event cameras, and are of the opinion that the technology not only represents a good investment but could also be the answer to some of the critical questions in the robotics and automation sector.
All of the technologies mentioned above are dependent on computing – an important point to make at this stage, because if there’s one thing that everyone in the robotics and automation industry is preoccupied with now, it’s computing – or rather, reducing the requirement for computing.
Reducing the requirements for memory, processing, storage, bandwidth, power, and latency – which can be thought of as response time – is of vital interest to technologists everywhere because almost everything relies on computing.
The problem with technology is computing
Dynamic vision sensors or event cameras are very new, and most research and development work is still in the academic sector.
Only one company in the world is committing a significant amount of money to bringing it to market, but when you know that the company in question is Samsung, it’s certainly worth taking an interest.
The principle behind dynamic vision sensors is simple: whereas a conventional camera would record everything all the time, an event camera only detects or records changes in a particular scene.
Obviously, that means the amount of computer memory and data processing that an event camera requires is significantly less than that of a conventional camera.
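To make the contrast concrete, here is a minimal sketch – our own illustration, not something from Samsung or the interviewees – of the basic principle: events are generated only where the change in a pixel’s brightness crosses a threshold, so a largely static scene produces almost no data. The resolution, threshold and scene below are assumed values.

```python
# Illustrative sketch (not from the interview): an event-style sensor emits an
# "event" only where log-brightness changes by more than a threshold, so a
# static scene produces almost no data, unlike a full frame readout.
import numpy as np

def events_from_frames(prev_frame, next_frame, threshold=0.2):
    """Return (row, col, polarity) tuples for pixels whose log-intensity changed."""
    log_prev = np.log(prev_frame.astype(np.float64) + 1e-6)
    log_next = np.log(next_frame.astype(np.float64) + 1e-6)
    diff = log_next - log_prev
    rows, cols = np.nonzero(np.abs(diff) > threshold)
    polarity = np.sign(diff[rows, cols]).astype(int)  # +1 brighter, -1 darker
    return list(zip(rows, cols, polarity))

# A mostly static 2-megapixel scene with one small moving object:
h, w = 1080, 1920
frame_a = np.full((h, w), 100, dtype=np.uint8)
frame_b = frame_a.copy()
frame_b[500:520, 900:920] = 180   # only this 20x20 patch changed

events = events_from_frames(frame_a, frame_b)
print(f"pixel values per full frame: {h * w:,}; events generated: {len(events)}")
# roughly 2 million values per frame versus a few hundred events for this scene
```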
Tenorio, whose academic background includes a wide range of technologies including cognitive science, explains everything from the beginning, starting with what he calls “state-of-the-art” technologies, meaning what’s on the market, for dealing with tasks such as machine vision, SLAM and so on.
Tenorio says: “Think about how a robot would perceive the world and navigate an environment – for example, a GPS-denied environment where you don’t have external GPS data to guide its navigation.
“There are really two approaches to solving that problem. You can either build an indoor GPS, which is also called a motion-capture system.
“An alternative way is to rely on onboard cameras and computing, and visual SLAM systems to guide the robot and enable it to map and navigate its environment.
“The difference between these two approaches is that the first – indoor motion capture – essentially leaves the robot blind because it relies on external computing, while the second really makes the robot able to see the world.
“State-of-the-art or current perception algorithms are mature but not robust. What I mean by that is there are limitations in terms of latency.
“Typical vision algorithms today have latencies of between 50 and 200 milliseconds, which puts a cap on the maximum agility of the platforms, so we really do need faster sensors and algorithms.
“There are other problems with robustness and the performance of these algorithms in low-texture environments, high-dynamic-range scenes and motion blur, which are all situations in which the [event] cameras we will describe excel.
“The agility of a robot is limited in large part by the latency in its sensing pipeline.
“So if you have a sensor that is much, much faster, such as a dynamic vision sensor, the robot will be able to respond to its environment and navigate it more precisely.
“While a typical camera’s latency is between 10 and 30 milliseconds, the temporal resolution of a dynamic vision sensor is much, much finer – on the order of 1 microsecond.”
Temporal resolution refers to the precision of a measurement with respect to time.
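To put those numbers in perspective, a rough back-of-the-envelope calculation – ours, using an assumed platform speed rather than figures from the interview – shows how much distance a moving platform covers “blind” at different sensing latencies.

```python
# Rough illustration with assumed numbers: distance travelled before a moving
# platform can even begin to react, for different sensing latencies.
speed_mps = 10.0  # assumed platform speed: 10 m/s (36 km/h)

latencies_s = {
    "typical vision pipeline, 100 ms": 100e-3,
    "fast frame camera, 30 ms": 30e-3,
    "dynamic vision sensor, ~1 us": 1e-6,
}

for name, latency in latencies_s.items():
    blind_distance_m = speed_mps * latency
    print(f"{name:32s} -> {blind_distance_m * 100:10.4f} cm of blind travel")

# At 10 m/s, 100 ms of latency means a full metre travelled before reacting;
# a microsecond-scale sensor brings that down to a hundredth of a millimetre.
```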
Tenorio continues: “In addition to that, it’s able to perform in high-dynamic-range situations. What I mean by that is, for example, imagine an autonomous vehicle turning and suddenly facing the sun in a clear sky – similar to the situation a Tesla faced two years ago.
“With a traditional computer vision system that is relying on a normal frame-based camera, the sensor will be completely blinded by the sun’s power.
“Whereas, if you used an event camera, which has a dynamic range of around 140 dB [decibels] instead of 60 dB, it would be able to spot the obstacle ahead of it and avoid catastrophe.
“So that’s another example of the superiority of these new sensors, at least as demonstrated in lab experiments.
“Another characteristic of event cameras that separates them from traditional cameras is that they have an output rate of up to 1 megahertz, versus 30-60 hertz for regular cameras, which means they can register changes tens of thousands of times faster than a regular camera and are able to capture much more information.
“And it’s also low power, consuming about 1/100th the energy of a standard camera – 20 milliwatts instead of 1.5 watts for a standard camera.
“So when you think about potential applicability in the internet of things, where you might have sensors collecting information all the time and you may be replacing batteries every couple of days, it opens up more commercial applications for these sensors.
“Drone navigation is another example where you don’t want a power-hungry sensor sucking up a large fraction of the power that the drone has.
“So, there are different advantages to these sensors which make them superior to traditional vision sensors.”
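Taking the quoted power figures at face value – 20 milliwatts for a dynamic vision sensor against 1.5 watts for a standard camera – a quick calculation with an assumed battery size illustrates the gap Tenorio is pointing at for battery-powered devices.

```python
# Back-of-the-envelope runtime on a small battery, using the power figures
# quoted above. The 10 Wh battery capacity is an assumption for illustration.
battery_wh = 10.0

sensor_power_w = {
    "standard camera (1.5 W)": 1.5,
    "dynamic vision sensor (0.02 W)": 0.02,
}

for name, power_w in sensor_power_w.items():
    hours = battery_wh / power_w
    print(f"{name:32s} -> roughly {hours:,.0f} hours of continuous sensing")

# About 7 hours versus about 500 hours on the same pack -- a ~75x difference,
# in line with the "1/100th the energy" figure quoted above.
```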
Lee emphasises the advantage of energy efficiency, which is critical especially when the device is small. The more energy-efficient a device is, the less it drains its batteries and the more computing it can afford.
Lee says: “The low-power requirement is becoming more important for all machines and devices, not only for drones. But we know that typical drones only have an operational timespan of around 20-25 minutes – that’s the industry standard.
“The reason for that is that, with the current lithium battery technology, it’s really difficult to sustain such flying machines for much longer.
“And if you have a sensor payload, such as LiDAR, which consumes a lot of battery power, you will significantly decrease that time to 10 or even 5 minutes. And that’s not commercially viable.
“So the lower-power factor would be advantageous there as well.
“And when you think about a large vehicle – a car, for example – you might not think it will encounter that power issue. But over the next decades, we will move into electric vehicles, and cars already have a pretty hefty sensor payload, including short-range radar, long-range radar, LiDAR, sonar and existing cameras. When you transition to an electric vehicle model, where most cars on the road do not have an internal combustion engine, then you will want to move towards energy-efficient sensors and energy-efficient ways for the computers to run – because basically, cars will be computers on wheels.”
The solution to computing is sensors
Tenorio and Lee are both confident that dynamic vision sensors and event cameras have important contributions to make in terms of making robots, autonomous vehicles and many other technologies work better, or significantly more efficiently.
But the technology is still at a relatively early stage. And even if Samsung is backing it, that doesn’t mean it will take off.
Betamax video was a superior format to VHS, but we all know what the market chose.
But unlike that particular technology juncture, where consumers – who can often be swayed by marketing more than technical details – made the critical purchasing decisions, dynamic vision sensors and event cameras will be judged by technologists and business people.
And while nothing is guaranteed, the power and computing advantages would seem to recommend the new technology.
“It’s still at the early stage,” says Tenorio. “There’s a number of labs around the world that are pioneering research in the field. A lot of the top researchers in this field are based in Zurich, Switzerland.
“There’s also a team based at the University of Pennsylvania that has been experimenting with these sensors for the past couple of years, in machine perception specifically.
“Then there’s another team in the UK, at Imperial College.
“At the academic level, a number of labs around the world have been researching this technology in the field of drone navigation specifically and other visual SLAM applications.
“In terms of commercialising the technology, Samsung has announced that they are going to be working on mass producing their version of these cameras.
“There are also some spin-offs from the University of Zurich that are commercialising the hardware.
“But I would say this is still very much the early stages.”
Lee adds: “Our opinion on this technology is that, basically, it is where LiDAR was five years ago. We can really see this new technology succeeding because of its stability, robustness and versatility in many different applications.
“We strongly believe this will take off and the technology will be present in many different applications.”
A virtual success, augmented with reality
Sometimes it’s easy to get carried away with enthusiasm for technologies that are thought to have huge market potential, as dynamic vision sensors and event cameras seem to have.
Not only could they find a place in every car and robot in the world, but dynamic vision sensors would also seem to have a lot of promise in fields such as virtual and augmented reality because they “enable you to solve for what’s called ‘inside-out’ positioning of the devices”, as Tenorio puts it.
Virtual reality headsets such as the HTC Vive, the Oculus Rift and Sony’s PlayStation VR all rely on external sensors for positioning, meaning you may have to put sensors at all four corners of a room.
A dynamic vision sensor only detects motion and, according to Tenorio and Lee, would not need any additional infrastructure outside of the device itself – whether a smartphone, smart glasses, a VR headset or some other human-machine interface – for augmented reality to work more efficiently.
Dynamic vision sensors are being miniaturised and could very soon be part of the payload of AR and VR headsets as well as mobile phones.
So, it would seem that the market is potentially massive.
But Tenorio and Lee are level-headed about what is, at the very least, a reasonably good investment opportunity.
Tenorio says: “If you think about the sensor payload of an autonomous vehicle, one view that I have on how this market could evolve is for neuromorphic cameras or these dynamic vision sensors to replace the traditional camera in that sensor payload.
“We’re not saying it’s going to replace all sensors, but it could definitely have a role in the overall payload because of the advantages it has over traditional cameras.”
Lee elaborates: “We don’t believe that one single sensor is the only solution required for all problems.
“It’s really about sensor fusion, and it’s only through redundancy [the duplication or backing up of critical functions] of data and complementary sensors that we can really make a robust and resilient system that is fail-proof – especially for drones and autonomous vehicles that will be operating alongside humans and might cause accidents, so you really want them to be robust.
“And for that, I think it’s actually dangerous to believe that there is that one sensor to rule them all.
“The dynamic vision sensor is definitely not a sensor that can solve all the problems, but added to the sensor suite and payload that exists now, it can hugely augment that payload – that is what we’re working towards.”
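As a minimal illustration of the sensor-fusion point – ours, not a description of any Rewired portfolio technology – combining two independent, noisy measurements of the same quantity with standard inverse-variance weighting gives an estimate more precise than either sensor alone. The range values and noise levels below are made up.

```python
# Minimal illustration of the sensor-fusion argument: two independent, noisy
# measurements of the same distance, combined by inverse-variance weighting,
# yield a better estimate than either sensor on its own. Numbers are made up.
def fuse(measurement_a, var_a, measurement_b, var_b):
    """Inverse-variance weighted fusion of two independent estimates."""
    weight_a = 1.0 / var_a
    weight_b = 1.0 / var_b
    fused = (weight_a * measurement_a + weight_b * measurement_b) / (weight_a + weight_b)
    fused_var = 1.0 / (weight_a + weight_b)
    return fused, fused_var

# e.g. a LiDAR range and a camera-derived range to the same obstacle
lidar_m, lidar_var = 10.2, 0.05 ** 2     # LiDAR: accurate
camera_m, camera_var = 9.8, 0.10 ** 2    # vision: noisier but complementary

estimate, variance = fuse(lidar_m, lidar_var, camera_m, camera_var)
print(f"fused range: {estimate:.2f} m, standard deviation: {variance ** 0.5:.3f} m")
# The fused uncertainty is smaller than either sensor's uncertainty on its own.
```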
Rewired, the company for which Tenorio and Lee manage investments, has venture capital funds totalling approximately $100 million.
That’s obviously a lot of money, which perhaps means enormous responsibility and pressure. Add to that the fact that Rewired is focused specifically on robotics – in many ways a nascent field, with new technologies emerging all the time – and understanding them, and figuring out how and why they should be brought to market, is probably quite challenging.
So how does one approach the job of investing in such a field, where there is not much financial analysis to go on – just promising discoveries in research labs and new startups with energy and what seem like good ideas?
Some might take what could be called a “technology-first” approach, meaning you first make something you think is clever or useful in some way and then find a market for it afterwards.
But Tenorio says his approach is from the opposite direction. “I think for me it starts with understanding the market and some of the limitations of the state-of-the-art technologies, some of the challenges that innovators in a particular field – like autonomous navigation – are facing.
“And then going out and discovering technologies that could help address those concrete problems.
“In the case of dynamic vision sensors, we see the technology potentially helping to address a number of different challenges in autonomous vehicles, for instance, and also challenges in AR, like we alluded to.
“But it feeds both ways as well. You can also observe what industry players are doing by their presence at different academic institutions, by their relationships with various researchers, by their corporate development, and their corporate venture capital bets.
“You also start to identify validating data points to support a thesis. So if you start seeing a certain Tier 1 supplier to OEMs, or OEMs in general, looking at a particular technology space, that gives you an idea that you’re going along the right path.
“It helps support a thesis for that technology and maybe gives you a view towards an exit if you were to invest in that technology.”
Lee – who also has considerable experience in technology, largely computing and robotics – agrees, but adds something that provides an insight into what might be one of the day-to-day activities of a technology investment company.
Lee says: “I think another way is – and this is not necessarily a more business-focused way – to look at patents.
“For example, look at the different patents that Apple or Amazon are filing – following the trends in what kinds of technologies are being patented. For Apple, that might be technologies that can help autonomous vehicles; for Amazon, it would typically be drones and warehouse automation.
“If you look at the patents carefully, you can see a pattern of what direction those innovators and those companies are going towards.
“That’s also how we understand the trends that maybe the mainstream media or investors or people, in general, are missing.
“There may be things developing behind the scenes or going under the radar, and we might only see them popping up in three or four years when those technologies are mature enough and reliable enough to be rolled out for the mass market.
“That’s one of the approaches we also use to assess the market acceptance and the viability of these technologies – if we see the potential to commercialise them.”