The transcript of this conversation has been edited for clarity.
LUKE RENNER. This is Advanced Autonomy. I'm Luke Renner. You know, one of my favorite things about working in this space is that we are at the very, very beginning. And there are still tons of problems in the autonomous vehicle sector that we're trying to solve. These problems span everything from in-vehicle technology to wider infrastructure communication.
My guest today is the VP of Engineering and the Head of Autonomy at Cyngn, a self-driving industrial vehicle company that we both work for. In this conversation, he's going to give us an engineer's insider look into the three main problems that autonomy has yet to solve: occlusion (which is seeing beyond blind spots), prediction, and the coordination of a networked vehicle fleet.
Hi, Biao, welcome back.
BIAO MA. Hey, Luke. Happy to be back.
LUKE RENNER. So, let's dive into those. The first one you wanted to talk about today was occlusion. What is occlusion?
BIAO MA. Occlusion is when a third object sits between you and the target object you're trying to perceive. You have one object you're trying to perceive but, for some reason, there is a different object in front of you, blocking the first object. We call this situation occlusion, and it's a problem that has a technical impact on a series of subsystems in autonomy.
LUKE RENNER. So occlusion is a middle object, blocking the view of the driver or the autonomous vehicle and preventing it from seeing another object?
BIAO MA. Yes. The middle object could be a moving object. It could be a static object, could be a vehicle, a tree, or a building.
LUKE RENNER. And this is a problem because just like humans can't see through trees, a lot of our sensory technology also can't travel through objects.
BIAO MA. Yes. Most sensors are bounded by the fact that light travels in straight lines. Certain technologies are being developed to achieve some level of penetration, but they aren't there yet.
LUKE RENNER. Got it. So, I know that occlusion is not only about making it easier to see, it's also about making it easier to understand when you can't see. What can you tell me about that?
BIAO MA. That's true. There are two sides to this. The first is seeing better. The second is that, in certain scenarios, you should have the capability to anticipate or expect that something might happen or that some object might be there. So, this ability to anticipate should be built into your system.
LUKE RENNER. Okay. So this is clear. So what are researchers doing to try to solve the problem of occlusion?
BIAO MA. There are two branches of exploration to try to push the technology forward.
LUKE RENNER. Okay.
BIAO MA. One is really about, how do you see beyond the line of sight? Basically, yes, we know light travels straight. And there is not really a good way to change that. But some methods and systems could be explored to see beyond that. For example, you could have collaborative perception. You have fleet or crowd perception points that could contribute to better perception.
LUKE RENNER. So, you're talking about using the other cars on the road to see for the other cars on the road?
BIAO MA. They don't have to be cars, right? It could be a different perception unit providing that to you. Maybe on the corner of a street, which, you know, is a school area, you'd have an intelligent light pole sending perception, either info or objects or raw data, to you, right?
So that really gives you not only one eye, but also a crowd of eyes, looking at the street.
LUKE RENNER. So, that's one way. You said there were two ways that researchers are trying to solve the occlusion problem. What is the other way?
BIAO MA. So the second branch goes in a different direction. The first is about how you see objects. The second is about, even though you can't see a particular object, can you be smarter and anticipate that there will be danger or objects that you need to respond to?
LUKE RENNER. So, the second branch is training the vehicle to develop an intuition about things that it can't see?
BIAO MA. Yep.
LUKE RENNER. And humans do this naturally, right? Like, around a schoolyard, we slow down. When we're driving over a mountain pass, we behave differently because a deer could jump in front of the road at any moment. And so it's about kind of giving the autonomous vehicle some of that intuition and some of that intelligence.
BIAO MA. Exactly but there's always a constant trade-off of responsiveness and stability, right? So, you can't really achieve both. You want to make sure your system has a certain level of responsiveness. Just as you described, a car could come up out of nowhere, and you need to be able to quickly respond to that. On the other hand, you cannot be fragile. You cannot have a driving system that is constantly changing because of things that may or may not be relevant to you, right?
LUKE RENNER. You can’t be a scaredy-cat driver?
BIAO MA. Exactly. You can’t be a scaredy-cat.
LUKE RENNER. Okay, so this is really interesting. Let's transition to the second major obstacle: prediction. So before we get into why predictions are so difficult, I'd like you to actually define what you mean when you say prediction.
BIAO MA. Prediction is about predicting the trajectory of a target object in the coming frames — typically three to five seconds. By trajectory, I mean the location and the speed of the perceived object.
LUKE RENNER. So, it's about looking at objects that are moving and guessing where they're going to be a few seconds later, is that right?
BIAO MA. Yes. But it could be one or many trajectories that get predicted.
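To make the idea concrete, here is a toy constant-velocity predictor: given an object's current position and velocity, it extrapolates a trajectory over the next few seconds. This is a hypothetical illustration of what "predicting a trajectory" means, not how any production prediction stack works, and all the names and numbers are invented.

```python
def predict_trajectory(x, y, vx, vy, horizon_s=3.0, dt=0.5):
    """Return a list of (t, x, y) points over the prediction horizon,
    assuming the object keeps its current velocity."""
    trajectory = []
    t = dt
    while t <= horizon_s + 1e-9:
        trajectory.append((t, x + vx * t, y + vy * t))
        t += dt
    return trajectory

# An object at (0, 0) moving 10 m/s along x:
points = predict_trajectory(0.0, 0.0, 10.0, 0.0)
print(points[0])   # (0.5, 5.0, 0.0)
print(points[-1])  # (3.0, 30.0, 0.0)
```

Real predictors output one or many candidate trajectories, as Biao notes, rather than a single straight-line extrapolation.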
LUKE RENNER. So making predictions not only includes where the object may be headed but also a range of possibilities for where the object could head. Human drivers know that children are far less predictable than adults. So, driving by a kid usually necessitates greater care. So, in a situation like that, presumably, the AV takes in the information of its surroundings, runs it through AI processing, and uses all of this to make a decision.
So, can you give us a little bit of insight into how this is actually possible, you know? Show us under the hood.
BIAO MA. There are three key technical factors for prediction. One is really about the semantics. We can think of semantics as the meaning bound to the environment, or a segmented environment, that you're driving in. For example, this lane is left turn only; this lane is straight only.
So basically, you can think of a particular section of the driving environment as having a common rule, bound by traffic law, social convention, or a specific area of the targeted driving space.
LUKE RENNER. So, it's the context. It's the way we can drive on a two-lane highway and not freak out when a car passes us.
BIAO MA. Yeah, exactly. So you have a two-directional highway, even though the upcoming vehicle is close to you and relatively fast, the vehicle or the system shouldn't freak out. Because, you know, the semantic is saying that that lane over there is heading in the opposite direction and this is expected behavior.
LUKE RENNER. Alright, so that was the first one. And you said there were a couple of others. What are they?
BIAO MA. The second is that objects are, at the end of the day, bounded by physical limitations. Sometimes we say there's no Superman, right? Okay, so the kid may be highly unpredictable in terms of movement, but the kid can't fly.
LUKE RENNER. So the system can differentiate between a car that could go 60 miles an hour to run you over versus a child who cannot?
BIAO MA. Exactly. Exactly.
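This "no Superman" idea can be sketched as a class-based prior that caps predicted motion. The dictionary values and helper name below are invented for illustration; they are not from any real system.

```python
# Hypothetical per-class speed caps (m/s) used to bound predictions.
MAX_SPEED_MPS = {"car": 40.0, "cyclist": 12.0, "pedestrian": 3.0}

def clamp_speed(obj_class, estimated_speed):
    """Cap a noisy speed estimate by what the object class can
    physically do; unknown classes fall back to the largest cap."""
    return min(estimated_speed, MAX_SPEED_MPS.get(obj_class, 40.0))

print(clamp_speed("pedestrian", 15.0))  # 3.0
```

A sensor glitch might momentarily estimate a pedestrian moving at 15 m/s, but the physical prior keeps the prediction sane.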
LUKE RENNER. Okay. And then what's the third?
BIAO MA. The third thing is under the hood, and there are at least three layers in this category. The first one is relevancy prediction. Relevancy prediction starts from the fact that there are many objects around me, right? A lot of the time, many of these nearby objects are neither important nor relevant at all. The question becomes, how do you differentiate? How do you predict that this is or is not relevant? For example, if you are taking a right turn at the intersection and there's a car across from you also taking a right turn, both of you are turning right, but you aren't relevant to that driver and that vehicle is not relevant to you.
So that’s the first layer. The second layer is prediction based on trajectory, using the current motion and velocity of the target object. So, the question is, will this object matter enough for you to alter your trajectory?
The third is really about whether there is a significant crosspoint between your trajectory and the object's that forces you to respond. Or are there other implications of the object, by the nature of its classification, or by the place you are traveling through?
There is a little ball in front of you. Do we have the intelligence to anticipate that maybe there is also a little boy, because this is a school area?
So again, this is a topic to be further explored and implemented.
LUKE RENNER. So, just to make sure I understand. There are a lot of objects in the space. So the first filter is: is the object anywhere near us and relevant? The second filter is: is the object moving toward us in a way that could be relevant? And the third is: is there some context around the object that might make it behave in a way that requires additional caution? Are those the three filters?
BIAO MA. Simply speaking, yes. And in terms of methods to do this, there are learning-based methods, components of which we call predictors or evaluators. Predictors can be algorithm-based, optimization-based, or rule-based, so finding different and better ways to do this is really an interesting area for further development.
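The three filters in that exchange can be sketched roughly as follows. This is a hypothetical, rule-based illustration only; the function names, thresholds, and class/zone labels are all invented, and real systems would use learned predictors rather than these hand-written checks.

```python
import math

def is_relevant(ego_pos, obj_pos, near_radius=30.0):
    """Filter 1: is the object anywhere near us?"""
    dx, dy = obj_pos[0] - ego_pos[0], obj_pos[1] - ego_pos[1]
    return math.hypot(dx, dy) <= near_radius

def is_approaching(ego_pos, obj_pos, obj_vel):
    """Filter 2: is the object's current motion bringing it toward us?"""
    dx, dy = ego_pos[0] - obj_pos[0], ego_pos[1] - obj_pos[1]
    # A positive dot product means the velocity points toward the ego vehicle.
    return obj_vel[0] * dx + obj_vel[1] * dy > 0.0

def needs_caution(obj_class, zone):
    """Filter 3: does context (e.g. a ball in a school zone) imply extra care?"""
    return obj_class == "ball" and zone == "school"
```

For example, an object 10 meters ahead and moving toward the ego vehicle would pass the first two filters, while a car across the intersection turning away would fail the second.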
LUKE RENNER. Got it. So how good are we at predictions now?
BIAO MA. A simple answer to that is there are significant advances in this area, but it is not good enough yet. What I mean by that is there are mechanisms around the semantic layer of the high-definition map. For example, learning-based methods are trying to provide prediction with such input, and there are algorithms implemented based on the semantics and the motion of the predicted object.
So, different prediction mechanisms require the upstream systems to align and require the downstream system to use the information of the predicted trajectory.
LUKE RENNER. Yeah, so, I'm wondering like, how are the researchers going to know that their algorithms are actually getting better at prediction?
BIAO MA. Think about rolling the clock back five seconds in your data, using your tools or your infrastructure. By doing that, you actually get the ground truth of the trajectory that the object traveled over those few seconds. This ground truth gives you a good way to evaluate how good you are at predicting it: you have one ground-truth trajectory and you have one predicted trajectory, right?
LUKE RENNER. You compare what really happened to what your system expected and see how close those two are.
BIAO MA. Yes.
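One standard way to score a predictor against such replayed ground truth is average displacement error: the mean Euclidean distance between paired predicted and actual positions. The metric itself is common in the prediction literature; the sample trajectories below are made up.

```python
import math

def average_displacement_error(predicted, actual):
    """Mean Euclidean distance between paired (x, y) trajectory points."""
    assert len(predicted) == len(actual)
    total = sum(math.hypot(px - ax, py - ay)
                for (px, py), (ax, ay) in zip(predicted, actual))
    return total / len(predicted)

pred = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
true = [(1.0, 0.0), (2.0, 0.5), (3.0, 1.0)]
print(average_displacement_error(pred, true))  # 0.5
```

Lower is better: a predictor whose replayed trajectories hug the ground truth scores near zero.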
LUKE RENNER. So, what are researchers doing to help their autonomous vehicles get better at making predictions?
BIAO MA. There are at least three directions that I can see in the industry. Number one is to reduce the need for input. What I mean by that is, some prediction systems require knowledge of the semantics, or require certain classifications to be provided, to predict what the object will do. The upstream system needs to tell the AV that this is a pedestrian, this is a motorcycle. If that assumption could be reduced, prediction could become smarter in a simple way.
The second direction is about the granularity of the input and output. How could prediction improve, given higher confidence or higher granularity? And not only improve trajectory prediction but also get better at predicting higher-level information, such as what the vehicle is trying to do.
For example, what if we knew more than which direction the vehicle was traveling? What if we could also predict that this vehicle is going to take a left turn or is trying to change lanes, right? So more behavior-level predictions can be used to help the vehicle drive better, right? Not only the vehicle's direction but also that higher level. That's the second.
The third, in terms of technical direction, is that the best prediction is communication. No matter how good you are at predicting what I will do, the best way is for me to tell you what I'll do.
So, actually, if we stop thinking of prediction as one subsystem and look at the holistic view of the whole stack, some modules or subsystems can grow together.
LUKE RENNER. This is all very fascinating. So, we've covered occlusion, we've covered prediction. Let's talk about the final challenge, which is fleet planning and control. So, my question for you is what is fleet planning and control and what makes it different from regular autonomous vehicle development?
BIAO MA. The initial phase of autonomous vehicle development has really been about a single challenge: can we make a car drive itself, right? That's the whole idea behind the initial development and the initial bring-up of autonomy, its settings, and its initial set of sensors: to get a proof of concept.
The second phase is really about how to scale up.
Algorithms are being proposed and developed to consider the coordination of a fleet. Instead of each vehicle needing to perceive, track, and predict on its own, algorithms, in a centralized and far more efficient way, will provide this to the fleet. So, that is the new opportunity for getting things done more effectively.
LUKE RENNER. So, last time you were here, I asked you to make predictions. And I'm going to ask you to do it again. So of the three things that we talked about, occlusion, prediction, and fleet management, which of these problems do you think will be solved first?
BIAO MA. I don't think they will be solved one after the other. Solving occlusion, prediction, and fleet management actually requires the next generation of the autonomous vehicle stack. Each of the subsystems will need to work together. So, I do think there will be steps down the road, not to mention a new generation of designs and methods.
LUKE RENNER. And how long do you think it's gonna take?
BIAO MA. I'm optimistic. So, I do think in the next three to five years, there will be significant improvements in occlusion, prediction, and large-scale fleet planning and control.
LUKE RENNER. Alright, Biao. I appreciate the time. Thanks so much. It was interesting.
BIAO MA. Great. Yeah. Very happy to be here.