Machine learning is transforming many industries and shaping essential applications, from self-driving cars to healthcare to fraud detection. But what happens when machine learning breaks down?
Like any software, deep learning applications depend on the hardware they run on – hardware that can break down and cause applications to make mistakes. Two ECE graduate students, Zitao Chen and Ali Asgari, are working to prevent these accidents. Their new software solution, Ranger, checks for and corrects faulty values in deep learning systems.
“Basically,” says Zitao Chen, the lead investigator on this project, “the problem we are looking at here is how to make machine learning models be even more reliable.”
Most computer systems are composed of software and hardware working together. However, hardware is subject to faults caused by cosmic rays, stress on the system, age, faulty designs, deliberate attacks, and so on; any of these can cause a system to malfunction. When the hardware isn’t working properly, the software is affected, and computation errors result. Usually, when a fault corrupts a program’s computation, someone can go in and recompute it manually. But this can be difficult and laborious, and is hard to complete under time constraints, such as when a self-driving car is operating.
Zitao and Ali’s tool, called Ranger, anticipates and corrects the errors that hardware faults cause. It performs range checks (hence the name), verifying that the values computed by the software fall within an accepted statistical range. If a value falls outside this range, Ranger brings it back to a safe region.
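The idea of a range check can be illustrated with a few lines of Python. This is a minimal sketch, not Ranger’s actual implementation: the function names `profile_bounds` and `range_check`, the sample numbers, and the choice of the minimum and maximum seen in fault-free runs as the “accepted range” are all assumptions made for illustration.

```python
# Illustrative sketch only; names and numbers are hypothetical,
# and this is not Ranger's actual code.

def profile_bounds(samples):
    """Derive a (low, high) range from values seen in fault-free runs.

    Assumption: the accepted range is simply the observed min and max.
    """
    flat = [v for sample in samples for v in sample]
    return min(flat), max(flat)


def range_check(values, low, high):
    """Clamp each value into [low, high], pulling outliers back to the bound."""
    return [min(max(v, low), high) for v in values]


# Hypothetical fault-free outputs of one layer, used to set the bounds.
clean_runs = [[0.0, 1.2, 3.4], [0.5, 2.8, 0.9]]
low, high = profile_bounds(clean_runs)

# A hardware fault turns one value into an implausibly large number.
faulty = [0.7, 1.0e30, 2.1]
restored = range_check(faulty, low, high)
print(restored)
```

In this sketch the corrupted value is pulled back to the upper bound while the healthy values pass through unchanged, which is why such a check can be added without altering the rest of the model.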
In the above example, Ranger is correcting an autonomous vehicle’s faulty computer vision. On the far left, the car recognizes the road correctly. In the center, a fault has occurred, affecting the car’s ability to infer which way to go and directing it into traffic. In the right image, Ranger has corrected this error and the car can proceed safely.
Zitao and co-researcher Ali Asgari began this research at the Dependable Systems Lab, led by Karthik Pattabiraman. “We had been working on another paper, trying to understand why the machine learning model would fail from a hardware fault,” says Zitao. “Long story short… we came up with a nice way to characterize the patterns of when the model fails and when the model would not fail.” By identifying these patterns, they were able to develop software that anticipates and corrects errors, bringing values back to a tolerable region.
“The very cool thing about this is that this tool is application-oblivious; so it doesn’t matter if you’re using a self-driving car or you’re using an autonomous robot – you can use this application,” says Ali.
This tool will soon be used in industry. Intel has added the software to OpenVINO – a toolkit that optimizes and improves the reliability of computer vision hardware and software. As part of this toolkit, Ranger will help in developing many different kinds of deep learning technology.
“We can easily incorporate it without a lot of programming interventions,” says Zitao. “You don’t really have to make a lot of changes.” As well, “it’s really low cost… and super effective. I think this is what makes this technique so appealing for them.”
“Despite its simplicity, it provides a very high level of reliability,” adds Ali.
This was the first project Zitao worked on after joining Karthik’s lab. Says Zitao, “I think to me the most exciting part [of this research] was understanding the problem we’re dealing with… I was playing with tools to see how to model, see how it would fail, and then, through this process, by looking at all these crazy details… gradually this emerged.”
“Once we’d identified what this problem was, we moved on to see how we can make things better based on our understanding,” he says.
“I found this research project interesting because I found it more close to the application,” Ali explains. “This technique is used in the industry and has impact. And the specific area we’re focusing on, artificial intelligence and machine learning, is being used in many applications and is ever-growing in its popularity.”
“I can see an important shift happening – formerly, maybe [developers] didn’t care much about reliability of AI applications,” says Ali. He sees this tool as part of a change in the industry. “I think this is a first step that could also lead to looking at reliability [in software] from many different perspectives.”