Unlike humans, computer vision models do not typically exhibit perceptual straightness, so they learn to represent visual information in highly unpredictable ways. If machine learning models had this ability, they might be better at predicting how objects or people will move.
MIT researchers have found that a specific training method can help computer vision models learn more perceptually straight representations, as humans do. Training involves showing a machine learning model millions of examples so it can learn a task.
The researchers found that training computer vision models with a technique called adversarial training, which makes them less sensitive to tiny perturbations deliberately added to images, improves the models' perceptual straightness.
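A minimal numpy sketch of the idea behind adversarial training (this is an illustration, not the authors' setup): instead of fitting a model to clean inputs, each training step first perturbs the inputs in the direction that most increases the loss, then trains on those perturbed inputs. The toy data, the logistic-regression "model," and the perturbation budget `eps` below are all assumptions made for the example.

```python
import numpy as np

# Toy 2-D data with a linear ground-truth rule (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) + np.array([[2.0, 2.0]]) * rng.integers(0, 2, size=(200, 1))
y = (X[:, 0] + X[:, 1] > 2.0).astype(float)

w, b = np.zeros(2), 0.0
lr, eps = 0.1, 0.1  # eps = perturbation budget (hypothetical hyperparameter)

def predict(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

for _ in range(200):
    # 1. Craft adversarial examples: nudge each input in the direction
    #    that most increases the loss (sign of the input gradient, FGSM-style).
    p = predict(X, w, b)
    grad_x = np.outer(p - y, w)          # d(loss)/d(input) for logistic loss
    X_adv = X + eps * np.sign(grad_x)    # small, worst-case perturbation

    # 2. Train on the perturbed inputs instead of the clean ones.
    p_adv = predict(X_adv, w, b)
    w -= lr * X_adv.T @ (p_adv - y) / len(y)
    b -= lr * np.mean(p_adv - y)

# The adversarially trained model should still classify clean data well.
acc = np.mean((predict(X, w, b) > 0.5) == y)
```

Because the model is repeatedly forced to get the answer right even on worst-case nearby inputs, its decision no longer flips under small image changes, which is the robustness property the article describes.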
The team also found that perceptual straightness is affected by the task the model is trained to perform. Models trained on abstract tasks, such as classifying whole images, learn more perceptually straight representations than models trained on finer-grained tasks, such as assigning each pixel in an image to a category.
For example, nodes in the model have internal activations that represent "dog," which allow the model to detect a dog in any image of one. Perceptually straight representations keep this "dog" representation stable when the image changes slightly, which makes the model more robust.
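One common way to make this notion concrete (a sketch of the general idea, not the paper's exact metric) is to treat the model's representation of each video frame as a point in a high-dimensional space and measure how sharply the resulting trajectory turns between frames: a "straight" representation changes in a consistent, predictable direction, while an unpredictable one turns sharply. The function and toy trajectories below are hypothetical.

```python
import numpy as np

def mean_curvature(reps):
    """reps: array of shape (n_frames, dim), one representation per frame.
    Returns the mean turning angle (radians) between successive steps;
    smaller angles mean a straighter, more predictable trajectory."""
    diffs = np.diff(reps, axis=0)                                   # step vectors between frames
    diffs = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)    # unit directions
    cos = np.sum(diffs[:-1] * diffs[1:], axis=1)                    # cosine between consecutive steps
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

# A perfectly straight trajectory turns by 0 radians at every step...
straight = np.outer(np.arange(10.0), np.array([1.0, 0.0]))
# ...while a random walk turns sharply and unpredictably.
rng = np.random.default_rng(0)
wobbly = np.cumsum(rng.normal(size=(10, 2)), axis=0)

straight_angle = mean_curvature(straight)
wobbly_angle = mean_curvature(wobbly)
```

A model whose "dog" representation drifts smoothly as the dog moves through a video would score low on this kind of curvature measure, matching the stability the article describes.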
By better understanding perceptual straightness in computer vision, researchers hope to uncover insights that could help them develop models that make more accurate predictions. For example, this property could improve the safety of autonomous vehicles that use computer vision models to predict the trajectories of pedestrians, cyclists, and other vehicles.
"One of the main messages is that taking inspiration from biological systems like human vision can give you insight into why certain things work the way they do, and also inspire ideas for improving neural networks," says Vasha DuTell, an MIT postdoc and co-author of a paper on perceptual straightness in computer vision.
DuTell is joined on the paper by lead author Anne Harrington, a graduate student in the Department of Electrical Engineering and Computer Science (EECS); postdoc Ayush Tewari; graduate student Mark Hamilton; Simon Stent, research manager at Woven Planet; Ruth Rosenholtz, senior research scientist in the Department of Brain and Cognitive Sciences and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author William T. Freeman, the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science and a member of CSAIL. The research is being presented at the International Conference on Learning Representations.