Artificial Intelligence is all the rage. AI works in our homes, colleges, workplaces, the stock market, airports – the list is endless. But we have all heard of AI messing up: genuine transactions flagged as fraudulent, face recognition systems failing, and toxic comments slipping past moderation on social media. Have you ever wondered why this happens? Isn’t AI the answer to everything, the panacea, better than humans in every respect?
Time to burst the bubble – just like humans, AI has flaws of its own. AI is powered by machine learning, which, at its core, is a series of complex mathematical transformations. AI doesn’t understand, think, or feel – it simply learns from numbers without caring about what they might mean.
There is an entire branch of machine learning dedicated to exploiting these flaws and circumventing models: adversarial machine learning. It is the practice of fooling or misguiding a model with maliciously crafted input. While the technique has a variety of applications, it is most commonly used to attack or cause a malfunction in a machine learning system.
Early research on adversarial machine learning focused mainly on image classification. Studies by Szegedy et al. in 2014 showed how adversarial examples could be constructed through small perturbations, and how examples generated for one model could also be used to attack other models. Research by Papernot et al. provided further methods for generating adversarial samples. These methods apply changes so small that the naked eye cannot tell the perturbed image from the original, yet the AI misclassifies it as something completely different.
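To make the perturbation idea concrete, here is a minimal sketch in NumPy. The “model” is a hand-rolled logistic regression with made-up weights standing in for an image classifier, and the fast-gradient-sign-style attack nudges each input value by a small epsilon in the direction that increases the model’s loss. All weights, inputs and numbers here are illustrative, not taken from the studies above.

```python
# Minimal sketch of a gradient-sign perturbation attack on a toy model.
# The weights and the 4-"pixel" input are made up for illustration.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linear model: predicts P(class = 1) for a flattened "image" x.
w = np.array([2.0, -1.5, 1.0, 0.5])
b = -0.2

def predict(x):
    return sigmoid(w @ x + b)

def fgsm_perturb(x, true_label, epsilon):
    # Gradient of the binary cross-entropy loss w.r.t. the input x
    # is (p - y) * w for this linear model; step in its sign direction.
    p = predict(x)
    grad = (p - true_label) * w
    return x + epsilon * np.sign(grad)

x = np.array([0.6, 0.1, 0.4, 0.3])            # model scores this as class 1
x_adv = fgsm_perturb(x, true_label=1.0, epsilon=0.3)

print(predict(x))      # ~0.80: confidently class 1
print(predict(x_adv))  # ~0.48: the small nudge flips the decision
```

The attack never touches the model itself; it only shifts each input value by a fixed small amount, which is exactly why such perturbations can stay invisible to a human while flipping the model’s output.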
More recent work has also emerged in text classification. A study by Samanta et al. from IBM showed that a sentiment analyzer could be fooled by adding or removing certain words from the text. Another study, by Rajvardhan Oak from UC Berkeley, demonstrated that minor changes such as misspelling words or dropping characters could fool a toxic content classifier. Oak’s research brings to light the real impact of the problem: if ML systems can be fooled, user safety is at risk. Previous research on images required sophisticated programming to generate adversarial examples, but as Oak showed in his experiments, a bad actor (an abuser or a bully) with absolutely no knowledge of programming can fool a sophisticated machine learning system. Quoting from the study, a tweet with the phrase ‘fuckin immigrant’ was classified as neither toxic nor hateful – but clearly, that is not the case.
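A toy illustration of why such character-level tricks work: the “classifier” below is a deliberately naive keyword matcher, nothing like the actual model in Oak’s study, but it shows how deleting a single character lets a flagged word slip past any feature that relies on exact matches.

```python
# Toy "toxicity classifier": a naive blocklist keyword matcher.
# Real classifiers are far more complex, but exact-match features
# in any model break in the same way under tiny spelling changes.
BLOCKLIST = {"idiot", "stupid"}

def is_toxic(text):
    # Flags the text if any word matches the blocklist exactly.
    return any(word in BLOCKLIST for word in text.lower().split())

def drop_char(word, index):
    # An attacker-style perturbation: delete one character.
    return word[:index] + word[index + 1:]

print(is_toxic("you are an idiot"))                     # True
print(is_toxic("you are an " + drop_char("idiot", 2)))  # False: "idot" slips past
```

A human reader still understands the misspelled insult perfectly well – which is precisely the gap between what the model matches and what the text means.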
This adds a new dimension to cyber attacks: attackers can specially craft examples to trick machine learning models. To learn more about his study, and to understand from an expert’s point of view why adversarial examples occur, we reached out to Oak. “A model learns from whatever inputs we provide it. It doesn’t understand, just learns. If we don’t provide noisy examples (like misspellings, unexpected punctuations or spaces) during training, it will never expect them, and so treat them as something completely new. At that point, it may make a guess that is pretty much random,” says Oak. Along with a group of Princeton researchers, he recently published another study, one that introduces similarly simple, unsophisticated attacks in the image domain. It shows that image classification models learn a non-trivial amount of information from the background of an image. “Our recent work shows that a model can accurately predict the image of a panda even when we remove the panda from the image but keep the trees in the background. This also means that we can fool a model into classifying something as a panda by adding a forest background in part of the image,” Oak adds.
These findings by Oak and his team have important implications. Can someone fool face recognition systems or traffic cameras simply by including random objects in the background? Only time will tell. In the meantime, we should be cautious when using AI and making decisions based on its predictions. This applies especially to governments, policy organizations and law enforcement agencies, where decisions can affect millions of lives. AI should be used only as an aid – we should remember that it does not understand, only learns.