AI Attacks: Understanding, Identifying and Mitigating Attacks against AI Systems
In some respects, cybersecurity attacks that involve AI systems are similar to other, more traditional forms of attack. In other respects, they are completely unique threats that call for novel, innovative cybersecurity protections.
Thus, as this article explains, protecting against AI attacks requires a combination of traditional and innovative cybersecurity processes, tools, and strategies. Keep reading for guidance on everything you need to know about AI attacks today, including how they happen, the main types of AI attacks, and how to detect and mitigate AI security threats.
In this article:
- What is an AI attack?
- AI attacks vs. AI-generated attacks
- Main types of AI attacks
- Types of AI systems and types of AI attacks
- Detecting and remediating AI attacks
What is an AI attack?
An AI attack is a cybersecurity attack that manipulates an AI system for malicious purposes.
The goals behind AI attacks vary widely. In some cases, threat actors might seek simply to disable an AI service as a form of Denial-of-Service (DoS) attack. In others, their goal is to exfiltrate sensitive data by “tricking” an AI service into exposing sensitive information. The objective could also be to cause an AI system to malfunction by, for instance, forcing it to interpret data in inaccurate ways.
But no matter what attackers aim to do or how they do it, you can consider the incident to be an AI attack if it involves AI systems in some ways.
AI attacks vs. AI-generated attacks
It’s important not to confuse AI attacks with AI-generated attacks.
The latter term refers to attacks that use AI technology to help enable or streamline attacks. For instance, if threat actors use a generative AI tool to create content for phishing messages, or use machine learning to analyze potential targets, they are relying on AI to assist with their attacks.
That’s different from attacking an AI system itself, which is the type of risk we’re focusing on in this article.
Main types of AI attacks
AI attacks come in many forms, and opinions vary about exactly how attacks against AI systems should be categorized or classified. A useful guideline, however, is the one established by NIST researchers, who in 2024 identified four main types of AI attacks: Evasion, poisoning, privacy, and abuse.
Evasion
According to the NIST researchers, an evasion attack involves manipulating the live data that an AI system ingests to make decisions (i.e., during inference, to use the more technical term).
For instance, imagine that attackers modify access log files that an AI system analyzes to detect anomalous access events. If the attackers scrub anomalies entries from the logs, the system will fail to detect illegitimate access.
Poisoning
Poisoning attacks happen when threat actors manipulate the data used to train an AI model (as opposed to the data fed into the model during inference, which constitutes an evasion attack, as described above).
For instance, if developers were training an AI model to detect anomalous access by feeding it log files that include anomalous events that are labeled as such, attackers could poison the model by changing the labels files prior to training. As a result, the model likely would fail to understand why types of events are considered risky.
Privacy
A privacy attack against an AI system is an effort to collect information about how the system works. For example, attackers might inject carefully crafted prompts into a chatbot that are designed to “trick” the chatbot into displaying information about the data sources it trained on.
On their own, privacy attacks do not necessarily cause harm, especially if the data revealed through a privacy attack is not sensitive. However, threat actors can use information gleaned through privacy attacks to find weaknesses in AI systems, which they can then exploit via other attacks. For example, if they learn that an AI system ingests certain types of log files, they’ll know that they can manipulate that file to launch an evasion attack.
In addition, some privacy attacks can cause direct harm. For example, an AI system that automatically retrains itself over time based on prompts from users could end up generating inaccurate or inappropriate data due to privacy attacks that involve malicious prompts.
Abuse
An abuse attack is an attempt to insert malicious information into a source that an AI system will later ingest. Unlike evasion and poisoning attacks, abuse attacks don’t directly manipulate data sources that an AI model is known to work with. Instead, they pollute broader data environments with unreliable or inaccurate information that could eventually harm the AI model.
A classic example of an abuse attack against an AI system is the creation of a website containing false information. If a generative AI model later trains on the website data – or if the misinformation is copied into other data sources that the model uses for training, even if it does not train on the original website – the model may not work as its developers intend due to manipulated training data.
Types of AI systems and types of AI attacks
When evaluating types of AI attacks, it’s important to keep in mind that the types of attacks that can affect an AI system may depend on the type of AI system in question.
For example, generative AI technology requires training data, so generative AI services can be subject to poisoning attacks (which, as explained above, involve manipulation of training data). But AI systems that rely on a rule-based approach don’t use training data, so they’re not at risk of this type of attack – although they could be compromised through other attack techniques, like evasion, if attackers manipulate the data they ingest.
Detecting and remediating AI attacks
The processes of preventing, identifying, and remediating attacks against AI systems start with the best practices for managing security threats for any type of software system, not just AI. They include, for example:
- Performing a risk assessment to identify security weaknesses in AI systems.
- Managing access controls to restrict unauthorized access to AI services and tools.
- Scanning for vulnerabilities in applications that connect to AI systems, since compromised apps linked to AI services could potentially become beachheads that threat actors use to compromise the AI services themselves.
However, AI attack mitigation should also include additional steps that address the unique security risks of AI, such as:
- Monitoring the data fed into AI systems to detect suspicious or unusual data, which could signal an evasion attack.
- Monitoring the integrity of training data (by, for example, using checksums to ensure data sources do not change unexpectedly) to prevent poisoning attacks.
- Monitoring interactions with AI systems to detect suspicious prompts, which could be part of a privacy attack.
- Scanning data ecosystems for signs of malicious data that threat actors might have planted to launch an abuse attack.
How Aqua can help
Aqua’s end-to-end application security protections can help ensure that AI systems and the apps that depend on them are free from risks and threats of all types. With features like active monitoring of LLM behavior at runtime to prevent unauthorized actions and code integrity scanning to mitigate unsafe use of LLM code inside applications, Aqua helps keep digital systems safe in the age of AI.