By Michael Blake, Defense Opinion Writer.
As artificial intelligence (AI) systems become deeply integrated into military decision-making, mission success hinges on AI data integrity.
The complexity of multiple ingestion points and different AI models interacting in real-time creates new challenges in verifying that data is accurate, untampered and free from adversarial manipulation. Without that certainty, decision-making will be constrained by the slower, traditional methods that AI is intended to replace.
In any AI model the military uses, collected sample “pipeline” data is split into two sets: one for training and one for testing the model’s predictive accuracy. Most people assume the model then maintains a steady state, even as the machine learning (ML) technology inherent in many AI systems advances its understanding of new data and sharpens its predictive capabilities.
It doesn’t.
The real world is dynamic and patterns shift. As data changes, the AI model’s predictive accuracy “drifts” from what is genuine and verifiable, making it untrustworthy for critical decision-making. Big events can also quickly make a model obsolete; the Covid-19 pandemic, for example, dramatically changed predictive stock market dynamics and investor behaviors.
Cyberattack surface grows as data increases
A combination of smaller events can have the same effect. As the volume of model data increases and its velocity accelerates, the attack surface grows. That compounds the chance of covert actions such as enemy hacking, adversarial intervention or data poisoning, possibly repeated over time. Consequently, operators must continually adapt to ensure data remains intact and reliable.
Some larger AI companies are developing methods to calculate drift. The key is to apply continuous training to measure the quality of an AI model’s performance against pre-established thresholds and key performance indicators (KPIs). As predictive accuracy decreases, those KPIs will indicate when it is time to retrain the model.
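The drift-monitoring loop described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual method: the accuracy metric, the KPI threshold of 0.90 and the toy model are all assumptions for demonstration.

```python
# Minimal sketch of KPI-based drift monitoring (illustrative only):
# the 0.90 threshold and the toy model below are hypothetical.

def accuracy(model, samples):
    """Fraction of recent labeled samples the model predicts correctly."""
    correct = sum(1 for features, label in samples if model(features) == label)
    return correct / len(samples)

def needs_retraining(model, recent_samples, kpi_threshold=0.90):
    """Flag the model for retraining when accuracy drifts below the KPI."""
    return accuracy(model, recent_samples) < kpi_threshold

# Example with a toy "model" that always predicts class 1:
model = lambda features: 1
samples = [((0,), 1), ((1,), 1), ((2,), 0), ((3,), 1)]  # 3 of 4 correct
print(needs_retraining(model, samples))  # True: 0.75 < 0.90
```

In practice the "recent samples" would be a continuously refreshed window of labeled operational data, and a real deployment would track several KPIs, not just raw accuracy.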
However, without validating the data’s correct formatting (syntactics) and logic (semantics) or sanitizing it, the retraining will likely be done with degraded or even bad data, which then produces misinformation or AI hallucinations.
Depending on how long it has been since the last test, weeks or even months of bad data could accumulate, rendering the AI model and its predictions inaccurate for an extended period. That will impede timely decision-making in critical warfighting situations.
Validating data is key
The military needs proactive measures to prevent data poisoning or other malicious interference. For battlefield situations, military leaders must be confident that the data fed into their AI adheres to expected formats and values so that predictive recommendations remain unbiased. Achieving that demands a secure system architecture that validates data at every point of ingress and egress.
Cybersecurity technologies like cross domain solutions (CDS) can help. They defend against poisoning or corruption of training and testing data by applying rules that enforce the correct data format and constrain the values of new inputs to accepted ranges. Because a CDS inspects and logs all data flows, only approved, unaltered information can cross security boundaries.
Cross domain solutions require secure mechanisms for appropriate data selection and pre-screening. A CDS first applies preliminary “coarse” screening, checking for general accuracy and performance issues, to quickly eliminate irrelevant, redundant or low-quality data from large datasets and reduce the AI’s computational costs.
It then applies “fine” screening to refine the data so that it aligns with the AI model’s requirements, for instance verifying that each record has the correct format and logic. Both stages can enforce strict validation rules during data collection, so that only clean, compliant data enters training and testing environments.
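The two-stage screening idea can be illustrated with a short sketch. The record fields, formats and accepted ranges here are invented for illustration; a real CDS rule set would be far richer and enforced at the security boundary itself.

```python
# Hedged sketch of coarse-then-fine screening; field names, the 0-100
# accepted range and the record layout are illustrative assumptions.

def coarse_screen(records):
    """Drop duplicates and records missing required fields."""
    seen, kept = set(), []
    for rec in records:
        key = (rec.get("sensor_id"), rec.get("timestamp"))
        if None in key or key in seen:
            continue  # incomplete or redundant record
        seen.add(key)
        kept.append(rec)
    return kept

def fine_screen(records):
    """Enforce syntactic format and semantic value ranges."""
    valid = []
    for rec in records:
        if not isinstance(rec.get("reading"), (int, float)):
            continue  # syntactic failure: wrong type/format
        if not (0.0 <= rec["reading"] <= 100.0):
            continue  # semantic failure: value outside accepted range
        valid.append(rec)
    return valid

raw = [
    {"sensor_id": "a1", "timestamp": 1, "reading": 42.0},
    {"sensor_id": "a1", "timestamp": 1, "reading": 42.0},  # duplicate
    {"sensor_id": "a2", "timestamp": 2, "reading": 900.0}, # out of range
    {"sensor_id": "a3", "timestamp": 3, "reading": "bad"}, # wrong type
]
clean = fine_screen(coarse_screen(raw))
print(len(clean))  # 1 record survives both screening stages
```

Running the cheap coarse pass first keeps the more expensive per-field fine checks off data that would be discarded anyway, which mirrors the computational-cost argument above.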
High-side and low-side networks
For use in the military, the architecture needs to accommodate multiple networks and environments used in defensive operations, such as “high side” for transmitting and storing classified or sensitive information and “low side” for handling routine or less sensitive information.
High and low side networks are deliberately separated to prevent data leakage or compromise, but the military could use CDS to securely manage and monitor any required data transfers between them.
When integrating an AI model into these environments, training and testing need to align with operational goals. High-side systems used only for inference, for example, can run on a simpler architecture, but running continuous training on the high side requires training and inference hardware operating in parallel. A mechanism like a CDS can ensure that incremental model updates use only known, validated data.
Using AI on the low side can help optimize bandwidth by transferring only the most relevant data to the high side. That preserves network resources while still delivering timely, critical information to the leaders who need it.
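The low-side filtering step could look something like the following sketch. The relevance-scoring function and the 0.8 threshold are purely hypothetical stand-ins for whatever model the low-side system would actually run.

```python
# Illustrative only: the scoring function and 0.8 cutoff are assumptions,
# not a real low-side triage policy.

def select_for_transfer(records, score, threshold=0.8):
    """On the low side, forward only records scored as highly relevant."""
    return [rec for rec in records if score(rec) >= threshold]

score = lambda rec: rec["priority"]  # hypothetical relevance model
records = [{"id": 1, "priority": 0.95}, {"id": 2, "priority": 0.30}]
print([r["id"] for r in select_for_transfer(records, score)])  # [1]
```

Only the high-scoring records would then be passed through the CDS to the high side, keeping the cross-boundary transfer small.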
An AI model’s effectiveness hinges on the quality and security of its training data. Without robust validation through collection, transfer and retraining, even advanced models risk degradation or exploitation in critical settings.
Given evolving threats, the AI models integrated into a growing number of military applications demand constant refinement of screening protocols and architectural hardening. Only then will they yield the trustworthy information on which mission success, and often lives, depend.
Michael Blake, a 25-year software industry veteran, is technical fellow and chief architect at Owl Cyber Defense, which is based in Columbia, Maryland.