Microsoft Corporation

05/05/2026 | Press release | Distributed by Public on 05/05/2026 13:25

Microsoft signs new deals with US and UK partners to advance AI testing and safety

Today, Microsoft is announcing new agreements with the Center for AI Standards and Innovation (CAISI) in the US and the AI Security Institute (AISI) in the UK to advance the science of AI testing and evaluation, including through collaborative work to test Microsoft's frontier models, assess safeguards, and help mitigate national security and large-scale public safety risks. These agreements matter because ongoing, rigorous testing is essential to building trust and confidence in advanced AI systems. Well-constructed tests help us understand whether our systems are working as intended and delivering the benefits they are designed to provide. Testing also helps us stay ahead of risks, such as AI-driven cyberattacks and other criminal misuses of AI systems, that can emerge once advanced AI systems are deployed in the world.

While Microsoft regularly undertakes many types of AI testing on its own, testing for national security and large-scale public safety risks necessarily must be a collaborative endeavor with governments. This type of testing depends on deep technical, scientific, and national security expertise that is uniquely held by institutions like CAISI in the US and AISI in the UK and the government agencies they work with. By combining that government expertise with Microsoft's experience building and deploying AI systems at global scale, together we are better positioned to anticipate and manage national security and public safety risks in ways that build public trust and confidence in advanced AI systems.

Improving AI evaluation science through cooperative research and operational experience

Advancing the science of AI evaluation requires more than isolated research or one-off testing. It depends on sustained collaboration between industry, government, and research institutions. Through our new and expanded partnerships with the US and UK governments-alongside national security-focused evaluations of model capabilities-Microsoft is bringing technical expertise and operational experience to strengthen AI evaluation methods and practical testing foundations.

  • In the US, with CAISI, Microsoft and NIST will collaborate on improving methodologies for adversarial assessments-testing AI systems in ways that probe unexpected behaviors, misuse pathways, and failure modes, much like stress-testing whether airbags, seatbelts, and braking systems work effectively and reliably in safety-critical driving scenarios. This work involves co-developing more systematic and reproducible approaches to evaluation, including shared frameworks, datasets, and workflows for assessing safety, security, and robustness risks in advanced AI systems. It also builds on our AI Red Team's novel research and tools to detect compromised models at scale .
  • In the UK, with AISI, Microsoft will collaborate on research related to frontier safety and security, including methods for evaluating high-risk capabilities and the effectiveness of the safeguards used to address them. The partnership will also include societal resilience research examining how conversational AI systems interact with users in sensitive contexts.

These collaborations are designed to improve measurement science, evaluation methodologies, practical testing workflows, and real-world mitigation impact. They reflect a shared commitment to rigorous, practical approaches that can make safeguards stronger and evaluations more reliable.

Looking ahead

No organization can address these challenges alone. Our partnerships with CAISI and AISI are a key part of a wider effort to build the institutions, research base, and shared methodologies needed for effective AI testing. This effort also includes:

  • Pursuing research and evaluation in collaboration with other AI institutes globally while helping advance shared priorities and methodologies for testing through the International Network for AI Measurement, Evaluation and Science.
  • Helping deliver industry best practices through the Frontier Model Forum (FMF), an initiative dedicated to advancing the science and practice of frontier AI safety and security. Through the FMF, we are working with other leading AI developers to support independent research, develop shared evaluation methodologies, and promote transparency around risk mitigation strategies.
  • Contributing to MLCommons , a multistakeholder non-profit that develops and operationalizes testing tools such as AILuminate , a family of safety and security benchmarks . In February , we announced efforts underway with institutions in India, Japan, Korea, and Singapore to expand AILuminate to support multilingual, multicultural, and multimodal evaluation, helping to make sure that AI systems work well in the languages and cultural contexts in which people around the world use them.

As AI capabilities advance, so too must the rigor of the testing and safeguards that underpin them. We will apply what we learn from these partnerships directly into how we design, test, and deploy AI systems, ensuring that progress in evaluation science translates into safer, more secure products for our customers. As these partnerships progress, we will share what we learn and look for opportunities to apply insights and best practices to AI testing more broadly.

Microsoft Corporation published this content on May 05, 2026, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on May 05, 2026 at 19:26 UTC. If you believe the information included in the content is inaccurate or outdated and requires editing or removal, please contact us at [email protected]