
Microsoft 365 Copilot Bug Raises Data Loss Prevention Concerns After Summarizing Confidential Emails


A newly disclosed bug in Microsoft 365 Copilot is forcing enterprises to confront an uncomfortable reality about AI integration inside productivity suites. Since late January, the AI assistant has been summarizing confidential emails in ways that bypass established data loss prevention policies, according to a service advisory.


The issue, tracked internally as CW1226324 and first identified on January 21, affects the Copilot “work tab” chat experience within Microsoft 365. Instead of respecting sensitivity labels and DLP restrictions, the system reportedly ingested and summarized emails stored in users’ Sent Items and Drafts folders, including messages explicitly marked as confidential.

Microsoft confirmed the behavior in a service alert.


"Users' email messages with a confidential label applied are being incorrectly processed by Microsoft 365 Copilot chat," Microsoft said when it confirmed this issue. "The Microsoft 365 Copilot 'work tab' Chat is summarizing email messages even though these email messages have a sensitivity label applied and a DLP policy is configured."

How It's Supposed to Work


Microsoft 365 Copilot is Microsoft’s generative AI assistant embedded across Word, Excel, PowerPoint, Outlook, and OneNote. The Copilot Chat feature allows users to query internal content and generate summaries based on emails, documents, and calendar data tied to their accounts.


The company began rolling out Copilot Chat broadly to paid Microsoft 365 business customers in September 2025. The value proposition was clear. Copilot would act as a context-aware assistant that understands a user’s workspace while respecting enterprise governance controls.


Those governance controls include sensitivity labels and DLP policies, which are widely used across regulated industries to prevent unauthorized access, sharing, or exfiltration of sensitive data.
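
As a rough illustration of where that enforcement is supposed to sit, the short Python sketch below models a label-aware gate between mailbox retrieval and AI summarization. Every name in it (MailItem, eligible_for_ai_context, the label values) is hypothetical; this is not Microsoft's code or API, only a minimal sketch of the behavior enterprises expect.

    # Minimal, hypothetical sketch: enforce sensitivity labels and DLP verdicts
    # BEFORE mailbox content reaches an AI summarizer. Not Microsoft's implementation.
    from dataclasses import dataclass

    RESTRICTED_LABELS = {"Confidential", "Highly Confidential"}

    @dataclass
    class MailItem:
        subject: str
        body: str
        folder: str                        # e.g. "Inbox", "Sent Items", "Drafts"
        sensitivity_label: str | None = None
        dlp_blocked: bool = False          # verdict from a hypothetical DLP evaluation

    def eligible_for_ai_context(item: MailItem) -> bool:
        """Return True only if governance controls allow AI processing of this item."""
        if item.sensitivity_label in RESTRICTED_LABELS:
            return False                   # the sensitivity label excludes it outright
        if item.dlp_blocked:
            return False                   # a matching DLP policy also excludes it
        return True

    def build_ai_context(items: list[MailItem]) -> list[MailItem]:
        """Filter mailbox items before any of them are handed to the summarizer."""
        return [i for i in items if eligible_for_ai_context(i)]

    # The reported bug behaves as if Sent Items and Drafts skipped this gate.
    mailbox = [
        MailItem("Q3 forecast", "...", "Sent Items", sensitivity_label="Confidential"),
        MailItem("Lunch plans", "...", "Inbox"),
    ]
    print([m.subject for m in build_ai_context(mailbox)])   # ['Lunch plans']

The essential design choice is placement: the gate runs before retrieved items ever reach the model, which is the step the reported code issue appears to have skipped for Sent Items and Drafts.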


In this case, however, the AI assistant appears to have overstepped those boundaries.

Microsoft later attributed the issue to an unspecified code defect.


"A code issue is allowing items in the sent items and draft folders to be picked up by Copilot even though confidential labels are set in place," Microsoft added.


The company said it began rolling out a fix in early February and is continuing to monitor deployment. It has also contacted a subset of affected users to confirm remediation. Microsoft has not disclosed how many organizations were impacted and has not provided a definitive timeline for full resolution. The incident is currently categorized as an advisory, typically indicating limited or scoped service impact.


Why This Bug Matters for Enterprise AI Security


For many security leaders, the incident is less about a single defect and more about what it reveals.


Yagub Rahimov, CEO of Polygraf AI, argues that the root issue goes deeper than code quality.


"This incident is a typical example of why regulated industries can't afford to treat AI security as a checkbox. What makes it particularly concerning isn't the bug itself, software has bugs, that's expected, it's the architecture and engineering it exposes. When AI is deeply integrated into productivity tools, it inherits access to everything those tools can touch. DLP policies were designed for a pre-AI world, and adding them to systems that were never built with AI access patterns in mind creates exactly these kinds of gaps. LLM guardrails would also fail in this scenario, they operate at the output layer and have no visibility into whether the data fed to the model should have been accessible in the first place.


"Prevention requires a different approach entirely: data governance and access controls need to be designed around AI from the start, not retrofitted after deployment.

"What worries me more than this specific bug is what it signals. Every CISO who reads this story will think about their own environment and realize they can't fully answer the question: what does our AI actually have access to right now? That uncertainty, at scale, across thousands of organizations, is the real problem here."


His comments reflect a broader concern emerging across enterprise AI deployments. When generative AI systems are embedded deeply into core productivity software, they operate with the same permissions as the underlying applications. If access controls are misconfigured or if enforcement layers fail, the AI inherits that exposure.


Traditional DLP frameworks were designed to monitor human-driven data flows. They were not necessarily engineered to account for autonomous summarization, contextual retrieval, or large language model ingestion patterns.
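
One way to picture that mismatch: a traditional DLP engine evaluates explicit user actions such as send, share, or upload, while an AI assistant can read many items to answer a single question without triggering any of those actions. The Python sketch below makes the difference concrete; all of its names and rules are invented for illustration and do not describe any vendor's actual DLP engine.

    # Hypothetical illustration: action-keyed DLP sees a human forward,
    # but never sees an AI assistant silently reading mailbox content.
    RESTRICTED = {"Confidential"}

    def dlp_check(item: dict, action: str) -> bool:
        """Traditional DLP: evaluate an explicit user action (send, share, upload)."""
        if action in {"send", "share", "upload"} and item["label"] in RESTRICTED:
            return False                       # block the action
        return True

    def human_forward(item: dict) -> str:
        if not dlp_check(item, action="send"):
            return "blocked by DLP"
        return "forwarded"

    def ai_summarize(query: str, mailbox: list[dict]) -> str:
        # The assistant reads many items to answer one question. No send, share,
        # or upload ever fires, so the action-keyed check above is never consulted.
        context = [m["body"] for m in mailbox if query.lower() in m["body"].lower()]
        return f"summary built from {len(context)} item(s)"

    mailbox = [
        {"label": "Confidential", "body": "Merger terms for Project X"},
        {"label": None, "body": "Project X kickoff agenda"},
    ]
    print(human_forward(mailbox[0]))            # blocked by DLP
    print(ai_summarize("project x", mailbox))   # summary built from 2 item(s)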


The AI Governance Gap


The Microsoft 365 Copilot incident highlights a growing governance challenge facing enterprises adopting tools powered by large language models.

Security teams must now answer new questions:

  • What internal repositories can AI assistants access by default?

  • Are sensitivity labels enforced consistently across AI workflows?

  • Do AI features process content differently from traditional application logic?

  • How is access logged, audited, and reviewed? (See the sketch after this list.)
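
For the logging and auditing question in particular, a practical starting point is to review exported audit records for AI-related events that touched labeled content. The Python sketch below assumes a unified audit log export in CSV form with CreationDate, UserIds, Operations, and AuditData columns; the inner fields it inspects (AccessedResources, SensitivityLabelId) are assumptions about the record shape, not a documented schema, and should be checked against a real export before use.

    # Hedged sketch: flag Copilot-related audit events that reference labeled
    # resources. Column names follow common unified-audit-log exports; the
    # AuditData fields below are assumptions and may differ in practice.
    import csv
    import json

    def copilot_events_touching_labels(csv_path: str) -> list[dict]:
        flagged = []
        with open(csv_path, newline="", encoding="utf-8-sig") as f:
            for row in csv.DictReader(f):
                if "copilot" not in (row.get("Operations") or "").lower():
                    continue
                try:
                    audit = json.loads(row.get("AuditData") or "{}")
                except json.JSONDecodeError:
                    continue
                resources = audit.get("AccessedResources", []) if isinstance(audit, dict) else []
                labeled = [r for r in resources
                           if isinstance(r, dict) and r.get("SensitivityLabelId")]
                if labeled:
                    flagged.append({
                        "time": row.get("CreationDate"),
                        "user": row.get("UserIds"),
                        "labeled_resources": len(labeled),
                    })
        return flagged

    if __name__ == "__main__":
        for event in copilot_events_touching_labels("audit_export.csv"):
            print(event)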


In regulated sectors such as healthcare, finance, and government, the stakes are especially high. Confidential drafts and sent communications can include legal strategy, merger discussions, patient information, or intellectual property.


Even if Copilot summaries were only visible to authorized users, the bypass of DLP enforcement undermines the integrity of the policy model itself.


A Signal for CISOs


Microsoft says it continues to monitor the rollout of its fix. But the incident arrives at a pivotal moment for enterprise AI adoption. Organizations are rapidly integrating generative AI into daily workflows while governance models are still catching up.


For CISOs evaluating Microsoft 365 Copilot security, this episode underscores the need for proactive access reviews and AI-specific threat modeling. It also reinforces the idea that AI systems are not isolated tools. They are extensions of the data ecosystems they inhabit.


The most pressing question may not be how quickly Microsoft resolves this particular bug. It may be whether enterprises can clearly map and control what their AI assistants can see, summarize, and reason over in the first place.

