AI in CLM: Separating value from hype
Lately, AI has been dominating every conversation in the CLM space. That can be a good thing, but it also carries risk. In this quick guide to common AI topics in CLM, I want to separate the value from the hype. I’ve seen plenty of slides claiming “95% accuracy,” but the truth is, these numbers usually come from carefully controlled, in-distribution test sets where the inputs are predictable and the outputs are already known. If I know the answers in advance, I can tune prompts and rules until I hit 95% as well. That doesn’t mean the system will actually perform at that level when it’s faced with your real contracts: the NDAs, MSAs, and DPAs that have their own quirks, formats, and edge cases.
That’s why the more pragmatic path is deliberately unflashy: get the CLM basics in place, run tests on your own documents, keep humans involved in reviewing outputs, and look for step-by-step improvements rather than overnight miracles.
And when it comes to vendor claims, a common one is “we use OpenAI,” as if that alone answers every question about security, governance, and compliance. It doesn’t. Yes, OpenAI now offers EU data residency for certain types of API traffic and for ChatGPT Enterprise/Edu content, and yes, Azure OpenAI runs the same models within Microsoft’s cloud infrastructure with enterprise-grade controls. But your actual risk exposure still depends on the specific service you’re using, which region it’s configured in, and what the retention settings look like in practice.
What fuels the hype (and why it matters)
Bold AI claims often spread because there is pressure behind them: marketing teams want striking results, investors expect momentum, and many buyers are afraid of missing out. A single headline number such as “95% accuracy” can look convincing at first. In practice, that figure usually comes from a curated test set that the vendor has already tuned against.
Your own situation is rarely the same. Contract templates, scan quality, languages in use, and internal risk thresholds all affect outcomes. Once you move from a demo set to your actual repository, performance will shift. The only dependable way to know if a system works for you is to test it directly on your own data before making a purchase.
Risks of moving too fast
Compliance and residency gaps
For EU-based organizations, it is important to confirm where inference runs and which logs are retained. OpenAI’s API may retain inputs and outputs for up to 30 days for abuse monitoring, although certain endpoints support zero data retention. Azure OpenAI processes data in the region or geography you select, with the option of modified abuse monitoring if approved. These technical choices should always be mapped against GDPR requirements and the record-keeping duties of the EU AI Act.
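To make the residency point concrete, here is a minimal Python sketch of pinning inference to an EU setup. The resource name, deployment name, and project ID are placeholders, and for the OpenAI API the residency itself is a project-level setting made in the dashboard, not something the code can enforce on its own.

```python
# Minimal sketch (not production code) of keeping inference in an EU setup.
# Resource name, deployment name, and project ID below are placeholders.
from openai import AzureOpenAI, OpenAI

# Azure OpenAI: the region is fixed when the resource is created (e.g. Sweden Central);
# the endpoint simply points at that EU-deployed resource.
azure_client = AzureOpenAI(
    azure_endpoint="https://my-eu-resource.openai.azure.com",
    api_version="2024-06-01",
    api_key="...",  # read from a secret store in practice
)

# OpenAI API: EU data residency is configured per project in the dashboard;
# the client should reference that project explicitly rather than a default one.
openai_client = OpenAI(
    api_key="...",
    project="proj_eu_contracts",  # placeholder ID of a project with EU residency enabled
)

# Either client then serves the same request, for example:
resp = azure_client.chat.completions.create(
    model="gpt-4o",  # for Azure this is the *deployment* name created in the EU resource
    messages=[{"role": "user", "content": "Which governing law applies in this NDA? ..."}],
)
print(resp.choices[0].message.content)
```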
Wrong expectations from demos
Accuracy scores presented in vendor demos are usually based on controlled, known datasets. That is very different from the mix of third-party contracts and edge cases that most legal teams actually deal with.
Sham digitization
If the underlying repository, metadata, versioning, and search functions are not in good shape, AI will not solve those problems. Instead, it risks magnifying the inconsistency.
Overpromised outcomes
Case studies sometimes highlight big wins, such as reducing contract cycle time from 45 days to 12. These can be useful examples, but they should be treated as contextual rather than universal.
What to do instead (OpenAI-focused)
Build the CLM foundation first
AI can only amplify what is already in place. A central contract repository, reliable metadata, clear version control for templates, and effective search are the basics that make advanced tools useful. Without this foundation, the gains from AI will be limited.
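To give a sense of what “reliable metadata” means in practice, here is an illustrative sketch of a minimal contract record. The field names are examples rather than a standard; your own taxonomy will differ.

```python
# Illustrative sketch of the minimum metadata worth capturing per contract before
# layering AI on top; the field names are examples, not a standard.
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class ContractRecord:
    contract_id: str                 # stable ID shared across systems
    counterparty: str
    contract_type: str               # e.g. "NDA", "MSA", "DPA"
    template_version: str            # which approved template (and version) it started from
    effective_date: date
    renewal_date: Optional[date]
    governing_law: str
    owner: str                       # business owner accountable for the contract
    storage_uri: str                 # where the signed copy actually lives
    tags: list[str] = field(default_factory=list)
```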
Test on your own documents—not the vendor’s demo
A realistic pilot should include a small but representative set of your own contracts. A golden set of 20–50 documents that cover templates, third-party paper, and known edge cases works well. Keep the set blind until test day to avoid bias. Research such as ContractNLI shows just how hard it is to capture contractual nuance, which is why results from polished demo datasets rarely translate directly to real-world use.
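If it helps to picture the mechanics, here is a rough sketch of how a golden-set manifest could be kept frozen and verifiable. The file layout and expected fields are placeholders, not a prescribed format.

```python
# Rough sketch of a golden-set manifest: hash each document so it cannot be
# quietly swapped or edited, and record the expected answers separately from
# whoever configures the tool. Paths and field names are placeholders.
import hashlib
import json
from pathlib import Path

GOLDEN_DIR = Path("golden_set")          # your NDAs, MSAs, third-party paper, edge cases
MANIFEST = Path("golden_manifest.json")

def build_manifest() -> None:
    rows = []
    for doc in sorted(GOLDEN_DIR.glob("*.pdf")):
        rows.append({
            "file": doc.name,
            "sha256": hashlib.sha256(doc.read_bytes()).hexdigest(),
            # expected answers are filled in by your reviewers, not by the vendor
            "expected": {"governing_law": None, "liability_cap": None, "auto_renewal": None},
        })
    MANIFEST.write_text(json.dumps(rows, indent=2))

if __name__ == "__main__":
    build_manifest()
```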
Keep evaluation simple and repeatable
Define acceptance criteria in plain terms that match your business needs, such as “no unreviewed edits to indemnity or liability caps”, “drafts follow our playbook”, or “no personal data leaves the EU”. Run a blind trial on your golden set and include a few unknown documents to better simulate day-to-day conditions. Once you have the protocol in place (which documents, pass/fail rules, reviewers), freeze it and repeat it whenever you change a model, prompt, or configuration.
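Below is a minimal sketch of what that frozen protocol can look like in code, picking up the manifest from the golden-set sketch above. The extract_fields() call is a stand-in for whatever your CLM tool actually returns, and the critical fields are illustrative.

```python
# Minimal sketch of a frozen, repeatable trial over the golden-set manifest above.
# extract_fields() stands in for whatever the CLM tool returns per document;
# the critical fields mirror the acceptance criteria and are illustrative.
import json
from pathlib import Path

MANIFEST = Path("golden_manifest.json")
CRITICAL_FIELDS = ("governing_law", "liability_cap", "auto_renewal")

def passes(expected: dict, extracted: dict) -> bool:
    # Critical fields must match exactly; anything else goes to human review
    # rather than being silently accepted.
    return all(extracted.get(k) == expected.get(k) for k in CRITICAL_FIELDS)

def run_trial(extract_fields) -> float:
    docs = json.loads(MANIFEST.read_text())
    results = [passes(d["expected"], extract_fields(d["file"])) for d in docs]
    score = sum(results) / len(results)
    print(f"{sum(results)}/{len(results)} documents passed ({score:.0%})")
    return score
```

Re-running the same trial after every model, prompt, or configuration change gives you a like-for-like comparison instead of another polished demo.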
Treat “we use OpenAI” as the start of due diligence
- Service and region: Clarify whether the system runs on the OpenAI API (where EU data residency is now available) or Azure OpenAI (where regions and geographies are customer-controlled). Always get the specific region or geography confirmed in writing.
- Retention and monitoring: The OpenAI API defaults to 30-day log retention, although Zero Data Retention endpoints are available. Azure OpenAI has its own default retention for abuse monitoring, with modified options on request. Ask for concrete evidence such as support tickets or configuration IDs.
- Training boundaries: OpenAI and Azure OpenAI state that customer data is not used to train their base models by default. Even so, confirm the vendor’s fine-tuning policies and subprocessors.
- Application-layer logs and audit: Most CLM providers also store chat logs, embeddings, and audit trails. Confirm that these are EU-resident, have defined retention periods, and can be exported as evidence for AI Act compliance. OpenAI offers Compliance and Audit APIs for enterprise workspaces that can strengthen internal controls (a short export sketch follows this list).
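On that last point, here is a hedged sketch of what an audit-log export could look like. The endpoint path follows OpenAI’s published organization Audit Logs API, but treat it as an assumption and verify the path, parameters, and required admin-key scope against current documentation before relying on it.

```python
# Hedged sketch of exporting audit events as record-keeping evidence. The endpoint
# path follows OpenAI's published organization Audit Logs API; verify it and the
# required admin-key scope against current documentation.
import json
import requests

ADMIN_API_KEY = "..."  # an organization admin key, kept in a secret manager

resp = requests.get(
    "https://api.openai.com/v1/organization/audit_logs",
    headers={"Authorization": f"Bearer {ADMIN_API_KEY}"},
    params={"limit": 100},
    timeout=30,
)
resp.raise_for_status()

# Store the raw events next to your CLM vendor's own logs so both layers can be
# produced together during an audit.
with open("openai_audit_export.json", "w") as f:
    json.dump(resp.json(), f, indent=2)
```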
Keep humans in the loop
Automation works best for low-risk, repetitive patterns. For higher-risk suggestions, human review should remain part of the process. Even with more autonomous “agentic” tools, oversight in legal workflows is still necessary.
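As a simple illustration of that split, here is a hedged sketch of a reviewer gate. The clause list and confidence threshold are illustrative and should come from your own playbook.

```python
# Hedged sketch of a reviewer gate: low-risk, repetitive suggestions can be
# auto-applied; anything touching high-risk clauses goes to a human regardless
# of confidence. Clause names and the threshold are illustrative.
HIGH_RISK_CLAUSES = {"indemnity", "limitation_of_liability", "data_protection"}

def route_suggestion(clause_type: str, model_confidence: float) -> str:
    if clause_type in HIGH_RISK_CLAUSES:
        return "human_review"       # never auto-apply high-risk edits
    if model_confidence < 0.9:
        return "human_review"       # low confidence still earns a reviewer
    return "auto_apply"

# Example checks:
assert route_suggestion("indemnity", 0.99) == "human_review"
assert route_suggestion("renewal_notice", 0.95) == "auto_apply"
```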
Bottom line
AI is set to play an important role in contract lifecycle management. The teams that benefit most will be the ones that combine the strengths of OpenAI with solid governance practices, careful testing on their own documents, and ongoing human oversight. This approach turns polished demos into lasting improvements in contract cycle times, while keeping control over risk and compliance. For a broader view of how the market is evolving, see recent Financial Times coverage on legal AI adoption.
My OpenAI-aware buyer’s checklist
- Residency and processing: Confirm that the service runs in an EU region. For the OpenAI API, that means the project is set up in the EU. For Azure OpenAI, that means the region or geography is explicitly EU-based. Always ask for written confirmation or evidence.
- Retention: Understand how long data is stored. The OpenAI API defaults to 30-day retention, with Zero Data Retention endpoints available. Azure OpenAI has its own default abuse monitoring retention, with the option of a modified setup if approved. Ask for proof, such as an approval ID or ticket.
- Training boundaries: Both OpenAI and Azure OpenAI state that customer data is not used to train their base models by default. Still, you should secure explicit commitments from both your CLM vendor and the underlying provider.
- Controls: Look for safeguards that match risk levels. This can include reviewer gates, defenses against prompt injection, and reliable fallback options.
- Evaluation: Plan a repeatable way to test performance. A blind trial using your golden set with clear acceptance criteria is key, and it should include a handful of “unknown” documents to better reflect reality. Re-run the trial after any changes to models, prompts, or configurations.
- Lifecycle fit: Check that the solution adds value across the full contract process: from templates and drafting through negotiation, approval, signing, and archiving. A chatbot alone is not enough.
- Auditability: Make sure you can export logs and annotations in a way that supports EU AI Act record-keeping. OpenAI’s Compliance and Audit APIs can help strengthen your internal governance here.

