
Banks have a long history of buying shiny new technology, bolting it onto a legacy workflow, and then acting surprised when the shiny part doesn’t magically fix the legacy part. Trade surveillance—monitoring orders, executions, and related behaviors for market abuse—has been one of the most stubborn examples. It’s mission critical, massively data-heavy, and famously prone to “alert fatigue.”
Now two of the world’s biggest institutions, Goldman Sachs and Deutsche Bank, are testing what the industry is calling agentic AI for trade surveillance—systems that can do more than match patterns to static rules. Instead, the pitch is that these tools can reason across multiple signals, decide what to look at next, and surface fewer but more meaningful cases for compliance teams to review.
The news was reported in a February 27, 2026 article by Muhammad Zulhusni at AI News (TechForge Publications), titled “Goldman Sachs and Deutsche Bank test agentic AI for trade surveillance.” That piece is the foundation for the analysis below.
What “agentic AI” means in trade surveillance (in plain English)
Traditional surveillance engines are usually built around scenarios and thresholds:
- If X happens, create an alert. (e.g., an order is large relative to normal volume, or executed in a suspicious time window.)
- Analysts triage alerts. Most are closed as false positives, some become cases, and a small number end up as regulatory reports.
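To ground the jargon, here is what “if X happens, create an alert” looks like in miniature. This is a sketch: the field names and the 5% threshold are illustrative assumptions, not any bank’s actual scenario logic.

```python
from dataclasses import dataclass

@dataclass
class Order:
    trader_id: str
    quantity: int
    avg_daily_volume: int  # instrument's typical daily volume (hypothetical field)

def volume_threshold_alert(order: Order, ratio: float = 0.05) -> bool:
    """Classic scenario logic: a static threshold, no market context,
    no cross-signal reasoning. If X happens, create an alert."""
    return order.quantity > ratio * order.avg_daily_volume

# A 10,000-share order against 100,000 average daily volume trips the 5% rule.
print(volume_threshold_alert(Order("T-123", 10_000, 100_000)))  # True
```

The limitation is visible in the code itself: the rule sees one order at a time, which is exactly why such engines drown analysts in context-free alerts.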
Agentic AI is pitched as a step beyond “if-this-then-that.” The idea is to deploy software agents that can:
- Combine multiple signals (order data, execution data, market context, historical trader behavior, and potentially communications metadata).
- Choose investigative steps (query related accounts, look for linked instruments, check for a repeating pattern across venues).
- Escalate with narrative context so a human reviewer gets something closer to a coherent hypothesis than a raw alert.
AI News summarizes this shift as moving away from keyword scanning and preset rules toward systems that can examine patterns in real time and flag conduct for human review.
That distinction matters because modern market abuse isn’t always a single obvious event. It can be a choreography: a sequence of orders, cancellations, partial fills, cross-venue timing, and hedges that looks innocuous in each individual slice—until you stitch it together.
Why banks are doing this now: the surveillance problem is scaling faster than headcount
Trade surveillance has always been about two competing failures:
- Too many false positives → analysts drown and miss real issues.
- Too few detections → regulators (and headlines) notice the gaps.
Markets have grown more complex: more venues, more asset classes, more algorithmic strategies, and more data exhaust. Compliance teams are not scaling at the same speed, and regulators are not in the mood for “we were very busy” as an excuse.
The operational driver is blunt: if you can reduce noise, you can reallocate human effort to the cases that actually matter. Agentic AI is being sold as a way to do that—ideally without compromising coverage.
Deutsche Bank + Google Cloud: the data plumbing is already there
One reason Deutsche Bank is a plausible candidate for agentic surveillance pilots is that it has already invested in cloud-based surveillance data architecture. Google Cloud published a detailed blog in February 2024 describing Deutsche Bank’s serverless data architecture for trade surveillance, built around services such as BigQuery, Dataproc (Spark), and Cloud Composer (Airflow) for orchestration.
In that architecture:
- Front-office systems publish trade/market/reference data into BigQuery tables.
- Surveillance scenarios run as transformations across BigQuery SQL and Spark jobs.
- Suspicious cases are written to output tables and fed into investigation workflows.
- Effectiveness metrics and false-positive calibration are fed back into scenario execution over time.
This is important context because agentic AI isn’t a magic model you sprinkle onto spreadsheets. It needs high-quality, well-governed data access and repeatable orchestration. Otherwise, your “reasoning agent” becomes a very confident improv performer with no reliable memory.
AI News reports (citing Bloomberg) that Deutsche Bank is working with Google Cloud to develop AI agents to monitor trading activity and flag anomalies in near real time.
Goldman Sachs: agents are becoming a theme, not a one-off experiment
Goldman has been unusually public (for a big bank) about experimenting with agent-like systems. In early February 2026, Reuters reported—via CNBC—that Goldman Sachs is working with Anthropic to develop AI-powered agents to automate internal functions including trade and transaction accounting and client due diligence/onboarding, with Anthropic engineers embedded alongside Goldman teams for months.
That’s not specifically trade surveillance, but it shows an institutional willingness to build agentic workflows in regulated operations—exactly the kind of environment where the “cool demo” phase ends quickly and the “show me the audit trail” phase begins.
It’s also consistent with Goldman’s broader pattern of testing agentic tools, including reports that it tested the AI coding agent Devin as part of a “hybrid workforce” approach.
Against that backdrop, Goldman exploring agentic AI for surveillance (as described by AI News, citing Bloomberg) looks less like a moonshot and more like a next logical workload: messy, expensive, and full of repetitive investigative steps that software is itching to automate.
How agentic surveillance could work (a realistic workflow, not sci‑fi)
Let’s translate “agentic trade surveillance” into a practical pipeline that a compliance technology team could build without violating physics, regulation, or common sense.
1) Data ingestion and completeness checks
The most humiliating surveillance failure is not “the model missed a subtle pattern.” It’s “we never ingested the data.” Regulators have penalized firms for surveillance gaps tied to incomplete data feeds and weak controls. For example, public commentary has highlighted enforcement actions around data completeness and surveillance coverage, underscoring that controls and verification matter as much as analytics.
An agentic system could continuously test for missing fields, broken feeds, and mismatches between expected and observed message counts—then open tickets or escalate to operations. This sounds boring, which is exactly why it’s valuable.
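A minimal sketch of that kind of check, assuming hypothetical feed names and expected message counts (a real implementation would pull expectations from reconciliation data, not hard-coded dictionaries):

```python
def check_feed_completeness(expected_counts: dict[str, int],
                            observed_counts: dict[str, int],
                            tolerance: float = 0.01) -> list[str]:
    """Compare expected vs. observed message counts per feed.

    Returns a list of issues an agent (or a plain cron job) could
    escalate to operations before any detection model ever runs.
    """
    issues = []
    for feed, expected in expected_counts.items():
        observed = observed_counts.get(feed)
        if observed is None:
            issues.append(f"{feed}: feed missing entirely")
        elif expected and abs(observed - expected) / expected > tolerance:
            issues.append(f"{feed}: observed {observed}, expected ~{expected}")
    return issues

# Hypothetical feeds: the fills feed silently dropped ~20% of messages.
print(check_feed_completeness(
    {"orders": 1_000_000, "fills": 250_000},
    {"orders": 1_000_100, "fills": 200_000},
))  # ['fills: observed 200000, expected ~250000']
```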
2) Scenario triggers + agentic enrichment
In many banks, scenario-based alerts won’t disappear overnight. Agentic AI can sit downstream:
- Take a raw alert (e.g., potential spoofing pattern).
- Pull related order events, cancellations, and fills across venues.
- Add market microstructure context (spread, volatility regime, order book dynamics where available).
- Compare with the trader’s historical profile and peer group behavior.
The result is not an automatic “guilty/not guilty” verdict, but a prioritized, evidence-backed case file that a human can review faster.
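The enrichment steps above can be sketched as a single function. Everything here is hypothetical: `store` is a stand-in for whatever governed data-access layer the bank exposes, and the field names and priority heuristic are illustrative, not a real detection method.

```python
def enrich_alert(alert: dict, store) -> dict:
    """Turn a raw scenario alert into an evidence-backed case file.

    `store` is an assumed data-access object; its methods are
    illustrative, not a real API.
    """
    trader = alert["trader_id"]
    window = (alert["start"], alert["end"])
    case = {
        "alert": alert,
        # 1. Related order lifecycle events across venues.
        "order_events": store.order_events(trader, window),
        # 2. Market context during the window.
        "market_context": store.market_snapshot(alert["instrument"], window),
        # 3. Baseline behavior for this trader and peers.
        "trader_baseline": store.trader_profile(trader),
    }
    # Toy priority heuristic: more cancels relative to fills -> higher score.
    cancels = sum(1 for e in case["order_events"] if e["type"] == "cancel")
    fills = sum(1 for e in case["order_events"] if e["type"] == "fill")
    case["priority"] = cancels / max(fills, 1)
    return case
```

The output is a structured case file with a ranking signal attached, which is the difference between handing an analyst a raw alert and handing them a starting hypothesis.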
3) Narrative assembly for human reviewers
A major time sink in surveillance is writing: documenting why you closed an alert, what you reviewed, and what evidence supported the decision. Agentic systems can assemble a draft narrative with citations to the underlying data objects—so long as the institution designs it to be reproducible and auditable.
Done right, this could reduce the “compliance theater” of repetitive documentation and free analysts to focus on judgment calls. Done wrong, it creates beautifully written nonsense—now in full sentences.
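One way to enforce that grounding, sketched with assumed names: refuse to emit any narrative sentence whose evidence identifiers cannot be resolved against the case data.

```python
def grounded_sentence(claim: str, evidence_ids: list[str],
                      evidence_store: dict) -> str:
    """Emit a narrative sentence only if every cited evidence object exists.

    The point: the draft narrative is constrained to claims with
    verifiable links back to stored data, not free-form generated text.
    """
    missing = [e for e in evidence_ids if e not in evidence_store]
    if missing:
        raise ValueError(f"claim not grounded, missing evidence: {missing}")
    return f"{claim} [evidence: {', '.join(evidence_ids)}]"

store = {"ord-99": {"type": "cancel", "ts": "10:01:02"}}
print(grounded_sentence("Order ord-99 was cancelled within one second.",
                        ["ord-99"], store))
```

A check this simple obviously doesn’t verify that the claim is *true*, only that it is traceable; truth still requires the human reviewer.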
4) Continuous calibration and learning loops
Deutsche Bank’s Google Cloud architecture description explicitly talks about retaining alerts, measuring effectiveness, and feeding that back into surveillance mechanisms.
Agentic AI can make that feedback loop more sophisticated by:
- Detecting clusters of false positives and proposing scenario threshold changes (subject to governance).
- Finding “near misses” where a case looked suspicious but didn’t breach current rules.
- Surfacing new behavioral signatures for modelers to validate and formalize.
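The first of those items can be sketched as a simple clustering pass over closed alerts. The scenario names and the cluster threshold are assumptions; the important design choice is real, though: the output is a proposal for a governance queue, never an automatic tuning action.

```python
from collections import Counter

def propose_threshold_review(closed_alerts: list[dict],
                             min_cluster: int = 50) -> list[dict]:
    """Find scenarios accumulating large false-positive clusters.

    Returns review proposals for a governance queue -- 'silent tuning'
    is exactly what model governance is supposed to prevent.
    """
    fp_counts = Counter(a["scenario"] for a in closed_alerts
                        if a["disposition"] == "false_positive")
    return [{"scenario": s, "false_positives": n, "action": "review_threshold"}
            for s, n in fp_counts.items() if n >= min_cluster]

alerts = ([{"scenario": "spoofing_v1", "disposition": "false_positive"}] * 60
          + [{"scenario": "wash_trade_v2", "disposition": "escalated"}] * 5)
print(propose_threshold_review(alerts))
# [{'scenario': 'spoofing_v1', 'false_positives': 60, 'action': 'review_threshold'}]
```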
Why regulators might like this (and why they might not)
Regulators are not allergic to AI; they’re allergic to unaccountable decision-making and invisible controls. If agentic AI can improve early detection and reduce market harm, it’s easy to see the appeal. AI News notes that earlier detection reduces market harm and reputational risk and that compliance teams are pressured to manage large alert volumes while maintaining strict standards.
But the skeptical regulator questions are predictable:
- Explainability: Why did the agent escalate this case? What features and evidence drove that choice?
- Consistency: Does it behave reliably across products, venues, volatility regimes, and languages?
- Governance: Who approved the model? How is it tested, tuned, and monitored for drift?
- Recordkeeping: Can the bank reproduce the decision path months later during an exam?
In other words: welcome to the same governance problems, now with more tokens.
The hidden hard part: trade surveillance isn’t just a model problem—it’s a systems problem
In vendor decks, surveillance looks like an AI model detecting “bad things.” In real banks, it’s a chain of brittle systems: feed handlers, reference data, symbology, time sync, entitlements, case management, evidence retention, and audit logs.
Deutsche Bank’s published architecture for surveillance on Google Cloud highlights precisely this systems reality: multiple operational data sources, transformations, scenario execution, case outputs, and orchestration.
Agentic AI has to live inside that pipeline without breaking it. That means:
- Deterministic logging of every query, tool call, and output.
- Access controls so the agent can’t “helpfully” pull data it shouldn’t see.
- Separation of duties between detection, investigation, and disposition.
- Reproducibility—the same input data should produce the same output (or at least a governed range of outputs) for audit purposes.
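The first requirement, deterministic logging of every tool call, can be sketched as a hash-chained append-only log. This is a lightweight stand-in for whatever immutable store a bank would actually use (WORM storage, a ledger database), but it illustrates the property auditors care about: tampering with any recorded entry is detectable.

```python
import hashlib
import json

class AuditLog:
    """Append-only log where each entry includes a hash of the previous
    entry, so later tampering breaks the chain."""

    def __init__(self):
        self.entries = []
        self._prev = "0" * 64

    def record(self, tool: str, args: dict, output: str) -> str:
        payload = json.dumps({"tool": tool, "args": args,
                              "output": output, "prev": self._prev},
                             sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        self.entries.append({"payload": payload, "hash": digest})
        self._prev = digest
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited payload or broken link fails."""
        prev = "0" * 64
        for e in self.entries:
            if json.loads(e["payload"])["prev"] != prev:
                return False
            prev = hashlib.sha256(e["payload"].encode()).hexdigest()
            if prev != e["hash"]:
                return False
        return True

log = AuditLog()
log.record("query_orders", {"trader": "T1"}, "42 rows")
log.record("open_case", {"alert_id": "A-7"}, "case C-9 created")
print(log.verify())  # True
```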
Security risk: agents expand the attack surface (and the compliance blast radius)
Agentic systems are, by design, tool-using and workflow-driving. That’s also why security teams get twitchy: more tools, more data access, more automation equals more ways for something to go sideways.
Recent academic work has argued that guardrails and semantic filters for agentic AI are probabilistic and can be bypassed, and proposes “authenticated workflows” that enforce policy and integrity boundaries across prompts, tools, data, and context.
You don’t need to adopt that specific framework to see the general point: in a bank, an agent that can query surveillance data, open cases, draft narratives, and escalate alerts is effectively a high-privilege operational user. Treat it like one.
Practical controls banks will likely need include:
- Tool allowlists (the agent can only perform approved actions).
- Context minimization (give it only what it needs for the current case).
- Cryptographic audit trails or at least immutable logging for tool calls and outputs.
- Human-in-the-loop gates for actions that create regulatory exposure (e.g., filing, escalation, or closing certain cases).
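Combining the first and last controls, a policy dispatcher might look like this. The tool names are hypothetical; the structure is the point: anything off the allowlist is rejected outright, and actions with regulatory exposure stall until a human approves.

```python
ALLOWED_TOOLS = {"query_orders", "query_fills", "draft_narrative"}
HUMAN_GATED = {"escalate_case", "close_case", "file_report"}

def dispatch(tool: str, args: dict, human_approved: bool = False) -> dict:
    """Route an agent's requested action through policy checks."""
    if tool in ALLOWED_TOOLS:
        return {"status": "executed", "tool": tool}
    if tool in HUMAN_GATED:
        if not human_approved:
            return {"status": "pending_human_review", "tool": tool}
        return {"status": "executed", "tool": tool}
    # Default-deny: unknown tools are never executed.
    return {"status": "rejected", "tool": tool, "reason": "not on allowlist"}

print(dispatch("query_orders", {}))     # executed
print(dispatch("file_report", {}))      # pending_human_review
print(dispatch("delete_evidence", {}))  # rejected
```

The default-deny posture matters more than the specific lists: an agent should have to earn each capability through governance, not inherit a service account’s entire entitlement set.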
Industry context: vendors and exchanges are also pushing AI deeper into surveillance
This isn’t only a “build it yourself” story. Market infrastructure providers and surveillance vendors are also moving in this direction.
Nasdaq, for example, announced AI enhancements to its market surveillance platform in October 2025 and described a pilot embedding advanced AI across the investigation lifecycle.
Meanwhile, surveillance specialists are explicitly marketing agentic systems aimed at alert remediation, case management, and reporting. Solidus Labs, for instance, described launching an agentic AI system for trade surveillance workflows in 2025.
The competitive reality is that large banks can build, buy, or do both. Most will do both. Agentic AI will likely be the “orchestration layer” that connects existing surveillance engines, case management platforms, and data lakes—rather than replacing them outright. That viewpoint echoes commentary from Deutsche Bank analysts about AI agents acting as an orchestration layer on top of incumbent systems.
Case study lesson: the fastest way to get fined is to trust your own plumbing
Surveillance failures often read like postmortems from a distributed systems class. The lesson is painfully consistent: someone assumed the data was complete because it came from a “golden source,” someone skipped a reconciliation step, and the surveillance engine quietly stopped seeing the world.
That’s why the cloud architecture and governance matter as much as the AI. Deutsche Bank’s surveillance design explicitly includes periodic effectiveness calculations and orchestration for ETL and calibrations.
If agentic AI is layered on top, it can help detect completeness anomalies sooner—but it can also add new failure modes (bad tool calls, incorrect joins, hallucinated rationales) unless it’s constrained.
What changes inside compliance teams if this works
If agentic surveillance actually delivers on its promise, the job of a surveillance analyst changes in three big ways:
- From triage to investigation: fewer low-quality alerts, more complex cases that require judgment.
- From manual lookup to supervision: analysts supervise the agent’s evidence collection and narrative, correcting it where needed.
- From reactive to proactive tuning: analysts and model teams collaborate more on calibrations, patterns, and new scenarios.
AI News frames the likely shift similarly: staff may spend less time sorting simple alerts and more time evaluating complex cases surfaced by agents, with human judgment still essential.
What could go wrong (a non-exhaustive list, unfortunately)
Because this is finance, there are always at least seven ways for a good idea to become an incident report. Here are a few of the most plausible pitfalls:
1) “Fewer alerts” becomes “less coverage”
Reducing noise is good—unless it’s done by suppressing signals that were inconvenient but important. Agentic triage needs strong backtesting and monitoring, with defined metrics for missed detection risk.
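One concrete form such backtesting can take, sketched with assumed labels from past dispositions: replay a candidate triage filter over historical alerts and count the true positives it would have suppressed. That number, not the shrinking alert queue, is what needs governance sign-off.

```python
def triage_backtest(historical_alerts: list[dict], suppress) -> dict:
    """Measure what a candidate triage filter would have suppressed.

    `suppress` returns True to drop an alert; historical alerts carry
    ground-truth labels from how they were ultimately dispositioned.
    """
    suppressed_tp = suppressed_fp = kept = 0
    for a in historical_alerts:
        if suppress(a):
            if a["label"] == "true_positive":
                suppressed_tp += 1
            else:
                suppressed_fp += 1
        else:
            kept += 1
    return {"kept": kept, "noise_removed": suppressed_fp,
            "missed_detections": suppressed_tp}

history = ([{"score": 0.1, "label": "false_positive"}] * 90
           + [{"score": 0.2, "label": "true_positive"}] * 2
           + [{"score": 0.9, "label": "true_positive"}] * 8)
print(triage_backtest(history, lambda a: a["score"] < 0.15))
# {'kept': 10, 'noise_removed': 90, 'missed_detections': 0}
```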
2) Model drift in changing market regimes
Behavioral patterns change quickly in markets. An agent that learned “normal” in one volatility regime may under-flag or over-flag in the next. Continuous calibration loops help, but they also need governance to prevent “silent tuning” that no one approved.
3) Hallucinated rationales
LLM-style systems can generate plausible-sounding narratives that don’t match the evidence. In surveillance, that’s not just embarrassing—it’s potentially discoverable in regulatory exams or litigation. The agent’s narrative must be grounded in verifiable links to evidence and restricted to what is known.
4) Privilege creep and data exposure
Agents need access. Over time, access tends to expand. Without tight entitlements and audit logs, you could end up with an internal system that can see too much and explain too little.
5) Overtrust by humans (“automation complacency”)
When a system produces clean summaries and confident next steps, humans tend to trust it. Nasdaq’s regulatory commentary has warned about passive compliance cultures where surveillance tools exist but aren’t meaningfully used or reviewed.
Agentic AI can inadvertently amplify that problem if teams start rubber-stamping agent outputs because they look professional.
So… is this the future of trade surveillance?
Most signs point to “yes,” with an asterisk the size of a regulatory handbook.
The near-term future likely looks like hybrid surveillance:
- Rules and statistical models continue to generate baseline alerts.
- Agents enrich, prioritize, and assemble evidence.
- Humans make final decisions and remain accountable.
That’s consistent with how AI News characterizes these tools: not replacing compliance officers, but acting as an additional monitoring layer that surfaces cases for closer human inspection.
The institutions testing this first—like Goldman Sachs and Deutsche Bank—have two advantages: they have the budget to build the plumbing and the governance teams to keep the whole thing exam-ready. Smaller firms will still adopt agentic surveillance, but likely through vendors and managed platforms, and often after regulators and large banks set informal “acceptable practice” expectations.
Practical takeaways for banks (and for the people who have to run these systems at 2 a.m.)
- Start with data completeness and controls: agentic AI can’t surveil what you don’t ingest.
- Define agent boundaries: tool allowlists, entitlements, and immutable logging are not optional.
- Measure outcomes, not vibes: track false positives, time-to-disposition, and missed detection proxies.
- Design for audits: reproducibility and evidence linking beat cleverness.
- Train analysts as supervisors: the new skill is not “click faster,” it’s “verify the agent.”
Sources
- AI News (TechForge Publications): “Goldman Sachs and Deutsche Bank test agentic AI for trade surveillance” (Muhammad Zulhusni, Feb 27, 2026)
- Google Cloud Blog: “Serverless data architecture for trade surveillance at Deutsche Bank” (Vladimir Elvov; Selim Muhiuddin, Feb 20, 2024)
- Reuters via CNBC (republished): Goldman Sachs working with Anthropic on AI agents (Feb 6, 2026)
- TechCrunch: Goldman Sachs testing AI agent Devin (July 11, 2025)
- Nasdaq press release: AI capabilities embedded in market surveillance platform (Oct 16, 2025)
- CFTC Press Release 8769-23: Goldman Sachs recordkeeping violations settlement (Aug 29, 2023)
- Nasdaq.com: Regulatory Roundup on compliance gaps and surveillance operations (Oct 2025)
- arXiv: “Authenticated Workflows: A Systems Approach to Protecting Agentic AI” (Rajagopalan; Rao, Feb 2026)
- Corporate Compliance Insights: Solidus Labs launches AI agent for trade surveillance (May 22, 2025)
Bas Dorland, Technology Journalist & Founder of dorland.org