Designing a Multi-Agent Architecture for Core Banking Modernization: Patterns, Pitfalls, and a Reference Blueprint
Core banking modernization has been “the next big thing” in financial services technology for at least two decades, and for most of that time, the actual pace of change has been glacial relative to the ambition of the roadmaps. Legacy core systems, many running on decades-old technology stacks, remain the system of record for a large share of global banking, not because institutions don’t want to modernize, but because the risk, cost, and operational disruption of a full core replacement have historically been difficult to justify against the (real, but slow-to-materialize) benefits.
Agentic AI is introducing a genuinely different path: rather than a single, high-risk “big bang” core replacement, a layer of specialized agents can be built around and on top of the existing core, progressively absorbing and modernizing specific functions without requiring the underlying system of record to change on day one. This post lays out a reference architecture for that approach, the design decisions that matter most, and the failure patterns that tend to derail these programs.
The Strategic Rationale: Why Agent-Layer Modernization, Not Core Replacement
A full core banking replacement is a multi-year, high-risk undertaking, with a well-documented history of high-profile failures and cost overruns across the industry. The agent-layer approach reframes the problem: instead of asking “how do we replace the core,” it asks “how do we progressively build an intelligent layer that absorbs the operational complexity sitting around the core — orchestration, decisioning, exception handling, customer interaction — while the core itself continues doing what it’s actually good at: reliably maintaining the system of record for balances and transactions.”
This isn’t a permanent alternative to eventual core modernization for institutions that genuinely need it — it’s a way to capture a substantial share of the operational and customer-experience benefits of modernization on a much faster, lower-risk timeline, while keeping the option open to modernize the core itself on a separate, deliberate schedule.
Reference Architecture: The Layers
Layer 1 — System of record (existing core). The legacy or modern core banking platform continues to be the authoritative source of truth for account balances, transaction history, and core ledger operations. The architecture explicitly does not attempt to replace this layer’s role; it wraps and orchestrates around it.
Layer 2 — Integration and abstraction layer. A set of well-defined APIs and, increasingly, MCP-compatible interfaces expose the core’s capabilities to the agent layer above, abstracting away the core’s often idiosyncratic internal data formats and transaction semantics behind a cleaner, more consistent interface. This layer is frequently the most underestimated piece of the entire architecture — legacy cores were rarely designed with this kind of external access in mind, and building robust, well-governed abstraction here is genuinely hard engineering work, not a thin pass-through.
Layer 3 — Specialized domain agents. A set of agents, each responsible for a specific banking function — account servicing, payments orchestration, dispute handling, KYC/onboarding (as detailed in the Intermediate series), collections — operate against the abstraction layer, each with a clearly bounded scope and explicit permissions over what core operations it can invoke.
Layer 4 — Orchestration and workflow layer. A coordination layer — typically graph-based, per the orchestration patterns covered earlier in this series — manages how domain agents collaborate on requests that span multiple functions, maintains state for long-running processes, and enforces the overall business process logic that ties individual agent actions into a coherent customer or operational outcome.
Layer 5 — Guardian and governance layer. Independent guardian agents (per the Intermediate-series pattern) review proposed actions before they reach the abstraction layer and execute against the core, enforcing policy limits, detecting anomalies, and maintaining the audit trail that regulatory examination will eventually require.
Layer 6 — Human interface and escalation layer. Channels through which customers interact with the system (chat, voice, app) and through which human staff — customer service representatives, underwriters, compliance officers — receive escalations, review queues, and oversight dashboards.
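To make the interplay between Layer 3's bounded permissions and Layer 5's independent review more concrete, here is a minimal Python sketch. The agent names, operation names, and the single per-action limit are all hypothetical, chosen purely for illustration; a real guardian would apply far richer policy and write to durable audit storage rather than an in-memory list.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical scopes: which core operations each domain agent may invoke.
AGENT_SCOPES = {
    "payments-agent": {"initiate_payment", "get_balance"},
    "servicing-agent": {"get_balance", "update_standing_instruction"},
}

# Hypothetical policy limit; real limits would vary by product and customer.
PER_ACTION_LIMIT = Decimal("10000")


@dataclass(frozen=True)
class ProposedAction:
    agent: str
    operation: str
    amount: Decimal = Decimal("0")


def guardian_review(action: ProposedAction, audit_log: list) -> str:
    """Return 'allow', 'deny', or 'escalate', recording every decision."""
    if action.operation not in AGENT_SCOPES.get(action.agent, set()):
        decision = "deny"        # outside the agent's declared scope
    elif action.amount > PER_ACTION_LIMIT:
        decision = "escalate"    # in scope, but above the policy limit
    else:
        decision = "allow"
    # Every decision is logged, including allowed ones, so the trail is complete.
    audit_log.append((action, decision))
    return decision
```

The point of the sketch is the ordering: the scope check and the policy check both run before anything reaches the abstraction layer, and the audit entry is written regardless of the outcome.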
Why Layering It This Way Matters
The specific value of this layered decomposition is that it lets an institution modernize incrementally, function by function, without an “everything changes at once” cutover. A bank might start by deploying a domain agent for a single, well-bounded function (say, standing instruction management), prove out the architecture and governance model at low risk, then progressively add domain agents for higher-stakes functions as confidence and operational maturity grow. Each new domain agent benefits from the integration, orchestration, and governance layers already built, rather than requiring its own bespoke infrastructure.
The Hardest Engineering Problem: The Abstraction Layer
Architects new to this kind of program consistently underestimate Layer 2. Legacy core banking systems frequently have inconsistent data models across product lines (a savings account and a mortgage might be represented in structurally different ways internally, despite both ultimately being “an account”), batch-oriented processing assumptions baked deep into their design (some operations may only fully settle overnight, not in real time), and limited or fragile APIs that weren’t built with high-volume, real-time agent access as a design goal.
A realistic program budgets significant time and senior engineering attention specifically for this layer, treating it as a first-class deliverable rather than “integration work” to be handled quickly by whichever team has spare capacity. Getting this layer wrong — building it as a thin, leaky pass-through that exposes the core’s inconsistencies directly to the agent layer — tends to produce exactly the kind of brittle, hard-to-govern system the whole architecture was meant to avoid.
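As a rough illustration of what "not a thin pass-through" can mean in practice, the Python sketch below normalizes two structurally different core records into one consistent view. The raw field names (ACCT_NO, BAL_CENTS, loan_ref, and so on) and the sign convention are invented for this example and do not describe any particular core platform.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class AccountView:
    """The single shape the agent layer sees, whatever the product line."""
    account_id: str
    product: str
    balance: Decimal   # assumed convention: positive = funds held, negative = owed
    settled: bool      # False while a batch cycle has not yet posted


def from_savings(raw: dict) -> AccountView:
    # Assumed savings layout: padded account number, integer balance in cents.
    return AccountView(
        account_id=raw["ACCT_NO"].strip(),
        product="savings",
        balance=Decimal(raw["BAL_CENTS"]) / 100,
        settled=raw["PENDING_FLAG"] == "N",
    )


def from_mortgage(raw: dict) -> AccountView:
    # Assumed mortgage layout: outstanding principal as a decimal string.
    return AccountView(
        account_id=raw["loan_ref"],
        product="mortgage",
        balance=-Decimal(raw["principal_outstanding"]),
        settled=raw["batch_status"] == "POSTED",
    )
```

Surfacing the batch-settlement status as an explicit field is a deliberate choice here: it keeps the core's overnight-processing assumption visible to agents instead of letting them treat every balance as real-time.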
State Management Across a Distributed Agent Landscape
Banking processes frequently span hours, days, or longer (a loan application, a dispute investigation), and need to survive system restarts, handle partial failures gracefully, and remain fully reconstructable for audit purposes. This argues strongly for the orchestration layer maintaining explicit, persisted state — not relying on individual agents to remember where a multi-step process currently stands, since an agent instance handling step three of a process may not be the same instance (or even the same underlying model version) that handles step seven, especially as the system evolves and is redeployed over the process’s lifetime.
A practical pattern: model each long-running business process as a graph with explicit, named states (submitted, under-verification, pending-human-review, approved, and so on), with the current state and full transition history persisted independently of any individual agent’s runtime memory. This is what makes a process resumable, auditable, and resilient to the inevitable failures and redeployments that occur over a process that might span days.
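One way to sketch this pattern in Python is an append-only transition table, with the current state derived from the latest row. The table layout, the ProcessStore class, and the extra "rejected" state are assumptions made for illustration; SQLite stands in for whatever durable store a real orchestration layer would use.

```python
import sqlite3
from datetime import datetime, timezone

# Allowed transitions between named states ("rejected" is an assumed addition).
TRANSITIONS = {
    "submitted": {"under-verification"},
    "under-verification": {"pending-human-review", "approved", "rejected"},
    "pending-human-review": {"approved", "rejected"},
    "approved": set(),
    "rejected": set(),
}


class ProcessStore:
    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS transitions ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, process_id TEXT, "
            "from_state TEXT, to_state TEXT, actor TEXT, at TEXT)"
        )

    def current_state(self, process_id):
        row = self.conn.execute(
            "SELECT to_state FROM transitions WHERE process_id = ? "
            "ORDER BY seq DESC LIMIT 1",
            (process_id,),
        ).fetchone()
        return row[0] if row else None

    def start(self, process_id, actor):
        if self.current_state(process_id) is not None:
            raise ValueError("process already exists")
        self._append(process_id, None, "submitted", actor)

    def advance(self, process_id, to_state, actor):
        state = self.current_state(process_id)
        if state is None or to_state not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state!r} -> {to_state!r}")
        self._append(process_id, state, to_state, actor)

    def history(self, process_id):
        return self.conn.execute(
            "SELECT from_state, to_state, actor FROM transitions "
            "WHERE process_id = ? ORDER BY seq",
            (process_id,),
        ).fetchall()

    def _append(self, process_id, from_state, to_state, actor):
        with self.conn:  # commit each transition as its own transaction
            self.conn.execute(
                "INSERT INTO transitions "
                "(process_id, from_state, to_state, actor, at) "
                "VALUES (?, ?, ?, ?, ?)",
                (process_id, from_state, to_state, actor,
                 datetime.now(timezone.utc).isoformat()),
            )
```

Because the actor is recorded on every row, the history still shows which agent (and which version of it) handled each step, even when different instances pick up the same process days apart.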
Failure Pattern #1: Underestimating the Governance Layer Until Late
A common, costly pattern: a program builds Layers 1 through 4 with real engineering rigor, treats the guardian and governance layer as a feature to add “once the core functionality works,” and then discovers during a pre-production risk review that retrofitting proper audit logging, permission boundaries, and independent action review across an already-built system is dramatically more expensive and disruptive than building it in from the start. The governance layer needs to be part of the architecture from the first domain agent deployed, even if its rules and thresholds evolve significantly over time — building the capability to govern early is far cheaper than retrofitting it later.
Failure Pattern #2: Scope Creep in Individual Domain Agents
Domain agents that start with a tightly bounded scope have a natural tendency to accumulate additional responsibilities over time, as teams find it easier to extend an existing, working agent than to design and deploy a new one. Left unchecked, this erodes the clean separation of concerns the architecture depends on, and eventually produces the same kind of unmanageable complexity within a single agent that the layered architecture was specifically designed to avoid at the system level. Disciplined architectural governance — a real review process for proposed scope expansions, not just engineering convenience — is necessary to prevent this drift.
Failure Pattern #3: Treating the Abstraction Layer as a One-Time Build
Core banking systems themselves evolve — vendor updates, internal customizations, occasional migrations of specific modules. An abstraction layer built once and never revisited becomes a growing source of subtle bugs as the underlying core drifts away from the assumptions the abstraction layer was built against. Mature programs treat this layer as a living piece of infrastructure with its own ongoing maintenance and testing discipline, including contract tests that would catch a core-side change breaking the abstraction layer’s assumptions before it reaches production.
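A contract test of this kind can be quite small. The Python sketch below checks a core response against the fields and types the abstraction layer assumes; the contract contents and field names are hypothetical, and a real suite would run against a test instance of the core and cover value semantics as well as shape.

```python
# Hypothetical contract: fields the abstraction layer relies on, with the
# Python types it expects after the core response has been decoded.
BALANCE_CONTRACT = {
    "ACCT_NO": str,
    "BAL_CENTS": int,
    "PENDING_FLAG": str,
}


def contract_violations(response: dict, contract: dict) -> list:
    """List every way a core response departs from the assumed contract."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return problems
```

Run in a pipeline after every vendor update or core customization, a non-empty result would block promotion, which is the "before it reaches production" property described above.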
Measuring Success Beyond “It’s Live”
A few metrics matter more than simple deployment status for judging whether this kind of program is actually succeeding:
- Percentage of transactions for a given function fully handled by the agent layer without human escalation, tracked over time as the system matures and as new domain agents are added.
- Mean time to add a new domain agent, which should decrease as the integration and orchestration layers mature — a key signal that the architectural investment is compounding rather than each new function requiring bespoke, one-off engineering.
- Audit and examination readiness — measured concretely by how long it takes to reconstruct the full decision history for any given customer interaction, a metric regulatory examiners will, in practice, eventually test directly.
- Incident rate and severity in the guardian layer — not zero (a guardian layer that never catches anything is either perfectly designed or, more likely, not being adequately tested), but tracked and trending toward catching issues earlier and with less customer impact over time.
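The first two metrics above are simple to compute once the underlying events are recorded. The Python sketch below assumes a hypothetical record shape (an "escalated" flag per interaction, and "kickoff" and "live" dates per domain agent); it is meant only to show how little machinery is needed, not to prescribe a schema.

```python
from datetime import datetime
from statistics import mean


def straight_through_rate(interactions):
    """Share of interactions completed without human escalation."""
    if not interactions:
        return 0.0
    handled = sum(1 for i in interactions if not i["escalated"])
    return handled / len(interactions)


def mean_days_to_add_agent(agents):
    """Mean elapsed days from kickoff to go-live across domain agents."""
    return mean([(a["live"] - a["kickoff"]).days for a in agents])
```

Tracking both values per function and per quarter, rather than as a single program-wide number, makes it easier to see whether each new domain agent is actually getting cheaper to add.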
Coming Up Next
This reference architecture assumes a baseline of regulatory compliance throughout, but doesn’t dive deep into the specific regulatory requirements shaping it. The next post does exactly that, with a close look at the EU AI Act’s concrete implications for high-risk credit and insurance decisioning — requirements that, in practice, shape several of the architectural choices described above.
