Financial Services Post
This post covers architecture patterns specifically for banking and financial services environments — including regulated cloud, audit requirements, and compliance constraints.

Credit Underwriting with Agentic AI: A Human-in-the-Loop Lending Workflow

Credit underwriting is, in many ways, the most heavily regulated domain we’ve covered so far in this series. Unlike fraud detection, where the regulatory concern is mostly about explainability after the fact, lending decisions in most jurisdictions are governed by explicit fair-lending laws that prohibit discriminatory outcomes, not just discriminatory intent, and require specific, substantive reasons when an application is declined. Designing an agentic underwriting workflow means designing for that legal reality from the architecture-diagram stage, not treating it as a compliance review that happens after the system is built.

Why “Human-in-the-Loop” Means Something More Specific Here Than Elsewhere

In several earlier posts in this series, human-in-the-loop has meant “a human reviews ambiguous or high-risk cases.” In credit underwriting, that’s necessary but not sufficient. Many lending programs are also subject to specific requirements around model validation, documented decisioning logic, and — particularly for any model that materially influences a credit decision — formal model risk management oversight, a topic explored in much greater depth in the Expert series. The architecture needs to support not just operational human review, but a deeper, ongoing governance process involving risk, compliance, and often legal teams.

A Realistic Architecture

1. Application intake and data aggregation. An intake agent collects the application along with supporting documents (income verification, bank statements) and aggregates external data the lender is permitted to use — credit bureau data, in most cases, plus whatever alternative data sources the institution has validated for use in underwriting.

2. Document and data verification. A verification component checks submitted documents for authenticity and internal consistency — does the stated income roughly match the bank statement deposits, for instance — flagging discrepancies for review rather than silently accepting unverified self-reported data.

3. Risk scoring. A scoring component — which may combine a traditional, well-validated statistical credit model with newer machine learning approaches — produces a risk assessment. This is frequently the most heavily governed component in the entire pipeline, because it’s the part most directly subject to fair-lending scrutiny and formal model validation requirements.

4. Policy application and decisioning. A decisioning layer applies the lender’s credit policy — minimum score thresholds, debt-to-income limits, program-specific rules — to the risk score and application details, producing a recommended decision: approve, decline, or refer for manual underwriting.

5. Adverse action reasoning. For any declined or adjusted application, a dedicated component generates the specific, substantive reasons required by law — not a generic “your application was not approved at this time,” but the actual primary factors that drove the decision, in the form regulations in most jurisdictions specifically require.

6. Human underwriter review. Applications that don’t clear the auto-approve threshold, that show conflicting signals, or that fall into product categories the institution has chosen to keep under mandatory human review, are routed to a human underwriter with a complete, organized case file — including the risk score, the specific factors behind it, and any discrepancies the verification step flagged.
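Of the six steps above, the policy application in step 4 is the most mechanical, so it is the easiest to sketch. The following is a minimal illustration only: `CreditPolicy`, its field names, and every threshold value are hypothetical assumptions, not taken from any real lender's policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreditPolicy:
    # All values below are illustrative assumptions, not real thresholds.
    version: str = "policy-example-1"
    auto_approve_min_score: int = 720
    decline_below_score: int = 580
    max_dti: float = 0.43


def debt_to_income(monthly_debt: float, annual_income: float) -> float:
    """Monthly debt payments divided by gross monthly income."""
    if annual_income <= 0:
        raise ValueError("annual_income must be positive")
    return monthly_debt / (annual_income / 12)


def recommend(score: int, dti: float, has_discrepancy: bool,
              policy: CreditPolicy) -> str:
    """Return 'approve', 'decline', or 'refer' (manual underwriting)."""
    # Any discrepancy flagged by verification goes to a human first.
    if has_discrepancy:
        return "refer"
    if score < policy.decline_below_score:
        return "decline"
    if score >= policy.auto_approve_min_score and dti <= policy.max_dti:
        return "approve"
    # Everything that is neither clearly strong nor clearly weak is referred.
    return "refer"
```

Keeping the policy as a versioned, immutable object separate from the scoring model is a deliberate choice in this sketch: it lets the policy version be recorded alongside each decision.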

Where Autonomous Decisioning Should — and Shouldn’t — Apply

A defensible design draws a clear, deliberate line about which decisions an agent can finalize autonomously and which always require a human underwriter, and that line should be set by risk, compliance, and business leadership together — not left to engineering convenience.

A reasonably common pattern: auto-approval for applications that clearly meet strong, conservative criteria across every dimension the policy cares about, where the institution has high confidence the outcome is both profitable and fair. Auto-decline is handled with more caution in many institutions — some choose to route all declines through at least a lightweight human check specifically because of the adverse action and fair-lending stakes involved, even when the underlying score clearly indicates decline. Everything in between — and a meaningful share of applications genuinely fall here — goes to a human underwriter, with the agentic system’s role being to prepare and organize the case as thoroughly as possible, not to render the final verdict.
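The routing pattern described above can be sketched as a small function. This is one possible arrangement under stated assumptions: the `Route` names, the `MANDATORY_REVIEW_PRODUCTS` set, and the choice to send every decline through a lightweight human check are hypothetical and would in practice be set by risk, compliance, and business leadership.

```python
from enum import Enum


class Route(Enum):
    AUTO_FINALIZE = "auto_finalize"
    LIGHT_HUMAN_CHECK = "light_human_check"
    FULL_UNDERWRITER_REVIEW = "full_underwriter_review"


# Hypothetical product categories kept under mandatory human review.
MANDATORY_REVIEW_PRODUCTS = frozenset({"small_business_loan"})


def route_case(recommendation: str, product: str,
               conflicting_signals: bool) -> Route:
    """Decide who finalizes a case given the automated recommendation."""
    # Mandatory-review products and conflicting signals always go to a human.
    if product in MANDATORY_REVIEW_PRODUCTS or conflicting_signals:
        return Route.FULL_UNDERWRITER_REVIEW
    if recommendation == "approve":
        return Route.AUTO_FINALIZE
    # Assumption: declines are never finalized without a human look.
    if recommendation == "decline":
        return Route.LIGHT_HUMAN_CHECK
    # 'refer' and any unrecognized value fall through to full review.
    return Route.FULL_UNDERWRITER_REVIEW
```

Note that the default branch is the most conservative one, so an unexpected recommendation value cannot result in an autonomous decision.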

The Fair Lending Architecture Requirement

This deserves its own section because it’s not optional, and it has specific architectural implications:

Disparate impact testing has to be a built-in, ongoing process, not a one-time validation. The risk-scoring and decisioning components need to be regularly tested for statistically significant differences in outcomes across groups defined by protected characteristics, even when those characteristics aren’t explicitly used as inputs. Machine learning models can sometimes learn to approximate a prohibited characteristic through a combination of permitted ones (a well-documented risk known as proxy discrimination). This testing needs a clear, scheduled cadence within the operational lifecycle of the system; it should not be treated as a launch-gate exercise that’s never revisited.
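As a rough illustration of what one scheduled check might compute, the sketch below calculates approval rates per group and the ratio of each group's rate to a reference group's rate (often called an adverse impact ratio). The function names are hypothetical, and the 0.8 threshold is an assumption borrowed from the "four-fifths" screening heuristic in US employment-selection guidance; it is not a legal standard for lending, and a compliance team would choose the actual metrics and statistical tests.

```python
from collections import defaultdict


def approval_rates(decisions):
    """decisions: iterable of (group_label, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            approved[group] += 1
    return {g: approved[g] / totals[g] for g in totals}


def adverse_impact_ratios(decisions, reference_group):
    """Each group's approval rate divided by the reference group's rate."""
    rates = approval_rates(decisions)
    ref_rate = rates[reference_group]
    if ref_rate == 0:
        raise ValueError("reference group has no approvals")
    return {g: r / ref_rate for g, r in rates.items() if g != reference_group}


def flag_groups(ratios, threshold=0.8):
    """Groups whose ratio falls below the (assumed) screening threshold."""
    return sorted(g for g, r in ratios.items() if r < threshold)
```

A flagged group in a sketch like this is a trigger for investigation, not a finding in itself; small sample sizes in particular can produce misleading ratios.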

Explainability has to be genuine, not superficial. A model that can’t produce a specific, accurate account of which factors drove a given decision can’t satisfy adverse action notice requirements in most jurisdictions, regardless of how accurate its predictions are. This is a hard architectural constraint that rules out certain modeling approaches for the core scoring component, or at minimum requires pairing a less interpretable model with a separate, validated explanation-generation layer.
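To make the constraint concrete, here is a minimal sketch of how reasons might be derived when the scoring component is an interpretable, points-based scorecard: rank factors by how many points the applicant fell short of the maximum attainable. The factor names, point values, and reason wording are all hypothetical, and this is only one of several approaches; it does not apply directly to less interpretable models.

```python
def top_reasons(applicant_points, max_points, reason_text, n=2):
    """Return reason statements for the n factors with the largest shortfall.

    applicant_points / max_points: dicts mapping factor name -> points.
    reason_text: dict mapping factor name -> human-readable reason.
    """
    shortfalls = {f: max_points[f] - applicant_points[f] for f in max_points}
    # Keep only factors that actually cost the applicant points, then sort
    # by shortfall (largest first), breaking ties by factor name.
    ranked = sorted(
        (f for f, s in shortfalls.items() if s > 0),
        key=lambda f: (-shortfalls[f], f),
    )
    return [reason_text[f] for f in ranked[:n]]


# Hypothetical scorecard and applicant, for illustration only.
MAX_POINTS = {"payment_history": 100, "utilization": 80, "account_age": 50}
REASONS = {
    "payment_history": "Delinquent past or present credit obligations",
    "utilization": "Proportion of balances to credit limits is too high",
    "account_age": "Length of credit history is too short",
}
applicant = {"payment_history": 60, "utilization": 75, "account_age": 20}
reasons = top_reasons(applicant, MAX_POINTS, REASONS)
```

Because the reasons are computed from the same point values that produced the score, they describe what actually drove the decision rather than a plausible-sounding story added afterward.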

Version control and decision reconstruction. Given that lending decisions can be challenged or examined well after the fact — sometimes years later, in litigation or regulatory examination — the architecture needs to be able to reconstruct exactly which model version, which policy version, and which specific inputs produced a historical decision, on demand.
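One way to support that kind of reconstruction is to write an immutable record at decision time that captures the versions and inputs together, along with a content hash for tamper evidence. The sketch below is an assumption about how such a record could look; `DecisionRecord` and its field names are hypothetical, and a real system would also need durable, access-controlled storage.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    application_id: str
    model_version: str
    policy_version: str
    inputs: dict      # the exact feature values the model and policy saw
    outcome: str      # e.g. "approve", "decline", "refer"
    decided_at: str   # ISO 8601 timestamp, UTC


def fingerprint(record: DecisionRecord) -> str:
    """SHA-256 over a canonical JSON form of the record."""
    # sort_keys makes the hash independent of dict insertion order.
    payload = json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing the fingerprint next to the record lets a later examination confirm that the inputs and versions being replayed are the ones recorded when the decision was made.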

A Worked Example: How the System Should Handle Ambiguity

Consider an applicant whose credit score is in a borderline range, whose stated income is slightly higher than what their bank statements appear to directly support, but who has an otherwise clean credit history and a stable, verifiable employment record.

A well-designed system doesn’t force this into a binary approve/decline through the automated path. The verification component flags the income discrepancy explicitly rather than silently averaging it away. The risk-scoring component produces its assessment along with the specific factors contributing to the borderline result. The case is routed to a human underwriter with all of this laid out clearly — including, ideally, a note on what additional documentation (like an updated pay stub) might resolve the discrepancy — rather than the system guessing its way to a confident-sounding but potentially wrong conclusion on a case that genuinely calls for judgment and possibly a follow-up conversation with the applicant.
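The income check in this example can be sketched as follows. It is a simplified illustration: the function name, the 10% tolerance, and the suggested document are assumptions, and it naively annualizes average monthly deposits without accounting for taxes, irregular income, or multiple accounts.

```python
def income_discrepancy(stated_annual_income, monthly_deposits, tolerance=0.10):
    """Compare stated income with income implied by bank deposits.

    Returns a dict that makes the gap explicit instead of hiding it.
    """
    if stated_annual_income <= 0:
        raise ValueError("stated_annual_income must be positive")
    if not monthly_deposits:
        raise ValueError("monthly_deposits must not be empty")
    observed_annual = sum(monthly_deposits) / len(monthly_deposits) * 12
    # Positive gap: stated income exceeds what deposits appear to support.
    gap = (stated_annual_income - observed_annual) / stated_annual_income
    flagged = gap > tolerance
    return {
        "observed_annual": observed_annual,
        "gap": gap,
        "flagged": flagged,
        # Hypothetical hint for the underwriter's case file.
        "suggested_documents": ["updated pay stub"] if flagged else [],
    }
```

For instance, a stated income of 72,000 against three months of 5,000 in deposits implies 60,000 a year, a gap of roughly 17%, which this sketch would flag and pass to the underwriter along with the suggested follow-up document.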

Why This Domain Is a Useful Template for Other High-Stakes Decisions

Credit underwriting combines strict regulation, real consequences for individuals, and a genuine need for both speed and fairness, which makes it a useful template for other high-stakes agentic decisioning domains. Insurance underwriting, certain categories of medical eligibility decisions, and other consequential, individual-level determinations share a similar shape: autonomous handling for the clear cases, mandatory and well-prepared human review for everything else, explainability designed in from the start rather than retrofitted, and ongoing, scheduled fairness testing rather than a one-time check.

Coming Up Next

We’ve now covered fraud and credit — two decisioning-heavy domains. The next post introduces a different kind of architectural component that’s becoming increasingly important across all of these high-stakes use cases: the guardian agent, a dedicated safety layer that reviews and can override other agents’ proposed actions before they execute.

Ashish Pande
Solutions Architect · Agentic AI Specialist · AWS | GCP | Azure

20+ years delivering complex solutions in financial services. Currently building enterprise-grade Agentic AI on AWS, leading a team of 24 engineers.
