Editor's Note: This article originally appeared on the DataVisor blog on February 12, 2017.

As mentioned in my previous articles, traditional rule-based transaction monitoring systems (TMS) have architectural limitations which make them prone to false positives and false negatives:

Naive rules create a plague of false positives that are expensive for investigators to sift through.
Sophisticated money launderers know how to circumvent rule-based systems, leading to false negatives and potential fines from regulators.

This article focuses on the third drawback of existing TMS solutions: how their inflexible data models lead to poor data quality, resulting in additional false positives and false negatives.

I think many of us working in the anti-money laundering (AML) technology space have experienced the frustration of spending many hours retrofitting new data types to squeeze into the rigid data model of a TMS. Unfortunately, the more effort we spend retrofitting data, the more likely we are to introduce data quality issues. Further, when we don’t complete the work in a timely fashion, we’re exposed to the risk of large fines from regulators. That said, there’s hope on the horizon from machine learning solutions that are more forgiving of disparate data formats.

Square peg in a round hole

Sending data from source systems to many of the existing TMS is like trying to fit a square peg in a round hole. There are two major reasons for this.

First, TMS require a lot of data of many different types. Financial institutions typically have many disparate customer, account and transaction systems that feed data into the TMS to satisfy monitoring requirements. Second, existing TMS have a monolithic data model that’s generally difficult to adjust without significant customization.

This forces the financial institution to change its data to conform. However, this is difficult because each source system will have its own unique characteristics and ultimately serve a different business purpose. For example, a mortgage lending application may function differently than a system handling retail demand deposit accounts (DDA). Furthermore, each system will have its own data model or way to store and update information.

Unfortunately, these challenges result in a long, arduous process filled with subtle gotchas. Potential AML events get missed, leaving you exposed to huge fines from regulators. For example, imagine that a financial institution purchased a commercial loans company. The financial institution must integrate the acquired company’s data into its existing TMS, but the process takes longer than anticipated. During a regulatory exam, the regulator discovers that the purchased company’s data is still not being monitored by the existing TMS. The regulator views the acquired firm’s lack of integration into the existing AML framework as a red flag and decides to probe the program more deeply than it had in the past.

Even worse, the more the data is reshaped to fit the TMS data model, the greater the likelihood of developing additional data issues. And as you know, this will lead to false positives and false negatives down the line.

The best solution is to minimize data transformations. If the files are kept as close to the source system’s original format as possible, any data integrity issues stay isolated to that system. A certain degree of data transformation will still be required before the detection algorithms are run, but this can be accomplished within the TMS itself. However, this requires a TMS that is not based on a monolithic data model and that has some flexibility and adaptability.
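To make the idea concrete, here is a minimal Python sketch of keeping each record in its source format and deriving comparable values only at detection time. Everything in it is an assumption made for illustration: the source names (`retail_dda`, `commercial_loans`), the field names, and the `SourceRecord` and `amount_of` helpers are hypothetical and do not come from any particular TMS.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class SourceRecord:
    """A record stored exactly as the source system exported it."""
    source: str           # hypothetical source-system name
    raw: Dict[str, Any]   # untouched fields from the source file


# Hypothetical per-source readers: each one pulls the value it needs from
# the raw fields at detection time instead of rewriting the stored record.
AMOUNT_READERS: Dict[str, Callable[[Dict[str, Any]], float]] = {
    "retail_dda": lambda raw: float(raw["txn_amt"]),
    "commercial_loans": lambda raw: float(raw["PaymentAmountCents"]) / 100.0,
}


def amount_of(record: SourceRecord) -> float:
    """Derive a comparable amount without mutating the original record."""
    return AMOUNT_READERS[record.source](record.raw)


records = [
    SourceRecord("retail_dda", {"txn_amt": "250.00", "acct": "A-1"}),
    SourceRecord("commercial_loans", {"PaymentAmountCents": 1250000, "LoanId": 77}),
]
amounts = [amount_of(r) for r in records]
```

In this sketch, onboarding a newly acquired system means adding one reader rather than reshaping its files, and a mistake in a reader stays confined to that one source.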

How unsupervised machine learning (UML) leads to a more flexible TMS

There are some promising AI-based TMS solutions that are designed to solve this data inconsistency problem. Using unsupervised machine learning (UML) allows the TMS to have flexible data requirements. (For more information about how UML works in the context of AML, read my first blog post on the subject.)

To understand why, consider how the two approaches differ. Traditional TMS with rule-based models look for specific scenarios and require specific fields structured in certain ways to map them to their internal data model. UML does not have a strict data model that inputs must adhere to; rather, it works with the data that it’s given.

Consider the scenario where an account was previously dormant and then suddenly began transacting very quickly. A rule would require several highly specific data fields and encode strict thresholds in order to try to match the scenario. However, the rigidity of the data fields makes the initial integration difficult, which increases the likelihood of data quality issues. A secondary issue is the strict thresholds, which lead to false positives and false negatives.
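A rule of this kind might look roughly like the following Python sketch. The thresholds (`DORMANT_DAYS`, `BURST_TXN_COUNT`) and the function signature are assumptions chosen purely for illustration, not values or interfaces from any real TMS.

```python
from datetime import date

# Hypothetical thresholds; in practice these would be tuned per institution.
DORMANT_DAYS = 180
BURST_TXN_COUNT = 10


def dormant_then_active(last_activity: date, burst_start: date,
                        txns_in_burst: int) -> bool:
    """Flag an account only if it crosses both hard-coded thresholds."""
    dormant_for = (burst_start - last_activity).days
    return dormant_for >= DORMANT_DAYS and txns_in_burst >= BURST_TXN_COUNT


# Dormant for 213 days, then 12 transactions: the rule fires.
flagged = dormant_then_active(date(2016, 1, 1), date(2016, 8, 1), 12)

# Same dormancy, but 9 transactions: one short of the threshold, so no alert.
just_under = dormant_then_active(date(2016, 1, 1), date(2016, 8, 1), 9)

# Dormant for 179 days, then 40 transactions: one day short, so no alert.
short_gap = dormant_then_active(date(2016, 2, 4), date(2016, 8, 1), 40)
```

The last two calls show how behavior sitting just under a fixed threshold slips through. The rule also cannot run at all unless every source system supplies all three inputs in exactly the expected form.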

On the other hand, a TMS that leverages UML can take in a variety of data fields to find hidden networks of accounts with anomalous behavior. For example, UML may uncover a network of accounts that were previously dormant and started transacting quickly.

Note this example is simplified, as in practice the UML model would take into account hundreds to thousands of different data attributes to uncover the network.
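As a rough illustration of the idea, the Python sketch below groups accounts by a coarse behavioral fingerprint built from whichever fields each record happens to contain, using no labels and no scenario-specific rule. It is a deliberately crude stand-in for unsupervised learning, not the algorithm used by any real product. The field names, the bucket width of 10, and the `min_size` cutoff are all assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def signature(account: Dict[str, Any]) -> Tuple[Tuple[str, Any], ...]:
    """Coarse fingerprint built from whichever fields are present."""
    parts = []
    for key in sorted(account):
        if key == "id":
            continue
        value = account[key]
        # Bucket numbers (width 10, an assumption) so that near-identical
        # behavior lands in the same group.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            value = int(value // 10) * 10
        parts.append((key, value))
    return tuple(parts)


def correlated_groups(accounts: List[Dict[str, Any]],
                      min_size: int = 3) -> List[List[str]]:
    """Return groups of accounts that share the same fingerprint."""
    groups: Dict[Tuple, List[str]] = defaultdict(list)
    for account in accounts:
        groups[signature(account)].append(account["id"])
    return [ids for ids in groups.values() if len(ids) >= min_size]


accounts = [
    {"id": "A1", "dormant_days": 212, "txns_first_week": 14, "device": "d-9"},
    {"id": "B1", "dormant_days": 3, "txns_first_week": 2, "device": "d-1"},
    {"id": "A2", "dormant_days": 215, "txns_first_week": 12, "device": "d-9"},
    {"id": "B2", "dormant_days": 40, "txns_first_week": 5, "device": "d-2"},
    {"id": "A3", "dormant_days": 218, "txns_first_week": 17, "device": "d-9"},
    {"id": "B3", "loan_balance": 50000},  # different fields are accepted as-is
]
suspicious = correlated_groups(accounts)
```

Here the three accounts that were dormant for a similar period, became active at a similar pace, and share a device end up grouped together, even though no rule described that scenario in advance. A record with entirely different fields is simply accepted rather than rejected.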

There are three major benefits of using UML to power or supplement a TMS. First, because little data integration effort is required, there are fewer chances to make mistakes that lead to data quality issues (and ultimately, false positives and false negatives). Second, it’s faster to get the TMS up and running. And third, it’s much easier to add new data fields or entire new use cases over time, whether because business logic changes (for example, new product offerings are launched) or because relentless criminals adapt their methods.

The future of TMS technology

Ultimately, detecting money laundering is extremely complex. To make matters worse, customers, customer behaviors, product offerings, regulatory requirements, and even institutions themselves are in a constant state of change. We must consider that the tools we use to fight financial crime today not only limit our technical capabilities, but may actually influence the way we think about the problem itself. As Marshall McLuhan said, “We shape our tools and afterwards our tools shape us.” It’s time we got some better tools.

AML Data Quality: The Challenge of Fitting a Square Peg into a Round Hole
