Fighting fraud and financial crime is like playing whack-a-mole: it never ends. New patterns and methods of deception emerge daily, whether in new account fraud, account takeover, embezzlement, identity theft, or money laundering.¹ Money laundering is among the most serious of these crimes, and it is big business. According to the UN Office on Drugs and Crime, an estimated 2-5% of global GDP, or between $800 billion and $2 trillion, is laundered each year. In 2020, the UK Financial Intelligence Unit (UKFIU) received and processed almost 600,000 suspicious activity reports (SARs). Over 95% of these reports came from financial institutions, costing them £28.7 billion.²

The banking landscape spans thousands of financial products and services, with new ones coming online daily, and together they account for hundreds of billions of transactions every day. Each product and service creates an opportunity for money laundering, and fraudsters aggressively seek loopholes and vulnerabilities in them, using new techniques to launder funds.

As financial institutions continue to revamp their AML efforts across know-your-customer (KYC) processes, monitoring, and investigations, this blog explores the architectural considerations that can strengthen their analytics for AML monitoring. The recommendations below are based on how leading financial services customers use Starburst to access and query their relevant federated data to improve their toolkits in the fight against financial crimes.

Challenges with AML Monitoring today

Monitoring and detection are based on identifying sequences of events and transactions that raise red flags and prompt an investigation of potentially fraudulent activity. What constitutes a red flag is driven by:

  1. observation of previous laundering activities,
  2. risk factors and thresholds that result from KYC profiling, e.g., which individuals are more likely to engage in money laundering, and
  3. specific knowledge of individual financial instruments and transactions. 

As you might expect, these factors are constantly changing and evolving, as is the regulatory landscape, e.g., new country-level policies on cryptocurrencies.
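One way to picture a "sequence of events" red flag is structuring, where several deposits that each stay under a reporting threshold add up to more than the threshold within a short window. The sketch below is a minimal Python illustration, not a production rule: the `Txn` type, the `find_structuring` function, the 10,000 threshold, and the 3-day window are all assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Txn:
    customer_id: str
    amount: float
    day: date


def find_structuring(txns, threshold=10_000, window_days=3):
    """Return customer IDs whose sub-threshold transactions sum to at least
    `threshold` within any span of `window_days` consecutive days."""
    by_customer = defaultdict(list)
    for t in txns:
        if t.amount < threshold:  # single large transactions are handled by other rules
            by_customer[t.customer_id].append(t)

    flagged = set()
    window = timedelta(days=window_days)
    for customer_id, items in by_customer.items():
        items.sort(key=lambda t: t.day)
        start = 0
        running = 0.0
        for t in items:
            running += t.amount
            # Drop transactions from the left until the span fits the window.
            while t.day - items[start].day >= window:
                running -= items[start].amount
                start += 1
            if running >= threshold:
                flagged.add(customer_id)
                break
    return flagged


# Hypothetical sample data.
sample = [
    Txn("C1", 4_000, date(2023, 1, 2)),
    Txn("C1", 3_500, date(2023, 1, 3)),
    Txn("C1", 3_000, date(2023, 1, 4)),
    Txn("C2", 4_000, date(2023, 1, 2)),
    Txn("C2", 4_000, date(2023, 1, 20)),
]
flagged_customers = find_structuring(sample)
```

Here only `C1` is flagged: its three deposits fall inside one 3-day window and total 10,500, while the two `C2` deposits are 18 days apart.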

Therefore, one of the most critical technology factors in the effectiveness of money laundering monitoring is the ability to assess customer activity against historical activity. That requires querying across various data silos, and doing so fast enough not to impede business. In an environment of rapidly changing conditions, e.g., billions of transactions per hour, hundreds of thousands of changing financial products, and millions of customers and entities, rapidly feeding and maintaining detection models depends on the ability to access and process petabytes of data quickly.

A typical data architecture for anti-money laundering is summarized in the following figure.

All post-transaction AML functions (KYC, Monitoring, and Investigations) are carried out on curated data populated into a data lake built on some form of distributed object storage (never directly against the transactional source systems). Relevant data from the transactional source feeds (transactions, products, customers, etc.) is typically dumped in raw form directly into a staging area of the data lake (Land). Data integration processes (ELT) cleanse and transform the data into the dimensional model structures familiar from data warehousing for high-performance access and processing (Structure). To satisfy the high-performance requirements of AML functions, the data is further refined into analytical data objects precisely tuned for specific AML operations (Consume).
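The three stages can be pictured with a toy pipeline. The following Python sketch is purely illustrative: the field names, the cleansing rules, and the per-customer total are assumptions made for the example and do not describe any particular institution's data model.

```python
from collections import defaultdict

# Land: raw records as they might arrive from a source feed (hypothetical fields).
landed = [
    {"txn_id": "T1", "cust": " c1 ", "amt": "2500.00"},
    {"txn_id": "T2", "cust": "C1", "amt": "9800.50"},
    {"txn_id": "T3", "cust": "c2", "amt": "not-a-number"},  # malformed row
]


def structure(raw_rows):
    """Structure: cleanse and type the raw rows, dropping rows that fail to parse."""
    clean = []
    for row in raw_rows:
        try:
            amount = float(row["amt"])
        except ValueError:
            continue
        clean.append({
            "txn_id": row["txn_id"],
            "customer_id": row["cust"].strip().upper(),
            "amount": amount,
        })
    return clean


def consume(clean_rows):
    """Consume: build a per-customer aggregate tuned for one monitoring rule."""
    totals = defaultdict(float)
    for row in clean_rows:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


structured = structure(landed)
customer_totals = consume(structured)
```

Note that each stage in this sketch materializes another copy of the data, and every extra hop adds delay between the raw feed and the object a detection model actually reads.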

The significant data-related challenges in this architecture that affect AML effectiveness include, but are not limited to, the following:

  • Raw data access/query performance – detection models must run fast to be effective, making near-real-time access to all relevant data a critical requirement.
  • Proliferation of data copies – increases costs, risk, and time from moving and duplicating data just to make it usable.
  • Data latency and relevance – expanding on the point above, the longer it takes to get the data from raw data feeds to analytical models, the greater the risk that a money laundering event will be detected too late to take effective action.
  • Limited drill through to detailed data – effective investigations and rapid elimination of false positives require agile access to detailed data that may reside in disparate data sets and sources.  Efficient drill-through access is critical to successful investigations.
  • Regulatory compliance complexity – limited agility in adapting to changing rules across hundreds of jurisdictions.
  • Data sovereignty – mandated data privacy protection and propriety across 100+ countries.

Failing to address the above challenges adequately leads to subpar monitoring capabilities, and the resulting impact and costs are significant.

  • Delayed detection – repercussions from delayed detection can vary depending on the size and frequency of the infraction. Banks can suffer financial penalties, legal issues such as financial crimes charges, increased regulatory scrutiny, reputation damage, and, in very extreme cases, loss of license.
  • Delayed deployment of new prevention models – any amount of delay is too long in the fight against money laundering. Delays can increase the risk of fraudulent activities, lead to an inefficient AML process, increase operational costs, and cause non-compliance with regulatory requirements.
  • Lengthy and delayed investigations – can lead to financial losses, reputation damage, regulatory scrutiny, increased risk exposure, and higher operational costs for banks. 
  • Multiple data copies carry costs and risks – while having multiple copies of read-only backup for high-alert fraud/AML data is required, there are other forms of duplication created by moving and copying data to make it usable. This can carry high costs and risk challenges for AML practices. Banks should carefully consider the potential risks of data inconsistencies, increased exposure to data breaches and complexity, and rising storage costs of multiple data copies.
  • Risk of losses and regulatory fines – in addition to the loss of funds, 100 AML fines totaling $3.85 billion were issued globally between 2020 and 2021. Notably, although the United States Bank Secrecy Act has been considered the de facto standard for AML regulations, there has been an increasing volume of fines from outside the US, mostly from the UK and the EU.³

Starburst blueprint for a modern Anti-money Laundering Monitoring center of excellence

Built on Trino, the open-source standard for SQL query engines, Starburst easily integrates into your existing AML architecture by connecting to 50+ data stores. Starburst eliminates the need for extraneous copying and moving data. Instead, individual Starburst connectors are enhanced with table statistics, aggregate pushdown, dynamic filtering, parallelism, and more. Together they provide a single point of access to all your data and create a data consumption layer for your data team. A single query can return results from data in Hadoop, S3, Snowflake, ADLS, Delta Lake, BigQuery, Kafka, Redshift, and many others.

Traditional SQL engines strain when querying data lakes and have difficulty with large data sets spread across multiple sources. Starburst provides a highly efficient, parallelized execution path that speeds up queries and cuts time to insight to minutes. With true separation of data storage and compute, Starburst future-proofs an analytics architecture so that it can make better use of best-of-breed BI applications today and tomorrow. Advanced caching, in addition to aggregate pushdown, dramatically improves performance on RDBMS sources. And with federated cost-based query optimization, Starburst brings the efficiency and flexibility to accelerate time to insight with high concurrency.

Lastly, Starburst reinforces security within your AML architecture.  Security features such as end-to-end encryption, different authentication types, fine-grained access control, detailed security auditing, and more continue to provide security and AML teams with the needed reassurance that the right people have the right level of access at any time. This enables companies to integrate with a centralized security framework to manage fine-grained access control across an enterprise data lake and other data sources. In addition to our out-of-the-box security features, AML teams can leverage our deep integrations with our premier security partners.

In addition to the core platform capabilities, Starburst further enhances AML monitoring with the following:

  • Starburst Stargate allows users to access multiple data sources that are regionally or globally dispersed to enable cross-cloud and multi-region analytics while maintaining compliance with data sovereignty requirements. With Stargate, users can create a virtual data warehouse that combines data from different sources into a single logical read-only view without data movement or replication. This makes it easier to query and analyze data across the organization without complex ETL processes.
  • Warp Speed increases query performance by up to 7x, and it can reduce cloud compute costs by up to 40% on AWS. Starburst Warp Speed splits data into blocks and automatically selects the most effective index for each block, so that data is available for fast analysis. It automatically caches frequently accessed data and gives data teams the tools to adjust settings to meet performance and budget requirements. Workload-level monitoring detects hot data and bottlenecks, pointing data engineering teams to areas for improvement.
  • Data Products allows teams to create, publish, discover, and manage curated datasets. Each data product is a schema that holds a collection of one or more datasets, which are views or materialized views (if supported by the catalog). Data products are not limited in the number of datasets they can contain.  They can also be secured with fine-grained access control, ensuring consistent governance from source level to data products.

AML in practice with Starburst

The goal of any successful AML practice, beyond civic duty, is to detect money laundering activities before the regulators do and to file the necessary documentation, such as SARs, to avoid the millions in fines from a third infraction.

A Starburst-powered AML monitoring practice could use the following approach:

  1. Identify the risks you want to observe and mitigate.
    • External – risks flagged by regulators 
    • Internal – AML risks for each financial product/service the institution offers.
  2. Design a model to detect and/or prevent the risk from occurring.
    • Modeling tools can use Starburst as the single point of access to more volume and diversity of data.
    • More relevant data from diverse sources yields better model training; typically, three or more months of consumer data is needed for training.
    • Know Your Customer (KYC) context is applied to the monitoring models to adjust threshold sensitivities for entities previously identified as higher-risk actors.
  3. Create threshold rules to flag suspicious transactions and generate alerts.
  4. Continuously retrain machine learning models to fine-tune risk thresholds.
  5. Once a suspicious transaction alert is triggered, an investigation is initiated.  Starburst allows the investigator(s) to drill into and explore any relevant information to understand the suspicious event better.  Investigators can rapidly and directly query any required data repositories to get answers instantly instead of relying on AML teams to send time-consuming extracts from their databases.
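Steps 2 and 3 above, where KYC context adjusts threshold sensitivity and threshold rules generate alerts, can be sketched as follows. This is a minimal illustration under stated assumptions: the risk tiers, the multipliers, the base amount of 10,000, and the choice to treat customers without a KYC record as high risk are all hypothetical, not values from any regulation or from Starburst.

```python
from dataclasses import dataclass

# Hypothetical multipliers: a higher KYC risk tier lowers the alert threshold.
RISK_MULTIPLIER = {"low": 1.0, "medium": 0.75, "high": 0.5}
BASE_THRESHOLD = 10_000  # illustrative base amount, not a regulatory figure


@dataclass(frozen=True)
class Transaction:
    txn_id: str
    customer_id: str
    amount: float


def alert_threshold(risk_tier: str) -> float:
    """Scale the base threshold by the customer's KYC risk tier."""
    return BASE_THRESHOLD * RISK_MULTIPLIER[risk_tier]


def generate_alerts(transactions, kyc_tiers):
    """Return (txn_id, customer_id, threshold) for each transaction at or
    above its customer's threshold. Customers with no KYC record are
    treated as high risk (an assumption made for this sketch)."""
    alerts = []
    for txn in transactions:
        tier = kyc_tiers.get(txn.customer_id, "high")
        threshold = alert_threshold(tier)
        if txn.amount >= threshold:
            alerts.append((txn.txn_id, txn.customer_id, threshold))
    return alerts


kyc_tiers = {"C1": "low", "C2": "high"}
transactions = [
    Transaction("T1", "C1", 6_000),   # below the low-risk threshold of 10,000
    Transaction("T2", "C2", 6_000),   # above the high-risk threshold of 5,000
    Transaction("T3", "C1", 12_000),  # above the low-risk threshold
]
alerts = generate_alerts(transactions, kyc_tiers)
```

The same 6,000 transaction is ignored for the low-risk customer and alerted for the high-risk one, which is the effect the KYC adjustment is meant to have. In practice the multipliers would come from the retrained models in step 4 and would not be hard-coded.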

Sample Anti-money laundering SQL queries:

With Starburst, users can access and analyze more complete data using their existing tools and skills based on ANSI SQL. Furthermore, with capabilities like Data Products, Starburst Stargate, and Query Federation, users can query data across multiple sources and clusters with simple SQL syntax. This avoids the complex SQL queries and data integration operations that would otherwise be needed to get the same results, each of which introduces opportunities for query failure and risk.

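Without Starburst:
SELECT ...  -- customer name, address, and ID; transaction ID, amount, and date
FROM ...    -- the transactions and customers tables
WHERE transaction_amount >= 10000
  AND transaction_date BETWEEN DATE_SUB(NOW(), INTERVAL 30 DAY) AND NOW()

With Starburst:
SELECT ...  -- same columns as above
FROM starburst.data_products.transactions AS t
JOIN starburst.data_products.customers AS c ON t.customer_id = c.customer_id
WHERE t.transaction_amount >= 10000
  AND t.transaction_date BETWEEN now() - INTERVAL '30' DAY AND now()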
*These are example queries; specific AML requirements can vary depending on the jurisdiction and industry. It’s important to consult with legal and compliance experts to ensure that your AML monitoring process meets all applicable regulations.
The first query, without Starburst, selects information from two tables: transactions and customers. It returns the customer’s name, address, ID, and transaction information, including the transaction ID, amount, and date.

The WHERE clause includes two conditions. First, it selects transactions with an amount greater than or equal to $10,000. This threshold is often used as an indicator of potential money laundering activity. Second, it selects transactions that occurred within the last 30 days. This is based on the requirement of most AML regulations to monitor transactions in near-real-time.


The second query is similar to the first, but it uses Starburst. The FROM clause specifies the data products created with Starburst, where the transactions and customers tables are stored. The JOIN clause joins the two tables on the customer_id column.

The query uses the WHERE clause to filter transactions with an amount greater than or equal to $10,000 and a transaction date within the last 30 days.
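To try the shape of this query without a Starburst cluster, the sketch below runs an equivalent join with Python's built-in `sqlite3` module, using two in-memory databases to stand in for two separate sources. This is only an analogy: SQLite's `ATTACH` is not Starburst federation, the `customers_db` schema name, the `name` and `address` columns, and the sample rows are invented for the example, and the reference date is passed in as a parameter (rather than taken from the current time) so that the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A second in-memory database plays the role of a separate data source.
conn.execute("ATTACH DATABASE ':memory:' AS customers_db")

conn.execute(
    "CREATE TABLE transactions ("
    "transaction_id TEXT, customer_id TEXT, "
    "transaction_amount REAL, transaction_date TEXT)"
)
conn.execute(
    "CREATE TABLE customers_db.customers (customer_id TEXT, name TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("T1", "C1", 15000.0, "2023-03-20"),  # large and recent: should match
        ("T2", "C1", 500.0, "2023-03-21"),    # recent but below the threshold
        ("T3", "C2", 20000.0, "2022-12-01"),  # large but older than 30 days
    ],
)
conn.executemany(
    "INSERT INTO customers_db.customers VALUES (?, ?, ?)",
    [("C1", "Alice Example", "1 Main St"), ("C2", "Bob Example", "2 Side St")],
)

# Same two filters as the article's query: amount >= 10000 and last 30 days.
QUERY = """
SELECT c.name, c.address, c.customer_id,
       t.transaction_id, t.transaction_amount, t.transaction_date
FROM transactions AS t
JOIN customers_db.customers AS c ON t.customer_id = c.customer_id
WHERE t.transaction_amount >= 10000
  AND t.transaction_date BETWEEN date(:as_of, '-30 days') AND :as_of
"""

rows = conn.execute(QUERY, {"as_of": "2023-03-31"}).fetchall()
conn.close()
```

With the sample rows above, only transaction `T1` satisfies both conditions, so `rows` contains a single record for customer `C1`.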

In addition, the query can take advantage of Starburst Stargate, a universal data access layer that allows users to execute cross-cluster queries without data movement, helping to enforce data sovereignty requirements. The transactions and customers tables can come from different data sources, such as a Hadoop data store in India, cloud object storage in the US, or other databases worldwide.

Architectural considerations for enhancing your AML monitoring systems

As banks and other financial institutions continue to ramp up their AML practices with modern capabilities, here are some architecture considerations to factor in when making updates:

  1. Data integration and management – establish a robust data governance strategy that ensures data quality, accuracy, completeness, and consistency across all systems. Integration and management of data sources must be streamlined and automated to ensure the data is quickly accessible and accurate for effective monitoring and investigation. Does the considered architecture allow this without data duplication and excessive data movement? Do you maintain/gain optionality or move towards lock-in? Can you analyze data from multiple channels, including digital, branch, ATM, and call center transactions? 
  2. Cloud-based infrastructure – this doesn’t mean being all in on the cloud, but cloud-based infrastructure does provide a flexible, scalable, and cost-effective solution for storing and processing large amounts of data. Cloud-based (including hybrid) infrastructure allows financial services firms to scale up or down based on demand and provides quick access to data from any location. This matters especially with rising data sovereignty policies like the 2018 Payment and Settlement Systems Act of India, although this regulation or its successor has yet to come to fruition.
  3. Real-time monitoring – consider how your architecture supports using real-time analytics and machine learning algorithms to identify patterns and anomalies in transactional data and detect suspicious activity. Can you easily access relevant data without complex, time-consuming ETL processes? Can you run extremely fast analytics without ‘breaking the bank’? Pun intended. 
  4. Collaboration and investigation – how does the architecture today enable a collaborative investigation approach to ensure effective AML monitoring? AML teams need to be able to collaborate with cross-functional teams. Giving them secure access to all the relevant data is part of the equation, as is leveraging data products to ensure everyone is reading from the same data sets.
  5. Regulatory compliance – How flexible is the architecture? Is it easy to adapt to the changing regulatory requirements? Financial institutions must adhere to multiple regulatory requirements across different jurisdictions. AML monitoring systems should be designed to incorporate the latest regulatory guidelines, and updates should be made in real-time to ensure ongoing compliance. This may be difficult if your architecture relies heavily on rigid, closed, and proprietary systems. 


Read “Has the notion of a single data source for Financial Services run its course?” for additional thoughts on the future of data architectures in Financial Services.


Moving forward

AML monitoring is a critical aspect of financial institutions’ operations. However, with the constantly changing landscape of financial crimes and increasing regional regulations and scrutiny, monitoring systems must be continuously updated and improved to detect and prevent fraudulent activities effectively. Financial institutions must leverage technology and data analytics to enhance their AML monitoring systems and establish a robust data governance strategy to ensure the accuracy and reliability of their data sources. By adopting Starburst’s data lake analytics platform, financial institutions gain simplicity, access, agility, and optionality to establish a single point of access for a modern AML monitoring center of excellence that can keep up with the ever-changing landscape of financial crimes. To learn more, visit Starburst for financial services and read how FINRA uses Starburst to process 100 billion new rows of data from 25+ countries daily to catch fraud, insider trading, and abuse.


¹ Fraud and Financial Crimes – FindLaw

² Overview (unodc.org)

³ Global Enforcement of Anti-Money Laundering Regulation: Shift in Focus | Kroll
