
Scalable analytics to ensure market integrity

This organization governs brokers and broker-dealer firms in the United States to ensure the integrity of America’s financial system. Starburst serves as a scalable, cost-effective way for the organization to analyze its constantly growing volumes of data.

  • 80 million trade records per day
  • 25+ data sources
  • 20X faster queries
  • Region: Americas
  • Industry: Financial services
  • Environment: AWS
  • Solution: Starburst Enterprise
  • Employees: 1,000+

"Starburst separates compute and storage, making it possible to scale economically and analyze 25 PB of data: 100 billion rows of new data per day from 25+ sources."

Anonymous, Director of Data Analysis

  • About

    This not-for-profit organization is authorized by the U.S. Congress to regulate a critical part of the securities industry: brokerage firms doing business with the public. One way this self-regulatory organization carries out its mission is by analyzing close to 80 billion trading events daily from financial institutions to detect fraud, insider trading, and abuse. Because data must be retained for years, the terabytes of new data added each day accumulate into many petabytes over time.

    To address the challenges of massive data growth and increasing demand for efficient computing, the customer migrated its legacy data warehousing systems to an Amazon Web Services (AWS) data lake. When redesigning its data platform, the customer chose to separate compute and storage and query its multi-petabyte AWS data lake using Starburst Enterprise, the world’s fastest distributed SQL query engine.

  • Challenge

    Until a few years ago, the customer ran its data warehousing infrastructure on premises. Organizational barriers and scalability limitations forced it to create separate analytic silos, each handling a subset of the entire dataset, and the resulting data fragmentation made analytics difficult. Growing data volumes and analytical needs also began to exceed the capacity of its legacy systems, and scaling them was expensive and difficult. Systems had to be sized for peak capacity, which made them very costly, and expanding them required long procurement processes. Data also had to be moved constantly, so analytics took far too long and time to insight suffered.

  • Solution

    Shifting to a Scalable AWS Data Lake

    To address its mounting data storage and processing challenges, the organization decided to completely rethink its data platform. In 2014, it chose to move from on-premises infrastructure to an AWS data lake model. Today, its cloud data lake consists of:

    • Elastic compute clusters (both long-standing and transient)
    • A central catalog (metadata repository)
    • Amazon Simple Storage Service (Amazon S3) cloud storage (object store)

    Starburst Enterprise

    One remaining challenge was to select an interactive SQL engine that could match the performance of the legacy MPP SQL systems, since query SLAs for ad-hoc analytics are measured in seconds. The organization chose Starburst Enterprise because it was the only SQL engine able to operate at petabyte scale in the cloud and execute concurrent queries interactively against data stored on Amazon S3. Strong references from other well-known users of Trino (the open source engine on which Starburst Enterprise is built), such as Facebook, Netflix, and Airbnb, were also crucial, as were Starburst Enterprise's enhancements and enterprise support.

    Starburst Enterprise’s proven integration with AWS was another essential feature. “Starburst was very data-lake friendly,” says the Director of Data Analysis. “It was as if it was built for that model. That was a key differentiator for us. We were very invested in the data lake.”

    Today, the organization uses Starburst Enterprise for ad-hoc data profiling, BI, and reporting. Teams of data analysts and scientists run multiple concurrent SQL queries through JDBC and ODBC clients. Starburst Enterprise authenticates each request against Active Directory using LDAP and authorizes it through table permission checks in the Hive Metastore. Finally, during query execution, the underlying Trino engine reads the ORC table data directly from Amazon S3.
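    For readers who want to see how this kind of request flow is typically wired together, the following is a minimal sketch in open source Trino configuration syntax. It assumes a Hive connector catalog backed by a Hive Metastore service and Amazon S3, SQL-standard authorization using privileges stored in the metastore, and LDAP password authentication against Active Directory. The catalog name, host names, region, and domain are hypothetical placeholders rather than the customer's actual configuration, and property names can differ between Trino and Starburst Enterprise releases.

    ```properties
    # etc/catalog/hive.properties (hypothetical catalog named "hive")
    # Table metadata comes from a Hive Metastore service; table data
    # (for example, ORC files) is read directly from Amazon S3.
    connector.name=hive
    hive.metastore.uri=thrift://metastore.example.internal:9083
    # Enforce SQL-standard privileges defined in the Hive Metastore.
    hive.security=sql-standard
    # Native S3 file system support (recent Trino releases; older
    # releases use different S3 property names).
    fs.native-s3.enabled=true
    s3.region=us-east-1

    # etc/config.properties (coordinator): require username/password logins.
    # Password authentication also requires HTTPS to be configured.
    http-server.authentication.type=PASSWORD

    # etc/password-authenticator.properties: validate passwords against
    # Active Directory over LDAPS (placeholder host and domain).
    password-authenticator.name=ldap
    ldap.url=ldaps://ad.example.internal:636
    ldap.user-bind-pattern=${USER}@corp.example.com
    ```

    With a setup like this, a JDBC client would connect using a URL of the form `jdbc:trino://coordinator.example.internal:443/hive?SSL=true` and supply an Active Directory username and password; queries against tables the user lacks privileges on would then be rejected by the metastore-based authorization check.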

    The customer has also built several interactive web applications that use Starburst as their backend SQL query engine to access data in the Amazon S3 data lake.

  • Results

    Faster Insights at Lower Cost

    In addition to various new features and optimizations, moving to AWS and adopting Starburst Enterprise gives the organization a number of advantages over its legacy platform, including:

    • Scalability – no need to worry about data storage or compute resources
    • Elasticity – scaling compute up and down as desired, no need to provision for peak usage anymore
    • Accessibility – no silos and time to insight vastly reduced
    • Performance – upgrades to Starburst Enterprise have resulted in 20X faster queries
    • Flexibility – ability to use the best tool for a given analytical use case
    • Cost efficiency – thanks to AWS and cloud economies of scale

    Using Starburst together with Amazon S3 cloud storage eliminates the need to invest in expensive proprietary big data appliances to support ever-increasing volumes of data. Working with Starburst has also significantly reduced the organization's Amazon Elastic Compute Cloud (Amazon EC2) costs.

    The organization can now analyze its data interactively and ad hoc, without the data copying and loading required in the past. The migration from the legacy data warehousing systems was seamless for end users, and researching market manipulation and investigating potential fraud are now faster than before.

    Overall, Starburst gives the customer a scalable, cost-effective way to analyze its constantly growing volumes of data, which it needs to investigate potential abuse cases and to conduct ad-hoc exploratory analyses looking for new fraud schemes. "We monitor market data for trading fraud," says the Director of Data Analysis. "Starburst separates compute and storage, making it possible to scale economically and analyze 25 PB of data: 100 billion rows of new data per day from 25+ sources."
