This not-for-profit organization is authorized by the U.S. Congress to regulate a critical part of the securities industry: brokerage firms doing business with the public. One way the self-regulatory organization carries out this mission is by analyzing close to 80 billion trading events daily from financial institutions to detect fraud, insider trading, and abuse. Because the data must be retained for years, adding terabytes of new data every day accumulates to many petabytes over time.
To address the challenges of massive data growth and increasing demand for efficient computing, the customer migrated its legacy data warehousing systems to an Amazon Web Services (AWS) data lake. When redesigning its data platform, the customer chose to separate compute and storage and query its multi-petabyte AWS data lake using Starburst Enterprise, the world’s fastest distributed SQL query engine.
Until a few years ago, the customer ran its data warehousing infrastructure on premises. Organizational barriers and scalability limitations forced them to create separate analytic silos, with each handling a subset of the entire dataset. The resulting data fragmentation made analytics difficult. The growing data volume and analytical needs also started to exceed the capacity of its legacy systems. Scaling was expensive and difficult. Solutions were sized to handle peak capacity, which meant that they became very costly. Expanding them was only possible through long procurement processes, and data had to be moved constantly, so it took far too long to perform analytics, slowing time to insight.
Shifting to a Scalable AWS Data Lake
To address its mounting data storage and processing challenges, the organization decided to completely rethink its data platform. In 2014, it decided to move from on-premises infrastructure to an AWS data lake model. Today, their cloud data lake consists of:
One remaining challenge was to select an interactive SQL engine that would match the performance of the legacy MPP SQL systems. For ad-hoc analytics, the query SLAs are measured in seconds. They chose Starburst Enterprise because it was the only SQL engine able to operate at petabyte scale in the cloud and execute concurrent queries interactively against data stored on Amazon S3. Strong references from other well-known Trino users such as Facebook, Netflix, and Airbnb, combined with Starburst Enterprise enhancements and enterprise support, were crucial.
Starburst Enterprise’s proven integration with AWS was another essential feature. “Starburst was very data-lake friendly,” says the Director of Data Analysis. “It was as if it was built for that model. That was a key differentiator for us. We were very invested in the data lake.”
Today, they use Starburst Enterprise for ad-hoc data profiling, BI, and reporting. Teams of data analysts and scientists execute multiple concurrent SQL queries via JDBC and ODBC clients. Starburst Enterprise then authenticates requests with Active Directory using LDAP and authorizes them via Hive Metastore table permission checks. Finally, during query execution, Trino reads the ORC table data directly off Amazon S3.
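The flow described above (a client submits SQL, the cluster authenticates the user, then runs the query against tables on Amazon S3) can be sketched from the client side. This is only an illustration, not the customer's setup: it uses the open-source `trino` Python client rather than the JDBC and ODBC clients mentioned here, and the table name `hive.market.trades`, the column `symbol`, and the host passed to `run_query` are all hypothetical. It also assumes the cluster accepts username and password authentication over HTTPS, which is how a client typically reaches an LDAP-backed coordinator.

```python
import re

# Only plain identifiers are accepted, since they are interpolated into SQL.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_ident(name: str) -> str:
    """Validate a simple SQL identifier and wrap it in double quotes."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def profile_query(table: str, column: str) -> str:
    """Build an ad-hoc profiling query for one column of one table."""
    col = quote_ident(column)
    # A dotted name such as catalog.schema.table is quoted part by part.
    tbl = ".".join(quote_ident(part) for part in table.split("."))
    return (
        f"SELECT count(*) AS row_count, "
        f"count({col}) AS non_null_count, "
        f"approx_distinct({col}) AS approx_distinct_values "
        f"FROM {tbl}"
    )


def run_query(host: str, user: str, password: str, sql: str):
    """Run sql over HTTPS, sending the user's credentials to the cluster."""
    # Imported lazily so the helpers above work without the client installed.
    from trino.auth import BasicAuthentication
    from trino.dbapi import connect

    conn = connect(
        host=host,  # hypothetical coordinator address supplied by the caller
        port=443,
        user=user,
        http_scheme="https",
        auth=BasicAuthentication(user, password),
    )
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


if __name__ == "__main__":
    # Hypothetical table and column, printed rather than executed.
    print(profile_query("hive.market.trades", "symbol"))
```

Authorization and the ORC reads from Amazon S3 happen on the server side, so nothing in the client code refers to them; a query against a table the user may not read would simply fail.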
The customer has also built several interactive web applications that use Starburst as their backend SQL query engine to access data in the Amazon S3 data lake.
Faster Insights at Lower Cost
In addition to various added features and optimizations, moving to AWS and partnering with Starburst Enterprise provides the company with a number of advantages over its legacy platform, including:
Using Starburst together with Amazon S3 storage eliminates the need to invest in expensive proprietary big data appliances to support ever-increasing volumes of data. Working with Starburst also results in a significant reduction of Amazon Elastic Compute Cloud (Amazon EC2) costs.
The company can analyze its data interactively in an ad-hoc manner without the data copying and loading required in the past. The migration from legacy data warehousing systems was seamless to end users, and the process of researching market manipulation and investigating potential fraud is now faster than before.
Overall, Starburst gives the customer a scalable, cost-effective way to analyze its constantly growing volumes of data, which is needed to investigate potential abuse cases and conduct ad-hoc exploratory analyses looking for new fraud schemes. “We monitor market data for trading fraud,” says the Director of Data Analysis. “Starburst separates compute and storage, making it possible to scale economically and analyze 25PB of data — 100 billion rows of new data per day from 25+ sources.”