At Trino Fest 2023, we shared with the community how “Trino has [become] a one-stop shop for analytics” within Salesforce.
Trino has proven to be a game-changer for accessing and processing large volumes of production data quickly and efficiently, enabling our Platform Engineering Performance team to enhance its service availability and provide top-notch customer experiences.
In this post, we’ll delve deeper into how we’ve effectively utilized Trino at Salesforce to empower our Platform Engineering Performance team and implement advanced analytics and anomaly detection strategies.
The case for Trino at Salesforce
Salesforce hosts hundreds of thousands of customers that generate tens of millions of transactions daily, making it essential to have a powerful tool like Trino to efficiently query and analyze vast amounts of data. Trino has become the go-to platform for analytics on observability data, providing valuable insights to various teams (service owners, SREs, performance engineers, and program managers) and helping them enhance service availability.
To make analytics available to all users at Salesforce, internal teams are working to make the data they need accessible in an optimal, easy, and timely fashion, and Trino has helped us achieve that.
Building near-instant reports and dashboards
In the fast-paced world of production environments, time is of the essence when resolving investigations. Trino’s ability to rapidly perform advanced queries and spot patterns has enabled Salesforce to build near-instant reports and dashboards. This ability has accelerated the detection and resolution of potential performance issues (usage activity, utilization, and risks per service), preventing any negative impact on customers.
Trino’s value: Reducing Salesforce’s analytics costs
Our performance team’s access to various data sources (log events, transactional records, etc.) has significantly decreased the overall cost to build advanced analytics and anomaly detection models. Moreover, log analysis provides crucial insights into potential issues before they occur, empowering teams to be proactive by identifying root causes and preventing downtime and performance issues.
By extracting more value from production data and accelerating access to it, we’ve been able to make decisions faster, simplify operations, and automate anomaly detection, ultimately leading to improved cost efficiency. Salesforce has shortened the time required to understand user behavior, enhance application and infrastructure performance, proactively mitigate risks, and ensure compliance with security policies, audits, and regulations.
Trino: Faster than running similar queries in Splunk
Additionally, after our team transitioned from Splunk to Trino, we observed a remarkable 194% improvement in query execution time.
This significant enhancement is crucial for implementing production analytics as it enables rapid access to insights and actionable data, ensuring the maintenance of trust with customers. Trino has played a pivotal role in achieving these goals by providing quick and efficient access to vital information, benefiting the team in optimizing analytics for production processes.
Near real-time availability with extended historical data availability
Trino has proven to be a highly efficient tool, with Salesforce’s internal Trino team offering an impressive Service Level Agreement (SLA) of 20 minutes of latency on all production logs, providing near real-time data availability. This near real-time data plays a crucial role in enhancing productivity and operational efficiency.
The analytics derived from Trino enable the generation of concise reports that cut through the noise generated by vast data collections. Performance engineers and service owners benefit from these easily digestible reports, allowing them to pinpoint the precise information they need for building prediction and forecasting models, as well as assessing performance improvements or degradations over time.
An essential aspect that Trino addresses is the need for historical data (remember the tens of millions of transactions per day?). Unlike the previous data source, Splunk, which had a 30-day retention policy, Trino enables retention and access to data for up to two years. This extended historical data availability facilitates the examination of past and current data, enabling the prediction of performance profiles, seasonality, and daily trends for Salesforce’s customers. By foreseeing scale, capacity, and performance issues well in advance, Trino helps the organization mitigate potential problems proactively.
Moreover, Trino simplifies the process of gathering data from different sources, including app logs, metrics, business logs, and audit trails. Previously, numerous queries were required to collect information about user activity across all production systems. With Trino, these have been replaced by just a few aggregated SQL queries, which streamlines data analysis and saves the team time and resources.
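To make the idea of "a few aggregated SQL queries" concrete, here is a minimal sketch of one query that combines two sources and counts events per user. It is an illustration only: Python's built-in sqlite3 stands in for Trino so the example runs anywhere, and the table names (app_logs, audit_trail), columns, and sample rows are hypothetical, not Salesforce's actual schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical log sources."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE app_logs (user_id TEXT, message TEXT)")
    conn.execute("CREATE TABLE audit_trail (user_id TEXT, action TEXT)")
    conn.executemany(
        "INSERT INTO app_logs VALUES (?, ?)",
        [("u1", "login"), ("u1", "query"), ("u2", "login")],
    )
    conn.executemany(
        "INSERT INTO audit_trail VALUES (?, ?)",
        [("u1", "export")],
    )
    return conn


def user_activity(conn):
    """Return (user_id, source, events) rows from a single aggregated query.

    UNION ALL stacks the sources into one relation, and GROUP BY counts
    events per user and source, replacing one query per source.
    """
    query = """
        SELECT user_id, source, COUNT(*) AS events
        FROM (
            SELECT user_id, 'app_log' AS source FROM app_logs
            UNION ALL
            SELECT user_id, 'audit' AS source FROM audit_trail
        ) AS combined
        GROUP BY user_id, source
        ORDER BY user_id, source
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in user_activity(build_demo_db()):
        print(row)
```

The same UNION ALL plus GROUP BY shape carries over to other SQL engines, though connection handling and catalog naming would differ in a real Trino deployment.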
What Trino can help Salesforce achieve in the future
Trino holds tremendous potential to help Salesforce achieve various goals in the future.
Performance assessment: Understanding our performance improvements and degradation over multiple releases
With its capability to store data for two years and provide easy and efficient access, Salesforce plans to assess performance changes over time for specific services across multiple releases. This analysis will involve exploring key performance indicators (KPIs) such as log volumes, JVM Heap usage, and resource usage on pods, enabling the identification of areas for optimization and improvement.
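As a rough sketch of what a release-over-release assessment could look like, the snippet below computes the percentage change of each KPI between consecutive releases. The release names, KPI names, and values are made up for illustration, and the post does not say how Salesforce actually computes these comparisons.

```python
def percent_change(baseline, current):
    """Percentage change from baseline to current (positive means growth)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) * 100.0 / baseline


def compare_releases(kpis_by_release):
    """Compare each release with the one before it.

    kpis_by_release is an ordered list of (release_name, {kpi: value}).
    Returns a list of (previous, current, kpi, percent_change) tuples for
    every KPI present in both releases.
    """
    changes = []
    for (prev_name, prev), (curr_name, curr) in zip(
        kpis_by_release, kpis_by_release[1:]
    ):
        for kpi in sorted(prev.keys() & curr.keys()):
            changes.append(
                (prev_name, curr_name, kpi, percent_change(prev[kpi], curr[kpi]))
            )
    return changes


if __name__ == "__main__":
    # Hypothetical KPI snapshots for two releases.
    releases = [
        ("release_a", {"jvm_heap_mb": 1000.0, "log_volume_gb": 50.0}),
        ("release_b", {"jvm_heap_mb": 1100.0, "log_volume_gb": 40.0}),
    ]
    for prev, curr, kpi, change in compare_releases(releases):
        print(f"{kpi}: {prev} -> {curr}: {change:+.1f}%")
```

With two years of retained data, the input to a comparison like this could span many releases rather than just the most recent pair.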
Quick and efficient processing: Queries are able to run across more than one production pod at once
Moreover, since data is stored across all geographic regions for both first-party and high-performance cells, Salesforce can extend its models and analytics to include all customers, not just the prominent ones. This expansion will be valuable in understanding the impact of growing customer usage on underlying resources, facilitating proactive prediction and scaling without negatively impacting customers.
Integrating with Tableau dashboards
Our performance team is also working on integrating the data set seamlessly with Tableau, Salesforce’s primary dashboard solution. This will simplify information sharing across teams and enhance data visualization capabilities for better insights.
Real-time anomaly detection
With Salesforce’s internal Trino team providing a 20-minute Service Level Agreement (SLA) on log latency, we are able to achieve near real-time anomaly detection. This empowers performance engineers with prompt data access for troubleshooting and resolving issues swiftly, without any adverse impact on customers.
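For readers unfamiliar with anomaly detection on a metric stream, the following is a generic sketch using a trailing-window z-score. It is not the method Salesforce uses (the post does not describe one); the function name, window size, threshold, and sample latencies are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate sharply from recent history.

    Each point is compared with the mean of the preceding `window` points;
    it is flagged when it lies more than `threshold` sample standard
    deviations away from that mean.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat window gives no scale to compare against.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical response times in milliseconds, with one spike.
    latencies_ms = [100, 102, 98, 101, 99, 100, 250, 101]
    print(find_anomalies(latencies_ms))
```

A detector like this can only react as fast as the data arrives, which is why the log latency SLA matters for how quickly an issue can be flagged.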
Salesforce x Trino: Shaping the future of performance engineering and enhancing customer experiences
Salesforce’s integration with Trino would not have been possible without its internal teams, particularly the Huron team. Our performance team’s collaboration with the Huron team has been crucial in enabling Salesforce to leverage Trino’s capabilities effectively.
In short, Trino has revolutionized the way Salesforce approaches performance engineering, analytics, and anomaly detection. Its efficiency, scalability, and real-time data access have been invaluable in delivering top-tier services to customers and driving continuous improvement.
Salesforce’s partnership with Trino is set to shape the future of performance engineering at the company, bringing even more exciting opportunities for enhancing customer experiences.