Datanova for Data Scientists: A Quick Recap

By: Jacqueline Vail
July 21, 2021

On July 14, after two successful Datanova conferences, we completed the third installment of the Datanova series, this time curated for the Data Scientist persona! We discussed a variety of topics, including the intelligent edge, data lakehouses, Data Mesh, and Trino, to share insights on emerging trends and best practices.

We were pleased to be joined by many of our partners, data scientists, and thought leaders for great discussions on all things data science. Thank you to our guests from Assurance, BNP Paribas, Red Hat, Salesforce, Thoughtworks, Trino, and Zalando.

Thank you to all who participated! If you didn’t get the chance to join us live, check out our on-demand sessions below.


The Intelligent Edge and Event Science

We kicked off Datanova with an insightful session on the ‘Intelligent Edge’ by leading data scientist Kirk Borne. Drawing on metaphors from Formula 1 racing and quotes from Yogi Berra, he explained how the intelligent edge and sensor data can help data scientists make better analytical decisions.

Data is at the forefront of everything: “Innovations are inspired by data, informed by data, enabled by data, and create value from data.” New intelligent edge technologies, such as model monitoring, robotics, and drones, use AI and ML to let data users “see around the corner”: monitor important data, detect early signs of risk events, and discover the right questions to ask of their data. To make the most accurate predictions, data scientists need the most accurate data.

Stream it on demand here.


Accelerating Data Science with Trino

The next panel featured Brian Luisi, Starburst Regional Manager, and Starburst’s CTOs, Dain Sundstrom and David Phillips. They discussed how Starburst and Trino support the responsibilities of data scientists, including collecting data, interacting with data, and building predictive models. They also covered what data access means for data scientists, Trino use cases, and best practices for data scientists.

Data scientists need an engine that can interact with all of their data and sift through it quickly, which makes profiling and exploration easier. Trino and Starburst allow data scientists to do this by providing a single point of access to large amounts of data. David described Trino as “fast and distributed, so if you need to process data, you can process it way faster than you could on a Python script.” For analyzing large volumes of data, Trino’s SQL-based massively parallel processing (MPP) query engine offers real value to data scientists looking to speed up their processing.

Stream it on demand here.


Data Lakehouse: A New Architectural Horizon

The data lakehouse is a hot topic in the big data field, and this discussion, led by Paco Nathan, explored its pros and cons. Although the lakehouse combines many of the best aspects of a data warehouse with the best aspects of a data lake, adopting it brings some complications.

The panelists, Adri Purkayasth from BNP Paribas, Anjali Samani from Salesforce, and Tom Nats of Starburst, debated what it means for organizations to implement this data architecture, including regulatory considerations, standardization, and the struggle of blending old architecture with new. Anjali said, “To realize a lot of value out of your data science investments, you need access to alternative data sources and that’s where data lakehouse architecture is very attractive.”

Stream it on demand here.


What Data Mesh Means for Data Scientists

Data Mesh is both a technical and an organizational approach to managing and accessing data. We were lucky enough to have Zhamak Dehghani, who coined the term Data Mesh, join us for this discussion on how Data Mesh affects data scientists.

By decentralizing data ownership and treating data as a product, organizations can expose and share data with those who need it most, including data scientists. Zhamak described the conventional assumption that Data Mesh challenges: “To get value from data, particularly for analytical purposes, when we want to make predictions or we want to discover trends, we have to aggregate and centralize data in one place, under the control of one set of technologies and specifically under the control of a centralized team.”

The question of data ownership is a huge pain point for data scientists. With a Data Mesh architecture, however, scientists can access data where it lives and contribute more directly to overall business goals. In a conversation moderated by Sophie Watkins of Red Hat, the panel, comprising Daniel Abadi, professor of Computer Science at the University of Maryland; Max Schultze of Zalando; and Zhamak Dehghani of Thoughtworks, discussed pain points, best practices, and the overall principles of Data Mesh.

Stream it on demand here.


Customer Lightning Talk

Mitchell Poslums, Senior Data Scientist at Assurance, an online insurance distribution platform, presented on how Trino and Starburst “have really enabled [Assurance] to accelerate time to insights, improve our conversion rates, and enable robust modeling all coalescing to achieve better business outcomes.”

To provide the best service to its customers, Assurance needs to get the most out of its data. For any technology business, real-time data is crucial; insights can’t be achieved without accurate, timely data. With Starburst and Trino, Assurance was able to remove its blockers to insight and provide the best solutions to its customers.

Stream it on demand here.


Partner Spotlight: Red Hat OpenShift Data Science

Red Hat has launched a new product, Red Hat OpenShift Data Science, which lets data scientists build production-ready models as quickly as possible. It also readily supports a Data Mesh paradigm.

Through a video demonstration, Karl Eklund, Principal Architect at Red Hat, showed how this tool provides a central location to explore data and build models. There is also no vendor lock-in, so users can pair it with other tools, such as Starburst, to get more out of the platform. With these technologies together, “data scientists have everything they need to be successful in consuming data, building models, and then deploying and monitoring them.”

Stream it on demand here.


But wait, there’s more…

Data Leaders, it’s your turn next! Stay tuned for Datanova for Data Leaders. Details will be shared soon, so watch this space for the latest updates!

Jacqueline Vail

Marketing Communications Specialist, Starburst


