The vision of a single global cloud holding all of a company’s data has faded as countries assert principles of data sovereignty. Rather than building an efficient, unified cloud infrastructure, companies must manage multiple data silos, each optimized to comply with a particular country’s data regulations.
This guide will introduce data sovereignty, what’s driving countries to exert control over data, and a modern approach to analytics that unifies fragmented storage architectures.
Published: August 9, 2023
Data sovereignty is a legal concept defining jurisdiction over data. Specifically, sovereignty establishes the principle that any data collected or stored within a country is subject to its laws and regulations.
The United Nations Conference on Trade and Development reports that 70% of countries regulate how companies collect, store, and use data about their citizens.
Data residency describes where data is physically stored. The term can also apply more broadly to the compliance practices a company follows in each geographical location.
Consider a Los Angeles-based direct-to-consumer company. It stores customer data on on-premises servers in Los Angeles and in Amazon Web Services (AWS) data centers in Oregon and France. The company’s data resides in at least five jurisdictions:
- United States
- California
- Oregon
- European Union
- France
The company’s data residency practices will map which data is stored where to help it comply with data regulations in each jurisdiction.
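A residency map can start as a simple lookup from each storage location to the jurisdictions whose laws reach it. The Python sketch below models the hypothetical company above; the location labels, `RESIDENCY_MAP`, and both helper functions are illustrative assumptions, not any particular governance tool’s schema.

```python
# Hypothetical residency map for the Los Angeles company described above.
# Keys are storage locations; values are every jurisdiction with authority there.
RESIDENCY_MAP: dict[str, set[str]] = {
    "on-prem: Los Angeles": {"United States", "California"},
    "AWS: Oregon": {"United States", "Oregon"},
    "AWS: France": {"European Union", "France"},
}


def jurisdictions(residency_map: dict[str, set[str]]) -> set[str]:
    """Return every jurisdiction in which at least one data store resides."""
    result: set[str] = set()
    for places in residency_map.values():
        result |= places
    return result


def stores_in(residency_map: dict[str, set[str]], jurisdiction: str) -> list[str]:
    """List, in sorted order, the data stores subject to one jurisdiction's laws."""
    return sorted(
        store for store, places in residency_map.items() if jurisdiction in places
    )


if __name__ == "__main__":
    print(sorted(jurisdictions(RESIDENCY_MAP)))
    print(stores_in(RESIDENCY_MAP, "United States"))
```

Keeping the map as data rather than prose means questions like “which stores are subject to EU law?” can be answered mechanically as storage locations change.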
Complicating matters further, residency is not the sole determinant of sovereignty. The data’s origin, or provenance, also matters: many data privacy laws apply to data collected within a country’s jurisdiction, no matter where it is stored.
Regulations based on data sovereignty benefit people, economies, and societies, but companies bear the burden of compliance by implementing the protections these regulations require.
As American companies like Microsoft and Amazon came to dominate the cloud, people worldwide grew concerned about their data privacy. Data sovereignty gives countries control of data created and stored within their borders to protect their citizens.
Most data regulations protect individual privacy by giving people the right to decide how organizations collect and use their personal data.
Concerned about US intelligence gathering under laws like the USA PATRIOT Act, many nations expanded their definition of data sovereignty to include data localization: requiring data to remain within their borders.
Data sovereignty is not limited to rules protecting personal privacy. Some industries and government agencies impose data residency requirements that prevent sensitive data from leaving national borders.
Data fragmentation is a side effect of data sovereignty. A company can’t leverage economies of scale by consolidating its data storage. Instead, it must manage data in multiple locations and jurisdictions. This fragmented infrastructure has significant impacts on data management efficiency and business decision-making.
Data infrastructure costs rise as companies decentralize storage across multiple cloud providers and on-premises data centers. This fragmentation also creates significant hidden costs in the form of egress fees, which cloud service providers charge when customers move data out of their platform or between regions.
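The arithmetic behind these hidden costs is straightforward: every pipeline that copies regional data to another location pays the egress rate on every gigabyte, every run. The sketch below assumes a flat, hypothetical per-gigabyte rate supplied by the caller; real pricing varies by provider, region, destination, and volume tier, and the pipeline names are invented for illustration.

```python
def monthly_egress_cost(
    daily_gb_by_pipeline: dict[str, float],
    usd_per_gb: float,
    days: int = 30,
) -> float:
    """Estimate a month of egress charges for recurring cross-location copies.

    usd_per_gb is a hypothetical flat rate; actual cloud pricing is tiered
    and depends on where the data leaves from and where it goes.
    """
    daily_gb = sum(daily_gb_by_pipeline.values())
    return daily_gb * days * usd_per_gb


if __name__ == "__main__":
    # Hypothetical nightly pipelines copying regional data sets between silos.
    pipelines = {"eu_to_us_orders": 150.0, "us_to_eu_inventory": 50.0}
    cost = monthly_egress_cost(pipelines, usd_per_gb=0.05)
    print(f"Estimated monthly egress: ${cost:,.2f}")
```

At the placeholder rate of $0.05 per gigabyte, 200 GB a day comes to $300 a month, and the figure grows linearly with every additional pipeline or data set.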
Centralized cloud storage also performs better on the network: data moves within the cloud platform’s high-bandwidth, low-latency internal network. Stitching together national data silos instead imposes bandwidth and latency penalties that can impact business performance.
Data sovereignty adds friction to innovation by inhibiting data insights. Information in one region is no longer accessible to analysts in another without significant coordination. Data teams must develop ETL pipelines to process regional data sets in compliance with data regulations. Only the most critical projects will justify this time and expense, undermining data-driven business cultures.
Data protection and data security are distinct concepts. Secure data is not necessarily protected. And protected data is not necessarily secure.
The purpose of data protection is to ensure the company always has access to the best quality data possible. These practices safeguard data integrity while allowing the recovery of data should those safeguards fail.
Data security’s purpose is to defend data and information systems from unauthorized access. Layers of security technologies and practices defend networks from external threats and data breaches. At the same time, authentication and authorization systems limit access to legitimate users.
Regulations based on data sovereignty principles set expectations for how companies protect and secure their data. Unfortunately, these expectations vary from country to country, making it impossible to set efficient company-wide policies. Instead, the data governance organization must coordinate protection and security policies that meet local requirements everywhere the company collects and stores data.
Sovereignty allows political jurisdictions to grant data privacy rights to their citizens and to set the rules organizations must follow to protect those rights. As mentioned earlier, both residency and provenance determine which privacy regulations apply to the data a company collects.
Returning to our hypothetical direct-to-consumer company, it collects data from Californians visiting its website and stores the data on company servers in Los Angeles. Residency and provenance subject the company to the California Consumer Privacy Act (CCPA). Moving servers out of state would change the data’s residency, but not its provenance, so the company would still have to comply with CCPA.
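The rule in this example, that the applicable laws are the union of those triggered by residency and those triggered by provenance, fits in a few lines. The Python sketch below is deliberately simplified: the two lookup tables, their entries, and the `Dataset` type are hypothetical and encode only the rule as this guide states it, not the full legal tests each law applies.

```python
from dataclasses import dataclass

# Hypothetical lookup tables. Each maps a jurisdiction to the privacy laws
# this simplified model treats as triggered by data stored there (residency)
# or collected there (provenance).
LAWS_BY_RESIDENCY: dict[str, frozenset[str]] = {
    "California": frozenset({"CCPA"}),
}
LAWS_BY_PROVENANCE: dict[str, frozenset[str]] = {
    "California": frozenset({"CCPA"}),
}


@dataclass(frozen=True)
class Dataset:
    name: str
    stored_in: str     # residency: where the data lives
    collected_in: str  # provenance: where the data was collected


def applicable_laws(ds: Dataset) -> frozenset[str]:
    """Return the union of laws triggered by residency and by provenance."""
    by_residency = LAWS_BY_RESIDENCY.get(ds.stored_in, frozenset())
    by_provenance = LAWS_BY_PROVENANCE.get(ds.collected_in, frozenset())
    return by_residency | by_provenance


if __name__ == "__main__":
    in_la = Dataset("web_visitors", stored_in="California", collected_in="California")
    moved = Dataset("web_visitors", stored_in="Oregon", collected_in="California")
    print(sorted(applicable_laws(in_la)), sorted(applicable_laws(moved)))
```

Moving the servers changes `stored_in` but not `collected_in`, so `applicable_laws` still returns CCPA for the relocated data set.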
Three out of four countries have data localization laws requiring the local storage of data collected about their citizens.
China’s localization rules, for example, require in-country storage of personal data and allow data exports only after a formal security review.
At the other extreme, the United States-Mexico-Canada Agreement (USMCA) prohibits its members from requiring one another’s companies to store data locally.
The situation in Europe is murkier. Although the European Union’s General Data Protection Regulation (GDPR) does not explicitly require localization, keeping EU data in the EU may be the safest path to compliance.
GDPR allows organizations to export EU citizens’ data provided “adequate protections” exist at the destination. The Court of Justice of the European Union (CJEU) ruled that the United States does not have those protections and struck down data-sharing agreements between the EU and the US government, most recently the EU-US Privacy Shield in 2020.
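These three regimes pull in different directions, and a data platform has to encode them somewhere. The Python sketch below is a toy model of only the rules as described here: China requires a security review before export, USMCA members may not impose localization on one another, and EU data may leave only for destinations with adequate protections. The `Transfer` enum, the jurisdiction labels, and the `eu_adequate` parameter are assumptions for illustration; a production policy engine would track many more rules and exceptions.

```python
from enum import Enum

USMCA_MEMBERS = frozenset({"Canada", "Mexico", "United States"})


class Transfer(Enum):
    ALLOWED = "allowed"
    NEEDS_SECURITY_REVIEW = "needs security review"
    BLOCKED = "blocked"
    UNKNOWN = "unknown"


def export_decision(
    origin: str,
    destination: str,
    eu_adequate: frozenset[str] = frozenset(),
) -> Transfer:
    """Classify a cross-border transfer under the simplified rules above.

    eu_adequate is a caller-supplied set of destinations treated as having
    "adequate protections" for EU data. It is empty by default, so EU exports
    are blocked unless a destination is explicitly listed.
    """
    if origin == destination:
        return Transfer.ALLOWED  # data never leaves the jurisdiction
    if origin == "China":
        return Transfer.NEEDS_SECURITY_REVIEW
    if origin == "European Union":
        return Transfer.ALLOWED if destination in eu_adequate else Transfer.BLOCKED
    if origin in USMCA_MEMBERS and destination in USMCA_MEMBERS:
        return Transfer.ALLOWED
    return Transfer.UNKNOWN  # no rule modeled for this pair
```

With the default empty `eu_adequate`, `export_decision("European Union", "United States")` returns `Transfer.BLOCKED`, mirroring the CJEU ruling described above, while `export_decision("Canada", "Mexico")` returns `Transfer.ALLOWED`. Returning `UNKNOWN` for unmodeled pairs, rather than defaulting to `ALLOWED`, keeps gaps in the rule table visible.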
Ubiquitous internet access and the power of cloud computing were supposed to make international business operations more efficient. All data would flow into the cloud, which end users could mine for insights that drove better data-based decisions. At the same time, companies could replace local IT infrastructure with more cost-effective cloud computing architectures.
Data sovereignty upends these plans. All data is no longer the same. Companies must handle European, Canadian, Indian, Chinese, and American data differently, implementing data protection and security practices unique to each jurisdiction.
The impact of data sovereignty is not limited to international business. The United States has federal data protection laws for specific industries, such as the healthcare industry’s Health Insurance Portability and Accountability Act (HIPAA), but no overarching protections for American citizens’ data privacy.
America’s lack of a national data privacy regime is one reason for the CJEU’s decision. It also fragments the regulatory landscape within the United States, as states such as California, Colorado, Connecticut, Utah, and Virginia have enacted their own data privacy laws.
This national and international data regulatory patchwork undermines data-driven decision-making. Companies must develop multiple data strategies to ensure compliance everywhere they do business. Even when data from other regions is accessible, it takes longer to analyze. Data scientists must settle for incomplete data sets for their advanced analytics projects. Ultimately, decision-makers cannot fully leverage their company’s vast resources for innovative insights.
Starburst resolves data sovereignty challenges by creating a virtual abstraction layer that unifies disparate data sources into a single point of access. Starburst customers can create clusters with catalogs and data sources specific to each geography. Data always resides at the source, where governance systems can enforce locally appropriate access control policies.
Starburst Stargate links these clusters together to make data globally accessible. When an analyst in the United States generates a query that requires data from the European Union, Stargate pushes the query to the European cluster. There, Starburst applies fine-grained access control policies to ensure the query uses data in accordance with European privacy rules. Data can be aggregated and anonymized so the analyst in America never sees the personal data of European citizens.
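The flow in this paragraph, where the query travels to the data and only aggregated results travel back, can be simulated without any Starburst software. The Python sketch below does not use Stargate, Trino, or any Starburst API; `RegionalCluster`, `federated_count`, and the `min_group_size` threshold (suppressing groups too small to be anonymous) are hypothetical stand-ins for a regional cluster, the coordinating cluster, and a local anonymization policy.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RegionalCluster:
    """Hypothetical stand-in for a regional cluster; raw rows never leave it."""
    region: str
    rows: list[dict[str, str]] = field(default_factory=list)
    min_group_size: int = 1  # local policy: suppress groups smaller than this

    def run_aggregate(self, group_by: str) -> dict[str, int]:
        """Run a pushed-down COUNT ... GROUP BY locally and return only counts.

        Only the grouping column is read; other columns (such as email) never
        appear in the result, and groups below min_group_size are dropped so
        no small set of individuals can be singled out.
        """
        counts = Counter(row[group_by] for row in self.rows)
        return {key: n for key, n in counts.items() if n >= self.min_group_size}


def federated_count(clusters: list[RegionalCluster], group_by: str) -> dict[str, int]:
    """Coordinator: push the query to every cluster and merge the aggregates."""
    merged: Counter[str] = Counter()
    for cluster in clusters:
        merged.update(cluster.run_aggregate(group_by))
    return dict(merged)


if __name__ == "__main__":
    eu = RegionalCluster(
        "eu",
        rows=[{"email": f"eu_user{i}@example.com", "plan": "premium"} for i in range(4)]
        + [{"email": "eu_solo@example.com", "plan": "basic"}],
        min_group_size=3,  # assumed stricter policy for EU personal data
    )
    us = RegionalCluster(
        "us",
        rows=[{"email": f"us_user{i}@example.com", "plan": "basic"} for i in range(5)]
        + [{"email": f"us_vip{i}@example.com", "plan": "premium"} for i in range(2)],
    )
    print(federated_count([eu, us], group_by="plan"))
```

Because suppression happens inside `run_aggregate`, the EU cluster’s single `basic` row is dropped before anything crosses the border; the coordinator sees 6 premium and 5 basic customers and never receives an email address. The threshold of 3 is an arbitrary illustration, not a recommended value.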
Starburst customers use Stargate to generate data-driven insights across geopolitical boundaries while honoring data sovereignty principles anywhere they do business.