Google Spanner
Author: Ronald Fung
Creation Date: 8 June 2023
Next Modified Date: 8 June 2024
A. Introduction
Google Cloud Spanner is a fully managed, horizontally scalable, relational database service provided by Google Cloud. It is designed to handle large-scale, globally distributed, mission-critical applications that require high availability, strong consistency, and low latency.
Google Cloud Spanner provides a range of features that make it a powerful and flexible database service, including:
Horizontally scalable: Google Cloud Spanner is designed to scale horizontally across multiple regions and continents, providing high availability and low latency for global applications.
Strong consistency: Google Cloud Spanner provides strong consistency across all nodes, ensuring that all clients see the same data at the same time, regardless of where they are located.
SQL support: Google Cloud Spanner supports standard SQL, making it easy to migrate existing relational databases to the cloud or develop new applications.
Automatic sharding: Google Cloud Spanner automatically shards data across multiple nodes, ensuring that data is distributed evenly and efficiently.
Automatic replication: Google Cloud Spanner automatically replicates data across multiple regions and continents, providing high availability and disaster recovery.
Integration with other Google Cloud services: Google Cloud Spanner can be easily integrated with other Google Cloud services such as Google Cloud Storage and Google Cloud Pub/Sub, making it easy to build modern, web-scale applications.
Fully managed: Google Cloud Spanner is a fully managed service, which means that Google takes care of infrastructure management, security, and scaling, allowing developers to focus on building applications.
Overall, Google Cloud Spanner is a powerful and flexible database service that can help developers build and deploy globally distributed, mission-critical applications with ease. With its scalability, strong consistency, SQL support, automatic sharding and replication, integration with other Google Cloud services, and fully managed infrastructure, Google Cloud Spanner can provide a cost-effective and efficient solution for a wide range of use cases.
B. How is it used at Seagen
Seagen can use Google Cloud Spanner as a fully managed, horizontally scalable, and highly available relational database service to store and manage large-scale data for their biopharma research applications. If Seagen is already using Microsoft Azure, they can integrate Google Cloud Spanner with their current cloud infrastructure using various tools and services provided by Google Cloud.
To get started with Google Cloud Spanner, Seagen can follow these steps:
Create a Google Cloud account: If you haven’t already, create a Google Cloud account and enable billing for the account.
Create a Google Cloud Spanner instance: Create a Google Cloud Spanner instance using the Google Cloud Console, the gcloud command-line tool, or a client library such as the Node.js client library.
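Configure the instance: Configure the instance to meet your requirements by choosing an instance configuration (regional or multi-region, which determines how data is replicated) and the compute capacity (nodes or processing units). Storage grows with the data and is not sized up front.
Create a database: Create a database within the instance and define its schema using DDL statements.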
Write data to the database: Write data to the database using standard SQL statements or client libraries for various programming languages.
Query the database: Query the database using standard SQL statements or client libraries to retrieve data.
Monitor the database: Monitor the database using the Google Cloud Console, Cloud Monitoring, or Cloud Logging (formerly Stackdriver Logging) to ensure that it is running correctly and to troubleshoot any issues.
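The steps above can be sketched with the google-cloud-spanner Python client library. This is a minimal sketch, not a reference implementation: the instance ID, database ID, instance configuration, and the Samples table are all hypothetical placeholders, and the code assumes the project already exists with billing enabled and that credentials are available to the client.

```python
"""Sketch of the getting-started steps using the google-cloud-spanner client."""

# Hypothetical identifiers (assumptions, not values from this document).
INSTANCE_ID = "research-instance"
DATABASE_ID = "research-db"
INSTANCE_CONFIG = "regional-us-central1"

# The schema is supplied as DDL statements when the database is created.
DDL_STATEMENTS = [
    """CREATE TABLE Samples (
        SampleId INT64 NOT NULL,
        Name     STRING(256),
    ) PRIMARY KEY (SampleId)""",
]


def provision_and_query(project_id: str) -> list:
    """Create an instance and a database, write two rows, and read them back."""
    # Imported here so this module can be loaded without the library installed.
    from google.cloud import spanner

    client = spanner.Client(project=project_id)

    # Create and size the instance (1000 processing units equal one node).
    config_name = f"{client.project_name}/instanceConfigs/{INSTANCE_CONFIG}"
    instance = client.instance(
        INSTANCE_ID,
        configuration_name=config_name,
        display_name="Research instance",
        processing_units=100,
    )
    instance.create().result(timeout=300)  # wait for the long-running operation

    # Create the database together with its schema.
    database = instance.database(DATABASE_ID, ddl_statements=DDL_STATEMENTS)
    database.create().result(timeout=300)

    # Write data using mutations committed as one batch.
    with database.batch() as batch:
        batch.insert(
            table="Samples",
            columns=("SampleId", "Name"),
            values=[(1, "control"), (2, "treated")],
        )

    # Query the data with a strong read and collect rows before the snapshot closes.
    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            "SELECT SampleId, Name FROM Samples ORDER BY SampleId"
        )
        return [tuple(row) for row in rows]
```

Calling provision_and_query with a real project ID creates billable resources, so it is best tried in a sandbox project.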
Overall, Google Cloud Spanner can be a powerful and flexible database service that Seagen can use to store and manage large-scale data for their biopharma research applications. With its scalability, strong consistency, SQL support, automatic sharding and replication, integration with other Google Cloud services, and fully managed infrastructure, Google Cloud Spanner can provide a cost-effective and efficient solution for Seagen’s data management needs.
C. Features
Write and read scalability with no limits
Spanner decouples compute resources from data storage, which makes it possible to scale processing resources in and out transparently. Each additional unit of compute capacity can process both reads and writes, providing horizontal scalability. Spanner optimizes performance by automatically handling sharding, replication, and transaction processing.
Automated maintenance
Reduce operational costs and improve reliability for any size database. Synchronous replication and maintenance are automatic and built in. Schema changes and maintenance are performed online, while serving traffic, with zero downtime.
PostgreSQL interface
Combine the scalability and reliability of Spanner with the familiarity and portability of a PostgreSQL interface. Use the skills and tools that your teams already know, future-proofing your investment for peace of mind.
Automatic database sharding
Never worry about manually resharding your database again. Built-in sharding automatically distributes data to optimize for performance and availability. Scale up and scale down without interruption.
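Although sharding is automatic, primary key design still influences how evenly writes spread across shards. The following is a simplified illustration, not a model of Spanner's internals: it assumes four fixed key ranges (the real service chooses and moves split boundaries dynamically) and uses a hypothetical hashed_key helper to show why monotonically increasing keys tend to concentrate new writes in one range while hashed keys spread them out.

```python
import bisect
import hashlib
from collections import Counter

KEY_SPACE = 2**32
# Hypothetical split boundaries: four equal key ranges over a 32-bit key space.
SPLIT_BOUNDARIES = [KEY_SPACE // 4, KEY_SPACE // 2, 3 * KEY_SPACE // 4]


def split_for_key(key: int) -> int:
    """Return the index (0-3) of the key range that contains the key."""
    return bisect.bisect_right(SPLIT_BOUNDARIES, key)


def hashed_key(sequence_number: int) -> int:
    """Derive a 32-bit key from a sequence number using SHA-256."""
    digest = hashlib.sha256(str(sequence_number).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def writes_per_split(keys) -> Counter:
    """Count how many keys land in each key range."""
    return Counter(split_for_key(key) for key in keys)


# One thousand consecutive IDs, as a sequence or timestamp-based key would produce.
recent_ids = range(4_000_000_000, 4_000_001_000)

sequential = writes_per_split(recent_ids)
spread = writes_per_split(hashed_key(i) for i in recent_ids)

print("sequential keys:", dict(sequential))  # {3: 1000}: every write hits one range
print("hashed keys:", dict(spread))
```

In practice the same effect is usually achieved with UUIDs or bit-reversed sequences rather than a hand-rolled hash.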
Strong transactional consistency
Rely on industry-leading external consistency without compromising on scalability or availability.
Single-region and multi-region configurations
No matter where your users may be, apps backed by Spanner can read and write up-to-date strongly consistent data globally. Additionally, when running a multi-region instance, your database is protected against a regional failure and offers industry-leading 99.999% availability.
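To put the 99.999% figure in perspective, the short calculation below converts an availability percentage into the maximum downtime it allows per year. It is plain arithmetic and assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime budget, in minutes per year, for an availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# 99.999% availability leaves a budget of roughly five minutes per year.
print(round(max_downtime_minutes_per_year(99.999), 2))  # 5.26
```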
Unified analytics and AI on transactional data
Query data in Spanner from BigQuery in real time without moving or copying the data, bridging the gap between operational data and analytics and creating a unified data life cycle. Invoke Vertex AI models in transactions in Spanner using a simple SQL query (Preview).
Real-time change data capture and replication
Use Datastream to deliver change data from Oracle and MySQL databases into Spanner for up-to-date information. Use Spanner change streams to capture change data from Spanner databases and integrate it with other systems for analytics, event triggering, and compliance.
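A change stream is defined with a DDL statement. The sketch below builds that statement and applies it through the Python client's update_ddl call. The stream and table names are hypothetical, and the database argument is assumed to be a google.cloud.spanner Database object obtained elsewhere.

```python
from typing import Optional, Sequence


def change_stream_ddl(stream_name: str, tables: Optional[Sequence[str]] = None) -> str:
    """Build a CREATE CHANGE STREAM statement for specific tables or for all tables."""
    target = ", ".join(tables) if tables else "ALL"
    return f"CREATE CHANGE STREAM {stream_name} FOR {target}"


def create_change_stream(database, stream_name: str,
                         tables: Optional[Sequence[str]] = None,
                         timeout_seconds: int = 300) -> None:
    """Apply the DDL and wait for the schema change to finish."""
    operation = database.update_ddl([change_stream_ddl(stream_name, tables)])
    operation.result(timeout=timeout_seconds)
```

For example, create_change_stream(database, "SamplesStream", ["Samples"]) would watch a single hypothetical Samples table, while omitting the table list watches the whole database.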
Granular instance sizing
Start with a granular Spanner instance for only $65/month and scale it as your needs grow, without downtime and without re-architecting.
Relational interface
Everything you would expect from a relational database (schemas, SQL queries, and ACID transactions) at any scale. Use GoogleSQL (formerly Google Standard SQL) or a PostgreSQL interface.
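As an illustration of an ACID transaction, the sketch below writes a parent row and its child rows atomically with the Python client's run_in_transaction call. The Studies and StudySamples tables and their columns are hypothetical, and the database argument is assumed to be a google.cloud.spanner Database object.

```python
def register_study(database, study_id: int, title: str, sample_names: list) -> int:
    """Insert a study and its samples in one transaction; return the sample count."""

    def _write(transaction):
        # Both inserts are buffered and committed together, or not at all.
        transaction.insert(
            table="Studies",
            columns=("StudyId", "Title"),
            values=[(study_id, title)],
        )
        sample_rows = [
            (study_id, number, name)
            for number, name in enumerate(sample_names, start=1)
        ]
        transaction.insert(
            table="StudySamples",
            columns=("StudyId", "SampleNo", "Name"),
            values=sample_rows,
        )
        return len(sample_rows)

    # The client may call _write more than once if the transaction aborts,
    # so the function should not have side effects outside the transaction.
    return database.run_in_transaction(_write)
```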
Rich application and tool support
Meet development teams where they are with native client libraries for Java/JDBC, Go, Python, C#, Node.js, PHP, Ruby, and C++ as well as the most popular ORMs, including Hibernate and Entity Framework.
Observability
Monitor performance of Spanner databases with metrics and stats. Analyze usage patterns in Spanner databases with Key Visualizer, an interactive monitoring tool. Use query insights for troubleshooting query performance issues and quickly diagnose lock contention issues with lock insights and transaction insights.
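Query statistics can also be read with SQL from Spanner's built-in SPANNER_SYS tables. The sketch below lists the most CPU-intensive queries from the latest hourly interval. It assumes the database argument is a google.cloud.spanner Database object and that the caller has permission to read the statistics tables; the function name and the choice of columns are illustrative.

```python
# Top queries by average CPU time in the most recent one-hour interval.
TOP_QUERIES_SQL = """
SELECT text, execution_count, avg_latency_seconds, avg_cpu_seconds
FROM SPANNER_SYS.QUERY_STATS_TOP_HOUR
WHERE interval_end =
    (SELECT MAX(interval_end) FROM SPANNER_SYS.QUERY_STATS_TOP_HOUR)
ORDER BY avg_cpu_seconds DESC
LIMIT 10
"""

COLUMN_NAMES = ("text", "execution_count", "avg_latency_seconds", "avg_cpu_seconds")


def top_queries_by_cpu(database) -> list:
    """Return the top queries as a list of dictionaries keyed by column name."""
    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(TOP_QUERIES_SQL)
        # Rows are streamed, so they are materialized before the snapshot closes.
        return [dict(zip(COLUMN_NAMES, row)) for row in rows]
```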
Enterprise-grade security and controls
Customer-managed encryption keys (CMEK), data-layer encryption, IAM integration for access and controls, and comprehensive audit logging. Support for VPC-SC, Access Transparency, and Access Approval. Fine-grained access control lets you authorize access to Spanner data at the table and column level.
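Fine-grained access control is expressed as database roles and GRANT statements in DDL. The helper below is a small sketch that builds the statements for a read-only role at either the table or the column level. The role, table, and column names in the example are hypothetical; the statements would be applied with a schema update (for instance the Python client's update_ddl), and principals are then mapped to the role through IAM.

```python
from typing import List, Optional, Sequence


def read_only_role_ddl(role: str, table: str,
                       columns: Optional[Sequence[str]] = None) -> List[str]:
    """Build DDL that creates a role and grants it SELECT on a table or on columns."""
    privilege = f"SELECT({', '.join(columns)})" if columns else "SELECT"
    return [
        f"CREATE ROLE {role}",
        f"GRANT {privilege} ON TABLE {table} TO ROLE {role}",
    ]


# Column-level access: the role may read only two columns of a hypothetical table.
for statement in read_only_role_ddl("analyst", "Samples", ["SampleId", "Name"]):
    print(statement)
```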
Backup and restore, point-in-time recovery (PITR)
Back up your database to store a consistent copy of data and restore on demand. PITR provides continuous data protection with the ability to recover your past data to a microsecond granularity.
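One way to inspect past data is a stale read at an exact timestamp. The sketch below reads a hypothetical Samples table as it existed at a given moment, using the Python client's read_timestamp snapshot option. It assumes the database argument is a google.cloud.spanner Database object and that the timestamp falls within the database's version retention period.

```python
import datetime


def read_samples_as_of(database, as_of: datetime.datetime) -> list:
    """Return the rows of the Samples table as they were at the given time."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be a timezone-aware datetime")
    # A snapshot with read_timestamp reads data exactly as of that moment.
    with database.snapshot(read_timestamp=as_of) as snapshot:
        rows = snapshot.execute_sql("SELECT SampleId, Name FROM Samples")
        return [tuple(row) for row in rows]
```

For example, passing datetime.datetime.now(datetime.timezone.utc) minus ten minutes would show the table before a mistaken update made five minutes ago.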
D. Where Implemented
E. How it is tested
To test Google Cloud Spanner, you can follow these steps:
Create a Google Cloud Spanner instance: First, create a Google Cloud Spanner instance using the Google Cloud Console, the gcloud command-line tool, or a client library such as the Node.js client library.
Create a database: Create a database within the instance and define its schema using DDL statements.
Write data to the database: Write test data to the database using standard SQL statements or client libraries for various programming languages.
Query the database: Query the database using standard SQL statements or client libraries to retrieve data and ensure that it is correct.
Monitor the database: Monitor the database using the Google Cloud Console, Cloud Monitoring, or Cloud Logging (formerly Stackdriver Logging) to ensure that it is running correctly and to troubleshoot any issues.
Test data consistency: Test the database’s strong consistency by updating data in one region and verifying that it is immediately available in other regions.
Test scalability: Test the database’s scalability by increasing the number of nodes in the instance and monitoring its performance.
Test disaster recovery: Test the database’s disaster recovery capabilities by simulating a region failure and verifying that the database remains available.
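Two of the checks above lend themselves to small helpers. The sketch below shows a read-after-write check for the consistency test and a latency measurement that can be repeated before and after adding nodes for the scalability test. The TestSamples table is hypothetical and assumed to be a small, dedicated test table, and the database argument is assumed to be a google.cloud.spanner Database object. To exercise cross-region behavior, the read would need to run from a client in a different region than the write.

```python
import statistics
import time


def check_read_after_write(database, sample_id: int, name: str) -> bool:
    """Write one row, then confirm that a strong read returns it."""
    with database.batch() as batch:
        batch.insert_or_update(
            table="TestSamples",
            columns=("SampleId", "Name"),
            values=[(sample_id, name)],
        )
    # A snapshot without staleness options performs a strong read.
    with database.snapshot() as snapshot:
        rows = list(snapshot.execute_sql("SELECT SampleId, Name FROM TestSamples"))
    return any(tuple(row) == (sample_id, name) for row in rows)


def median_latency_seconds(operation, runs: int = 20) -> float:
    """Call the operation repeatedly and return the median wall-clock latency."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

For example, median_latency_seconds(lambda: check_read_after_write(database, 1, "probe")) gives a rough baseline to compare across instance sizes.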
Overall, testing Google Cloud Spanner involves creating an instance and a database, writing and querying data, monitoring the database, and testing data consistency, scalability, and disaster recovery. Working through these steps before go-live builds confidence that the database will behave correctly and reliably in production.
F. 2023 Roadmap
????
G. 2024 Roadmap
????
H. Known Issues
While Google Cloud Spanner is generally a highly reliable and stable relational database service, there are a few known issues that users may encounter. Some of the known issues are:
High cost: Google Cloud Spanner can be expensive compared to other cloud-based databases, especially for applications that require high scalability or long-term storage.
Limited integration with third-party tools: Google Cloud Spanner has limited integration with third-party tools and services compared to other cloud-based databases, which can make it challenging to use with certain development workflows.
Limited support for complex data types: Google Cloud Spanner supports ARRAY and JSON columns, but nested structures are restricted (for example, arrays of arrays and STRUCT columns are not supported), which can make it challenging to work with certain types of data.
Limited availability in some regions: Google Cloud Spanner may not be available in all regions, which can limit its usefulness for applications with specific geographic requirements.
Limits on transaction size: Google Cloud Spanner has limits on transaction size, which can impact applications that require large transactions.
Limits on concurrent writes: Google Cloud Spanner has limits on the number of concurrent writes, which can impact applications with high write loads.
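A common way to work within the transaction size limit is to split a large load into several smaller commits. The sketch below is one such approach under stated assumptions: it estimates mutations as rows multiplied by columns (secondary indexes add more, so this is a lower bound), and it takes the per-commit limit as a parameter because the actual quota should be taken from the current Spanner limits documentation. The database argument is assumed to be a google.cloud.spanner Database object.

```python
from typing import Iterator, Sequence


def chunk_rows(rows: Sequence[tuple], columns_per_row: int,
               max_mutations_per_commit: int) -> Iterator[Sequence[tuple]]:
    """Yield slices of rows whose estimated mutation count fits in one commit."""
    if columns_per_row <= 0:
        raise ValueError("columns_per_row must be positive")
    rows_per_commit = max_mutations_per_commit // columns_per_row
    if rows_per_commit == 0:
        raise ValueError("a single row exceeds the mutation limit")
    for start in range(0, len(rows), rows_per_commit):
        yield rows[start:start + rows_per_commit]


def write_in_chunks(database, table: str, columns: Sequence[str],
                    rows: Sequence[tuple], max_mutations_per_commit: int) -> int:
    """Write rows in several commits and return the number of commits made."""
    commits = 0
    for chunk in chunk_rows(rows, len(columns), max_mutations_per_commit):
        # Each batch is committed atomically when the with-block exits.
        with database.batch() as batch:
            batch.insert_or_update(table=table, columns=columns, values=chunk)
        commits += 1
    return commits
```

Note that the load as a whole is no longer atomic: if a later chunk fails, earlier chunks stay committed, which is why insert_or_update is used so that the load can be safely re-run.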
Overall, while these issues may impact some users, Google Cloud Spanner remains a highly scalable, reliable, and flexible relational database service that is well-suited for modern cloud-based applications. By carefully designing and testing databases to work within the limits of the service, and by using the tools and services provided by Google Cloud, users can minimize the impact of these issues and build robust and reliable applications.
[x] Reviewed by Enterprise Architecture
[x] Reviewed by Application Development
[x] Reviewed by Data Architecture