Google BigQuery

  • Author: Ronald Fung

  • Creation Date: 7 June 2023

  • Next Modified Date: 7 June 2024


A. Introduction

BigQuery is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data. Use built-in ML/AI and BI for insights at scale.


B. How is it used at Seagen

Google BigQuery is a cloud-based data warehouse designed to process large amounts of data quickly. Seagen can use BigQuery to perform advanced analytics on large datasets, including genomics data, clinical trial data, and real-world evidence.

Here are some ways Seagen can use Google BigQuery:

  1. Data integration: Seagen can use BigQuery to integrate multiple data sources, including internal and external data, to get a comprehensive view of their research data.

  2. Data analysis: Seagen can use BigQuery to run complex queries and perform advanced analytics on their data. For example, they can use BigQuery to identify correlations between different datasets, or to perform statistical analyses on large clinical trial datasets.

  3. Machine learning: Seagen can use BigQuery to build and train machine learning models on their data. This can help them identify patterns and insights that are not immediately apparent from the data.

  4. Data visualization: Seagen can use BigQuery to create interactive dashboards and visualizations of their data. This can help them communicate their research findings to stakeholders and make data-driven decisions.

Overall, Google BigQuery can help Seagen accelerate their research and make more informed decisions by providing them with powerful analytics and data processing capabilities.


C. Features

Flexibility, predictable pricing, and best price performance

BigQuery editions allow you to pick the right feature set for individual workload requirements with the ability to mix and match for the right price-performance. Compute capacity autoscaling adds fine-grained compute resources in real time to match the needs of your workload demands, and ensure you only pay for the compute capacity you use. With compressed storage pricing, you can reduce your storage costs while increasing your data footprint at the same time.

Built-in machine learning

BigQuery ML enables data scientists and data analysts to build and operationalize ML models on planet-scale structured, semi-structured, and now unstructured data directly inside BigQuery, using simple SQL—in a fraction of the time. Export BigQuery ML models for online prediction into Vertex AI or your own serving layer. Learn more about the models we currently support.

Analyze and share data across clouds

BigQuery Omni is a fully managed, multicloud analytics solution that allows for cost-effective and secure data analysis across clouds and shares results within a single pane of glass. Within BigQuery Analytics Hub, securely exchange data assets internally and across organizations without data movement and enhance analysis with commercial, public, and Google datasets.

Real-time analytics with streaming data pipelines

BigQuery has built-in capabilities that ingest streaming data and make it immediately available to query, along with native integrations to streaming products like Dataflow. Analyze large datasets interactively with BigQuery BI Engine, an in-memory analysis service that offers sub-second query response time and high concurrency. Accelerate query performance and reduce costs within your environment with BigQuery materialized views.

Unify, manage, and govern all types of data

Query all data types with BigQuery: structured, semi-structured, and unstructured. Use BigLake to explore and unify different data types and build advanced models. Centrally discover, manage, monitor, and govern data across data lakes, data warehouses, and data marts with consistent controls with Dataplex, an intelligent data fabric that enables organizations to provide access to trusted data.

Share insights with built-in business intelligence

With built-in business intelligence, create and share insights in a few clicks with Looker Studio or build data-rich experiences that go beyond BI with Looker. Analyze billions of rows of live BigQuery data in Google Sheets with familiar tools like pivot tables, charts, and formulas, to easily derive insights from big data with Connected Sheets.

Data governance and security

BigQuery’s integration with security and privacy services from Google Cloud provides strong security and fine-grained governance controls, down to the column level and row level. Rest assured knowing your data is encrypted at rest and in transit by default.

Geospatial analysis with BigQuery

BigQuery geospatial uniquely combines the serverless architecture of BigQuery with native support for geospatial analysis, so you can augment your analytics workflows with location intelligence. Simplify your analyses, see spatial data in fresh ways, and unlock entirely new lines of business with support for arbitrary points, lines, polygons, and multi-polygons in common geospatial data formats.

Real-time change data capture and replication

Synchronize data across heterogeneous databases, storage systems, and applications reliably and with minimal latency with Datastream. Datastream integrates with purpose-built and extensible Dataflow templates to pull change streams written to Cloud Storage, and create up-to-date replicated tables in BigQuery for real-time analytics.

Standard SQL

BigQuery supports a standard SQL dialect that is ANSI 2011 compliant, which reduces the need for code rewrites. BigQuery also provides ODBC and JDBC drivers at no cost to ensure your current applications can interact with its powerful engine.


D. Where Implemented

LeanIX


E. How it is tested

Google BigQuery provides several ways to test your queries and ensure they are accurate. Here are some ways to test Google BigQuery:

  1. Run sample queries: Google BigQuery provides a public dataset that you can use to run sample queries and test different features of the platform. You can also create your own sample datasets to test your queries.

  2. Use query validation tools: BigQuery includes a query validator that checks your SQL syntax and provides suggestions for improvements. This can help you catch errors before you run your query.

  3. Preview query results: BigQuery allows you to preview the first few rows of your query results before you run your query. This can help you ensure that your query is returning the expected results.

  4. Use query caching: BigQuery caches results from recent queries, which can speed up subsequent queries. This can help you test queries more quickly and efficiently.

  5. Use unit tests: To ensure the accuracy of your queries over time, you can create unit tests that check the output of your queries against expected results. This can help you catch errors and ensure that your queries are functioning correctly as your data changes.

Overall, testing Google BigQuery queries involves a combination of using built-in validation tools, previewing results, and creating your own sample datasets and unit tests. By testing your queries thoroughly, you can ensure that your data analysis is accurate and reliable.


F. 2023 Roadmap

????


G. 2024 Roadmap

????


H. Known Issues

While Google BigQuery is a powerful data warehouse with many features, there are some known issues that users should be aware of. Here are some of the most common known issues for Google BigQuery:

  1. Query performance: Although BigQuery is designed to handle large datasets, query performance can be slow for complex queries or large datasets. Users should ensure that their queries are optimized and that they are using the appropriate resources to improve query performance.

  2. Cost: BigQuery is a pay-per-query service, and costs can quickly add up for users with large datasets or frequent queries. Users should carefully monitor their usage and consider using cost-saving measures, such as partitioning data, to reduce costs.

  3. Data ingestion: Loading data into BigQuery can be slow, especially for large datasets. Users should ensure that their data is properly formatted and compressed to speed up the ingestion process.

  4. Data export: Exporting data from BigQuery can be slow and can be limited by the size of the export. Users should consider exporting data in smaller chunks or using third-party tools to speed up the process.

  5. Limited SQL support: BigQuery supports standard SQL, but some advanced SQL features are not supported. Users should be aware of these limitations and ensure that their queries are compatible with BigQuery’s SQL dialect.

Overall, Google BigQuery is a powerful data warehouse that can handle large datasets and complex queries. However, users should be aware of these known issues and take steps to optimize their queries and manage costs.


[x] Reviewed by Enterprise Architecture

[x] Reviewed by Application Development

[x] Reviewed by Data Architecture