Google Batch
Author: Ronald Fung
Creation Date: 8 June 2023
Next Modified Date: 8 June 2024
A. Introduction
Batch is a fully managed service that lets you schedule, queue, and execute batch processing workloads on Google Cloud resources. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale.
Using Batch, you don’t need to configure and manage third-party job schedulers, provision and deprovision resources, or request resources one zone at a time. To run a job, you specify parameters for the resources required for your workload, then Batch obtains resources and queues the job for execution. Batch provides native integration with other Google Cloud services to aid in the scheduling, execution, storage, and analysis of batch jobs, so you can focus on submitting a job and consuming the results.
Batch consists of the following components:
Job
: A scheduled program that runs a set of tasks to completion without any user interaction, typically for computational workloads. For example, a job might be a single shell script or a complex, multipart computation. A job is executed through one or more specific actions called tasks. Each Batch job consists of an array of one or more tasks that all run the same runnables, which are the executable script(s) and container(s) for your job. A job’s tasks can run in parallel or sequentially on the job’s resources.
Tasks
: Programmatic actions that are defined as part of a job and executed when the job runs. Each task is part of a job’s task group. The job’s runnables are run by each task in the job.
Resources
: The infrastructure needed to run a job. Each Batch job runs on a regional managed instance group (MIG) of Compute Engine virtual machine (VM) instances based on the job’s specified requirements and location. If specified, a job might also use additional compute resources, like GPUs, or additional read/write storage resources, like local SSDs or a Cloud Storage bucket. Some of the factors that determine the number of VMs provisioned for a job include the compute resources required for each task and the job’s parallelism: whether you want tasks to run sequentially on one VM or simultaneously on multiple VMs.
In summary, Batch lets you create and run jobs that each automatically provision and utilize the resources required to execute its tasks.
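The relationship between a job, its task group, its runnables, and its resources can be sketched as a job configuration. The Python sketch below builds a dictionary shaped like the JSON job configuration that Batch accepts. The script text, task count, parallelism, and CPU and memory values are illustrative assumptions, not settings taken from this document.

```python
import json


def build_job_config(task_count: int, parallelism: int) -> dict:
    """Return a Batch-style job configuration as a plain dictionary."""
    return {
        "taskGroups": [
            {
                # Every task in the group runs the same runnables.
                "taskSpec": {
                    "runnables": [
                        {
                            "script": {
                                # Batch sets these variables for each task.
                                "text": "echo Task ${BATCH_TASK_INDEX} of ${BATCH_TASK_COUNT}"
                            }
                        }
                    ],
                    # Per-task compute requirements (assumed values).
                    "computeResource": {"cpuMilli": 2000, "memoryMib": 2048},
                },
                # Total number of tasks, and how many may run at once.
                "taskCount": task_count,
                "parallelism": parallelism,
            }
        ],
        # Send task stdout and stderr to Cloud Logging.
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


config = build_job_config(task_count=4, parallelism=2)
print(json.dumps(config, indent=2))
```

Setting parallelism to 1 would make the tasks run sequentially, while a value equal to the task count would let them all run at the same time, subject to available resources.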
B. How is it used at Seagen
Seagen can use Google Cloud Batch to manage and execute batch computing workloads, such as drug discovery simulations and genomics research, in a cost-effective and scalable way. Here are some steps to get started with Google Cloud Batch:
Create a Google Cloud account: Seagen can create a Google Cloud account in the Google Cloud Console. This will give them access to Google Cloud Batch and other Google Cloud services.
Create a batch job: Seagen can create a batch job in Google Cloud Batch, using a Docker container that contains the necessary software and dependencies for their batch computing workload.
Submit the batch job: Seagen can submit the batch job to Google Cloud Batch, which will automatically provision the necessary computing resources to execute the job, based on the job requirements and user-defined settings.
Monitor the batch job: Seagen can monitor the batch job using the Google Cloud Console, which provides real-time updates on the job status, resource utilization, and other relevant metrics.
Collect the results: Seagen can collect the results of the batch job using Google Cloud Storage or other storage solutions, and analyze the results to gain insights and make informed decisions.
Overall, Google Cloud Batch lets Seagen run batch computing workloads in a cost-effective and scalable way without managing the underlying infrastructure.
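The create and submit steps above can be sketched as follows. This is a minimal sketch, assuming a container-based job submitted with the gcloud command-line tool. The job name, region, and image URI are hypothetical placeholders, and the script only prints the command rather than running it.

```python
import json
import shlex
import tempfile
from pathlib import Path

# Hypothetical values for illustration only; replace with real ones.
JOB_NAME = "example-container-job"
LOCATION = "us-central1"
IMAGE_URI = "REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG"


def container_job_config(image_uri: str, commands: list) -> dict:
    """Build a single-task job whose runnable is a container."""
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [
                        {"container": {"imageUri": image_uri, "commands": commands}}
                    ]
                },
                "taskCount": 1,
            }
        ],
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


def submit_command(job_name: str, location: str, config_path: Path) -> list:
    """Return the gcloud argument list that submits the job."""
    return [
        "gcloud", "batch", "jobs", "submit", job_name,
        "--location", location,
        "--config", str(config_path),
    ]


# Write the configuration to a temporary file and show the command.
config_path = Path(tempfile.mkdtemp()) / "job.json"
config_path.write_text(json.dumps(container_job_config(IMAGE_URI, ["--help"]), indent=2))
command = submit_command(JOB_NAME, LOCATION, config_path)
print(shlex.join(command))
```

To actually submit the job, the argument list could be passed to `subprocess.run(command, check=True)` on a machine where gcloud is installed and authenticated.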
C. Features
Support for containers or scripts
Run your scripts natively on Compute Engine VM instances or bring your containerized workload that will run to completion.
Leverage Google Cloud compute
Get the latest software and hardware available as a service to use with Batch.
Job priorities and retries
Define priorities for your job and establish automated retry strategies.
Pub/Sub notifications for Batch
Configure Pub/Sub with Batch to asynchronously communicate messages to subscribers.
Integrated logging and monitoring
Send stderr and stdout logs directly to Cloud Logging. Audit logs help you answer questions about who did what, where, and when. Monitor metrics related to the resources used in Cloud Monitoring.
Alternate methods to use Batch
Batch APIs can be called directly via gcloud, REST APIs, client libraries, or the Cloud Console. In addition, Batch can be used with an ecosystem of workflow engines.
Identity and access management
Control access to resources and the service with IAM permissions and VPC Service Controls.
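Several of the features above map to fields in a job configuration. The sketch below shows how job priority, task retries, and Pub/Sub notifications might be added to an existing configuration. The field names follow the Batch job configuration format as I understand it, the value ranges in the checks should be verified against the current API reference, and the topic name and base job are hypothetical.

```python
import copy


def with_scheduling_options(job: dict, priority: int, max_retries: int,
                            pubsub_topic: str) -> dict:
    """Return a copy of the job with priority, retries, and notifications set."""
    # Assumed valid ranges; confirm against the Batch API reference.
    if not 0 <= priority < 100:
        raise ValueError("priority is expected to be in the range 0 to 99")
    if not 0 <= max_retries <= 10:
        raise ValueError("max_retries is expected to be in the range 0 to 10")

    updated = copy.deepcopy(job)
    # Higher values indicate higher priority.
    updated["priority"] = priority
    # Retries are configured per task, inside each task group's task spec.
    for group in updated["taskGroups"]:
        group["taskSpec"]["maxRetryCount"] = max_retries
    # Publish a message whenever the job changes state.
    updated["notifications"] = [
        {"pubsubTopic": pubsub_topic, "message": {"type": "JOB_STATE_CHANGED"}}
    ]
    return updated


# Hypothetical base job and topic for illustration.
base_job = {
    "taskGroups": [
        {"taskSpec": {"runnables": [{"script": {"text": "echo hello"}}]}, "taskCount": 1}
    ]
}
job = with_scheduling_options(
    base_job,
    priority=50,
    max_retries=3,
    pubsub_topic="projects/PROJECT_ID/topics/TOPIC_ID",
)
print(job["priority"], job["taskGroups"][0]["taskSpec"]["maxRetryCount"])
```

The function copies the input so the same base configuration can be reused with different priorities or retry settings.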
D. Where Implemented
E. How it is tested
Testing Google Cloud Batch involves ensuring that the batch jobs are executing correctly and efficiently, and that the results are being generated and stored correctly. Here are some steps to test Google Cloud Batch:
Create a test batch job: Create a test batch job that mimics the production batch job as closely as possible, including the Docker container, the input data, and the expected output.
Submit the job: Submit the test batch job to Google Cloud Batch, using the gcloud command-line tool or the Google Cloud Console. Ensure that the job is running correctly and that the resources are being provisioned as expected.
Monitor the job: Monitor the test batch job using the Google Cloud Console, which provides real-time updates on the job status, resource utilization, and other relevant metrics. Analyze the data to identify any performance issues or bottlenecks.
Collect the results: Collect the results of the test batch job using Google Cloud Storage or other storage solutions, and verify that the results are correct and match the expected output.
Test scalability: Test the scalability of Google Cloud Batch by submitting multiple test batch jobs simultaneously, and monitoring the resource utilization and job status. Confirm that Batch provisions and releases resources for each job as expected; the number of VMs provisioned depends on the compute resources required for each task and on the job’s parallelism.
Overall, testing Google Cloud Batch involves creating a test batch job, submitting the job, monitoring the job, collecting the results, and testing scalability. Thorough testing helps confirm that batch computing workloads run correctly and efficiently. Users can also reach out to Google Cloud support for help with any technical challenges they encounter.
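The monitoring and verification steps above can be partly automated. The sketch below is one possible approach, not an established Seagen procedure: it polls a job description until the job reaches a terminal state, then compares an output against the expected output by checksum. The `fetch_job` callable is a hypothetical hook; in practice it might parse the JSON printed by `gcloud batch jobs describe JOB_NAME --location LOCATION --format=json`. Here it is replaced by canned responses so the example runs on its own.

```python
import hashlib
import time
from typing import Callable

# Terminal states assumed for this sketch; the API may define others.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_job(fetch_job: Callable[[], dict], poll_seconds: float = 30.0,
                 max_polls: int = 120) -> str:
    """Poll until the job reaches a terminal state and return that state."""
    for _ in range(max_polls):
        state = fetch_job().get("status", {}).get("state", "STATE_UNSPECIFIED")
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal state in time")


def outputs_match(actual: bytes, expected: bytes) -> bool:
    """Compare two outputs by SHA-256 digest."""
    return hashlib.sha256(actual).hexdigest() == hashlib.sha256(expected).hexdigest()


# Canned job descriptions standing in for real describe calls.
responses = iter([
    {"status": {"state": "QUEUED"}},
    {"status": {"state": "RUNNING"}},
    {"status": {"state": "SUCCEEDED"}},
])
final_state = wait_for_job(lambda: next(responses), poll_seconds=0)
print(final_state)
print(outputs_match(b"result,1\n", b"result,1\n"))
```

Passing the fetcher in as a callable keeps the polling logic testable without access to a Google Cloud project.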
F. 2023 Roadmap
????
G. 2024 Roadmap
????
H. Known Issues
While Google Cloud Batch is a powerful and flexible tool for managing batch computing workloads in the cloud, there are some known issues that users may encounter. Here are some of the known issues for Google Cloud Batch:
Job status issues: Users may encounter issues with job status, such as jobs being stuck in a queued, scheduled, or running state, or jobs being terminated unexpectedly. This can occur if there are issues with the underlying infrastructure or if there are configuration issues with the batch job itself.
Resource utilization issues: Users may encounter issues with resource utilization, such as over-provisioning or under-provisioning of resources. This can occur if the batch job’s resource requirements or parallelism are not configured correctly or if there are issues with how resources are provisioned for the job.
Networking issues: Users may encounter issues with networking in Google Cloud Batch, such as a job’s VMs being unable to reach external resources or each other. This can occur if the job’s network or firewall configuration is incorrect or if there are issues with the underlying networking infrastructure.
Security issues: Users may encounter security issues with Google Cloud Batch, such as issues with identity and access management or data encryption. This can occur if the security configuration is incorrect or if there are issues with the underlying security infrastructure.
Performance issues: Users may encounter performance issues with Google Cloud Batch, such as issues with job execution time or latency. This can occur if the job’s resources are not configured correctly or if there are issues with the underlying infrastructure.
Overall, while these issues may impact some users, Google Cloud Batch remains a powerful and flexible tool for managing batch computing workloads in the cloud. Carefully monitoring Batch jobs and reviewing usage reports and logs helps users detect these issues early. Users can also reach out to Google Cloud support for help with any known issues or other technical challenges they encounter.
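When investigating a job that is stuck or has failed, the status events recorded on the job are often a useful first place to look. The sketch below summarizes them from a job description that has already been loaded as a dictionary. The `status`, `state`, `statusEvents`, `eventTime`, and `description` field names follow the Batch job resource as I understand it, and the sample job, including its event text and timestamps, is entirely hypothetical.

```python
from typing import List


def summarize_job(job: dict) -> List[str]:
    """Return the job state followed by one line per status event."""
    status = job.get("status", {})
    lines = [f"state: {status.get('state', 'STATE_UNSPECIFIED')}"]
    for event in status.get("statusEvents", []):
        when = event.get("eventTime", "unknown time")
        what = event.get("description", "no description")
        lines.append(f"{when}  {what}")
    return lines


# Hypothetical job description for illustration only.
sample_job = {
    "status": {
        "state": "FAILED",
        "statusEvents": [
            {"eventTime": "2023-06-08T10:00:00Z", "description": "hypothetical event: job queued"},
            {"eventTime": "2023-06-08T10:05:00Z", "description": "hypothetical event: job failed"},
        ],
    }
}
for line in summarize_job(sample_job):
    print(line)
```

Reading the events in order shows where in its lifecycle the job stopped progressing, which helps separate configuration problems from infrastructure problems.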
[x] Reviewed by Enterprise Architecture
[x] Reviewed by Application Development
[x] Reviewed by Data Architecture