Azure Data Catalog

  • Author: Ronald Fung

  • Creation Date: 1 June 2023

  • Next Modified Date: 1 June 2024


A. Introduction

Important

New Azure Data Catalog accounts can no longer be created.

For data catalog features, please use the Microsoft Purview service, which offers unified data governance for your entire data estate.

If you are already using Azure Data Catalog, you’ll need to create a migration plan for your organization to move to Microsoft Purview by August 2025.

Azure Data Catalog is a fully managed cloud service that lets users discover the data sources they need and understand the data sources they find. At the same time, Data Catalog helps organizations get more value from their existing investments.

With Data Catalog, any user (analyst, data scientist, or developer) can discover, understand, and consume data sources in their data landscape. Data Catalog includes a crowdsourcing model of metadata and annotations, so everyone can contribute to making data discoverable and usable. It’s a single, central place for all of an organization’s users to contribute their knowledge and build a community and culture of data.


B. How it is used at Seagen

As a biopharma research company using Microsoft Azure, you can use Azure Data Catalog to discover and manage your data assets, making it easier for your team to find and use the right data. Here are some ways you can use Azure Data Catalog:

  1. Discover data assets: Azure Data Catalog allows you to discover your data assets by registering them in a central catalog, making it easier for your team to find and use the data they need.

  2. Collaborate: Azure Data Catalog enables your team to collaborate by sharing information about data assets, such as descriptions, tags, and other annotations, and by identifying experts who can answer questions about each asset.

  3. Govern data: Azure Data Catalog helps you govern your data by providing a central location for registering data assets, ensuring that data is properly classified and labeled, and enabling you to track usage and access.

  4. Enhance data quality: Azure Data Catalog can help you enhance the quality of your data by enabling users to provide feedback and suggestions for improving data assets, such as correcting errors or updating metadata.

  5. Integration with Azure services: Azure Data Catalog integrates with other Azure services, such as Azure Data Factory and Azure Databricks, allowing you to easily discover and use data assets in your Azure environment.

  6. Improved productivity: Azure Data Catalog can improve productivity by reducing the time and effort required to find and use data assets, allowing your team to focus on more important tasks.

  7. Security: Azure Data Catalog provides built-in security features, such as role-based access control and integration with Azure Active Directory, ensuring that your data assets are properly secured and protected.

Overall, Azure Data Catalog gives your biopharma research team a central place to discover, annotate, and govern data assets, which shortens the time spent finding the right data and improves the quality of the metadata that describes it.
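
The register, annotate, and discover cycle described above can be sketched with a small in-memory model. This is only an illustration of the idea, not the Azure Data Catalog API: the class names (DataAsset, ToyCatalog), the field names, and the sample asset names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """A registered data source plus its crowdsourced annotations (illustrative only)."""
    name: str
    source_type: str  # e.g. "SQL Server table" or "Azure Blob"
    description: str = ""
    tags: set[str] = field(default_factory=set)


class ToyCatalog:
    """In-memory stand-in for a metadata catalog; NOT the Azure Data Catalog API."""

    def __init__(self) -> None:
        self._assets: dict[str, DataAsset] = {}

    def register(self, asset: DataAsset) -> None:
        # Registration stores metadata only; the data stays in its source system.
        self._assets[asset.name] = asset

    def annotate(self, name: str, *, tag: str | None = None,
                 description: str | None = None) -> None:
        # Any user can add a tag or replace the description of a registered asset.
        asset = self._assets[name]
        if tag:
            asset.tags.add(tag.lower())
        if description:
            asset.description = description

    def search(self, term: str) -> list[str]:
        # Case-insensitive match on name or description, or an exact tag match.
        term = term.lower()
        return sorted(
            a.name for a in self._assets.values()
            if term in a.name.lower()
            or term in a.description.lower()
            or term in a.tags
        )
```

For example, after registering an asset named "assay_results" and tagging it "Oncology", a search for "oncology" returns that asset even though the word appears nowhere in its name. That is the practical value of crowdsourced annotations.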


C. Features

Azure Data Catalog is a cloud-based service that provides a central location for registering, discovering, and managing data assets across your organization. Here are some key features of Azure Data Catalog:

  1. Discover data assets: Azure Data Catalog allows you to discover your data assets by registering them in a central catalog, making it easier for your team to find and use the data they need.

  2. Collaboration: Azure Data Catalog enables your team to collaborate by sharing information about data assets, such as descriptions, tags, and other annotations, and by identifying experts who can answer questions about each asset.

  3. Governance: Azure Data Catalog helps you govern your data by providing a central location for registering data assets, ensuring that data is properly classified and labeled, and enabling you to track usage and access.

  4. Integration: Azure Data Catalog integrates with other Azure services, such as Azure Data Factory and Azure Databricks, allowing you to easily discover and use data assets in your Azure environment.

  5. Enhanced productivity: Azure Data Catalog can improve productivity by reducing the time and effort required to find and use data assets, allowing your team to focus on more important tasks.

  6. Security: Azure Data Catalog provides built-in security features, such as role-based access control and integration with Azure Active Directory, ensuring that your data assets are properly secured and protected.

  7. Advanced search: Azure Data Catalog provides advanced search capabilities, allowing you to search for data assets based on keywords, tags, descriptions, and other metadata.

  8. Data lineage: Azure Data Catalog does not track lineage automatically. The origin and flow of a data asset can be recorded manually in its description and documentation annotations; automated lineage tracking is a Microsoft Purview capability.

  9. Customizable metadata: Azure Data Catalog allows you to customize metadata for your data assets, enabling you to add custom attributes and classifications that are specific to your organization.

Overall, these features make Azure Data Catalog a central, searchable inventory of your organization's data assets, with metadata maintained by the people who use the data and access controlled through Azure Active Directory.
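
As a rough illustration of the search feature, the sketch below builds a search request URL for the Data Catalog REST API without sending it. The endpoint shape, the "DefaultCatalog" name, the api-version value, and the example search term are assumptions that should be checked against the official REST reference. Authentication (an Azure Active Directory bearer token) is deliberately left out, and the function name build_search_url is hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed values; verify both against the official REST reference before use.
ADC_BASE_URL = "https://api.azuredatacatalog.com"
ADC_API_VERSION = "2016-03-30"


def build_search_url(search_terms: str, catalog_name: str = "DefaultCatalog",
                     count: int = 10, start_page: int = 1) -> str:
    """Build a catalog search URL (no network call is made)."""
    if count < 1 or start_page < 1:
        raise ValueError("count and start_page must be positive")
    # urlencode percent-encodes the values, so search terms may contain
    # spaces or characters such as ':' safely.
    query = urlencode({
        "searchTerms": search_terms,
        "count": count,
        "startPage": start_page,
        "api-version": ADC_API_VERSION,
    })
    return f"{ADC_BASE_URL}/catalogs/{quote(catalog_name, safe='')}/search/search?{query}"
```

Keeping URL construction separate from the HTTP call makes the request easy to inspect and unit test before any credentials are involved.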


D. Where Implemented

LeanIX


E. How it is tested

Testing Azure Data Catalog involves verifying that the service is properly configured and that data assets can be discovered and managed effectively. Here are some steps you can take to test Azure Data Catalog:

  1. Verify configuration: Verify that Azure Data Catalog is properly configured and integrated with your Azure account and resources.

  2. Test data discovery: Test Azure Data Catalog by registering data assets, such as databases, tables, and files, and verifying that they can be discovered and searched for effectively.

  3. Test collaboration: Test the collaboration capabilities of Azure Data Catalog by adding annotations to data assets, such as descriptions, tags, and experts, and verifying that other users can see and build on them.

  4. Test governance: Test the governance capabilities of Azure Data Catalog by ensuring that data is properly classified and labeled, and that usage and access are tracked effectively.

  5. Test integration: Test the integration capabilities of Azure Data Catalog by integrating it with other Azure services, such as Azure Data Factory and Azure Databricks, and verifying that data assets can be discovered and used effectively.

  6. Test productivity: Test the productivity benefits of Azure Data Catalog by verifying that the service reduces the time and effort required to find and use data assets.

  7. Test security: Test the security capabilities of Azure Data Catalog by ensuring that data assets are properly secured and protected, and that access is controlled through role-based access control and integration with Azure Active Directory.

Overall, running these checks confirms that the catalog is configured correctly and that your team can register, find, annotate, and secure data assets as expected.
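
The steps above could be tracked with a small checklist runner such as the sketch below. It is a generic harness, not part of Azure Data Catalog: the names CheckResult, run_checks, and summarize are hypothetical, and each check is a placeholder callable that you would replace with a real verification (for example, registering a test asset and searching for it).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def run_checks(checks: dict[str, Callable[[], bool]]) -> list[CheckResult]:
    """Run each named check; an exception counts as a failure, not a crash."""
    results = []
    for name, check in checks.items():
        try:
            results.append(CheckResult(name, bool(check())))
        except Exception as exc:
            # Record the error so the remaining checks still run.
            results.append(CheckResult(name, False, f"{type(exc).__name__}: {exc}"))
    return results


def summarize(results: list[CheckResult]) -> str:
    """Return a one-line summary naming every failed check."""
    failed = [r.name for r in results if not r.passed]
    return "all checks passed" if not failed else "failed: " + ", ".join(failed)
```

Recording failures instead of stopping at the first one means a single run reports the state of every step, from configuration through security.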


F. 2023 Roadmap

????


G. 2024 Roadmap

????


H. Known Issues

As with any software or service, there may be known issues or limitations that users should be aware of when using Azure Data Catalog. Here are some of the known issues for Azure Data Catalog:

  1. Limited customization: Azure Data Catalog has limited customization options, which can limit the ability of users to configure the service to their specific needs.

  2. Metadata only: Azure Data Catalog stores metadata about data sources, not the data itself; the data remains in its source system. An organization is also limited to a single catalog, so assets cannot be partitioned across multiple catalogs.

  3. Limited integration: Azure Data Catalog has limited integration with third-party tools and services, which can limit the ability of users to incorporate it into their existing workflows.

  4. Limited monitoring and logging: Azure Data Catalog has limited monitoring and logging capabilities, which can limit the ability of users to monitor and troubleshoot their catalog.

  5. Cost: Azure Data Catalog can be expensive for users with limited budgets, particularly if they manage large volumes of data or use the service frequently.

  6. Security and compliance concerns: Users must ensure that they are properly securing and protecting their data when using Azure Data Catalog, particularly when managing sensitive data or data subject to regulatory compliance requirements.

Overall, while Azure Data Catalog is a useful tool for discovering and managing data assets, users should be aware of these limitations and take steps to mitigate them: configure the service for the specific needs of their data, monitor its performance and cost, and integrate it deliberately into existing workflows.


[x] Reviewed by Enterprise Architecture

[x] Reviewed by Application Development

[x] Reviewed by Data Architecture