Azure Cognitive Search
Author: Ronald Fung
Creation Date: May 12, 2023
Next Modified Date: May 12, 2024
A. Introduction
Azure Cognitive Search (formerly known as “Azure Search”) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.
Search is foundational to any app that surfaces text to users. Common scenarios include catalog or document search, online retail apps, and data exploration over proprietary content. When you create a search service, you’ll work with the following capabilities:
A search engine for full text search over a search index containing user-owned content
Rich indexing, with lexical analysis and optional AI enrichment for content extraction and transformation
Rich query syntax for text search, fuzzy search, autocomplete, geo-search and more
Programmability through REST APIs and client libraries in Azure SDKs
Azure integration at the data layer, machine learning layer, and AI (Cognitive Services)
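To make the idea of a search index over user-owned content more concrete, here is a minimal sketch of an index definition in the shape the REST API accepts, written as a Python dictionary. The index name and all field names (research-docs, id, title, body, published, tags) are hypothetical and chosen only for illustration; the helper key_field is likewise not part of any SDK.

```python
import json

# Hypothetical index definition. The index name and field names are
# illustrative assumptions, not taken from an existing deployment.
index_definition = {
    "name": "research-docs",
    "fields": [
        # Every index needs exactly one key field of type Edm.String.
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "title", "type": "Edm.String", "searchable": True, "sortable": True},
        {"name": "body", "type": "Edm.String", "searchable": True},
        {"name": "published", "type": "Edm.DateTimeOffset",
         "filterable": True, "sortable": True},
        {"name": "tags", "type": "Collection(Edm.String)",
         "searchable": True, "filterable": True, "facetable": True},
    ],
}


def key_field(definition):
    """Return the name of the key field, or raise if there is not exactly one."""
    keys = [f["name"] for f in definition["fields"] if f.get("key")]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one key field, found {len(keys)}")
    return keys[0]


# Serialized form, ready to be sent as the body of a create-index request.
index_json = json.dumps(index_definition, indent=2)
```

The attributes on each field (searchable, filterable, sortable, facetable) decide which of the query features listed above can be used on that field.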
B. How is it used at Seagen
For a biopharma research company, Azure Cognitive Search can help you quickly search and analyze large amounts of structured and unstructured data. Here are some ways you can use Azure Cognitive Search:
Search and analyze data: Azure Cognitive Search can be used to search and analyze structured and unstructured data such as medical research articles, clinical trials, patents, and other scientific data. You can use the search capabilities to quickly find relevant information and insights.
Natural language processing: Azure Cognitive Search includes natural language processing capabilities that can help you extract insights from unstructured data such as research papers, clinical notes, and other documents. This can help you identify patterns and trends that may not be immediately apparent.
Customizable search experience: Azure Cognitive Search allows you to customize the search experience to meet your needs. You can configure search rules, scoring profiles, and filters to ensure that you are getting the most relevant results.
Security and compliance: Azure Cognitive Search includes built-in security and compliance features that can help you ensure that your data is protected. You can use Azure Active Directory to manage access to search indexes and use Azure Key Vault to securely store your search keys.
Integration with other systems: Azure Cognitive Search can be integrated with other systems and applications used in your research process. You can use Azure Functions or Azure Logic Apps to trigger actions or exchange data between different systems, such as ingesting new data into the search index.
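As an illustration of the ingestion step mentioned in the last item, the sketch below builds the JSON body that the REST API's index-documents operation expects. An Azure Function or Logic App could POST this body to the index. The document fields and the helper name build_index_batch are hypothetical; only the "value" wrapper and the "@search.action" property follow the REST API's documented shape.

```python
import json

# Actions accepted by the index-documents operation.
VALID_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}


def build_index_batch(documents, action="mergeOrUpload"):
    """Wrap plain documents in the batch envelope used for indexing."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return {"value": [{"@search.action": action, **doc} for doc in documents]}


# Hypothetical documents, for illustration only.
new_documents = [
    {"id": "trial-001", "title": "Phase 1 study summary"},
    {"id": "trial-002", "title": "Phase 2 study summary"},
]

batch = build_index_batch(new_documents)
batch_json = json.dumps(batch)
```

Using mergeOrUpload here is a design choice: it updates a document if its key already exists and inserts it otherwise, so the same function can be re-run safely on data that was ingested before.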
C. Features
Architecturally, a search service sits between the external data stores that contain your un-indexed data, and your client app that sends query requests to a search index and handles the response.
In your client app, the search experience is defined using APIs from Azure Cognitive Search, and can include relevance tuning, semantic ranking, autocomplete, synonym matching, fuzzy matching, pattern matching, filter, and sort.
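As a rough sketch of how a client app might combine several of these features, the function below builds a query body that uses fuzzy matching, a filter, and a sort order. The function name, the field name published, and the sample values are assumptions made for illustration; the property names (search, queryType, filter, orderby, top) are those of the REST API's search request. Escaping of special characters in the search text is omitted for brevity.

```python
def build_query_body(text, fuzzy=False, filter_expr=None, order_by=None, top=10):
    """Build a request body for a search query."""
    if fuzzy:
        # Full Lucene syntax: appending ~1 to a term allows matches
        # within an edit distance of one.
        search_text = " ".join(f"{term}~1" for term in text.split())
        query_type = "full"
    else:
        search_text = text
        query_type = "simple"

    body = {"search": search_text, "queryType": query_type, "top": top}
    if filter_expr:
        body["filter"] = filter_expr   # OData filter expression
    if order_by:
        body["orderby"] = order_by     # for example "published desc"
    return body


# Hypothetical usage: the field "published" is assumed to exist in the index
# and to be marked filterable and sortable.
body = build_query_body(
    "antibody conjugate",
    fuzzy=True,
    filter_expr="published ge 2022-01-01T00:00:00Z",
    order_by="published desc",
    top=5,
)
```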
Across the Azure platform, Cognitive Search can integrate with other Azure services in the form of indexers that automate data ingestion/retrieval from Azure data sources, and skillsets that incorporate consumable AI from Cognitive Services, such as image and natural language processing, or custom AI that you create in Azure Machine Learning or wrap inside Azure Functions.
Inside a search service
On the search service itself, the two primary workloads are indexing and querying.
Indexing is an intake process that loads content into your search service and makes it searchable. Internally, inbound text is processed into tokens and stored in inverted indexes for fast scans. You can upload JSON documents, or use an indexer to serialize your data into JSON.
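The idea of tokens stored in an inverted index can be sketched in a few lines of Python. This is a toy model for intuition only, not how the service is implemented: the tokenizer is a simple lowercase split, and the sample documents are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Made-up sample documents.
docs = {
    "doc1": "Antibody drug conjugates in clinical trials",
    "doc2": "Clinical notes on antibody response",
    "doc3": "Patent filing for a drug delivery device",
}
index = build_inverted_index(docs)
matches = search(index, "antibody clinical")
```

Because the index maps tokens to documents rather than documents to tokens, a query only has to look up its own terms instead of scanning every document, which is what makes lookups fast.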
AI enrichment through cognitive skills is an extension of indexing. If your content needs image or language analysis before it can be indexed, AI enrichment can extract text embedded in application files, translate text, and also infer text and structure from non-text files by analyzing the content.
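A skillset is defined as JSON and attached to an indexer. The sketch below shows what a small definition might look like, chaining an OCR skill (text from images) with a translation skill. The skillset name, the source paths, and the target language are assumptions for illustration, and the skill types and properties should be checked against the current skill reference before use.

```python
import json

# Hypothetical skillset: name, paths, and language codes are illustrative.
skillset = {
    "name": "research-enrichment",
    "description": "Extract text from images, then translate it",
    "skills": [
        {
            # Extracts printed text from images found in a document.
            "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
            "context": "/document/normalized_images/*",
            "defaultLanguageCode": "en",
            "inputs": [
                {"name": "image", "source": "/document/normalized_images/*"}
            ],
            "outputs": [{"name": "text", "targetName": "text"}],
        },
        {
            # Translates the document's text content into French.
            "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
            "context": "/document",
            "defaultToLanguageCode": "fr",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [
                {"name": "translatedText", "targetName": "translatedText"}
            ],
        },
    ],
}

skill_types = [skill["@odata.type"] for skill in skillset["skills"]]
skillset_json = json.dumps(skillset, indent=2)
```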
Querying can happen once an index is populated with searchable text, when your client app sends query requests to a search service and handles responses. All query execution is over a search index that you control.
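The request and response cycle described above can be sketched with the Python standard library. The service name, index name, key, and the sample response are placeholders invented for this example, and the api-version value is an assumption that should match whatever version your service uses. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_VERSION = "2020-06-30"  # assumed; use the version your service supports


def build_search_request(service, index, api_key, body):
    """Build (but do not send) a POST request for a search query."""
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/docs/search?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def top_hits(response_json, field):
    """Return (score, field value) pairs from a search response."""
    return [
        (hit["@search.score"], hit.get(field))
        for hit in response_json.get("value", [])
    ]


# Placeholder names and key, for illustration only.
request = build_search_request(
    "my-search-service", "research-docs", "<query-key>", {"search": "antibody"}
)

# Made-up response in the shape the service returns: a "value" array
# whose items carry an "@search.score" plus the retrievable fields.
sample_response = {
    "value": [
        {"@search.score": 2.5, "id": "doc1", "title": "Antibody drug conjugates"},
        {"@search.score": 1.0, "id": "doc2", "title": "Antibody response notes"},
    ]
}
hits = top_hits(sample_response, "title")
```

To actually run the query, the request object could be passed to urllib.request.urlopen and the response body parsed with json.loads.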
Semantic search is an extension of query execution. It adds language understanding to search results processing, promoting the most semantically relevant results to the top.
Why use Cognitive Search?
Azure Cognitive Search is well suited for the following application scenarios:
Consolidate heterogeneous content into a private, user-defined search index.
Offload indexing and query workloads onto a dedicated search service.
Easily implement search-related features: relevance tuning, faceted navigation, filters (including geo-spatial search), synonym mapping, and autocomplete.
Transform large undifferentiated text or image files, or application files stored in Azure Blob Storage or Azure Cosmos DB, into searchable chunks. This is achieved during indexing through cognitive skills that add external processing.
Add linguistic or custom text analysis. If you have non-English content, Azure Cognitive Search supports both Lucene analyzers and Microsoft’s natural language processors. You can also configure analyzers to achieve specialized processing of raw content, such as filtering out diacritics, or recognizing and preserving patterns in strings.
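To show what filtering out diacritics means in practice, the sketch below folds accented characters to their base letters using Python's unicodedata module. It illustrates the effect that such an analyzer configuration has on text; it is not the service's own implementation, and the function name is made up for this example.

```python
import unicodedata


def fold_diacritics(text):
    """Remove combining marks so that accented letters match their base form."""
    # NFD splits each accented character into a base letter plus
    # one or more combining marks, which are then dropped.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


folded = fold_diacritics("Señor café naïve")
```

When the same folding is applied at both index time and query time, a search for "cafe" can match content that contains "café".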
For more information about specific functionality, see Features of Azure Cognitive Search.
D. Where implemented
E. How it is tested
Testing Azure Cognitive Search involves ensuring that the search service is functioning correctly, securely, and meeting the needs of all stakeholders involved in the project. Here are some steps to follow to test Azure Cognitive Search:
Define the scope and requirements: Define the scope of the project and the requirements of all stakeholders involved in the project. This will help ensure that Azure Cognitive Search is designed to meet the needs of all stakeholders.
Develop test cases: Develop test cases that cover all aspects of Azure Cognitive Search functionality, including indexing, querying, management, and security. The test cases should be designed to meet the needs of the organization, including scalability and resilience.
Conduct unit testing: Test the individual components of Azure Cognitive Search to ensure that they are functioning correctly. This may involve using tools like Postman for automated testing.
Conduct integration testing: Test Azure Cognitive Search in an integrated environment to ensure that it works correctly with other systems and applications. This may involve testing Azure Cognitive Search with different operating systems, browsers, and devices.
Conduct user acceptance testing: Test Azure Cognitive Search with end-users to ensure that it meets their needs and is easy to use. This may involve conducting surveys, interviews, or focus groups to gather feedback from users.
Automate testing: Automate testing of Azure Cognitive Search to ensure that it is functioning correctly and meeting the needs of all stakeholders. This may involve using tools like Azure DevOps to set up automated testing pipelines.
Monitor performance: Monitor the performance of Azure Cognitive Search in production to ensure that it is meeting the needs of all stakeholders. This may involve setting up monitoring tools, such as Azure Monitor, to track usage and identify performance issues.
Address issues: Address any issues that are identified during testing and make necessary changes to ensure that Azure Cognitive Search is functioning correctly and meeting the needs of all stakeholders.
Following these steps helps ensure that Azure Cognitive Search is tested thoroughly, meets stakeholder requirements, and functions correctly in a production environment.
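One kind of test case that can be automated in a pipeline is a relevance check: for a known query, a known document should appear near the top of the results. The sketch below shows such a check against a canned response. The helper name, the document ids, and the response content are invented for illustration; in a real test the response would come from the search service.

```python
def assert_in_top_k(response_json, expected_id, k, id_field="id"):
    """Fail if expected_id is not among the first k results."""
    top_ids = [hit[id_field] for hit in response_json.get("value", [])[:k]]
    if expected_id not in top_ids:
        raise AssertionError(
            f"{expected_id!r} not found in top {k} results: {top_ids}"
        )
    return top_ids.index(expected_id) + 1  # 1-based rank of the expected hit


# Canned response standing in for a real query result.
canned_response = {
    "value": [
        {"@search.score": 3.2, "id": "doc7"},
        {"@search.score": 2.9, "id": "doc1"},
        {"@search.score": 1.4, "id": "doc3"},
    ]
}

rank = assert_in_top_k(canned_response, "doc1", k=2)
```

A small suite of query and expected-document pairs like this can be re-run after every change to analyzers or scoring profiles to catch relevance regressions early.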
F. 2023 Roadmap
May 2023
Support for up to 30 shards for clustered Azure Cache for Redis instances
Azure Cache for Redis now supports clustered caches with up to 30 shards, which means your applications can store more data and scale better with your workloads.
For more information, see Configure clustering for Azure Cache for Redis instance.
March 2023
In-place scale up and scale out for the Enterprise tiers (preview)
The Enterprise and Enterprise Flash tiers now support the ability to scale cache instances up and out without requiring downtime or data loss. Scale up and scale out actions can both occur in the same operation.
For more information, see Scale an Azure Cache for Redis instance
Support for RedisJSON in active geo-replicated caches (preview)
Cache instances using active geo-replication now support the RedisJSON module.
For more information, see Configure active geo-replication.
Flush operation for active geo-replicated caches (preview)
Caches using active geo-replication now include a built-in flush operation that can be initiated at the control plane level. Use the flush operation with your cache instead of the FLUSHALL and FLUSHDB operations, which are blocked by design for active geo-replicated caches.
For more information, see Flush operation
Customer managed key (CMK) disk encryption (preview)
Redis data that is saved on disk can now be encrypted using customer managed keys (CMK) in the Enterprise and Enterprise Flash tiers. Using CMK adds another layer of control to the default disk encryption.
For more information, see Enable disk encryption
Connection event audit logs (preview)
Enterprise and Enterprise Flash tier caches can now log all connection, disconnection, and authentication events through diagnostic settings. Logging this information helps in security audits. You can also monitor who has access to your cache resource.
For more information, see Enabling connection audit logs
G. 2024 Roadmap
????
H. Known Issues
There are several known issues that can impact Azure Cognitive Search. Here are some of the most common issues to be aware of:
Configuration issues: Configuration issues can arise when setting up Azure Cognitive Search. It is important to ensure that all configurations are set up correctly to avoid issues with deployment, management, and security of the solution.
Performance issues: If the search service is not properly sized, it can impact performance and availability, causing issues with the speed and reliability of Azure Cognitive Search.
Integration issues: Integration issues can arise when integrating Azure Cognitive Search with other systems and applications. It is important to ensure that Azure Cognitive Search is designed to work seamlessly with other systems and applications to avoid integration issues.
Security issues: Security is a critical concern when it comes to Azure Cognitive Search. It is important to ensure that Azure Cognitive Search is secured and that access to the solution is restricted to authorized personnel.
Accuracy issues: In some cases, search results may not be accurate or relevant for a specific use case. It is important to review and validate results carefully before acting on them.
Compatibility issues: Azure Cognitive Search may not be compatible with all platforms, devices, or languages. It is important to ensure that Azure Cognitive Search is compatible with the organization’s existing infrastructure before implementation.
Testing issues: Gaps in test coverage can let defects reach production. It is important to ensure that testing is carried out thoroughly and that all aspects of Azure Cognitive Search functionality are tested.
Overall, Azure Cognitive Search requires careful planning and management to ensure that it is functioning correctly and meeting the needs of all stakeholders involved in the project. By being aware of these known issues and taking steps to address them, you can improve the quality of Azure Cognitive Search and ensure the success of your project.
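For the sizing concern raised under performance issues, capacity is expressed in search units, calculated as replicas multiplied by partitions. The sketch below is a rough planning aid under assumed limits (up to 12 replicas, partition counts of 1, 2, 3, 4, 6, or 12, and at most 36 search units); these limits are assumptions that vary by pricing tier and should be verified against the current service limits before relying on them.

```python
# Assumed limits; verify against the current service limits for your tier.
ALLOWED_PARTITIONS = (1, 2, 3, 4, 6, 12)
MAX_REPLICAS = 12
MAX_SEARCH_UNITS = 36


def search_units(replicas, partitions):
    """Return replicas x partitions, validating against the assumed limits."""
    if not 1 <= replicas <= MAX_REPLICAS:
        raise ValueError(f"replicas must be between 1 and {MAX_REPLICAS}")
    if partitions not in ALLOWED_PARTITIONS:
        raise ValueError(f"partitions must be one of {ALLOWED_PARTITIONS}")
    units = replicas * partitions
    if units > MAX_SEARCH_UNITS:
        raise ValueError(f"{units} search units exceeds {MAX_SEARCH_UNITS}")
    return units


# Example: 3 replicas for query availability, 2 partitions for storage.
units = search_units(replicas=3, partitions=2)
```

Replicas mainly add query throughput and availability, while partitions add storage and indexing capacity, so the two are usually sized separately against expected query load and data volume.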
[x] Reviewed by Enterprise Architecture
[x] Reviewed by Application Development
[x] Reviewed by Data Architecture