Amazon Kendra supports a variety of document formats, such as Microsoft Word, PDF, and plain text, from various data sources. In this post, we focus on extending the document support in Amazon Kendra to make images searchable by their displayed content. Images can be made searchable by supplementing them with metadata such as keywords, but adding detailed metadata to potentially thousands of images takes significant manual effort. Generative AI can generate this metadata automatically: by predicting textual captions, a Generative AI model produces descriptive metadata for each image. The Amazon Kendra index can then be enriched with the generated metadata during document ingestion to enable searching the images without any manual effort.
In this post, we show how to use Custom Document Enrichment (CDE) in Amazon Kendra with a Generative AI model deployed on Amazon SageMaker. We demonstrate CDE using simple examples and provide a step-by-step guide for you to experience CDE in an Amazon Kendra index in your own AWS account. The solution lets users quickly and easily find the images they need without having to manually tag or categorize them, and it can be customized and scaled to meet the needs of different applications and industries.
Generative AI models learn to recognize objects and features within images, and then generate descriptions of those objects and features in natural language. State-of-the-art models use an encoder-decoder architecture, where the image information is encoded in the intermediate layers of the neural network and decoded into textual descriptions. This process can be considered as two distinct stages: feature extraction from the image and textual caption generation. In the feature extraction stage (encoder), the model processes the image to extract relevant visual features, such as object shapes, colors, and textures. In the caption generation stage (decoder), the model generates a natural language description of the image based on the extracted visual features.
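To make the two stages concrete, the following is a minimal sketch of encoder-decoder captioning using a publicly available pretrained model from Hugging Face. The BLIP checkpoint named here is an illustrative choice, not necessarily the model deployed in this post's solution.

```python
# Minimal encoder-decoder captioning sketch with a public pretrained model.
# Requires: pip install transformers pillow torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example.jpg").convert("RGB")

# Encoder stage: the vision backbone turns pixels into visual features.
inputs = processor(images=image, return_tensors="pt")

# Decoder stage: the language model generates a caption from those features.
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```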
An example of a customized image search is enterprise resource planning (ERP). In ERP, image data collected from different stages of logistics or supply chain management could include tax receipts, vendor orders, payslips, and more, which need to be automatically categorized for the purview of different teams within the organization. Another example is to train on medical scans paired with doctors' diagnoses so that captions can be predicted for new medical images, enabling automatic classification: the vision model extracts features from the MRI, CT, or X-ray image, and the text model captions it with the medical diagnosis.
We ingest images from Amazon Simple Storage Service (Amazon S3) into Amazon Kendra. During ingestion, the Generative AI model hosted on SageMaker is invoked to generate an image description. Additionally, any text visible in an image is extracted by Amazon Textract. The image description and the extracted text are stored as metadata and made available to the Amazon Kendra search index. After ingestion, images can be searched via the Amazon Kendra search console, API, or SDK.
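As a sketch of how this ingestion-time enrichment could look, the following pre-extraction function combines both calls. The endpoint name, the model's response payload, and the exact field names of the CDE event and return value are assumptions for illustration; consult the CDE documentation for the precise contract.

```python
# Hypothetical CDE pre-extraction Lambda: endpoint name, response format,
# and event/return field names are assumptions, not the exact contract.
import json

import boto3

s3 = boto3.client("s3")
textract = boto3.client("textract")
sagemaker_runtime = boto3.client("sagemaker-runtime")

CAPTION_ENDPOINT = "image-captioning-endpoint"  # assumed endpoint name


def lambda_handler(event, context):
    # CDE passes the S3 location of the document being ingested.
    bucket = event["s3Bucket"]
    key = event["s3ObjectKey"]
    image_bytes = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # Generate a caption with the Generative AI model hosted on SageMaker.
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=CAPTION_ENDPOINT,
        ContentType="application/x-image",
        Body=image_bytes,
    )
    # Assumed: the endpoint returns the caption as a JSON-encoded string.
    caption = json.loads(response["Body"].read())

    # Extract any text visible in the image with Amazon Textract.
    ocr = textract.detect_document_text(Document={"Bytes": image_bytes})
    visible_text = " ".join(
        block["Text"] for block in ocr["Blocks"] if block["BlockType"] == "LINE"
    )

    # Surface the caption and extracted text as searchable Kendra metadata.
    return {
        "version": "v0",
        "s3ObjectKey": key,
        "metadataUpdates": [
            {"name": "image_caption", "value": {"stringValue": caption}},
            {"name": "image_text", "value": {"stringValue": visible_text}},
        ],
    }
```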
In this post, we saw how Amazon Kendra and Generative AI can be combined to automate the creation of meaningful metadata for images. State-of-the-art Generative AI models are extremely useful for generating text captions describing the content of an image. This has several industry use cases, spanning healthcare and life sciences, retail and ecommerce, digital asset platforms, and media. Image captioning is also crucial for building a more inclusive digital world and redesigning the internet, metaverse, and immersive technologies to cater to the needs of people with visual impairments.
Caption-enabled image search makes digital content easily searchable without manual effort for these applications, and removes duplicated effort. The CloudFormation template we provided makes it straightforward to deploy this solution and enable image search using Amazon Kendra. A simple architecture of images stored in Amazon S3, with Generative AI creating textual descriptions of those images, can be combined with CDE in Amazon Kendra to power this solution.
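To verify the end-to-end flow after deployment, you can query the index programmatically. A minimal sketch with boto3 follows; the index ID and query text are placeholders.

```python
import boto3

kendra = boto3.client("kendra")

# The generated captions and extracted text make images retrievable by
# their visual content; index ID and query text are placeholders.
response = kendra.query(
    IndexId="your-kendra-index-id",
    QueryText="forklift in a warehouse",
)
for item in response["ResultItems"]:
    print(item["DocumentId"], item.get("DocumentTitle", {}).get("Text"))
```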