Photo Retrieval

0 views

Skip to first unread message

Lucretia

unread,

Aug 5, 2024, 2:20:01 AM8/5/24

to riasticesstam

Welcometo the HP Forums, this is a great location to get assistance! I read your post and see that you are unable to retrieve the scanned photos in HP Smart app. I would like to help you resolve this issue.

An image retrieval system is a computer system used for browsing, searching and retrieving images from a large database of digital images. Most traditional and common methods of image retrieval utilize some method of adding metadata such as captioning, keywords, title or descriptions to the images so that retrieval can be performed over the annotation words. Manual image annotation is time-consuming, laborious and expensive; to address this, there has been a large amount of research done on automatic image annotation. Additionally, the increase in social web applications and the semantic web have inspired the development of several web-based image annotation tools.

Image search is a specialized data search used to find images. To search for images, a user may provide query terms such as keyword, image file/link, or click on some image, and the system will return images "similar" to the query. The similarity used for search criteria could be meta tags, color distribution in images, region/shape attributes, etc.

It is crucial to understand the scope and nature of image data in order to determine the complexity of image search system design. The design is also largely influenced by factors such as the diversity of user-base and expected user traffic for a search system. Along this dimension, search data can be classified into the following categories:

I think the first method is more organized. But i think the second method is standard(keeping all uploads in the same dir), but i wonder if it would be slower when retrieving an image if there are thousands of image in the same directory

so many different ways to do this.Yes disk space will be a problem. but for now i am concerned with retrieval time. When i have to output an image to the browser, if that image is in a directory with 10,000 other images, i am worried on how slow that could get.

The number of files in a directory should have no effect at all on the time required to read a file's data - but it can massively affect the amount of time needed to find the file before you can start to read it.

The exact breakpoints where the major issues start up will vary from filesystem type to filesystem type, but, in general, if you're talking about a few hundred files, you don't much need to worry about it. If you're talking about a few thousand, it's worth thinking about and maybe doing a little benchmarking to see how your filesystem and hardware handle it. If you're talking about tens of thousands of files, then you really need to start breaking things up. (I once had a Linux/e2fs print server where CUPS wasn't deleting its job control files after it finished printing and it got up around 100,000 files in one directory. Just getting a directory listing took over half an hour before it even started to display any filenames.)

Separating them by user name may not be the best choice, though, since you'll likely have a lot of users uploading very few images and perhaps a couple who upload hundreds or thousands of images, potentially creating access time issues in those users' storage directories. The bigger problem in that scenario is that you'd likely end up (assuming a successful site) with thousands or tens of thousands of users and a large number of subdirectories is just as bad as a large number of files for slowing down access to your data.

Since you're going to have a timestamp on them, what I would probably do is put them into subdirectories based on the last three digits of the timestamp. That will distribute the files relatively evenly across 1000 subdirectories and should keep the number of files in each directory reasonably small. (Using the first three digits would cause one directory to be filled before moving to the next instead of distributing them evenly.) If you're still ending up with too many files in each subdirectory (which would likely mean you're dealing with several million uploaded images), you could add a second level for the previous three digits, so upload-1234567890.jpg would end up at /567/890/upload-1234567890.jpg.

The answer to that is "maybe". It's possible the file retrieval may be fine, but if you need to do any maintenance on the folder, it would be a huge headache as processes attempt ot enumerate the directory listings.

What would improve the situation would be a number of sub directories under the images folder (or two levels, depending on how many images you're looking at storing), so you have a hierarchy like this:

...and then store files based on their first letter (so all images with names starting 'a' go into the folder 'a'). You could have this as a two or three letter suffix (aa, ab, ac, ad ..., ba, bb, bc ..., zx, zy, zz) and possibly have a hierarchy under that as well so you split files across a number of folders dependent on the first four characters of the name.

You might want to consider a mix of your option (1) and splitting images over a hierarchy as I've described above. That would ensure that if a single user does upload lots of files, then you're covered. Similarly, if you're looking at a lot of user directories, the same principle applies to ensure you don't have 1,000,000 user directories under a single parent.

you really don't want to have folders and folders full of files. Managing these folders takes forever, and changing the naming/dividing scheme later is a nightmare. Furthermore, if you run out of diskspace you have a problem. Also for load balancing, having one harddisk full with files is not efficient

It depends on the file system. For example, FAT16 tends to be quite slow if you have more than 512 files in a directory. FAT32 and NTFS do not have the same limitations but also run much more slowly if you have an extremely large amount of files. Even if you're running one of the more robust Linux file systems, you're still going to be able to parse directories more quickly if they're smaller.

Depending on the host OS, having too many files in one directory could cause some headaches and compatibility problems. Also, depending on how you are getting the image list, it could cause performance issues.

I wonder if I can retrieve deleted photos on Canon Shutter Count? My phone together with the SD Card was stolen last year and all the photos I have taken from the DSLR are all stored in it. Can anyone help me retrieve these important memories I had for my late Grandma? I miss her so much and that's the only memory I had from her.

Hope to receive help.

If you just explain a bit more we might be able to help. If you are looking to recover images from a DSLR apart from the SD Card, that is not possible. DSLR's have no internal storage, and no "memory" of previous images taken.

I know this is advice too late to help on this significant loss but never, never use your camera as a storage device which means the SD. Always u/l them to your computer or laptop as soon as possible.

As a life long photographer I securely kept photos of clients for six months guaranteed. However some have been on backup for six years or more. HD space got real cheap! A while ago a bride form a wedding I shot for her came to me with an unfortunate happening in her family's life. Their house burnt down and of course they lost all the family photos. She asked if I possibly still had her wedding. I did although it was six years ago. Almost made both of us cry.

Thanks for your response @ebiggs1, unfortunately, I don't get the chance to save them in the cloud or in any of my social media accounts. I fully understand your comment. Wish I had saved these memories before my phone was stolen, but it's too late now. Can't find any way to retrieve them back.

Multimodal embedding is the process of generating a numerical representation of an image that captures its features and characteristics in a vector format. These vectors encode the content and context of an image in a way that is compatible with text search over the same vector space.

Image retrieval systems have traditionally used features extracted from the images, such as content labels, tags, and image descriptors, to compare images and rank them by similarity. However, vector similarity search is gaining more popularity due to a number of benefits over traditional keyword-based search and is becoming a vital component in popular content search services.

Keyword search is the most basic and traditional method of information retrieval. In that approach, the search engine looks for the exact match of the keywords or phrases entered by the user in the search query and compares it with the labels and tags provided for the images. The search engine then returns images that contain those exact keywords as content tags and image labels. Keyword search relies heavily on the user's ability to use relevant and specific search terms.

Vector search searches large collections of vectors in high-dimensional space to find vectors that are similar to a given query. Vector search looks for semantic similarities by capturing the context and meaning of the search query. This approach is often more efficient than traditional image retrieval techniques, as it can reduce search space and improve the accuracy of the results.

Each dimension of the vector corresponds to a different feature or attribute of the content, such as its semantic meaning, syntactic role, or context in which it commonly appears. In Azure AI Vision, image and text vector embeddings have 1024 dimensions.

Vector embeddings can only be compared and matched if they're from the same model type. Images vectorized by one model won't be searchable through a different model. The latest Image Analysis API offers two models, version 2023-04-15 which supports text search in many languages, and the legacy 2022-04-11 model which supports only English.

Vectorize Images and Text: the Multimodal embeddings APIs, VectorizeImage and VectorizeText, can be used to extract feature vectors out of an image or text respectively. The APIs return a single feature vector representing the entire input.