
Enhancing Image Search with Vector Similarity

2024-03-12 13:43:58


Figure 1 : Layers of a CNN model

As seen in the preceding diagram, a CNN model is composed of the following layers:

1. Input layer: Accepts the raw pixel values of the image as input.

2. Convolutional layers: Each layer extracts specific features such as edges, corners, and textures.

3. Activation layers: Introduce non-linearity so the network can learn from errors and approximate more complex functions.

4. Pooling layers: Reduce the dimensions of the feature maps through down-sampling to decrease computational complexity.

5. Fully connected layers: Combine the weights and biases from the previous layers so that classification can take place.

6. Output layer: Produces a probability distribution over the classes.
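To make the layer sequence concrete, here is a minimal, illustrative PyTorch sketch of such a stack. This is not the model used in this chapter (that is a pre-trained CLIP model); the layer sizes are arbitrary and chosen only to mirror the six stages above:

```python
import torch
import torch.nn as nn

# A toy CNN mirroring the six stages described above (illustrative sizes).
cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # convolutional layer: extracts features
    nn.ReLU(),                                  # activation layer: adds non-linearity
    nn.MaxPool2d(2),                            # pooling layer: 224x224 -> 112x112
    nn.Flatten(),                               # -> 8 * 112 * 112 features
    nn.Linear(8 * 112 * 112, 10),               # fully connected layer
    nn.Softmax(dim=1),                          # output: probability distribution over 10 classes
)

x = torch.rand(1, 3, 224, 224)  # one fake RGB image as input
probs = cnn(x)                  # shape (1, 10), rows sum to 1
```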

Indexing image vectors in Elasticsearch

Once the image vectors have been obtained, the next step is to index these vectors in Elasticsearch for future searching. Elasticsearch provides a special field type, the dense_vector field, to handle the storage of these high-dimensional vectors.

A dense_vector field is defined as an array of numeric values, typically floating-point numbers, with a specified number of dimensions (dims). The maximum number of dimensions allowed for indexed vectors is currently 2,048, though this may be further increased in the future. It’s essential to note that each dense_vector field is single-valued, meaning that it is not possible to store multiple values in one such field.

In the context of image search, each image (now represented as a vector) is indexed into an Elasticsearch document. The vector representing the image is stored in a dense_vector field within the document; because each dense_vector field is single-valued, a document can hold one vector, or several vectors spread across multiple dense_vector fields. Additionally, other relevant information or metadata about the image can be stored in other fields within the same document.
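As a sketch of what such an index might look like, the following mapping pairs a keyword field for the filename with a dense_vector field for the embedding. The index name, field names, and the 512-dimension size are illustrative; the dims value must match the output size of the embedding model you use:

```python
# Illustrative mapping for an image-search index (names and dims are assumptions).
mapping = {
    "mappings": {
        "properties": {
            "filename": {"type": "keyword"},
            "image_vector": {
                "type": "dense_vector",
                "dims": 512,            # must equal the embedding model's output size
                "index": True,          # enable kNN search on this field
                "similarity": "cosine",
            },
        }
    }
}

# With a connected Elasticsearch client `es`:
# es.indices.create(index="image-index", mappings=mapping["mappings"])
```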

The full example code can be found in the Jupyter Notebook available in the chapter 5 folder of this book’s GitHub repository at https://github.com/PacktPublishing/VectorSearch-for-Practitioners-with-Elastic/tree/main/chapter5, but we’ll discuss the relevant parts here.

First, we will initialize a pre-trained model using the SentenceTransformer library.

The clip-ViT-B-32-multilingual-v1 model is discussed in detail later in this chapter:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('clip-ViT-B-32-multilingual-v1')

Next, we will prepare the image transformation function:

from torchvision import transforms

transform = transforms.Compose([
   transforms.Resize(224),
   transforms.CenterCrop(224),
   lambda image: image.convert("RGB"),
   transforms.ToTensor(),
   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

transforms.Compose() chains together the following transformations:

  • transforms.Resize(224): Resizes the shorter side of the image to 224 pixels while maintaining the aspect ratio.
  • transforms.CenterCrop(224): Crops the center of the image so that the resultant image has dimensions of 224x224 pixels.
  • lambda image: image.convert("RGB"): Converts the image to the RGB format. This is useful for grayscale images or images with an alpha channel, as deep learning models typically expect RGB inputs.
  • transforms.ToTensor(): Converts the image (in the PIL image format) into a PyTorch tensor. This will change the data from a range of [0, 255] in the PIL image format to a float in a range [0.0, 1.0].
  • transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)): Normalizes the tensor image with a given mean and standard deviation for each channel. In this case, the mean and standard deviation for all three channels (R, G, B) are 0.5. This normalization will transform the data range from [0.0, 1.0] to [-1.0, 1.0].
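As a quick sanity check on the value ranges described above, the following pure-Python sketch mirrors (but does not call) the torchvision behavior: ToTensor maps [0, 255] to [0.0, 1.0], and Normalize with mean 0.5 and standard deviation 0.5 maps that to [-1.0, 1.0]:

```python
# Pure-Python mirror of the two range conversions (not the library calls themselves).

def to_tensor_range(pixel):
    """ToTensor: map a PIL pixel value in [0, 255] to a float in [0.0, 1.0]."""
    return pixel / 255.0

def normalize(value, mean=0.5, std=0.5):
    """Normalize: map a value in [0.0, 1.0] to [-1.0, 1.0]."""
    return (value - mean) / std

print(normalize(to_tensor_range(0)))    # darkest pixel  -> -1.0
print(normalize(to_tensor_range(255)))  # brightest pixel -> 1.0
```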

We can use the following code to apply the transform to an image file and then generate an image vector using the model. See the Python notebook for this chapter to run against actual image files:

from PIL import Image

# Load the image, apply the transform pipeline, and add a batch dimension
img = Image.open("image_file.jpg")
image = transform(img).unsqueeze(0)

# Generate the image embedding with the SentenceTransformer model
image_vector = model.encode(image)

The vector and other associated data can then be indexed into Elasticsearch for use with kNN search:

# Create the document with the filename and its vector embedding
document = {'_index': index_name,
            '_source': {"filename": filename,
                        "image_vector": vector}}

See the complete code in the chapter 5 folder of this book’s GitHub repository.

With vectors generated and indexed into Elasticsearch, we can move on to searching for similar images.

k-Nearest Neighbor (kNN) search

With the vectors now indexed in Elasticsearch, the next step is to make use of kNN search. You can refer back to Chapter 2, Getting Started with Vector Search in Elastic, for a full discussion on kNN and HNSW search.

As with text-based vector search, when performing vector search with images, we first need to convert our query image to a vector. The process is the same as we used to convert images to vectors at index time.

We convert the image to a vector and include that vector in the query_vector parameter of the knn search function:

knn = {
    "field": "image_vector",
    "query_vector": search_image_vector[0],
    "k": 1,
    "num_candidates": 10
}

Here, we specify the following:

  • field: The field in the index that contains vector representations of images we are searching against
  • query_vector: The vector representation of our query image
  • k: The number of nearest neighbors to return; here we want only the single closest image
  • num_candidates: The number of approximate nearest neighbor candidates on each shard to search against
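Putting the parameters above together, a small helper can assemble the knn clause and pass it to a search call. This is a sketch: the index name "images", the helper function, and the toy query vector are illustrative, and the actual search requires a connected Elasticsearch client `es` (so that call is left commented out):

```python
# Sketch: assemble the knn clause for an Elasticsearch search request.
def build_knn_query(query_vector, k=1, num_candidates=10):
    """Return the knn section of a search request for the image_vector field."""
    return {
        "field": "image_vector",
        "query_vector": query_vector,
        "k": k,
        "num_candidates": num_candidates,
    }

# Toy 3-dim vector; a real query vector must match the indexed dims.
knn = build_knn_query([0.1, 0.2, 0.3])

# With a connected client:
# resp = es.search(index="images", knn=knn, source=["filename"])
```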

With an understanding of how to convert an image to a vector representation and perform an approximate nearest neighbor search, let’s discuss some of the challenges.

Challenges and limitations with image search

While vector search with images offers powerful capabilities for image retrieval, it also comes with certain challenges and limitations. One of the main challenges is the high dimensionality of image vectors, which can lead to computational inefficiencies and difficulties in visualizing and interpreting the data.

Additionally, while pre-trained models for feature extraction can capture a wide range of features, they may not always align with the specific features that are relevant to a particular use case. This can lead to suboptimal search results. One potential solution, not limited to image search, is to use transfer learning to fine-tune the feature extraction model on a specific task, although this requires additional data and computational resources.

Conclusion

In conclusion, vector similarity search revolutionizes image retrieval by harnessing advanced algorithms and machine learning. From e-commerce to digital forensics, its impact is profound, enhancing user experiences and content discovery. Leveraging techniques like k-Nearest Neighbor search and Elasticsearch's dense vector field, image search becomes more efficient and scalable. Despite challenges, such as high dimensionality and feature alignment, ongoing advancements promise even greater insights into visual data. As technology evolves, so does our ability to navigate and understand the vast landscape of images, ensuring a future of enhanced digital interactions and insights.

Author Bio

Bahaaldine Azarmi, Global VP Customer Engineering at Elastic, guides companies as they leverage data architecture, distributed systems, machine learning, and generative AI. He leads the customer engineering team, focusing on cloud consumption, and is passionate about sharing knowledge to build and inspire a community skilled in AI.

Jeff Vestal has a rich background spanning over a decade in financial trading firms and extensive experience with Elasticsearch. He offers a unique blend of operational acumen, engineering skills, and machine learning expertise. As a Principal Customer Enterprise Architect, he excels at crafting innovative solutions, leveraging Elasticsearch's advanced search capabilities, machine learning features, and generative AI integrations, adeptly guiding users to transform complex data challenges into actionable insights.
