Sam Partee on Retrieval Augmented Generation (RAG)

09 Feb 2024

Redis Vector Database

Sam Partee, a principal applied AI engineer at Redis, discussed the integration of Redis's vector database offering with various frameworks and customer use cases at the QCon San Francisco conference.
Redis is particularly suitable for use cases that require real-time processing, such as long-term memory for large language models and semantic caching.
Redis provides two algorithms for vector search: brute-force k-nearest neighbors (KNN) search, which is exact, and Hierarchical Navigable Small World (HNSW) search, which returns approximate nearest neighbors.
Redis supports both hashes and JSON documents for storing data.
Vector searches in Redis can be either plain vector searches or range queries.
Hybrid searches, or filtered searches, combine vector search with other types of search features like text search, tag filters, geographic search, and polygon search.
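The three query shapes described above (plain KNN, range query, and hybrid or filtered search) can be sketched in plain Python. This is a minimal in-memory illustration, not Redis code: the `DOCS` records, the `category` field, and the function names are all assumptions made for the example, and the search is brute force (no HNSW index).

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 0.0 for the same direction, up to 2.0 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical records: a vector plus a tag-like field a filter can use.
DOCS = [
    {"id": "a", "category": "hotel", "vector": [1.0, 0.0]},
    {"id": "b", "category": "hotel", "vector": [0.8, 0.6]},
    {"id": "c", "category": "restaurant", "vector": [0.9, 0.1]},
    {"id": "d", "category": "hotel", "vector": [0.0, 1.0]},
]

def knn_search(query, k, docs=DOCS, category=None):
    """Brute-force KNN: score every candidate and keep the k closest.

    Passing `category` turns this into a hybrid (filtered) search:
    only records matching the filter are scored.
    """
    candidates = [d for d in docs if category is None or d["category"] == category]
    ranked = sorted(candidates, key=lambda d: cosine_distance(query, d["vector"]))
    return [d["id"] for d in ranked[:k]]

def range_search(query, radius, docs=DOCS):
    """Range query: return every record within `radius`, however many that is."""
    hits = [(cosine_distance(query, d["vector"]), d["id"]) for d in docs]
    return [doc_id for dist, doc_id in sorted(hits) if dist <= radius]

query = [1.0, 0.0]
print(knn_search(query, k=2))                    # plain KNN
print(knn_search(query, k=2, category="hotel"))  # hybrid: filter plus KNN
print(range_search(query, radius=0.25))          # range query
```

The difference between the two vector query types is what is fixed: KNN fixes the number of results, while a range query fixes the distance threshold and lets the result count vary.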

There are two main approaches to representing documents in vector space: using an LLM to summarize the entire document and embedding the summary, or splitting the document into sentences, embedding each one, and using vector search to find the relevant sentence and its surrounding context.
The speaker advocates for trying various techniques, including traditional machine learning methods, to find the best approach for semantic search.
Sentence-by-sentence embeddings may not make every sentence distinct enough to retrieve reliably, especially if the query contains a lot of semantic information.
The speaker suggests using LLMs to create paper summaries, as they can pack more information than random sections of a paper.
The speaker also discusses a technique called "hypothetical document embeddings" (HyDE), which uses a hallucinated answer from an LLM as the query when searching a database for the right context or answer.
As an example, the speaker describes searching hotel reviews with an LLM-generated review, which returns more relevant results than searching for specific features or amenities.
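The sentence-level approach and HyDE can be combined in a small sketch. Everything here is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `fake_llm_answer` is a hypothetical placeholder that returns a canned answer instead of calling an LLM, and the review text is invented. It shows the shape of the technique rather than the speaker's implementation.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def split_sentences(document):
    """Naive splitter on sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

def fake_llm_answer(question):
    """Hypothetical placeholder for an LLM call: returns a canned, made-up answer."""
    return "The hotel has a quiet rooftop pool and the staff were friendly."

def hyde_search(question, document, window=1):
    """Find the sentence closest to a hallucinated answer, plus its neighbors."""
    sentences = split_sentences(document)
    # HyDE step: embed the hallucinated answer instead of the raw question.
    query_vec = embed(fake_llm_answer(question))
    best = max(range(len(sentences)),
               key=lambda i: cosine_similarity(query_vec, embed(sentences[i])))
    # Return the matching sentence together with `window` sentences on each side.
    start, end = max(0, best - window), min(len(sentences), best + window + 1)
    return sentences[best], " ".join(sentences[start:end])

REVIEWS = ("We arrived late on Friday. The rooftop pool was quiet and the staff "
           "were friendly. Breakfast was average. Parking cost extra.")

match, context = hyde_search("Where should I stay for a relaxing weekend?", REVIEWS)
print(match)
print(context)
```

In this toy setup the raw question shares no words with any review sentence, so embedding it directly would score every sentence at zero, while the hallucinated answer is phrased like a review and lands on the right one. A real embedding model is less brittle than word overlap, but the intuition is the same: the query is made to look like the documents being searched.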

On-premises AI training has a high barrier to adoption compared to using APIs like OpenAI's.
The barrier to entry for using Triton's HPS API is much higher compared to OpenAI's API.
Acquiring data center GPUs, particularly those with CUDA capabilities, is challenging due to high demand.
AMD chips are an alternative to CUDA-enabled GPUs, but CUDA still dominates the AI industry.
Cloud platforms like Google Cloud, Lambda, and Hugging Face also face GPU shortages.