Notes on LLMs

Key Terms:

Embeddings

Embeddings are a way of representing words (or other pieces of text) as vectors of numbers so that similar items get similar vectors. A pre-trained neural network processes some text and outputs an array of floats, e.g. [-0.5, 1.0, ...].

Similar words are closer together in the vector space.

We normally store these embeddings in a vector DB. A minimal table might look like:

CREATE TABLE embeddings (
    text TEXT,
    embedding REAL[]
);

To query this table, we embed the search term, take the dot product of that embedding with the embedding column, and return the top-K rows by score.
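As a sketch of that top-K lookup (the `embed` function here is a toy stand-in for a real embedding model, so the example is self-contained):

```python
# Minimal top-K retrieval by dot product over an in-memory "table".
# `embed` is a toy stand-in for a real embedding model.

def embed(text):
    # Toy 2-D "embedding": counts of two hand-picked tokens.
    words = text.lower().split()
    return [words.count("cat"), words.count("dog")]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query, table, k=2):
    """table: list of (text, embedding) rows, as in the SQL table above."""
    q = embed(query)
    scored = [(dot(q, emb), text) for text, emb in table]
    scored.sort(reverse=True)           # highest dot product first
    return [text for _, text in scored[:k]]

table = [(t, embed(t)) for t in
         ["the cat sat", "a dog barked", "cat and cat", "hello world"]]
print(top_k("my cat", table, k=2))  # -> ['cat and cat', 'the cat sat']
```

A real system would replace `embed` with a model call and push the scoring into the vector DB instead of doing it in application code.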

Embeddings were popularized by Google in 2013 with statements such as “king - man + woman = queen.” The gist of it, as you may know, is that we can express words as vectors that encode their semantics in a meaningful way.
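The "king - man + woman = queen" arithmetic can be sketched with toy hand-made vectors (these 2-D values are illustrative, not learned word2vec weights):

```python
import math

# Hand-crafted 2-D "embeddings": one axis for gender, one for royalty.
# Real word2vec vectors are learned and high-dimensional; this is a toy.
vectors = {
    "king":  [ 1.0, 1.0],
    "queen": [-1.0, 1.0],
    "man":   [ 1.0, 0.0],
    "woman": [-1.0, 0.0],
    "apple": [ 0.0, -1.0],
}

def analogy(a, b, c):
    """Return the vocab word closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in
              zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(vectors[w], target))

print(analogy("king", "man", "woman"))  # -> queen
```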

Two classic word2vec training objectives are skip-gram and continuous bag-of-words (CBOW).

Retrievers

A retriever's job is to find relevant documents or pieces of information that can help answer a query. It takes the input query and searches a DB to retrieve info that might be useful for generating the response.
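A retriever can be sketched as a thin wrapper around an embedding store. All names below are illustrative (not a real framework API), and the embedding function is again a toy stand-in:

```python
# Minimal retriever sketch: embed the query, score the stored documents,
# return the best matches.

def embed(text):
    words = text.lower().split()
    # Toy 3-D embedding keyed to a few topic words.
    return [words.count("python"), words.count("sql"), words.count("llm")]

class Retriever:
    def __init__(self, docs):
        # Precompute and store one embedding per document.
        self.index = [(doc, embed(doc)) for doc in docs]

    def retrieve(self, query, k=1):
        q = embed(query)
        def score(item):
            return sum(a * b for a, b in zip(q, item[1]))
        ranked = sorted(self.index, key=score, reverse=True)
        return [doc for doc, _ in ranked[:k]]

r = Retriever(["python is fun", "sql joins tables", "llm notes"])
print(r.retrieve("tell me about sql"))  # -> ['sql joins tables']
```

Frameworks like LangChain expose retrievers behind a similar query-in, documents-out interface.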

Types:

Storing embeddings

To store embeddings you can use Postgres with the pgvector extension, or a dedicated vector DB. These stores are queried by taking the dot product (or cosine similarity) between your search-term embedding and the stored embeddings.

E.g.,

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  document BYTEA
  -- ...
);
CREATE TABLE embeddings (
  id SERIAL PRIMARY KEY,
  document_id INT NOT NULL REFERENCES documents(id),
  chunk VARCHAR NOT NULL,
  embeddings vector(384)
  -- ...
);

-- Top-5 chunks for a query embedding. pgvector's <#> operator returns
-- the negative inner product, so ascending order means most similar first.
SELECT chunk
FROM embeddings
ORDER BY embeddings <#> '[...]'
LIMIT 5;

LangChain

RAG

What it is:

Tools:

How to use RAG?

RAG Stack

Optimizing RAGs

Chunk sizes:

Metadata filtering: