Last Sunday, I demonstrated how trivial it is to run an LLM at home (here) and mentioned that we could do better. Indeed, one problem with LLMs (large language models) is that they may not have been trained on niche information and, by nature, know nothing beyond their training cutoff date. For instance, Gemma2, which I used in my demonstration, has a cutoff date of September 2023. Within the scope of my hobby, the former problem prevails. As a pocket computer/programmable calculator aficionado, I often despair at how little knowledge of these machines the models encode.
For instance, let’s ask Gemma2 what it can tell us about the PC-850V. For context, it is a pocket computer built by SHARP and one of the latest models produced. One of the reasons I am interested in this system is that it can be programmed in C (without using a cross-development environment). The following picture shows Gemma2’s answer. It gets it wrong and tells us it is a NEC home computer. As I said, welcome to my hobbyist life.

To mitigate these problems with LLMs, one can use the RAG technique. RAG stands for Retrieval-Augmented Generation, a method used in natural language processing (a branch of AI). This is how it works. First, new or relevant information is extracted from some source (text, a database, etc.); this is the retrieval. Then, this information is used to augment the input we feed to our LLM. Finally, the LLM generates answers by combining its intrinsic knowledge with the retrieved information. It’s as if you handed a set of documents to your assistant to brush up on the topic of interest. Simple. The sketch below outlines these three steps.
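Here is a minimal sketch of the three steps in Python; `retrieve` and `ask_llm` are hypothetical placeholders of my own, made concrete in the scripts further down.

```python
# A minimal sketch of the three RAG steps. `retrieve` and `ask_llm`
# are hypothetical placeholders, fleshed out in the scripts below.
def retrieve(question: str) -> list[str]:
    # 1. Retrieval: look up passages relevant to the question
    #    (e.g., a similarity search over our documents).
    return ["The PC-850V is a pocket computer built by SHARP ..."]

def ask_llm(prompt: str) -> str:
    # 3. Generation: send the augmented prompt to the LLM
    #    (e.g., Gemma2 running under Ollama).
    return "..."

question = "What can you tell us about the PC-850V?"
passages = retrieve(question)

# 2. Augmentation: splice the retrieved passages into the prompt.
context = "\n".join(passages)
prompt = f"Context:\n{context}\n\nQuestion: {question}"

print(ask_llm(prompt))
```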


In my demonstration, I used FAISS (Facebook AI Similarity Search), an open-source library developed by Facebook AI Research for similarity search over dense vectors. It is designed to handle large datasets, enabling fast indexing and querying of high-dimensional data. It will help us retrieve relevant information from our documents. As we will see later, the quality of these documents is critical. This means, for example, that we need to extract text from PDF files. Such a script is captured in the following picture. Once the documents are processed, we obtain an index of their embeddings. These vector representations map discrete data like words or images into continuous numerical spaces that capture their semantic or contextual relationships. The index is not for human consumption and looks like a binary file. The LLM will use it next.
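Since a screenshot is hard to copy from, here is a minimal sketch of such an indexing script, assuming pypdf and sentence-transformers alongside FAISS; the file names and the all-MiniLM-L6-v2 embedding model are assumptions of mine, not necessarily what ran in the demonstration.

```python
# index_documents.py -- a minimal sketch, assuming pypdf,
# sentence-transformers, and faiss are installed.
import pickle

import faiss
import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

# Extract raw text from the PDF, page by page (file name is hypothetical).
reader = PdfReader("pc-850v-manual.pdf")
chunks = []
for page in reader.pages:
    text = page.extract_text() or ""
    # Split each page into rough paragraph-sized chunks.
    chunks += [c.strip() for c in text.split("\n\n") if c.strip()]

# Embed each chunk into a dense vector.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = np.asarray(model.encode(chunks), dtype="float32")

# Build a flat (exact, L2-distance) FAISS index over the embeddings.
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Persist the binary index and the chunks it refers to.
faiss.write_index(index, "docs.index")
with open("chunks.pkl", "wb") as f:
    pickle.dump(chunks, f)
```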


The next Python script (shown above) combines the index with the prompt and runs Ollama as we did before. The main difference is that it adds the retrieved data to the context (augmentation). Finally, Gemma2 generates the answer, which is supposed to be better. The answer to the same question, “What can you tell us about the PC-850V?”, is shown below. Better, and of course, far from perfect. But that is because, with statistical AI, the garbage-in, garbage-out adage is all the more relevant! In addition, in this example, I used a single PDF file.
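For reference, here is a minimal sketch of what such a retrieval-plus-generation script might look like, assuming the index and chunks produced above and the ollama Python package; the prompt wording and the number of retrieved chunks are illustrative assumptions.

```python
# ask.py -- a minimal sketch of the retrieval + generation step,
# assuming docs.index and chunks.pkl were built by the script above.
import pickle

import faiss
import ollama
from sentence_transformers import SentenceTransformer

# Load the index and the text chunks it refers to.
index = faiss.read_index("docs.index")
with open("chunks.pkl", "rb") as f:
    chunks = pickle.load(f)

model = SentenceTransformer("all-MiniLM-L6-v2")
question = "What can you tell us about the PC-850V?"

# Retrieval: find the chunks whose embeddings are closest to the question.
query_vec = model.encode([question]).astype("float32")
_, ids = index.search(query_vec, 3)
context = "\n".join(chunks[i] for i in ids[0])

# Augmentation + generation: hand the retrieved context to Gemma2.
prompt = f"Using the following context:\n{context}\n\nAnswer this question: {question}"
answer = ollama.generate(model="gemma2", prompt=prompt)
print(answer["response"])
```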




Before concluding, a side note: in one test, I used issues of L’Ordinateur de Poche, a French magazine dedicated to pocket computers and programmable calculators. It is a treasure trove of technical information. As shown below, I asked a question in French, “que peux-tu me dire sur la HP-41?” (“what can you tell me about the HP-41?”), and Gemma2 answered in French, as expected. Alas, the text extraction from these PDF files is terrible. But even in this case, the answer is not that bad. In conclusion, I hope these two posts show that using LLMs is practical and for everyone!



I’m pretty excited to set up an LLM for myself; I’ve been meaning to since I first saw a video about it on YouTube. Your blog has me even more excited now. I want to load it up with all the BASIC and MBASIC books I downloaded from the web archive and get to creating the world’s most sophisticated hangman game. Even the world’s best players won’t be able to defeat it.