Resume Automation 1/2
I have been wanting to play with LLMs for a while now, but I hadn't found a good project to use them on. Then I remembered how repetitive it is to adapt my resume and motivation letter to each job description. What if I could build a system that loads job offers and adapts a CV and cover letter template accordingly?
LLMs are particularly suited for such a task, thanks to both their interpretative abilities and their content generation. Their potential is now well known through renowned applications such as OpenAI's ChatGPT, Anthropic's Claude, or Meta's Llama, which now serve as the base for countless applications ⇒
https://theresanaiforthat.com/.
At the time of writing (2024), LLMs, and GenAI more generally, have already disrupted several industries and, as they keep improving, are threatening entire job categories. LLMs can code entire applications with scary accuracy (especially in Python).

https://evalplus.github.io/leaderboard.html
So, since I am soon obsolete and out of a job, a resume automation system might come in handy.
RAG
General LLMs are good at producing content but ultimately do not know anything about me. I first need to feed information about me and my professional experience to the models, so that when it comes to hand-picking the most suitable skills and experiences for a specific job description, the system knows where to find them.
For that, I could paste my CV into the prompt as plain text and then pass in a query to extract the relevant information. This would work, but the most common approach in the field is RAG (Retrieval-Augmented Generation).
RAG combines two key AI abilities: finding information and creating responses. Think of it like a smart assistant who first looks up facts in a huge library (retrieval), then uses those facts to give you a well-informed answer (generation). Unlike a plain LLM that answers from its training data alone, RAG blends both skills to provide more accurate and relevant responses.

RAG with LangChain
LangChain is a software framework that facilitates the integration of large language models into applications. The framework is open-source and has Python and JavaScript clients.
RAG always follows the same steps: load the source documents, split them into chunks, embed and store the chunks in a vector database, then retrieve the most relevant chunks for a query and generate an answer from them.
LangChain has an extensive number of LLMs you can plug in, including OpenAI, Aphrodite, and MistralAI models, among others (full list here).
However, most of these require a paid API, hence my choice to turn toward a self-hosted LLM.
I turned to Ollama, a tool that allows you to run open-source large language models locally.
First, follow these instructions to set up and run a local Ollama instance:
- Download and install Ollama on one of the supported platforms (including Windows Subsystem for Linux).
- Fetch an available LLM model via ollama pull, e.g., ollama pull llama3 (view the list of available models in the model library).
- This will download the default tagged version of the model. Typically, the default points to the latest, smallest-sized parameter model.
- On Linux (or WSL), the models will be stored at /usr/share/ollama/.ollama/models.
- Specify the exact version of a model of interest like so: ollama pull vicuna:13b-v1.5-16k-q4_0 (view the various tags for the Vicuna model in this instance).
- To view all pulled models, use ollama list.
- To chat directly with a model from the command line, use ollama run.
- View the Ollama documentation for more commands, or run ollama help in the terminal to see the available commands.
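With the model pulled, a quick sanity check from Python (a minimal sketch using the langchain_ollama integration that the rest of this post relies on; the prompt is just an example) confirms that the local model responds:
# Minimal check that the locally served model answers
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3.1:8b")  # assumes llama3.1:8b was pulled with ollama pull
print(llm.invoke("Introduce yourself in one sentence.").content)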
data_directory = "rag_docs"  # folder containing the PDFs to index
db_path = "chroma"  # where the Chroma vector store is persisted
model_name = "llama3.1:8b"  # Ollama model used for the embeddings
Load
Here we want to load the several PDFs present in the rag_docs folder. Those PDFs are multiple CVs and motivation letters I have written, as well as a PDF export of my LinkedIn profile.
from langchain_community.document_loaders import PyPDFLoader
import os

docs = []
for file in os.listdir(data_directory):
    if file.endswith('.pdf'):
        pdf_path = os.path.join(data_directory, file)
        loader = PyPDFLoader(pdf_path)
        docs.extend(loader.load())
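A quick look at what was loaded (purely a sanity check): PyPDFLoader produces one Document per page, with the source file recorded in the metadata.
print(f"Loaded {len(docs)} pages from {data_directory}")
print(docs[0].metadata)  # e.g. source file and page number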
Split
Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it to a model, since large chunks are harder to search over and won't fit in a model's finite context window.
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
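A quick check (just for illustration) of how many chunks the pages were split into:
print(f"{len(docs)} pages split into {len(splits)} chunks")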
Store
We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a VectorStore and Embeddings model.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_chroma import Chroma

vectordb = Chroma.from_documents(
    documents=splits,
    persist_directory=db_path,
    embedding=OllamaEmbeddings(model=model_name),
)
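Before wiring up a full chain, a direct similarity search against the store (the query string here is just an example) shows which chunks come back for a given query:
hits = vectordb.similarity_search("technical skills", k=3)
for doc in hits:
    print(doc.metadata.get("source"), "->", doc.page_content[:100])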
Let’s test it
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama
retriever = vectordb.as_retriever()
llm_model = ChatOllama(model="llama3.1:8b")
system_prompt = (
    """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context, a set of resumes
and motivation letters from Corentin, to answer the question.
If you don't know the answer, say that you don't know.
Use three sentences maximum and keep the answer concise.

{context}"""
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm_model, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
results = rag_chain.invoke({"input": "List 5 technical skills from Corentin?"})
print(results['answer'])
Based on the provided context, here are 5 technical skills listed for Corentin:
1. Time series analysis (trend analysis, smoothing, fault detection)
2. Open data mining coupled with GIS
3. Energy demand prediction using ANN (Artificial Neural Networks)
4. Genetic algorithms for renewable energies optimization
5. Backend development using Java and BIM models
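Besides the answer, the output of create_retrieval_chain also carries the retrieved chunks under the context key, which makes it easy to check where the model got its facts:
for doc in results["context"]:
    print(doc.metadata.get("source"), "->", doc.page_content[:80])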
RAG with kotaemon
Another (and maybe simpler) way to quickly implement a RAG pipeline is to use the open-source kotaemon.
Kotaemon is an open-source tool for building document-based Q&A applications. It features a Gradio web interface, supports multiple document formats (PDF, DOC), and uses RAG to provide sourced answers with document previews. The framework is extensible and includes built-in vector storage for efficient document retrieval.


Install With Docker
Both lite and full versions of the Docker image exist. With full, the extra packages of unstructured will be installed as well; this supports additional file types (.doc, .docx, …), but the cost is a larger Docker image size. For most users, the lite image should work well in most cases.
- To use the lite version: docker run -e GRADIO_SERVER_NAME=0.0.0.0 -e GRADIO_SERVER_PORT=7860 -p 7860:7860 -it --rm ghcr.io/cinnamon/kotaemon:main-lite
- To use the full version: docker run -e GRADIO_SERVER_NAME=0.0.0.0 -e GRADIO_SERVER_PORT=7860 -p 7860:7860 -it --rm ghcr.io/cinnamon/kotaemon:main-full
Once everything is set up correctly, you can go to http://localhost:7860/ to access the WebUI.
Kotaemon uses GHCR to store its Docker images; all images can be found here.
Note: to access the DB with Python, copy it to your local directory:
sudo docker cp kotaemon:/app/ktem_app_data/user_data /mnt/c/Users/Corentin/OneDrive/website/projects/LLM Scraping
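From there, the copied data can be inspected with the chromadb client; this is just a sketch, and the sub-folder below is an assumption since the exact layout depends on the kotaemon version:
import chromadb
# The path inside the copied user_data folder is an assumption; adjust to where the Chroma files live
client = chromadb.PersistentClient(path="user_data/vectorstore")
print(client.list_collections())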