Why I stopped sending PII to OpenAI — and what I ran instead
There's a question most people never ask, and the rest of us tend to ignore: what happens to the data we hand over to AI? For people who use AI as a personal assistant, that data is highly personal; for those who use it as a work buddy, it is often too sensitive to share. I thought about what I wanted to use it for and whether I could use it for that purpose at all. Then I asked myself: "Can I just run it locally?"
This is the story of what happened after that question.
The background
Waking up to 50+ notifications is the enterprise nightmare. That pile of information needs to be structured, tidied up and understood. It pushed me to hire an "assistant", but nobody said it had to be a living, breathing one! There was a problem, though: ChatGPT could not be the assistant I was looking for, for one simple reason. My data should stay personal, not become a shared asset. That was my starting point: keep sovereignty over my data while still enjoying the benefits of the beautiful realm of LLMs. Countless tests and trials followed to find the best assistant for my needs. Luckily for me, there are many open-source models that run locally and score quite decently on benchmarks.
What I replaced it with
Before I go further, I have to thank projects like Hugging Face, Llama, Ollama, Qwen, Gemma and many more for making this possible.
First, I replaced OpenAI with a small, locally hosted orchestration layer running on Node.js and powered by a quantized Llama 3.1 8B. The orchestration layer stepped up the game: it could read my inbox by itself, which removed the manual labor of copying and pasting. The small victory of getting the tool to work opened up many other possibilities, starting with an optimization cycle that weighed the available local LLMs against the performance I could squeeze out of my device. Llama 3.1 did well on output quality, yet crawling my entire Outlook inbox took it longer than a cup of tea. That had to change.
Constraints
- Not enough VRAM on my workstation, which hurts performance significantly
- Finding the sweet spot between context size and the length of the email bodies, which can get long in certain cases
- Formatting noise in the email bodies and unnecessary email parts that need stripping
- How well a model understands the context of the text and condenses it into a meaningful highlight
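To make the stripping and context-size constraints concrete, here is a minimal sketch of how an email body could be cleaned and then cut to a token budget. The helper names (stripNoise, fitToBudget), the patterns treated as noise, and the rough "4 characters per token" estimate are all assumptions for illustration, not the code alfonso actually runs.

```typescript
// Hypothetical helpers: names, patterns and thresholds are illustrative only.

/** Remove leftover HTML, quoted replies and signature blocks from an email body. */
function stripNoise(body: string): string {
  const lines = body
    .replace(/<[^>]+>/g, " ") // drop leftover HTML tags
    .split(/\r?\n/);

  const kept: string[] = [];
  for (const line of lines) {
    const trimmed = line.trim();
    // Stop at a signature delimiter or at the start of a quoted thread.
    const startsThread =
      /^On .+ wrote:$/.test(trimmed) ||
      /^-{2,}\s*Original Message\s*-{2,}$/i.test(trimmed);
    if (trimmed === "--" || startsThread) break;
    if (trimmed.startsWith(">")) continue; // skip individually quoted lines
    kept.push(trimmed);
  }

  // Collapse repeated spaces and runs of blank lines.
  return kept
    .join("\n")
    .replace(/[ \t]+/g, " ")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

/** Cut text to a rough token budget, assuming about 4 characters per token. */
function fitToBudget(text: string, maxTokens: number, charsPerToken = 4): string {
  const maxChars = maxTokens * charsPerToken;
  if (text.length <= maxChars) return text;
  // Cut at the last space at or before the limit so a word is not split.
  const cut = text.lastIndexOf(" ", maxChars);
  return text.slice(0, cut > 0 ? cut : maxChars).trimEnd() + " [truncated]";
}
```

Stripping before truncating matters here: quoted threads and signatures would otherwise eat the budget that the actual message needs.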
Architectural decisions
The interesting part isn't the model choice. It's the architectural shape:
- The conceptual pipeline Export → Strip Noise → Process with LLM inference → Aggregate → Render makes up a running product called alfonso
- Exporting the inbox data with a bash script into a local folder, which spares me the enterprise hassle of getting an Entra ID while still using standard Outlook capabilities
- Using Ollama and its format: <schema> parameter for grammar-constrained generation, as smaller models are more prone to errors
- Per-batch failure handling, so a single bad egg doesn't tank the whole process
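The grammar-constrained generation and per-batch failure handling described above could look roughly like the sketch below. The Highlight shape, the schema fields, and every function name are assumptions made for this example; the post does not show alfonso's real schema. The only Ollama-specific part is the request body: a JSON Schema object passed in the format field of a chat request.

```typescript
// Hypothetical output shape for one email; the real schema is not given in the post.
interface Highlight {
  subject: string;
  summary: string;
  priority: "low" | "medium" | "high";
}

// JSON Schema handed to Ollama via `format` to constrain what the model may emit.
const highlightSchema = {
  type: "object",
  properties: {
    subject: { type: "string" },
    summary: { type: "string" },
    priority: { type: "string", enum: ["low", "medium", "high"] },
  },
  required: ["subject", "summary", "priority"],
};

/** Build the JSON body for a non-streaming chat request to a local Ollama server. */
function buildRequest(model: string, email: string) {
  return {
    model,
    stream: false,
    format: highlightSchema,
    messages: [{ role: "user", content: `Summarize this email as JSON:\n${email}` }],
  };
}

// Injected transport: sends the request and returns the model's raw text reply.
type Infer = (request: ReturnType<typeof buildRequest>) => Promise<string>;

/** Parse and re-check the reply; constrained decoding helps, but checking is cheap. */
function parseHighlight(raw: string): Highlight {
  const data = JSON.parse(raw);
  const valid =
    typeof data.subject === "string" &&
    typeof data.summary === "string" &&
    ["low", "medium", "high"].includes(data.priority);
  if (!valid) throw new Error("response does not match schema");
  return data as Highlight;
}

interface RunResult {
  highlights: Highlight[];
  failedBatches: { batch: number; reason: string }[];
}

/** Process batches one by one; a failing batch is recorded and the run continues. */
async function processAll(batches: string[][], model: string, infer: Infer): Promise<RunResult> {
  const result: RunResult = { highlights: [], failedBatches: [] };
  for (let i = 0; i < batches.length; i++) {
    try {
      const parsed: Highlight[] = [];
      for (const email of batches[i]) {
        parsed.push(parseHighlight(await infer(buildRequest(model, email))));
      }
      result.highlights.push(...parsed); // commit only if the whole batch succeeded
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      result.failedBatches.push({ batch: i, reason });
    }
  }
  return result;
}
```

In a real run, the injected infer function would POST the request body to the local Ollama chat endpoint (by default http://localhost:11434/api/chat) and return message.content from the response. Keeping the transport injected also makes the failure path easy to exercise without a model running.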
In the end, the model stack was locked in on Phi-4-mini for rudimentary tasks and routing, granite:7b-a1b-h for the heavy lifting and Llama 3.1 8B as the fallback model.
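A three-model stack like this boils down to a small routing table plus a fallback. The sketch below is one way that could be wired, not a description of how alfonso does it. The task names, the runWithFallback helper and the injected generate function are hypothetical, and the model tags are written the way they might appear in Ollama, which is an assumption.

```typescript
// Hypothetical task categories; the post only distinguishes light tasks from heavy ones.
type Task = "route" | "classify" | "summarize";

// Model tags follow the post; their exact spelling in Ollama is an assumption.
const PRIMARY: Record<Task, string> = {
  route: "phi4-mini",
  classify: "phi4-mini",
  summarize: "granite:7b-a1b-h",
};
const FALLBACK = "llama3.1:8b";

// Injected inference call, so the routing logic stays independent of the transport.
type Generate = (model: string, prompt: string) => Promise<string>;

/** Try the task's primary model first, then the fallback; report which one answered. */
async function runWithFallback(
  task: Task,
  prompt: string,
  generate: Generate,
): Promise<{ model: string; output: string }> {
  const chain = [PRIMARY[task], FALLBACK];
  let lastError: unknown;
  for (const model of chain) {
    try {
      return { model, output: await generate(model, prompt) };
    } catch (err) {
      lastError = err; // remember the failure and try the next model in the chain
    }
  }
  throw new Error(`all models failed for task "${task}": ${String(lastError)}`);
}
```

Returning the model name alongside the output makes it easy to log how often the fallback kicks in, which is a useful signal when tuning the stack.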
What is the next step
Now that the MVP is up and running, my data stays mine. Next: add Jira, then enrich the daily insights cockpit. As groundwork, I have started building a CLI wrapper for Jira so I can plug it into this tool and turn it into the ultimate productivity tool of my enterprise dreams!