My Two Years of AI Experimentation

Posted Feb 21, 2025

In the early days of the ChatGPT hype, I often caught myself daydreaming: noting down ideas, then writing prompts and having conversations alone late into the night. When I started writing this post, I thought I had a lot to say about using large language models (LLMs), but as it turned out, I mostly did the same things over and over. Regardless, let’s look at what made sense.

LLMs in Programming

For programming, I have used ChatGPT and GitHub Copilot. I wrote simple Python tests and Groovy scripts for Jenkins. I also asked for code explanations related to the Bazel build system and the Go programming language.

What works:

Writing granular code. I have found LLMs good at implementing (Python) unit tests and individual functions, which I must still be able to understand, integrate and verify. An LLM sometimes produces nonsense; catching it is easy while you work in small pieces, but it gets hard once you generate large blocks and no longer know what the code does.
Explicitly allowing the AI to be wrong and to admit it (by saying “I don’t know”) leads to better responses.
Asking “What is wrong? Here is the code, and here is the exception it triggered.”
Asking for code explanation / review / improvement.
Asking for a CLI command based on my plain-language description. The result is easy to verify.
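The “what is wrong” request works best when the code and the full exception travel together in one message. Below is a minimal Python sketch of such a prompt. The helper name build_debug_prompt and the template wording are my own illustration, not the API of any tool. It also bakes in the “I don’t know” escape hatch from the list above.

```python
import traceback


def build_debug_prompt(code: str, exc: BaseException) -> str:
    """Assemble a "what is wrong?" prompt from source code and the exception it raised.

    Hypothetical template: the wording is illustrative only. It explicitly
    allows the model to answer "I don't know" instead of guessing.
    """
    # Render the full traceback (stack frames, exception type and message) as text.
    tb_text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return (
        "What is wrong? Here is the code:\n\n"
        f"{code}\n\n"
        "And here is the exception it triggered:\n\n"
        f"{tb_text}\n"
        "If you are not sure, say \"I don't know\" instead of guessing."
    )


if __name__ == "__main__":
    # A deliberately broken snippet: looking up a key that does not exist.
    snippet = "values = {'a': 1}\nprint(values['b'])"
    try:
        exec(snippet)
    except Exception as err:
        print(build_debug_prompt(snippet, err))
```

The printed text can be pasted into ChatGPT or any other chat interface. Including the whole traceback rather than just the last line gives the model the same context a human reviewer would want.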

What does not work:

LLMs can make up functions that do not exist. For example, while I was programming a Chrome extension, it simply invented a chrome...addListener function that does not exist.
If a conversation starts with a bad question or an incorrect answer, continuing it usually does not lead to good results. Starting a fresh conversation helps.
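The Chrome case was JavaScript, but a cheap guard against invented functions works in any language with reflection. Here is a minimal Python sketch, assuming the suggested call is a dotted name of a module attribute (for example os.path.join). The helper api_exists is hypothetical. It only proves that the name exists, not that the signature or behavior matches what the LLM claimed. Note that it imports the module, which runs that module’s top-level code.

```python
import importlib


def api_exists(dotted_name: str) -> bool:
    """Return True if a dotted name such as "os.path.join" resolves to a real object.

    Hypothetical helper: finds the longest importable module prefix, then
    walks the remaining parts as attributes of that module.
    """
    parts = dotted_name.split(".")
    # Try the longest prefix first, e.g. "os.path.join", then "os.path", then "os".
    for i in range(len(parts), 0, -1):
        module_name = ".".join(parts[:i])
        try:
            obj = importlib.import_module(module_name)
        except (ImportError, ValueError):
            continue
        # Every remaining part must exist as an attribute.
        for attr in parts[i:]:
            if not hasattr(obj, attr):
                return False
            obj = getattr(obj, attr)
        return True
    return False


if __name__ == "__main__":
    for name in ["os.path.join", "os.path.joinall", "json.loads", "json.load_string"]:
        print(name, api_exists(name))
```

A check like this catches names that were made up outright. It still says nothing about whether the arguments are right, so reading the official documentation remains necessary.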

Takeaway: Domain knowledge and prompt quality are what make it a useful tool. Hallucinations happen often, and you need to be able to spot them.

LLMs in Security and Investigation

Threat Modeling

While LLMs were not immediately useful for threat modeling itself, I found them helpful as a brainstorming tool that can bootstrap a discussion or provide a list of generic threats. More detailed prompts lead to better results.

Incident Analysis & Documentation Search

LLMs are good at searching through documentation, articles or large bodies of text such as the GDPR.

Bug Hunting

I’ve tried using LLMs to search for bugs in code. There are teams pushing hard to automate bug hunting with them, but that was not my case: I only used them for manual code review, and there is much more that could be done. LLMs are also good at recognizing patterns in the output of any black box, which can be used for service fingerprinting.

LLMs in Creative & Miscellaneous Tasks

Documentation Writing & Reviewing

In general, ChatGPT is good at getting an empty document started by producing templates and boilerplate text.

Creative Writing

I have asked LLMs to produce new text or to rewrite existing text in the style of famous writers. Lots of fun.

Misc

What’s next

LLMs and all other kinds of AI are here to stay. The ecosystem of applications built on top of existing AI infrastructure is exploding with talent, ideas and investment. I am looking forward to digging deeper into models I can run locally. I have already tried the Llama-3.2-1B-Instruct chat model, DistilBERT for question answering and the microsoft/Florence-2-large vision model. I would like to build applications for myself that leverage these models, and I would like to learn to write better prompts.