My Two Years of AI Experimentation

In those early days of the ChatGPT hype, I often caught myself daydreaming: noting down ideas, then writing prompts and having conversations alone late at night. When I started writing this post, I thought I had a lot to say about using large language models (LLMs), but as it turned out, I mostly did the same things over and over. Regardless, let’s look at what made sense.
LLMs in Programming
In programming, I have used ChatGPT and GitHub Copilot. I used them to write simple Python tests and Groovy scripts for Jenkins. I also asked for explanations of code involving the Bazel build system and the Go programming language.
What works:
- Writing granular code. I have found that LLMs are good at implementing (Python) unit tests and individual functions. I still have to understand, integrate, and verify the result. An LLM sometimes produces nonsense, but catching it is easy as long as you don't generate large blocks of code without knowing what you are doing.
- Explicitly allowing the AI to be wrong and to admit it (by saying “I don’t know”) leads to better responses.
- Asking “What is wrong? Here is the code, and here is the exception it triggers.”
- Asking for code explanation, review, or improvement.
- Asking for a CLI command based on my plain-language description. The result is easy to verify.
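The two prompting habits above (handing over the code together with the exception, and explicitly allowing an “I don’t know”) can be bundled into a small helper. This is a minimal sketch, not the author's tooling: the function name `build_debug_prompt`, the exact instruction wording, and the buggy snippet are all hypothetical, and the result is just a string you would paste into whatever chat interface you use.

```python
import traceback


def build_debug_prompt(code: str, exc: BaseException) -> str:
    """Build a "What is wrong?" prompt from source code and the exception it raised."""
    # Render the exception type, message and stack as plain text.
    tb_text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return (
        "What is wrong? Here is the code, and here is the exception it triggers.\n"
        # Explicitly allow the model to admit uncertainty instead of guessing.
        "If you are not sure, say \"I don't know\" instead of guessing.\n\n"
        f"Code:\n{code}\n\n"
        f"Exception:\n{tb_text}"
    )


# Hypothetical buggy snippet, used only to produce a real exception.
snippet = "prices = {'apple': 1}\ntotal = prices['pear']\n"

prompt = ""
try:
    exec(snippet, {})
except KeyError as err:
    prompt = build_debug_prompt(snippet, err)

print(prompt)
```

Including the full traceback rather than only the error message gives the model the same context a human reviewer would ask for.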
What does not work:
- LLMs can completely make up functions that do not exist. For example, while I was programming a Chrome extension, it invented a chrome...addListener function that does not exist.
- If the conversation starts with a bad question or an incorrect answer, continuing it usually does not lead to good results. Starting over helps.
Takeaway: Domain knowledge and prompt quality are what make it a useful tool. Hallucinations happen often, and you need to be able to spot them.
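One cheap guard against invented functions is to check that a suggested name actually exists before building on it. The Chrome extension case above is JavaScript and is not covered here; this is a Python analog that assumes the LLM's suggestion is a dotted name such as `os.path.join`. The helper `api_exists` is hypothetical, and it only confirms that the name resolves, not that the signature or behavior matches what the model claimed.

```python
import importlib


def api_exists(dotted_name: str) -> bool:
    """Return True if a dotted name such as "os.path.join" resolves to a real object."""
    parts = dotted_name.split(".")
    if not all(parts):  # reject "", "a..b" and trailing dots
        return False
    # Try the longest importable module prefix first, e.g. "os.path" for "os.path.join".
    for split in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:split]))
        except ImportError:
            continue
        # Walk the remaining parts as attributes of the imported module.
        for attr in parts[split:]:
            if not hasattr(obj, attr):
                return False
            obj = getattr(obj, attr)
        return True
    return False


for name in ["json.loads", "json.loads_fast", "os.path.join"]:
    print(name, api_exists(name))
```

Note that importing a module runs its top-level code, so this check is only appropriate for packages you already trust.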
LLMs in Security and Investigation
Threat Modeling
While LLMs are not immediately useful for threat modeling, I found them helpful as a brainstorming tool that can bootstrap a discussion or provide a list of generic threats. More detailed prompts lead to better results.
- Create a threat model using STRIDE methodology.
- Find the correct STRIDE threat category for a given threat.
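When an LLM proposes a STRIDE category, it helps to check it against the security property the threat violates. The category-to-property pairs below are the standard STRIDE ones; the helper names `category_for` and `generic_threats` are hypothetical, and the question template is only an assumed starting point for the kind of generic brainstorming list mentioned above.

```python
from enum import Enum


class Stride(Enum):
    """STRIDE threat categories, each mapped to the security property it violates."""
    SPOOFING = "Authentication"
    TAMPERING = "Integrity"
    REPUDIATION = "Non-repudiation"
    INFORMATION_DISCLOSURE = "Confidentiality"
    DENIAL_OF_SERVICE = "Availability"
    ELEVATION_OF_PRIVILEGE = "Authorization"


def category_for(violated_property: str) -> Stride:
    """Return the STRIDE category whose violated property matches (case-insensitive)."""
    for category in Stride:
        if category.value.lower() == violated_property.strip().lower():
            return category
    raise ValueError(f"No STRIDE category violates {violated_property!r}")


def generic_threats(component: str) -> list[str]:
    """Produce one generic brainstorming question per STRIDE category."""
    return [
        f"{c.name.replace('_', ' ').title()}: how could an attacker break "
        f"{c.value.lower()} of {component}?"
        for c in Stride
    ]


print(category_for("integrity"))
for question in generic_threats("the login API"):
    print(question)
```

If the model labels a log-tampering threat as "Repudiation", asking which property is actually violated is a quick way to decide between that and "Tampering".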
Incident Analysis & Documentation Search
LLMs are good at searching through documentation, articles, or large corpora such as the GDPR.
- “ClusterManager.FetchClusterUpgradeInfo” - where does this come from?
- What is “roles/storage.insightsCollectorService” doing?
- Which GDPR article applies to this?
Bug Hunting
I’ve tried using LLMs to search for bugs in code. Some teams are pushing hard to automate bug hunting with LLMs, but that was not my use case: I only used them for manual code review, and there is much more that could be done. LLMs are also good at recognizing patterns in the output of any black-box system, which can be used for service fingerprinting.
LLMs in Creative & Miscellaneous Tasks
Documentation Writing & Reviewing
In general, ChatGPT is good at getting an empty document started by producing templates and boilerplate text.
- Create a “Lightweight risk assessment with threat modeling” document.
- Come up with a list of titles for a piece of documentation.
- Give me 1, 2 and 3 sentence long summaries of an article.
- Give an example of SARIF file content that uses the invocation.workingDirectory.uri property.
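For reference when checking an LLM's answer to the SARIF prompt above, here is a minimal sketch of a SARIF 2.1.0 log with one run and one invocation that sets `workingDirectory.uri`. The property names follow the SARIF 2.1.0 structure (`runs`, `tool.driver.name`, `invocations`, `executionSuccessful`, `workingDirectory`); the function name, tool name, and directory URI are hypothetical placeholders, and a real log would normally also contain results.

```python
import json


def minimal_sarif(tool_name: str, working_dir_uri: str) -> dict:
    """Build a minimal SARIF 2.1.0 log with one run and one invocation."""
    return {
        "version": "2.1.0",
        "runs": [
            {
                # Every run must name the tool that produced it.
                "tool": {"driver": {"name": tool_name}},
                "invocations": [
                    {
                        # executionSuccessful is required on every invocation object.
                        "executionSuccessful": True,
                        # workingDirectory is an artifactLocation; uri holds its location.
                        "workingDirectory": {"uri": working_dir_uri},
                    }
                ],
                "results": [],
            }
        ],
    }


print(json.dumps(minimal_sarif("example-linter", "file:///home/user/project/"), indent=2))
```

Building the log as a dict and serializing it with `json.dumps` avoids hand-written JSON syntax errors, which is also a quick way to validate an LLM-generated example.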
Creative Writing
I have asked LLMs to produce new text or to rewrite existing text in the style of famous writers. Lots of fun.
- Rewrite this text in the style of Hunter Thompson’s gonzo journalism. (Exempt from this article.)
- Create a detective short story of no more than 7 sentences.
- If Lewis Mumford were alive today and wanted to publish an updated version of his book Technics and Civilization, what would the new chapters be about?
- Uncommon intriguing words
Misc
- Use scenario planning to produce 2 to 3 scenarios for Bitcoin.
- Produce a STEEP analysis of a country.
- Identify a movie by describing its plot.
- What does the ‘20 error mean on an Ikea dishwasher?
What’s next
LLMs and all other kinds of AI are here to stay. The ecosystem of applications built on top of existing AI infrastructure is exploding with talent, ideas, and investment. I am looking forward to digging deeper into models I can run locally. I have already tried the Llama-3.2-1B-Instruct chat model, DistilBERT for question answering, and the microsoft/Florence-2-large vision model. I would like to build applications for myself that leverage these models, and I would also like to learn to write better prompts.