The popular LLMs on the market, such as ChatGPT 4.5 and 4o, Gemini 2.5, Claude 3.7, and (my personal favorite, surprisingly) Grok 3, are capable of "deep thinking" and trained on vast amounts of data. While they can handle a wide assortment of topics, they're not without limitations. In fact, you might find that, as time goes on, the amazing LLM you thought could probably replace humans not too long into the future (Claude for me, sadly) has somehow begun to sound worse than some of the models that were previously more hit-and-miss than serious candidates for assisting your own research.
When ChatGPT first came out, my mind turned to putty. I seriously thought, oh wow, that's it then.
It was new and exciting. I imagined it would be very difficult to find flaws in something written by a robot that can read, write, remember, and work with data in ways we, its human counterparts, never could. I mean, I take it for granted that not even all of the English-speaking world can read English.
Fast forward a little, and out comes Claude. Then I waved goodbye to my ChatGPT subscription in pursuit of this super in-depth alternative from Anthropic.
I dabbled with Gemini and Copilot, but they were pretty dull in the background, overshadowed by the bright and amazing Claude.
Oh, those were the days (literally last year…).
But Gemini has since improved in ways that go far beyond what Claude can achieve. That was a shock, because I can still taste the disappointment from my early experiments with Gemini, when it just kept letting me down even though it had, like… the entire internet to refer to.
I guess that's where narrowing down the specific functions of each LLM becomes really important. I feel like, as each LLM had more time to absorb and learn new things, and get trained on more data, they became almost predictable.
And the more reference material there is to draw on, the clearer our prompts have to become to elicit anything that even half-resembles abstract reasoning. Sure enough, the jack of all trades cannot compare with the master of one.
So, how do you write a prompt that provides enough information for the LLM to understand why it's agreed to go off and research "recent trends in judicial review case law in New Zealand"? It knows that it cannot falsify information and should ask questions if it's unclear on what it should be doing, because the prompt states: "if any aspect of the response is wrong, I will be laid off from my job. My being laid off will result in my kids starving to death, because I won't be able to buy them food anymore. Please don't make my kids homeless." (Okay, you don't quite need to get that specific, but you should make it clear that an accurate response is important.)
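To make that concrete, here's a minimal sketch of what such a prompt might look like in code. This assumes the OpenAI Python SDK purely for illustration, and the model name and wording are placeholders; the point is the explicit instructions about accuracy and asking clarifying questions.

```python
# A minimal sketch: tell the model why accuracy matters and to ask
# rather than guess. The SDK and model name are illustrative
# assumptions, not a recommendation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = (
    "You are a legal research assistant. Accuracy is critical: "
    "this research will be relied on professionally. "
    "Never invent cases, citations, or facts. "
    "If any part of the task is ambiguous, ask a clarifying "
    "question before answering."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever model you have access to
    messages=[
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": "Summarize recent trends in judicial review "
                       "case law in New Zealand.",
        },
    ],
)
print(response.choices[0].message.content)
```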
When you add some context, you provide the LLM with crucial information that it was not trained on, and fill in any reasoning deficits. As it relates to AI, context can be understood as the markers that shape the model's understanding and the structure of its response; this is so influential that it is often a lack of context that causes LLMs to hallucinate made-up facts or cases that never existed. Where there's too much ambiguity, it's more likely that you'll lead your LLM to the wrong conclusions… or rather, it'll lead you to…(?)
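In practice, "adding context" can be as simple as pasting the source material into the prompt and telling the model to answer only from it. Here's one way that might look; the file name and wording are hypothetical, and the technique is just supplying the material yourself rather than trusting the model's memory.

```python
# A sketch of grounding the model in source material you supply,
# so there's less room to hallucinate. File name and prompt wording
# are hypothetical stand-ins.
from openai import OpenAI

client = OpenAI()

# Hypothetical source document; use whatever material you have on hand.
judgment_text = open("example_judgment.txt").read()

prompt = (
    "Using ONLY the judgment below, summarize the court's reasoning "
    "on the standard of review. If the judgment does not address a "
    "point, say so explicitly rather than guessing.\n\n"
    "--- JUDGMENT ---\n" + judgment_text
)

response = client.chat.completions.create(
    model="gpt-4o",  # same placeholder model as above
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```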
Now, if you’ve got a particular use case that you can