May I ask which one you are using? For troubleshooting, Google Gemini with Gemini 3.1 Pro and its web search rarely gives me an incorrect answer. Even when I tried to find where I should click on the electronic tax office form, Gemini gave me correct answers.
Grok in this case. Also tried DeepSeek, Gemini, etc. None were paid versions, though.
Apart from that, I'm not sure if tasks like these are the ones where LLMs shine (although they are marketed differently).
That's one of my main issues: they are being marketed as Intelligent Universal Jesus Software, yet the majority of them really aren't that good.
A lot depends on the system prompt and how the model was tuned. I found that smaller models like Haiku or Gemini Flash are tuned more heavily for instruction following to compensate for their dumbness. This means that if a tool call (e.g. web_search) fails and the directives leave no clear room for a fallback answer, they lean on their limited knowledge to provide some answer, any answer.
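To make the fallback point concrete, here is a rough sketch of what an explicit "I don't know" path could look like on the application side. Everything in it is hypothetical: `answer_with_search`, `FALLBACK`, and the two stand-in search functions are names I made up for illustration, not any vendor's actual API.

```python
from typing import Callable, List

# Hypothetical fallback text: what gets returned instead of a guess.
FALLBACK = "I could not look this up, so I don't know. Try another source."


def answer_with_search(question: str,
                       web_search: Callable[[str], List[str]]) -> str:
    """Answer from search results, or admit failure if the tool call fails."""
    try:
        results = web_search(question)
    except Exception:
        # The tool call failed: return the explicit fallback
        # rather than answering from memory.
        return FALLBACK
    if not results:
        # The tool worked but found nothing: same explicit fallback.
        return FALLBACK
    return f"Based on search: {results[0]}"


# Stand-in search functions (assumptions, for demonstration only).
def broken_search(query: str) -> List[str]:
    raise TimeoutError("search backend unavailable")


def working_search(query: str) -> List[str]:
    return [f"result for {query}"]


if __name__ == "__main__":
    print(answer_with_search("where do I click?", broken_search))
    print(answer_with_search("where do I click?", working_search))
```

The idea is just that the failure branch is spelled out somewhere, so a failed lookup ends in an admission instead of an improvised answer.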
The day it answers: "Scottie, I have no idea! Try the other guy, he's good at that stuff!" is the day I will praise AI.
I think the better uses are translations: language to language, business domain to code, code to code, multimodal data extraction (not only OCR but also handwritten text), and text‑corpus labelling (not limited by the short context sizes of transformers such as BERT). Those tasks can be done locally on an Apple Silicon-powered machine with 20W peak usage.
For language stuff, it's fantastic. Transcribing audio, video, generating SRT files in multiple languages... it's super quick, accurate, and easy. I'm still waiting for my Star Trek Universal Translator comm badge, though.
Haven't used it for coding, and for the most part I don't want to - yet.
I'm still waiting for the AI Hype bubble to pop, and then I think we'll see many new amazing things that do not require massive data centers, nuke plants worth of energy, and 300GB of RAM.
I did the same with "Web 2.0"... First there was tons of hype, then the bubble burst, and a few years later I really dived in. But then of course the control and censorship also ramped up in a huge way, so... We'll see!