Turns out, “the implementation had a bug. Instead of clearing thinking history once, it cleared it on every turn for the rest of the session. After a session crossed the idle threshold once, each request for the rest of that process told the API to keep only the most recent block of reasoning and discard everything before it. This compounded: if you sent a follow-up message while Claude was in the middle of a tool use, that started a new turn under the broken flag, so even the reasoning from the current turn was dropped. Claude would continue executing, but increasingly without memory of why it had chosen to do what it was doing. This surfaced as the forgetfulness, repetition, and odd tool choices people reported. …We believe this is what drove the separate reports of usage limits draining faster than expected.”
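The failure mode described in the quote, a one-time "clear history" action that accidentally became a sticky flag, can be illustrated with a minimal sketch. All names here are invented for illustration; this is not Anthropic's actual implementation.

```python
# Hypothetical sketch of the class of bug described above: a flag meant to
# clear reasoning history ONCE after an idle period is never reset, so every
# later request in the session also discards prior thinking blocks.
# All names (Session, keep_only_latest_thinking, etc.) are invented.

IDLE_THRESHOLD = 30 * 60  # idle cutoff in seconds (assumed value)


class Session:
    def __init__(self):
        self.clear_thinking = False  # intended as a one-shot flag
        self.last_active = 0.0

    def build_request(self, now, fixed=False):
        # Crossing the idle threshold should trigger one cleanup.
        if now - self.last_active > IDLE_THRESHOLD:
            self.clear_thinking = True
        self.last_active = now

        keep_only_latest = self.clear_thinking
        if fixed:
            # The fix: consume the flag after one use.
            self.clear_thinking = False
        # Buggy path: the flag stays set, so every subsequent request
        # tells the API to keep only the most recent reasoning block.
        return {"keep_only_latest_thinking": keep_only_latest}
```

In the buggy path, once a session idles past the threshold, every later turn drops accumulated reasoning, which matches the compounding forgetfulness the postmortem describes; the one-line reset confines the cleanup to a single request.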
And Claude Opus 4.7, the vendor noted, “has a notable behavioral quirk”: it is “quite verbose. This makes it smarter on hard problems, but it also produces more output tokens.”
To be clear, I’m not suggesting Anthropic was doing anything especially poorly. These are the kinds of problems all genAI companies face, and I applaud Anthropic’s transparency in publishing its reasoning openly. (Anthropic executives do seem to be trying to portray themselves as more ethical and responsible than many of their rivals.)


