Large language models are rubbish at elementary-level math

“9.11 and 9.9, which one is bigger?” Questions as simple as this one confuse large language models including OpenAI’s GPT-4o, Moonshot AI’s Kimi, and ByteDance’s Doubao, according to a post by local media outlet Yicai. Chatbots from China’s Baidu and Tencent generate the correct answer despite using different methods…

Author: gocpmall