Last month a BBC study found that “AI” assistants are terrible at providing accurate news synopses. The study found that modern large language model assistants introduced significant issues a whopping 51 percent of the time; 19 percent of responses introduced factually inaccurate “statements, numbers and dates,” and 13 percent either altered subject quotes or made up quotes entirely.
This month a study from the Tow Center for Digital Journalism found that modern “AI” is also terrible at accurate citations. Researchers asked major modern “AI” chatbots basic questions about news articles and found that they provided incorrect answers to more than 60 percent of queries.
It should be noted they weren’t making particularly onerous demands or asking the chatbots to interpret anything. Researchers randomly selected ten articles from each publisher, then asked chatbots from various major companies to identify the corresponding article’s headline, original publisher, publication date, and URL. In all, they ran sixteen hundred queries across eight major chatbots.
Some AI assistants, like Elon Musk’s Grok, were particularly awful, providing incorrect answers to 94 percent of the queries about news articles. Researchers also amusingly found that premium chatbots were routinely more confident in the false answers they provided:
“This contradiction stems primarily from their tendency to provide definitive, but wrong, answers rather than declining to answer the question directly. The fundamental concern extends beyond the chatbots’ factual errors to their authoritative conversational tone, which can make it difficult for users to distinguish between accurate and inaccurate information.”
The study also found that most major chatbots either failed to include accurate citations to the information they were using, or provided inaccurate citations a huge portion of the time:
“The generative search tools we tested had a common tendency to cite the wrong article. For instance, DeepSeek misattributed the source of the excerpts provided in our queries 115 out of 200 times. This means that news publishers’ content was most often being credited to the wrong source.”
So the BBC study showed modern AI sucks at generating news synopses (something Apple found out when it had to pull Apple Intelligence news headlines offline because the system was dangerously unreliable). The Tow study showed that these same systems stink at citations and at conveying exactly how they’re gleaning their (often false) information on the news.
That’s not to say that automation doesn’t have its uses, or that it won’t improve over time. But again, this level of clumsy error is not what the public is being sold by these companies. Giant companies like Google, Meta, OpenAI, and Elon Musk’s Nazi Emporium have sold AI as just a few quick breaths and another few billion away from amazing levels of sentience, yet their systems can’t perform rudimentary tasks.
Companies are rushing undercooked product to market and overselling its real-world capabilities to make money. Other companies in media are then rushing to adopt this undercooked automation not to improve journalism quality or worker efficiency, but to cut corners, save money, undermine labor, and, in the case of outlets like the LA Times, to entrench and normalize the bias of affluent ownership.
As a result, “AI’s” introduction into our already broken, clickbait-obsessed, ad-engagement-driven U.S. journalism industry has been a hot mess, resulting in no end of inaccuracies, oodles of plagiarism, and more work than ever for already overworked and underpaid human journalists and editors. And this is before you even get to these technologies’ outsized energy and resource consumption.
Layer that on top of a concerted effort by authoritarians and corporate power to undermine journalism and informed consensus to their own financial benefit, and you can start to maybe see just the faint outline of a problem. AI needs careful implementation as the kinks are worked out, not this mad, mindless collective dash to the trough by folks utterly uninterested in any broader real-world impact.