Trust me, bro: why asking an LLM about your market is not the same as researching it

A different kind of wrong

Most of the conversation about AI in research focuses on hallucination: the model confidently states something that is not true, with no source to back it up. That is a real problem, and there is a lot of useful work being done on how deterministic pipelines and source-anchored retrieval prevent it.

But there is a second failure mode that receives less attention, and it is in some ways more insidious. It does not show up as a confident fiction. It shows up as a confident answer that sounds completely plausible, is internally coherent, and happens to be systematically skewed toward what the internet finds appealing.

Recent research from Esade, the University of Sydney, and NYU Stern tested seven leading LLMs across thousands of simulations, asking them to choose between competing strategic directions. The results were consistent across models and across contexts. The models almost universally preferred differentiation over cost leadership, long-term thinking over short-term urgency, collaboration over competition, and augmentation over automation. Not because the evidence pointed that way. Because those positions carry positive emotional weight in the training data. The models were, in effect, optimising for social desirability rather than analytical accuracy.

The researchers called this “trendslop.” It is the right word for it.

Why this matters more for research than for strategy

The HBR paper reporting that research focused on strategic advice, which is an alarming enough application. But the same mechanism applies directly to market research, and it may be even more damaging there.

When a researcher asks an LLM to summarise a market, characterise consumer sentiment, or describe what a category of buyers actually cares about, the model is not pulling from a monitored set of current, organic conversations. It is generating the most statistically likely response based on its training corpus. And that corpus skews heavily toward the kind of content that gets written and shared: business journalism, social media, startup culture blogs, consumer trend reports, TED Talk transcripts, and the accumulated output of a thousand marketing agencies writing about what consumers want.

That content has consistent biases. It overrepresents articulate, digitally active consumers. It overrepresents premium and innovation-focused narratives. It underrepresents the unglamorous majority of purchasing decisions, which are driven by price, habit, availability, and inertia. It reflects what people say they value in public rather than what shapes their behaviour in practice.

So when you ask an LLM what consumers in your category care about, you are likely to get back a confident summary that sounds like a trend report from 2023: sustainability, authenticity, personalisation, community, purpose-driven brands. Not because your specific consumers said any of those things, but because those words have high positive valence across the training distribution. The model is giving you the socially acceptable answer, dressed up as analysis.

The context problem does not fix itself

The obvious response is to provide more context. Tell the model about the specific category, the specific audience, the specific geography. Give it your brief.

The study tested this directly. Providing context, including detailed industry-specific scenarios ranging from tech startups to hospitals to construction firms, shifted the bias by an average of only eleven percentage points. The underlying preferences were still there. The model still leaned toward whichever option sounded more appealing. It just leaned slightly less hard.

This is not a prompting problem. Better prompts do not dissolve a prior. They modulate it. The model has absorbed the internet’s worldview, and that worldview is going to shape its outputs regardless of how carefully you frame your question. Adding context helps at the margins. It does not change the structural problem.

There is also a subtler issue. The more context you provide, the more convincingly tailored the output sounds. A model given a detailed brief about a niche B2B software category will produce a response that feels specific to that category. It will use the right vocabulary. It will reference plausible dynamics. But the underlying shape of what it tells you (which concerns are foregrounded, which audience behaviours are emphasised, which directions it finds promising) will still be influenced by what the training corpus said about software markets generally. That is to say, by what gets written about software markets by people who write about software markets.

The output sounds custom. The priors are not.

What grounded research looks like by comparison

The reason this matters is not aesthetic. It is structural.

We have been building a tool that runs live web searches to map competitive landscapes, then passes the retrieved results through deterministic filters before any LLM touches them. Testing it against Tripletex, a Norwegian accounting and ERP platform, produced five competitors across three tiers: two direct, two adjacent, one peripheral. Each came with a confidence score and specific signals pulled from actual search results ("merger with Accountor Software," "native Altinn integration"), traceable back to the sources that produced them. Seventy-eight results were filtered out, each logged with the reason it was excluded. Nothing was invented.
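To make "deterministic filters before any LLM touches them" concrete, here is a minimal Python sketch of the general pattern. It is not the tool's actual code. The SearchResult type, the two rules, the blocked-domain list, the target name "examplesoft", and the sample results are all hypothetical, invented for illustration. The point is only that each excluded result carries a stated reason, so the exclusions can be audited later.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SearchResult:
    url: str
    title: str
    snippet: str


# Hypothetical rules. Each returns a rejection reason, or None to keep the result.
def rule_blocked_domain(result, blocked=("wikipedia.org",)):
    host = urlparse(result.url).netloc.lower()
    if any(host == d or host.endswith("." + d) for d in blocked):
        return f"blocked domain: {host}"
    return None


def rule_missing_target(result, target="examplesoft"):
    text = f"{result.title} {result.snippet}".lower()
    if target not in text:
        return f"no mention of '{target}'"
    return None


RULES = (rule_blocked_domain, rule_missing_target)


def first_rejection(result, rules):
    """Return the first rule's rejection reason, or None if every rule passes."""
    for rule in rules:
        reason = rule(result)
        if reason is not None:
            return reason
    return None


def filter_results(results, rules=RULES):
    """Split results into kept items and (result, reason) rejections."""
    kept, rejected = [], []
    for result in results:
        reason = first_rejection(result, rules)
        if reason is None:
            kept.append(result)
        else:
            rejected.append((result, reason))
    return kept, rejected


# Invented sample data, standing in for live search results.
sample = [
    SearchResult("https://example.com/compare",
                 "ExampleSoft vs RivalBooks", "Feature comparison"),
    SearchResult("https://en.wikipedia.org/wiki/Accounting_software",
                 "Accounting software", "ExampleSoft is listed here"),
    SearchResult("https://example.org/news",
                 "Unrelated funding round", "No relevant vendor named"),
]

kept, rejected = filter_results(sample)
for result, reason in rejected:
    print(f"dropped {result.url}: {reason}")
```

Because the rules are plain functions with no randomness, the same inputs always produce the same kept list and the same exclusion log, which is what makes the later LLM step checkable against its inputs.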

Ask a general-purpose LLM the same question and you get a different kind of answer. Visma would almost certainly appear. 24SevenOffice probably would too. But Accountor Software, a Finnish provider expanding into Norway via acquisition, would likely be missing. It is not famous enough to dominate the training corpus. Enerpize, a global player with no clear Norwegian presence, might appear or might not depending on what the model happens to associate with Nordic ERP. More importantly, you would have no way to know what was excluded and why, because nothing was excluded deliberately. The model just answered from whatever it knew, weighted by whatever its training data happened to emphasise.

That is the gap. Not hallucination, though hallucination is also a risk. The deeper problem is that the LLM’s answer reflects the distribution of content written about a topic, not the actual landscape of the market. Well-documented companies get included. Quieter regional players get missed. The output is shaped by what has been written about, which is not the same as what exists.

The same dynamic runs through consumer research. When a consumer complains about a product on a forum at eleven o’clock on a Tuesday night, they are not performing for an audience. They are not calibrating their language to what sounds good. They are saying what actually happened to them, in their own words, with the specific detail that only comes from direct experience. A forum thread where seventeen people describe the same confusion around a particular workflow step is a different kind of signal from a model’s summary that “users value clarity and transparency.” Both might be directionally true. Only one tells you something you can act on, trace back, and defend.

The gap between those two things is not a gap between AI and non-AI. It is a gap between a system that retrieves what people actually said and a system that predicts what the internet would say if you asked it.

The hybrid trap in research

The same paper identified a failure mode it called the “hybrid trap.” When LLMs are not forced to choose between options, they frequently recommend doing both: pursue differentiation and cost leadership, adopt radical and incremental innovation simultaneously. This sounds sophisticated. It is usually strategic confusion dressed up as balance.

The same trap exists in research. When you ask an LLM to characterise consumer sentiment in a category, it tends to surface a tidy set of concerns that covers multiple dimensions without prioritising between them. Price sensitivity and premium aspiration. Convenience and quality. Digital engagement and human connection. These tensions are real. But presenting them in parallel, with equal weight, is not insight. It is a hedge.

A research output that tells you consumers want both value and experience, both simplicity and richness, both speed and depth has not told you anything. It has given you the full list of things consumers ever want, which you already knew. What you needed was a prioritised, specific account of what this consumer group, in this context, values most and what it is most willing to compromise on. That requires evidence from actual conversations, not from a model’s attempt to be comprehensively correct.

The traceability gap is still there

Even setting aside the bias problem, the traceability problem remains. When an LLM tells you that consumers in a category are concerned about environmental impact, you cannot ask it which consumers, in which conversations, using which words. There is no source. There is no audit trail. There is no way to go back to the raw material and check whether the characterisation is accurate, how prevalent the concern actually is, or whether it is concentrated in a specific segment.

This matters for the same reasons it always has: a finding you cannot trace is a finding you cannot defend. But it matters in a new way given the trendslop problem. If the model’s outputs are systematically biased toward culturally appealing themes, and those outputs are also untraceable, you have no way to know whether a given finding reflects your data or the model’s priors. The two failure modes compound each other. The bias is invisible because there is no source to check it against.

Source-linked research does not just protect you in the meeting room when someone challenges a finding. It protects you from believing findings that were never really findings in the first place.
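One way to picture what "source-linked" means in practice is a finding that cannot be separated from its evidence. The Python sketch below is an illustration under assumptions, not a description of any particular product: the Quote and Finding types, the two-author threshold, and the forum posts are all hypothetical. It shows how "which consumers, in which conversations, using which words" becomes something you can query.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Quote:
    source_url: str  # where the statement can be read in context
    author: str      # pseudonymous handle, used only to count distinct voices
    text: str        # the consumer's own words


@dataclass
class Finding:
    claim: str
    evidence: list = field(default_factory=list)

    def distinct_authors(self):
        """How many different people said this, not how many posts exist."""
        return len({q.author for q in self.evidence})

    def sources(self):
        """Sorted, de-duplicated URLs a reviewer can open to check the claim."""
        return sorted({q.source_url for q in self.evidence})

    def is_defensible(self, min_authors=2):
        # Hypothetical threshold: an assumption for this sketch, not a standard.
        return self.distinct_authors() >= min_authors


# Invented example data.
grounded = Finding(
    "Users get stuck at the export step",
    [
        Quote("https://forum.example.com/t/1#p1", "user_a", "Export fails with no error"),
        Quote("https://forum.example.com/t/1#p4", "user_b", "Same here, export just hangs"),
        Quote("https://forum.example.com/t/9#p2", "user_a", "Still broken after the update"),
    ],
)
ungrounded = Finding("Users value clarity and transparency")

print(grounded.distinct_authors(), grounded.is_defensible())
print(ungrounded.distinct_authors(), ungrounded.is_defensible())
```

The second finding may well be true, but with an empty evidence list there is nothing to open, count, or segment, which is exactly the position an unsourced LLM summary leaves you in.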

What research is actually for

Research exists to tell you something specific that you did not already know, something specific enough to affect a decision. Not a general characterisation of what people in a broad category tend to value. Not a culturally plausible account of the tensions in a market. Not a summary of what the internet has said about your sector over the past several years.

The reason a researcher spends time in the field, monitors forums, codes transcripts, and builds a picture from raw, organic, unprompted signal is precisely because the shortcut does not work. The shortcut gives you what sounds right. The work gives you what is right, for your specific audience, in your specific context, right now.

That distinction has always existed. LLMs make it harder to see because they are very good at producing outputs that sound like research. Fluent, structured, confident, plausible. But plausible is not the standard. Traceable is the standard. Specific is the standard. Defensible is the standard.

“Trust me, bro” has never been a methodology. It is just faster now.

Mimir monitors organic consumer conversation continuously, with every signal linked to its source.