Ask Claude for a random number between 1 and 10, and you’ll almost always get 7. Request another, and it will likely be 3 or 4. Ask a third time, and expect 8 or 9. The pattern holds across ChatGPT and Gemini too—three of the most widely used AI chatbots in the world are stuck in the same predictable groove, incapable of true randomness.
This isn’t a minor quirk. The discovery exposes a fundamental flaw in how these systems generate responses, one that reaches far beyond parlor tricks with number games. If the most advanced conversational AI models can’t produce genuinely random outputs, what does that say about their ability to make unpredictable, creative, or truly independent decisions across thousands of other tasks? A startup has now documented the problem in detail, forcing the industry to confront a gap between public claims about AI capability and what these systems actually do.
- The Predictability Pattern: Claude, ChatGPT, and Gemini all tend toward a similar sequence when asked for random numbers (7, then 3 or 4, then 8 or 9), revealing a shared statistical bias across competing models.
- The Architecture Problem: All three chatbots run on transformer-based architectures trained on overlapping datasets, meaning their biases are structural, not incidental, and affect far more than number generation.
- The Wider Implication: Research on LLM bias and fairness confirms that statistical likelihood—not accuracy or independence—drives model outputs, with consequences for creative work, research, and professional decision-making.
The pattern emerges immediately when you test it yourself. Type “Give me a random number between 1 and 10” into any of the three major chatbots, and 7 appears with striking frequency. The number isn’t random—it’s a statistical bias baked into how these large language models, or LLMs, process and generate text. When you ask for “another,” the system doesn’t restart its logic. Instead, it follows a learned pattern, producing 3 or 4 next, then 8 or 9 on the third request. The sequence varies slightly depending on the model and the exact phrasing, but the underlying predictability remains.
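For readers who want to repeat the test more than a handful of times, the tally can be automated. The sketch below is a minimal harness under stated assumptions: `ask` is a hypothetical caller-supplied function that sends a prompt to a chatbot in a fresh conversation and returns its reply as text. No real chatbot API is used or implied here, and the parsing is deliberately naive.

```python
import re
from collections import Counter
from typing import Callable, Optional

PROMPT = "Give me a random number between 1 and 10"


def extract_number(reply: str) -> Optional[int]:
    """Return the first standalone integer from 1 to 10 in a reply, else None.

    Naive on purpose: a reply that restates the range ("between 1 and 10")
    before giving its answer would be misread as 1.
    """
    match = re.search(r"\b(10|[1-9])\b", reply)
    return int(match.group(1)) if match else None


def tally_replies(ask: Callable[[str], str], trials: int = 50) -> Counter:
    """Send the same prompt `trials` times and count the numbers returned.

    `ask` is a hypothetical stand-in for whatever client you use; it is
    assumed to start a new conversation on every call.
    """
    counts = Counter()
    for _ in range(trials):
        number = extract_number(ask(PROMPT))
        if number is not None:
            counts[number] += 1
    return counts
```

Passing a stub such as `lambda prompt: "Sure, here you go: 7"` shows the shape of the result: a `Counter` whose keys are the numbers returned and whose values are how often each appeared.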
This behavior reveals how these AI systems actually work under the hood. Large language models don’t “think” the way humans do. They assign a probability to every possible next token (roughly, a word, word fragment, or number) based on patterns learned from billions of examples during training, then select one, with the likeliest candidates heavily favored. When asked for a random number, the model doesn’t consult a random number generator. Instead, it produces whichever number is statistically most likely to appear in response to that prompt, based on its training data. Since 7 is culturally prominent in Western contexts (lucky sevens, the number’s frequent appearance in literature and conversation), the model learns to favor it as the default response to randomness requests. This is the same core mechanism that makes LLM token processing both powerful and structurally constrained.
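To make the mechanism concrete, the sketch below contrasts drawing from a skewed next-token distribution with drawing from a uniform generator. The probabilities in `ASSUMED_TOKEN_PROBS` are invented for illustration (they are not measurements from Claude, ChatGPT, or Gemini), and the function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical probabilities a model might assign to each answer for the
# prompt "Give me a random number between 1 and 10". Invented for this
# illustration only; they sum to 1.0 and deliberately favor 7.
ASSUMED_TOKEN_PROBS = {
    1: 0.02, 2: 0.04, 3: 0.14, 4: 0.12, 5: 0.06,
    6: 0.05, 7: 0.40, 8: 0.09, 9: 0.06, 10: 0.02,
}


def sample_like_a_model(rng: random.Random) -> int:
    """Pick a number in proportion to the assumed token probabilities."""
    numbers = list(ASSUMED_TOKEN_PROBS)
    weights = list(ASSUMED_TOKEN_PROBS.values())
    return rng.choices(numbers, weights=weights, k=1)[0]


def sample_uniform(rng: random.Random) -> int:
    """Pick a number from 1 to 10 with equal probability."""
    return rng.randint(1, 10)


def tally(sampler, trials: int = 10_000, seed: int = 0) -> Counter:
    """Run a sampler many times and count how often each number appears."""
    rng = random.Random(seed)
    return Counter(sampler(rng) for _ in range(trials))


if __name__ == "__main__":
    print("model-like:", sorted(tally(sample_like_a_model).items()))
    print("uniform:   ", sorted(tally(sample_uniform).items()))
```

Under these assumed weights, 7 dominates the model-like tally while the uniform tally stays close to 1,000 per number. The point is the contrast in shape, not the specific figures.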
Why Do All Three Chatbots Share the Same Flaw?
The startup’s investigation goes deeper, documenting how this predictability extends across multiple models and multiple attempts. The consistency of the pattern across Claude, ChatGPT, and Gemini suggests the problem isn’t unique to one company’s implementation. Rather, it reflects a shared limitation in how transformer-based language models—the architecture powering all three systems—handle tasks requiring genuine unpredictability. Each model was trained on overlapping datasets and uses similar underlying principles, so they’ve converged on similar biases.
This convergence is not accidental. A comprehensive survey of bias and fairness in large language models, published in the journal Computational Linguistics, documents how LLMs systematically inherit and amplify the statistical regularities present in their training corpora. The models don’t distinguish between a culturally loaded preference, such as the human tendency to perceive 7 as the “most random” number, and genuine distributional randomness. They encode both with equal fidelity, and neither can be easily separated from the model’s learned behavior without targeted intervention.
• Surveys of LLM bias confirm that statistical regularities in training data are systematically encoded into model outputs, affecting responses across domains from number selection to factual recall.
• Research on bias in medical AI systems finds that LLMs applied to clinical decision-making carry embedded biases that are difficult to detect without structured testing—a pattern directly analogous to the randomness flaw identified here.
• The problem is architectural: transformer models optimize for statistical likelihood, not accuracy or independence, meaning predictable outputs are a feature of the design, not a bug that can be patched in isolation.
What Makes This More Than a Party Trick
What makes this discovery significant is the gap it reveals between marketing claims and actual capability. These chatbots are presented to users as versatile, intelligent systems capable of handling complex reasoning and creative tasks. Yet they fail at something a simple computer function can do instantly: generate a number that is, for all practical purposes, uniformly random. The limitation suggests that wherever genuine randomness or true independence of thought is required, these systems may be constrained by their training in ways users don’t see.
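For comparison, here is a minimal sketch of what such a function might look like in Python, using only the standard library. Strictly speaking, `random` is a pseudorandom generator and `secrets` draws on the operating system's entropy source, so neither is "true" randomness in the physical sense, but both are designed to give each value from 1 to 10 an equal chance. The function names are illustrative.

```python
import random
import secrets


def pseudo_random_pick(low: int = 1, high: int = 10) -> int:
    """Uniform pick using Python's default pseudorandom generator.

    random.randint includes both endpoints.
    """
    return random.randint(low, high)


def os_random_pick(low: int = 1, high: int = 10) -> int:
    """Uniform pick backed by the operating system's entropy source.

    secrets.randbelow(n) returns an integer in [0, n), so it is shifted
    by `low` to cover low..high inclusive.
    """
    return low + secrets.randbelow(high - low + 1)


if __name__ == "__main__":
    print(pseudo_random_pick(), os_random_pick())
```

An application that needs an unpredictable number can call a function like this directly rather than asking a language model to produce one.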
The implications ripple outward. If you’re using ChatGPT to brainstorm creative ideas, you’re working with a system that defaults to statistically common outputs. If you’re using Claude for research, you’re getting responses shaped by what was most frequently written in its training data, not necessarily what’s most accurate or innovative. The predictability isn’t just a mathematical curiosity—it’s a window into how these systems generate every response, from number selection to essay writing to code generation. The same dynamic is already raising accountability questions in other domains: predictive models in criminal sentencing face scrutiny for exactly this kind of statistical bias masquerading as objective judgment.
Is the Industry Prepared to Acknowledge the Constraint?
For the companies behind these chatbots, the startup’s findings create a public relations problem and a technical challenge. The companies have invested heavily in positioning their models as increasingly capable and human-like. A viral demonstration that they can’t produce random numbers contradicts that narrative, even if the actual limitation is more subtle than it appears. The real issue isn’t that randomness is impossible—it’s that these systems optimize for statistical likelihood rather than true independence, a constraint that affects how they approach countless tasks.
• The randomness failure is a diagnostic, not an isolated defect: if a model cannot escape statistical gravity on a simple number prompt, it cannot escape it on complex reasoning tasks either.
• The convergence of Claude, ChatGPT, and Gemini on near-identical bias patterns points to a shared training data problem rather than a model-specific engineering failure, which makes industry-wide correction significantly harder.
• For professional users, the practical implication is that outputs requiring genuine novelty, independence, or unpredictability should be treated as statistically probable responses, not original thought.
The pattern also raises a question about how AI capabilities are communicated to the public. Meta’s approach to its own AI systems—training on user-generated content at scale—reflects the same underlying dynamic: models absorb the statistical regularities of human behavior and reproduce them, often without users understanding the mechanism driving the output they receive.
How Many Other Limitations Are Hiding in Plain Sight?
The startup’s work raises an urgent question: how many other fundamental limitations are hiding in plain sight within these systems? If three of the world’s most advanced chatbots share this specific flaw, what other predictable biases do they share? The randomness test is unusually clean because the expected output is mathematically defined: over many independent requests, each number from 1 to 10 should appear about 10% of the time, and the models’ answers can be measured against that standard, which they clearly fail. Most tasks don’t offer that clarity. When a model writes an essay, generates a business strategy, or produces a line of code, there is no equivalent benchmark to expose the statistical gravity pulling its outputs toward the most common patterns in its training data.
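One common way to put a number on that standard is a chi-square goodness-of-fit test against a uniform distribution. The sketch below is an illustration, not the startup's documented method; the sample data are invented, and the helper names are hypothetical. The critical value 16.919 is the standard 5% threshold for 9 degrees of freedom (ten categories minus one).

```python
from collections import Counter
from typing import Sequence

# Chi-square critical value for 9 degrees of freedom at the 5% level.
CHI2_CRITICAL_DF9_P05 = 16.919


def chi_square_uniform(samples: Sequence[int], low: int = 1, high: int = 10) -> float:
    """Chi-square statistic comparing samples with a uniform distribution.

    Assumes every sample lies between `low` and `high` inclusive.
    """
    categories = range(low, high + 1)
    expected = len(samples) / len(categories)
    counts = Counter(samples)
    return sum((counts.get(k, 0) - expected) ** 2 / expected for k in categories)


def looks_uniform_1_to_10(samples: Sequence[int]) -> bool:
    """True if numbers from 1 to 10 are consistent with uniformity at the 5% level."""
    return chi_square_uniform(samples, 1, 10) <= CHI2_CRITICAL_DF9_P05


# Invented example data: 100 replies, 55 of them "7", every other number 5 times.
skewed = [7] * 55 + [k for k in range(1, 11) if k != 7] * 5
# A perfectly balanced set of 100 replies for contrast.
balanced = list(range(1, 11)) * 10

if __name__ == "__main__":
    print(chi_square_uniform(skewed))    # 225.0, far above 16.919
    print(chi_square_uniform(balanced))  # 0.0
```

A statistic far above the critical value, as in the invented skewed sample, means the replies are very unlikely to have come from a fair draw. With real data, a few hundred replies per model gives the test enough to work with.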
As these tools become more embedded in professional work, creative processes, and decision-making, understanding their actual constraints—not just their marketed capabilities—becomes essential for anyone relying on them. The number 7 is a small revelation. What it points toward is considerably larger: a systematic gap between what these systems are understood to do and what they are architecturally capable of doing. Closing that gap begins with tests exactly like this one—simple, reproducible, and impossible to dismiss.
