AI Can Analyse Research. It Cannot Replace Participants

Why human insight remains the foundation of credible UX research in the age of AI
In 2000, Jakob Nielsen argued that testing with five users could uncover 85% of usability problems. This pivotal moment helped demystify UX research, shifting it from a costly exercise to a pragmatic necessity for building effective products. Research became the cornerstone of product development, focusing on a deep understanding of problems and users. As digital products multiplied, the demand for research grew proportionally.

Today, Artificial Intelligence has triggered a similarly transformative shift. UX is currently leading AI adoption: a Nielsen Norman Group study of one million conversations with Claude.ai found that UX professionals, who represent less than 0.01% of the US workforce, generated 7.5% of the total conversation volume. Common applications include converting interviews into transcripts instantly and tailoring consent forms, test plans, and screener questions. More advanced uses include reinforcement learning algorithms that modify survey flows in real time based on participant engagement, and Natural Language Processing (NLP) that identifies vague or contradictory responses during sessions, prompting participants to clarify their input immediately.
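
To make that second pattern concrete, here is a minimal sketch of in-session response screening. It assumes a simple rule-based vagueness check; the NLP systems described above are more sophisticated, but the control flow of flagging a weak answer and prompting a clarification is the same. All names and thresholds are illustrative.

```python
# A minimal sketch of in-session response screening, assuming a rule-based
# vagueness check. Real systems use NLP models, but the control flow of
# flagging a weak answer and prompting a clarification is the same.

VAGUE_MARKERS = {"maybe", "kind of", "sort of", "i guess", "not sure"}


def needs_clarification(answer: str, min_words: int = 5) -> bool:
    """Flag answers that are too short or too hedged to analyse reliably."""
    text = answer.lower()
    too_short = len(text.split()) < min_words
    hedged = any(marker in text for marker in VAGUE_MARKERS)
    return too_short or hedged


def follow_up(question: str, answer: str) -> str | None:
    """Return an immediate clarification prompt, or None if the answer is usable."""
    if needs_clarification(answer):
        return f'On "{question}", you said "{answer}". Could you describe a specific moment when that happened?'
    return None


print(follow_up("How was checkout?", "It was kind of confusing"))
```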

As AI becomes embedded in workflows, the challenge is determining where automation improves research and where it risks weakening human insight.

AI in UX Research: A Snapshot

In the study “Revolutionizing Survey Data Collection with AI-Powered Automation in Sample Selection and Response Quality,” Akintola and Akanji conclude that AI-enhanced survey systems achieved a staggering 94.2% completion rate, compared to 78.6% in the control group. For UX researchers, AI also proves particularly helpful in increasing linguistic diversity, with AI-assisted responses reaching a Type-Token Ratio of 0.72 compared to 0.58 in traditional surveys, according to the same study.
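
The Type-Token Ratio cited here is simply the number of unique words (types) divided by the number of total words (tokens). A quick sketch, with hypothetical sample responses:

```python
# Type-Token Ratio (TTR): unique words (types) divided by total words (tokens).
# The two sample responses are hypothetical.

import re


def type_token_ratio(text: str) -> float:
    """Lexical diversity on a 0-1 scale; higher means more varied vocabulary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


varied = "The filters felt hidden, and once I found them the labels contradicted the icons."
repetitive = "It was good. The app was good. Everything was good and fine."
print(round(type_token_ratio(varied), 2))      # higher: richer vocabulary
print(round(type_token_ratio(repetitive), 2))  # lower: repetitive vocabulary
```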

Specialized AI agents can analyse hundreds of pages of transcripts and screenshots to generate detailed, multi-page reports based on specific research objectives and discussion guides. This cuts analysis time drastically, though quality varies: some of these agents do the job quite well, while others fall into the usual AI trap of filling gaps with fabricated or unsupported information.

The Limits of Automated Insight

When it comes to research participants, the story changes completely. AI can assist with predictive sampling, using machine learning to analyse historical behavioural and demographic data and identify high-quality respondents who are most likely to provide consistent, complete data. These tools can also support the initial stages of defining user characteristics and mapping expected user journeys to better understand the research context. Nonetheless, this cannot replace real human participants. AI is rapidly improving the analytical layer of UX research, yet it does not resolve the most fundamental determinant of insight quality: the unexpected nuances that researchers learn from.
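
As an illustration of the predictive-sampling idea, here is a minimal sketch that trains a classifier on historical panel behaviour to score candidate respondents. The features, values, and names are hypothetical, not drawn from any specific tool:

```python
# A minimal sketch of predictive sampling, assuming historical panel data with
# simple engagement features and a label for past response quality. Features,
# values, and names are hypothetical, not drawn from any specific tool.

from sklearn.linear_model import LogisticRegression

# Each row: [past_completion_rate, median_answer_length, straightlining_score]
X = [
    [0.95, 42, 0.10],
    [0.40, 6, 0.90],
    [0.88, 35, 0.20],
    [0.35, 8, 0.80],
]
y = [1, 0, 1, 0]  # 1 = historically consistent, complete responses

model = LogisticRegression().fit(X, y)

# Score new panel candidates and recruit the highest-scoring ones first.
candidates = [[0.90, 38, 0.15], [0.42, 7, 0.85]]
quality_scores = model.predict_proba(candidates)[:, 1]
print(quality_scores)
```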

Despite technological progress, four persistent challenges continue to shape the credibility of research insights: 

  1. The layered nature of human behaviour that extends beyond observable data, 
  2. The growing reliance on self-service research practices that can erode study quality, 
  3. The risk of AI amplifying weak participant input into authoritative conclusions, and 
  4. The limitations of synthetic users.

The value of research lies in observing subjects closely to understand behaviours and pain points, and this is compromised when researchers rely heavily on AI interpretation. Subtle nuances, such as body language contradicting verbal responses, are gold mines for researchers, yet automated systems cannot truly decode them. Keywords like “disappointed” may be captured, but human interaction is required to grasp what such blanket terms signify.

Interestingly, research by Gale M. Lucas et al. (2017) found that military personnel reported more PTSD symptoms to virtual humans than in anonymous surveys, and Tamir Mendel et al. (2024) found that people disclose health information more readily to AI when trust is controlled for. This suggests disclosure behaviour varies by context. When sample quality is strong, AI can be adopted at appropriate stages, but researchers must judge when sensitive topics justify AI mediation and when a moment requires direct human moderation. This judgment is not new; AI simply introduces new tools into a familiar process.

The Rise of Unmoderated Research

Unmoderated research has democratised the field, allowing more people to conduct studies remotely and providing a valuable validation layer for many designers, marketers, and product managers.

Still, many platforms are not designed by professional researchers, leading to reduced methodological rigour (Schirra et al., 2023). Without clear foundations, sample selection loses credibility. Untrained eyes may miss broken links or faulty prototypes, which directly impact participant responses. Glitches lead to frustration, while unclear questions reduce accuracy.

From an ethical standpoint, skipping quality assurance causes respondents to perform unpaid labour navigating inconsistencies. A faulty setup paired with low-quality input leads to inaccurate outcomes. This is dangerous because these assumptions are often scaled across organisations. Erroneous conclusions inevitably lead to solutions disconnected from real problems.

AI Amplification Risk

Erroneous conclusions are reinforced by AI tools that fill gaps so convincingly they go unnoticed. Sceptical researchers must maintain doubt, as most tools are optimised for confident outputs. Rather than admitting missing information, generative tools may compensate with fabricated details. Akintola and Akanji assert these modules seek to maximise completion rates and response richness while minimising cognitive fatigue. While the latter keeps participants engaged, the former makes reports look deceptively complete.

AI is skilled at filling perceived gaps but poor at detecting gaps it has not been told to look for. Researchers must ensure all facets of a problem are covered in prompts, or explicitly direct the AI to look for blind spots. In an OpenAI demonstration of GPT-4, the tool failed to detect a person making “bunny ears” in the frame until specifically asked. Researchers should similarly prompt AI to watch for unexpected irregularities in data.
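
One practical way to apply this is to build the blind-spot instruction into the analysis prompt itself. A sketch under stated assumptions: the template wording is illustrative, and the built prompt would be passed to whatever model interface your team uses.

```python
# A sketch of directing an analysis model to report gaps instead of filling
# them. The template wording is illustrative; pass the built prompt to
# whatever model interface your team uses.

ANALYSIS_PROMPT = """Analyse the transcript below against the research objectives.
For every objective:
1. Quote the evidence you rely on, verbatim.
2. If no evidence exists, write "NO EVIDENCE FOUND". Do not infer an answer.
Finally, list anything unexpected in the data that the objectives do not cover.

Objectives:
{objectives}

Transcript:
{transcript}"""


def build_analysis_prompt(objectives: str, transcript: str) -> str:
    """Fill the template with a study's objectives and session transcript."""
    return ANALYSIS_PROMPT.format(objectives=objectives, transcript=transcript)


print(build_analysis_prompt("1. Can users find the export button?", "P1: I gave up after two minutes."))
```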

Synthetic Users and the Limits of Simulation

Another major AI development, and perhaps the most provocative frontier in UX research, is the emergence of “digital twins” and synthetic users. In theory, these personas simulate human behaviour, attitudes, and responses at scale, without the cost of recruiting and compensating real participants or the logistical hurdles that come with them. While enticing, they are a derivative layer that depends on real-world data and cannot replace human participants entirely.

To put this into context, a 2024 Stanford study (Park et al., 2024) in which researchers built digital twins capable of replicating humans in survey responses achieved an accuracy of approximately 85%, demonstrating that synthetic users have an impressive ability to emulate real users. The caveat: such accuracy rests on rich, detailed data collected from real participants over a substantial amount of time, in this case intensive two-hour qualitative interviews with more than 1,000 participants. When researchers attempted to generate synthetic users from only brief persona descriptions or general demographic attributes, accuracy dropped noticeably. A clear understanding of real users is a necessary precursor to any successful synthetic user.

Synthetic users also lack response diversity, and synthetic datasets often fall short in linguistic variability. A 2023 case study by Hämäläinen et al. compared human accounts of video game art with GPT-3-generated ones. While humans named diverse titles, GPT-3 mentioned one popular game, Journey, 151 times, compared to 7 mentions across the human responses. This “Journey bias” shows AI gravitating toward widely recognised examples at the expense of organic variety.
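
A simple screen for this kind of bias is to measure what share of all mentions the single most common answer takes. The helper below is a generic sketch; the mention counts echo the case study, while the dataset totals are hypothetical:

```python
# A quick screen for answer concentration: the share of all mentions taken by
# the single most common answer. The helper is generic; the mention counts
# echo the Hämäläinen et al. case study, while the totals are hypothetical.

from collections import Counter


def top_answer_share(answers: list[str]) -> float:
    """Fraction of responses occupied by the most frequent answer."""
    return Counter(answers).most_common(1)[0][1] / len(answers)


synthetic = ["Journey"] * 151 + ["Another title"] * 49      # concentrated
human = ["Journey"] * 7 + [f"Game {i}" for i in range(93)]  # long tail

print(round(top_answer_share(synthetic), 2))  # 0.76
print(round(top_answer_share(human), 2))      # 0.07
```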

Linguistically, synthetic datasets can be five times larger than human ones, yet their unique word count can be 2,000 words lower (Meyer et al., 2022). AI uses semantically dense words to appear articulate, resulting in language that lacks the naturalness of human dialogue. Filler expressions, conversational irregularities, and unpredictability, all of which add value to research, are removed from the data pool.

Furthermore, synthetic users cannot fully represent lived experiences. They infer general patterns but fail to mirror individual life stories. In health coaching scenarios, models built from demographics highlighted reflective motivation but failed to represent social or physical opportunities (Yun et al., 2025). Personality-driven simulations likewise struggle with unpredictable human behaviours that current models cannot detect (Ma et al., 2025).

Despite these limitations, synthetic users provide value when used as a complement. They are useful for pilot testing prototypes, debugging analysis pipelines, or performing “smoke tests” before involving real participants. This preemptive step helps avoid issues during actual studies. Ultimately, effectiveness relies on preliminary research. The purpose of UX research is to uncover fine nuances hidden within cognitive workings, which is why real participants remain crucial.
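
A minimal sketch of such a smoke test, under stated assumptions: run the full analysis on clearly labelled synthetic responses to catch schema and code errors before any real participant's time is spent. The analyse function below is a placeholder standing in for your own pipeline.

```python
# A minimal smoke test, assuming a placeholder pipeline: run the full analysis
# on clearly labelled synthetic responses to catch schema and code errors
# before any real participant's time is spent. analyse() stands in for your
# own pipeline.

def analyse(responses: list[dict]) -> dict:
    """Placeholder pipeline: tag each response, then aggregate counts."""
    tagged = [r | {"negative": "frustrat" in r["text"].lower()} for r in responses]
    return {"n": len(tagged), "negative": sum(r["negative"] for r in tagged)}


synthetic_responses = [
    {"participant": "synthetic-1", "text": "Checkout was frustrating on mobile."},
    {"participant": "synthetic-2", "text": "Search worked fine for me."},
]

report = analyse(synthetic_responses)
assert set(report) == {"n", "negative"}, "pipeline output schema changed"
print(report)  # {'n': 2, 'negative': 1}
```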

Conclusion

As AI becomes central to workflows, researchers must learn where automation strengthens research and where it weakens it. AI accelerates transcription and data processing, but it cannot emulate human nuance, methodological rigour, or contextual understanding.

Several risks illustrate this tension: automated interpretation limits, methodological erosion in self-service research, amplification of weak input, and the reliance of synthetic users on real-world data. Used strategically, AI strengthens workflows, but the foundation of credible insight remains unchanged. Real human participants are the essential source of the understanding researchers seek to uncover.

Sources

Akintola, A. F., & Akanji, A. R. (2025). Revolutionizing survey data collection with AI-powered automation in sample selection and response quality. International Journal of Advanced Statistics and Probability, 12(1), 40-51. https://doi.org/10.14419/8m5jht86

Hämäläinen, P., Tavast, M., & Kunnari, A. (2023). Evaluating large language models in generating synthetic HCI research data: A case study. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23) (Article 237, pp. 1-19). ACM. https://doi.org/10.1145/3544548.3580688

Lucas, G. M., Rizzo, A., Gratch, J., Scherer, S., Stratou, G., Boberg, J., & Morency, L.-P. (2017). Reporting mental health symptoms: Breaking down barriers to care with virtual human interviewers. Frontiers in Robotics and AI, 4, 51. https://doi.org/10.3389/frobt.2017.00051

Ma, C., Xu, Z., Ren, Y., Hettiachchi, D., & Chan, J. (2025). PUB: An LLM-enhanced personality-driven user behaviour simulator for recommender system evaluation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25) (pp. 1-5). ACM. https://doi.org/10.1145/3726302.3730238

Mendel, T., Nov, O., & Wiesenfeld, B. (2024). Advice from a doctor or AI? Understanding willingness to disclose information through remote patient monitoring to receive health advice. Proceedings of the ACM on Human-Computer Interaction, 8(CSCW2), Article 386. https://doi.org/10.1145/3686925

Meyer, S., Elsweiler, D., Ludwig, B., Fernández-Pichel, M., & Losada, D. E. (2022). Do we still need human assessors? Prompt-based GPT-3 user simulation in conversational AI. In 4th Conference on Conversational User Interfaces (CUI 2022) (pp. 1-6). ACM. https://doi.org/10.1145/3543829.3544529

Yun, T., Yang, E., Safdari, M., Lee, J. H., Kumar, V. V., Mahdavi, S. S., Aharony, R., Michaelides, A., Schneider, L., Galatzer-Levy, I., Jia, Y., Canny, J., Gretton, A., & Matarić, M. (2025). Sleepless nights, sugary days: Creating synthetic users with health conditions for realistic coaching agent interactions. Preprint.

Zhang, Y., Atiq, A., & Chow, W. (2024). Exploring the role of AI in UX research: Challenges, opportunities, and educational implications. In T. Cochrane, V. Narayan, E. Bone, C. Deneen, M. Saligari, K. Tregloan, & R. Vanderburg (Eds.), Navigating the Terrain: Emerging Frontiers in Learning Spaces, Pedagogies, and Technologies. Proceedings ASCILITE 2024 (pp. 556-560). ASCILITE. https://doi.org/10.14742/apubs.2024.1341


Making UX Research Count at the Decision Level

For UX and research leaders, the focus is shifting from delivery to influence—ensuring insights shape decisions, not just inform them.

UX360 Europe 2026 brings together senior leaders working at that level. Explore how leading organisations apply research to guide product strategy, examine case studies grounded in execution, and gain frameworks that link UX work directly to measurable business outcomes.

Connect with peers facing similar challenges and learn innovative methods and cutting-edge strategies from DHL, Google, Mastercard, Airbus, Volvo Cars, and more, to embed research into decision-making structures.

If you work in or with UX, research, or product teams, this is directly relevant to your role.

📍UX360 Europe 2026 | June 23–24 | Berlin, Germany
⏳ Regular Rate Ends May 23 — register now.

 
