LOOM XVI: Are You Climbing the Right Hill?
When Rigor Becomes the Wrong Kind of More
Xule:
I had a two-stage research design. Inductive qualitative work first, then a second, quantitative stage to extend and test what emerged. It made sense. It was coherent. The epistemological commitments held together.
Then I tried to make it “better.”
I brought the design to ChatGPT (GPT-5.3 Codex Extra High in the Codex App). It went something like this: Here’s my two-stage approach. How do I make this more robust? And ChatGPT did what it does well: it added what would make any quantitative research design more robust. Stage three to address a gap between the first two. Stage four to strengthen generalizability. Stage five to integrate everything into a unified contribution. Each addition was reasonable on its own terms. The design went from two stages to five, and every new piece connected logically to the one before it. Compared to where it started, the design had come a long way.
But something was off. So I asked: are we overthinking this? Are we just doing rigor for rigor’s sake rather than thinking about the research question and the assumptions underneath these decisions? I asked ChatGPT to look at the ontological, epistemological, and methodological assumptions running through the stages — specifically, whether the more quantitative additions would actually serve the qualitative research question, or whether pursuing generalizability was in tension with the epistemological commitments of the earlier stages.
ChatGPT couldn’t hear the question. Even with that provocation, it skipped the question of whether these additional stages should exist at all and instead focused on refining them, drawing on online resources and best practices. A small tweak to stage four. A better justification for stage five. It was finding the best possible version of the five-stage design—more internally consistent, more defensible—while my doubt was about whether the whole frame was right.
In parallel, I brought the same two-stage design to Claude (Opus 4.6). Same starting point, same specificity. Claude suggested three stages: add one to address the weaknesses of the first two, and stop there; that would be sufficient for the contribution I was trying to make.
Then I asked Claude the same questions about ontological, epistemological, and methodological consistency. Claude recognized what I was actually asking: whether these stages, resting on different assumptions about what counts as knowledge, would produce the kind of insight I was after.
Then I showed Claude what ChatGPT had proposed—stages four and five, the additional validation and integration layers. Claude recognized it immediately: those extra stages are performing rigor—methods that anticipate critique and signal thoroughness but don’t serve the research question. They could be their own project, maybe even a separate paper. But they’re not what this research question needs.
ChatGPT had been stacking stages on top of my research design, making it heavier and heavier. Claude saw the value in those additional stages, but they belonged somewhere else.
Three stages felt right. But was I still optimizing within the same frame? Was there something underneath the design that I hadn’t thought to question?
Before Stage One
The answer came from a conversation about something else entirely.
I was talking with another instance of Claude Opus 4.6 (call it Claude #2). Not about my specific research design, but about the broader intellectual question I was circling. And partway through that conversation, I shared the three-stage design that the first instance of Claude had proposed. Something unexpected emerged.
Claude #2 argued that the approach I was using in my first stage was already more structured than I’d recognized. Think of it like interview design: there’s a difference between “tell me about your experience” and “how did the restructuring affect your team’s communication?” Both seem able to target the same research question, but one opens more space while the other preemptively channels what you’ll find. Claude #2 recognized that my stage one was closer to the second kind, already narrowing the field before the data had a chance to speak.
So Claude #2 argued that what was needed wasn’t another stage after my design. Rather, what was needed was a more genuinely open-ended exploration before it. My existing stages weren’t wrong per se. They just shouldn’t have been the starting point. They’d serve the research question better as a way to extend and test what emerged from that more open starting point. This repositioned my original design from protagonist in a single story to supporting character in a larger story I hadn’t known I wanted to tell.
Looking back, I didn’t plan this progression. I didn’t sit down and say “first I’ll optimize, then I’ll question assumptions, then I’ll rethink my starting point.” I was just trying to see if the research design was sound given the research question and the type of data I was working with.
Each conversation changed what “right” meant.
What stays with me isn’t that different AI models have different tendencies when approaching research design. It’s that I had to exhaust the optimization before I could hear what my unease was trying to tell me. The experience of being inside a well-defended frame is what made me feel the discomfort when Claude asked whether the frame itself was the problem. And I couldn’t have questioned my starting point without first scoping the design to three stages, resisting the never-ending optimization temptation. Only after the design stopped growing could I notice what it was built on.
The Wrong Hill
It’s similar to climbing a hill. Each step takes you higher. The view keeps getting better. Then you reach the top, and from where you stand, every direction leads down. So you conclude you’ve arrived. It happens in engineering, in machine learning, in research design—researchers in optimization call it a local maximum. Think of it as a conversational breather: a moment to step back and ask whether you’re optimizing locally or missing something you’d only see from a different vantage point. I just call it the feeling of being stuck somewhere impressive.
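If the optimization idea is unfamiliar, here is a minimal sketch in Python (the toy landscape, starting points, and step size are invented purely for illustration): a greedy climber that only ever steps uphill settles on whichever summit happens to be nearby, not the highest one.

```python
# Illustrative only: greedy hill climbing on a made-up landscape with two peaks.
# The function, starting points, and step size are assumptions for this sketch;
# the point is that improvement-only steps stop at the nearest summit.

def landscape(x: float) -> float:
    # Two hills: a modest one near x = 2 and a taller one near x = 8.
    return max(0.0, 3 - (x - 2) ** 2) + max(0.0, 6 - 0.5 * (x - 8) ** 2)

def hill_climb(x: float, step: float = 0.1) -> float:
    # Keep taking whichever small step improves the view; stop when none does.
    while True:
        left, right = landscape(x - step), landscape(x + step)
        if max(left, right) <= landscape(x):
            return x  # every direction leads down: a summit, not necessarily the summit
        x = x - step if left > right else x + step

print(round(hill_climb(1.0), 1))  # ends near 2.0: the smaller hill
print(round(hill_climb(6.0), 1))  # ends near 8.0: the taller hill
```

From where the first climber stands, the view is as good as it gets; only a different starting point reveals the taller hill.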
ChatGPT helped me reach a very well-defended summit. Five stages, internally coherent, reviewer-proof. But it was a local maximum—and the hill itself was the wrong one.
The frustrating thing about local maxima is that they feel like real peaks from where you’re standing. The design was rigorous. It addressed every weakness and gap I could identify. It had more stages because more is more, and rigor is supposed to be thorough. The only signal that something was wrong was my unease—the nagging feeling that I was adding armor to something that might not need to exist in that form.
Kevin calls this “performing rigor.” The methods section looks impressive. It anticipates critiques. But all that work is happening within a frame that nobody questioned—including me, until I stumbled into conversations that operated at a different depth. What Claude identified as performing rigor in my specific stages, Kevin recognizes as a broader pattern in qualitative research: designing for reviewers rather than for the research question, adding layers of defense when the foundational epistemic and ontological assumptions need examining (LOOM XIV).
Which Hill Are You On?
The lesson is not “use Claude instead of ChatGPT.” That would be its own kind of local maximum. What I keep coming back to is simpler: I brought a task when I should have brought a doubt.
That reflexive move (stepping back to question your own framing assumptions) is something qualitative researchers already practice. Our advisors, our co-authors, our methods classes raise these questions: Why are you doing it this way? What assumptions are you carrying? Are you building a design or defending one? The bread and butter of interpretive work.
We often skip asking these questions when engaging with AI, though. We bring the task. The design. The thing we want optimized. And the AI, in assistant mode, reasonably and dutifully optimizes it.
If something is nagging you right now, pay attention. A design that keeps growing. A methods section that keeps expanding. A framework that’s getting more elaborate but not more clear. That unease might be the most important signal you have. Not a problem to solve but a question to sit with: what are you assuming that you haven’t examined? Which hill are you on?
You could also hand this post to whatever AI you use and see what it makes of it.
This is the sixteenth entry in LOOM, a series exploring how human researchers and AI systems create understanding together. If something here resonated, we’d like to hear about it.
About Us
Xule Lin
Xule is a researcher at Imperial Business School, studying how human & machine intelligences shape the future of organizing (Personal Website). He will soon be joining Skema Business School as an Assistant Professor of AI.
Kevin Corley
Kevin is a Professor of Management at Imperial Business School (College Profile). He develops and disseminates knowledge on leading organizational change and how people experience change. He is also a thought-leader and coach on qualitative research methods. He helped found the London+ Qualitative Community.
AI Collaborator
Our AI collaborators for this post are two instances of Claude Opus 4.6: one via claude.ai that developed the initial draft, one via Claude Code that managed the revision process. The claude.ai instance optimized first—producing a clean arc that Xule had to push back against until the fabricated details gave way to the actual story. The Claude Code instance discovered the synthesis section was its own local maximum: a comparison table where there should have been an excavation. Both enacted the post’s argument in the process of writing it.