We recently went through a set of discovery sprints for clients in the healthcare and education industries. They were chock-full of stakeholder, SME, and customer interviews; ideation, planning, and re-planning with the team; and quite a bit of market and competitive data. As you can imagine, there was a ton of documented output from this work, and we used a number of different AI tools to keep our findings organized and actionable.
In this post, we’ll share how we experimented with those different tools to speed up and ideally strengthen the synthesis process, comparing challenges, advantages, and limitations to watch for.
Human brain vs robot
Typical methods of capturing and synthesizing raw data from discovery work might include note-taking (WHAT?!), video recording, and a daily mind-meld where teams share their understanding and perspective of the information we’re hearing. While all of this is certainly still a critical part of continuous discovery, it’s nice to have tools like ChatGPT, NotebookLM, and Gemini to power your process.
What I found to be true during this recent sprint was that no matter how much context any of these AI tools had, the summarization of interviews and insights was always lacking a certain human perspective or nuance that is difficult to articulate. Learning how to use the tools to your advantage is key to maintaining that human quality, which is so critical to capturing customer dynamics and validating a product and business opportunity. After all, it’s people who will use your product, isn’t it?
Automated processes and leveraging AI tools to move more quickly sounds like the dream, but it’s not worth the cost of a shallow understanding of the opportunities at hand. This is why we insisted on a “daily debrief” to align the team on what we were hearing from customers. These daily debrief sessions served as an opportunity to bring many different stakeholders together, who were all working in parallel to do this “fact finding”.
AI Tools for discovery synthesis: a comparison
NotebookLM
This was my first time using NotebookLM, and I’ll give it a 7 out of 10. I’d say 8 if I hadn’t had to learn the hard way that you actually have to “save to note” in order to keep the output it generates. Even so, I found it to be the most trustworthy tool for synthesizing discovery output.
Some pros
NotebookLM only references the sources that you provide, and you can categorize them in individual “notebooks” which gives the tool the right contextual information. Similar to how we might categorize our thoughts as we write within context, this was a very helpful way to organize the data and I could be more specific with the prompts.
For example, I had a “notebook” that used a customer segmentation matrix and assumptions tracker (comprehensive list of assumptions). I then prompted it to generate a summary of the highest-risk assumptions in relation to target customers and users.
Another thing I really liked about NotebookLM was its referencing function. Any output it generated from the sources I gave it pointed to the specific area in the raw notes or even customer quotes that helped formulate the output.
Here’s an example of the prompt I used in NotebookLM, its output, and how that differed from my ultimate recommendation.
Prompt:
Who is the target customers based on dynamic scoring and the highest risk assumptions?
Output:
The sources present various customer segments with a dynamic “Score” that appears to aggregate their needs or suitability across different factors like visibility of referral outcomes, patient attrition, proactive follow-up, EMR complexity, provider data reliability, real-time availability, mixed tool usage, insurance acceptance, proximity to specialists, and referral volumes. Based on these scores, the top target customer segments are:
- [Hospital System], with the highest score of 16.
While this was accurate strictly based on a combination of market segmentation scoring and riskiest assumptions, it didn’t account for the nuanced and varied pain points we had gathered from qualitative research. This output prioritized business opportunity over market scalability. While it’s not the “wrong” answer, it didn’t weigh the rest of our research heavily enough. My ultimate recommendation was:
[Client’s] strongest customer segments are smaller, hybrid (specialty and PCP) clinics that have a high likelihood of referring to a larger hospital facility but also rely on various methods of referral routing and communication to complete their workflow, leaving room for patient leakage.
Some cons
Let this be a lesson to all who try NotebookLM: the output it generates does not persist if you do not click the tiny button that says “save to note”. Once I learned this, I actually really appreciated the feature (not bug), because it helped to keep the output much more organized.
The output was robust. This isn’t necessarily a bad thing, but it’s definitely not a brief synthesis. Since I was reviewing and editing output before putting it into any kind of official documentation, it was fine. But I’ll say that ChatGPT 5 has some pretty intuitive follow-up prompts to make the output more concise.
The last thing that was a challenge, as with any LLM tool like this, was capturing the right tone of voice. Not just from a brand perspective, but in a way that reflects the human nature of customer discovery. These tools should never replace the human aspects of product design and discovery, but in order to leverage them for speed, we want to reduce how much time we’re spending rewriting or rebuilding that strategic understanding.
ChatGPT
ChatGPT has become a tool I use casually and frequently when I need to reword a paragraph, research an error, or grab a list of resources. Our healthcare discovery sprint, however, required ramping up quickly on domain-specific terminology, and ChatGPT became a critical tool for digesting a massive amount of context in a short amount of time. I’ll give it a 7 out of 10 given the time it takes to navigate the verbose output and fact-check the work.
Some pros
When entering a new domain like healthcare, I relied on ChatGPT to explain concepts in context, unpack acronyms, or provide definitions for unfamiliar terminology. ChatGPT’s broad knowledge base was very helpful and acted as a bridge from knowing little about the territory to being able to provide meaningful recommendations. This accelerated the learning curve that sometimes comes with working in new industries and allowed me to follow conversations with subject matter experts and stakeholders more effectively.
Creating a “Project” within this tool gave structure to the large amount of raw notes and transcripts I was working from and kept things organized and referenceable. Since speed is also an important factor in a Sprint setting, being able to prompt over outputs within the context of a project was helpful in getting to insights quickly.
ChatGPT can ingest many types of documents and files and align insights from several formats. From market research spreadsheets to transcripts of user interviews, I was able to spot and reinforce themes.
One early goal of this project was to capture the user’s journey. When hearing signals of this in meetings, I quickly captured what I believed the journey to be in a list separated by an arrow written like “->”. I was able to later retrieve where I used this pattern from my raw notes and piece all of them together. I found this little pattern-matching nuance to be pretty helpful.
Prompt:
My use of “ ->” indicate people’s idea of what the referral process looks like from their vantage points. Based off of this what would be the path so far?
Output:
Referral process by person K. A. ‘s vantage point: Pre-registration → Full registration in person (eligibility recheck, financials, demographic confirmation) → Source system (EHR) → Order sets → Employed vs. Non-employed (affiliate) provider status → CPT code capture (may be missing/incorrect)
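The same arrow convention can also be mined without an LLM. Here’s a minimal sketch, with invented note text, of how raw notes could be scanned for lines containing “->” and split into ordered journey steps:

```python
# Hypothetical sketch: pull journey fragments out of raw sprint notes.
# Any line containing "->" is treated as a journey capture and split
# into ordered steps. The note text below is invented for illustration.
raw_notes = """
Intake team mentioned delays at eligibility recheck.
Pre-registration -> Full registration in person -> Source system (EHR)
Order sets -> CPT code capture
Budget discussion ran long; follow up with finance.
"""

def extract_journeys(notes: str) -> list[list[str]]:
    journeys = []
    for line in notes.splitlines():
        if "->" in line:
            steps = [s.strip() for s in line.split("->") if s.strip()]
            journeys.append(steps)
    return journeys

for steps in extract_journeys(raw_notes):
    print(" → ".join(steps))
```

The tool-based retrieval was faster in practice, but a convention this consistent is cheap to parse yourself if you ever need the journeys outside a chat session.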
Some cons
Like NotebookLM, ChatGPT’s outputs are verbose. The tool tended to over-explain, often requiring additional rounds of prompting to distill insights into the digestible, high-signal summaries we needed. In a sprint, where time is tight, this slowed things down.
ChatGPT added casual elements not compatible with the professional, clinical tone of this sprint. Namely, when asking for lists or key takeaways, I found the tool added emojis at the beginning of every large list item, which meant reprompting or manually removing them.
Fact-checking everything generated from ChatGPT was critical. The tool at times lacked nuance or arrived at absolutes that weren’t validated by the project’s reality. One workaround for this kind of overconfidence was to prompt “You said _____. Can you cite exactly where you heard this and from whom?” Credibility in our recommendations based on our findings is key and it’s important not to blindly base that on information received from a tool like this.
Claude
While NotebookLM was great for organizing sources, Claude became our secret weapon for the heavy analytical lifting that discovery synthesis really demands. I’d give Claude a solid 9 out of 10 for synthesis work.
In our recent sprint with an education technology client, we were juggling multiple user interviews, workflow analyses, and stakeholder voting data across different teams and tools. This is exactly where Claude shines: not just summarizing information, but actually connecting the dots between complex, interconnected problems.
Some pros
Claude’s ability to hold multiple interview transcripts in working memory and identify patterns across them was game-changing. Instead of analyzing each conversation in isolation, I could ask it to find themes that emerged across all user types, even when people described the same underlying problem in completely different ways.
This is where Claude really stood out from other tools. It didn’t just tell me what people said. It helped identify what the implications were for product strategy. Claude connected individual pain points to systematic workflow breakdowns, and helped generate hypotheses about root causes that I could then validate against other data sources.
Unlike other AI tools, Claude could reference specific insights from one interview while analyzing patterns from others, without losing context. This made it incredibly powerful for testing hypotheses. Like when it suggested that users had already solved collaboration problems and the real issue was system integration, I could immediately validate that against our workflow data.
Some cons
Even with extensive context, Claude occasionally missed industry-specific details that were crucial to understanding why certain problems mattered so much to users. It required human oversight to weight stakeholder voices appropriately and understand the business implications of technical pain points.
Claude sometimes suggested solutions that fit existing tool categories rather than questioning whether the problems needed tool-based solutions at all. I had to explicitly prompt it to consider process changes alongside technology fixes.
Similar to NotebookLM, Claude struggled with capturing the emotional weight and human nuance of user frustrations. The strategic analysis was excellent, but maintaining empathy for the user experience required constant human input.
The Sweet Spot: AI + Human Partnership
Our best synthesis happened when we used Claude for what it does exceptionally well: processing massive amounts of qualitative data, identifying patterns, and generating testable hypotheses. This kept humans focused on strategic interpretation and stakeholder empathy.
The result? In a two-week sprint that would traditionally require weeks of manual analysis, we delivered comprehensive synthesis that identified core workflow failures, prioritized solutions based on both user feedback and strategic impact, and provided specific product requirements grounded in actual user needs.
Claude didn’t replace human insight in our discovery process. It amplified it, letting us focus our analytical time on strategic thinking instead of information processing.
AI limitations and considerations
Data privacy
A big consideration when bringing AI into discovery work is navigating sensitive data. In our healthcare sprint, we needed to be especially intentional about making sure we weren’t including private personal or organizational details. This meant removing personal identifiers from notes so as not to treat AI as a storage system for sensitive information.
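A simple pre-send scrub can catch the most obvious identifiers before notes ever reach a tool. This is a hypothetical sketch, not a complete de-identification solution; the patterns, names, and note text are invented for illustration:

```python
import re

# Hypothetical sketch: strip obvious identifiers (emails, phone numbers,
# a per-project list of known names) from notes before pasting them into
# any AI tool. Illustrative only -- not a complete de-identification pass.
KNOWN_NAMES = ["Jane Doe"]  # maintained per project

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def redact(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    for name in KNOWN_NAMES:
        text = text.replace(name, "[NAME]")
    return text

note = "Jane Doe (jane@clinic.org, 555-123-4567) flagged patient leakage."
print(redact(note))
# → [NAME] ([EMAIL], [PHONE]) flagged patient leakage.
```

For regulated data like PHI, a scrub like this is a floor, not a ceiling; the safer habit is still to keep sensitive details out of AI tools entirely.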
Recognizing and counteracting bias
AI models are trained on large datasets that can carry their own biases. Left unchecked, those biases can creep into synthesis work, whether by overemphasizing certain themes, suggesting overly generic solutions, or missing underrepresented voices entirely. We made a point of balancing AI output with our own structured methods (like assumption tracking) to ensure that insights were grounded in what real users told us, not just what an AI “thought” sounded plausible.
Maintaining human empathy and nuance
As we’ve already emphasized a few times, the most important limitation is that AI can’t truly capture human nuance. It can identify themes, but it won’t pick up on the subtle frustration in a user’s tone, or the hesitation in their word choice. Context matters and that layer of empathy still requires a human lens. This is why we paired AI-generated summaries with daily debriefs: to bring in multiple perspectives, preserve the weight of what we heard, and avoid flattening discovery into a sterile list of bullet points.
Key Takeaways: AI-Augmented Discovery
After experimenting with multiple AI tools for discovery synthesis, here’s what we learned: the goal isn’t to automate away human insight, but to amplify it. The most powerful discoveries happened when we combined AI’s pattern recognition with human empathy and strategic thinking.
What AI tools excel at:
- Processing large amounts of qualitative data simultaneously
- Identifying patterns across multiple sources
- Cross-referencing insights for validation
- Generating hypotheses about root causes
What still requires human insight:
- Understanding emotional weight behind user frustrations
- Weighing business constraints against user needs
- Recognizing when problems need process vs. technology solutions
- Maintaining empathy throughout analysis
The teams that will create the most impactful products are those that learn to leverage AI for information processing while doubling down on uniquely human capabilities like strategic thinking and stakeholder understanding.