Can you trust AI to do qualitative research?

Patricia Lucas | 04 Mar 2025

There is plenty of hype about the potential of Generative AI, but just as much doubt and fear.

In the world of qualitative research, the emergence of Large Language Models, and ChatGPT in particular, has brought in the potential for machine-led automation to a system of inquiry that seeks to be context-based and naturalistic. There is lively debate in the academic literature and in online spaces about whether AI approaches are appropriate, ethical, or whether they are even any good.

As a researcher, I think the useful question isn't 'can we trust AI' but 'how can we make it trustworthy?'. It's a technological shift that is here to stay. What I want to know is what does a trustworthy AI tool look like, and how can I use AI in trustworthy and reliable ways?

What does trustworthy look like?

Trustworthiness is a keystone of quality in qualitative research. Qualitative research does not seek to establish generalisability, measure validity, or expect results to be replicable over time or place. These are the markers of quality in quantitative research. Instead, qualitative researchers want rich data, for which the markers of quality are credibility, transferability, dependability and confirmability (per Lincoln & Guba).

Typical actions to increase credibility and trustworthiness in qualitative research include: engaging with the context, triangulation (comparing different points of view through different sources of data or different analytic approaches), dependability checking (quality assurance processes such as peer review and audit trails), transferability (describing context, and seeking diversity when collecting data so that similarity to other contexts can be judged), and member checking (confirming whether participants feel their views and experiences have been accurately represented).

What have we done to build trustworthiness?

At Colectiv, we have built tools to both collect and analyse qualitative data.

Our AI interviewer is designed to ask questions following a topic guide, much like a traditional semi-structured interview. Each participant experiences a unique conversation because new questions are generated each time, responding to answers they provide and probing for further details. Credibility is increased because:

  • Every interview is adapted to project context, to ensure that the tone and questions are appropriate for the specific participant, and that safeguards can be built in when necessary
  • We always pilot our interviews as part of interview development, so we can refine and improve them with feedback from participants
  • We are honest with participants: we let people know they are speaking to an AI agent, and make it easy for them to opt out or opt in. We use plain language
  • We make it easy for people to engage with interviews on their phones and in the language of their choice

We also use AI to support analysis. Generative AI is particularly good at summarising large amounts of text, but it is also lazy: it has a bias towards information that appears first, it tends to stop when a summary appears sufficient (rather than complete), and it may oversimplify or ignore differing or nuanced viewpoints. It may also hallucinate, generating false data. We take a number of steps to avoid these risks and to build trust in our outputs:

  • Analyses are always informed by project context to improve interpretation.
  • Our analytic framework is designed by our research team and project partners (i.e. humans!).
  • Coding itself is performed by AI, and we can use AI to analyse and reanalyse findings (coding and recoding), piloting and refining the framework.
  • We have built our processes to make sure that every single interview is analysed in depth. We are not relying on simple (and potentially lazy) AI summaries. This also produces rich insights, supported by quotes from individual participants.
  • We check rigorously for hallucinations and errors. This process is automated, but does not employ AI.
  • We make it easy to view insights, to compare individual participants based on their personal characteristics (linking insights to context), and to search for key words or phrases.
  • We welcome opportunities for comparison and triangulation of our findings.
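The hallucination check described above is automated but does not use AI. One simple way such a check can work is verbatim matching: any quote attributed to a participant must actually appear in that participant's transcript, so anything unmatched is flagged for human review. The sketch below is purely illustrative and is not Colectiv's actual implementation; the function names and the normalisation rules are assumptions.

```python
import re

def normalise(text: str) -> str:
    """Lower-case text and strip punctuation/extra whitespace so trivial
    formatting differences don't cause false alarms."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def verify_quotes(quotes: list[str], transcript: str) -> dict[str, bool]:
    """For each quote, report whether it appears verbatim (after
    normalisation) in the source transcript."""
    haystack = normalise(transcript)
    return {q: normalise(q) in haystack for q in quotes}

transcript = (
    "Interviewer: How did you find it? "
    "Participant: It was nice talking with you."
)
checks = verify_quotes(
    ["it was nice talking with you", "I loved every minute"],
    transcript,
)
# Quotes not found in the transcript are flagged for human review.
flagged = [q for q, found in checks.items() if not found]
```

A deterministic check like this cannot judge interpretation, but it guarantees that every verbatim quote in a report traces back to real participant data, which is exactly the kind of error an AI summariser is most likely to introduce.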

We believe that our processes contribute to credibility and dependability (through project-specific tailoring, piloting, individual-level analysis and quality assurance), confirmability (piloting, access to individual-level insights and team checking), and transferability (findings linked to participant characteristics, and available in a searchable, sortable format).

We know from the transcripts that our AI interviewer asks relevant and in-depth questions, focussed on the topics we set. But, just as importantly for quality assurance, the feedback from interviews suggests that people understand the experience, and find it easy and enjoyable to take part:

"it was nice talking with you. You are a good qualitative researcher."

" This is my first interview with AI, and I really enjoyed it. For example, when I didn’t understand the question, its ability to quickly rephrase and clarify made me feel good. "

"it worked well and avoided errors such as asking for 1-5 rankings without saying which is low and high, unlike the human powered survey I answered just before this one."

Building context into the interviewer training improves the quality of the interviews. We can see this when we compare responses to our project interviews with responses to the short, context-free demo interviews we use on our website and elsewhere. People who test the context-free interviews are more likely to tell us that the conversation felt less natural:

"I think it might need to be a bit more conversational. It appeared as if the bot was trying to base the conversation on a pre-defined set of questions or inquiry areas, which makes sense if it is an interviewer"

We have maximised trustworthiness by keeping human hands on the tiller of the AI machine. At the moment, this means holding back on fully automating some steps. But we can still complete and release quality-assured coding of hundreds of interviews in more than 20 languages within 24 hours of interview completion.

Who do you trust?

Some of those who argue against the use of AI for qualitative research do so because they find examples of low quality outputs. But not all research is good research, and being entirely conducted by humans is no protection against unreliable or poor quality research. Different AI tools are going to perform differently, just as different people and teams do. They should be judged individually. We have put trustworthiness and quality assurance at the heart of our tools, but we continue to look for improvements and are open to suggestions and challenge.

Just as every tool should be judged on its merit, its application should be evaluated in context. AI tools are not appropriate for every research context or question. They can come into their own when scale or speed is imperative, where there are barriers to traditional methods (such as language or distance), or when used alongside traditional qualitative methods to reach different groups of participants.

When you want to reach more people, in multiple languages, in a digital native mode, and with fast access to insights, we think our human-curated method can be trusted.

If you have any questions, or want to test out our tools, drop us a line at hello@colectiv.tech
