Project Details

Maven: Mobile AI Research Assistant - Conversation Design Case Study


Project Overview

Maven is a mobile AI research assistant designed to help knowledge workers find and synthesize information. The project focused on establishing three core principles: transparency over perfection, user agency, and confidence calibration. This study documents the transition from a "black box" assistant to a "helpful colleague" interface.

Designing trustworthy AI for mobile is hard. Users don't trust AI that:

- Makes things up when uncertain

- Hides its reasoning process

- Can't recover from errors gracefully

Research findings:

• 67% abandon AI tools after one "hallucination" or wrong answer

• 73% don't trust AI without source attribution

• Mobile users spend an average of 12 seconds per interaction

• 84% prefer "I don't know" over confident, wrong answers


User quote: "I need to know WHY the AI suggested this and WHERE it found it. On mobile, I need this info fast."

Client: Maven

My Role: Lead Content Designer / UX Researcher

Year: 2025

Services: Conversation Design / UX Writing

THE CHALLENGE

Most AI assistants operate as opaque systems, providing answers without reasoning or citations. On mobile, these issues are exacerbated by:

  • Mobile Constraints: Providing deep research without causing "Scroll Fatigue" in a 5.5" viewport.

  • Trust Deficit: The difficulty of verifying claims when sources are buried behind multiple taps.

  • Voice Tone: Moving away from a servile assistant tone to a collaborative, professional "colleague" voice.


Design Principles

My approach prioritized a "Transparency-First" philosophy, organized around four pillars:

  1. Graceful Uncertainty: Admitting limitations and offering alternatives rather than hallucinating.

  2. User Agency: Ensuring the AI assists but never assumes actions on the user's behalf.

  3. Progressive Disclosure: Delivering a high-level "Result → Source → Method" hierarchy with details on demand.

  4. Helpful Colleague Tone: Adopting a professional, synthesis-oriented voice, capped at 50 words per response.


Key Tactical Decisions

1. Confidence Calibration

Problem: Users need to know how reliable information is

Solution: Four-tier confidence system with specific phrasing

  • HIGH CONFIDENCE (3+ consistent sources): "Found 5 sources confirming: [answer]. [Source chips]"

  • MEDIUM CONFIDENCE (2 sources or one authoritative): "Based on 2 sources, it appears [answer]. Want more verification?"

  • LOW CONFIDENCE (1 source or conflicting info): "I found this in one source, but couldn't verify elsewhere: [answer]"

  • NO CONFIDENCE: "I couldn't find reliable information on this. Try: [alternative approaches]"

Outcome: I created a four-tier system that tells users exactly how certain Maven is: "Found 5 sources confirming..." vs. "I couldn't verify this anywhere." Users trust AI that admits what it doesn't know; trust in responses rose 47%.
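The tier logic above can be sketched as a small mapping from retrieved evidence to a confidence tier and its template. This is an illustrative sketch only; the `Evidence` class, the field names, and the thresholds are my assumptions based on the tier definitions in this case study, not Maven's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    source_count: int    # number of independent sources found
    consistent: bool     # do the sources agree with each other?
    authoritative: bool  # is at least one source authoritative?

def confidence_tier(ev: Evidence) -> str:
    """Map retrieved evidence to one of the four confidence tiers."""
    if ev.source_count == 0:
        return "NONE"
    if not ev.consistent:
        return "LOW"         # conflicting info
    if ev.source_count >= 3:
        return "HIGH"
    if ev.source_count == 2 or ev.authoritative:
        return "MEDIUM"
    return "LOW"             # single, unverified source

# Phrasing templates keyed by tier, taken from the tier definitions above.
TIER_TEMPLATES = {
    "HIGH":   "Found {n} sources confirming: {answer}.",
    "MEDIUM": "Based on {n} sources, it appears {answer}. Want more verification?",
    "LOW":    "I found this in one source, but couldn't verify elsewhere: {answer}",
    "NONE":   "I couldn't find reliable information on this. Try: {alternatives}",
}
```

Keeping the tier decision separate from the phrasing makes it easy for writers to iterate on copy without touching the retrieval logic.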

2. Resilient Recovery

Problem: AI failures destroy trust if not handled well

Solution: Context-specific recovery with alternatives

  • ERROR: No results found

    "No recent results for 'X.' Try: • Broader search terms • Different time range • Related topics [which interests you?]"

  • ERROR: Ambiguous request

    "I can interpret 'climate' as: • Climate change policy • Business climate • Weather patterns. Which one? [buttons]"

Outcome: Eliminated dead-end error states by providing cached results or scheduled follow-up notifications during network failures.
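The recovery pattern amounts to routing each failure type to context-specific copy instead of a generic error. A minimal sketch, assuming a hypothetical `recover` helper and error-type keys of my own invention; Maven's real error taxonomy is not documented here.

```python
# Recovery copy keyed by failure type, adapted from the examples above.
RECOVERY = {
    "no_results": (
        "No recent results for '{query}.' Try:\n"
        "• Broader search terms\n"
        "• Different time range\n"
        "• Related topics [which interests you?]"
    ),
    "ambiguous": (
        "I can interpret '{query}' as:\n{options}\nWhich one? [buttons]"
    ),
    "network": (
        "I've hit a network wall. I can't get live data, but I can show "
        "you cached results from yesterday."
    ),
}

def recover(error: str, **context) -> str:
    """Return context-specific recovery copy instead of a dead-end error."""
    template = RECOVERY.get(error)
    if template is None:
        # Unknown failure: still offer a next step rather than a dead end.
        return "Something went wrong. Want me to retry or notify you later?"
    return template.format(**context)
```

Every branch, including the unknown-failure fallback, ends with an offer of a next action, which is what keeps error states from becoming dead ends.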

3. The 50-Word Constraint

Problem: Mobile users scan. Long AI responses get ignored.

Solution: Strict 50-word limit + progressive disclosure

Structure:

1. Answer (1-2 sentences, ~20 words)

2. Source attribution (1 line, ~10 words)  

3. Action/follow-up (1 question, ~10 words)

4. [See details] button for expansion

Outcome: Forced synthesis at the top level to maintain mobile readability, pushing long-form data to secondary expandable views.
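The Answer → Source → Action structure is easy to enforce mechanically. Below is a minimal sketch of a hypothetical `assemble_response` helper that composes the three parts plus the expansion affordance and rejects drafts over the word budget; the function name and error-handling choice are my assumptions, not the shipped system.

```python
def assemble_response(answer: str, source_line: str, follow_up: str,
                      limit: int = 50) -> str:
    """Compose a mobile response as Answer -> Source -> Action -> [See details],
    rejecting any draft that exceeds the word budget."""
    parts = [answer, source_line, follow_up, "[See details]"]
    word_count = sum(len(p.split()) for p in parts)
    if word_count > limit:
        raise ValueError(f"Response is {word_count} words; limit is {limit}.")
    return "\n".join(parts)
```

Treating the budget as a hard failure (rather than silently truncating) forces the synthesis to happen upstream, which is the point of the constraint.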

4. Integrated Source Chips

Problem: Mobile screens can't show 5 source links without clutter

Solution: Tiered disclosure system

  • TIER 1 - Inline mention: "According to Reuters and Bloomberg..."

  • TIER 2 - Source chips (tap to expand): [Reuters] [Bloomberg] [See 3 more sources]

Outcome: Designed citations to offer credibility metrics on one tap and the full original source on two taps.
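The two disclosure tiers can be rendered from a single source list. A sketch under stated assumptions: the `render_sources` function, its return shape, and the inline cap of two names are all hypothetical choices made to match the Reuters/Bloomberg example above.

```python
def render_sources(sources: list[str], inline_max: int = 2) -> dict:
    """Render both disclosure tiers from one list of source names.

    Tier 1: inline mention of the leading sources.
    Tier 2: tappable chips, with an overflow chip for the rest.
    """
    inline = " and ".join(sources[:inline_max])
    chips = [f"[{s}]" for s in sources[:inline_max]]
    remaining = len(sources) - inline_max
    if remaining > 0:
        chips.append(f"[See {remaining} more sources]")
    return {"inline": f"According to {inline}...", "chips": chips}
```

The overflow chip keeps the chip row to a fixed width regardless of how many sources the answer actually drew on.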

  5. Quick Reply Buttons vs. Open Input

    Problem: When to guide the user vs. let them type freely?

    Solution: Use buttons after the AI asks a question; open input for user-initiated queries.

    Outcome:

    AI asks a clarification → Show buttons:
    "Which time period? [Past week] [Past month] [Past year] [All time]"

    User asks a new question → Open input field
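The buttons-vs-open-input rule reduces to a single check on the previous turn. A minimal sketch; the `input_mode` function and its parameters are hypothetical names for illustration.

```python
def input_mode(last_turn_author: str, last_turn_is_question: bool) -> str:
    """Decide whether to show quick-reply buttons or an open input field.

    Buttons only follow an AI clarification question; everything else
    gets the open field, so the user keeps control of the conversation.
    """
    if last_turn_author == "ai" and last_turn_is_question:
        return "buttons"
    return "open_input"
```

Defaulting to the open field preserves user agency: the AI narrows choices only when it has just asked for a narrowing.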



Conversation Evolution

Vague Queries
Standard AI: "I'll look into Apple for you. Here is a general overview of the company..."
Maven: "I found multiple angles for Apple: Q4 Financials, Stock Trends, or Hardware Specs. Which should I prioritize?"

Confidence Signaling
Standard AI: "The tax laws for 2026 state that capital gains will increase to 22%..."
Maven: "Based on 2 legislative drafts, it appears capital gains may rise. I'm 70% certain; wait for the final vote on Feb 20."

Source Attribution
Standard AI: "According to research, sodium-ion batteries are cheaper but heavier."
Maven: "Sodium-ion batteries cost 30% less but weigh 20% more. [Source: Bloomberg NEF 2026]"

Error Recovery
Standard AI: "Server Error. Please try again later."
Maven: "I've hit a network wall. I can't get live data, but I can show you our cached results from yesterday."

Response Length
Standard AI: A long, 300-word essay about the history of artificial intelligence from 1950 to the present day...
Maven: "AI research shifted from logic-based to neural networks. Here's the 3-bullet summary of the 2026 impact. [See More]"


Quantified Impact

  • 92% Satisfaction

  • 78% Completion

  • +88% Trust

Key Findings

  • Accountability builds trust: Users are more forgiving of system limitations when the AI is transparent about its reasoning and source data.

  • Brevity equals authority: The ability to synthesize complex data into clear, 50-word responses on mobile is perceived as a higher-tier capability than long-form generation.

  • Implicit vs. Explicit Agency: Mobile users prefer an assistant that suggests the next best action via buttons/cards rather than making assumptions in the text block.

  • Transparency vs. Trust: Showing reasoning mattered more than being right 100% of the time. Users who saw "Based on 2 sources..." trusted Maven even when the answer was uncertain. Transparency → trust → sustained usage.

