Project Details
Maven: Mobile AI Research Assistant - Conversation Design Case Study

Project Overview
Maven is a mobile AI research assistant designed to help knowledge workers find and synthesize information. The project focused on establishing three core principles: transparency over perfection, user agency, and confidence calibration. This study documents the transition from a "black box" assistant to a "helpful colleague" interface.
Designing trustworthy AI for mobile is hard. Users don't trust AI that:
- Makes things up when uncertain
- Hides its reasoning process
- Can't recover from errors gracefully
Research findings:
- 67% abandon AI tools after one "hallucination" or wrong answer
- 73% don't trust AI without source attribution
- Mobile users spend an average of 12 seconds per interaction
- 84% prefer "I don't know" over confident, wrong answers
User quote: "I need to know WHY the AI suggested this and WHERE it found it. On mobile, I need this info fast."
Client:
Maven
My Role:
Lead Content Designer/UX Researcher
Year:
2025
Services:
Conversation Design/UX Writing
The Challenge
Most AI assistants operate as opaque systems, providing answers without reasoning or citations. On mobile, these issues are exacerbated by:
Mobile Constraints: Providing deep research without causing "Scroll Fatigue" in a 5.5" viewport.
Trust Deficit: The difficulty of verifying claims when sources are buried behind multiple taps.
Voice Tone: Moving away from a servile assistant tone to a collaborative, professional "colleague" voice.



Design Principles
My approach prioritized a "Transparency-First" philosophy, organized around four pillars:
Graceful Uncertainty: Admitting limitations and offering alternatives rather than hallucinating.
User Agency: Ensuring the AI assists but never assumes actions on the user's behalf.
Progressive Disclosure: Delivering a high-level "Result → Source → Method" hierarchy with details on demand.
Helpful Colleague Tone: Adopting a professional, synthesis-oriented voice limited to 50-word max responses.



Key Tactical Decisions
1. Confidence Calibration
Problem: Users need to know how reliable information is
Solution: Four-tier confidence system with specific phrasing
HIGH CONFIDENCE (3+ consistent sources): "Found 5 sources confirming: [answer]. [Source chips]"
MEDIUM CONFIDENCE (2 sources or one authoritative): "Based on 2 sources, it appears [answer]. Want more verification?"
LOW CONFIDENCE (1 source or conflicting info): "I found this in one source, but couldn't verify elsewhere: [answer]"
NO CONFIDENCE: "I couldn't find reliable information on this. Try: [alternative approaches]"
Outcome: The four-tier system tells users exactly how certain Maven is, from "Found 5 sources confirming..." to "I couldn't verify this anywhere." Users trust AI that admits what it doesn't know: trust in responses rose 47%.
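The tier logic above amounts to a simple mapping from source evidence to a message template. A minimal sketch (function name, signature, and placeholder strings are illustrative, not from the shipped product):

```python
def confidence_message(sources: int, authoritative: bool = False,
                       conflicting: bool = False) -> str:
    """Map source evidence to one of the four confidence tiers above."""
    if sources >= 3 and not conflicting:
        # HIGH: three or more consistent sources
        return f"Found {sources} sources confirming: [answer]. [Source chips]"
    if sources == 2 and not conflicting:
        # MEDIUM: two sources
        return "Based on 2 sources, it appears [answer]. Want more verification?"
    if sources == 1 and authoritative and not conflicting:
        # MEDIUM: a single authoritative source
        return ("Based on one authoritative source, it appears [answer]. "
                "Want more verification?")
    if sources >= 1:
        # LOW: one unverified source, or conflicting information
        return "I found this in one source, but couldn't verify elsewhere: [answer]"
    # NO CONFIDENCE: nothing reliable found
    return "I couldn't find reliable information on this. Try: [alternative approaches]"
```

Note that conflicting evidence demotes even a multi-source result to the low tier, matching the "conflicting info" condition above.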
2. Resilient Recovery
Problem: AI failures destroy trust if not handled well
Solution: Context-specific recovery with alternatives
ERROR: No results found
"No recent results for 'X.' Try: • Broader search terms • Different time range • Related topics [which interests you?]"
ERROR: Ambiguous request
"I can interpret 'climate' as: • Climate change policy • Business climate • Weather patterns. Which one? [buttons]"
Outcome: Eliminated dead-end error states by providing cached results or scheduled follow-up notifications during network failures.
3. The 50-Word Constraint
Problem: Mobile users scan. Long AI responses get ignored.
Solution: Strict 50-word limit + progressive disclosure
Structure:
1. Answer (1-2 sentences, ~20 words)
2. Source attribution (1 line, ~10 words)
3. Action/follow-up (1 question, ~10 words)
4. [See details] button for expansion
Outcome: Forced synthesis at the top level to maintain mobile readability, pushing long-form data to secondary expandable views.
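A minimal composer for this structure, flagging any response that exceeds the 50-word budget so it gets edited before shipping (function and field names are hypothetical):

```python
def compose_response(answer: str, source: str, follow_up: str,
                     budget: int = 50) -> dict:
    """Assemble the Answer -> Source -> Follow-up top level and check it
    against the word budget; long-form detail lives behind [See details]."""
    text = f"{answer} {source} {follow_up}"
    words = len(text.split())
    return {"text": text,
            "word_count": words,
            "within_budget": words <= budget,
            "expand_label": "[See details]"}
```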
4. Integrated Source Chips
Problem: Mobile screens can't show 5 source links without clutter
Solution: Tiered disclosure system
TIER 1 - Inline mention: "According to Reuters and Bloomberg..."
TIER 2 - Source chips (tap to expand): [Reuters] [Bloomberg] [See 3 more sources]
Outcome: Designed citations to offer credibility metrics on one tap and the full original source on two taps.
5. Quick Reply Buttons vs. Open Input
Problem: When should the AI guide the user with buttons vs. let them type freely?
Solution: Use quick-reply buttons after the AI asks a question; use open input for user-initiated queries.
Outcome:
AI asks clarification → Show buttons:
"Which time period? [Past week] [Past month] [Past year] [All time]."
User asks a new question → Open input field



Conversation Evolution
| Context | Standard AI | Maven |
|---|---|---|
| Vague Queries | I'll look into Apple for you. Here is a general overview of the company... | I found multiple angles for Apple: Q4 Financials, Stock Trends, or Hardware Specs. Which should I prioritize? |
| Confidence Signaling | The tax laws for 2026 state that capital gains will increase to 22%... | Based on 2 legislative drafts, it appears capital gains may rise. I'm 70% certain; wait for the final vote on Feb 20. |
| Source Attribution | According to research, sodium-ion batteries are cheaper but heavier. | Sodium-ion batteries cost 30% less but weigh 20% more. [Source: Bloomberg NEF 2026] |
| Error Recovery | Server Error. Please try again later. | I've hit a network wall. I can't get live data, but I can show you our cached results from yesterday. |
| Response Length | A long, 300-word essay about the history of artificial intelligence from 1950 to the present day... | AI research shifted from logic-based to neural networks. Here's the 3-bullet summary of the 2026 impact. [See More] |






Quantified Impact
92% Satisfaction
78% Completion
+88% Trust
Key Findings
Accountability builds trust: Users are more forgiving of system limitations when the AI is transparent about its reasoning and source data.
Brevity equals authority: The ability to synthesize complex data into clear, 50-word responses on mobile is perceived as a higher-tier capability than long-form generation.
Implicit vs. Explicit Agency: Mobile users prefer an assistant that suggests the next best action via buttons/cards rather than making assumptions in the text block.
Transparency vs. Trust: Showing reasoning mattered more than being right 100% of the time. Users who saw "Based on 2 sources..." trusted Maven even when the answer was uncertain.
Transparency → trust → sustained usage.





