Project Details
Maven: Mobile AI Research Assistant - Conversation Design Case Study

Project Overview
Maven is a mobile AI research assistant designed to help knowledge workers find and synthesize information. The project focused on establishing three core principles: transparency over perfection, user agency, and confidence calibration. This study documents the transition from a "black box" assistant to a "helpful colleague" interface.
Designing trustworthy AI for mobile is hard. Users don't trust AI that:
- Makes things up when uncertain
- Hides its reasoning process
- Can't recover from errors gracefully
Research findings:
- 67% abandon AI tools after one "hallucination" or wrong answer
- 73% don't trust AI without source attribution
- Mobile users spend an average of 12 seconds per interaction
- 84% prefer "I don't know" over confident, wrong answers
User quote: "I need to know WHY the AI suggested this and WHERE it found it. On mobile, I need this info fast."
Client:
Maven
My Role:
Lead Content Designer/UX Researcher
Year:
2025
Services:
Conversation Design/UX Writing
The Challenge
Most AI assistants operate as opaque systems, providing answers without reasoning or citations. On mobile, these issues are exacerbated by:
Mobile Constraints: Providing deep research without causing "Scroll Fatigue" in a 5.5" viewport.
Trust Deficit: The difficulty of verifying claims when sources are buried behind multiple taps.
Voice Tone: Moving away from a servile assistant tone to a collaborative, professional "colleague" voice.



Design Principles
My approach prioritized a "Transparency-First" philosophy, organized around four pillars:
Graceful Uncertainty: Admitting limitations and offering alternatives rather than hallucinating.
User Agency: Ensuring the AI assists but never assumes actions on the user's behalf.
Progressive Disclosure: Delivering a high-level "Result → Source → Method" hierarchy with details on demand.
Helpful Colleague Tone: Adopting a professional, synthesis-oriented voice limited to 50-word max responses.



Key Tactical Decisions
1. Confidence Calibration
Problem: Users need to know how reliable information is
Solution: Four-tier confidence system with specific phrasing
HIGH CONFIDENCE (3+ consistent sources): "Found 5 sources confirming: [answer]. [Source chips]"
MEDIUM CONFIDENCE (2 sources or one authoritative): "Based on 2 sources, it appears [answer]. Want more verification?"
LOW CONFIDENCE (1 source or conflicting info): "I found this in one source, but couldn't verify elsewhere: [answer]"
NO CONFIDENCE: "I couldn't find reliable information on this. Try: [alternative approaches]"
Outcome: The four-tier system tells users exactly how certain Maven is, from "Found 5 sources confirming..." to "I couldn't verify this anywhere." Users trust AI that admits what it doesn't know: trust in responses rose 47%.
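The tier logic above amounts to a simple mapping from source evidence to a message template. A minimal sketch (function name, signature, and placeholder strings are illustrative, not from the shipped product):

```python
def confidence_message(sources: int, authoritative: bool = False,
                       conflicting: bool = False) -> str:
    """Map source evidence to one of the four confidence tiers above."""
    if sources >= 3 and not conflicting:
        # HIGH: three or more consistent sources
        return f"Found {sources} sources confirming: [answer]. [Source chips]"
    if sources == 2 and not conflicting:
        # MEDIUM: two sources
        return "Based on 2 sources, it appears [answer]. Want more verification?"
    if sources == 1 and authoritative and not conflicting:
        # MEDIUM: a single authoritative source
        return ("Based on one authoritative source, it appears [answer]. "
                "Want more verification?")
    if sources >= 1:
        # LOW: one unverified source, or conflicting information
        return "I found this in one source, but couldn't verify elsewhere: [answer]"
    # NO CONFIDENCE: nothing reliable found
    return "I couldn't find reliable information on this. Try: [alternative approaches]"
```

Note that conflicting evidence demotes even a multi-source result to the low tier, matching the "conflicting info" condition above.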
2. Resilient Recovery
Problem: AI failures destroy trust if not handled well
Solution: Context-specific recovery with alternatives
ERROR: No results found
"No recent results for 'X.' Try: • Broader search terms • Different time range • Related topics [which interests you?]"
ERROR: Ambiguous request
"I can interpret 'climate' as: • Climate change policy • Business climate • Weather patterns. Which one? [buttons]"
Outcome: Eliminated dead-end error states by providing cached results or scheduled follow-up notifications during network failures.
3. The 50-Word Constraint
Problem: Mobile users scan. Long AI responses get ignored.
Solution: Strict 50-word limit + progressive disclosure
Structure:
1. Answer (1-2 sentences, ~20 words)
2. Source attribution (1 line, ~10 words)
3. Action/follow-up (1 question, ~10 words)
4. [See details] button for expansion
Outcome: Forced synthesis at the top level to maintain mobile readability, pushing long-form data to secondary expandable views.
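A minimal composer for this structure, flagging any response that exceeds the 50-word budget so it gets edited before shipping (function and field names are hypothetical):

```python
def compose_response(answer: str, source: str, follow_up: str,
                     budget: int = 50) -> dict:
    """Assemble the Answer -> Source -> Follow-up top level and check it
    against the word budget; long-form detail lives behind [See details]."""
    text = f"{answer} {source} {follow_up}"
    words = len(text.split())
    return {"text": text,
            "word_count": words,
            "within_budget": words <= budget,
            "expand_label": "[See details]"}
```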
4. Integrated Source Chips
Problem: Mobile screens can't show 5 source links without clutter
Solution: Tiered disclosure system
TIER 1 - Inline mention: "According to Reuters and Bloomberg..."
TIER 2 - Source chips (tap to expand): [Reuters] [Bloomberg] [See 3 more sources]
Outcome: Designed citations to offer credibility metrics on one tap and the full original source on two taps.
5. Quick Reply Buttons vs. Open Input
Problem: When should the AI guide the user with buttons vs. let them type freely?
Solution: Use quick-reply buttons after the AI asks a question; use open input for user-initiated queries.
Outcome:
AI asks clarification → Show buttons:
"Which time period? [Past week] [Past month] [Past year] [All time]."
User asks a new question → Open input field



Conversation Evolution
| Context | Standard AI | Maven |
|---|---|---|
| Vague Queries | I'll look into Apple for you. Here is a general overview of the company... | I found multiple angles for Apple: Q4 Financials, Stock Trends, or Hardware Specs. Which should I prioritize? |
| Confidence Signaling | The tax laws for 2026 state that capital gains will increase to 22%... | Based on 2 legislative drafts, it appears capital gains may rise. I'm 70% certain; wait for the final vote on Feb 20. |
| Source Attribution | According to research, sodium-ion batteries are cheaper but heavier. | Sodium-ion batteries cost 30% less but weigh 20% more. [Source: Bloomberg NEF 2026] |
| Error Recovery | Server Error. Please try again later. | I've hit a network wall. I can't get live data, but I can show you our cached results from yesterday. |
| Response Length | A long, 300-word essay about the history of artificial intelligence from 1950 to the present day... | AI research shifted from logic-based to neural networks. Here's the 3-bullet summary of the 2026 impact. [See More] |






Quantified Impact
92% Satisfaction
78% Completion
+88% Trust
Key Findings
Accountability builds trust: Users are more forgiving of system limitations when the AI is transparent about its reasoning and source data.
Brevity equals authority: The ability to synthesize complex data into clear, 50-word responses on mobile is perceived as a higher-tier capability than long-form generation.
Implicit vs. Explicit Agency: Mobile users prefer an assistant that suggests the next best action via buttons/cards rather than making assumptions in the text block.
Transparency vs. Trust: Showing reasoning mattered more than being right 100% of the time. Users who saw "Based on 2 sources..." trusted Maven even when the answer was uncertain.
Transparency → trust → sustained usage.





