case study 02 — UX + Full-Stack
Agentic Search: AI-Powered Summaries for the Webex Help Center
With 3,000+ articles across dozens of products and user roles, Webex help center search was giving users answers they didn't trust enough to click — so I designed and prototyped an AI layer that could earn that trust.
The Situation
help.webex.com is the knowledge base for Cisco's entire Webex product suite — Meetings, Calling, Contact Center, AI Agent, Devices, and more. It serves IT administrators, partners, and end users across a global audience. At 3,000+ articles and growing, the scale of the content library had become both an asset and a problem.
The existing search experience was traditional lexical search: type a query, get a ranked list of article links. The click-through rate from search results to article pages hovered around 30%. In other words, roughly 70% of users who searched either couldn't identify the right result, didn't trust the results enough to commit to clicking through, or were overwhelmed by the volume of results. For IT administrators trying to resolve configuration issues under time pressure, that friction has real cost — in user frustration and in support ticket escalation.
The broader context made this worse: Webex's product suite was expanding rapidly, with new AI-powered products (AI Agent, AI Assistant) adding complexity that documentation had to keep pace with. The people searching the help center weren't just looking for how-tos — they were trying to understand new product categories that didn't exist a year ago.
The strategic framing I developed for stakeholders: IT administrators face increasing complexity — new products, new AI capabilities, new configuration surfaces. Generative AI can augment their processes by providing faster, more contextual information where and when users need it most. But the opportunity only works if AI outputs are grounded in trustworthy, well-maintained documentation. That dependency — AI quality being downstream of content quality — shaped how I positioned this project not just as a search feature, but as a reason the content team's work matters more than ever.
My Role & Scope
I led this initiative end-to-end across multiple workstreams over roughly three quarters (FY25 Q2–Q4): competitive research, interaction design, system prompt definition, full-stack prototype development, usability testing, and stakeholder presentations. I collaborated with an engineering partner on API integration research, and worked with the broader content ops team to understand search behavior through analytics from the user telemetry service.
The Approach
Competitive research and the intent framework
The work started with competitive research into how other products handle AI-assisted search — how they surface summaries, cite sources, handle ambiguity, and manage the transition between AI-generated content and traditional results. I studied patterns across consumer and enterprise products to understand what earned trust versus what felt intrusive or unreliable.
From this research, I developed a user intent framework that became the conceptual foundation for the entire project. Not every search query has the same cognitive intent, and the assistant's behavior should reflect that:
| Intent | What the user needs | Optimal behavior |
|---|---|---|
| Searching | Find specific information | Surface relevant results with AI summary |
| Navigating | Find a location or path | Guide to the right page efficiently |
| Deep Diving | Understand a complex concept | Provide in-context analysis and explanation |
| Escalating | Resolve a blocking issue | Detect frustration, offer human support |
This framework informed every subsequent design decision — from how the AI panel behaves differently in search contexts versus article pages, to how the system routes queries to different LLM services based on complexity and cost.
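The routing idea can be sketched in code. This is a hypothetical illustration: the intent labels come from the framework above, but the keyword heuristics, service names, and frustration signal are my own illustrative assumptions, not the production classifier.

```python
from dataclasses import dataclass

@dataclass
class Route:
    intent: str        # one of the four intents from the framework
    service: str       # hypothetical service name for this sketch
    show_ai_panel: bool

def route_query(query: str, frustration_signals: int = 0) -> Route:
    """Toy classifier mapping a query to an intent and a handling service."""
    q = query.lower()
    # Escalating: frustration signals take priority over everything else.
    if frustration_signals >= 2:
        return Route("escalating", "human_support", False)
    # Navigating: wayfinding phrasings get routed to plain search.
    if q.startswith(("where is", "how do i get to", "open ")):
        return Route("navigating", "lexical_search", False)
    # Deep Diving: conceptual questions justify a costlier, fuller LLM pass.
    if any(w in q for w in ("why", "explain", "difference between")):
        return Route("deep_diving", "llm_full", True)
    # Searching: the default case gets results plus an AI summary.
    return Route("searching", "llm_summary", True)

print(route_query("explain why SSO fails for federated users").intent)  # → deep_diving
```

In the real system the classification would come from the model or from telemetry signals rather than keyword matching, but the routing shape is the same: intent decides both which service handles the query and whether the AI panel appears.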
Live prototype — search results page with AI summary
AI-generated summary alongside traditional search results, with source citations and follow-up suggestions
Designing for trust through transparency
The core design challenge wasn't "how do we add AI to search" — it was "how do we add AI to search without undermining trust." Help center users are often troubleshooting production issues. They need to know why they should trust a response, not just receive one.
Four design principles guided the interaction patterns:
- 01. Transparent status updates and source disclosures — The AI assistant shows its working state and always cites where information comes from. No black-box answers.
- 02. Augment, don't obscure — The AI panel is dismissible and sits alongside traditional search results, never replacing them. Users remain in control of their information-seeking strategy.
- 03. Follow-up queries drive deeper exploration — AI-generated follow-up search suggestions help users discover related contexts they wouldn't have thought to search for.
- 04. Clear pathways to source material — Annotated references in AI responses link directly to the specific articles, so users can verify and go deeper.
Principle 01 — status updates
Streaming status communicates that the system is actively working
Principle 03 — follow-up queries
Suggested follow-up queries surface related contexts the user may not have thought to explore
Principle 04 — annotated sources
Every AI response links back to the specific help center articles it drew from
Article page — summarize action
The AI assistant also operates on individual article pages, not just search results
Usability testing: five interaction patterns
I designed and tested five distinct interaction patterns for how the AI assistant integrates with the search experience. Each pattern represented a different philosophy about user control, discoverability, and the relationship between AI-generated content and traditional results:
- Pattern A — Single "browse for me" button: A single button in the search input triggers the AI panel. Every query gets a summary; there's no way to opt out. (Reference: Arc browser, Vercel docs)
- Pattern B — Dual "browse for me" dismissible button: A dual button lets users toggle the AI panel on or off. LLM search activates or deactivates in tandem. (Reference: Webex Meetings dual pill buttons)
- Pattern C — "AI-Powered" dismissible input chip: A chip beneath the search input indicates AI capability. Dismissing the panel removes the chip — explicit opt-in/opt-out. (Reference: Figma, Rocket Software)
- Pattern D — "AI-Powered" search input toggle: A dual toggle in the search input switches between AI-powered and conventional modes.
- Pattern E — AI assistant drawer: No button — just persistent placeholder text indicating "AI-Powered search." The AI panel appears by default on search results; when dismissed, a tab drawer anchored to the right edge lets users reopen it. (Reference: AWS support)
Pattern E — dismissible drawer
The winning pattern: AI by default, always dismissible, drawer affordance for reopen
Pattern E — dismiss interaction
Short-form view of the dismiss and reopen flow
Pattern E won. The drawer-based approach succeeded because it delivered AI summaries by default — no extra clicks for first-time discovery — while keeping the dismiss-and-reopen affordance for users who wanted traditional results. It was the pattern that best balanced the "augment, don't obscure" principle: AI content was present and useful without requiring users to opt into a new interaction model.
The testing also validated a dual-intent navigation approach: for search-scoped queries, auto-navigation to results felt natural (ingrained user behavior), while for article-scoped queries, users preferred staying on the page with contextual answers. Over 60% of test participants could articulate why the system behaved differently in each context — evidence that intent-based pattern variance felt coherent rather than inconsistent.
System prompt architecture
The AI assistant needed a consistent voice across every surface it appeared on — search results, article pages, and eventually other Webex web properties. But the contexts are fundamentally different. A system prompt that works well for summarizing search results doesn't work for answering questions about a specific article.
I defined a system prompt template guide that established:
- A consistent persona and tone for the Cisco AI Assistant across all surfaces
- Context-specific prompt variations for search-scoped vs. article-scoped queries
- Guard rails for response formatting, source citation, and hallucination prevention
- Role-aware adjustments for administrators, partners, and end users
The goal was ensuring a singular entity — one assistant that users recognize and trust — even as the underlying behavior adapts to context. This template became a shared reference for anyone integrating the assistant into new surfaces.
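A minimal sketch of how a template guide like this might translate to code. The persona wording, variant strings, role notes, and function names below are illustrative assumptions for this sketch, not the actual Cisco template.

```python
# Shared persona: one assistant identity across every surface.
BASE_PERSONA = (
    "You are the Cisco AI Assistant for help.webex.com. "
    "Answer only from the provided documentation and cite every source. "
    "If the documentation does not cover the question, say so."
)

# Context-specific variations: search-scoped vs. article-scoped queries.
CONTEXT_VARIANTS = {
    "search": "Summarize the most relevant articles for the query: {query}",
    "article": "Answer using only the article below.\n{article_text}\nQuestion: {query}",
}

# Role-aware adjustments for administrators, partners, and end users.
ROLE_NOTES = {
    "administrator": "Prefer configuration and policy details.",
    "end_user": "Prefer step-by-step instructions; avoid admin-only settings.",
}

def build_system_prompt(context: str, role: str, **fields) -> str:
    """Compose persona + role note + context variant into one system prompt."""
    parts = [BASE_PERSONA, ROLE_NOTES.get(role, ""), CONTEXT_VARIANTS[context].format(**fields)]
    return "\n\n".join(p for p in parts if p)

print(build_system_prompt("search", "administrator", query="enable SSO"))
```

The design choice this encodes is the "singular entity" goal: the persona block never changes, while the context and role layers adapt around it.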
Enabling the content team as an AI input layer
A piece of this project that doesn't look like traditional design work but was critical: I framed the content authoring team's existing work — writing documentation, adding metadata, refining annotations — as the foundational input layer for AI-powered features. The AI assistant can only surface trustworthy responses if the knowledge base content is accurate, well-structured, and properly annotated with metadata for user role specificity.
This reframing had practical consequences. It justified investment in content quality and metadata strategies as prerequisites for AI features, not afterthoughts. And it positioned the team's collaboration with data analysts and engineers around shared metadata standards as directly enabling the AI assistant's reliability — which it is.
Chat with an article — interaction
Users can ask questions scoped to the article they're currently reading
Chat with doc — alternate take
Multi-turn conversation flow within a single article context
Scroll to relevant section
The AI response can navigate the user directly to the relevant section of a long article
The Build
Full-stack prototype
To validate the interaction designs and support engineering's technical feasibility research, I built a working full-stack prototype — not a Figma mockup, but a functional application that connected to enterprise AI services and returned real responses.
- Frontend: React with Momentum Design System components, react-markdown for response rendering, react-syntax-highlighter for code blocks
- Backend (primary): Python Flask server handling authentication, streaming, and RAG context injection
- Backend (alternative): Express.js implementation for comparison
- AI Service: Enterprise LLM proxy routing to GPT-4o-mini
- Document retrieval: Static Elasticsearch export processed with TF-IDF vectorization (scikit-learn) and cosine similarity for query-relevant document matching
- Streaming: Server-Sent Events with character-by-character pacing for a natural reading experience
The RAG implementation was the critical piece. Documents from the help center's Elasticsearch index were vectorized at server startup. When a user submitted a query, the system found the most relevant documents via cosine similarity, injected them as context into the AI prompt, and the response included source attribution with document URLs. The AI wasn't hallucinating answers — it was synthesizing from the actual knowledge base, and showing its sources.
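The retrieval step can be sketched with scikit-learn, which the prototype used for TF-IDF vectorization and cosine similarity. The corpus, URLs, and query below are made-up examples, not help center data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for the static Elasticsearch export (illustrative documents).
DOCS = [
    {"url": "/article/sso-setup", "text": "Configure single sign-on SSO for Webex administrators"},
    {"url": "/article/join-meeting", "text": "Join a Webex meeting from your browser or desktop app"},
    {"url": "/article/calling-setup", "text": "Set up Webex Calling numbers and dial plans"},
]

# Vectorize the corpus once at server startup, as the prototype does.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(d["text"] for d in DOCS)

def top_documents(query: str, k: int = 2):
    """Return the k most query-relevant documents by cosine similarity."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    ranked = sorted(zip(scores, DOCS), key=lambda p: p[0], reverse=True)
    # Drop zero-similarity documents so irrelevant context never reaches the prompt.
    return [doc for score, doc in ranked[:k] if score > 0]

print([d["url"] for d in top_documents("how do I configure SSO")])  # → ['/article/sso-setup']
```

The matched documents are what gets injected as context into the system prompt, with their URLs carried through to the response for source attribution.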
# Simplified RAG flow
# 1. Query → TF-IDF similarity against document corpus
# 2. Top documents injected as context in system prompt
# 3. Response streamed character-by-character with source URLs
# 4. Frontend renders markdown with clickable citations

I deliberately chose character-by-character streaming over chunk-based delivery because the pacing communicates that the system is working, not just dumping text — the same typewriter effect users recognize from consumer LLM products, but tuned for a help center context where readability matters more than speed.
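The pacing mechanism itself is simple. This is a stripped-down sketch of the idea: the event format follows the Server-Sent Events wire format, but the delay value and done marker are illustrative assumptions, not the tuned production values.

```python
import time

def sse_stream(text: str, delay: float = 0.0):
    """Yield one SSE 'data:' event per character, then a done marker.

    delay=0 here so the sketch runs instantly; something on the order of
    0.01s per character produces the typewriter pacing (an assumed value).
    """
    for ch in text:
        yield f"data: {ch}\n\n"   # each SSE event is "data: ...\n\n"
        time.sleep(delay)
    yield "data: [DONE]\n\n"      # illustrative end-of-stream marker

events = list(sse_stream("Hi"))
print(events)  # → ['data: H\n\n', 'data: i\n\n', 'data: [DONE]\n\n']
```

In a Flask backend like the prototype's, this generator would be returned as a streaming `Response` with `mimetype="text/event-stream"`, letting the frontend consume events with the browser's `EventSource` API.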
The Wikipedia prototype
Before building the Enterprise LLM proxy service prototype, I built and self-hosted an "agentic Wikipedia search" application on my home Linux web server using Wikipedia's open API and consumer OpenAI API keys. This wasn't a Cisco project — it was exploratory work on my own time to understand LLM RAG architecture hands-on.
The key insight it validated: context window management is the difference between a useful assistant and a hallucination machine. In the Wikipedia prototype, a user could only get answers grounded in the specific article they were viewing — ask a non-Corgi question on the Corgi article page, and the system would acknowledge it couldn't help rather than fabricate an answer. This RAG constraint pattern directly informed the help center implementation, where responses are bounded to the knowledge base content.
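The grounding constraint pattern can be sketched as two pieces: a cheap lexical pre-check before spending an LLM call, and a prompt that forbids answers from outside the current article. The threshold, prompt wording, and function names here are illustrative assumptions, not the actual Wikipedia prototype code.

```python
import re

def words(text: str) -> set:
    """Lowercase word set with punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def is_in_scope(article_text: str, question: str, min_overlap: int = 3) -> bool:
    """Cheap pre-check: require some lexical overlap before calling the model."""
    return len(words(article_text) & words(question)) >= min_overlap

def grounded_prompt(article_title: str, article_text: str, question: str) -> str:
    """Bound the model to the current article; tell it to refuse out-of-scope asks."""
    return (
        f"Answer ONLY from the article '{article_title}' below. If the answer "
        "is not in the article, say you can't help based on this article.\n\n"
        f"ARTICLE:\n{article_text}\n\nQUESTION: {question}"
    )

article = "The Corgi is a small herding dog breed from Wales."
print(is_in_scope(article, "How big is the Corgi dog?"))       # → True
print(is_in_scope(article, "What is the capital of France?"))  # → False
```

The belt-and-suspenders shape is the point: the pre-check filters obviously off-topic questions before any model call, and the prompt constraint handles the rest, so the assistant acknowledges its limits instead of fabricating.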
Success metrics framework
I defined a comprehensive KPI framework across four categories to measure the feature's impact:
- Efficiency: Task completion time (target: 33% reduction), interaction steps (target: 37% reduction), support ticket deflection (target: 15% reduction)
- Quality: Task success rate (target: 85%, up from 70%), response accuracy (target: 90%+), SUS score (target: 75+)
- Perception: Trust score (target: 4/5), helpfulness rating (target: 4.2/5), control perception (target: 4.3/5)
- Engagement: Adoption rate (target: 40% within first month), return rate (target: 60% of adopters)
Outcomes
- Live since September 2025 — AI-powered search summaries are in production on help.webex.com
- +22.1% quarter-over-quarter increase in search submissions (882k Aug–Nov vs. 723k May–Aug, user telemetry service, January 2026) — users are engaging with search significantly more since the AI features launched
- Documented codebase became an engineering resource — Engineering partners cited the prototype's documented examples as instrumental in their own API research, particularly in understanding how back-end data processing and chunking directly impacts the front-end user experience
- Established the architectural pattern for a unified sitewide AI assistant with intent-based routing — the framework I designed is the foundation for future phases including advanced context window management and cross-device session persistence
Reflection
The most interesting tension in this project was between moving fast on a prototype and thinking systemically about what a help center AI assistant should be. It would have been easy to ship a basic "ask AI" textbox and call it done. But the intent framework, the trust-through-transparency principles, and the cost-tiering architecture are what make this a sustainable feature rather than a demo. Building the prototype myself — in Python and React, not Figma — meant I could hand engineering not just a design spec but a working reference implementation with documented code. That's a different kind of design artifact, and for LLM-powered features where the UX is inseparable from the technical behavior, it was the right one.