case study 02 — UX + Full-Stack
Agentic Search: AI-Powered Summaries for the Webex Help Center
With 3,000+ articles across dozens of products and user roles, Webex help center search was giving users answers they didn't trust enough to click — so I designed and prototyped an AI layer that could earn that trust.
The Situation
help.webex.com is the knowledge base for Cisco's entire Webex product suite — Meetings, Calling, Contact Center, AI Agent, Devices, and more. It serves IT administrators, partners, and end users across a global audience. At 3,000+ articles and growing, the scale of the content library had become both an asset and a problem.
The existing search experience was traditional lexical search: type a query, get a ranked list of article links. The click-through rate from search results to article pages hovered around 30%. In other words, roughly 70% of users who searched either couldn't identify the right result, didn't trust the results enough to commit to clicking through, or were overwhelmed by the volume of results. For IT administrators trying to resolve configuration issues under time pressure, that friction has real cost — in user frustration and in support ticket escalation.
The broader context made this worse: Webex's product suite was expanding rapidly, with new AI-powered products (AI Agent, AI Assistant) adding complexity that documentation had to keep pace with. The people searching the help center weren't just looking for how-tos — they were trying to understand new product categories that didn't exist a year ago.
The strategic framing I developed for stakeholders: IT administrators face increasing complexity — new products, new AI capabilities, new configuration surfaces. Generative AI can augment their processes by providing faster, more contextual information where and when users need it most. But the opportunity only works if AI outputs are grounded in trustworthy, well-maintained documentation. That dependency — AI quality being downstream of content quality — shaped how I positioned this project not just as a search feature, but as a reason the content team's work matters more than ever.
My Role & Scope
I led this initiative end-to-end across multiple workstreams over roughly three quarters (FY25 Q2–Q4): competitive research, interaction design, system prompt definition, full-stack prototype development, usability testing, and stakeholder presentations. I collaborated with an engineering partner on API integration research, and worked with the broader content ops team to understand search behavior through analytics from the user telemetry service.
The Approach
Competitive research and the intent framework
The work started with competitive research into how other products handle AI-assisted search — how they surface summaries, cite sources, handle ambiguity, and manage the transition between AI-generated content and traditional results. I studied patterns across consumer and enterprise products to understand what earned trust versus what felt intrusive or unreliable.
From this research, I developed a user intent framework that became the conceptual foundation for the entire project. Not every search query has the same cognitive intent, and the assistant's behavior should reflect that:
| Intent | What the user needs | Optimal behavior |
|---|---|---|
| Searching | Find specific information | Surface relevant results with AI summary |
| Navigating | Find a location or path | Guide to the right page efficiently |
| Deep Diving | Understand a complex concept | Provide in-context analysis and explanation |
| Escalating | Resolve a blocking issue | Detect frustration, offer human support |
This framework informed every subsequent design decision — from how the AI panel behaves differently in search contexts versus article pages, to how the system routes queries to different LLM services based on complexity and cost.
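The routing idea can be sketched in code. This is a hypothetical illustration: the intent labels come from the framework above, but the keyword heuristics, service names, and frustration signal are my own illustrative assumptions, not the production classifier.

```python
from dataclasses import dataclass

@dataclass
class Route:
    intent: str        # one of the four intents from the framework
    service: str       # hypothetical service name for this sketch
    show_ai_panel: bool

def route_query(query: str, frustration_signals: int = 0) -> Route:
    """Toy classifier mapping a query to an intent and a handling service."""
    q = query.lower()
    # Escalating: frustration signals take priority over everything else.
    if frustration_signals >= 2:
        return Route("escalating", "human_support", False)
    # Navigating: wayfinding phrasings get routed to plain search.
    if q.startswith(("where is", "how do i get to", "open ")):
        return Route("navigating", "lexical_search", False)
    # Deep Diving: conceptual questions justify a costlier, fuller LLM pass.
    if any(w in q for w in ("why", "explain", "difference between")):
        return Route("deep_diving", "llm_full", True)
    # Searching: the default case gets results plus an AI summary.
    return Route("searching", "llm_summary", True)

print(route_query("explain why SSO fails for federated users").intent)  # → deep_diving
```

In the real system the classification would come from the model or from telemetry signals rather than keyword matching, but the routing shape is the same: intent decides both which service handles the query and whether the AI panel appears.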
Live prototype — search results page with AI summary
AI-generated summary alongside traditional search results, with source citations and follow-up suggestions
Designing for trust through transparency
The core design challenge wasn't "how do we add AI to search" — it was "how do we add AI to search without undermining trust." Help center users are often troubleshooting production issues. They need to know why they should trust a response, not just receive one.
Four design principles guided the interaction patterns:
- 01. Transparent status updates and source disclosures — The AI assistant shows its working state and always cites where information comes from. No black-box answers.
- 02. Augment, don't obscure — The AI panel is dismissible and sits alongside traditional search results, never replacing them. Users remain in control of their information-seeking strategy.
- 03. Follow-up queries drive deeper exploration — AI-generated follow-up search suggestions help users discover related contexts they wouldn't have thought to search for.
- 04. Clear pathways to source material — Annotated references in AI responses link directly to the specific articles, so users can verify and go deeper.
Principle 01 — status updates
Streaming status communicates that the system is actively working
Principle 03 — follow-up queries
Suggested follow-up queries surface related contexts the user may not have thought to explore
Principle 04 — annotated sources
Every AI response links back to the specific help center articles it drew from
Article page — summarize action
The AI assistant also operates on individual article pages, not just search results
Usability testing: five interaction patterns
I designed and tested five distinct interaction patterns for how the AI assistant integrates with the search experience. Each pattern represented a different philosophy about user control, discoverability, and the relationship between AI-generated content and traditional results:
- Pattern A — Single "browse for me" button: A single button in the search input triggers the AI panel. Every query gets a summary; there's no way to opt out. (Reference: Arc browser, Vercel docs)
- Pattern B — Dual "browse for me" dismissible button: A dual button lets users toggle the AI panel on or off. LLM search activates or deactivates in tandem. (Reference: Webex Meetings dual pill buttons)
- Pattern C — "AI-Powered" dismissible input chip: A chip beneath the search input indicates AI capability. Dismissing the panel removes the chip — explicit opt-in/opt-out. (Reference: Figma, Rocket Software)
- Pattern D — "AI-Powered" search input toggle: A dual toggle in the search input switches between AI-powered and conventional modes.
- Pattern E — AI assistant drawer: No button — just persistent placeholder text indicating "AI-Powered search." The AI panel appears by default on search results; when dismissed, a tab drawer anchored to the right edge lets users reopen it. (Reference: AWS support)
Pattern E — dismissible drawer
The winning pattern: AI by default, always dismissible, drawer affordance for reopen
Pattern E — dismiss interaction
Short-form view of the dismiss and reopen flow
Pattern E won. The drawer-based approach succeeded because it delivered AI summaries by default — no extra clicks for first-time discovery — while keeping the dismiss-and-reopen affordance for users who wanted traditional results. It was the pattern that best balanced the "augment, don't obscure" principle: AI content was present and useful without requiring users to opt into a new interaction model.
The testing also validated a dual-intent navigation approach: for search-scoped queries, auto-navigation to results felt natural (ingrained user behavior), while for article-scoped queries, users preferred staying on the page with contextual answers. Over 60% of test participants could articulate why the system behaved differently in each context — evidence that intent-based pattern variance felt coherent rather than inconsistent.
System prompt architecture
The AI assistant needed a consistent voice across every surface it appeared on — search results, article pages, and eventually other Webex web properties. But the contexts are fundamentally different. A system prompt that works well for summarizing search results doesn't work for answering questions about a specific article.
I defined a system prompt template guide that established:
- A consistent persona and tone for the Cisco AI Assistant across all surfaces
- Context-specific prompt variations for search-scoped vs. article-scoped queries
- Guard rails for response formatting, source citation, and hallucination prevention
- Role-aware adjustments for administrators, partners, and end users
The goal was ensuring a singular entity — one assistant that users recognize and trust — even as the underlying behavior adapts to context. This template became a shared reference for anyone integrating the assistant into new surfaces.
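A minimal sketch of how a template guide like this might translate to code. The persona wording, variant strings, role notes, and function names below are illustrative assumptions for this sketch, not the actual Cisco template.

```python
# Shared persona: one assistant identity across every surface.
BASE_PERSONA = (
    "You are the Cisco AI Assistant for help.webex.com. "
    "Answer only from the provided documentation and cite every source. "
    "If the documentation does not cover the question, say so."
)

# Context-specific variations: search-scoped vs. article-scoped queries.
CONTEXT_VARIANTS = {
    "search": "Summarize the most relevant articles for the query: {query}",
    "article": "Answer using only the article below.\n{article_text}\nQuestion: {query}",
}

# Role-aware adjustments for administrators, partners, and end users.
ROLE_NOTES = {
    "administrator": "Prefer configuration and policy details.",
    "end_user": "Prefer step-by-step instructions; avoid admin-only settings.",
}

def build_system_prompt(context: str, role: str, **fields) -> str:
    """Compose persona + role note + context variant into one system prompt."""
    parts = [BASE_PERSONA, ROLE_NOTES.get(role, ""), CONTEXT_VARIANTS[context].format(**fields)]
    return "\n\n".join(p for p in parts if p)

print(build_system_prompt("search", "administrator", query="enable SSO"))
```

The design choice this encodes is the "singular entity" goal: the persona block never changes, while the context and role layers adapt around it.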
Enabling the content team as an AI input layer
A piece of this project that doesn't look like traditional design work but was critical: I framed the content authoring team's existing work — writing documentation, adding metadata, refining annotations — as the foundational input layer for AI-powered features. The AI assistant can only surface trustworthy responses if the knowledge base content is accurate, well-structured, and properly annotated with metadata for user role specificity.
This reframing had practical consequences. It justified investment in content quality and metadata strategies as prerequisites for AI features, not afterthoughts. And it positioned the team's collaboration with data analysts and engineers around shared metadata standards as directly enabling the AI assistant's reliability — which it is.
Chat with an article — interaction
Users can ask questions scoped to the article they're currently reading
Chat with doc — alternate take
Multi-turn conversation flow within a single article context
Scroll to relevant section
The AI response can navigate the user directly to the relevant section of a long article
The Build
Full-stack prototype
To validate the interaction designs and support engineering's technical feasibility research, I built a working full-stack prototype — not a Figma mockup, but a functional application that connected to enterprise AI services and returned real responses.
- Frontend: React with Momentum Design System components, react-markdown for response rendering, react-syntax-highlighter for code blocks
- Backend (primary): Python Flask server handling authentication, streaming, and RAG context injection
- Backend (alternative): Express.js implementation for comparison
- AI Service: Enterprise LLM proxy routing to GPT-4o-mini
- Document retrieval: Static Elasticsearch export processed with TF-IDF vectorization (scikit-learn) and cosine similarity for query-relevant document matching
- Streaming: Server-Sent Events with character-by-character pacing for a natural reading experience
The RAG implementation was the critical piece. Documents from the help center's Elasticsearch index were vectorized at server startup. When a user submitted a query, the system found the most relevant documents via cosine similarity, injected them as context into the AI prompt, and the response included source attribution with document URLs. The AI wasn't hallucinating answers — it was synthesizing from the actual knowledge base, and showing its sources.
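The retrieval step can be sketched with scikit-learn, which the prototype used for TF-IDF vectorization and cosine similarity. The corpus, URLs, and query below are made-up examples, not help center data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for the static Elasticsearch export (illustrative documents).
DOCS = [
    {"url": "/article/sso-setup", "text": "Configure single sign-on SSO for Webex administrators"},
    {"url": "/article/join-meeting", "text": "Join a Webex meeting from your browser or desktop app"},
    {"url": "/article/calling-setup", "text": "Set up Webex Calling numbers and dial plans"},
]

# Vectorize the corpus once at server startup, as the prototype does.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(d["text"] for d in DOCS)

def top_documents(query: str, k: int = 2):
    """Return the k most query-relevant documents by cosine similarity."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    ranked = sorted(zip(scores, DOCS), key=lambda p: p[0], reverse=True)
    # Drop zero-similarity documents so irrelevant context never reaches the prompt.
    return [doc for score, doc in ranked[:k] if score > 0]

print([d["url"] for d in top_documents("how do I configure SSO")])  # → ['/article/sso-setup']
```

The matched documents are what gets injected as context into the system prompt, with their URLs carried through to the response for source attribution.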
# Simplified RAG flow
# 1. Query → TF-IDF similarity against document corpus
# 2. Top documents injected as context in system prompt
# 3. Response streamed character-by-character with source URLs
# 4. Frontend renders markdown with clickable citations

I deliberately chose character-by-character streaming over chunk-based delivery because the pacing communicates that the system is working, not just dumping text — the same typewriter effect users recognize from consumer LLM products, but tuned for a help center context where readability matters more than speed.
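The pacing mechanism itself is simple. This is a stripped-down sketch of the idea: the event format follows the Server-Sent Events wire format, but the delay value and done marker are illustrative assumptions, not the tuned production values.

```python
import time

def sse_stream(text: str, delay: float = 0.0):
    """Yield one SSE 'data:' event per character, then a done marker.

    delay=0 here so the sketch runs instantly; something on the order of
    0.01s per character produces the typewriter pacing (an assumed value).
    """
    for ch in text:
        yield f"data: {ch}\n\n"   # each SSE event is "data: ...\n\n"
        time.sleep(delay)
    yield "data: [DONE]\n\n"      # illustrative end-of-stream marker

events = list(sse_stream("Hi"))
print(events)  # → ['data: H\n\n', 'data: i\n\n', 'data: [DONE]\n\n']
```

In a Flask backend like the prototype's, this generator would be returned as a streaming `Response` with `mimetype="text/event-stream"`, letting the frontend consume events with the browser's `EventSource` API.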
The Wikipedia prototype
Before building the Enterprise LLM proxy service prototype, I built and self-hosted an "agentic Wikipedia search" application on my home Linux web server using Wikipedia's open API and consumer OpenAI API keys. This wasn't a Cisco project — it was exploratory work on my own time to understand LLM RAG architecture hands-on.
The key insight it validated: context window management is the difference between a useful assistant and a hallucination machine. In the Wikipedia prototype, a user could only get answers grounded in the specific article they were viewing — ask a non-Corgi question on the Corgi article page, and the system would acknowledge it couldn't help rather than fabricate an answer. This RAG constraint pattern directly informed the help center implementation, where responses are bounded to the knowledge base content.
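The grounding constraint pattern can be sketched as two pieces: a cheap lexical pre-check before spending an LLM call, and a prompt that forbids answers from outside the current article. The threshold, prompt wording, and function names here are illustrative assumptions, not the actual Wikipedia prototype code.

```python
import re

def words(text: str) -> set:
    """Lowercase word set with punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def is_in_scope(article_text: str, question: str, min_overlap: int = 3) -> bool:
    """Cheap pre-check: require some lexical overlap before calling the model."""
    return len(words(article_text) & words(question)) >= min_overlap

def grounded_prompt(article_title: str, article_text: str, question: str) -> str:
    """Bound the model to the current article; tell it to refuse out-of-scope asks."""
    return (
        f"Answer ONLY from the article '{article_title}' below. If the answer "
        "is not in the article, say you can't help based on this article.\n\n"
        f"ARTICLE:\n{article_text}\n\nQUESTION: {question}"
    )

article = "The Corgi is a small herding dog breed from Wales."
print(is_in_scope(article, "How big is the Corgi dog?"))       # → True
print(is_in_scope(article, "What is the capital of France?"))  # → False
```

The belt-and-suspenders shape is the point: the pre-check filters obviously off-topic questions before any model call, and the prompt constraint handles the rest, so the assistant acknowledges its limits instead of fabricating.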
Success metrics framework
I defined a comprehensive KPI framework across four categories to measure the feature's impact:
- Efficiency: Task completion time (target: 33% reduction), interaction steps (target: 37% reduction), support ticket deflection (target: 15% reduction)
- Quality: Task success rate (target: 85%, up from 70%), response accuracy (target: 90%+), SUS score (target: 75+)
- Perception: Trust score (target: 4/5), helpfulness rating (target: 4.2/5), control perception (target: 4.3/5)
- Engagement: Adoption rate (target: 40% within first month), return rate (target: 60% of adopters)
Outcomes
- Live since September 2025 — AI-powered search summaries are in production on help.webex.com
- +22.1% quarter-over-quarter increase in search submissions (882k Aug–Nov vs. 723k May–Aug, user telemetry service, January 2026) — users are engaging with search significantly more since the AI features launched
- Documented codebase became an engineering resource — Engineering partners cited the prototype's documented examples as instrumental in their own API research, particularly in understanding how back-end data processing and chunking directly impacts the front-end user experience
- Established the architectural pattern for a unified sitewide AI assistant with intent-based routing — the framework I designed is the foundation for future phases including advanced context window management and cross-device session persistence
Reflection
The most interesting tension in this project was between moving fast on a prototype and thinking systemically about what a help center AI assistant should be. It would have been easy to ship a basic "ask AI" textbox and call it done. But the intent framework, the trust-through-transparency principles, and the cost-tiering architecture are what make this a sustainable feature rather than a demo. Building the prototype myself — in Python and React, not Figma — meant I could hand engineering not just a design spec but a working reference implementation with documented code. That's a different kind of design artifact, and for LLM-powered features where the UX is inseparable from the technical behavior, it was the right one.