Chapter 15: Build -- Customer Support Agent

What you will have by the end of this chapter

A complete architecture document -- system diagram, component list, flow diagrams
A Knowledge Base backed by a vector store -- FAQs, products, policies, troubleshooting -- with a working retrieval pipeline
A Router Agent that classifies inquiries into 4 categories (FAQ, orders, complaints, other) with 90%+ accuracy
3 Specialist Agents with tools and guardrails: FAQ Agent, Order Agent, Complaint Agent
A human escalation workflow -- conversation summary, ticket creation, smooth handoff to a human agent
Conversation memory -- within-conversation and cross-conversation memory, plus a customer profile
A testing suite -- 100 sample conversations, with metrics for accuracy, routing, tone, and escalation
A deployed API -- FastAPI with streaming responses, ready to connect to Slack, a website, or WhatsApp
Final deliverable: a complete customer support agent with router, specialists, RAG, escalation, testing, and monitoring

What you will be able to do after this chapter

You will be able to design a multi-agent architecture for customer support -- router, specialists, escalation, memory
You will be able to build a full RAG pipeline -- chunking, embedding, retrieval, reranking -- and verify that it returns correct results
You will be able to implement a Router Agent that classifies inquiries with high accuracy using a small, cheap model (Haiku/Flash)
You will be able to build specialist agents with tools, guardrails, and system prompts tailored to each role
You will be able to run the agent as a REST API with streaming, monitoring, and Hebrew support

Before you start

Previous chapters: Chapter 2 (Architecture), Chapter 5 (Claude SDK), Chapter 8 (LangGraph), Chapter 11 (Tool Use Mastery), Chapter 12 (Memory), Chapter 13 (Multi-Agent), Chapter 14 (HITL/Guardrails)
What you will need: Python 3.11+ and/or Node.js 18+, an API key (Anthropic / OpenAI), a code editor, Docker (optional)
Required knowledge: building agents with tool calling, multi-agent coordination, guardrails, basic RAG
Estimated time: 5-7 hours (including building all the components)
Estimated API cost: $10-25 (embedding, routing, specialist agents, testing)

Your project -- the common thread through the course

In Chapter 14 you built a complete safety system -- guardrails, approval workflows, monitoring, and audit logging. Now you use everything you learned throughout the course to build a real system. This is the first project in Part 4 (Projects), and it combines: architecture (Chapter 2), Claude SDK (Chapter 5), LangGraph (Chapter 8), tool use (Chapter 11), memory (Chapter 12), multi-agent (Chapter 13), and safety (Chapter 14). In Chapter 16 you will build a Research Agent that uses similar patterns but with fan-out search and report generation.

Glossary -- Chapter 15

Term (English)	Hebrew	Explanation
Router Agent	סוכן ניתוב	An agent that analyzes the inquiry and decides which specialist agent to send it to. Lightweight, fast, and cheap
Specialist Agent	סוכן מומחה	An agent specialized in one domain (FAQ, orders, complaints), with its own tools and a dedicated system prompt
RAG (Retrieval-Augmented Generation)	יצירה מועשרת באחזור	The pattern in which the agent retrieves information from a data store before it answers. Reduces hallucination by grounding answers in retrieved sources
Vector Store	מאגר וקטורים	A database that stores text embeddings and supports semantic search (by meaning, not by exact words)
Embedding	הטמעה / ייצוג וקטורי	Converting text into a numeric vector (for example, 1536 numbers) that represents its semantic meaning
Chunking	חלוקה לקטעים	Splitting long documents into short segments (200-500 words) before embedding and search
Reranking	דירוג מחדש	A second ranking stage for search results. After vector search returns 20 results, a reranker orders them more precisely
Escalation	הסלמה / העברה	Transferring a conversation from the automated agent to a human agent -- with a summary, context, and customer details
Handoff	מסירה	The moment the agent passes the conversation to another party (another agent or a person) with the full context
Ticket	פנייה / טיקט	A record that documents a customer request from start to close -- including status, history, and outcome
CSAT (Customer Satisfaction)	שביעות רצון לקוחות	A satisfaction score (1-5 or 1-10) that customers give after an interaction with support
Resolution Rate	שיעור פתרון	The percentage of inquiries the agent resolved successfully without escalation to a human agent
First Response Time	זמן תגובה ראשונית	How long passes from the moment of the inquiry until the customer gets a first reply. AI agent: seconds. Human: minutes to hours
Intent Classification	סיווג כוונה	Identifying what the customer wants -- a product question? An order status check? A complaint? Another request?
Entity Extraction	חילוץ ישויות	Identifying specific details in the customer's message: order number, product name, date, amount
Streaming Response	תגובה בזרימה	Sending the reply word by word instead of waiting for the full answer. Improves UX and perceived response time

Project Overview and Architecture

beginner | 30 minutes | concept + design

What we are building

An AI customer support agent designed to handle about 80% of inquiries without human intervention. The agent answers product questions, checks order status, opens return requests, handles complaints, and hands off to a human agent when needed -- all while keeping a professional, empathetic, and accurate tone.

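Capability	What the agent does	What the agent does not do
Product questions	Answers from the knowledge base, cites sources	Invent information that is not in the knowledge base
Order status	Checks status, location, estimated arrival time	Modify an order, cancel without approval
Returns	Opens a return request according to policy	Process a monetary refund (requires human approval)
Complaints	Listens, offers solutions, documents	Argue with the customer, make exceptions to policy
General	Answers in Hebrew and English, remembers previous conversations	Perform financial operations, access sensitive information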

System architecture

The system is built as a multi-agent system using the Router + Specialists pattern:

Customer Message
       |
       v
  +-----------+
  |  Router   |  (Haiku/Flash -- fast, cheap)
  |  Agent    |  Classify: FAQ / Order / Complaint / Other
  +-----------+
       |
       +---> FAQ Agent ---------> Knowledge Base (Vector Store)
       |                          search --> rerank --> cite
       |
       +---> Order Agent -------> Order Database (PostgreSQL)
       |                          get_status, track_shipment, initiate_return
       |
       +---> Complaint Agent ---> Complaint System
       |                          log_complaint, offer_discount, escalate
       |
       +---> General Agent -----> Fallback (Sonnet)
       |
       v
  +-----------+
  | Escalation|  (when needed)
  | Handler   |  summarize --> create ticket --> notify human --> handoff
  +-----------+
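
The diagram boils down to a lookup from the router's category to a handler. The sketch below is only an illustration of that dispatch step: the four handler functions are hypothetical stand-ins (not the real agents built later in the chapter), and falling back to the general agent for an unknown category is an assumption.

```python
from typing import Callable

# Hypothetical stand-ins for the specialist agents built later in the chapter.
def faq_agent(message: str) -> str:
    return f"[faq] {message}"

def order_agent(message: str) -> str:
    return f"[order] {message}"

def complaint_agent(message: str) -> str:
    return f"[complaint] {message}"

def general_agent(message: str) -> str:
    return f"[other] {message}"

# Category names match the router's four categories.
HANDLERS: dict[str, Callable[[str], str]] = {
    "faq": faq_agent,
    "order": order_agent,
    "complaint": complaint_agent,
    "other": general_agent,
}

def dispatch(route: dict, message: str) -> str:
    """Send the message to the specialist chosen by the router."""
    # Assumption: anything the router did not label with a known category
    # goes to the general (fallback) agent.
    handler = HANDLERS.get(route.get("category"), general_agent)
    return handler(message)
```

Keeping the mapping in one dictionary means a new specialist can be added without touching the dispatch logic.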

Tech Stack

Component	Technology	Why
Router Agent	Claude Haiku / Gemini Flash	Fast (~200ms), cheap (~$0.001/call), accurate enough for classification
Specialist Agents	Claude Sonnet / GPT-4o	Capable enough for complex conversations; balances quality and cost
Orchestration	Claude Agent SDK + LangGraph	Claude SDK for individual agents, LangGraph for the flow between agents
Vector Store	ChromaDB / Pinecone / Qdrant	Stores embeddings for RAG
Database	PostgreSQL / SQLite	Orders, customers, conversation history
API	FastAPI (Python) / Express (TS)	REST API with streaming
Monitoring	Langfuse / LangSmith	tracing, metrics, alerting

Do it now -- 10 minutes

Create a new project folder with the following structure. Install the dependencies and confirm that everything runs:

# Project structure
customer-support-agent/
  agents/
    router.py         # Router Agent
    faq.py            # FAQ Specialist
    orders.py         # Order Specialist
    complaints.py     # Complaint Specialist
  tools/
    knowledge_base.py # RAG tools
    order_tools.py    # Order database tools
    complaint_tools.py # Complaint handling tools
  data/
    faqs.json         # FAQ data
    products.json     # Product catalog
    policies.md       # Return/refund policies
    orders.db         # SQLite for orders (mock)
  tests/
    test_conversations.json
    test_runner.py
  server.py           # FastAPI server
  config.py           # Configuration
  requirements.txt

# Install dependencies
pip install anthropic langgraph chromadb fastapi uvicorn
# Or for TypeScript:
npm install @anthropic-ai/sdk @langchain/langgraph chromadb express

Expected costs

Component	Cost per conversation	Notes
Router (Haiku)	~$0.001	~500 tokens in/out
Specialist (Sonnet)	~$0.02-0.10	Depends on conversation length (2K-10K tokens)
Embedding	~$0.001	Embedding the search query
RAG retrieval	~$0.005	reranking + context building
Total per conversation	$0.03-0.12	Average conversation: 4-6 turns

For comparison: a human agent costs $3-8 per conversation (including salary, infrastructure, training). At these estimates an AI agent is roughly 30-100 times cheaper, available 24/7, and never out sick.
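
To turn the table into a monthly budget, you can add up the per-component ranges and multiply by the expected volume. The sketch below uses only the figures from the table; the function name and the 1,000-conversation example are illustrative assumptions. The components sum to about $0.027-0.107, which the table rounds to $0.03-0.12.

```python
# (low, high) cost in USD per conversation, copied from the table above.
COST_PER_CONVERSATION = {
    "router": (0.001, 0.001),
    "specialist": (0.02, 0.10),
    "embedding": (0.001, 0.001),
    "rag_retrieval": (0.005, 0.005),
}

def estimate_cost(conversations: int) -> tuple[float, float]:
    """Return the (low, high) estimated API cost for a number of conversations."""
    low = sum(lo for lo, _ in COST_PER_CONVERSATION.values())
    high = sum(hi for _, hi in COST_PER_CONVERSATION.values())
    return conversations * low, conversations * high

low, high = estimate_cost(1000)
print(f"1000 conversations: ${low:.2f} - ${high:.2f}")
```

Real costs depend on the models and prices in effect when you run the system, so treat the output as an order-of-magnitude estimate.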

Building the Knowledge Base and RAG

intermediate | 45 minutes | practice

The knowledge base is the heart of the FAQ Agent. Without it, the agent has to rely on the model's built-in knowledge, which makes hallucinations about your products and policies unavoidable. With RAG, the agent retrieves information from specific, verified sources and cites them.

Step 1: Preparing the data

A good knowledge base for customer support includes 4 types of documents:

Document type	Examples	Typical volume
FAQs	Common questions and answers	50-200 Q&A pairs
Product Docs	Product specs, user guides	10-50 products
Policies	Return, warranty, and shipping policies	5-15 policy documents
Troubleshooting	Fixes for common problems	20-50 guides

# Python -- sample FAQ data structure
faqs = [
    {
        "id": "faq-001",
        "question": "What is your return policy?",
        "question_he": "מה מדיניות ההחזרות שלכם?",
        "answer": "You can return any item within 30 days of purchase for a full refund. "
                  "Items must be in original packaging and unused condition. "
                  "To initiate a return, contact us with your order number.",
        "answer_he": "ניתן להחזיר כל מוצר תוך 30 יום מרגע הרכישה לקבלת החזר מלא. "
                     "המוצרים חייבים להיות באריזה המקורית ובמצב שלא נעשה בהם שימוש. "
                     "ליצירת בקשת החזרה, פנו אלינו עם מספר ההזמנה.",
        "category": "returns",
        "metadata": {"last_updated": "2026-03-01", "source": "policy_doc_v3"}
    },
    {
        "id": "faq-002",
        "question": "How long does shipping take?",
        "question_he": "כמה זמן לוקח המשלוח?",
        "answer": "Standard shipping: 5-7 business days. Express shipping: 2-3 business days. "
                  "Same-day delivery available in Tel Aviv and Jerusalem metro areas.",
        "answer_he": "משלוח רגיל: 5-7 ימי עסקים. משלוח מהיר: 2-3 ימי עסקים. "
                     "משלוח ביום ההזמנה זמין באזורי תל אביב וירושלים.",
        "category": "shipping",
        "metadata": {"last_updated": "2026-02-15", "source": "shipping_policy_v2"}
    }
]

Step 2: Chunking -- splitting into segments

FAQ documents are short -- each one can be stored as a single chunk. Policy and troubleshooting documents are longer and need to be split:

# Python -- Chunking strategy
from typing import List, Dict

def chunk_document(text: str, chunk_size: int = 400,
                   overlap: int = 50) -> List[Dict]:
    """
    Split a document into overlapping chunks.
    chunk_size: target words per chunk
    overlap: words of overlap between consecutive chunks
    """
    if not 0 <= overlap < chunk_size:
        # Otherwise the window never advances and the loop never ends
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    chunks = []
    start = 0

    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunk_text = " ".join(words[start:end])
        chunks.append({
            "text": chunk_text,
            "start_word": start,
            "end_word": end,
            "word_count": end - start
        })
        if end == len(words):
            break  # last chunk reached; avoid a trailing chunk made only of overlap
        start += chunk_size - overlap  # overlap for continuity

    return chunks

# For FAQs: each Q&A pair = 1 chunk (no splitting needed)
# For policies: chunk_size=400, overlap=50
# For product docs: chunk per section/heading

Wrong chunking = wrong answers

Chunks that are too small (50 words) lose context. Chunks that are too large (2000 words) contain irrelevant information that confuses the model. 400-500 words with a 50-word overlap is a good starting point. Always test with real queries and tune.
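
For product docs the recommendation is one chunk per section or heading. Below is a minimal sketch of that idea, assuming the docs are written in Markdown with `#`-style headings; the function name `chunk_by_heading` and the "Introduction" label for text before the first heading are illustrative choices, not part of the chapter's code.

```python
import re

def chunk_by_heading(markdown_text: str) -> list[dict]:
    """Split a Markdown document into one chunk per heading section."""
    chunks: list[dict] = []
    current_title = "Introduction"  # label for any text before the first heading
    current_lines: list[str] = []

    def flush() -> None:
        # Store the section collected so far, skipping empty sections.
        body = "\n".join(current_lines).strip()
        if body:
            chunks.append({
                "title": current_title,
                "text": body,
                "word_count": len(body.split()),
            })

    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)", line)
        if match:
            flush()  # close the previous section
            current_title = match.group(1).strip()
            current_lines = []
        else:
            current_lines.append(line)

    flush()  # close the final section
    return chunks
```

A section that is still much longer than 400-500 words can be passed through the word-based chunker as a second step.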

Step 3: Embedding and Vector Store

# Python -- Building the vector store with ChromaDB
import chromadb

# Initialize
chroma_client = chromadb.PersistentClient(path="./data/chroma_db")
collection = chroma_client.get_or_create_collection(
    name="support_knowledge_base",
    metadata={"hnsw:space": "cosine"}  # cosine similarity
)

def embed_and_store(documents: list[dict]):
    """Embed documents and store in ChromaDB."""
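    # With no embedding function specified, ChromaDB embeds the text itself using
    # its default model (a small sentence-transformers model trained mainly on English).
    # For production, and for Hebrew content, use OpenAI, Voyage AI, or another
    # multilingual embedding model.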
    collection.add(
        documents=[doc["text"] for doc in documents],
        metadatas=[{
            "category": doc.get("category", "general"),
            "source": doc.get("source", "unknown"),
            "language": doc.get("language", "en"),
            "last_updated": doc.get("last_updated", "")
        } for doc in documents],
        ids=[doc["id"] for doc in documents]
    )
    print(f"Stored {len(documents)} documents in vector store")

def search_knowledge_base(query: str, n_results: int = 5,
                          category: str | None = None) -> list[dict]:
    """Search the knowledge base for relevant documents."""
    where_filter = {"category": category} if category else None

    results = collection.query(
        query_texts=[query],
        n_results=n_results,
        where=where_filter
    )

    return [
        {
            "text": doc,
            "metadata": meta,
            "distance": dist,
            "relevance": 1 - dist  # convert distance to similarity
        }
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0]
        )
    ]

// TypeScript -- Vector store with ChromaDB
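import { ChromaClient } from "chromadb";

// Shape of one search hit returned by searchKnowledgeBase below
interface SearchResult {
  text: string | null;
  metadata: unknown;
  distance: number;
  relevance: number;
}

// `faqs` is the FAQ array from Step 1 (for example, loaded from data/faqs.json)
const client = new ChromaClient();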

async function buildKnowledgeBase() {
  const collection = await client.getOrCreateCollection({
    name: "support_knowledge_base",
    metadata: { "hnsw:space": "cosine" },
  });

  // Add documents
  await collection.add({
    documents: faqs.map((f) => `Q: ${f.question}\nA: ${f.answer}`),
    metadatas: faqs.map((f) => ({
      category: f.category,
      source: f.metadata.source,
    })),
    ids: faqs.map((f) => f.id),
  });

  console.log(`Stored ${faqs.length} documents`);
}

async function searchKnowledgeBase(
  query: string,
  nResults: number = 5
): Promise<SearchResult[]> {
  const collection = await client.getCollection({
    name: "support_knowledge_base",
  });

  const results = await collection.query({
    queryTexts: [query],
    nResults,
  });

  return results.documents[0].map((doc, i) => ({
    text: doc,
    metadata: results.metadatas[0][i],
    distance: results.distances[0][i],
    relevance: 1 - results.distances[0][i],
  }));
}
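
The FAQ entries from Step 1 keep the question and answer in separate fields, while `embed_and_store` reads `doc["text"]`, `doc["id"]`, and flat metadata keys. One way to bridge the two is sketched below. The helper name `faq_to_documents`, the `-en` / `-he` ID suffixes, and the choice to store one document per language are assumptions made for illustration.

```python
def faq_to_documents(faqs: list[dict]) -> list[dict]:
    """Flatten bilingual FAQ entries into one document per language."""
    language_fields = (
        ("en", "question", "answer"),
        ("he", "question_he", "answer_he"),
    )
    documents = []
    for faq in faqs:
        meta = faq.get("metadata", {})
        for lang, q_key, a_key in language_fields:
            if q_key not in faq or a_key not in faq:
                continue  # skip languages this entry does not provide
            documents.append({
                # Assumed ID scheme: original FAQ id plus a language suffix
                "id": f'{faq["id"]}-{lang}',
                "text": f"Q: {faq[q_key]}\nA: {faq[a_key]}",
                "category": faq.get("category", "general"),
                "source": meta.get("source", "unknown"),
                "language": lang,
                "last_updated": meta.get("last_updated", ""),
            })
    return documents
```

The result can be passed straight to `embed_and_store`, and the `language` field then lets you filter search results by the customer's language.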

Step 4: Retrieval Pipeline

Vector search alone is not enough. A full pipeline includes:

Query --> Embed --> Vector Search (top 20) --> Rerank (top 5) --> Filter (rerank score >= 0.7) --> Return

# Python -- Full retrieval pipeline
import anthropic
import json

client = anthropic.Anthropic()

def retrieve_and_rerank(query: str, n_results: int = 5) -> list[dict]:
    """Full RAG pipeline: search, rerank, filter."""

    # Step 1: Vector search -- get broad results
    raw_results = search_knowledge_base(query, n_results=20)
    if not raw_results:
        return []

    # Step 2: Rerank using LLM -- more accurate than vector similarity
    # Every candidate is shown to the reranker, truncated to 200 characters
    numbered = "\n".join(
        f'{i + 1}. {r["text"][:200]}' for i, r in enumerate(raw_results)
    )
    rerank_prompt = f"""Given this customer question: "{query}"

Rate each of these knowledge base results from 0.0 to 1.0 based on
how well they answer the question. Return JSON array with scores.

Results:
{numbered}

Return format: [{{"index": 1, "score": 0.9, "reason": "directly answers"}}, ...]"""

    response = client.messages.create(
        # Model ID as given in this chapter; confirm it against the current model list
        model="claude-haiku-4-20250314",
        max_tokens=1500,  # room for scores on all 20 candidates
        messages=[{"role": "user", "content": rerank_prompt}]
    )

    # Step 3: Parse scores and filter
    try:
        scores = json.loads(response.content[0].text)
    except (json.JSONDecodeError, IndexError, AttributeError):
        # The model did not return clean JSON: fall back to vector-search order
        return raw_results[:n_results]

    reranked = []
    for item in scores:
        idx = item["index"] - 1
        if 0 <= idx < len(raw_results) and item["score"] >= 0.7:
            result = dict(raw_results[idx])  # copy so the raw hit is not mutated
            result["rerank_score"] = item["score"]
            reranked.append(result)

    # Step 4: Sort by rerank score and return top N
    reranked.sort(key=lambda x: x["rerank_score"], reverse=True)
    return reranked[:n_results]
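
Model output is not guaranteed to be bare JSON: the array may arrive wrapped in a sentence or a code fence, or with a malformed item. One way to make the parsing step more tolerant is sketched below. The helper name `parse_score_list` is hypothetical, and dropping invalid items silently (rather than raising) is a design assumption.

```python
import json

def parse_score_list(raw_text: str) -> list[dict]:
    """Extract a JSON array of {"index", "score"} items from model output."""
    # Take everything between the first "[" and the last "]".
    start = raw_text.find("[")
    end = raw_text.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        items = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return []

    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        index, score = item.get("index"), item.get("score")
        # Keep only items with an integer index and a score in [0, 1].
        if (isinstance(index, int) and isinstance(score, (int, float))
                and 0.0 <= score <= 1.0):
            valid.append({"index": index, "score": float(score)})
    return valid
```

An empty list signals that reranking failed, so the caller can fall back to the plain vector-search order.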

Do it now -- 15 minutes

Build the Knowledge Base with at least 20 FAQs (10 in English, 10 in Hebrew). Include questions about shipping, returns, products, payments, and technical support. Once the questions are in the vector store, run 10 search queries and check: are the top 3 results relevant? If not, tune the chunking or the reranking.
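
Checking the top 3 results by eye works for 10 queries, but a small script makes the check repeatable after every tuning change. The sketch below computes a hit rate at k. It assumes each search result carries an `"id"` field, which the `search_knowledge_base` function in this chapter does not return as written, and `fake_search` is a stub standing in for the real search.

```python
from typing import Callable

def hit_rate_at_k(test_cases: list[tuple[str, str]],
                  search_fn: Callable[[str], list[dict]],
                  k: int = 3) -> float:
    """Fraction of queries whose expected document ID appears in the top-k results."""
    if not test_cases:
        return 0.0
    hits = 0
    for query, expected_id in test_cases:
        top_ids = [result["id"] for result in search_fn(query)[:k]]
        if expected_id in top_ids:
            hits += 1
    return hits / len(test_cases)

# Stub search function for demonstration only.
def fake_search(query: str) -> list[dict]:
    index = {
        "return policy": [{"id": "faq-001"}, {"id": "faq-002"}],
        "shipping time": [{"id": "faq-009"}],
    }
    return index.get(query, [])

cases = [("return policy", "faq-001"), ("shipping time", "faq-002")]
print(hit_rate_at_k(cases, fake_search, k=3))
```

Run it once before and once after each chunking or reranking change, and keep the change only if the score does not drop.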

The Router Agent -- smart routing

intermediate | 30 minutes | practice

The Router Agent is the gatekeeper. It receives every inquiry, analyzes it, and decides which specialist agent to send it to. It needs to be fast (< 500ms), cheap (< $0.002/call), and accurate (> 90% routing accuracy).

Router principles

Principle	What it means	Why
Small model	Haiku / Flash, not Sonnet / GPT-4o	Classification does not require advanced reasoning
Structured output	JSON with category, confidence, entities	The router does not talk to the customer -- it returns data
Confidence threshold	If confidence < 0.7, send to the General Agent	A wrong route is worse than routing to the fallback
Entity extraction	Extract order_id, product_name, issue_type	The specialist agent needs this data

# Python -- Router Agent
import anthropic
import json

client = anthropic.Anthropic()

ROUTER_SYSTEM_PROMPT = """You are a customer support router. Analyze the customer message
and classify it into exactly one category. Extract relevant entities.

Categories:
- "faq": Questions about products, policies, shipping, general info
- "order": Questions about a specific order, shipment tracking, delivery
- "complaint": Customer is unhappy, wants to report a problem, needs resolution
- "other": Doesn't fit any category, or needs human judgment

Response format (JSON only):
{
    "category": "faq|order|complaint|other",
    "confidence": 0.0-1.0,
    "entities": {
        "order_id": "ORD-XXXXX or null",
        "product_name": "string or null",
        "issue_type": "string or null"
    },
    "language": "en|he",
    "sentiment": "positive|neutral|negative|angry",
    "summary": "One-sentence summary of the request"
}

Rules:
- If message mentions an order number, ALWAYS classify as "order"
- If message expresses frustration/anger, classify as "complaint"
- If unsure between categories, use lower confidence (< 0.7)
- Detect language: Hebrew = "he", English = "en"
- Extract ALL entities you can find"""

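def route_message(message: str) -> dict:
    """Route a customer message to the appropriate specialist."""
    response = client.messages.create(
        # Model ID as given in this chapter; confirm it against the current model list
        model="claude-haiku-4-20250314",
        max_tokens=300,
        system=ROUTER_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": message}]
    )

    try:
        result = json.loads(response.content[0].text)
    except (json.JSONDecodeError, IndexError, AttributeError):
        # Not valid JSON: fall back to the general agent instead of crashing
        return {
            "category": "other",
            "confidence": 0.0,
            "entities": {},
            "routing_note": "Unparseable router output, routing to general agent"
        }

    # Apply confidence threshold -- route to fallback if unsure
    if result.get("confidence", 0.0) < 0.7:
        result["category"] = "other"
        result["routing_note"] = "Low confidence, routing to general agent"

    return result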

# Test examples
test_messages = [
    "Where is my order ORD-12345?",            # -> order
    "מה מדיניות ההחזרות שלכם?",                  # -> faq (Hebrew)
    "This is the THIRD time I'm contacting you about this broken product!",  # -> complaint
    "Can you tell me about the Pro Headphones?", # -> faq
    "I want to speak to a manager NOW",          # -> complaint
]

for msg in test_messages:
    result = route_message(msg)
    print(f"Message: {msg[:50]}...")
    print(f"  -> {result['category']} (confidence: {result['confidence']})")
    print(f"     Entities: {result['entities']}")
    print()

// TypeScript -- Router Agent
import Anthropic from "@anthropic-ai/sdk";

// ROUTER_SYSTEM_PROMPT is assumed to hold the same prompt text as in the Python version above
const anthropicClient = new Anthropic();

interface RouteResult {
  category: "faq" | "order" | "complaint" | "other";
  confidence: number;
  entities: {
    order_id: string | null;
    product_name: string | null;
    issue_type: string | null;
  };
  language: "en" | "he";
  sentiment: "positive" | "neutral" | "negative" | "angry";
  summary: string;
}

async function routeMessage(message: string): Promise<RouteResult> {
  const response = await anthropicClient.messages.create({
    // Model ID as given in this chapter; confirm it against the current model list
    model: "claude-haiku-4-20250314",
    max_tokens: 300,
    system: ROUTER_SYSTEM_PROMPT,
    messages: [{ role: "user", content: message }],
  });

  const result: RouteResult = JSON.parse(
    response.content[0].type === "text" ? response.content[0].text : ""
  );

  // Confidence threshold
  if (result.confidence < 0.7) {
    result.category = "other";
  }

  return result;
}
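
Since the router returns data rather than prose, it is worth validating that data before acting on it. The sketch below checks the category and confidence fields and applies the 0.7 threshold from the principles table. The function name `validate_route` and the wording of the `routing_note` values are illustrative assumptions.

```python
VALID_CATEGORIES = {"faq", "order", "complaint", "other"}

def validate_route(raw: dict, threshold: float = 0.7) -> dict:
    """Return a safe routing decision, falling back to "other" when unsure."""
    category = raw.get("category")
    confidence = raw.get("confidence")

    category_ok = isinstance(category, str) and category in VALID_CATEGORIES
    confidence_ok = (
        isinstance(confidence, (int, float))
        and not isinstance(confidence, bool)
        and 0.0 <= confidence <= 1.0
    )

    if not (category_ok and confidence_ok):
        # Malformed output: treat it as zero confidence
        return {**raw, "category": "other", "confidence": 0.0,
                "routing_note": "Invalid router output"}
    if confidence < threshold:
        return {**raw, "category": "other",
                "routing_note": "Low confidence"}
    return raw
```

Returning a new dictionary instead of editing `raw` in place keeps the original model output available for logging.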

Framework: Routing Decision Matrix

Use this matrix to define routing rules:

Signal	Category	Priority
Order number mentioned (ORD-XXXXX)	order	HIGH -- always wins
Expressions of anger ("terrible", "נמאס" ("fed up"), "unacceptable")	complaint	HIGH -- even if there is an FAQ question inside the complaint
Request for a human agent ("speak to human", "תעביר למנהל" ("transfer me to a manager"))	complaint -> escalation	IMMEDIATE -- do not try to solve, hand off
"how to" / "what is" / "מה" ("what") / "איך" ("how") question	faq	NORMAL
No clear signals	other (fallback)	LOW -- the general agent will handle it

Iron rule: when in doubt, route upward (complaint > order > faq). A customer sent to the complaint agent by mistake still gets good service. A customer with a complaint who is sent to the FAQ agent only gets more frustrated.
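
The matrix can also run as a cheap rule-based pre-check before (or next to) the LLM router. The sketch below is one possible reading of it, under stated assumptions: human requests are checked first because they are IMMEDIATE, an order number is checked before anger keywords (swap those two checks if you prefer the "complaint > order" rule), order IDs are assumed to be `ORD-` plus five digits, and the keyword lists are tiny samples. Plain substring matching is crude, especially for short Hebrew words.

```python
import re

# Assumed order ID format: "ORD-" followed by exactly five digits.
ORDER_ID = re.compile(r"\bORD-\d{5}\b", re.IGNORECASE)

# Small sample keyword lists; extend them for real traffic.
HUMAN_REQUEST = ("speak to human", "speak to a human",
                 "speak to a manager", "תעביר למנהל")
ANGER = ("terrible", "unacceptable", "נמאס")
FAQ_STARTERS = ("how to", "what is", "מה", "איך")

def rule_based_route(message: str) -> dict:
    """Apply the routing matrix with simple keyword and regex checks."""
    text = message.lower()
    if any(phrase in text for phrase in HUMAN_REQUEST):
        return {"category": "complaint", "priority": "IMMEDIATE", "escalate": True}
    if ORDER_ID.search(message):
        return {"category": "order", "priority": "HIGH", "escalate": False}
    if any(word in text for word in ANGER):
        return {"category": "complaint", "priority": "HIGH", "escalate": False}
    if any(phrase in text for phrase in FAQ_STARTERS):
        return {"category": "faq", "priority": "NORMAL", "escalate": False}
    return {"category": "other", "priority": "LOW", "escalate": False}
```

Rules like these are fast and predictable, but they miss paraphrases, so they work best as a first pass with the LLM router handling everything they label "other".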

Do it now -- 10 minutes

Test your Router Agent with 20 varied messages: 5 FAQ (Hebrew and English), 5 orders (with different order numbers), 5 complaints (different levels of anger), 5 edge cases (mixed messages, ambiguous ones, other languages). Measure: how many were classified correctly? Target: 90%+. If lower, improve the system prompt.
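
To measure the 90% target, collect (expected, predicted) category pairs from your 20 test messages and compute overall and per-category accuracy. The sketch below shows one way to do it; the function name and the result layout are illustrative assumptions.

```python
from collections import defaultdict

def routing_accuracy(results: list[tuple[str, str]]) -> dict:
    """Compute accuracy from (expected_category, predicted_category) pairs."""
    per_category = defaultdict(lambda: {"correct": 0, "total": 0})
    for expected, predicted in results:
        per_category[expected]["total"] += 1
        if expected == predicted:
            per_category[expected]["correct"] += 1

    total = len(results)
    correct = sum(c["correct"] for c in per_category.values())
    return {
        "overall": correct / total if total else 0.0,
        "per_category": {
            category: c["correct"] / c["total"]
            for category, c in per_category.items()
        },
    }

sample = [("faq", "faq"), ("faq", "other"),
          ("order", "order"), ("complaint", "complaint")]
print(routing_accuracy(sample))
```

The per-category numbers show where to focus: a low score on complaints calls for different prompt changes than a low score on orders.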

Specialist Agents -- three specialists

intermediate | 60 minutes | practice

Each specialist agent is an independent agent with its own system prompt, tailored tools, specific guardrails, and a personality that fits the role.

FAQ Agent -- answers from the knowledge base

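# Python -- FAQ Agent
import json

import anthropic

# Assumes the Step 4 retrieval pipeline was saved in tools/knowledge_base.py
from tools.knowledge_base import retrieve_and_rerank
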
FAQ_SYSTEM_PROMPT = """You are a friendly, knowledgeable customer support agent
specializing in answering product and policy questions.

CRITICAL RULES:
1. ONLY answer based on information from the knowledge base search results
2. If the knowledge base doesn't contain the answer, say "I don't have specific
   information about that. Let me connect you with a team member who can help."
3. ALWAYS cite your source: "According to our [policy/FAQ/product docs]..."
4. Respond in the SAME LANGUAGE the customer used (Hebrew or English)
5. Be concise but complete. Don't over-explain simple questions
6. If the customer seems frustrated, acknowledge it before answering

NEVER:
- Make up information not in the knowledge base
- Promise things you can't verify (specific dates, prices that might change)
- Share internal policies or system details
- Provide legal, medical, or financial advice"""

class FAQAgent:
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.tools = [
            {
                "name": "search_knowledge_base",
                "description": "Search the company knowledge base for answers to "
                               "customer questions. Use this for EVERY question.",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "The search query based on customer question"
                        },
                        "category": {
                            "type": "string",
                            "enum": ["shipping", "returns", "products", "payments",
                                     "technical", "general"],
                            "description": "Optional: filter by category"
                        }
                    },
                    "required": ["query"]
                }
            },
            {
                "name": "get_product_info",
                "description": "Get detailed information about a specific product.",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "product_name": {
                            "type": "string",
                            "description": "Name or ID of the product"
                        }
                    },
                    "required": ["product_name"]
                }
            }
        ]

    def handle_tool_call(self, tool_name: str, tool_input: dict) -> str:
        if tool_name == "search_knowledge_base":
            results = retrieve_and_rerank(
                tool_input["query"],
                n_results=3
            )
            if not results:
                return "No relevant information found in the knowledge base."
            return json.dumps(results, ensure_ascii=False)

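        elif tool_name == "get_product_info":
            # Mock product lookup
            return json.dumps({
                "name": tool_input["product_name"],
                "price": "$99.99",
                "in_stock": True,
                "description": "High-quality product with 1-year warranty"
            })

        # Unknown tool name: return an error string rather than None
        return json.dumps({"error": f"Unknown tool: {tool_name}"})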

    def respond(self, conversation: list[dict], max_turns: int = 5) -> str:
        """Generate a response using the agent loop."""
        messages = conversation.copy()

        for _ in range(max_turns):  # cap tool-use rounds so the loop always ends
            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1000,
                system=FAQ_SYSTEM_PROMPT,
                tools=self.tools,
                messages=messages
            )

            # If the model wants to use tools, answer every tool_use block
            # in the turn (the model may request more than one at once)
            if response.stop_reason == "tool_use":
                tool_results = [
                    {
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": self.handle_tool_call(block.name, block.input)
                    }
                    for block in response.content
                    if block.type == "tool_use"
                ]
                messages.append({
                    "role": "assistant", "content": response.content
                })
                messages.append({"role": "user", "content": tool_results})
                continue

            # Model returned text -- extract and return
            return "".join(
                b.text for b in response.content if b.type == "text"
            )

        # Tool-use limit reached without a final answer
        return ("I apologize, but I'm having trouble processing your request. "
                "Let me connect you with a team member.")

Order Agent -- order information

# Python -- Order Agent
import json

import anthropic

ORDER_SYSTEM_PROMPT = """You are a customer support agent specializing in order inquiries.
You help customers check order status, track shipments, and initiate returns.

CRITICAL RULES:
1. ALWAYS verify the customer provided an order ID before looking up order info
2. If no order ID given, ask: "Could you please provide your order number?
   It starts with ORD-"
3. For returns: check the order date -- returns allowed within 30 days only
4. For refunds: you can INITIATE the return process but CANNOT process the
   actual refund (that requires human approval)
5. Respond in the same language as the customer

SECURITY: Never reveal internal order notes, supplier information, or cost prices."""

class OrderAgent:
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.tools = [
            {
                "name": "get_order_status",
                "description": "Look up the current status of an order.",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "order_id": {
                            "type": "string",
                            "description": "Order ID in format ORD-XXXXX"
                        }
                    },
                    "required": ["order_id"]
                }
            },
            {
                "name": "track_shipment",
                "description": "Get real-time shipment tracking information.",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "order_id": {
                            "type": "string",
                            "description": "Order ID to track"
                        }
                    },
                    "required": ["order_id"]
                }
            },
            {
                "name": "initiate_return",
                "description": "Start a return process for an order. Does NOT process "
                               "the refund -- just creates the return request.",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "order_id": {"type": "string"},
                        "reason": {
                            "type": "string",
                            "enum": ["defective", "wrong_item", "changed_mind",
                                     "not_as_described", "other"]
                        },
                        "description": {
                            "type": "string",
                            "description": "Customer description of the issue"
                        }
                    },
                    "required": ["order_id", "reason"]
                }
            }
        ]

    def handle_tool_call(self, tool_name: str, tool_input: dict) -> str:
        if tool_name == "get_order_status":
            # Mock order lookup (production: query your database)
            return json.dumps({
                "order_id": tool_input["order_id"],
                "status": "shipped",
                "items": [
                    {"name": "Wireless Headphones", "qty": 1, "price": "$79.99"}
                ],
                "order_date": "2026-03-18",
                "estimated_delivery": "2026-03-27",
                "shipping_address": "Tel Aviv, Israel"
            })
        elif tool_name == "track_shipment":
            return json.dumps({
                "order_id": tool_input["order_id"],
                "carrier": "Israel Post",
                "tracking_number": "RR123456789IL",
                "status": "in_transit",
                "last_update": "2026-03-23 14:30",
                "location": "Sorting facility, Modiin",
                "estimated_delivery": "2026-03-27"
            })
        elif tool_name == "initiate_return":
            return json.dumps({
                "return_id": "RET-98765",
                "order_id": tool_input["order_id"],
                "status": "pending_approval",
                "message": "Return request created. A team member will review "
                          "and approve within 24 hours."
            })

        # Unknown tool name: return an error string rather than None
        return json.dumps({"error": f"Unknown tool: {tool_name}"})

    def respond(self, conversation: list[dict]) -> str:
        """Same agent loop pattern as FAQ agent."""
        messages = conversation.copy()
        for _ in range(5):  # max 5 tool calls
            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1000,
                system=ORDER_SYSTEM_PROMPT,
                tools=self.tools,
                messages=messages
            )
            if response.stop_reason == "tool_use":
                # Answer every tool_use block in the turn, not only the first one
                tool_results = [
                    {"type": "tool_result",
                     "tool_use_id": block.id,
                     "content": self.handle_tool_call(block.name, block.input)}
                    for block in response.content
                    if block.type == "tool_use"
                ]
                messages.append({
                    "role": "assistant", "content": response.content
                })
                messages.append({"role": "user", "content": tool_results})
                continue
            return "".join(
                b.text for b in response.content if b.type == "text"
            )
        return ("I apologize, but I'm having trouble processing your request. "
                "Let me connect you with a team member.")

Complaint Agent -- empathetic complaint handling

# Python -- Complaint Agent
import anthropic

COMPLAINT_SYSTEM_PROMPT = """You are a senior customer support specialist who handles
complaints and unhappy customers. You are empathetic, patient, and solution-oriented.

YOUR PERSONALITY:
- Lead with empathy: "I completely understand your frustration..."
- Take responsibility: "I'm sorry for this experience" (never blame the customer)
- Offer concrete solutions, don't just apologize
- Be calm even if the customer is angry

ESCALATION RULES:
1. If customer explicitly asks for a human: IMMEDIATELY escalate (don't try to solve)
2. If customer is extremely angry (cursing, threats): escalate with "high_priority"
3. If issue requires a policy exception: escalate -- you can't make exceptions
4. If you can't resolve after 3 attempts: escalate

TOOLS:
- log_complaint: Always log the complaint first, then try to resolve
- offer_discount: You can offer up to 15% discount or free shipping as goodwill
- escalate_to_human: Use when escalation rules apply

RESPONSE STYLE:
- Acknowledge the emotion first, then address the issue
- Use the customer's name if available
- End with a clear next step
- Respond in the customer's language"""

class ComplaintAgent:
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.tools = [
            {
                "name": "log_complaint",
                "description": "Log a customer complaint in the system. "
                               "ALWAYS call this first for any complaint.",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "customer_message": {"type": "string"},
                        "category": {
                            "type": "string",
                            "enum": ["product_quality", "shipping_delay",
                                     "wrong_item", "billing", "service",
                                     "website", "other"]
                        },
                        "severity": {
                            "type": "string",
                            "enum": ["low", "medium", "high", "critical"]
                        }
                    },
                    "required": ["customer_message", "category", "severity"]
                }
            },
            {
                "name": "offer_discount",
                "description": "Offer a goodwill discount or benefit to the customer.",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "type": {
                            "type": "string",
                            "enum": ["percentage_discount", "free_shipping",
                                     "store_credit"]
                        },
                        "value": {
                            "type": "string",
                            "description": "e.g. '15%' or '$10'. Max 15% or $20."
                        },
                        "reason": {"type": "string"}
                    },
                    "required": ["type", "value", "reason"]
                }
            },
            {
                "name": "escalate_to_human",
                "description": "Escalate the conversation to a human agent. "
                               "Use when: customer asks for human, issue is complex, "
                               "or you can't resolve after 3 attempts.",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "reason": {"type": "string"},
                        "priority": {
                            "type": "string",
                            "enum": ["normal", "high", "urgent"]
                        },
                        "summary": {
                            "type": "string",
                            "description": "Summary of the issue for the human agent"
                        }
                    },
                    "required": ["reason", "priority", "summary"]
                }
            }
        ]

    def handle_tool_call(self, tool_name: str, tool_input: dict) -> str:
        if tool_name == "log_complaint":
            complaint_id = "CMP-" + str(abs(hash(
                tool_input["customer_message"]
            )) % 100000).zfill(5)
            return json.dumps({
                "complaint_id": complaint_id,
                "status": "logged",
                "message": f"Complaint {complaint_id} logged successfully"
            })
        elif tool_name == "offer_discount":
            return json.dumps({
                "offer_id": "OFF-" + str(abs(hash(
                    tool_input["value"]
                )) % 10000),
                "status": "offered",
                "coupon_code": "SORRY15",
                "valid_until": "2026-04-25"
            })
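        elif tool_name == "escalate_to_human":
            return json.dumps({
                "ticket_id": "TKT-" + str(abs(hash(
                    tool_input["summary"]
                )) % 100000),
                "status": "escalated",
                "priority": tool_input["priority"],
                "estimated_response": (
                    "15 minutes" if tool_input["priority"] == "urgent"
                    else "2 hours"
                ),
                "message": "Conversation has been escalated to a human agent."
            })
        # Unknown tool: return an error payload rather than None, so the
        # tool_result sent back to the model is always a string.
        return json.dumps({"error": f"Unknown tool: {tool_name}"})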

    def respond(self, conversation: list[dict]) -> str:
        messages = conversation.copy()
        for _ in range(6):  # complaints may need more tool calls
            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1200,
                system=COMPLAINT_SYSTEM_PROMPT,
                tools=self.tools,
                messages=messages
            )
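            if response.stop_reason == "tool_use":
                # Answer every tool_use block in this turn: the API rejects
                # a follow-up that leaves any tool call without a tool_result.
                tool_results = [
                    {"type": "tool_result",
                     "tool_use_id": block.id,
                     "content": self.handle_tool_call(block.name, block.input)}
                    for block in response.content
                    if block.type == "tool_use"
                ]
                messages.append({
                    "role": "assistant", "content": response.content
                })
                messages.append({"role": "user", "content": tool_results})
                continue
            # Join all text blocks; an empty reply falls through to the
            # fallback message below instead of raising StopIteration.
            text = "".join(
                b.text for b in response.content if b.type == "text"
            )
            if text:
                return text
            break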
        return ("I sincerely apologize for the difficulty. "
                "Let me connect you with a senior team member right away.")

Mandatory Guardrails for Every Specialist Agent

Without guardrails, the specialist agents can: (1) invent information that is not in the knowledge base, (2) promise things they are not authorized to promise, (3) expose internal information, (4) perform actions without verifying the customer's identity. Integrate the guardrails from Chapter 14: input guardrails (PII filtering, injection detection), execution guardrails (tool whitelist, budget limit), and output guardrails (hallucination check, PII detection).
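
One execution guardrail can be sketched directly against the `offer_discount` tool: its description caps goodwill at 15% or $20, but the mock handler accepts whatever value the model sends. The sketch below assumes the value arrives as a string such as '15%' or '$10'. The function name `validate_discount_offer` and the two cap constants are illustrative and not part of the chapter's code.

```python
import re

# Caps taken from the offer_discount tool description ("Max 15% or $20").
MAX_PERCENT = 15.0
MAX_DOLLARS = 20.0

def validate_discount_offer(offer_type: str, value: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed goodwill offer.

    Hypothetical helper: call it inside handle_tool_call before
    confirming an offer, and return the reason to the model on refusal.
    """
    if offer_type == "free_shipping":
        return True, "free shipping has no monetary cap"

    match = re.fullmatch(r"\s*(\$?)(\d+(?:\.\d+)?)\s*(%?)\s*", value)
    if not match:
        return False, f"could not parse value {value!r}"

    dollar_sign, number, percent_sign = match.groups()
    amount = float(number)

    if percent_sign and not dollar_sign:
        if amount <= MAX_PERCENT:
            return True, "within percentage cap"
        return False, f"{amount:g}% exceeds the {MAX_PERCENT:g}% cap"

    if dollar_sign and not percent_sign:
        if amount <= MAX_DOLLARS:
            return True, "within dollar cap"
        return False, f"${amount:g} exceeds the ${MAX_DOLLARS:g} cap"

    # No sign, or both signs: refuse rather than guess the unit.
    return False, f"ambiguous value {value!r}: expected '15%' or '$10'"
```

Enforcing the cap in code rather than only in the prompt means a persuasive customer cannot talk the model past the limit.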

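Do It Now -- 15 minutes

Test each specialist agent separately with 5 sample conversations. For each agent:

FAQ Agent: ask a question whose answer is in the KB, a question whose answer is not (the agent should decline), and a question in Hebrew
Order Agent: ask for an order status, ask for tracking, ask for a return
Complaint Agent: send a mild complaint, an angry complaint, and a request for a human agent

What to check: Does the agent answer from the KB (without inventing)? Does it respect the guardrails? Does it reply in the right language?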

Human Escalation -- Handing Off to a Human Agent

intermediate · 30 minutes · practice

Smooth escalation is what separates a good AI agent from a bad one. A customer who asks for a human and gets stuck in a loop of "I can help with that" has a terrible experience. Escalation must be immediate, seamless, and carry the full context.

When to Escalate

Trigger	Action	Priority
Customer explicitly asks for a human agent	Escalate immediately, with no further attempt	NORMAL
Agent fails to resolve after 3 attempts	Escalate with a summary of what was tried	NORMAL
Customer is very angry / uses abusive language	Escalate immediately with an "angry customer" flag	HIGH
Request requires a policy exception	Escalate, noting that the request is out of policy	NORMAL
Technical problem / bug / failed action	Escalate with the error log	HIGH
Sensitive topic: incorrect billing, security, privacy	Escalate immediately; do not try to handle it	URGENT
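
The trigger table can be encoded as data so that every agent assigns priority the same way. This is a minimal sketch: the trigger keys are hypothetical names for the six rows, and only the priorities come from the table.

```python
# Priority per escalation trigger, mirroring the table above.
# The key names are illustrative; the chapter does not define them.
ESCALATION_PRIORITY = {
    "human_requested": "normal",
    "unresolved_after_3_attempts": "normal",
    "angry_customer": "high",
    "policy_exception": "normal",
    "technical_failure": "high",
    "sensitive_topic": "urgent",
}

_RANK = {"normal": 0, "high": 1, "urgent": 2}

def escalation_priority(triggers: list[str]) -> str:
    """Return the highest priority among the triggers that fired.

    Unknown trigger names raise KeyError so typos surface early.
    """
    if not triggers:
        raise ValueError("at least one trigger is required to escalate")
    priorities = [ESCALATION_PRIORITY[t] for t in triggers]
    return max(priorities, key=_RANK.__getitem__)
```

When several triggers fire at once (an angry customer raising a billing issue, say), the ticket takes the most urgent of them.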

# Python -- Escalation Handler
from datetime import datetime

import anthropic

class EscalationHandler:
    """Handles smooth handoff from AI agent to human agent."""

    def create_escalation(self, conversation: list[dict],
                          reason: str, priority: str,
                          agent_type: str) -> dict:
        """
        Create an escalation ticket with full context.
        Returns the ticket for the human agent queue.
        """
        # Step 1: Summarize the conversation for the human agent
        summary = self._summarize_conversation(conversation)

        # Step 2: Extract customer info
        customer_info = self._extract_customer_info(conversation)

        # Step 3: Create ticket
        ticket = {
            "ticket_id": f"TKT-{datetime.now().strftime('%Y%m%d%H%M%S')}",
            "created_at": datetime.now().isoformat(),
            "priority": priority,
            "status": "open",
            "source_agent": agent_type,
            "escalation_reason": reason,
            "customer_info": customer_info,
            "conversation_summary": summary,
            "full_conversation": conversation,
            "ai_resolution_attempted": True,
            "message_count": len([m for m in conversation
                                  if m["role"] == "user"]),
        }

        # Step 4: Add to human agent queue
        self._add_to_queue(ticket)

        # Step 5: Notify human agent (Slack, email, etc.)
        self._notify_agent(ticket)

        return ticket

    def _summarize_conversation(self, conversation: list[dict]) -> str:
        """Use LLM to create a concise summary for the human agent."""
        summarizer = anthropic.Anthropic()
        conv_text = "\n".join(
            f"{m['role']}: {m['content']}"
            for m in conversation
            if isinstance(m.get("content"), str)
        )

        response = summarizer.messages.create(
            model="claude-3-5-haiku-20241022",  # small, cheap model; confirm the ID against the current model list
            max_tokens=300,
            messages=[{
                "role": "user",
                "content": f"""Summarize this customer support conversation for
a human agent who will take over. Include:
1. What the customer wants
2. What was tried by the AI agent
3. Why escalation was needed
4. Customer's emotional state

Conversation:
{conv_text}"""
            }]
        )
        return response.content[0].text

    def _extract_customer_info(self, conversation: list[dict]) -> dict:
        """Return customer details for the ticket.

        Placeholder: these values are hard-coded. Replace them with real
        extraction (regex, router output, or an LLM pass) before relying
        on the ticket's language or sentiment fields.
        """
        return {
            "name": None,
            "order_id": None,
            "language": "he",
            "sentiment": "negative"
        }

    def _add_to_queue(self, ticket: dict):
        """Add ticket to human agent queue (mock implementation)."""
        print(f"[QUEUE] Added ticket {ticket['ticket_id']} "
              f"(priority: {ticket['priority']})")

    def _notify_agent(self, ticket: dict):
        """Notify available human agent about new escalation."""
        # In production: send Slack message, push notification, etc.
        print(f"[NOTIFY] New {ticket['priority']} ticket: "
              f"{ticket['ticket_id']}")

    def get_handoff_message(self, ticket: dict, language: str) -> str:
        """Generate the handoff message for the customer."""
        if language == "he":
            return (
                f"אני מעביר אותך לנציג צוות שיוכל לעזור עם זה. "
                f"מספר הפנייה שלך הוא {ticket['ticket_id']}. "
                f"הנציג יראה את כל מה שדיברנו עליו, "
                f"אז לא תצטרך/י לחזור על הכל. "
                f"זמן המתנה משוער: "
                f"{'כמה דקות' if ticket['priority'] == 'urgent' else 'עד שעתיים'}. "
                f"תודה על הסבלנות."
            )
        else:
            return (
                f"I'm connecting you with a team member who can help. "
                f"Your ticket number is {ticket['ticket_id']}. "
                f"They'll have the full context of our conversation, "
                f"so you won't need to repeat anything. "
                f"Estimated wait time: "
                f"{'a few minutes' if ticket['priority'] == 'urgent' else 'up to 2 hours'}. "
                f"Thank you for your patience."
            )
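
`_extract_customer_info` returns fixed values, so every ticket is labeled Hebrew and negative. One way to fill in part of it without an extra LLM call is sketched below. It assumes order IDs follow the `ORD-12345` pattern used in this chapter's examples, and that any Hebrew letter in a customer message means the customer writes in Hebrew. Name and sentiment are left unset here, since the router or an LLM pass is better placed to supply them.

```python
import re

ORDER_ID_RE = re.compile(r"\bORD-\d+\b")    # assumed order ID format
HEBREW_RE = re.compile(r"[\u0590-\u05FF]")  # Hebrew Unicode block

def extract_customer_info(conversation: list[dict]) -> dict:
    """Pull an order ID and a language code out of the customer's messages."""
    # Only plain-text user turns; tool_result turns carry lists, not strings.
    user_texts = [
        m["content"] for m in conversation
        if m.get("role") == "user" and isinstance(m.get("content"), str)
    ]
    joined = "\n".join(user_texts)

    order_match = ORDER_ID_RE.search(joined)
    return {
        "name": None,       # not attempted in this sketch
        "order_id": order_match.group(0) if order_match else None,
        "language": "he" if HEBREW_RE.search(joined) else "en",
        "sentiment": None,  # better taken from the router's output
    }
```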

Do It Now -- 10 minutes

Test the escalation flow:

Send the Complaint Agent the message: "I want to speak to a human NOW"
Verify that escalation happens immediately (with no attempt to resolve)
Check that the created ticket contains an accurate summary, the correct priority, and the full context
Check that the handoff message to the customer is clear and reassuring

Conversation Memory and Context

intermediate · 25 minutes · practice

A support agent that does not remember the customer contacted you yesterday about the same problem is a frustrating experience. Good memory has three layers:

Layer	What	How
Within-session	Everything said in the current conversation	Simply keep the messages array
Cross-session	Previous conversations with the same customer	Vector store for semantic search + DB for metadata
Customer profile	Customer profile: orders, preferences, history	Structured DB (PostgreSQL/SQLite)

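# Python -- Customer Memory System
import sqlite3
from datetime import datetime
from pathlib import Path

class CustomerMemory:
    """Manages customer context across sessions."""

    def __init__(self, db_path: str = "./data/customers.db"):
        # sqlite3 will not create missing parent directories
        Path(db_path).parent.mkdir(parents=True, exist_ok=True)
        self.conn = sqlite3.connect(db_path)
        self._init_db()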

    def _init_db(self):
        self.conn.executescript("""
            CREATE TABLE IF NOT EXISTS customers (
                customer_id TEXT PRIMARY KEY,
                name TEXT,
                email TEXT,
                language TEXT DEFAULT 'he',
                created_at TEXT,
                total_orders INTEGER DEFAULT 0,
                total_spent REAL DEFAULT 0,
                satisfaction_avg REAL DEFAULT 0,
                notes TEXT DEFAULT ''
            );

            CREATE TABLE IF NOT EXISTS conversations (
                conversation_id TEXT PRIMARY KEY,
                customer_id TEXT,
                started_at TEXT,
                ended_at TEXT,
                category TEXT,
                resolved BOOLEAN DEFAULT 0,
                escalated BOOLEAN DEFAULT 0,
                satisfaction_score INTEGER,
                summary TEXT,
                FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
            );

            CREATE TABLE IF NOT EXISTS conversation_messages (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                conversation_id TEXT,
                role TEXT,
                content TEXT,
                timestamp TEXT,
                FOREIGN KEY (conversation_id)
                    REFERENCES conversations(conversation_id)
            );
        """)

    def get_customer_context(self, customer_id: str) -> dict:
        """Build full context for a returning customer."""

        # Get customer profile
        row = self.conn.execute(
            "SELECT * FROM customers WHERE customer_id = ?",
            (customer_id,)
        ).fetchone()

        if not row:
            return {"is_new_customer": True}

        profile = {
            "customer_id": row[0], "name": row[1], "email": row[2],
            "language": row[3], "total_orders": row[5],
            "total_spent": row[6], "satisfaction_avg": row[7]
        }

        # Get recent conversations (last 5)
        conversations = self.conn.execute("""
            SELECT conversation_id, started_at, category,
                   resolved, escalated, summary
            FROM conversations
            WHERE customer_id = ?
            ORDER BY started_at DESC LIMIT 5
        """, (customer_id,)).fetchall()

        recent_history = [
            {
                "date": c[1], "category": c[2],
                "resolved": bool(c[3]), "escalated": bool(c[4]),
                "summary": c[5]
            }
            for c in conversations
        ]

        return {
            "is_new_customer": False,
            "profile": profile,
            "recent_conversations": recent_history,
            "personalization_prompt": self._build_personalization(
                profile, recent_history
            )
        }

    def _build_personalization(self, profile: dict,
                                history: list[dict]) -> str:
        """Build a personalization snippet for the system prompt."""
        parts = []

        if profile.get("name"):
            parts.append(f"Customer name: {profile['name']}")

        if profile.get("total_orders", 0) > 5:
            parts.append(
                f"Loyal customer ({profile['total_orders']} orders, "
                f"${profile['total_spent']:.0f} total spent). "
                f"Treat with extra care."
            )

        if history:
            last = history[0]
            if last.get("escalated"):
                parts.append(
                    f"WARNING: Last interaction ({last['date']}) was escalated. "
                    f"Summary: {last['summary']}. Be extra attentive."
                )
            if (last.get("category") == "complaint"
                    and not last.get("resolved")):
                parts.append(
                    f"UNRESOLVED complaint from {last['date']}: "
                    f"{last['summary']}"
                )

        if profile.get("language") == "he":
            parts.append("Customer prefers Hebrew. Respond in Hebrew.")

        return "\n".join(parts) if parts else "No prior history."

    def save_conversation(self, customer_id: str,
                          conversation_id: str,
                          messages: list[dict],
                          category: str,
                          resolved: bool,
                          escalated: bool,
                          summary: str):
        """Save a completed conversation."""
        self.conn.execute("""
            INSERT OR REPLACE INTO conversations
            (conversation_id, customer_id, started_at, ended_at,
             category, resolved, escalated, summary)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            conversation_id, customer_id,
            datetime.now().isoformat(), datetime.now().isoformat(),
            category, resolved, escalated, summary
        ))

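        # Re-saving a conversation replaces its messages instead of
        # appending duplicates.
        self.conn.execute(
            "DELETE FROM conversation_messages WHERE conversation_id = ?",
            (conversation_id,)
        )

        for msg in messages: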
            if isinstance(msg.get("content"), str):
                self.conn.execute("""
                    INSERT INTO conversation_messages
                    (conversation_id, role, content, timestamp)
                    VALUES (?, ?, ?, ?)
                """, (
                    conversation_id, msg["role"],
                    msg["content"], datetime.now().isoformat()
                ))

        self.conn.commit()

// TypeScript -- Customer context injection
interface CustomerContext {
  isNewCustomer: boolean;
  profile?: {
    name: string;
    language: "en" | "he";
    totalOrders: number;
    totalSpent: number;
  };
  recentConversations?: {
    date: string;
    category: string;
    resolved: boolean;
    summary: string;
  }[];
  personalizationPrompt?: string; // absent for new customers
}

function injectCustomerContext(
  systemPrompt: string,
  context: CustomerContext
): string {
  if (context.isNewCustomer) {
    return systemPrompt + "\n\nThis is a new customer. Welcome them warmly.";
  }

  return (
    systemPrompt +
    `\n\n--- CUSTOMER CONTEXT ---\n` +
    `${context.personalizationPrompt}\n` +
    `--- END CONTEXT ---`
  );
}

The idea: before the agent starts handling a request, inject the customer context into the system prompt. That way the agent "knows" who the customer is, what their history is, and what to watch out for.

Privacy: What to Store and What Not To

Do store: conversation summaries, request categories, resolution status, language preference, number of orders. Do not store: credit card numbers, passwords, medical information, national ID numbers. Even if the customer sent them in the conversation, strip the PII before saving. In Israel, the Protection of Privacy Law requires data minimization.
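
A first-pass redaction step before `save_conversation` could look like the sketch below. The two patterns are assumptions chosen for illustration: card numbers as 13 to 19 digits with optional spaces or hyphens, and Israeli ID numbers as exactly 9 digits. Regexes like these both over-match (any 9-digit number) and under-match (passwords, medical details), so treat this as a coarse filter in front of the Chapter 14 PII guardrails and not a replacement for them.

```python
import re

# Order matters: long card numbers are replaced before the 9-digit ID pattern runs.
PII_PATTERNS = [
    # 13-19 digits, optionally separated by single spaces or hyphens
    ("[CARD]", re.compile(r"\b\d(?:[ -]?\d){12,18}\b")),
    # Exactly 9 consecutive digits (assumed Israeli ID format)
    ("[ID]", re.compile(r"\b\d{9}\b")),
]

def redact_pii(text: str) -> str:
    """Replace likely card and ID numbers with placeholders."""
    for placeholder, pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Running `redact_pii` over each message's content and over the summary before they are written keeps raw numbers out of both tables.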

Testing and Validation

intermediate · 30 minutes · practice

An agent that has not passed comprehensive testing is an agent you cannot say works. In customer support, an error means a frustrated customer, and a frustrated customer means churn. Testing has to be precise and thorough.

Testing Framework

Framework: Support Agent Testing Dimensions

Dimension	What is measured	Target	How it is measured
Routing Accuracy	Did the router send the message to the right specialist?	> 90%	50 messages with known labels
Answer Correctness	Is the answer correct and grounded in the KB?	> 85%	LLM-as-judge + human review
No Hallucination	Did the agent avoid inventing information?	> 95%	Cross-reference with the KB
Escalation Appropriateness	Did it escalate when needed, and not when not?	> 90%	Review escalation decisions
Tone and Empathy	Is the tone appropriate? Empathetic with complaints?	> 80%	LLM-as-judge for tone
Language Handling	Does it answer in the right language? Is the Hebrew correct?	> 95%	Language detection on the response
Resolution Rate	Share of requests resolved without escalation	> 70%	Track escalation rate
Latency	Time from conversation start to first response	< 3 sec	Measure response time

# Python -- Test Runner
import json
import anthropic

class SupportTestRunner:
    """Run test suite on the support agent."""

    def __init__(self, test_file: str = "./tests/test_conversations.json"):
        # Test cases include Hebrew, so read the file as UTF-8 explicitly
        with open(test_file, encoding="utf-8") as f:
            self.test_cases = json.load(f)
        self.client = anthropic.Anthropic()

    def run_tests(self) -> dict:
        """Run all test cases and compute metrics."""
        results = {
            "total": len(self.test_cases),
            "routing_correct": 0,
            "answer_correct": 0,
            "no_hallucination": 0,
            "tone_appropriate": 0,
            "escalation_correct": 0,
            "language_correct": 0,
            "details": []
        }

        for case in self.test_cases:
            result = self._test_single(case)
            results["details"].append(result)

            if result["routing_correct"]:
                results["routing_correct"] += 1
            if result["answer_correct"]:
                results["answer_correct"] += 1
            if result["no_hallucination"]:
                results["no_hallucination"] += 1
            if result["tone_appropriate"]:
                results["tone_appropriate"] += 1
            if result.get("escalation_correct", True):
                results["escalation_correct"] += 1
            if result["language_correct"]:
                results["language_correct"] += 1

        # Compute percentages
        n = results["total"]
        results["metrics"] = {
            "routing_accuracy": results["routing_correct"] / n * 100,
            "answer_accuracy": results["answer_correct"] / n * 100,
            "hallucination_free": results["no_hallucination"] / n * 100,
            "tone_score": results["tone_appropriate"] / n * 100,
            "escalation_accuracy": results["escalation_correct"] / n * 100,
            "language_accuracy": results["language_correct"] / n * 100,
        }

        return results

    def _test_single(self, case: dict) -> dict:
        """Test one case: routing is checked exactly, the rest by LLM-as-judge."""
        # Step 1: Run the router
        route_result = route_message(case["customer_message"])

        # Step 2: Get agent response
        agent_response = self._get_agent_response(
            case["customer_message"],
            route_result["category"]
        )

        # Step 3: Judge with LLM
        judge_prompt = f"""Judge this customer support interaction.

Customer message: {case["customer_message"]}
Expected answer contains: {case.get("expected_answer_contains", "N/A")}
Agent response: {agent_response}
Expected language: {case.get("expected_language", "en")}
Should hand off to a human: {case.get("should_escalate", False)}

Return only JSON, with no surrounding text:
{{
    "answer_correct": true/false,
    "no_hallucination": true/false,
    "tone_appropriate": true/false,
    "language_correct": true/false,
    "escalation_correct": true/false,
    "explanation": "brief explanation"
}}"""

        judge_response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=300,
            messages=[{"role": "user", "content": judge_prompt}]
        )

        # The judge may wrap the JSON in prose or a code fence, so parse
        # only the outermost {...} span.
        raw = judge_response.content[0].text
        verdict = json.loads(raw[raw.find("{"):raw.rfind("}") + 1])

        # Step 4: Routing has a ground-truth label; compare it directly
        # instead of asking the judge.
        verdict["routing_correct"] = (
            route_result["category"] == case["expected_category"]
        )
        return verdict

    def _get_agent_response(self, message: str, category: str) -> str:
        """Route to appropriate agent and get response."""
        agents = {
            "faq": FAQAgent(),
            "order": OrderAgent(),
            "complaint": ComplaintAgent(),
        }
        agent = agents.get(category, FAQAgent())
        return agent.respond([{"role": "user", "content": message}])


# Sample test cases format
sample_test_cases = [
    {
        "customer_message": "What is your return policy?",
        "expected_category": "faq",
        "expected_answer_contains": "30 days",
        "expected_language": "en",
        "should_escalate": False
    },
    {
        "customer_message": "איפה ההזמנה שלי ORD-12345?",
        "expected_category": "order",
        "expected_answer_contains": "ORD-12345",
        "expected_language": "he",
        "should_escalate": False
    },
    {
        "customer_message": "This is UNACCEPTABLE! "
                           "Third time the product arrived broken!",
        "expected_category": "complaint",
        "expected_answer_contains": "sorry",
        "expected_language": "en",
        "should_escalate": False
    },
    {
        "customer_message": "I want to speak to a manager",
        "expected_category": "complaint",
        "expected_answer_contains": "connect",
        "expected_language": "en",
        "should_escalate": True
    }
]

Do It Now -- 15 minutes

Create a test suite of 20 conversations -- 5 per category (FAQ, Order, Complaint, Edge Cases). For each conversation define: expected_category, expected_answer_contains, expected_language, should_escalate. Run the test runner and check:

Routing accuracy -- target: 90%+ (18/20 correct)
No hallucination -- target: 95%+ (19/20 free of invented content)
Language correct -- target: 95%+ (Hebrew for Hebrew, English for English)

If you miss the targets: improve the prompts, add examples to the KB, and tune the confidence thresholds.
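
Checking the three targets by eye gets tedious once the suite grows. A small helper can do it, assuming the metric names match the keys the test runner puts in `results["metrics"]`. The `TARGETS` dict and the function name are illustrative.

```python
# Targets from the exercise above, as percentages.
TARGETS = {
    "routing_accuracy": 90.0,
    "hallucination_free": 95.0,
    "language_accuracy": 95.0,
}

def failing_metrics(
    metrics: dict[str, float],
    targets: dict[str, float] = TARGETS,
) -> dict[str, tuple[float, float]]:
    """Return {metric: (actual, target)} for every metric below its target.

    A metric missing from `metrics` counts as 0 so it cannot pass silently.
    """
    return {
        name: (metrics.get(name, 0.0), target)
        for name, target in targets.items()
        if metrics.get(name, 0.0) < target
    }
```

An empty result means all targets are met. Otherwise the returned pairs show how far each metric is from its goal, which helps decide whether to work on prompts, the KB, or routing thresholds first.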

Deployment and Integration

intermediate · 35 minutes · practice

An agent that runs in a notebook is not worth much. You need to wrap it in an API, connect it to the channels customers actually use, and add monitoring.

FastAPI Server with Streaming

# Python -- FastAPI server with streaming
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import uuid

app = FastAPI(title="Customer Support Agent API")

# In-memory conversation store (production: use Redis/DB)
conversations: dict[str, list[dict]] = {}

# Initialize components
memory = CustomerMemory()
escalation_handler = EscalationHandler()

class MessageRequest(BaseModel):
    conversation_id: str | None = None
    customer_id: str | None = None
    message: str

class MessageResponse(BaseModel):
    conversation_id: str
    response: str
    category: str
    escalated: bool
    ticket_id: str | None = None

@app.post("/chat", response_model=MessageResponse)
async def chat(request: MessageRequest):
    """Handle a customer support message."""

    # Step 1: Get or create conversation
    conv_id = request.conversation_id or str(uuid.uuid4())
    if conv_id not in conversations:
        conversations[conv_id] = []

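    # Step 2: Load customer context (if returning customer).
    # NOTE: the context is loaded but not yet passed on, because the
    # specialists' respond() takes only the message list. Add it to the
    # system prompt once respond() accepts a per-request prompt.
    customer_context = None
    if request.customer_id:
        customer_context = memory.get_customer_context(
            request.customer_id
        )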

    # Step 3: Route the message
    route_result = route_message(request.message)
    category = route_result["category"]

    # Step 4: Add message to conversation history
    conversations[conv_id].append({
        "role": "user",
        "content": request.message
    })

    # Step 5: Get response from appropriate specialist
    agents = {
        "faq": FAQAgent(),
        "order": OrderAgent(),
        "complaint": ComplaintAgent(),
    }
    agent = agents.get(category, FAQAgent())

    messages = conversations[conv_id].copy()
    response_text = agent.respond(messages)

    # Step 6: Store response
    conversations[conv_id].append({
        "role": "assistant",
        "content": response_text
    })

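    # Step 7: Check if escalation happened.
    # Heuristic: match handoff phrases in the reply, including the
    # agents' fallback text ("Let me connect you with ..."). A structured
    # flag returned by the agent would be more reliable.
    lowered = response_text.lower()
    escalated = ("escalate" in lowered or
                 "connecting you" in lowered or
                 "connect you" in lowered or
                 "מעביר" in response_text)
    ticket_id = None

    if escalated:
        ticket = escalation_handler.create_escalation(
            conversation=conversations[conv_id],
            reason=route_result.get("summary", "Customer request"),
            # Angry customers map to HIGH in the escalation table
            priority=("high"
                      if route_result.get("sentiment") == "angry"
                      else "normal"),
            agent_type=category
        )
        ticket_id = ticket["ticket_id"]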

    return MessageResponse(
        conversation_id=conv_id,
        response=response_text,
        category=category,
        escalated=escalated,
        ticket_id=ticket_id
    )

@app.get("/conversation/{conv_id}")
async def get_conversation(conv_id: str):
    """Get full conversation history."""
    if conv_id not in conversations:
        raise HTTPException(404, "Conversation not found")
    return {"conversation_id": conv_id,
            "messages": conversations[conv_id]}

@app.get("/health")
async def health():
    return {
        "status": "ok",
        "agents": ["router", "faq", "order", "complaint"]
    }

# Run: uvicorn server:app --host 0.0.0.0 --port 8000
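
The `/chat` endpoint returns the whole reply as one JSON body. To stream it, the text has to be framed as Server-Sent Events. The sketch below shows only that framing step and uses nothing beyond the standard library. The function name, the `delta` field, and the `done` event are assumptions, and the commented wiring relies on a hypothetical `stream_specialist` generator that yields text chunks (for instance, deltas read from the Anthropic SDK's streaming helper on the final, non-tool turn).

```python
import json
from collections.abc import Iterable, Iterator

def sse_events(chunks: Iterable[str], conversation_id: str) -> Iterator[str]:
    """Wrap text chunks as Server-Sent Events frames.

    Each frame is a "data:" line followed by a blank line. json.dumps
    escapes newlines inside a chunk, so one chunk is always one frame.
    """
    for chunk in chunks:
        payload = json.dumps(
            {"conversation_id": conversation_id, "delta": chunk},
            ensure_ascii=False,  # keep Hebrew readable on the wire
        )
        yield f"data: {payload}\n\n"
    # Final event so the client knows the reply is complete.
    yield "event: done\ndata: {}\n\n"

# Possible wiring into the server (assumes fastapi is installed):
#
#   from fastapi.responses import StreamingResponse
#
#   @app.post("/chat/stream")
#   def chat_stream(request: MessageRequest):
#       chunks = stream_specialist(request.message)  # hypothetical generator
#       return StreamingResponse(
#           sse_events(chunks, request.conversation_id or "new"),
#           media_type="text/event-stream",
#       )
```

A browser client can read this with `EventSource` or a streaming `fetch`, appending each `delta` to the chat bubble as it arrives.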

// TypeScript -- Express server (equivalent)
import express from "express";
import { v4 as uuidv4 } from "uuid";

const app = express();
app.use(express.json());

const conversations = new Map<string, any[]>();

app.post("/chat", async (req, res) => {
  const { conversation_id, customer_id, message } = req.body;
  const convId = conversation_id || uuidv4();

  if (!conversations.has(convId)) {
    conversations.set(convId, []);
  }

  // Route message
  const routeResult = await routeMessage(message);

  // Add to history
  const conv = conversations.get(convId)!;
  conv.push({ role: "user", content: message });

  // Get specialist response
  const agentResponse = await getSpecialistResponse(
    routeResult.category,
    conv
  );

  conv.push({ role: "assistant", content: agentResponse });

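  // Handoff heuristic mirroring the Python server
  const lowered = agentResponse.toLowerCase();
  const escalated =
    lowered.includes("escalate") ||
    lowered.includes("connecting you") ||
    lowered.includes("connect you") ||
    agentResponse.includes("מעביר");

  res.json({
    conversation_id: convId,
    response: agentResponse,
    category: routeResult.category,
    escalated,
  });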
});

app.listen(8000, () => console.log("Support agent API on :8000"));

Integration Points

Channel	How to connect	Notes
Website Widget	WebSocket / SSE from the FastAPI server	Like Intercom -- a chat bubble in the corner
Slack	Slack Bot API, events subscription	Internal support (employees contacting IT)
WhatsApp Business	WhatsApp Business API (Meta)	Very popular in Israel -- 93% of the population
Email	Webhook from the email provider + API for sending	Async -- no streaming needed
Telegram	Telegram Bot API	Easy to connect, popular among developers

Israeli Context: Hebrew Support and Israeli Patterns

A few things worth knowing when you run a support agent in Israel:

Topic	What to do
Language	The agent must support Hebrew and English. Most customers write in Hebrew. Some mix the two ("Hebrish")
WhatsApp	93% of Israelis use WhatsApp. It is the main communication channel, not email
Shabbat and holidays	Add an auto-reply for Shabbat and holidays: "אנחנו כרגע לא פעילים. נחזור אליכם במוצ"ש / אחרי החג" ("We are currently unavailable. We will get back to you after Shabbat / after the holiday")
Tone	Israelis expect direct, informal communication. "היי, איך אפשר לעזור?" ("Hi, how can I help?") works better than "לקוח יקר" ("Dear customer")
Expectations	Israelis are used to fast, personal service. "I need to check and get back to you" is fine, as long as there is a real follow-up

Do It Now -- 10 minutes

Start the FastAPI server and test the API:

# Start the server
uvicorn server:app --reload --port 8000

# Test with curl
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "מה מדיניות ההחזרות שלכם?"}'

# Test order inquiry
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Where is my order ORD-12345?",
       "conversation_id": "test-1"}'

# Test complaint
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "I want to speak to a manager NOW"}'

Verify that all 3 calls return sensible answers with the correct category.

Continuous Learning and Improvement

intermediate · 20 minutes · concept + practice

Deploy is not the end; it is the beginning. A good support agent improves over time based on feedback, failure analysis, and knowledge base updates.

The Improvement Cycle

Collect Feedback --> Analyze Failures --> Update KB/Prompts --> Run Tests --> Deploy --> Repeat

Stage	What you do	Frequency
Collect	CSAT ratings, thumbs up/down, escalation reasons, human corrections	Every conversation (automatic)
Analyze	Review low-rated conversations. What went wrong? A gap in the KB? A weak prompt? Wrong routing?	Weekly
Update	Add new FAQs, improve system prompts, fix routing rules	Weekly
Test	Run the full test suite. Any regression? Did the metrics improve?	After every change
Deploy	If the tests pass: deploy. If not: go back to Analyze	Weekly
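
The Test and Deploy rows imply comparing each run with the previous one. A minimal regression check over two metrics dicts might look like this. The function name and the one-point default tolerance are assumptions, not values from the chapter.

```python
def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 1.0,
) -> dict[str, float]:
    """Return {metric: drop} for metrics that fell by more than
    `tolerance` percentage points relative to the baseline run.

    A metric missing from the candidate run is treated as 0.
    """
    return {
        name: round(baseline[name] - candidate.get(name, 0.0), 2)
        for name in baseline
        if baseline[name] - candidate.get(name, 0.0) > tolerance
    }
```

An empty dict means the change is safe to deploy on these metrics. Anything else sends the change back to the Analyze stage.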

# Python -- Feedback collection and analysis
import sqlite3
from datetime import datetime

import anthropic

class FeedbackCollector:
    """Collect and analyze customer feedback."""

    def __init__(self, db_path: str = "./data/feedback.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS feedback (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                conversation_id TEXT,
                rating INTEGER,  -- 1-5
                thumbs TEXT,     -- up/down
                comment TEXT,
                category TEXT,
                created_at TEXT
            )
        """)

    def add_feedback(self, conversation_id: str, rating: int,
                     thumbs: str = None, comment: str = None):
        self.conn.execute("""
            INSERT INTO feedback (conversation_id, rating, thumbs,
                                  comment, created_at)
            VALUES (?, ?, ?, ?, ?)
        """, (conversation_id, rating, thumbs, comment,
              datetime.now().isoformat()))
        self.conn.commit()

    def weekly_analysis(self) -> dict:
        """Generate weekly analysis of feedback."""
        rows = self.conn.execute("""
            SELECT rating, thumbs, comment, category
            FROM feedback
            WHERE created_at > datetime('now', '-7 days')
        """).fetchall()

        if not rows:
            return {"message": "No feedback this week"}

        ratings = [r[0] for r in rows if r[0]]
        thumbs_down = [r for r in rows if r[1] == "down"]

        analysis = {
            "total_feedback": len(rows),
            "avg_rating": sum(ratings) / len(ratings) if ratings else 0,
            "thumbs_down_count": len(thumbs_down),
            "thumbs_down_rate": len(thumbs_down) / len(rows) * 100,
            "negative_comments": [
                r[2] for r in thumbs_down if r[2]
            ],
        }

        # Use LLM to analyze negative feedback patterns
        if analysis["negative_comments"]:
            analyzer = anthropic.Anthropic()
            resp = analyzer.messages.create(
                model="claude-haiku-4-20250314",
                max_tokens=500,
                messages=[{
                    "role": "user",
                    "content": (
                        "Analyze these negative customer feedback comments "
                        "and identify the top 3 issues to fix:\n\n"
                        + "\n".join(
                            f"- {c}"
                            for c in analysis["negative_comments"]
                        )
                        + '\n\nReturn JSON: '
                          '[{"issue": "...", "count": N, "fix": "..."}]'
                    )
                }]
            )
            analysis["top_issues"] = json.loads(
                resp.content[0].text
            )

        return analysis

Knowledge Base Updates

The knowledge base should grow over time. There are two sources of updates:

Source	What	Process
Questions the agent could not answer	Questions that led to "I don't have information" or to escalation	Review weekly, write answers, add them to the KB
Business changes	New products, policy changes, price updates	Automatic (webhook from the CMS) or manual
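For the first source, one possible way to prepare the weekly review is to group the unanswered questions and count repeats, so the most frequent gaps get written up first. The sketch below is only an illustration: the function names and the simple text normalization are assumptions, and a real system might cluster by embedding similarity instead.

```python
from collections import Counter


def normalize(question: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing punctuation."""
    return " ".join(question.casefold().split()).rstrip("?!. ")


def top_unanswered(questions: list[str], limit: int = 5) -> list[tuple[str, int]]:
    """Return the most frequent unanswered questions with their counts."""
    counts = Counter(normalize(q) for q in questions if q.strip())
    return counts.most_common(limit)
```

Feeding it the questions that ended in "I don't have information" or in escalation gives a short, ranked list to review each week.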

Framework: Agent Maturity Model

A support agent develops through 4 stages:

Stage	Capabilities	Resolution Rate	Time to reach
MVP	FAQ only, one language, no memory	30-40%	Week 1
Functional	FAQ + Orders + Complaints, 2 languages, basic memory	50-65%	Weeks 2-3
Production	Full system + guardrails + monitoring + testing	65-80%	Weeks 4-6
Optimized	Personalization + learning loop + A/B testing prompts	80-90%	Months 2-3

Do not try to build Optimized from day one. Start with MVP, reach Functional, and only then move to Production. Each stage teaches you what really matters to your customers.

Common mistakes and how to avoid them

beginner | 15 minutes | concept

Mistake 1: The agent invents information (hallucination)

The problem: The agent returns a confident answer that does not exist in the knowledge base. For example, it invents a 60-day return policy when the real policy is 30 days.

The fix: (1) Add a guardrail to the system prompt: "ONLY answer from knowledge base results" (2) Add an output guardrail that cross-references the answer against the KB (3) When the KB has no answer, the agent must say "I don't have that information" and escalate.
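A very simple form of the output guardrail in step (2) could check that every number in the answer also appears in the retrieved KB chunks, which would catch the 60-day versus 30-day example above. This is a rough sketch under that assumption; the function name is hypothetical, and it only covers numeric claims, not invented facts in general.

```python
import re

# Matches integers and simple decimals such as "30" or "4.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(answer: str, kb_chunks: list[str]) -> set[str]:
    """Return numbers that appear in the answer but in none of the KB chunks."""
    supported: set[str] = set()
    for chunk in kb_chunks:
        supported.update(NUMBER.findall(chunk))
    return set(NUMBER.findall(answer)) - supported
```

If the returned set is not empty, the agent could regenerate the answer or fall back to "I don't have that information" and escalate.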

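Mistake 2: The agent refuses to escalate when it should

The problem: A customer asks again and again to speak to a human agent, and the agent keeps trying to solve the issue itself. This is a terrible experience.

The fix: An iron rule: if the customer asks for a person, hand over immediately. No exceptions. Add detection for keywords: "speak to human", "manager", "real person", "נציג" (representative), "מנהל" (manager), "בנאדם" (a person). If one is detected, escalate immediately.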
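The keyword detection described above can be sketched as a plain substring check. The keyword list is the one from the text; the function name is an assumption, and substring matching will produce some false positives (for example, a question that merely mentions a "product manager").

```python
# Keywords from the text above; extend with variants as needed.
HUMAN_REQUEST_KEYWORDS = (
    "speak to human",
    "manager",
    "real person",
    "נציג",
    "מנהל",
    "בנאדם",
)


def wants_human(message: str) -> bool:
    """Return True if the message contains any human-request keyword."""
    text = message.casefold()
    return any(keyword in text for keyword in HUMAN_REQUEST_KEYWORDS)
```

Note that the exact phrase "speak to human" does not match "speak to a human", so in practice the list probably needs a few variants of each phrase.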

Mistake 3: An outdated knowledge base

The problem: Prices, policies, or product details have changed, but the KB still holds the old version. The agent gives wrong information with full confidence.

The fix: (1) Add last_updated to every document (2) Add a warning when a result is older than 90 days (3) Set up a weekly update process (4) Use a webhook from the CMS for automatic updates.
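Steps (1) and (2) might look something like the sketch below. It assumes each KB document is a dict with a `last_updated` field holding an ISO date string; that layout and the function names are illustrative assumptions, not the chapter's actual schema.

```python
from datetime import date, timedelta

MAX_AGE_DAYS = 90  # threshold from the text above


def is_stale(last_updated: date, today: date,
             max_age_days: int = MAX_AGE_DAYS) -> bool:
    """Return True if the document is older than max_age_days."""
    return (today - last_updated) > timedelta(days=max_age_days)


def with_staleness_flag(doc: dict, today: date) -> dict:
    """Return a copy of the document with a boolean 'stale' field added."""
    updated = date.fromisoformat(doc["last_updated"])
    return {**doc, "stale": is_stale(updated, today)}
```

The retrieval step could apply `with_staleness_flag` to each result and have the agent add a caveat, or skip the result, when `stale` is true.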

Mistake 4: Wrong router = wrong specialist = wrong answer

The problem: A complaint is sent to the FAQ Agent instead of the Complaint Agent. The angry customer gets an informational answer instead of an empathetic one, and the frustration doubles.

The fix: (1) An "escalate up" rule: when in doubt, send to complaint (better service > precise routing) (2) Add sentiment analysis to the router (3) Check routing accuracy weekly with 20 new messages.
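The "escalate up" rule combined with sentiment could be expressed as a small post-processing step on the router's output. In this sketch the confidence threshold of 0.7, the sentiment labels, and the function name are all assumptions for illustration.

```python
# Assumed threshold; tune it against your own routing test set.
CONFIDENCE_THRESHOLD = 0.7


def choose_specialist(category: str, confidence: float, sentiment: str) -> str:
    """Apply the 'escalate up' rule: when in doubt, route to complaint."""
    if sentiment == "negative" or confidence < CONFIDENCE_THRESHOLD:
        return "complaint"
    return category
```

The trade-off is deliberate: some neutral, low-confidence messages will reach the Complaint Agent unnecessarily, which the text above accepts in exchange for never giving an angry customer a purely informational reply.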

Work routine: maintaining a support agent

Frequency	Task	Time
Daily	Check the dashboard: resolution rate, escalation rate, error rate. Anything unusual?	3 min
Weekly	Review 10 random conversations plus every conversation with negative feedback. What needs improving?	20 min
Weekly	Check the "unanswered questions": questions the agent could not answer. Add them to the KB	15 min
Weekly	Run the test suite after every update to the KB or prompts. Any regression?	10 min
Monthly	Trend analysis: which topics are gaining momentum? What needs a new FAQ?	30 min
Monthly	A/B test prompt changes: does the new version really improve things?	20 min

If you only do one thing from this chapter (30 minutes)

Build an FAQ Agent with RAG. Just that. Take 20 FAQs from your business (or an imaginary one), put them in ChromaDB, build a search tool, and connect it to a Claude agent with a system prompt that says "ONLY answer from search results." That alone already delivers a lot of value: an agent that answers 60% of common questions, 24/7, within seconds, at $0.02 per conversation. Everything else (router, specialists, escalation) can be added gradually.

Exercises

Exercise 1: Full System Build (90 minutes)

Build the full system from scratch:

Knowledge Base: 30+ FAQs + 5 product docs + 3 policy docs. Chunk, embed, and store in ChromaDB
Router Agent: Haiku-based router with 4 categories. Test with 20 messages. Target: 90%+ accuracy
3 Specialist Agents: FAQ (with RAG), Order (with mock DB), Complaint (with escalation). Test each one separately
Orchestrator: Connect everything: message comes in, router classifies, specialist handles, response goes out
FastAPI: Wrap it in an API endpoint. Test with curl

Deliverable: An API endpoint that receives a customer message and returns an answer from the correct specialist.

Exercise 2: Hebrew-First Support Agent (60 minutes)

Adapt the agent to the Israeli market:

Add 20 FAQs in Hebrew (products, shipping, returns, in an Israeli context)
Make sure the Router recognizes Hebrew and routes correctly
Make sure every Specialist Agent answers in Hebrew when the customer writes in Hebrew
Add a Shabbat auto-reply: if the time is between Friday afternoon and Saturday evening, send an automatic reply
Adjust the tone: direct, warm, not too formal ("היי!" ("Hi!"), not "לקוח יקר" ("Dear customer"))

Check: 10 conversations in Hebrew. Does the agent sound natural? Does it understand Hebrish ("אני צריך refund", "I need a refund")?
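As a starting point for the language checks in this exercise, a rough script-based detector can separate Hebrew, English, and mixed "Hebrish" messages by counting letters. This is only a heuristic sketch: the 20% threshold for "mixed" and the function names are assumptions, and it says nothing about whether the agent actually understood the message.

```python
def letter_counts(text: str) -> tuple[int, int]:
    """Count Hebrew letters (alef to tav) and ASCII Latin letters."""
    hebrew = sum(1 for ch in text if "\u05d0" <= ch <= "\u05ea")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    return hebrew, latin


def detect_language(text: str, mixed_share: float = 0.2) -> str:
    """Return 'he', 'en', 'mixed', or 'unknown' based on letter counts."""
    hebrew, latin = letter_counts(text)
    total = hebrew + latin
    if total == 0:
        return "unknown"
    # Treat the message as mixed when the minority script is substantial.
    if min(hebrew, latin) / total >= mixed_share:
        return "mixed"
    return "he" if hebrew > latin else "en"
```

For a "mixed" result, replying in Hebrew is probably the safer default, since the exercise asks specialists to answer in Hebrew whenever the customer writes in Hebrew.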

Exercise 3: Comprehensive Test Suite (45 minutes)

Build a comprehensive test suite:

Create 100 test conversations (25 per category, including edge cases)
Define expected outputs for each conversation: category, answer contains, language, should_escalate
Run the test runner and check all 8 dimensions (routing, accuracy, hallucination, tone, escalation, language, resolution, latency)
Write a summary report: what works, what is broken, what to improve

Targets: routing > 90%, accuracy > 85%, hallucination-free > 95%, escalation > 90%.
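The targets above can be turned into a simple pass/fail gate for the summary report. The sketch below assumes the test runner produces a dict of percentages keyed by metric name; those key names and the function name are illustrative assumptions.

```python
# Targets from the exercise, in percent. Each metric must be strictly above its target.
TARGETS = {
    "routing": 90.0,
    "accuracy": 85.0,
    "hallucination_free": 95.0,
    "escalation": 90.0,
}


def failed_targets(metrics: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Return {metric: (actual, target)} for every metric that misses its target."""
    failures = {}
    for name, target in TARGETS.items():
        value = metrics.get(name, 0.0)  # a missing metric counts as a failure
        if value <= target:
            failures[name] = (value, target)
    return failures
```

An empty result means all four targets are met; anything else lists exactly which metrics to investigate before deploying.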

Exercise 4: Multi-Channel Integration (60 minutes)

Connect the agent to a real communication channel:

Option A: Slack Bot. Create a Slack app, connect it to your API, send messages to a channel, and receive answers
Option B: Telegram Bot. Create a bot with BotFather, connect a webhook to FastAPI, and chat with the agent in Telegram
Option C: Web Widget. Build a simple chat UI in HTML/JS that connects to the API

For each option: make sure the chat history is saved, the session does not reset, and escalation also works through the channel.

Check yourself: 5 questions

Describe the architecture of the customer support agent: what is the role of the Router Agent? Why use a small model (Haiku) for it? What happens when the confidence is low? (Hint: classification vs conversation)
What is the difference between vector search and reranking in a RAG pipeline? Why do you need both? (Hint: breadth vs precision)
Write 3 iron rules for the Complaint Agent. What is the most serious mistake a complaint agent can make? (Hint: argue, refuse to escalate, promise exceptions)
When must the agent escalate to a human representative? List 4 situations and explain why another attempt to resolve would be worse than escalating. (Hint: explicit request, angry, policy exception, sensitive)
Explain the Agent Maturity Model: what is the difference between MVP and Production? Why should you not try to build "Optimized" from day one? (Hint: data, learning, iteration)

Got 4 out of 5 right? Excellent, you are ready for Chapter 16.

Chapter summary

In this chapter you built a full customer support agent from scratch, the first project in Part 4 of the course. You started with architecture design: Router Agent + 3 Specialist Agents + Escalation Handler. You built a Knowledge Base with RAG: chunking, embedding in ChromaDB, a retrieval pipeline with reranking, and citation. You implemented a Router Agent with Haiku that classifies requests into 4 categories in 200ms and for pennies. You built 3 Specialist Agents: an FAQ Agent that answers from the knowledge base and does not invent, an Order Agent that checks status and processes returns, and a Complaint Agent that handles issues with empathy and escalates when needed, each with its own tools, guardrails, and dedicated system prompt. You implemented a smooth Human Escalation: automatic summary, ticket creation, a handoff message to the customer, and a notification to the human representative. You added Conversation Memory: within-session memory, cross-session memory, and a customer profile with personalization. You built a Testing Suite with 8 dimensions and LLM-as-judge. You wrapped everything in a FastAPI server with a REST API, and learned how to connect it to Slack, WhatsApp, and a web widget. You finished with the Agent Improvement Cycle: feedback collection, weekly analysis, KB updates, and the Agent Maturity Model that shows the path from MVP to Optimized.

The key point: a good support agent is not a smart language model. It is a complete system that combines smart routing, accurate RAG, strict guardrails, smooth escalation, memory, testing, and continuous improvement. Each component on its own is simple. The complexity lies in combining them all into a system that runs 24/7 and serves real customers.

In the next chapter (Chapter 16) you will build a Research and Analysis Agent: an agent that researches topics across multiple sources, analyzes them, and produces comprehensive reports. You will use the fan-out pattern (parallel search), fact-checking, and report generation.

Checklist: Chapter 15 summary

You designed the architecture of a multi-agent support system: Router + Specialists + Escalation
You built a Knowledge Base with 20+ FAQs, chunking, embedding, and a vector store
You implemented a full RAG pipeline: search --> rerank --> filter --> cite
You built a Router Agent with Haiku that classifies at 90%+ accuracy
You built an FAQ Agent that answers only from the KB and does not invent information
You built an Order Agent with tools for status checks, tracking, and returns
You built an empathetic Complaint Agent with clear escalation rules
You implemented Human Escalation: summary, ticket, notification, handoff message
You added Conversation Memory: within-session, cross-session, customer profile
You built a Testing Suite with 8 dimensions and LLM-as-judge
You wrapped it in a FastAPI server with a REST API
You understood the Israeli context: Hebrew support, WhatsApp, Shabbat, Israeli tone
You learned the Agent Maturity Model: MVP --> Functional --> Production --> Optimized
You set up an Improvement Cycle: feedback --> analyze --> update --> test --> deploy
You learned 4 common mistakes: hallucination, refusal to escalate, stale KB, wrong routing