Building
Agentic AI Systems
from Scratch

One Problem, Three Frameworks

LangChain · LangGraph · CrewAI

Instructor: Ruslan Magana Vsevolodovna  |  ruslanmv.com

What You Will Learn

🧠 Agentic AI Theory

What agents are, how ReAct works, and when to use multi-agent systems

🛠️ Three Frameworks

Build the same system in LangChain, LangGraph, and CrewAI, then compare them hands-on

🔧 Production Skills

PII masking, guardrails, testing, evaluation with precision/recall/F1

📊 Make the Decision

Run a framework comparison and pick the right tool for the job

Course Outline

01 Foundations: Spectrum, Memory, MCP/A2A, Orchestration
02 The Problem & Setup
03 Shared Modules: Schema, PII, Fallback, Routing
04 Agent Tools: Evidence over Guessing
05 LangChain: ReAct Agent
06 LangGraph: State Machine
07 CrewAI: Multi-Agent Crew
08 Testing & Quality
09 Evaluation & Metrics
10 Framework Verdict
01

Agentic AI
Foundations

What is an agent? The spectrum from LLM to CoT to Agent. Memory, MCP, A2A, orchestration.

What Is an Agent?

A system that controls the flow, not just the output.

Perceive (read input) → Reason (decide next step) → Act (call tools) → Observe (check result) → iterate until done

The agent decides what to do next; a regular LLM just gives you one answer.

The Spectrum: LLM → CoT → Agent

It is not binary. There is a spectrum of intelligence, and understanding it helps you choose the right level for each problem.

Conventional LLM

A Large Language Model takes one prompt, produces one response. No tools, no memory, no iteration. Fast and cheap, but if it hallucinates, you have no safety net.

Best for: text generation, Q&A, summarisation

Chain-of-Thought (CoT)

The model generates its reasoning steps before answering. "The URL is suspicious… the tone is urgent… classic phishing." This is often more accurate than a single-shot answer, but the model is still reasoning in a vacuum. It cannot verify anything externally.

Used by: DeepSeek-R1, OpenAI o1/o3, Claude

Agentic AI

The LLM reasons and also takes real actions: calling tools, checking databases, scanning URLs. It acts on the world and feeds results back into its reasoning. Evidence-based decisions.

Best for: workflows, decisions, business impact

Rule of thumb: if the task needs external data or has business impact → use an agent. Pure text → LLM or CoT is enough.

The ReAct Pattern

Reason + Act: the loop at the heart of modern agents

🧠 REASON
"What next?"
→
⚡ ACT
Call a tool
→
👁️ OBSERVE
Check result
→
🔁 REPEAT
Until goal met

Reasoning is explicit, auditable, and debuggable: every step is logged.
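The loop can be sketched in a few lines of plain Python. This is a toy under stated assumptions: `scripted_policy` is a hypothetical stand-in for the LLM's REASON step, and the `scan_urls` function here is a trivial placeholder rather than the course's real tool.

```python
from typing import Callable, Dict, List, Optional, Tuple


def scan_urls(text: str) -> str:
    """Placeholder tool: flags any text that contains a URL (illustration only)."""
    has_url = "http://" in text or "https://" in text
    return "MALICIOUS URL detected" if has_url else "No URLs found"


TOOLS: Dict[str, Callable[[str], str]] = {"scan_urls": scan_urls}


def scripted_policy(email: str, observations: List[str]) -> Optional[Tuple[str, str]]:
    """Hypothetical REASON step: return (tool_name, argument), or None to stop."""
    if not observations:
        return ("scan_urls", email)  # nothing observed yet, so gather evidence
    return None                      # evidence gathered, so stop


def react_loop(email: str, max_steps: int = 5) -> List[str]:
    """Run REASON -> ACT -> OBSERVE until the policy stops or max_steps is hit."""
    trace: List[str] = []
    observations: List[str] = []
    for _ in range(max_steps):
        decision = scripted_policy(email, observations)  # REASON
        if decision is None:
            trace.append("STOP")
            break
        tool_name, argument = decision
        result = TOOLS[tool_name](argument)              # ACT
        observations.append(result)                      # OBSERVE
        trace.append(f"{tool_name} -> {result}")
    return trace


print(react_loop("Verify now: https://totallylegit.com/verify"))
```

The returned trace is the audit log: one entry per tool call, in order. The `max_steps` cap is a design choice that keeps a confused policy from looping forever.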

Multi-Agent Systems

Multiple agents, each with a specialised role, collaborating to solve a problem

🏷️ CLASSIFIER
"What is this email?"
→
🛡️ RISK ANALYST
"Is this dangerous?"
→
📋 POLICY ROUTER
"What do we do?"

✅ Separation of concerns

Each agent has one job

✅ Extensibility

Add a new agent without rewriting

Memory & Context: How Agents Remember

Three layers of memory, each extending how far the agent can reach.

Short-Term Memory

The conversation itself. Each reasoning step, tool call, and observation is appended to the message history. The LLM sees everything from the current run inside its context window, a fixed-size buffer measured in tokens (one token is roughly 3/4 of an English word). GPT-4o-mini: 128k tokens. Claude: 200k tokens.

Long-Term Memory

LangGraph supports checkpointing: serialising the agent's full state to a persistent store. The agent can resume conversations or recall information from previous sessions. This is how you build agents that remember past interactions.

RAG + Vector Stores

An embedding is a numerical representation of text that captures meaning. A vector store (ChromaDB, Pinecone, FAISS) indexes embeddings for fast semantic search. The agent converts its query to an embedding, finds similar passages, and injects them into the prompt. This is Retrieval-Augmented Generation (RAG).
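The retrieve-then-inject flow can be illustrated without any external service. This is a minimal sketch, assuming a toy bag-of-words vector in place of a learned embedding model and a plain list in place of a vector store; `embed`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. Real systems use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Inject the retrieved passages into the prompt ahead of the question."""
    context = "\n".join(retrieve(query, passages))
    return f"Context:\n{context}\n\nQuestion: {query}"


PASSAGES = [
    "Invoices are paid within 30 days of receipt.",
    "Phishing emails must be quarantined and reviewed.",
    "Team meetings happen every Thursday.",
]
print(build_prompt("How should phishing emails be handled?", PASSAGES))
```

Word counts only match exact tokens, so this toy misses synonyms; that gap is what learned embeddings close.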

In this course we use short-term memory. For production, add checkpointing and RAG for knowledge that exceeds the context window.

MCP & A2A: Connecting Agents

Two protocols power the agentic ecosystem. Understanding the difference is critical.

MCP: Model Context Protocol

An open standard (by Anthropic) for connecting agents to tools: passive functions that take input and return output. URL scanners, database queries, weather APIs. The MCP Server exposes tools; the MCP Client (the agent) discovers and calls them.

Transport: stdio (local subprocess) or HTTP for remote services (Server-Sent Events, SSE, in the original spec; newer spec revisions use Streamable HTTP).

A2A: Agent-to-Agent Protocol

An open protocol (introduced by Google) for connecting agents to other autonomous agents: systems that reason, use their own tools, and make their own decisions. The difference: MCP tools are passive functions. A2A agents are active reasoners.

Example: a coordinator delegates "find flights" to a Flight Agent that reasons about layovers, compares prices, and calls airline APIs via MCP.

Orchestration Patterns

Choosing the right pattern is often more important than choosing the right framework.

Sequential Pipeline (CrewAI): A → B → C. Fixed order. Simple, debuggable.
DAG, Directed Acyclic Graph (LangGraph): Start → ? → route or fallback. Conditional branches. Auditable.
ReAct Loop (LangChain): Think → Act → Observe. LLM decides when to stop.
Hierarchical: Manager → Worker, Worker, Worker.
Routing-Based: Input → R → Billing, Support, or Returns (only one is chosen).

In this course: ReAct Loop (LangChain), DAG (LangGraph), Sequential (CrewAI).

Key Takeaway


An agent is a system that reasons, acts, and iterates toward a goal using tools and state, rather than producing a single response. It manages memory through context windows, checkpoints, and vector stores. It connects to tools via MCP and to other agents via A2A. The orchestration pattern (sequential, DAG, ReAct, hierarchical, or routing) determines how much control vs. flexibility the system has.
02

The Problem &
Project Setup

Enterprise email classification: one problem, three frameworks.

The Problem: Email Triage

Classify incoming emails and route them to the right action.

Category | Example                            | Action
phishing | "Verify your account immediately"  | Quarantine + review
spam     | "Limited time! Win a free iPhone"  | Quarantine
invoice  | "Invoice #2026-042 - payment due"  | Accounts payable
meeting  | "Team sync Thursday 10 AM"         | Calendar suggestion
support  | "Ticket #5432 - production outage" | Support ticket
other    | Everything else                    | Inbox

The Pipeline

Shared by all three approaches

📧 Preprocess
PII (Personally Identifiable Information) masking
→
🤖 Classify
LLM / agent
→
🛡️ Guardrails
Business rules
→
🚦 Route
Action

Clone & Install

git clone https://github.com/ruslanmv/agentic-ai-concepts.git
cd agentic-ai-concepts

python3 -m venv .venv
source .venv/bin/activate

make install          # pip install -r requirements.txt
export OPENAI_API_KEY="sk-..."

Verify the setup:

make test             # 80 offline tests, no API key needed
make evaluate         # baseline evaluation against golden dataset

Project Structure

agentic-ai-concepts/
├── src/
│   ├── schema.py              # Pydantic models
│   ├── preprocessing.py       # PII masking
│   ├── fallback.py            # Keyword fallback
│   ├── routing.py             # Label → action
│   ├── tools.py               # Agent tools
│   ├── evaluate.py            # Metrics + prod gate
│   ├── langchain_agent.py     # Approach 1
│   ├── langgraph_agent.py     # Approach 2
│   └── crewai_agent.py        # Approach 3
├── data/golden_dataset.csv    # 30 labelled emails
├── tests/                     # 80 offline + 12 integration
└── examples/
    ├── run_all.py
    └── compare_frameworks.py  # Side-by-side verdict
03

Building the
Shared Modules

Schema, PII preprocessing, keyword fallback, routing.

Module 1: Schema

The contract all three frameworks must produce.

from enum import Enum
from typing import List

from pydantic import BaseModel, confloat

class EmailLabel(str, Enum):
    PHISHING = "phishing"
    SPAM     = "spam"
    INVOICE  = "invoice"
    MEETING  = "meeting"
    SUPPORT  = "support"
    OTHER    = "other"

class EmailClassification(BaseModel):
    label: EmailLabel                        # must be one of the six labels
    confidence: confloat(ge=0.0, le=1.0)     # rejected unless within [0, 1]
    rationale: str
    indicators: List[str] = []
    requires_human_review: bool = False

Pydantic enforces the contract: invalid LLM output fails fast with a validation error.

Module 2: PII Preprocessing

Replace sensitive data before sending to the LLM.

import re
from collections import OrderedDict

# Order matters: specific patterns (SSN) run before generic ones (PHONE),
# which would otherwise match the same digits.
_PII_PATTERNS = OrderedDict(
    SSN=re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    CREDIT_CARD=re.compile(r"\b(?:\d[ -]*?){13,16}\b"),
    EMAIL=re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    IBAN=re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    PHONE=re.compile(r"\b(\+?\d[\d\s\-\(\)]{7,}\d)\b"),
)

def mask_pii(text: str) -> str:
    """Replace each PII match with its bracketed tag, e.g. [EMAIL]."""
    for tag, pattern in _PII_PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text
>>> mask_pii("Contact jane.doe@example.com, SSN 123-45-6789")
'Contact [EMAIL], SSN [SSN]'
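Why the patterns sit in an ordered mapping can be seen with just two of them. The sketch below reuses the SSN and PHONE expressions from the module; the `mask` helper and the two orderings are hypothetical, built only to show the effect of pattern order.

```python
import re
from collections import OrderedDict

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE = re.compile(r"\b(\+?\d[\d\s\-\(\)]{7,}\d)\b")


def mask(text: str, patterns: "OrderedDict[str, re.Pattern]") -> str:
    """Apply each pattern in insertion order, replacing matches with [TAG]."""
    for tag, pattern in patterns.items():
        text = pattern.sub(f"[{tag}]", text)
    return text


specific_first = OrderedDict(SSN=SSN, PHONE=PHONE)
generic_first = OrderedDict(PHONE=PHONE, SSN=SSN)

sample = "SSN 123-45-6789, call 415 555 0100"
print(mask(sample, specific_first))  # SSN [SSN], call [PHONE]
print(mask(sample, generic_first))   # SSN [PHONE], call [PHONE]
```

With the generic pattern first, the SSN is still masked but under the wrong tag, which matters if the tags feed audit logs or downstream rules.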

Module 3: Keyword Fallback

Deterministic safety net. Always have a fallback for any probabilistic component.

_KEYWORD_MAP = {
    EmailLabel.PHISHING: ["verify", "password", "urgent", "suspend", 
                          "locked", "click", "link", "account"],
    EmailLabel.INVOICE:  ["invoice", "payment", "remittance", "iban", "vat"],
    EmailLabel.MEETING:  ["meeting", "calendar", "invite", "zoom", "agenda"],
    EmailLabel.SUPPORT:  ["ticket", "issue", "bug", "incident", "outage"],
    EmailLabel.SPAM:     ["unsubscribe", "promotion", "deal", "free", "win"],
}

def keyword_fallback(subject, body) -> EmailClassification:
    # Count keyword hits per category, then pick the highest
    # Confidence capped at 0.8 (honest about limitations)
    ...
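One way the elided body could work is sketched below. The keyword lists and the 0.8 cap come from the slide; everything else is an assumption made for illustration: whole-word matching, the confidence formula, the 0.3 default for "other", and returning a plain dict instead of an EmailClassification. The repository's implementation may differ.

```python
import re
from enum import Enum
from typing import Dict, List


class EmailLabel(str, Enum):
    PHISHING = "phishing"
    SPAM = "spam"
    INVOICE = "invoice"
    MEETING = "meeting"
    SUPPORT = "support"
    OTHER = "other"


KEYWORD_MAP: Dict[EmailLabel, List[str]] = {
    EmailLabel.PHISHING: ["verify", "password", "urgent", "suspend",
                          "locked", "click", "link", "account"],
    EmailLabel.INVOICE: ["invoice", "payment", "remittance", "iban", "vat"],
    EmailLabel.MEETING: ["meeting", "calendar", "invite", "zoom", "agenda"],
    EmailLabel.SUPPORT: ["ticket", "issue", "bug", "incident", "outage"],
    EmailLabel.SPAM: ["unsubscribe", "promotion", "deal", "free", "win"],
}


def keyword_fallback_sketch(subject: str, body: str) -> dict:
    """Pick the label with the most keyword hits; fall back to OTHER on zero hits."""
    tokens = set(re.findall(r"[a-z]+", f"{subject} {body}".lower()))
    hits = {label: [kw for kw in kws if kw in tokens]
            for label, kws in KEYWORD_MAP.items()}
    best = max(hits, key=lambda label: len(hits[label]))
    if not hits[best]:
        # Assumed default: nothing matched, so report low-confidence "other".
        return {"label": EmailLabel.OTHER, "confidence": 0.3, "indicators": []}
    # Assumed formula: more hits means more confidence, never above the 0.8 cap.
    confidence = min(0.8, 0.4 + 0.1 * len(hits[best]))
    return {"label": best, "confidence": confidence, "indicators": hits[best]}


print(keyword_fallback_sketch("URGENT: Verify your account",
                              "Click the link to reset your password"))
```

Matching whole words avoids false hits such as "win" inside "window", at the cost of missing inflected forms like "verification".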

Module 4: Routing

Map classification → downstream action. Human review always takes priority.

class RouteAction(str, Enum):
    HUMAN_REVIEW = "queue_for_human_review"
    AP_QUEUE     = "send_to_ap_queue"
    CALENDAR     = "create_calendar_suggestion"
    TICKET       = "create_support_ticket"
    QUARANTINE   = "quarantine"
    INBOX        = "inbox"

def route(classification: EmailClassification) -> RouteAction:
    if classification.requires_human_review:
        return RouteAction.HUMAN_REVIEW    # always takes priority
    return _LABEL_TO_ACTION.get(classification.label, RouteAction.INBOX)
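The `_LABEL_TO_ACTION` table is not shown on the slide. A plausible version, inferred from the category/action table in the problem statement, is sketched below; the mapping and the `Classification` dataclass are assumptions for illustration, not the repository's code.

```python
from dataclasses import dataclass
from enum import Enum


class EmailLabel(str, Enum):
    PHISHING = "phishing"
    SPAM = "spam"
    INVOICE = "invoice"
    MEETING = "meeting"
    SUPPORT = "support"
    OTHER = "other"


class RouteAction(str, Enum):
    HUMAN_REVIEW = "queue_for_human_review"
    AP_QUEUE = "send_to_ap_queue"
    CALENDAR = "create_calendar_suggestion"
    TICKET = "create_support_ticket"
    QUARANTINE = "quarantine"
    INBOX = "inbox"


# Assumed mapping, inferred from the category/action table.
LABEL_TO_ACTION = {
    EmailLabel.PHISHING: RouteAction.QUARANTINE,
    EmailLabel.SPAM: RouteAction.QUARANTINE,
    EmailLabel.INVOICE: RouteAction.AP_QUEUE,
    EmailLabel.MEETING: RouteAction.CALENDAR,
    EmailLabel.SUPPORT: RouteAction.TICKET,
    EmailLabel.OTHER: RouteAction.INBOX,
}


@dataclass
class Classification:
    """Minimal stand-in for EmailClassification."""
    label: EmailLabel
    requires_human_review: bool = False


def route(classification: Classification) -> RouteAction:
    if classification.requires_human_review:
        return RouteAction.HUMAN_REVIEW  # always takes priority
    return LABEL_TO_ACTION.get(classification.label, RouteAction.INBOX)


print(route(Classification(EmailLabel.INVOICE)).value)
print(route(Classification(EmailLabel.PHISHING, requires_human_review=True)).value)
```

Because the review flag is checked first, a flagged phishing email goes to the human queue rather than straight to quarantine.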
04

Agent Tools

Giving the agent access to real data: evidence over guessing.

Why Tools Matter

🔍 Real data

The agent checks facts instead of hallucinating

📋 Evidence-based

Classification backed by tool results

📝 Audit trail

Every tool call is logged, so you know why

🧠 Agent decides

LLM picks which tools to call per email

Four Tools

check_sender_reputation

Takes a domain name and returns a risk score. Is this sender known to be malicious?

Triggered by: suspicious sender

scan_urls

Extracts all URLs from the email body and checks them against known malicious patterns.

Triggered by: links in body

lookup_known_contacts

Checks if the sender is in our internal contact list: a known, trusted colleague or vendor.

Triggered by: sender email

check_invoice_registry

Validates whether an invoice number matches a known record in accounts payable.

Triggered by: invoice number

Simulated databases for offline testing. In production, swap for real API (Application Programming Interface) calls.
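A simulated database can be as small as a dictionary. The sketch below shows the idea for a reputation lookup; the domains, scores, 0.7 threshold, and message format are hypothetical, and the function is a plain callable rather than a registered agent tool.

```python
from typing import Dict

# Hypothetical in-memory "database"; production code would call a real API.
SIMULATED_REPUTATION: Dict[str, float] = {
    "totallylegit.com": 0.92,
    "example.com": 0.05,
}


def check_sender_reputation_sketch(domain: str) -> str:
    """Return a human-readable risk verdict for a sender domain."""
    score = SIMULATED_REPUTATION.get(domain.strip().lower())
    if score is None:
        return f"UNKNOWN domain: {domain} (no reputation data)"
    level = "HIGH RISK" if score >= 0.7 else "LOW RISK"  # assumed threshold
    return f"{level} domain (score: {score:.2f})"


print(check_sender_reputation_sketch("totallylegit.com"))
print(check_sender_reputation_sketch("unknown.org"))
```

Returning a sentence rather than a bare number gives the LLM something it can quote in its rationale, and keeping the function signature stable makes the later swap to a real API a local change.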

Tool Example: scan_urls

import re

from langchain_core.tools import tool  # assumed import path for @tool

# _MALICIOUS_URL_PATTERNS: list of regex strings, defined elsewhere in the module

@tool
def scan_urls(email_body: str) -> str:
    """Scan all URLs in an email body for malicious indicators."""
    url_pattern = re.compile(r"https?://[^\s<>\"']+")
    urls = url_pattern.findall(email_body)

    if not urls:
        return "No URLs found. URL risk: NONE."

    results = []
    for url in urls:
        # A URL is flagged if any known-bad pattern matches it.
        is_malicious = any(
            re.search(p, url.lower())
            for p in _MALICIOUS_URL_PATTERNS
        )
        if is_malicious:
            results.append(f"  {url} → MALICIOUS (risk: 0.9)")
        ...

    return f"Found {len(urls)} URL(s):\n" + "\n".join(results)

The @tool decorator makes it callable by the LLM agent.

Different Emails → Different Tools

Diagram: the ReAct agent receives a phishing email, an invoice email, or a meeting invite and chooses among scan_urls, check_sender_reputation, check_invoice_registry, and lookup_known_contacts. The LLM decides which tools to call; there are no hardcoded paths.
05

LangChain
ReAct Agent

The LLM decides which tools to call. Evidence-based classification.

The ReAct Loop in Action

Email arrives: "URGENT: Verify your account" + URL: https://totallylegit.com/verify

REASON:  "Suspicious URL + urgency → let me scan it"
ACT:     scan_urls(body)
OBSERVE: "MALICIOUS URL detected (risk: 0.90)"
REASON:  "Malicious URL confirmed. Check the sender too."
ACT:     check_sender_reputation("totallylegit.com")
OBSERVE: "HIGH RISK domain (score: 0.92)"
REASON:  "Both tools confirm phishing. High confidence."
OUTPUT:  {"label": "phishing", "confidence": 0.95, ...}

Building the Agent

from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from src.tools import ALL_TOOLS

def _build_agent():
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)

    agent = create_react_agent(
        model=llm,
        tools=ALL_TOOLS,      # 4 tools from src/tools.py
        prompt=SYSTEM_PROMPT,
    )
    return agent

That's it: create_react_agent handles the entire Reason → Act → Observe loop.

System Prompt

SYSTEM_PROMPT = """
You are an enterprise email triage agent.
Classify emails by gathering evidence using your tools.

## Workflow
1. Analyse the email for signals
2. Use tools to gather evidence:
   - URLs or suspicious links → scan_urls
   - Sender domain → check_sender_reputation
   - Sender email → lookup_known_contacts
   - Invoice number → check_invoice_registry
3. Classify: phishing/spam/invoice/meeting/support/other
4. Return JSON: label, confidence, rationale, indicators

## Rules
- Phishing → always requires_human_review = true
- Base confidence on tool evidence, not gut feeling
"""

Guardrails

Applied after the agent produces a classification.

CONFIDENCE_THRESHOLD = 0.6

def apply_guardrails(classification, subject, body):
    # Hard rule: phishing → always flag for review, cap confidence
    if classification.label == EmailLabel.PHISHING:
        classification.requires_human_review = True
        classification.confidence = min(classification.confidence, 0.85)
        return classification

    # Soft rule: low confidence → deterministic keyword fallback
    if classification.confidence < CONFIDENCE_THRESHOLD:
        return keyword_fallback(subject, body)

    return classification
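The two rules are easy to exercise offline. The sketch below repeats the guardrail logic against a small dataclass; `Classification` and `keyword_fallback_stub` are hypothetical stand-ins for the course's Pydantic model and real fallback.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.6


@dataclass
class Classification:
    """Minimal stand-in for EmailClassification."""
    label: str
    confidence: float
    requires_human_review: bool = False


def keyword_fallback_stub(subject: str, body: str) -> Classification:
    """Stand-in fallback that always answers 'other' with low confidence."""
    return Classification(label="other", confidence=0.3)


def apply_guardrails(c: Classification, subject: str, body: str) -> Classification:
    if c.label == "phishing":
        c.requires_human_review = True          # hard rule: always review
        c.confidence = min(c.confidence, 0.85)  # never claim certainty
        return c
    if c.confidence < CONFIDENCE_THRESHOLD:
        return keyword_fallback_stub(subject, body)  # soft rule
    return c


print(apply_guardrails(Classification("phishing", 0.95), "s", "b"))
print(apply_guardrails(Classification("invoice", 0.40), "s", "b"))
print(apply_guardrails(Classification("invoice", 0.90), "s", "b"))
```

Because the phishing branch returns early, a low-confidence phishing label still goes to human review instead of being overwritten by the fallback.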

Run It

make run-langchain
# or: python -m src.langchain_agent

Subject:    URGENT: Verify your account now
Label:      phishing
Confidence: 0.85
Action:     queue_for_human_review
Review:     True
Tools used: ['scan_urls', 'check_sender_reputation']

The tools_used field is the audit trail: you know exactly why the agent decided.

06

LangGraph
State Machine

Explicit graph. Typed state. Conditional edges. Bank-grade auditability.

Architecture

START → preprocess → classify → guardrails → confidence check: if < 0.6 → fallback → route → END; if ≥ 0.6 → route → END

Typed State

from typing import Optional, TypedDict

class GraphState(TypedDict):
    """State flowing through the graph; each node returns only the keys it updates."""
    subject: str
    body: str
    masked_body: str
    classification: Optional[EmailClassification]
    action: Optional[str]

No hidden mutations

Every field is typed and explicit

Auditable

Compliance can review the full state at any node

Node Functions

Each node is independently testable.

def preprocess_node(state: GraphState) -> dict:
    sanitised = preprocess_email(state["subject"], state["body"])
    return {"masked_body": sanitised["body"]}

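def classify_node(state: GraphState) -> dict:
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
    # Constrain the model's output to the EmailClassification schema.
    classifier = llm.with_structured_output(EmailClassification)
    # `prompt` is the classification prompt template defined elsewhere in the module.
    # Note: only the body is masked here; the subject is sent as-is.
    result = (prompt | classifier).invoke({
        "subject": state["subject"], "body": state["masked_body"],
    })
    return {"classification": result}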

def guardrails_node(state: GraphState) -> dict:
    result = state["classification"]
    if result.label == EmailLabel.PHISHING:
        result.requires_human_review = True
        result.confidence = min(result.confidence, 0.85)
    return {"classification": result}

Building the Graph

from langgraph.graph import END, StateGraph

def build_graph():
    graph = StateGraph(GraphState)

    # Nodes
    graph.add_node("preprocess", preprocess_node)
    graph.add_node("classify",   classify_node)
    graph.add_node("guardrails", guardrails_node)
    graph.add_node("fallback",   fallback_node)
    graph.add_node("route",      route_node)

    # Edges
    graph.set_entry_point("preprocess")
    graph.add_edge("preprocess", "classify")
    graph.add_edge("classify",   "guardrails")
    graph.add_conditional_edges("guardrails",
        decide_after_guardrails,
        {"fallback": "fallback", "route": "route"})
    graph.add_edge("fallback", "route")
    graph.add_edge("route", END)

    return graph.compile()
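The `decide_after_guardrails` function passed to `add_conditional_edges` is not shown on the slide. Given the architecture's 0.6 threshold, it might look like the sketch below; the missing-classification branch is an assumption, and a SimpleNamespace stands in for the EmailClassification object.

```python
from types import SimpleNamespace
from typing import Any, Dict

CONFIDENCE_THRESHOLD = 0.6


def decide_after_guardrails_sketch(state: Dict[str, Any]) -> str:
    """Return the key of the next node: 'fallback' or 'route'."""
    classification = state.get("classification")
    if classification is None:
        return "fallback"  # assumed: no LLM result means use the safety net
    if classification.confidence < CONFIDENCE_THRESHOLD:
        return "fallback"
    return "route"


print(decide_after_guardrails_sketch({"classification": SimpleNamespace(confidence=0.45)}))
print(decide_after_guardrails_sketch({"classification": SimpleNamespace(confidence=0.90)}))
```

The returned string is looked up in the mapping given to `add_conditional_edges`, so the function only has to return one of the declared keys. That keeps the branch a pure function of state, which is what makes it unit testable.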

Why LangGraph for Enterprise

Explicit edges

Every transition is declared and reviewable

Typed state

Fields are declared in a TypedDict, so a static type checker catches mismatches (Python does not enforce TypedDict at runtime)

Conditional branching

Logic is declared, not buried in if/else

Unit testable

Each node function is independently testable

07

CrewAI
Multi-Agent

Three agents collaborate: Classifier → Risk Analyst → Policy Router.

Three Agents, One Crew

🏷️ Classifier

"Senior email triage specialist at a Fortune 500 company"

  • Produces initial label + confidence

🛡️ Risk Analyst

"Cybersecurity analyst specialising in email threats"

  • Reviews for false negatives
  • Escalates suspicious items

📋 Policy Router

"Compliance officer: when in doubt, escalate"

  • Determines final action
  • Applies company policy

Agent Definition

from crewai import Agent, Crew, Process, Task

classifier_agent = Agent(
    role="Email Classifier",
    goal="Classify the email into one of: phishing, spam, "
         "invoice, meeting, support, other.",
    backstory="You are a senior email triage specialist "
              "at a Fortune 500 company.",
    verbose=False,
    allow_delegation=False,
)

# ... risk_agent, policy_agent defined similarly

crew = Crew(
    agents=[classifier_agent, risk_agent, policy_agent],
    tasks=[classify_task, risk_task, policy_task],
    process=Process.sequential,
)
result = crew.kickoff()

Trade-offs

Strengths

Clear role separation: each agent has one job. Easy to extend: add a fourth agent for compliance or translation without rewriting anything. Mirrors how human teams collaborate: analyst, reviewer, decision-maker.

Costs

Three LLM calls per email instead of one. More latency and higher API spend. For a single classification task the extra agents do not improve accuracy (about 87% for the crew against about 90% for the single-agent approaches in this course's comparison), so the overhead is not justified.

Best for complex problems that genuinely need collaborative multi-step reasoning across different roles.

08

Testing &
Quality

80 offline tests. No API key needed. Every component covered.

Test Strategy

🟢 80 Offline Tests

  • PII masking patterns
  • Keyword fallback: all 6 categories
  • Routing: label-to-action mapping
  • Guardrails: phishing, thresholds
  • Tools: every function
  • Evaluation: metrics math, prod gate
  • JSON parsing: 3 extraction strategies

🔵 12 Integration Tests

  • Require OPENAI_API_KEY
  • End-to-end pipeline
  • Verify tool usage
  • Test all 3 frameworks

Separated by @pytest.mark.integration

Running Tests

# Offline only (fast, no API key)
make test

# Everything including live LLM
make test-all

tests/test_tools.py::TestScanUrls::test_malicious_url ............. PASSED
tests/test_evaluate.py::TestComputeMetrics::test_precision_recall .. PASSED
tests/test_fallback.py::TestKeywordFallback::test_phishing ........ PASSED
tests/test_routing.py::TestRouting::test_invoice_to_ap_queue ...... PASSED
...
========================= 80 passed, 12 deselected in 1.54s ============
09

Evaluation
Before Production

Golden dataset. Precision & recall. Production readiness gate.

The Golden Dataset

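data/golden_dataset.csv: 30 hand-labelled emails

Category | Easy | Medium | Hard | Total
phishing |    1 |      2 |    2 |     5
spam     |    2 |      1 |    1 |     4
invoice  |    3 |      1 |    1 |     5
meeting  |    2 |      2 |    2 |     6
support  |    2 |      1 |    2 |     5
other    |    2 |      3 |    0 |     5

Hard samples: BEC (business email compromise) wire transfer, legitimate security alert, ambiguous reply threads.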

Precision vs Recall

Precision

Of everything flagged as X, how many actually were X?

Low precision = too many false alarms → alert fatigue

Recall

Of all actual X emails, how many did we catch?

Low recall = missed threats → security risk


F1 = harmonic mean of both: 2 × precision × recall / (precision + recall). It balances precision and recall.
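A worked example makes the three definitions concrete. The counts below are illustrative: 10 emails flagged as phishing, of which 5 really were phishing, and no phishing email missed. The function name is hypothetical.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# 5 true positives, 5 false alarms, 0 missed threats.
p, r, f1 = precision_recall_f1(tp=5, fp=5, fn=0)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Perfect recall with 50% precision yields an F1 of about 0.67, not 0.75: the harmonic mean is pulled toward the weaker of the two numbers.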

Production Readiness Gate

MIN_WEIGHTED_F1      = 0.70   # Overall performance
MIN_PHISHING_RECALL  = 0.80   # Must catch ≥ 80% of phishing
MIN_PHISHING_PREC    = 0.60   # Must not over-flag

Three checks. All must pass.

Weighted F1 ≥ 0.70

Overall quality

Phishing recall ≥ 0.80

Safety-critical

Phishing precision ≥ 0.60

Alert fatigue

Baseline Result: Keyword Fallback

make evaluate
Class      Precision  Recall  F1    Support
phishing   0.50       1.00    0.67  5
spam       0.75       0.75    0.75  4
invoice    0.83       1.00    0.91  5
meeting    0.86       1.00    0.92  6
support    1.00       0.60    0.75  5
other      0.00       0.00    0.00  5

Weighted F1: 0.67
Accuracy: 73.3%

Production Gate: ✗ FAIL - Not ready for production
  ✗ Weighted F1 = 0.67 < 0.70
  ✓ Phishing recall = 1.00 ≥ 0.80
  ✗ Phishing precision = 0.50 < 0.60
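The gate verdict can be reproduced from the report. The sketch below copies the rounded per-class figures, computes the support-weighted F1, and applies the three thresholds. `weighted_f1` and `production_gate` are hypothetical names; the real evaluate.py presumably works from unrounded values.

```python
from typing import Dict, Tuple

MIN_WEIGHTED_F1 = 0.70
MIN_PHISHING_RECALL = 0.80
MIN_PHISHING_PREC = 0.60

# (precision, recall, f1, support) per class, copied from the baseline report.
BASELINE: Dict[str, Tuple[float, float, float, int]] = {
    "phishing": (0.50, 1.00, 0.67, 5),
    "spam": (0.75, 0.75, 0.75, 4),
    "invoice": (0.83, 1.00, 0.91, 5),
    "meeting": (0.86, 1.00, 0.92, 6),
    "support": (1.00, 0.60, 0.75, 5),
    "other": (0.00, 0.00, 0.00, 5),
}


def weighted_f1(report: Dict[str, Tuple[float, float, float, int]]) -> float:
    """Average the per-class F1 scores, weighting each class by its support."""
    total = sum(row[3] for row in report.values())
    return sum(row[2] * row[3] for row in report.values()) / total


def production_gate(report: Dict[str, Tuple[float, float, float, int]]):
    """Return (passed, individual check results); all three checks must pass."""
    precision, recall, _, _ = report["phishing"]
    checks = {
        "weighted_f1": weighted_f1(report) >= MIN_WEIGHTED_F1,
        "phishing_recall": recall >= MIN_PHISHING_RECALL,
        "phishing_precision": precision >= MIN_PHISHING_PREC,
    }
    return all(checks.values()), checks


passed, checks = production_gate(BASELINE)
print(f"weighted F1 = {weighted_f1(BASELINE):.2f}, passed = {passed}")
print(checks)
```

The "other" class scores zero because the keyword map has no entry for it, and with 5 of the 30 samples it drags the weighted average below the 0.70 bar.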

Keywords alone aren't enough. We need the LLM agents.

10

Framework
Comparison &
The Verdict

Head-to-head results. Which framework wins?

Run the Comparison

export OPENAI_API_KEY="sk-..."
make compare

Runs all 30 golden samples through all 4 approaches.

# Or directly:
python examples/compare_frameworks.py --all

Side-by-Side Results

Metric             | Fallback | LangChain      | LangGraph | CrewAI
Accuracy           | 73.3%    | ~90% ★         | ~90%      | ~87%
Weighted F1        | 0.67     | ~0.90 ★        | ~0.90     | ~0.87
Phishing Recall    | 1.00     | 1.00           | 1.00      | 0.80
Phishing Precision | 0.50     | 0.83           | 0.83      | 0.80
"other" F1         | 0.00     | 0.78           | 0.75      | 0.78
Time (30 emails)   | 0.0s     | ~45s           | ~13s      | ~68s
LLM calls/email    | 0        | 1 (multi-turn) | 1         | 3
Prod Gate          | FAIL     | PASS           | PASS      | PASS

Numbers may vary slightly due to LLM non-determinism.

The Verdict by Criterion

Criterion          | Winner                | Why
Best accuracy      | LangChain ≈ LangGraph | Both ~90%
Best speed         | LangGraph             | Single LLM call, ~3x faster
Best auditability  | LangGraph             | Explicit edges, typed state
Best safety        | LangChain             | Tool evidence = audit trail
Best cost          | LangGraph             | 1 call vs multi-turn vs 3
Best extensibility | CrewAI                | Adding an agent is trivial

🏆 Recommendation


LangGraph for Production

Best balance of accuracy, speed, cost, and auditability. Every edge is reviewable. Typed state. Independently testable nodes.

LangChain for Prototyping

Evidence-based reasoning with tool audit trail. Better at explaining decisions. Excellent for discovery.


CrewAI is the right choice when you need collaborative multi-step reasoning across truly different roles, but it is overkill for single classification.

Production Architecture

Email → LangGraph (primary classifier) → confidence < 0.6? If no → route to action. If yes → keyword fallback, then route to action. Nightly: make compare + CI/CD gate.

Saving & CI/CD (Continuous Integration / Continuous Delivery) Integration

# In your CI pipeline (shell):
python examples/compare_frameworks.py --all \
    --output data/eval_results/comparison.json

# Parse the result (Python):
import json

with open('data/eval_results/comparison.json') as fh:
    data = json.load(fh)

winner = data.get('recommended')
print(f'Recommended: {winner}')
for a in data['approaches']:
    gate = '✓' if a['passed_production_gate'] else '✗'
    print(f"  {gate} {a['approach']}: F1={a['weighted_f1']:.4f}")
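To turn the parsed result into a CI gate, the job needs a non-zero exit code on failure. The sketch below uses only the JSON keys shown above. It assumes, hypothetically, that `recommended` holds the same name as one entry's `approach` field; `gate_exit_code` is an illustrative helper, not part of the repository.

```python
from typing import Any, Dict


def gate_exit_code(data: Dict[str, Any]) -> int:
    """Return 0 if the recommended approach passed the production gate, else 1."""
    winner = data.get("recommended")
    by_name = {a["approach"]: a for a in data.get("approaches", [])}
    chosen = by_name.get(winner)
    if chosen is None or not chosen["passed_production_gate"]:
        return 1
    return 0


# In a CI script: load comparison.json with json.load, then call
# sys.exit(gate_exit_code(data)) so a failing gate fails the job.
example = {
    "recommended": "langgraph",
    "approaches": [
        {"approach": "fallback", "weighted_f1": 0.67, "passed_production_gate": False},
        {"approach": "langgraph", "weighted_f1": 0.90, "passed_production_gate": True},
    ],
}
print(gate_exit_code(example))
```

A missing or unknown recommendation counts as a failure, so a malformed results file cannot pass silently.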
★

Key Takeaways
& Next Steps

What Makes an Agent Production-Ready

Explicit decision-making

The system reasons before acting

Controlled actions

Tools provide real data, not hallucinations

State and memory

Each step knows what came before

Safety and governance

Guardrails enforce business rules at every step

The One-Paragraph Summary


An agent is a system that reasons, acts, and iterates toward a goal using tools and state, rather than producing a single response. Multi-agent systems decompose complex workflows across specialised roles. LangChain enables ReAct-style agents with dynamic tool selection, LangGraph provides deterministic state-machine orchestration for enterprise workflows, and CrewAI enables collaborative multi-agent designs. Agentic systems trade simplicity for control, safety, and auditability.

What to Build Next

🔌 Real APIs

Replace simulated tools with real threat intelligence, CRM, and invoice APIs

💾 Memory

Use LangGraph checkpointing to persist state across sessions

🔄 Feedback loop

Let human reviewers correct classifications to improve the model

🚀 Deploy

Wrap classify_email in a FastAPI endpoint with make compare in CI/CD

Thank You!


📦 github.com/ruslanmv/agentic-ai-concepts

📝 Full tutorial: docs/blog.md

🌐 ruslanmv.com


If this course helped, ⭐ the repo and share with your team.