When an agent needs information, it searches for relevant chunks rather than loading everything into the prompt. This keeps responses focused and efficient.
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.pgvector import PgVector
from agno.vectordb.search import SearchType

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"  # example connection string

knowledge = Knowledge(
    vector_db=PgVector(
        table_name="embeddings",
        db_url=db_url,
        search_type=SearchType.hybrid,
    ),
    max_results=5,
)

results = knowledge.search("What's our return policy?")

How Search Works

1. Query Analysis: The agent analyzes the user’s question to understand what information would help.
2. Search Execution: The system runs vector, keyword, or hybrid search based on configuration.
3. Retrieval: The knowledge base returns the most relevant content chunks.
4. Response Generation: Retrieved information is combined with the question to generate a response.
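The four steps above can be sketched end to end with a toy in-memory store. All names here are illustrative stand-ins, not the Agno API:

```python
import re

# Toy pipeline for the four steps above.

def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def score(query: str, chunk: str) -> int:
    # Step 2 (search execution): crude keyword-overlap scoring
    # standing in for real vector/keyword search.
    return len(_words(query) & _words(chunk))

def retrieve(query: str, chunks: list[str], max_results: int = 1) -> list[str]:
    # Step 3 (retrieval): return the highest-scoring chunks.
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return [c for c in ranked[:max_results] if score(query, c) > 0]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Step 4 (response generation): combine retrieved context
    # with the question before calling the model.
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"

chunks = [
    "Return policy: returns are accepted within 30 days with a receipt.",
    "Our office is closed on public holidays.",
]
top = retrieve("What is the return policy?", chunks)
prompt = build_prompt("What is the return policy?", top)
```

In a real setup, steps 2 and 3 are handled by the vector database and step 4 by the model; only the shape of the flow carries over.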

Search Types

Vector Search

Finds content by meaning, not exact words. When you ask “How do I reset my password?”, it finds documents about “changing credentials” even if those exact words don’t appear.
vector_db = PgVector(
    table_name="embeddings",
    db_url=db_url,
    search_type=SearchType.vector,
)
Best for: Conceptual questions where users phrase things differently than your docs.

Keyword Search

Classic text search that matches exact words and phrases. Uses your database’s full-text search or keyword matching capabilities.
vector_db = PgVector(
    table_name="embeddings",
    db_url=db_url,
    search_type=SearchType.keyword,
)
Best for: Specific terms, product names, error codes, technical identifiers.

Hybrid Search

Combines vector similarity with keyword matching. Usually the best choice for production.
from agno.knowledge.reranker.cohere import CohereReranker

vector_db = PgVector(
    table_name="embeddings",
    db_url=db_url,
    search_type=SearchType.hybrid,
    reranker=CohereReranker(),  # Optional: improves result ordering
)
Best for: Most real-world applications where you want both semantic understanding and exact-match precision.
Start with hybrid search and add a reranker for best results.

Agentic vs Traditional RAG

Traditional RAG always searches with the exact user query and injects results into the prompt. Agentic RAG lets the agent decide when to search, reformulate queries, and run follow-up searches if needed.
# Always searches, always injects results
results = knowledge.search(user_query)
context = "\n\n".join([d.content for d in results])
response = llm.generate(user_query + "\n" + context)
With Agentic RAG, the agent can:
  • Skip searching when it already knows the answer
  • Reformulate queries for better results
  • Run multiple searches to gather complete information
  • Combine results from different searches
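That decision loop can be sketched with hypothetical helpers (a toy, not the Agno API): the agent may skip the search entirely, reformulate the query, run a follow-up search, and merge unique results.

```python
# Toy agentic loop: skip, reformulate, search again, and combine.

def agentic_answer(question, search, known_answers=None):
    """`search` is any callable returning a list of text chunks."""
    known_answers = known_answers or {}

    # Skip searching when the agent already knows the answer.
    if question in known_answers:
        return known_answers[question], []

    # Reformulate, then run an extra search and merge unique results.
    queries = [question, question.replace("vacation", "paid time off")]
    seen, merged = set(), []
    for q in queries:
        for chunk in search(q):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return None, merged
```

In Agno itself you don’t wire this loop by hand: you give the agent the knowledge base and the model invokes search as a tool, deciding when and how often to call it.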

Filtering Results

Filter searches by metadata to target specific content:
# Add content with metadata
knowledge.insert(
    path="policies/",
    metadata={"department": "hr", "type": "policy", "year": 2024}
)

# Search with filters
results = knowledge.search(
    query="vacation policy",
    filters={"department": "hr", "type": "policy"}
)

# Use filters with agents
agent.print_response(
    "What's our vacation policy?",
    knowledge_filters={"department": "hr"}
)
For complex filtering with OR, NOT, and comparisons, see Filtering.

Custom Retrieval Logic

Override the default search behavior with a custom retriever:
from typing import Optional

from agno.agent import Agent

async def my_retriever(query: str, num_documents: int = 5, filters: Optional[dict] = None, **kwargs):
    # Reformulate the query before searching
    expanded_query = query.replace("vacation", "paid time off PTO")

    # Run the search against the knowledge base
    docs = await knowledge.asearch(expanded_query, max_results=num_documents, filters=filters)

    return [d.to_dict() for d in docs]

agent = Agent(
    knowledge=knowledge,
    knowledge_retriever=my_retriever,
)

Improving Search Quality

Chunk Size

How you split content affects retrieval precision:
| Chunk Size | Trade-off |
| --- | --- |
| Small (1000-3000 chars) | More precise, but may miss context |
| Default (5000 chars) | Balanced precision and context |
| Large (8000+ chars) | More context, but less targeted |
| Semantic chunking | Splits at natural topic boundaries |
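A minimal fixed-size chunker (a sketch of the idea, not Agno’s built-in chunking strategies) makes the trade-off concrete: smaller chunks are more precise, and a small overlap keeps sentences near a boundary from being split across chunks.

```python
def chunk_text(text: str, chunk_size: int = 5000, overlap: int = 200) -> list[str]:
    """Fixed-size chunking with a small overlap so text near a
    boundary lands in both neighboring chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

parts = chunk_text("a" * 12000, chunk_size=5000, overlap=200)
```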

Embedding Model

Your embedder converts text into vectors that capture meaning. The right choice depends on your content:
| Type | Use Case |
| --- | --- |
| General-purpose (OpenAI, Gemini) | Works well for most content |
| Domain-specific | Better for specialized fields like medical or legal |
| Multilingual | Required for non-English or mixed-language content |
See Embedders for available options.
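Whichever embedder you pick, retrieval ultimately compares vectors, usually by cosine similarity. A minimal sketch with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # 1.0 means same direction (similar meaning); near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": a good embedder places related phrases close together.
reset_password = [0.9, 0.1, 0.0]
change_credentials = [0.8, 0.2, 0.1]
pizza_recipe = [0.0, 0.1, 0.9]
```

This is why embedder choice matters: if the model wasn’t trained on your domain or language, related phrases won’t land near each other, and vector search degrades no matter how the rest is configured.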

Metadata

Rich metadata enables better filtering:
# Good: specific, consistent, filterable
metadata = {
    "department": "engineering",
    "document_type": "runbook",
    "service": "payments",
    "last_updated": "2024-01-15",
}

# Bad: vague, inconsistent
metadata = {"type": "doc", "id": "12345"}

Content Structure

Well-organized content searches better:
  • Use clear headings and sections
  • Include relevant terminology naturally
  • Add summaries at the top of long documents
  • Use descriptive filenames (hr_vacation_policy_2024.pdf not document1.pdf)

Testing

Test with real queries to validate search quality:
test_queries = [
    "What's our vacation policy?",
    "How do I submit expenses?",
    "Remote work guidelines",
]

for query in test_queries:
    results = knowledge.search(query)
    if results:
        print(f"{query} -> {results[0].content[:100]}...")
    else:
        print(f"{query} -> No results")
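To turn spot checks like the above into regression tests, pair each query with a keyword the top result should contain. A sketch: `search_fn` stands in for `knowledge.search`, and results are assumed to be plain strings (use `d.content` when working with document objects).

```python
def check_retrieval(search_fn, cases: dict[str, str]) -> list[str]:
    """Return the queries whose top result misses the expected keyword."""
    failures = []
    for query, expected in cases.items():
        results = search_fn(query)
        top = results[0].lower() if results else ""
        if expected.lower() not in top:
            failures.append(query)
    return failures
```

Run this in CI after re-indexing content so retrieval regressions surface before users hit them.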

Next Steps