codebase_search

ⓘ Configuration Required

The codebase_search tool is part of the Codebase Indexing feature and requires additional configuration before use, including an embedding service provider and a vector database.

The codebase_search tool performs semantic search across the entire codebase using AI embeddings. Unlike traditional text search, it interprets the meaning of a query and can locate relevant code even when no keywords match exactly.

Parameter Description

This tool supports the following input parameters:

query (required): Natural language query statement describing the retrieval requirements
path (optional): Directory path to limit retrieval scope to a specific directory in the codebase
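
As a minimal sketch of how the two parameters fit together, the helper below renders a call in the XML form used throughout this page. The function name and the XML escaping step are assumptions made for illustration, not part of the tool itself.

```python
from typing import Optional
from xml.sax.saxutils import escape


def build_codebase_search_call(query: str, path: Optional[str] = None) -> str:
    """Render a codebase_search call (hypothetical helper, for illustration)."""
    if not query or not query.strip():
        # query is the only required parameter
        raise ValueError("query is required and must not be empty")
    lines = ["<codebase_search>", f"<query>{escape(query.strip())}</query>"]
    if path:
        # path is optional; when present it limits the search to one directory
        lines.append(f"<path>{escape(path)}</path>")
    lines.append("</codebase_search>")
    return "\n".join(lines)


print(build_codebase_search_call("User login and authentication logic", "src/api"))
```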

Function Description

This tool searches the indexed codebase by semantic similarity rather than exact text matching. It can find code blocks that are conceptually related to the query, even if they do not contain the exact words used in the query. Each result includes the relevant code snippet together with its file path, line numbers, and similarity score.

Applicable Scenarios

When the intelligent code assistant needs to locate code related to specific functions in the project
When finding code implementation patterns or similar code structures
When searching for conceptual code patterns such as exception handling and authentication
When exploring unfamiliar codebases to understand function implementation logic
When locating related code that may be affected by code changes or refactoring

Core Features

Semantic Understanding Capability: Retrieves code by meaning, not just exact keyword matching
Codebase-Wide Retrieval: Retrieval scope covers the entire indexed codebase, not just open files
Contextual Results: Returned code snippets include file paths and line numbers for quick navigation
Similarity Scoring: Each result carries a similarity score in the 0-1 range
Scope Filtering: Supports limiting retrieval to specific directories via the optional path parameter
Intelligent Ranking: Results are sorted by semantic relevance to the query statement
Interface Integration: Search results support syntax highlighting and include navigation links
Performance Optimization: Fast vector-based retrieval with configurable result count limits

Prerequisites

This tool can only be used after the codebase indexing function has been correctly configured with the following:

Function Enabled: Codebase indexing function must be enabled in system settings
Embedding Service Provider: OpenAI API key must be configured
Vector Database: Qdrant instance must be running and accessible
Index Status: Codebase must be indexed or in the process of being indexed (status: "Indexed" or "Indexing")

Limitations

Dependency on Pre-Configuration: Operation depends on external services (embedding service provider + Qdrant vector database)
Index Dependency: Only indexed code blocks can be retrieved
Result Count Limit: Maximum 50 results returned per search to ensure performance
Similarity Threshold: Only results with similarity above threshold are returned (default threshold 0.4, customizable)
File Size Limit: Only files smaller than 1MB and successfully indexed are retrieved
Language Support: Retrieval effectiveness depends on Tree-sitter's programming language parsing capabilities

Execution Process

When the codebase_search tool is called, it executes the following process:

1. Availability Verification

Verify that the code index manager has been instantiated and is available
Confirm that the codebase indexing function has been enabled in system settings
Check the validity of index configuration (including API key, Qdrant service address)
Verify that the current index status supports retrieval operations
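
The four checks above can be sketched as a single guard function. This is an illustrative sketch only: IndexConfig, its field names, and the returned messages are hypothetical, since the real manager and settings objects are not shown on this page. The set of searchable states is taken from the Prerequisites section.

```python
from dataclasses import dataclass
from typing import Optional

# Index states that allow searching, per the Prerequisites section
SEARCHABLE_STATES = {"Indexed", "Indexing"}


@dataclass
class IndexConfig:
    """Hypothetical snapshot of the indexing settings."""
    feature_enabled: bool
    api_key: Optional[str]
    qdrant_url: Optional[str]
    status: str


def check_availability(config: Optional[IndexConfig]) -> Optional[str]:
    """Return None when a search can run, otherwise the reason it cannot."""
    if config is None:
        return "code index manager is not available"
    if not config.feature_enabled:
        return "codebase indexing is disabled in settings"
    if not config.api_key or not config.qdrant_url:
        return "index configuration is incomplete"
    if config.status not in SEARCHABLE_STATES:
        return f"index status {config.status!r} does not support search"
    return None
```

Returning a reason string instead of a boolean is a design choice for the sketch: it lets the caller report which prerequisite is missing.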

2. Query Statement Processing

Receive natural language query statement and generate corresponding embedding vector
Use the same embedding service provider (OpenAI) as configured for code indexing
Convert the meaning of the query statement into a numeric vector representation
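
A hedged sketch of this step follows. The Embedder interface and the model-name check are assumptions introduced for illustration; the page only states that the query must be embedded with the same provider used for indexing. The check reflects why that matters: vectors produced by different models are not comparable.

```python
from typing import List, Protocol


class Embedder(Protocol):
    """Hypothetical provider interface; the real client is not shown here."""
    model: str

    def embed(self, text: str) -> List[float]: ...


def embed_query(query: str, embedder: Embedder, index_model: str) -> List[float]:
    """Turn a natural language query into an embedding vector."""
    text = query.strip()
    if not text:
        raise ValueError("query must not be empty")
    if embedder.model != index_model:
        # Vectors from different models live in different spaces,
        # so similarity scores between them would be meaningless.
        raise ValueError("query embedder must match the model used for indexing")
    return embedder.embed(text)
```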

3. Vector Retrieval Execution

Search the Qdrant vector database for code embeddings similar to the query vector
Use cosine similarity to locate the most relevant code blocks
Apply minimum similarity threshold (default 0.4, customizable) to filter results
Limit results to 50 items for optimal performance
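
To make the scoring, threshold, and limit concrete, here is a brute-force sketch in plain Python. In the real tool this work is delegated to Qdrant; the function names and the in-memory index below are hypothetical. The inclusive comparison against the threshold is also an assumption, since the page only says "above threshold".

```python
import math
from typing import Dict, List, Sequence, Tuple

DEFAULT_MIN_SCORE = 0.4  # default threshold stated on this page
MAX_RESULTS = 50         # result cap stated on this page


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    min_score: float = DEFAULT_MIN_SCORE,
    limit: int = MAX_RESULTS,
) -> List[Tuple[str, float]]:
    """Score every block, drop low scores, sort best first, cap the count."""
    scored = [(block_id, cosine_similarity(query_vec, vec))
              for block_id, vec in index.items()]
    kept = [item for item in scored if item[1] >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]
```

A vector database avoids the linear scan shown here by using an approximate nearest-neighbor index, which is what makes retrieval fast on large codebases.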

4. Path Filtering (if path parameter specified)

Filter retrieval results to files in the specified directory
Use standardized path comparison to ensure filtering accuracy
Maintain semantic relevance ranking within the filtered scope
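
The path filter can be sketched as a normalized prefix comparison. This is an illustration under assumptions: the helper names and the dictionary key file_path are hypothetical, and the real tool's normalization rules are not documented here. Comparing on a directory boundary keeps src/api from matching src/apiary, and iterating in order preserves the relevance ranking.

```python
import posixpath
from typing import List


def normalize(path: str) -> str:
    """Use forward slashes, collapse '.' and '..', drop surrounding slashes."""
    cleaned = posixpath.normpath(path.replace("\\", "/"))
    return "" if cleaned == "." else cleaned.strip("/")


def filter_by_directory(results: List[dict], directory: str) -> List[dict]:
    """Keep results whose file lies inside `directory`, preserving order."""
    prefix = normalize(directory)
    if not prefix:
        return list(results)
    kept = []
    for result in results:
        file_path = normalize(result["file_path"])
        # Match the directory itself or anything below it, on a "/" boundary
        if file_path == prefix or file_path.startswith(prefix + "/"):
            kept.append(result)
    return kept
```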

5. Result Processing and Formatting

Convert absolute file paths to workspace-relative paths
Organize each result into file path, line number range, similarity score, and code content
Format results for both AI parsing and interface display, with support for syntax highlighting
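
A minimal sketch of this step, assuming POSIX-style paths. The SearchResult type, the to_result helper, and the keys of the raw hit dictionary are hypothetical names chosen for illustration.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class SearchResult:
    """One structured result: path, line range, score, and code."""
    file_path: str   # workspace-relative
    start_line: int
    end_line: int
    score: float
    code: str


def to_result(workspace_root: str, hit: dict) -> SearchResult:
    """Convert a raw hit with an absolute path into a structured result."""
    relative = posixpath.relpath(hit["abs_path"], workspace_root)
    return SearchResult(
        file_path=relative,
        start_line=hit["start_line"],
        end_line=hit["end_line"],
        score=hit["score"],
        code=hit["code"],
    )
```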

6. Dual Output Format

AI-Compatible Format: Structured text format containing query statement, file path, score, and code snippet
Interface Display Format: JSON format supporting syntax highlighting and code navigation functionality
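
The two formats might be produced from the same result list roughly as follows. The exact field labels, text layout, and JSON shape are assumptions; the page only specifies which pieces of information each format carries.

```python
import json
from typing import List


def format_for_ai(query: str, results: List[dict]) -> str:
    """Plain structured text: query, then one block per result."""
    lines = [f"Query: {query}", "Results:", ""]
    for r in results:
        lines += [
            f"File path: {r['file_path']}",
            f"Score: {r['score']}",
            f"Lines: {r['start_line']}-{r['end_line']}",
            f"Code Chunk: {r['code']}",
            "",
        ]
    return "\n".join(lines).rstrip()


def format_for_ui(query: str, results: List[dict]) -> str:
    """JSON payload that an interface could render with highlighting."""
    return json.dumps({"query": query, "results": results}, indent=2)
```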

Search Query Best Practices

Effective Query Patterns

Recommended: Conceptual and specific descriptions

xml

<codebase_search>
<query>User authentication and password validation logic</query>
</codebase_search>

Recommended: Function-centered descriptions

xml

<codebase_search>
<query>Database connection pool configuration implementation</query>
</codebase_search>

Recommended: Problem-oriented descriptions

xml

<codebase_search>
<query>Exception handling mechanism for API requests</query>
</codebase_search>

Not recommended: Overly general descriptions

xml

<codebase_search>
<query>Function</query>
</codebase_search>

Well-Adapted Query Types

Function description: "File upload processing logic", "Email validation rules"
Technical pattern: "Singleton pattern implementation", "Factory method usage scenarios"
Business domain: "User profile management logic", "Payment processing flow"
Architecture component: "Middleware configuration methods", "Database migration scripts"

Directory Scope Limitation

The retrieval scope can be focused on a specific directory in the codebase using the optional path parameter:

Retrieve code within API module:

xml

<codebase_search>
<query>Interface request validation middleware</query>
<path>src/api</path>
</codebase_search>

Retrieve code within test files:

xml

<codebase_search>
<query>Mock data construction patterns</query>
<path>tests</path>
</codebase_search>

Retrieve code within specific function directories:

xml

<codebase_search>
<query>Component state management logic</query>
<path>src/components/auth</path>
</codebase_search>

Result Interpretation

Similarity Score Range Explanation

0.8-1.0: Highly relevant matches, likely the target retrieval content
0.6-0.8: Good relevance, strong conceptual matching
0.4-0.6: Potentially relevant, requires manual verification
Below 0.4: Similarity too low; filtered out and not returned under the default threshold
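
As a small illustration, the bands above can be expressed as a lookup function. The function name and labels are hypothetical. Because the listed ranges share their endpoints, this sketch assigns a boundary value such as 0.8 to the higher band, which is an assumption.

```python
def describe_score(score: float, threshold: float = 0.4) -> str:
    """Map a similarity score to the relevance band described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in the 0-1 range")
    if score < threshold:
        # Results below the threshold are never returned
        return "filtered out"
    if score >= 0.8:
        return "highly relevant"
    if score >= 0.6:
        return "good relevance"
    return "potentially relevant"
```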

Result Structure Explanation

Each search result contains the following information:

File path: Workspace-relative path of the file containing matching code
Score: Similarity score indicating relevance (0.4-1.0 with the default threshold)
Line number range: Start and end line numbers of the matching code block
Code snippet: Actual code content matching the query statement

Practical Application Examples

When implementing new features, the intelligent code assistant retrieves "authentication middleware" to understand existing implementation patterns before writing new code;
When debugging issues, the intelligent code assistant retrieves "exception handling in API calls" to locate relevant exception handling patterns in the project;
When refactoring code, the intelligent code assistant retrieves "database transaction implementation patterns" to ensure consistent implementation of all database operations;
When accessing a new codebase, the intelligent code assistant retrieves "configuration loading logic" to understand the application's startup initialization process.

Tool Usage Examples

Retrieve authentication-related code across the entire project:

xml

<codebase_search>
<query>User login and authentication logic</query>
</codebase_search>

Retrieve database-related code in a specific directory:

xml

<codebase_search>
<query>Database connection and query execution logic</query>
<path>src/data</path>
</codebase_search>

Retrieve exception handling patterns in API code:

xml

<codebase_search>
<query>HTTP error response and exception handling mechanism</query>
<path>src/api</path>
</codebase_search>

Retrieve test tools and mock data construction-related code:

xml

<codebase_search>
<query>Test environment setup and mock data generation</query>
<path>tests</path>
</codebase_search>

Retrieve configuration and environment initialization-related code:

xml

<codebase_search>
<query>Environment variable and application configuration loading logic</query>
</codebase_search>
