codebase_search
ⓘ Configuration Required
The codebase_search tool is part of the Codebase Indexing feature module and requires additional configuration before use, including deployment of an embedding service provider and vector database.
The codebase_search tool performs semantic search across the entire codebase using AI embeddings. Unlike traditional text search, it understands the meaning of a query and can locate relevant code even without exact keyword matches.
Parameter Description
This tool supports the following input parameters:
- query (required): Natural language query describing what to retrieve
- path (optional): Directory path that limits retrieval to a specific directory in the codebase
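As a minimal sketch of how the two parameters map onto a tool call, the helper below renders the XML-style block used in the examples throughout this page. The function name `build_codebase_search_call` is hypothetical and is not part of the tool itself; it only illustrates that `query` is mandatory and `path` is omitted when not needed.

```python
from typing import Optional


def build_codebase_search_call(query: str, path: Optional[str] = None) -> str:
    """Render a codebase_search call in the XML-style form shown on this page.

    Hypothetical helper for illustration only.
    """
    if not query.strip():
        # query is the only required parameter
        raise ValueError("query is required and must not be empty")
    lines = ["<codebase_search>", f"<query>{query}</query>"]
    if path:
        # path is optional; leaving it out searches the whole indexed codebase
        lines.append(f"<path>{path}</path>")
    lines.append("</codebase_search>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_codebase_search_call("Interface request validation middleware", "src/api"))
```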
Function Description
This tool searches the indexed codebase based on semantic similarity rather than exact text matching. It can discover code blocks that are conceptually related to the query statement, even if they do not contain the exact words used in the retrieval. Search results return relevant code snippets with file paths, line numbers, and similarity scores.
Applicable Scenarios
- When the intelligent code assistant needs to locate code related to specific functions in the project
- When finding code implementation patterns or similar code structures
- When searching for conceptual code patterns such as exception handling and authentication
- When exploring unfamiliar codebases to understand function implementation logic
- When locating related code that may be affected by code changes or refactoring
Core Features
- Semantic Understanding Capability: Retrieves code based on semantic connotation, not just exact keyword matching
- Codebase-Wide Retrieval: Retrieval scope covers the entire indexed codebase, not just open files
- Contextual Results: Returned code snippets include file paths and line numbers for quick navigation and location
- Similarity Scoring: Results are sorted by relevance with similarity scores in the 0-1 range
- Scope Filtering: Supports limiting retrieval to specific directories via the optional path parameter
- Intelligent Ranking: Results are sorted by semantic relevance to the query statement
- Interface Integration: Search results support syntax highlighting and include navigation links
- Performance Optimization: Fast vector-based retrieval with configurable result count limits
Prerequisites
This tool can only be used after the codebase indexing function has been correctly configured with the following:
- Function Enabled: Codebase indexing must be enabled in system settings
- Embedding Service Provider: OpenAI API key must be configured
- Vector Database: Qdrant instance must be running and accessible
- Index Status: Codebase must be indexed or currently being indexed (status: "Indexed" or "Indexing")
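The prerequisites above can be read as a simple checklist. The sketch below is one way to express it, assuming a hypothetical `IndexConfig` settings object; the field names are invented for illustration and do not reflect the tool's real configuration schema. It checks only that values are present and does not test whether the Qdrant instance is actually reachable.

```python
from dataclasses import dataclass
from typing import List

# Index states that the prerequisites list as usable for search
SEARCHABLE_STATES = {"Indexed", "Indexing"}


@dataclass
class IndexConfig:
    """Hypothetical settings container; field names are assumptions."""
    enabled: bool
    openai_api_key: str
    qdrant_url: str
    index_status: str


def missing_prerequisites(cfg: IndexConfig) -> List[str]:
    """Return a human-readable list of unmet prerequisites (empty if none)."""
    problems = []
    if not cfg.enabled:
        problems.append("codebase indexing is not enabled")
    if not cfg.openai_api_key:
        problems.append("OpenAI API key is not configured")
    if not cfg.qdrant_url:
        problems.append("Qdrant address is not configured")
    if cfg.index_status not in SEARCHABLE_STATES:
        problems.append(f"index status {cfg.index_status!r} does not support search")
    return problems
```

An empty list means the search can proceed; otherwise each entry names one item to fix.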
Limitations
- Dependency on Pre-Configuration: Operation depends on external services (embedding service provider + Qdrant vector database)
- Index Dependency: Only indexed code blocks can be retrieved
- Result Count Limit: Maximum 50 results returned per search to ensure performance
- Similarity Threshold: Only results with similarity above threshold are returned (default threshold 0.4, customizable)
- File Size Limit: Only files smaller than 1MB and successfully indexed are retrieved
- Language Support: Retrieval effectiveness depends on Tree-sitter's programming language parsing capabilities
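To make the result count limit and similarity threshold concrete, here is a minimal pure-Python sketch of threshold filtering and capping over cosine similarity scores. In the real tool this work is done by the Qdrant vector database; the function names and the in-memory list of vectors are assumptions for illustration, and the threshold is treated as inclusive, which is also an assumption.

```python
import math
from typing import List, Sequence, Tuple

MIN_SCORE = 0.4    # default similarity threshold stated above
MAX_RESULTS = 50   # per-search result cap stated above


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(
    query_vec: Sequence[float],
    blocks: List[Tuple[str, Sequence[float]]],
    min_score: float = MIN_SCORE,
    limit: int = MAX_RESULTS,
) -> List[Tuple[str, float]]:
    """Score every block, drop those below the threshold, sort, and cap."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in blocks]
    kept = [item for item in scored if item[1] >= min_score]  # inclusive: assumption
    kept.sort(key=lambda item: item[1], reverse=True)         # most similar first
    return kept[:limit]
```

A block whose embedding is orthogonal to the query scores 0 and is dropped, which is why unrelated code never appears in the results even when fewer than 50 matches exist.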
Execution Process
When the codebase_search tool is called, it executes the following process:
1. Availability Verification
- Verify that the code index manager has been instantiated and is available
- Confirm that the codebase indexing function has been enabled in system settings
- Check the validity of index configuration (including API key, Qdrant service address)
- Verify that the current index status supports retrieval operations
2. Query Statement Processing
- Receive natural language query statement and generate corresponding embedding vector
- Use the same embedding service provider (OpenAI) as configured for code indexing
- Convert the semantic connotation of the query statement into a mathematical vector representation
3. Vector Retrieval Execution
- Retrieve similar code embedding vectors in the Qdrant vector database
- Use cosine similarity algorithm to locate the most relevant code blocks
- Apply minimum similarity threshold (default 0.4, customizable) to filter results
- Limit results to 50 items for optimal performance
4. Path Filtering (if path parameter specified)
- Filter retrieval results to files in the specified directory
- Use standardized path comparison to ensure filtering accuracy
- Maintain semantic relevance ranking within the filtered scope
5. Result Processing and Formatting
- Convert absolute file paths to workspace-relative paths
- Structurally organize results by "file path, line number range, similarity score, code content"
- Adapt to both AI parsing and interface display format requirements, supporting syntax highlighting
6. Dual Output Format
- AI-Compatible Format: Structured text format containing query statement, file path, score, and code snippet
- Interface Display Format: JSON format supporting syntax highlighting and code navigation functionality
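Steps 4 and 5 can be sketched as follows: convert each absolute path to a workspace-relative one, then keep only hits under the requested directory while preserving the existing ranking. This is an illustrative sketch, not the tool's implementation; the function names and the `(path, score)` tuple shape are assumptions, and POSIX-style paths are assumed throughout.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

Hit = Tuple[str, float]  # (file path, similarity score); hypothetical shape


def to_relative(abs_path: str, workspace: str) -> str:
    """Convert an absolute path to a workspace-relative path."""
    return PurePosixPath(abs_path).relative_to(workspace).as_posix()


def filter_by_directory(
    hits: List[Hit], workspace: str, directory: Optional[str] = None
) -> List[Hit]:
    """Keep hits located under `directory`; input order (ranking) is preserved."""
    # PurePosixPath normalizes forms such as "src/api/" and "./src/api"
    prefix = PurePosixPath(directory) if directory is not None else None
    results = []
    for abs_path, score in hits:
        rel = to_relative(abs_path, workspace)
        # Compare whole path components so "src/api" does not match "src/apiary"
        if prefix is not None and prefix not in PurePosixPath(rel).parents:
            continue
        results.append((rel, score))
    return results
```

Comparing path components rather than raw string prefixes is the design choice worth noting here: a plain `startswith("src/api")` check would also admit files under a sibling directory such as `src/apiary`.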
Search Query Best Practices
Effective Query Patterns
Recommended: Conceptual and specific descriptions
<codebase_search>
<query>User authentication and password validation logic</query>
</codebase_search>
Recommended: Function-centered descriptions
<codebase_search>
<query>Database connection pool configuration implementation</query>
</codebase_search>
Recommended: Problem-oriented descriptions
<codebase_search>
<query>Exception handling mechanism for API requests</query>
</codebase_search>
Not recommended: Overly general descriptions
<codebase_search>
<query>Function</query>
</codebase_search>
Well-Adapted Query Types
- Function description: "File upload processing logic", "Email validation rules"
- Technical pattern: "Singleton pattern implementation", "Factory method usage scenarios"
- Business domain: "User profile management logic", "Payment processing flow"
- Architecture component: "Middleware configuration methods", "Database migration scripts"
Directory Scope Limitation
The retrieval scope can be focused on a specific directory in the codebase using the optional path parameter:
Retrieve code within API module:
<codebase_search>
<query>Interface request validation middleware</query>
<path>src/api</path>
</codebase_search>
Retrieve code within test files:
<codebase_search>
<query>Mock data construction patterns</query>
<path>tests</path>
</codebase_search>
Retrieve code within specific function directories:
<codebase_search>
<query>Component state management logic</query>
<path>src/components/auth</path>
</codebase_search>
Result Interpretation
Similarity Score Range Explanation
- 0.8-1.0: Highly relevant matches, likely the target retrieval content
- 0.6-0.8: Good relevance, strong conceptual matching
- 0.4-0.6: Potentially relevant, requires manual verification
- Below 0.4: Too low similarity, filtered out and not returned
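The score bands above can be expressed as a small lookup. The sketch below is illustrative only; the function name is hypothetical, and because the listed ranges share their endpoints (0.6 and 0.8), assigning each boundary value to the higher band is an assumption.

```python
def describe_score(score: float, threshold: float = 0.4) -> str:
    """Map a similarity score to the interpretation bands listed above."""
    if score < threshold:
        return "filtered out"        # never returned by the tool
    if score >= 0.8:
        return "highly relevant"     # boundary goes to the higher band (assumption)
    if score >= 0.6:
        return "good relevance"
    return "potentially relevant, verify manually"
```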
Result Structure Explanation
Each search result contains the following information:
- File path: Workspace-relative path of the file containing matching code
- Score: Similarity score indicating relevance (0.4-1.0 with the default threshold)
- Line number range: Start and end line numbers of the matching code block
- Code snippet: Actual code content matching the query statement
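For readers who want to handle results programmatically, the four fields above could be modeled as a small record. This is a sketch under assumptions: the class name, field names, and the text layout produced by `format_result` are all hypothetical, since this page does not specify the exact output format.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """One search hit; field names are illustrative, not the tool's schema."""
    file_path: str    # workspace-relative path
    score: float      # similarity score
    start_line: int   # first line of the matching block
    end_line: int     # last line of the matching block
    code_chunk: str   # matching code content


def format_result(result: SearchResult) -> str:
    """Render a hit as plain text (hypothetical layout)."""
    return (
        f"File path: {result.file_path}\n"
        f"Score: {result.score:.2f}\n"
        f"Lines: {result.start_line}-{result.end_line}\n"
        f"Code snippet:\n{result.code_chunk}"
    )
```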
Practical Application Examples
- When implementing new features, the intelligent code assistant retrieves "authentication middleware" to understand existing implementation patterns before writing new code;
- When debugging issues, the intelligent code assistant retrieves "exception handling in API calls" to locate relevant exception handling patterns in the project;
- When refactoring code, the intelligent code assistant retrieves "database transaction implementation patterns" to ensure consistent implementation of all database operations;
- When accessing a new codebase, the intelligent code assistant retrieves "configuration loading logic" to understand the application's startup initialization process.
Tool Usage Examples
Retrieve authentication-related code across the entire project:
<codebase_search>
<query>User login and authentication logic</query>
</codebase_search>
Retrieve database-related code in a specific directory:
<codebase_search>
<query>Database connection and query execution logic</query>
<path>src/data</path>
</codebase_search>
Retrieve exception handling patterns in API code:
<codebase_search>
<query>HTTP error response and exception handling mechanism</query>
<path>src/api</path>
</codebase_search>
Retrieve test tools and mock data construction-related code:
<codebase_search>
<query>Test environment setup and mock data generation</query>
<path>tests</path>
</codebase_search>
Retrieve configuration and environment initialization-related code:
<codebase_search>
<query>Environment variable and application configuration loading logic</query>
</codebase_search>