dacli - Architecture Documentation
About arc42
arc42, the template for documentation of software and system architecture.
Created, maintained and © by Dr. Peter Hruschka, Dr. Gernot Starke and contributors. See https://arc42.org.
1. Introduction and Goals
1.1. Requirements Overview
Large Language Models (LLMs) face significant challenges when interacting with extensive documentation projects. The primary issues are:
- Token Limitations: Large, single-file documents exceed the context window of most models.
- Lack of Structure Awareness: LLMs cannot navigate or understand the hierarchical structure of a documentation project (e.g., chapters, sections).
- Inefficient Access: Reading entire files is token-inefficient when only small sections are needed.
- Difficult Manipulation: Modifying specific parts of a document is cumbersome and error-prone.
dacli (Documentation Access CLI) aims to solve these problems by providing structured, content-aware tools for interacting with AsciiDoc and Markdown projects. Available as both a CLI tool and MCP server, it enables efficient navigation, reading, and modification of complex documentation.
1.2. Quality Goals
The architecture will prioritize the following key quality goals, derived from the non-functional requirements:
| Goal | Description |
|---|---|
| Performance | API calls for typical navigation and read operations must respond in under 2 seconds. Pre-processing during startup is acceptable. |
| Data Integrity & Reliability | Changes to documents must be atomic. No data loss shall occur during file modifications, even in case of errors. |
| Usability | The system must be fully compliant with the Model Context Protocol (MCP) to ensure seamless integration for developers and architects. |
| Scalability | The server must handle large documentation projects of up to 600 pages without significant performance degradation. |
1.3. Stakeholders
The primary stakeholders of dacli are:
| Role/Name | Contact | Expectations |
|---|---|---|
| Software Developer | Development Team | Uses dacli to analyze and maintain code documentation with LLM assistance. |
| Software Architect | Architecture Team | Uses dacli to manage and update large-scale architecture documents (e.g., arc42) with LLMs. |
| Documentation Engineer | Documentation Team | Manages complex documentation projects, relying on dacli for efficient navigation and maintenance. |
2. Architecture Constraints
This chapter outlines the constraints that shape the architecture of dacli.
2.1. Technical Constraints
The system must adhere to the following technical constraints, derived directly from the PRD:
| Constraint | Description |
|---|---|
| File-System Based | The solution must not require a database. All data and state are to be managed directly on the file system. |
| Human-Readable Files | Source documentation files (AsciiDoc, Markdown) must remain human-readable and editable with standard text editors at all times. |
| Toolchain Compatibility | The system must work with existing AsciiDoc and Markdown toolchains without requiring proprietary formats or modifications. |
| Version Control Integration | All operations must be compatible with standard Git workflows, ensuring that file changes can be tracked, committed, and reverted. |
| Python Package Management | Python dependencies must be managed using uv (https://github.com/astral-sh/uv) for fast, reliable dependency resolution and virtual environment management. |
2.2. Organizational and Process Constraints
| Constraint | Description |
|---|---|
| Workflow Integration | The solution must integrate seamlessly into existing developer workflows without imposing significant process changes. |
| No External Services | The system must be self-contained and not rely on any external or third-party services for its core functionality. |
| Phased Development | The project is developed in phases (Core Engine, MCP Integration, CLI), requiring a modular architecture that supports incremental delivery. |
2.3. Conventions
To ensure consistency and quality, the following conventions will be followed:
| Convention | Description |
|---|---|
| MCP First | The API design and implementation must be fully compliant with the Model Context Protocol (MCP) standard. This is a primary design driver. |
| Stateless Principle | The core server logic will be designed to be as stateless as possible, treating the file system as the single source of truth for all content and structure. |
| Standard Markup | The parsers will adhere to common AsciiDoc and Markdown standards. Support for non-standard or esoteric language features is a low priority. |
| Atomic Operations | All file modification operations must be designed to be atomic to prevent data corruption and ensure file consistency. |
3. Context and Scope
This chapter describes the system’s boundaries, its users, and its interactions with external systems.
3.1. Business Context
From a business perspective, dacli acts as a specialized middleware that enables technical users to interact with documentation projects more effectively. It abstracts away the complexity of file-based document structures and provides both a CLI for direct use and an MCP server for LLM integration.
3.2. Technical Context
On a technical level, the MCP server is spawned as a subprocess by an MCP-compliant client and communicates via stdio (standard input/output). The CLI tool can be invoked directly from the shell. Both interfaces interact directly with the file system to read documentation source files and write back modifications.
4. Solution Strategy
This chapter outlines the fundamental architectural decisions and strategies to meet the requirements defined in the previous chapters.
4.1. Core Architectural Approach: In-Memory Index with File-System-as-Truth
The core of the architecture is a dual approach:
- In-Memory Index: On startup, the server parses the entire documentation project and builds a lightweight, in-memory index of the document structure (files, sections, line numbers, includes). This index is the key to achieving the Performance goals (PERF-1), as it allows for near-instant lookups of content locations without repeatedly reading files from disk.
- File System as the Single Source of Truth: The system is stateless. The file system holds the definitive state of the documentation at all times. All modifications are written directly back to the source files. This approach satisfies the constraints of Human-Readable Files and Version Control Integration. It also simplifies the architecture by avoiding the need for a database (Constraint: File-System Based).
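To make the first point concrete, the sketch below shows one plausible shape for such an index: a flat dictionary from hierarchical section paths to file locations. The class and field names are illustrative assumptions, not the actual dacli implementation.

```python
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class SectionLocation:
    """File path and line range for one section (field names are assumptions)."""
    file: Path
    start_line: int   # 1-based, inclusive
    end_line: int     # inclusive

@dataclass
class StructureIndex:
    """Flat lookup table: hierarchical section path -> location on disk."""
    sections: dict[str, SectionLocation] = field(default_factory=dict)

    def lookup(self, path: str) -> SectionLocation:
        return self.sections[path]
```

Because the index stores only metadata (paths and line numbers), it stays small even for large projects; the content itself remains on disk.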
4.2. Technology Decisions
To implement this strategy, the following technology stack is proposed. The choices are guided by the need for strong text processing capabilities, a robust ecosystem, and fast development.
| Component | Technology | Justification |
|---|---|---|
| Language | Python 3.12+ | Excellent for text processing, large standard library, strong community support, and mature libraries for parsing. |
| Package Manager | uv | Ultra-fast Python package installer and resolver. Provides deterministic builds via `uv.lock`. |
| MCP Framework | FastMCP | High-level framework for building MCP servers in Python. Simplifies tool registration, handles MCP protocol details, and provides stdio transport. |
| CLI Framework | Click | Mature Python CLI framework for building the `dacli` command-line interface. |
| Document Parsing | Custom Parser Logic | Custom parsers handle AsciiDoc/Markdown specifics, especially resolving includes and tracking line numbers accurately. Off-the-shelf libraries often lack the required granularity. This directly addresses the risk of Format Variations. |
4.3. Development Environment Setup
The project uses uv for Python environment and dependency management:
```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and install dependencies
uv sync

# Run the server
uv run dacli-mcp

# Add a new dependency
uv add <package-name>

# Add a development dependency
uv add --dev <package-name>
```
The pyproject.toml defines all dependencies, and uv.lock ensures reproducible builds across all environments.
4.4. Achieving Key Quality Goals
The architectural strategy directly addresses the top quality goals defined in Chapter 10.
| Strategy | Quality Goal Addressed | How it is achieved |
|---|---|---|
| In-Memory Structure Index | Performance (PERF-1, PERF-2) | Read operations query the fast in-memory index for file locations instead of parsing files on every request. |
| Atomic Write-Through Cache | Reliability (REL-1, REL-3) | A File System Handler component implements atomic writes by using temporary files and backups. This prevents file corruption. |
| MCP-Compliant Tools (FastMCP) | Usability (USAB-1) | FastMCP handles MCP protocol compliance, tool registration, and stdio transport. Tools are strongly typed with clear parameter definitions. |
| Dual Interface (CLI + MCP) | Usability (USAB-2) | Both CLI (for direct use) and MCP server (for LLM integration) share the same core logic, ensuring consistent behavior. |
| Stateless, File-Based Design | Scalability (SCAL-1) & Reliability | By keeping the server stateless, scaling becomes simpler (less state to manage). It also improves reliability, as there is no complex database state to corrupt or manage. |
5. Building Block View
This chapter describes the static decomposition of the system into its key building blocks. We use the C4 model to illustrate the structure at different levels of detail.
5.1. Level 2: System Containers
The dacli system provides two interfaces: a CLI tool for direct command-line usage and an MCP server for LLM integration. The file system serves as the system’s database (see ADR-001).
5.2. Level 3: Components of the MCP Server
We now zoom into the MCP Server container. It is composed of several components, each with a distinct responsibility.
5.3. Level 3: Components of the CLI
The CLI tool (dacli) provides the same documentation operations as the MCP Server, accessible via command-line. It shares the core components (parsers, index, file handler) and adds a presentation layer for terminal output.
| Group | Commands |
|---|---|
| Navigation | |
| Search & Elements | |
| Manipulation | |
| Meta-Information | |
5.4. Document Parser Architecture
Both the MCP Server and CLI containers use the same "Document Parsers" component. Internally, it contains two independent parser implementations with distinct architectures, sharing utility functions via parser_utils.
| Aspect | AsciiDoc Parser | Markdown Parser |
|---|---|---|
| File Discovery | Follows `include::[]` directives from the root document | Scans the folder hierarchy |
| Structure Source | Heading levels (`=` to `======`) | Heading levels (`#` to `######`) |
| Include Support | Full recursive resolution with cycle detection | None (folder hierarchy replaces includes) |
| Metadata | Document attributes (`:attr:`) | YAML frontmatter |
| Unique Feature | Attribute substitution in paths, source mapping across includes | Numeric prefix sorting |
Key classes:
- AsciidocStructureParser — Parses `.adoc` files. Delegates to an include resolver (recursive, with circular-include detection) and an attribute handler (`:attr:` substitution in text and include paths).
- MarkdownStructureParser — Parses `.md` files by scanning folder hierarchies. Supports YAML frontmatter for metadata.
- parser_utils — Shared functions: `slugify`, `strip_doc_extension`, `find_section_by_path`, `collect_all_sections`.
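As an illustration of the shared utilities, a minimal `slugify` might look like the following; the actual `parser_utils` implementation may handle additional edge cases (punctuation, non-ASCII characters) differently.

```python
import re

def slugify(title: str) -> str:
    """Turn a heading title into a path segment (simplified sketch)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

assert slugify("Introduction and Goals") == "introduction-and-goals"
```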
For the internal structure of each parser (Level 4: Code), see the component specifications:
- AsciiDoc Parser Specification — includes state machine diagram, include resolution algorithm, and data models
- Markdown Parser Specification — includes state machine diagram, folder hierarchy rules, and data models
5.5. Component Responsibilities
The following table maps components to the MCP tools:
| Component | Responsibility | MCP Tools |
|---|---|---|
| MCP Tools / CLI Commands | Expose all functionality via MCP protocol or command line | get_structure, get_section, get_sections_at_level, search, get_elements, get_metadata, validate_structure, update_section, insert_content, get_dependencies |
| Service Layer | Shared business logic for content manipulation, validation, metadata | (Internal - used by both CLI and MCP) |
| Document Parsers | Parse AsciiDoc/Markdown, resolve includes, track line numbers | (Internal - used during initialization) |
| Structure Index | In-memory index for fast lookups of sections and elements | (Internal - supports all read operations) |
| File System Handler | Atomic file read/write operations with backup strategy | (Internal - supports write operations, ADR-004) |
5.6. Data Models
The following core data models are used across components (see API Specification for details):
- SectionLocation - File path and line range for a section
- Section - Hierarchical document section with metadata
- Element - Typed content element (table, code, diagram)
- SearchResult - Search hit with context and relevance score
- Metadata - Document or section metadata
- ValidationResult - Structure validation outcome
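A hedged sketch of how two of these models could be expressed as Python dataclasses (see ADR-007); any field beyond those named in this document is an illustrative assumption.

```python
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class SectionLocation:
    """File path and line range for a section."""
    file: Path
    start_line: int
    end_line: int

@dataclass
class Section:
    """Hierarchical document section with metadata."""
    title: str
    path: str                  # hierarchical path, e.g. "guides/installation:prerequisites"
    level: int                 # heading level
    location: SectionLocation
    children: list["Section"] = field(default_factory=list)
```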
6. Runtime View
This chapter illustrates how the system’s components collaborate at runtime to fulfill key use cases.
6.1. Scenario: Reading a Document Section
This is the most common read operation. A client requests the content of a specific section using its hierarchical path.
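Conceptually, the flow reduces to an in-memory index lookup followed by a targeted file read, roughly as sketched here (simplified; names and the index shape are assumptions, and the real handler adds validation and error handling):

```python
from pathlib import Path

# index shape: hierarchical path -> (source file, start line, end line)
def get_section(index: dict[str, tuple[Path, int, int]], path: str) -> str:
    file, start, end = index[path]                    # fast in-memory lookup
    lines = file.read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[start - 1 : end])          # read only the needed line range
```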
6.2. Scenario: Updating a Document Section
This scenario shows the critical write operation. The process must be atomic to ensure data integrity, as required by quality goal REL-1. This is achieved by writing to a temporary file first.
6.3. Scenario: Server Initialization
When the server starts, it needs to parse the entire documentation project to build an in-memory index of the structure. This enables fast lookups for subsequent requests.
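A heavily simplified sketch of that startup scan, assuming AsciiDoc files and ignoring includes, nesting, and code blocks, which the real parsers handle:

```python
from pathlib import Path

def build_index(docs_root: Path) -> dict[str, tuple[Path, int, int]]:
    """One-time startup scan: record each heading's file and line range."""
    index: dict[str, tuple[Path, int, int]] = {}
    for file in sorted(docs_root.rglob("*.adoc")):
        lines = file.read_text(encoding="utf-8").splitlines()
        current, start = None, 0
        for i, line in enumerate(lines, start=1):
            if line.startswith("="):  # naive AsciiDoc heading detection
                if current is not None:
                    index[current] = (file, start, i - 1)  # close previous section
                current = line.lstrip("= ").strip().lower().replace(" ", "-")
                start = i
        if current is not None:
            index[current] = (file, start, len(lines))    # close final section
    return index
```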
7. Deployment View
This chapter describes the infrastructure and environment for dacli.
7.1. Deployment Strategy
The application is designed to be lightweight and self-contained, in line with its constraints (no external services). Two deployment strategies are supported:
7.1.1. Local Development / MCP Integration
For local development and MCP client integration, the server runs directly via uv:
```bash
# Install dependencies and run MCP server
uv sync
uv run dacli-mcp --docs-root /path/to/docs

# Or use CLI directly
uv run dacli --docs-root /path/to/docs structure
```

To register the server in an MCP client (e.g., Claude Desktop), add it to `~/.config/claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "dacli": {
      "command": "uv",
      "args": ["run", "dacli-mcp"],
      "cwd": "/path/to/dacli",
      "env": {
        "PROJECT_PATH": "/path/to/documentation"
      }
    }
  }
}
```
7.1.2. Docker Deployment
For containerized environments, the application can be packaged into a Docker container:
```dockerfile
FROM python:3.12-slim

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev
COPY src/ ./src/

# MCP server uses stdio transport, no port needed
CMD ["uv", "run", "dacli-mcp"]
```
Note: The MCP server communicates via stdio (standard input/output), not HTTP. The container is typically invoked by an MCP client that manages the stdio streams.
7.2. Production Environment
For production use, dacli runs as a subprocess managed by the MCP client, which spawns the server on demand and communicates with it over stdio (see Section 7.1.1 for a typical client configuration).
8. Cross-cutting Concepts
This chapter describes concepts that are relevant across multiple parts of the architecture.
8.1. Security
Security is addressed through standard, well-understood mechanisms.
- Transport Security: All communication with the server (API and MCP) must be secured with HTTPS.
- Execution Environment: The server is assumed to run in a trusted, non-hostile environment. It has direct file system access, which is a powerful capability. Access to the server should be controlled by network rules.
- Authentication/Authorization: The PRD does not specify any multi-user or authentication requirements. The server is treated as a single-tenant system. If needed in the future, standard token-based authentication (e.g., API keys, OAuth2) could be added at the API gateway level or within FastAPI.
8.2. Error Handling
The error handling strategy is designed to be robust and developer-friendly, supporting the quality goals of Reliability and Usability.
- API Errors: Invalid requests (e.g., bad paths, malformed content) will result in standard HTTP error codes (4xx) with a descriptive JSON body, as required by USAB-2.
- Server Errors: Unexpected internal errors will result in HTTP 5xx codes. All such errors will be logged with a full stack trace for debugging.
- Data Integrity: File corruption is prevented through the atomic write mechanism detailed in ADR-004.
8.3. Logging and Monitoring
- Logging: The application will use structured logging (e.g., JSON format) and log to stdout. This allows for easy integration with modern log aggregation tools like the ELK stack, Splunk, or cloud-based logging services. Log levels (DEBUG, INFO, WARN, ERROR) will be used to control verbosity. A minimal setup is sketched below.
- Monitoring: FastAPI can be easily instrumented with Prometheus middleware to expose key metrics (e.g., request latency, error rates, memory usage of the index). This allows for proactive monitoring and alerting.
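A minimal structured-logging setup using only the standard library might look as follows. One assumption worth noting: when dacli runs as an MCP stdio server, stdout carries protocol messages, so the sketch routes logs to stderr to avoid interleaving; this is a deployment consideration, not a statement of the actual configuration.

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stderr)  # stderr keeps stdout free for MCP stdio
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

logging.getLogger("dacli").info("structure index built")
```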
9. Architecture Decisions
This chapter records the most important architectural decisions using the ADR format by Michael Nygard.
9.1. ADR-001: File-System as Single Source of Truth
Status: Accepted (2025-09-18)
Context: The PRD requires that the system integrates with existing Git workflows, that files remain human-readable, and that there are no database dependencies. We need a simple, robust way to store the documentation content that honors these constraints.
Decision:
The file system will be treated as the single source of truth. The server will not have its own persistent state. All content and structure information is derived directly from the .adoc and .md files within the project directory.
Consequences:
- Simplifies the architecture immensely. No database schema migrations or data synchronization logic needed.
- Inherently compatible with Git and other version control systems.
- Developers can still use their favorite text editors.
- Queries that are not based on the document’s natural hierarchy may be inefficient to answer.
- The system’s performance is tied to file system performance.
9.1.1. Pugh Matrix: Storage Strategy
| Criterion | File System (Baseline) | SQLite | Key-Value Store |
|---|---|---|---|
| Git Integration | 0 | - | - |
| Human-Readable Files | 0 | - | - |
| No Database Dependency | 0 | - | - |
| Query Flexibility | 0 | + | + |
| Implementation Simplicity | 0 | - | - |
| Total | 0 | -3 | -2 |
Legend: + better than baseline, 0 same as baseline, - worse than baseline
9.2. ADR-002: In-Memory Index for Performance
Status: Accepted (2025-09-18)
Context: The quality goal PERF-1 requires API calls to respond in under 2 seconds. Reading and parsing text files from disk on every request would be too slow for large projects, as identified in the runtime analysis.
Decision: On startup, the server will perform a one-time scan of the entire project directory. It will parse all documentation files and build an "In-Memory Structure Index". This index will hold metadata about each document, including section names, hierarchical paths, and the start/end line numbers for each section in its source file. Read requests will consult this index to find the exact byte range to read from a file.
Consequences:
- Read operations (`get_section`) are extremely fast, as they become simple dictionary lookups followed by a targeted file read.
- Enables efficient implementation of structure-aware APIs like `get_structure`.
- Increased memory consumption, proportional to the size of the documentation project.
- Slower server startup time due to the initial indexing phase.
- A mechanism to detect external file changes (file watching) is needed to keep the index from becoming stale.
9.2.1. Pugh Matrix: Indexing Strategy
| Criterion | In-Memory Index (Baseline) | No Index | Persistent Disk Index |
|---|---|---|---|
| Read Performance | 0 | - - | 0 |
| Startup Time | 0 | + | + |
| Memory Efficiency | 0 | + | + |
| Implementation Simplicity | 0 | + | - |
| Stateless Design | 0 | + | - |
| Cache Invalidation | 0 | + | - |
| Total | 0 | +2 | -2 |
Note: Despite scoring lower, In-Memory Index was chosen because read performance is the critical quality goal (PERF-1). The "No Index" approach would violate performance requirements.
9.3. ADR-003: Technology Stack (Python/FastAPI)
Status: Accepted (2025-09-18)
Context: A programming language and web framework are needed to build the MCP API Server. The choice must align with the need for rapid development, strong text-processing capabilities, and high performance for an I/O-bound application.
Decision: The backend will be implemented in Python. The FastAPI framework will be used to build the web server and API endpoints.
Consequences:
- Python has an exceptional ecosystem for text processing and data manipulation.
- FastAPI provides high performance for I/O-bound tasks, data validation, and automatic OpenAPI/Swagger documentation, which helps achieve USAB-1 and USAB-2.
- The large talent pool for Python simplifies maintenance.
- Python’s GIL can be a limitation for CPU-bound tasks, but this application is primarily I/O-bound (reading files, network requests).
9.3.1. Pugh Matrix: Technology Stack
| Criterion | Python/FastAPI (Baseline) | Node.js/Express | Go/Gin | Java/Spring |
|---|---|---|---|---|
| Text Processing | 0 | 0 | - | 0 |
| Development Speed | 0 | 0 | - | - |
| I/O Performance | 0 | 0 | + | 0 |
| Auto API Documentation | 0 | - | - | 0 |
| Ecosystem Maturity | 0 | 0 | 0 | + |
| Memory Footprint | 0 | 0 | + | - |
| Talent Pool | 0 | 0 | - | + |
| Total | 0 | -1 | -2 | 0 |
Python/FastAPI chosen for its balanced strengths in text processing, rapid development, and automatic API documentation generation.
9.4. ADR-004: Atomic Writes via Temporary Files
Status: Accepted (2025-09-18)
Context:
The quality goal REL-1 (Atomic Writes) is critical to prevent file corruption during update operations. A failure (e.g., disk full, application crash) during a file write could leave a document in an unrecoverable, partially-written state.
Decision:
The File System Handler component will implement atomic writes using a backup-and-replace strategy (sketched below):

1. Create a backup of the original file (e.g., `doc.adoc` → `doc.adoc.bak`).
2. Write all intended changes to a new temporary file (e.g., `doc.adoc.tmp`).
3. If the write is successful, atomically rename/move the temporary file to replace the original file.
4. Delete the backup file.
5. If any step fails, restore the original file from the backup and delete the temporary file.
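A minimal sketch of these five steps; the function name and file suffixes are illustrative, and `os.replace` provides the atomic rename on both POSIX and Windows:

```python
import os
import shutil
from pathlib import Path

def atomic_write(target: Path, new_content: str) -> None:
    """Backup-and-replace write, following the steps above (simplified)."""
    backup = target.with_name(target.name + ".bak")
    tmp = target.with_name(target.name + ".tmp")
    shutil.copy2(target, backup)                       # 1. back up the original
    try:
        tmp.write_text(new_content, encoding="utf-8")  # 2. write changes to a temp file
        os.replace(tmp, target)                        # 3. atomically replace the original
        backup.unlink()                                # 4. drop the backup on success
    except Exception:
        shutil.copy2(backup, target)                   # 5. restore the original on failure
        tmp.unlink(missing_ok=True)                    #    and delete the temp file
        raise
```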
Consequences:
- Guarantees that the primary file is never in a corrupted state.
- Relatively simple to implement and understand.
- Slightly higher I/O overhead for each write operation (copy, write, move). This is an acceptable trade-off for the gain in reliability.
9.4.1. Pugh Matrix: Write Strategy
| Criterion | Temp File + Backup (Baseline) | Journaling | In-Place with Locking |
|---|---|---|---|
| Crash Safety | 0 | + | - |
| Implementation Simplicity | 0 | - - | + |
| I/O Overhead | 0 | - | + |
| Power Loss Protection | 0 | + | - |
| Debugging/Recovery | 0 | - | 0 |
| Total | 0 | -2 | 0 |
Temp File + Backup chosen for its balance of reliability and implementation simplicity. In-Place with Locking was rejected due to crash/power loss vulnerability.
9.5. ADR-005: Custom Parser for Include Resolution
Status: Accepted (2025-09-18)
Context:
A core feature is the ability to map a hierarchical path (e.g., chapter-1.section-2) to a precise location in a source file. This is complicated by AsciiDoc’s include::[] directive, as content from multiple files is logically part of one document. Existing parsers often flatten the document, losing this critical source-map information.
Decision: A custom document parser will be developed. This parser will be responsible for:
- Parsing the AsciiDoc/Markdown syntax.
- Recognizing and recursively resolving `include::[]` directives.
- Building an Abstract Syntax Tree (AST) that retains the original file path and line numbers for every single element of the document.
Consequences:
- Provides full control over the parsing process, ensuring the crucial source-map information is preserved.
- Allows for tailored error handling of malformed documents or circular includes.
- Significant development and maintenance effort compared to using an off-the-shelf library. This is the most complex component of the system.
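To illustrate the core of the decision, here is a stripped-down include resolver with cycle detection that preserves (file, line) provenance for every emitted line. It is a sketch under simplifying assumptions: the real parser additionally handles attribute substitution in paths, include options, and depth limits.

```python
import re
from pathlib import Path

INCLUDE = re.compile(r"^include::(?P<path>[^\[]+)\[.*\]\s*$")

def resolve_includes(
    file: Path, seen: frozenset[Path] = frozenset()
) -> list[tuple[Path, int, str]]:
    """Recursively inline include::[] targets, keeping (file, line) provenance.

    Raises ValueError on circular includes.
    """
    file = file.resolve()
    if file in seen:
        raise ValueError(f"circular include detected: {file}")
    out: list[tuple[Path, int, str]] = []
    for lineno, line in enumerate(file.read_text(encoding="utf-8").splitlines(), start=1):
        m = INCLUDE.match(line)
        if m:
            target = (file.parent / m.group("path").strip()).resolve()
            out.extend(resolve_includes(target, seen | {file}))  # recurse with cycle guard
        else:
            out.append((file, lineno, line))  # remember where this line came from
    return out
```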
9.5.1. Pugh Matrix: Parsing Strategy
| Criterion | Custom Parser (Baseline) | Existing Library (e.g., asciidoctor.py) |
|---|---|---|
| Source-Map Preservation | 0 | - - |
| Include Resolution Control | 0 | - |
| Circular Include Detection | 0 | - |
| Development Effort | 0 | + |
| Maintenance Effort | 0 | + |
| Community Support | 0 | + |
| Total | 0 | 0 |
Despite equal scoring, Custom Parser was chosen because source-map preservation is a hard requirement. Existing libraries fundamentally cannot provide this capability, making them unsuitable regardless of other benefits.
9.6. ADR-006: uv for Python Package Management
Status: Accepted (2026-01-20)
Context: Python projects require dependency management for reproducible builds. Traditional tools like pip, pipenv, and poetry have limitations in speed, reliability, or lock file handling. The project needs fast dependency resolution for efficient CI/CD and a deterministic lock file for reproducible deployments.
Decision:
Use uv (https://github.com/astral-sh/uv) as the Python package manager and virtual environment tool. All dependencies are defined in pyproject.toml and locked in uv.lock.
Consequences:
- 10-100x faster dependency resolution compared to pip/poetry.
- Deterministic `uv.lock` ensures identical environments across development, CI, and production.
- Integrated virtual environment management (`uv sync` creates the venv automatically).
- Compatible with the standard `pyproject.toml` format.
- Relatively new tool (2024), but rapidly maturing with strong community adoption.
- Requires uv installation on developer machines and CI environments.
9.6.1. Pugh Matrix: Package Manager
| Criterion | uv (Baseline) | pip + venv | Poetry | Pipenv |
|---|---|---|---|---|
| Resolution Speed | 0 | - - | - | - |
| Lock File Quality | 0 | - - | 0 | 0 |
| Reproducibility | 0 | - | 0 | 0 |
| pyproject.toml Support | 0 | - | 0 | - |
| Ecosystem Maturity | 0 | + | + | 0 |
| CI/CD Integration | 0 | 0 | 0 | - |
| Total | 0 | -5 | 0 | -2 |
uv chosen for its superior speed and deterministic builds. Poetry is a viable alternative but slower.
9.7. ADR-007: dataclasses for Data Models
Status: Accepted (2026-01-20)
Context:
The core data models (SourceLocation, Section, Element, CrossReference, Document) need to be defined for the parser and API layers. These models must be serializable to JSON for API responses and should be type-safe. Two main options exist in the Python ecosystem: Python’s built-in dataclasses module and the third-party pydantic library.
Decision:
Use Python’s built-in dataclasses for all core data models. JSON serialization will be achieved using dataclasses.asdict() combined with custom serialization for Path objects.
Consequences:
-
Zero additional dependencies (dataclasses is part of the standard library since Python 3.7).
-
Consistent with the specification documents which use dataclasses in all examples.
-
Simpler, more lightweight implementation.
-
Type hints are supported and enforced by IDE/mypy.
-
Manual validation required (no automatic validation like Pydantic provides).
-
Custom serialization needed for non-JSON-native types (Path, datetime).
-
If advanced validation becomes necessary, migration to Pydantic is straightforward since both use similar patterns.
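The serialization approach from the decision, sketched: `dataclasses.asdict()` plus a `default` hook so `Path` (and `datetime`) values become strings. The model fields shown are illustrative.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

@dataclass
class SectionLocation:
    file: Path
    start_line: int
    end_line: int

def to_json(obj) -> str:
    """Serialize a dataclass instance; default=str handles Path and datetime."""
    return json.dumps(asdict(obj), default=str)

print(to_json(SectionLocation(Path("guides/installation.adoc"), 10, 42)))
# {"file": "guides/installation.adoc", "start_line": 10, "end_line": 42}
```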
9.7.1. Pugh Matrix: Data Model Implementation
| Criterion | dataclasses (Baseline) | Pydantic | attrs |
|---|---|---|---|
| Standard Library | 0 | - | - |
| Spec Conformity | 0 | - | - |
| JSON Serialization | 0 | + | 0 |
| Automatic Validation | 0 | + | 0 |
| Learning Curve | 0 | - | - |
| Runtime Overhead | 0 | - | 0 |
| FastAPI Integration | 0 | + | - |
| Total | 0 | 0 | -2 |
dataclasses chosen for simplicity, zero dependencies, and spec conformity. Pydantic scored equally but adds complexity not needed for the current scope. Migration to Pydantic remains possible if advanced validation becomes necessary.
9.8. ADR-008: Cross-Document Path Uniqueness Strategy
Status: Accepted (2026-01-22)
Context: The current path generation creates section paths based solely on heading titles (slugified). This works well within a single document, but when multiple documents are indexed together, sections with identical titles receive identical paths. This causes:
- Only the first section with a given path is indexed; subsequent ones are rejected with a warning
- Search results are incomplete (Issue #131)
- Sections become inaccessible via API (Issue #130)
- Root documents all have the empty path `""` (Issue #129)

Example: Both `guides/installation.adoc` and `guides/advanced.adoc` have a section `== Prerequisites`, resulting in a path collision at `prerequisites`.
Issue #123 (now fixed) addressed within-document duplicates by appending -2, -3 suffixes. This ADR addresses the cross-document case.
Note: Backwards compatibility is not a requirement for this decision.
Decision: We will implement Option A: File Prefix in Path as the solution.
Section paths will include the relative file path (without extension) as a prefix, separated by a colon:
```
guides/installation:prerequisites
guides/advanced:prerequisites
```
For root sections (document titles), the path will be the file path alone:
```
guides/installation
guides/advanced
```
Consequences:
Positive:
- All sections are guaranteed uniquely addressable (file paths are always unique)
- Path clearly indicates file location - useful for navigation
- Simple mental model: `file:section` format
- Search returns all matching sections
- LLM-friendly: paths are self-documenting
- Single identifier (no composite keys needed)

Negative:

- All existing paths change (acceptable since backwards compatibility is not required)
- Paths become longer
- Clients need to handle the new format
9.8.1. Pugh Matrix: Path Uniqueness Strategy
Note: Backwards compatibility removed as criterion per stakeholder decision.
| Criterion | Current (Baseline) | A: File Prefix | B: Doc Title Prefix | C: Global Disambig | D: File Filter |
|---|---|---|---|---|---|
| Unique Addressability | 0 | + | 0 ¹ | + | + |
| Path Readability | 0 | 0 | + | - | 0 |
| API Simplicity | 0 | 0 | 0 | 0 | - |
| Implementation Complexity | 0 | 0 | - ² | 0 | - |
| Search Correctness | 0 | + | + | + | + |
| LLM Usability | 0 | + | + | - | 0 |
| Total | 0 | +3 | +2 | 0 | 0 |
Legend: + better than baseline, 0 same as baseline, - worse than baseline
¹ Document titles may not be unique, causing potential collisions. ² Requires handling of duplicate document titles.
9.8.2. Option Details
Option A: File Prefix in Path (Selected)

```
guides/installation:prerequisites
guides/advanced:prerequisites
```

Paths include the relative file path (without extension) as a prefix, separated by a colon. Guarantees uniqueness since file paths are always unique. Clearly indicates file location. Best score in the Pugh Matrix.

Option B: Document Title Prefix

```
installation-guide.prerequisites
advanced-guide.prerequisites
```

Uses the parent document title (slugified) as a prefix. More readable, but depends on unique document titles; if two files have the same title, collisions remain. Requires additional disambiguation logic.

Option C: Global Auto-Disambiguation

```
prerequisites
prerequisites-2
```

Extends the Issue #123 fix globally across all documents. Simple, but the ordering is arbitrary (based on parse order) and a path's meaning is unclear without context. Poor LLM usability.

Option D: File-Scoped Paths with Filter

```
Path: prerequisites
File: guides/installation.adoc
```

Keeps the current path format and adds a file parameter to the APIs. Requires a composite key (file + path) for uniqueness. More complex API. No longer recommended since backwards compatibility is not required.
9.8.3. Path Format Specification
```
<relative-file-path-without-extension>:<section-path>
```

Examples:

```
guides/installation:prerequisites
guides/installation:prerequisites.python-version
api/reference:endpoints.get-section
index                 (root section of index.adoc)
guides/advanced       (root section)
```
9.8.4. Implementation Notes
- Path generation in parsers (a sketch follows this list):
  - Compute the relative path from docs-root to the file
  - Remove the file extension
  - Prepend it to the existing section path with a `:` separator
  - Root sections (level 0) use the file path only (no trailing `:`)
- StructureIndex changes:
  - Paths are now guaranteed unique
  - Remove duplicate detection warnings for paths
  - Update path-based lookups
- API changes:
  - All endpoints accept the new path format
  - The response field `location.file` becomes redundant but is kept for clarity
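A sketch of the path generation described above; the helper name is hypothetical, and the real parsers integrate this logic into their existing path building:

```python
from pathlib import Path

def make_path(docs_root: Path, file: Path, section_path: str) -> str:
    """Build the cross-document-unique path per ADR-008 (Option A)."""
    rel = file.relative_to(docs_root).with_suffix("")  # drop .adoc / .md
    prefix = rel.as_posix()
    # root sections (level 0) use the file path alone, no trailing ":"
    return prefix if not section_path else f"{prefix}:{section_path}"

assert make_path(Path("docs"), Path("docs/guides/installation.adoc"),
                 "prerequisites") == "guides/installation:prerequisites"
assert make_path(Path("docs"), Path("docs/guides/advanced.adoc"), "") == "guides/advanced"
```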
9.9. ADR-009: No Built-In LLM-Based Ask Feature
Status: Accepted (2026-02-07)
Context:
Issue #186 proposed adding an experimental dacli ask "question" command that uses an LLM to answer questions about the documentation. The idea was to iterate through all documentation files, pass each file’s content together with the question and accumulated findings to an LLM, and consolidate the results into a final answer.
Several approaches were evaluated during implementation:
- Section-based iteration: Iterate through all sections (~460 in a typical project), one LLM call per section. Result: ~460 LLM calls, far too slow.
- File-based iteration: Iterate through all files (~35-50), one LLM call per file. Result: ~50 LLM calls. With Claude Code CLI (~12s per call) this took ~10 minutes. With the Anthropic API (~4s per call), still ~3-4 minutes.
- Two-call approach: Send all section titles to the LLM to select relevant ones, then send only those sections for answering. Faster, but essentially replicates what the calling LLM already does.
The fundamental insight: dacli is designed to be used by LLMs (both via MCP and CLI). The calling LLM can already:
- Call `get_structure()` / `dacli structure` to see all section titles
- Call `get_section()` / `dacli section <path>` to read relevant sections
- Answer the question itself with full context
The ask command is therefore redundant — it implements a slower, less capable version of what the calling LLM already does natively.
Decision: We will not implement a built-in LLM-based ask feature. Instead, we rely on the calling LLM to use dacli’s existing navigation and content access tools to answer questions.
A future RAG-based approach (Retrieval-Augmented Generation) could be considered as a separate feature. RAG would use vector embeddings for fast similarity search and only require a single LLM call for the final answer. This would be significantly faster but would not cover the entire documentation (only the top-k matching chunks).
Consequences:
Positive:
- No additional LLM dependency in dacli (no API keys, no subprocess calls)
- No performance bottleneck from iterative LLM calls
- Simpler codebase — fewer moving parts
- The calling LLM has better context and can make more informed decisions about relevance

Negative:

- CLI users without LLM integration cannot ask natural language questions directly
- No single-command convenience for documentation Q&A
9.9.1. Pugh Matrix: Documentation Q&A Strategy
| Criterion | No Feature (Baseline) | A: Iterative LLM | B: Two-Call LLM | C: RAG |
|---|---|---|---|---|
| Response Time | 0 | — | 0 | + |
| Answer Quality | 0 | 0 | 0 | - |
| Coverage (% of docs checked) | 0 | + | 0 | - |
| Implementation Complexity | 0 | — | - | - |
| External Dependencies | 0 | — | — | - |
| Redundancy with Calling LLM | 0 | — | - | 0 |
| Total | 0 | -7 | -3 | -2 |
Legend: + better than baseline, 0 same, - worse, — much worse
9.9.2. Option Details
No Feature (Selected)
Rely on the calling LLM to use existing dacli tools (get_structure, get_section, search) to answer questions. Zero additional complexity. Works today.

Option A: Iterative LLM (Rejected)
Pass every file/section to an LLM one by one. Complete coverage but prohibitively slow (3-10 minutes). Requires LLM provider configuration. Essentially duplicates the calling LLM’s job.

Option B: Two-Call LLM (Rejected)
First call: the LLM selects relevant sections from the title list. Second call: the LLM answers from the selected content. Faster than A but still redundant with the calling LLM’s behavior.

Option C: RAG (Future Consideration)
Build a vector index of documentation chunks, retrieve the top-k similar chunks for a question, answer in one LLM call. Fast (~2-3 seconds) but incomplete coverage. Could be a valuable future addition for human CLI users.
9.10. ADR-010: No Web UI — dacli is CLI and MCP Only
Status: Accepted (2026-02-07)
Context: The original architecture documents (Quality Requirements, Concepts) assumed a Web UI component for dacli — specifically a document structure visualization (#12) and a real-time diff display (#13). Quality scenario USAB-3 described a "web UI" showing red/green diffs after modifications.
However, dacli has evolved into a pure CLI tool and MCP server designed for LLM integration. The primary consumers are:
- LLMs using dacli via MCP (Model Context Protocol)
- LLMs using dacli via CLI/Bash tools
- Developers using the CLI directly
A Web UI would be a fundamentally different product — it requires frontend development (HTML/CSS/JS), serving static files, browser compatibility, and ongoing maintenance. This is outside dacli’s core mission of providing structured, programmatic access to documentation.
Tools like docToolchain already provide web-based documentation rendering. Adding a Web UI to dacli would duplicate that effort without clear benefit.
Decision: dacli will not include a Web UI. The project scope is limited to CLI and MCP server interfaces. References to a Web UI in existing architecture documents will be removed.
Consequences:
Positive:
- Clear, focused scope — CLI and MCP only
- No frontend dependencies or maintenance burden
- Simpler deployment (no static file serving, no browser compatibility)
- Architecture documents accurately reflect the actual system

Negative:

- No visual diff display for documentation changes (LLMs and developers use the API/CLI output instead)
- No interactive tree visualization (LLMs use `get_structure` programmatically)
9.11. ADR-011: Risk Classification - dacli CLI (Tier 2)
Status: Accepted (2026-02-11)
Deciders: Development Team + Claude Code
Context:
The dacli CLI module requires risk classification according to the Risk Radar framework to determine appropriate security and quality assurance measures. The assessment evaluates five dimensions:
| Dimension | Score | Level | Evidence |
|---|---|---|---|
| Code Type | 2 | Business Logic | Click commands, service layer orchestration |
| Language | 2 | Dynamically typed | Python 3.12+ — 100% |
| Deployment | 1 | Internal tool | Command-line tool for documentation teams |
| Data Sensitivity | 1 | Internal business data | Operates on internal documentation |
| Blast Radius | 2 | Data loss (recoverable) | Could corrupt docs, recoverable from git |
Decision:
Classify dacli CLI as Tier 2 — Extended Assurance (determined by max(Code Type=2, Language=2, Blast Radius=2)).
This tier requires:
- Tier 1 measures: Linter & formatter, pre-commit hooks, dependency vulnerability scanning, CI with automated tests
- Tier 2 measures: SAST (CodeQL), property-based testing (Hypothesis), code quality gates (SonarCloud), AI-assisted code review, PR review policy with sampling
Consequences:
Positive:
- Clear security baseline established for the project
- Comprehensive testing strategy (713 automated tests + property-based tests with 1,100+ generated cases)
- Automated gates prevent common vulnerabilities (dependency CVEs, code quality issues)
- SonarCloud integration provides continuous quality monitoring
- PR review policy balances thoroughness with development velocity (20-30% sampling)

Negative:

- Additional CI pipeline duration (~2-3 minutes for SAST and quality gate checks)
- Developer onboarding overhead (pre-commit hooks, review policy understanding)
- Maintenance burden for Tier 2 tooling (CodeQL queries, SonarCloud configuration)
9.11.1. Pugh Matrix: Risk Tier Selection
| Criterion | Tier 2 (Baseline) | Tier 1 (Lower) | Tier 3 (Higher) |
|---|---|---|---|
| Code Complexity Coverage | 0 | - | + |
| Language Risk Mitigation | 0 | - | + |
| Data Loss Prevention | 0 | - | + |
| Development Velocity | 0 | + | - |
| CI/CD Pipeline Complexity | 0 | + | - |
| Total | 0 | -1 | 0 |
Legend: + better than baseline, 0 same as baseline, - worse than baseline
Tier 1 rejected: Insufficient coverage for business logic complexity and blast radius (data loss risk). Missing SAST and property-based testing would leave gaps in quality assurance.
Tier 3 rejected: Not cost-justified. The module is an internal tool without public-facing deployment or sensitive PII. Branch protection and fuzzing would be overkill for the current risk profile.
Tier 2 selected: Balanced approach matching the actual risk profile (business logic + dynamic typing + recoverable data loss). Provides strong automated gates without excessive overhead.
9.12. ADR-012: Risk Classification - dacli-mcp (Tier 2)
Status: Accepted (2026-02-11)
Deciders: Development Team + Claude Code
Context:
The dacli-mcp MCP server module requires risk classification according to the Risk Radar framework to determine appropriate security and quality assurance measures. The assessment evaluates five dimensions:
| Dimension | Score | Level | Evidence |
|---|---|---|---|
| Code Type | 2 | Business Logic | MCP tools, service layer |
| Language | 2 | Dynamically typed | Python 3.12+ — 100% |
| Deployment | 1 | Internal tool | MCP server for LLM integration in internal workflows |
| Data Sensitivity | 1 | Internal business data | Operates on internal documentation |
| Blast Radius | 2 | Data loss (recoverable) | Could corrupt docs, recoverable from git |
Note: Initial consideration was given to Code Type score 3 (API/Database Queries) since the module exposes API endpoints (MCP tools). However, user confirmed score 2 as these are internal service APIs without public exposure or direct database access.
Decision:
Classify dacli-mcp as Tier 2 — Extended Assurance (determined by max(Code Type=2, Language=2, Blast Radius=2)).
This tier requires the same mitigation measures as dacli CLI (both modules share the same codebase):
- Tier 1 measures: Linter & formatter, pre-commit hooks, dependency vulnerability scanning, CI with automated tests
- Tier 2 measures: SAST (CodeQL), property-based testing (Hypothesis), code quality gates (SonarCloud), AI-assisted code review, PR review policy with sampling
Consequences:
Positive:
- Consistent risk management across both module entry points (CLI and MCP server)
- Shared codebase benefits from unified quality gates and testing strategy
- MCP tools benefit from the same 713 automated tests + property-based tests
- FastMCP framework integration validated by the comprehensive test suite

Negative:

- MCP-specific edge cases may need additional test coverage beyond shared tests
- Tool invocation patterns (JSON-RPC) differ from CLI patterns, requiring careful validation
9.12.1. Pugh Matrix: Risk Tier Selection
| Criterion | Tier 2 (Baseline) | Tier 1 (Lower) | Tier 3 (Higher) |
|---|---|---|---|
| API Exposure Risk | 0 | - | + |
| Language Risk Mitigation | 0 | - | + |
| Data Loss Prevention | 0 | - | + |
| Development Velocity | 0 | + | - |
| Tool Integration Complexity | 0 | + | - |
| Total | 0 | -1 | 0 |
Legend: + better than baseline, 0 same as baseline, - worse than baseline
Tier 1 rejected: Insufficient for an API-like interface (MCP tools). Missing SAST and property-based testing would leave gaps in tool invocation validation and edge case coverage.
Tier 3 rejected: Not cost-justified. The MCP server is for internal LLM integration, not public-facing. The deployment context (internal tool) doesn’t warrant branch protection, fuzzing, or penetration testing.
Tier 2 selected: Appropriate for internal API-like interfaces with business logic. Provides SAST coverage for potential injection vulnerabilities and property-based tests for tool parameter validation without excessive overhead.
Shared codebase note: Both dacli CLI and dacli-mcp modules share the same source code (src/dacli/). Entry points differ (dacli.cli:cli vs dacli.main:main), but risk profile and protection measures are identical. All mitigations are applied repository-wide.
9.13. ADR-013: Security Mitigations - Tier 2 Implementation
Status: Accepted (2026-02-11)
Deciders: Development Team + Claude Code
Context:
Following the Tier 2 risk classification for both dacli CLI and dacli-mcp modules (see ADR-011 and ADR-012), the project requires implementation of comprehensive security and quality assurance measures. The Risk Radar framework mandates cumulative mitigations: all Tier 1 measures plus all Tier 2 measures.
Both modules share the same codebase (src/dacli/), so mitigations are applied repository-wide rather than per module.
Decision:
Implement all required Tier 1 and Tier 2 mitigation measures as repository-wide protections:
Tier 1 — Automated Gates:
- Linter & Formatter: Ruff configured in `pyproject.toml` with enforced rules (E, F, I, N, W, UP) and 100-character line length
- Pre-Commit Hooks: Configured via `.pre-commit-config.yaml` (commit 68d6ae4) with Ruff checks
- Dependency Vulnerability Scanning: `pip-audit` integrated in the CI pipeline (commit fee56b6)
- CI Build & Unit Tests: GitHub Actions workflow (`.github/workflows/test.yml`) running 713 automated tests with coverage reporting

Tier 2 — Extended Assurance:

- SAST (Static Application Security Testing): CodeQL workflow with the `security-extended` query suite (commit fead47e), runs on the upstream repository only
- AI-Assisted Code Review: Claude Code review workflow (`.github/workflows/claude-code-review.yml`) for automated PR analysis
- Property-Based Testing: Hypothesis framework (commit 87a965d) with 11 property-based tests generating 1,100+ test cases (`tests/test_property_based.py`)
- Code Quality Gate: SonarCloud integration (commit fb4c8ad) via `.github/workflows/sonarcloud.yml` and `sonar-project.properties`
- PR Review Policy with Sampling: Risk-based review policy documented in `.github/PR_REVIEW_POLICY.md` (commit efb868f):
  - 100% review for security-sensitive changes, breaking changes, architecture changes
  - 20-30% sampling for bug fixes, refactoring, tests, documentation
  - Auto-merge eligible: non-security dependency updates, formatting fixes, PATCH version bumps

Security Fixes Applied:

- `cryptography` upgraded from 46.0.3 → 46.0.5 (CVE-2026-26007 mitigation)
- `pip` upgraded from 24.0 → 26.0.1
- Commit: 7766e90
Consequences:
Positive:
- 100% mitigation coverage for both Tier 1 (4/4 measures) and Tier 2 (5/5 measures)
- Zero known vulnerabilities (pip-audit clean)
- Comprehensive test coverage: 713 unit/integration tests + 11 property-based tests (1,100+ generated cases)
- Continuous quality monitoring: SonarCloud provides ongoing code quality metrics and technical debt tracking
- Automated security scanning: CodeQL runs on every push to main, catching potential vulnerabilities before production
- Efficient review process: the sampling policy (20-30%) balances thoroughness with development velocity
- Developer experience: pre-commit hooks catch issues locally before CI, reducing feedback loop time

Negative:

- CI pipeline duration increase: ~2-3 minutes added for SAST (CodeQL) and quality gate (SonarCloud) checks
- Developer onboarding overhead: new contributors must understand pre-commit hooks, the review policy, and quality standards
- Maintenance burden:
  - CodeQL query suite updates needed for new Python versions
  - SonarCloud project configuration requires manual setup and token management
  - Hypothesis tests may need strategy refinement as edge cases are discovered
- External service dependencies:
  - SonarCloud outages block PRs (mitigated by making the check non-blocking in the fork workflow)
  - CodeQL only runs on the upstream repository (not on fork PRs)
- False positive handling: SAST tools may flag intentional patterns (e.g., dynamic code in the MCP server), requiring suppression annotations
9.13.1. Pugh Matrix: Mitigation Strategy
| Criterion | Repository-wide (Baseline) | Module-specific | Tier 1 Only |
|---|---|---|---|
| Implementation Simplicity | 0 | - | + |
| Coverage Completeness | 0 | 0 | - |
| Maintenance Burden | 0 | - | + |
| Risk Mitigation Effectiveness | 0 | 0 | - |
| Compliance with Tier 2 | 0 | 0 | - |
| Total | 0 | -2 | -1 |
Legend: + better than baseline, 0 same as baseline, - worse than baseline
Module-specific approach rejected: Both modules share the same codebase (src/dacli/). Applying mitigations per module would duplicate CI checks, complicate maintenance, and provide no additional risk reduction.
Tier 1 only rejected: Insufficient coverage for Tier 2 classification. Missing SAST, property-based testing, and quality gates would leave critical gaps in security and quality assurance.
Repository-wide Tier 1+2 selected: Simplest implementation, consistent protection across all entry points (CLI and MCP), compliant with Risk Radar tier requirements.
9.13.2. Alternative Mitigation Measures Considered
Static Type Checking (mypy):
- Rejected for Tier 1: Python project without strict typing. Retrofitting type annotations to 64+ files would be high effort with moderate benefit. The FastMCP framework uses dynamic features that complicate type checking.
- Future consideration: may be added incrementally as the codebase matures, but not required for the current Tier 2 classification.

Fuzzing (AFL, cargo-fuzz):

- Not required for Tier 2: fuzzing is a Tier 3 measure. The project’s internal tool deployment context and recoverable data loss blast radius don’t justify the complexity and CI time cost of continuous fuzzing.

Branch Protection:

- Not required for Tier 2: branch protection (required status checks, mandatory reviews) is a Tier 3 measure. The current PR review policy with sampling (20-30%) provides adequate oversight for the internal tool risk profile.
9.13.3. Implementation Timeline
All mitigations implemented between 2026-02-09 and 2026-02-11 as part of PR #279:
- Pre-commit hooks: commit 68d6ae4
- Dependency vulnerability scanning (pip-audit): commit fee56b6
- Security fixes (cryptography, pip): commit 7766e90
- CodeQL SAST workflow: commit fead47e
- Property-based tests (Hypothesis): commit 87a965d
- SonarCloud quality gate: commit fb4c8ad
- PR review policy: commit efb868f
Verification: All 713 tests passing, CI green, pip-audit clean, CodeQL and SonarCloud integrated.
9.13.4. Module-Specific Notes
PR Review Policy - Differential Application:
While most mitigations are truly repository-wide, the PR review policy applies differentially based on change type (not module):
- 100% mandatory review:
  - Security-sensitive changes (auth, crypto, file system ops with user paths)
  - Breaking changes (public API, CLI interface, configuration format)
  - Architecture changes (new components, core parsers, data model)
  - Release preparation (MINOR/MAJOR version bumps)
- 20-30% sampling review:
  - Bug fixes (prioritize critical bugs)
  - Internal refactoring (prioritize complex changes)
  - Test additions (prioritize property-based/integration tests)
  - Documentation updates (prioritize user-facing docs)
- Auto-merge eligible:
  - Dependency updates (PATCH, non-security, passing CI)
  - Formatting/linting fixes (no logic changes)
  - PATCH version bumps (small fixes, no API changes)
This differential approach ensures critical changes receive thorough review while maintaining development velocity for lower-risk changes.
10. Quality Requirements
This chapter defines the most important quality requirements for the system. Each requirement is specified as a concrete, measurable scenario.
10.1. Performance
The system must provide fast access to documentation content, even in large projects.
| ID | Quality Goal | Scenario | Measurement |
|---|---|---|---|
| PERF-1 | Response Time | A user requests a typical section via `get_section`. | Response time < 2 seconds for a 10-page section within a 600-page project. |
| PERF-2 | Indexing Time | When the server starts, it indexes the entire documentation project. | Initial indexing of a 600-page project completes in < 60 seconds. |
| PERF-3 | Low Overhead | While the server is idle, it shall consume minimal system resources. | CPU usage < 5% and a stable, non-growing memory footprint. |
10.2. Reliability and Data Integrity
The system must be robust and guarantee that no data is lost or corrupted.
| ID | Quality Goal | Scenario | Measurement |
|---|---|---|---|
| REL-1 | Atomic Writes | A user updates a section (`update_section`). | The file on disk is either the original version or the fully updated version, never a partially written or corrupted state. A backup/restore mechanism is used. |
| REL-2 | Error Handling | A user provides a malformed path to an API call. | The API returns a structured error message (e.g., HTTP 400) with a clear explanation, without crashing the server. |
| REL-3 | Data Integrity | After a series of 100 random but valid modification operations, the document structure remains valid and no content is lost. | A validation check (`validate_structure`) passes after the operations. |
10.3. Usability
The system must be easy to use for its target audience of developers and architects.
| ID | Quality Goal | Scenario | Measurement |
|---|---|---|---|
| USAB-1 | MCP Compliance | A developer uses a standard MCP client to connect to the server and request the document structure. | The server responds with a valid structure as defined in the MCP specification, without requiring any custom client-side logic. |
| USAB-2 | Intuitiveness | A developer can successfully perform the top 5 use cases (e.g., get section, update section, search) by only reading the API documentation. | 90% success rate in user testing with the target audience. |
Note: USAB-3 (Web UI diff display) was removed per ADR-010 — dacli has no Web UI.
10.4. Scalability
The system must be able to handle large documentation projects.
| ID | Quality Goal | Scenario | Measurement |
|---|---|---|---|
| SCAL-1 | Project Size | The server processes a large documentation project composed of multiple files. | The system successfully indexes and handles a 600-page AsciiDoc project with response times still within the defined performance limits (PERF-1). |
| SCAL-2 | Concurrent Access | While one client is reading a large section, a second client initiates a request to modify a different section. | Both operations complete successfully without deadlocks or data corruption. The modification is correctly applied. |
11. Risks and Technical Debts
This chapter documents known risks and technical debts.
11.1. Known Risks
This list is taken from the PRD’s risk assessment.
| Risk Level | Description | Mitigation Strategy |
|---|---|---|
| High | Include Resolution Complexity: Circular includes or complex, deeply nested dependency chains could lead to infinite loops or poor performance during parsing. | The custom parser must include robust cycle detection and set a reasonable limit for include depth. |
| High | File Corruption: Concurrent modifications or application crashes could corrupt files. | This is directly mitigated by the atomic write strategy (ADR-004). |
| High | Performance: The in-memory index for very large projects (>600 pages) could consume excessive memory or slow down startup. | This is mitigated by the current design for the specified scale. If projects grow larger, a move to a persistent index (alternative in ADR-002) might be necessary. |
| Medium | Format Variations: Different dialects of AsciiDoc/Markdown could be parsed incorrectly. | The custom parser will focus on a well-defined, common subset of the languages. Comprehensive testing with real-world documents is required. |
11.2. Technical Debts
Technical debt consists of design or implementation choices that are expedient in the short term but may lead to future costs.
| Item | Description | Consequence / Repayment Strategy |
|---|---|---|
| Custom Parser (ADR-005) | Building a custom parser is a significant undertaking and creates a complex component that must be maintained. | Consequence: High maintenance cost. Repayment: As AsciiDoc parsing libraries in Python mature, periodically re-evaluate whether this component can be replaced with a standard, community-maintained library. |
| No File Watching | The in-memory index (ADR-002) does not automatically update if files are changed externally (e.g., by a user in a text editor). | Consequence: The index can become stale, leading to incorrect data being served. Repayment: Implement a file-watching mechanism that triggers re-indexing of changed files. This was deferred to reduce initial complexity. |
11.3. Technical Debt Tracking
Implementation-related technical debt is tracked as GitHub issues with the tech-debt prefix. These issues document gaps between specification and implementation that are deferred for future work.
| Issue | Description |
|---|---|
| AsciiDoc ifdef/ifndef conditional support | |
| Include options and attribute substitution in paths | |
| Additional features from spec (end_line, reserved frontmatter fields, content extraction) | |
| File→Sections mapping, rebuild() method, element index within section | |
Note
|
For the current list of all tech-debt issues, see the GitHub Issues with tech-debt label. |
12. Glossary
| Term | Definition |
|---|---|
| ADR | Architecture Decision Record. A document that captures an important architectural decision and its context and consequences. |
| AST | Abstract Syntax Tree. A tree representation of the abstract syntactic structure of source code. In our case, of a documentation project. |
| Atomic Write | An operation that is guaranteed to either complete fully or not at all, preventing partially-written, corrupted data. |
| Hierarchical Path | A human-readable path used to identify a specific section within the documentation project, e.g., `chapter-1.section-2`. |
| MCP | Model Context Protocol. A specification for how LLM-based agents should interact with external tools and data sources. |
| Structure Index | The in-memory data structure that holds the metadata of the entire documentation project for fast lookups. |