include::path/to/file.adoc[]
include::{includedir}/file.adoc[]
include::chapter.adoc[leveloffset=+1]
include::code.py[lines=5..10]
AsciiDoc Parser - Component Specification
1. Introduction
This specification defines the AsciiDocParser - a lightweight component for parsing AsciiDoc documents. The parser is intentionally tailored to the requirements of this project and is not a complete Asciidoctor-compatible parser.
1.1. Purpose
The AsciiDocParser serves to:
- Capture document structure: Extract sections and build the hierarchical structure
- Resolve includes: Process include::[] directives recursively with source mapping
- Identify elements: Recognize code blocks, tables, images, admonitions, and PlantUML as addressable blocks
- Manage attributes: Set document attributes and resolve them in paths/content
- Capture cross-references: Collect [[anchor]] and xref:[] for link validation
- Source file mapping: Capture line numbers and source file for each element
1.2. Scope Limitations
The parser is not a complete Asciidoctor renderer. It:
- Does not render HTML/PDF
- Does not parse inline formatting (bold, italic, monospace)
- Does not analyze table contents in detail
- Does not process complex list structures
- Does not support ifeval::[] conditional evaluation
2. Technical Debt
TD-ADOC-001: ifeval Conditional Not Supported
Priority: Low (not required for MVP)
3. Supported AsciiDoc Features
3.1. Fully Supported
| Feature | Description | Usage |
|---|---|---|
| Sections |  | Document structure |
| Document Header | Title and attributes before first content | Metadata |
| Document Attributes |  | Configuration, metadata |
| Attribute References |  | Dynamic values |
| Include Directive |  | Document composition |
| Source Blocks |  | Element extraction |
| PlantUML Blocks |  | Element extraction |
| Images |  | Element extraction |
| Admonitions |  | Element extraction |
| Cross-References |  | Link capture |
3.2. Recognized (Not Parsed in Detail)
| Feature | Description | Treatment |
|---|---|---|
| Tables |  | Recognized as block, content not analyzed |
| Listing Blocks |  | Recognized as generic block |
| Sidebar Blocks |  | Recognized as block |
| Example Blocks |  | Recognized as block |
| Quote Blocks |  | Recognized as block |
3.3. Not Supported
- ifeval::[] conditional evaluation - see Technical Debt
- Inline formatting (bold, italic, mono)
- Footnotes
- Bibliography
- Index entries
- Complex table formatting (colspan, rowspan)
- Passthrough blocks
4. Include Resolution
The AsciiDocParser supports recursive resolution of include::[] directives with complete source mapping.
4.1. Syntax
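An include directive has the form include::target[options]. As a minimal sketch (not the parser's actual code), the helper below splits a directive line into its target and an options dictionary, using the INCLUDE_PATTERN listed in the implementation notes. The name parse_include_directive is hypothetical and introduced only for illustration.

```python
from __future__ import annotations

import re

# Same expression as INCLUDE_PATTERN in the implementation notes.
INCLUDE_PATTERN = re.compile(r"^include::(.+?)\[(.*?)\]$")


def parse_include_directive(line: str) -> tuple[str, dict[str, str]] | None:
    """Return (target, options) for an include line, or None for any other line."""
    match = INCLUDE_PATTERN.match(line)
    if match is None:
        return None
    options: dict[str, str] = {}
    for part in match.group(2).split(","):
        key, sep, value = part.partition("=")
        if sep:  # skip empty or positional entries in this sketch
            options[key.strip()] = value.strip()
    return match.group(1), options


for example in (
    "include::path/to/file.adoc[]",
    "include::{includedir}/file.adoc[]",
    "include::chapter.adoc[leveloffset=+1]",
    "include::code.py[lines=5..10]",
):
    print(parse_include_directive(example))
```

Attribute references in the target, such as {includedir}, are left untouched here; they are substituted in a separate step before the path is resolved.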
4.2. Attribute Substitution in Paths
Attributes are substituted before path resolution:
:includedir: chapters
:lang: en
include::{includedir}/{lang}/intro.adoc[]
// Resolves to: chapters/en/intro.adoc
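The substitution step can be sketched as follows. This is an illustration under assumptions: the helper names and the base_dir parameter are hypothetical, and unknown attributes are replaced with an empty string, as described under Attribute Resolution.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# Same expression as ATTRIBUTE_REF_PATTERN in the implementation notes.
ATTRIBUTE_REF_PATTERN = re.compile(r"\{([a-zA-Z0-9_-]+)\}")


def substitute_attributes(text: str, attributes: dict[str, str]) -> str:
    """Replace every {name} reference with its value ('' if the name is unknown)."""
    return ATTRIBUTE_REF_PATTERN.sub(lambda m: attributes.get(m.group(1), ""), text)


def resolve_include_path(target: str, attributes: dict[str, str], base_dir: str) -> PurePosixPath:
    """Substitute attributes first, then join the result onto the base directory."""
    return PurePosixPath(base_dir) / substitute_attributes(target, attributes)


attrs = {"includedir": "chapters", "lang": "en"}
print(resolve_include_path("{includedir}/{lang}/intro.adoc", attrs, "."))
```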
4.3. Include Options
| Option | Description | Support |
|---|---|---|
| leveloffset=+n | Increase section level by n | ✓ Full |
| leveloffset=-n | Decrease section level by n | ✓ Full |
| lines=n..m | Include only lines n to m | ✓ Full |
| tag=name | Include only tagged region | ✗ Not supported |
| indent=n | Add indentation | ✗ Not supported |
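To illustrate the two supported option kinds, here is a rough sketch that applies a line range and a level offset to the lines of an included file. The function names are hypothetical, only a single n..m range is handled, and rewriting the title markers in the text is just one possible approach; an implementation could equally adjust the level of the parsed sections instead.

```python
from __future__ import annotations

import re

# Same expression as SECTION_PATTERN in the implementation notes.
SECTION_PATTERN = re.compile(r"^(={1,6})\s+(.+?)(?:\s+=*)?$")


def apply_lines(lines: list[str], spec: str) -> list[str]:
    """Keep only lines n..m (1-based, inclusive) for a spec such as '5..10'."""
    start, end = (int(number) for number in spec.split(".."))
    return lines[start - 1:end]


def apply_leveloffset(lines: list[str], offset: int) -> list[str]:
    """Shift every section title by offset levels, clamped to 1-6 '=' characters."""
    shifted = []
    for line in lines:
        match = SECTION_PATTERN.match(line)
        if match:
            depth = min(max(len(match.group(1)) + offset, 1), 6)
            line = "=" * depth + " " + match.group(2)
        shifted.append(line)
    return shifted


included = ["= Chapter", "Chapter content.", "== Details"]
print(apply_leveloffset(included, int("+1")))
print(apply_lines([f"line {n}" for n in range(1, 13)], "5..10"))
```

This sketch does not skip delimited blocks, so a line that looks like a section title inside a code block would also be shifted.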
4.4. Source Mapping
For each element, the original source file and line number are captured:
from dataclasses import dataclass
from pathlib import Path

@dataclass
class SourceLocation:
    """Position in source document."""
    file: Path                  # Original file (not resolved include file)
    line: int                   # 1-based line number in this file
    resolved_from: Path | None  # If included via include directive
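One way to keep this mapping while flattening includes is to carry the originating file and line number along with every line of text. The sketch below is an assumption-laden illustration: MappedLine and flatten are hypothetical names, and files are supplied as an in-memory dictionary rather than read from disk.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path

INCLUDE_PATTERN = re.compile(r"^include::(.+?)\[(.*?)\]$")


@dataclass(frozen=True)
class MappedLine:
    """A line of flattened content plus the file and line it came from."""
    text: str
    file: Path
    line: int  # 1-based line number within `file`


def flatten(name: str, files: dict[str, str]) -> list[MappedLine]:
    """Expand includes recursively, remembering the origin of every line."""
    mapped: list[MappedLine] = []
    for number, text in enumerate(files[name].splitlines(), start=1):
        match = INCLUDE_PATTERN.match(text)
        if match:
            mapped.extend(flatten(match.group(1), files))
        else:
            mapped.append(MappedLine(text, Path(name), number))
    return mapped


files = {
    "main.adoc": "= Main Document\ninclude::chapter.adoc[]\nClosing remarks.",
    "chapter.adoc": "= Chapter\nChapter content.",
}
for item in flatten("main.adoc", files):
    print(f"{item.file}:{item.line}: {item.text}")
```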
4.5. Circular Includes
The parser detects and prevents circular include chains:
Scenario: Circular includes are detected
Given File A contains "include::B.adoc[]"
And File B contains "include::A.adoc[]"
When the parser processes File A
Then a CircularIncludeError is thrown
And the include chain is specified in the error
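A common way to detect such cycles is to track the chain of files currently being expanded and to fail as soon as a file appears in its own chain. The following is a minimal sketch of that idea, not the parser's implementation: expand and the in-memory files dictionary are hypothetical, and the error message format is modeled on the acceptance criteria.

```python
from __future__ import annotations

import re

INCLUDE_PATTERN = re.compile(r"^include::(.+?)\[(.*?)\]$")


class CircularIncludeError(Exception):
    """Raised when a file (transitively) includes itself."""

    def __init__(self, chain: list[str]):
        self.chain = chain
        super().__init__("Circular include: " + " -> ".join(chain))


def expand(name: str, files: dict[str, str], stack: tuple[str, ...] = ()) -> list[str]:
    """Return the lines of `name` with includes expanded; `stack` is the active chain."""
    if name in stack:
        raise CircularIncludeError([*stack, name])
    lines: list[str] = []
    for line in files[name].splitlines():
        match = INCLUDE_PATTERN.match(line)
        if match:
            lines.extend(expand(match.group(1), files, (*stack, name)))
        else:
            lines.append(line)
    return lines


files = {"a.adoc": "include::b.adoc[]", "b.adoc": "include::a.adoc[]"}
try:
    expand("a.adoc", files)
except CircularIncludeError as error:
    print(error)
```

Because the stack only holds files on the current path, including the same file twice from different places is still allowed.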
5. Document Attributes
5.1. Syntax
// Set attribute
:author: John Doe
:revdate: 2024-01-15
:imagesdir: ./images
// Unset attribute
:!draft:
// Attribute reference
The author is {author}.
5.2. Standard Attributes
| Attribute | Description | Default Value |
|---|---|---|
| doctype | Document type (article, book, etc.) |  |
| imagesdir | Base path for images |  |
|  | Base path for includes |  |
| leveloffset | Global section level offset |  |
5.3. jbake Attributes
For integration with jbake, the following attributes are extracted as metadata:
:jbake-title: My Document
:jbake-type: page_toc
:jbake-status: published
:jbake-menu: main
:jbake-order: 5
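Extraction can be as simple as filtering the attribute dictionary by prefix. In this sketch, extract_jbake_metadata is a hypothetical helper, and stripping the jbake- prefix from the keys is an assumption; the specification does not say whether the prefix is kept.

```python
from __future__ import annotations

JBAKE_PREFIX = "jbake-"


def extract_jbake_metadata(attributes: dict[str, str]) -> dict[str, str]:
    """Collect all jbake-* attributes, keyed by the name after the prefix."""
    return {
        name[len(JBAKE_PREFIX):]: value
        for name, value in attributes.items()
        if name.startswith(JBAKE_PREFIX)
    }


attributes = {
    "author": "John Doe",
    "jbake-title": "My Document",
    "jbake-type": "page_toc",
    "jbake-status": "published",
    "jbake-menu": "main",
    "jbake-order": "5",
}
print(extract_jbake_metadata(attributes))
```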
6. Extractable Elements
6.1. Source Blocks (Code)
[source,python]
.Optional Title
----
def hello():
    print("Hello, World!")
----
| Attribute | Value |
|---|---|
|  |  |
|  |  |
|  |  |
|  | Source file and line number |
|  | Raw content (without delimiter) |
6.2. PlantUML Blocks
[plantuml, diagram-name, svg]
----
@startuml
Alice -> Bob: Hello
@enduml
----
| Attribute | Value |
|---|---|
|  |  |
|  |  |
|  |  |
|  | Source file and line number |
|  | PlantUML source code |
6.3. Tables
.Table Title
[cols="1,2,3"]
|===
| Header 1 | Header 2 | Header 3
| Cell 1 | Cell 2 | Cell 3
|===
| Attribute | Value |
|---|---|
|  |  |
|  |  |
|  | Number of columns (from cols attribute or header) |
|  | Number of data rows |
|  | Source file and line number |

Table contents are not parsed in detail. Only structural metadata is captured.
6.4. Images
// Block image
image::path/to/image.png[Alt Text, 400, 300]
// With title
.Image Title
image::diagram.svg[Architecture Diagram]
| Attribute | Value |
|---|---|
|  |  |
|  |  |
|  |  |
|  |  |
|  |  |
|  |  |
|  | Source file and line number |
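As an illustration of how a block image macro could be taken apart, the sketch below reads the target and the positional attributes (alt text, width, height, in that order, as in the example above). parse_block_image and the dictionary keys are hypothetical, and the expression is a lightly tidied variant of BLOCK_IMAGE_PATTERN from the implementation notes.

```python
from __future__ import annotations

import re

BLOCK_IMAGE_PATTERN = re.compile(r"^image::(.+?)\[(.*)\]$")
POSITIONAL_KEYS = ("alt", "width", "height")


def parse_block_image(line: str) -> dict[str, str] | None:
    """Return target and positional attributes of a block image, or None."""
    match = BLOCK_IMAGE_PATTERN.match(line)
    if match is None:
        return None
    info = {"target": match.group(1)}
    if match.group(2):
        values = [value.strip() for value in match.group(2).split(",")]
        info.update(zip(POSITIONAL_KEYS, values))
    return info


print(parse_block_image("image::path/to/image.png[Alt Text, 400, 300]"))
print(parse_block_image("image::diagram.svg[Architecture Diagram]"))
```

Named attributes (key=value) and alt text containing commas are not handled in this sketch.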
6.5. Admonitions
NOTE: This is a note.
WARNING: This is a warning.
[TIP]
====
This is a multi-line tip.
With multiple paragraphs.
====
| Attribute | Value |
|---|---|
|  |  |
|  |  |
|  | Source file and line number |
|  | Admonition content (raw text) |
7. Cross-References
The parser captures all cross-references for later link validation.
7.1. Syntax
// Internal reference
<<section-anchor>>
<<section-anchor,Custom Text>>
// External reference (xref)
xref:other-file.adoc#anchor[Link Text]
xref:other-file.adoc[]
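Both forms can be picked up with the two cross-reference expressions from the implementation notes. The sketch below is illustrative only: find_cross_references is a hypothetical helper that returns plain tuples of (type, target, text) in order of appearance.

```python
from __future__ import annotations

import re

XREF_INTERNAL_PATTERN = re.compile(r"<<([^,>]+)(?:,([^>]+))?>>")
XREF_EXTERNAL_PATTERN = re.compile(r"xref:([^#\[]+)?(?:#([^\[]+))?\[([^\]]*)\]")


def find_cross_references(line: str) -> list[tuple[str, str, str | None]]:
    """Return (type, target, text) for each cross-reference, ordered by position."""
    found = []
    for match in XREF_INTERNAL_PATTERN.finditer(line):
        found.append((match.start(), "internal", match.group(1), match.group(2)))
    for match in XREF_EXTERNAL_PATTERN.finditer(line):
        file_part = match.group(1) or ""
        anchor = match.group(2)
        target = f"{file_part}#{anchor}" if anchor else file_part
        found.append((match.start(), "external", target, match.group(3) or None))
    found.sort(key=lambda entry: entry[0])  # sort by position in the line only
    return [(kind, target, text) for _, kind, target, text in found]


print(find_cross_references("See <<section-a>> and xref:other.adoc#anchor[Link]."))
```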
7.2. Captured Information
from dataclasses import dataclass
from typing import Literal

@dataclass
class CrossReference:
    """A captured cross-reference."""
    type: Literal["internal", "external"]
    target: str                      # Anchor or file#anchor
    text: str | None                 # Optional link text
    source_location: SourceLocation
8. Data Models
8.1. AsciidocDocument
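# Imports shared by the data models in this section.
from __future__ import annotations  # allows references to classes defined later

from dataclasses import dataclass
from pathlib import Path
from typing import Any, Literal

@dataclass
class AsciidocDocument:
    """Represents a parsed AsciiDoc document."""
    file_path: Path
    title: str
    attributes: dict[str, str]
    sections: list[AsciidocSection]
    elements: list[AsciidocElement]
    cross_references: list[CrossReference]
    includes: list[IncludeInfo]  # All resolved includes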
8.2. AsciidocSection
@dataclass
class AsciidocSection:
    """A section in the document."""
    title: str
    level: int                        # 0-5 (0 = document title)
    anchor: str | None                # [[anchor]] if present
    source_location: SourceLocation
    path: str                         # Hierarchical path
    children: list[AsciidocSection]
8.3. AsciidocElement
@dataclass
class AsciidocElement:
    """An extractable element."""
    type: Literal["code", "plantuml", "mermaid", "ditaa", "table", "image", "admonition", "list"]
    source_location: SourceLocation
    attributes: dict[str, Any]  # Type-specific attributes
    parent_section: str         # Path of containing section
8.4. IncludeInfo
@dataclass
class IncludeInfo:
    """Information about a resolved include."""
    source_location: SourceLocation  # Where the include is located
    target_path: Path                # Resolved target path
    options: dict[str, str]          # leveloffset, lines, etc.
9. Parser Behavior
9.1. Attribute Resolution
Attributes are resolved during parsing:
- Set standard attributes (doctype, imagesdir, etc.)
- Parse and set header attributes
- For each attribute reference {name}, insert the current value
- Treat unknown attributes as empty string (with warning)
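The steps above can be sketched as a single pass over the lines, so that each reference sees the value that is current at that point in the document. This is a simplified illustration: resolve_attributes and the logger name are hypothetical, attribute entries are honored anywhere in the input, and delimited blocks are not skipped.

```python
from __future__ import annotations

import logging
import re

ATTRIBUTE_PATTERN = re.compile(r"^:([a-zA-Z0-9_-]+):\s*(.*)$")
ATTRIBUTE_UNSET_PATTERN = re.compile(r"^:!([a-zA-Z0-9_-]+):$")
ATTRIBUTE_REF_PATTERN = re.compile(r"\{([a-zA-Z0-9_-]+)\}")

logger = logging.getLogger("asciidoc_parser")  # hypothetical logger name


def resolve_attributes(
    lines: list[str], defaults: dict[str, str] | None = None
) -> tuple[list[str], dict[str, str]]:
    """Return (content lines with references resolved, final attribute values)."""
    attributes = dict(defaults or {})  # step 1: standard attributes
    output: list[str] = []

    def lookup(match: re.Match[str]) -> str:
        name = match.group(1)
        if name not in attributes:
            logger.warning("Unknown attribute %r treated as empty string", name)
            return ""
        return attributes[name]

    for line in lines:
        unset = ATTRIBUTE_UNSET_PATTERN.match(line)
        if unset:
            attributes.pop(unset.group(1), None)
            continue
        entry = ATTRIBUTE_PATTERN.match(line)
        if entry:  # values may themselves reference earlier attributes
            attributes[entry.group(1)] = ATTRIBUTE_REF_PATTERN.sub(lookup, entry.group(2))
            continue
        output.append(ATTRIBUTE_REF_PATTERN.sub(lookup, line))
    return output, attributes


content, attrs = resolve_attributes(
    [":author: John Doe", ":project: MCP Server", "= {project} Documentation", "Author: {author}"]
)
print(content, attrs)
```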
9.2. Error Handling
| Situation | Behavior |
|---|---|
| Include file not found | IncludeNotFoundError |
| Circular include | CircularIncludeError |
| Invalid attribute syntax | Log warning, ignore line |
| Invalid UTF-8 encoding |  |
| Empty file | Return empty document (no sections) |
| File without sections | Entire content as implicit root section |
9.3. Performance Requirements
- Parsing a single file (without includes): < 50 ms
- Parsing a document with 50 include files: < 2 s
- Memory consumption: < 10 KB per parsed file (without content)
- Include depth: max. 20 levels (configurable)
10. Acceptance Criteria
10.1. AC-ADOC-01: Section Extraction
Scenario: Sections are correctly extracted
Given an AsciiDoc file with the following content:
"""
= Main Title
== Chapter 1
Text...
== Chapter 2
=== Subchapter
"""
When the parser processes the file
Then 4 sections are extracted
And the hierarchy is:
| path | level |
| /main-title | 0 |
| /main-title/chapter-1 | 1 |
| /main-title/chapter-2 | 1 |
| /main-title/chapter-2/subchapter | 2 |
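The expected hierarchy can be reproduced with a stack of open ancestor sections. The sketch below is one possible approach, not the parser's algorithm: extract_section_paths and slugify are hypothetical helpers, the slug rule (lowercase, non-alphanumerics replaced by hyphens) is inferred from the paths in the scenario, and section levels are assumed not to skip.

```python
from __future__ import annotations

import re

# Same expression as SECTION_PATTERN in the implementation notes.
SECTION_PATTERN = re.compile(r"^(={1,6})\s+(.+?)(?:\s+=*)?$")


def slugify(title: str) -> str:
    """Lowercase the title and join runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def extract_section_paths(text: str) -> list[tuple[str, int]]:
    """Return (path, level) for every section title, in document order."""
    stack: list[str] = []  # slugs of the currently open sections, one per level
    result: list[tuple[str, int]] = []
    for line in text.splitlines():
        match = SECTION_PATTERN.match(line)
        if match is None:
            continue
        level = len(match.group(1)) - 1  # '=' is level 0, '==' is level 1, ...
        del stack[level:]                # close sections at this level or deeper
        stack.append(slugify(match.group(2)))
        result.append(("/" + "/".join(stack), level))
    return result


document = "= Main Title\n== Chapter 1\nText...\n== Chapter 2\n=== Subchapter"
for path, level in extract_section_paths(document):
    print(level, path)
```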
10.2. AC-ADOC-02: Attribute Resolution
Scenario: Attributes are correctly resolved
Given an AsciiDoc file with the following content:
"""
:author: John Doe
:project: MCP Server
= {project} Documentation
Author: {author}
"""
When the parser processes the file
Then the document title is "MCP Server Documentation"
And attributes["author"] is "John Doe"
10.3. AC-ADOC-03: Include Resolution
Scenario: Includes are recursively resolved
Given a main file "main.adoc":
"""
= Main Document
include::chapter.adoc[leveloffset=+1]
"""
And an include file "chapter.adoc":
"""
= Chapter
Chapter content.
"""
When the parser processes "main.adoc"
Then the document contains 2 sections
And the section "Chapter" has level 1 (due to leveloffset)
And the source_location of "Chapter" points to "chapter.adoc"
10.4. AC-ADOC-04: Circular Include Detection
Scenario: Circular includes are detected
Given a file "a.adoc" with "include::b.adoc[]"
And a file "b.adoc" with "include::a.adoc[]"
When the parser processes "a.adoc"
Then a CircularIncludeError is thrown
And the error message contains "a.adoc -> b.adoc -> a.adoc"
10.5. AC-ADOC-05: Source Block Extraction
Scenario: Source blocks are extracted
Given an AsciiDoc file with a Python source block
When the parser processes the file
Then elements contains an entry of type "code"
And its language equals "python"
And source_location points to the correct file and line
10.6. AC-ADOC-06: PlantUML Extraction
Scenario: PlantUML blocks are extracted as their own type
Given an AsciiDoc file with:
"""
[plantuml, my-diagram, svg]
----
@startuml
A -> B
@enduml
----
"""
When the parser processes the file
Then elements contains an entry of type "plantuml"
And name equals "my-diagram"
And format equals "svg"
10.7. AC-ADOC-07: Admonition Extraction
Scenario: Admonitions are extracted
Given an AsciiDoc file with "WARNING: Important notice"
When the parser processes the file
Then elements contains an entry of type "admonition"
And admonition_type equals "WARNING"
10.8. AC-ADOC-08: Cross-Reference Capture
Scenario: Cross-references are captured
Given an AsciiDoc file with:
"""
See <<section-a>> and xref:other.adoc#anchor[Link].
"""
When the parser processes the file
Then cross_references contains 2 entries
And the first is type="internal", target="section-a"
And the second is type="external", target="other.adoc#anchor"
10.9. AC-ADOC-09: Attribute Substitution in Include Paths
Scenario: Attributes in include paths are resolved
Given an AsciiDoc file with:
"""
:chaptersdir: chapters
include::{chaptersdir}/intro.adoc[]
"""
And a file "chapters/intro.adoc" exists
When the parser processes the file
Then "chapters/intro.adoc" is successfully included
11. Interfaces
11.1. Parser Interface
class AsciidocParser:
    """Parser for AsciiDoc documents."""

    def __init__(self, base_path: Path, max_include_depth: int = 20):
        """
        Initializes the parser.

        Args:
            base_path: Base path for relative include resolution
            max_include_depth: Maximum include depth
        """
        ...

    def parse_file(self, file_path: Path) -> AsciidocDocument:
        """
        Parses an AsciiDoc file with include resolution.

        Raises:
            FileNotFoundError: File does not exist
            CircularIncludeError: Circular include detected
            IncludeNotFoundError: Include file not found
        """
        ...

    def get_section(self, doc: AsciidocDocument, path: str) -> AsciidocSection | None:
        """Finds a section by its hierarchical path."""
        ...

    def get_elements(
        self,
        doc: AsciidocDocument,
        element_type: str | None = None
    ) -> list[AsciidocElement]:
        """Returns all elements, optionally filtered by type."""
        ...

    def validate_cross_references(
        self,
        doc: AsciidocDocument
    ) -> list[ValidationError]:
        """Checks all cross-references for validity."""
        ...
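As an illustration of what a lookup such as get_section might do, the sketch below walks a section tree and descends only into branches whose path is a prefix of the requested one. Section is a simplified stand-in for AsciidocSection and find_section is a hypothetical free function; neither is part of the interface above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Section:
    """Simplified stand-in for AsciidocSection (title, path, children only)."""
    title: str
    path: str
    children: list[Section] = field(default_factory=list)


def find_section(sections: list[Section], path: str) -> Section | None:
    """Return the section with exactly this hierarchical path, or None."""
    for section in sections:
        if section.path == path:
            return section
        if path.startswith(section.path + "/"):  # target lies below this section
            found = find_section(section.children, path)
            if found is not None:
                return found
    return None


tree = [
    Section("Main Title", "/main-title", [
        Section("Chapter 1", "/main-title/chapter-1"),
        Section("Chapter 2", "/main-title/chapter-2", [
            Section("Subchapter", "/main-title/chapter-2/subchapter"),
        ]),
    ]),
]
print(find_section(tree, "/main-title/chapter-2/subchapter"))
```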
12. Implementation Notes
12.1. Regex Patterns
# Section (Level 0-5)
SECTION_PATTERN = r'^(={1,6})\s+(.+?)(?:\s+=*)?$'
# Document attribute
ATTRIBUTE_PATTERN = r'^:([a-zA-Z0-9_-]+):\s*(.*)$'
# Unset attribute
ATTRIBUTE_UNSET_PATTERN = r'^:!([a-zA-Z0-9_-]+):$'
# Attribute reference
ATTRIBUTE_REF_PATTERN = r'\{([a-zA-Z0-9_-]+)\}'
# Include directive
INCLUDE_PATTERN = r'^include::(.+?)\[(.*?)\]$'
# Source block start
SOURCE_BLOCK_PATTERN = r'^\[source(?:,\s*(\w+))?\]$'
# PlantUML block start
PLANTUML_PATTERN = r'^\[plantuml(?:,\s*([^,\]]+))?(?:,\s*(\w+))?\]$'
# Mermaid block start (Issue #122)
MERMAID_PATTERN = r'^\[mermaid(?:,\s*([^,\]]+))?(?:,\s*(\w+))?\]$'
# Ditaa block start (Issue #122)
DITAA_PATTERN = r'^\[ditaa(?:,\s*([^,\]]+))?(?:,\s*(\w+))?\]$'
# Block delimiter
BLOCK_DELIMITER = r'^(-{4,}|={4,}|\*{4,}|_{4,})$'
# Block image
BLOCK_IMAGE_PATTERN = r'^image::(.+?)\[(.*)?]$'
# Admonition (short form)
ADMONITION_SHORT_PATTERN = r'^(NOTE|TIP|WARNING|CAUTION|IMPORTANT):\s*(.+)$'
# Cross-reference (internal)
XREF_INTERNAL_PATTERN = r'<<([^,>]+)(?:,([^>]+))?>>'
# Cross-reference (external)
XREF_EXTERNAL_PATTERN = r'xref:([^#\[]+)?(?:#([^\[]+))?\[([^\]]*)\]'
# Anchor
ANCHOR_PATTERN = r'^\[\[([^\]]+)\]\]$'
# Table start/end
TABLE_DELIMITER = r'^\|===$'
12.2. State Machine for Parsing
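As a rough illustration of a line-based state machine, the sketch below extracts source blocks using three states: TEXT (outside any block), PENDING (a [source] attribute line was seen) and IN_BLOCK (between delimiters). The state names, the extract_code_blocks function, the dictionary keys, and the choice to report the attribute line as the block's line number are all assumptions made for this example.

```python
from __future__ import annotations

import re

SOURCE_BLOCK_PATTERN = re.compile(r"^\[source(?:,\s*(\w+))?\]$")
BLOCK_DELIMITER = re.compile(r"^(-{4,}|={4,}|\*{4,}|_{4,})$")


def extract_code_blocks(lines: list[str]) -> list[dict[str, object]]:
    """Scan lines once and collect delimited [source] blocks."""
    blocks: list[dict[str, object]] = []
    state = "TEXT"
    language = None
    delimiter = ""
    start_line = 0
    content: list[str] = []
    for number, line in enumerate(lines, start=1):
        if state == "IN_BLOCK":
            if line == delimiter:  # the matching delimiter closes the block
                blocks.append({"type": "code", "language": language,
                               "line": start_line, "content": "\n".join(content)})
                state, content = "TEXT", []
            else:
                content.append(line)
            continue
        match = SOURCE_BLOCK_PATTERN.match(line)
        if match:
            state, language, start_line = "PENDING", match.group(1), number
        elif state == "PENDING" and BLOCK_DELIMITER.match(line):
            state, delimiter = "IN_BLOCK", line
        elif state == "PENDING" and not line.startswith("."):
            state = "TEXT"  # attribute line was not followed by a delimited block
    return blocks


sample = ["[source,python]", ".Optional Title", "----",
          "def hello():", '    print("Hello, World!")', "----", "After the block."]
print(extract_code_blocks(sample))
```

Other block types (PlantUML, tables, admonition blocks) would add further attribute patterns and states in the same style.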
12.3. Include Resolution Algorithm