Markdown Input
DataHarbor supports Markdown as an upstream input format. Before controls run, it normalizes the document into DataHarbor’s canonical JSON model.
Use this page when your source data is Markdown. If you want to return Markdown to clients, see Output Formatting.
How DataHarbor thinks about Markdown
DataHarbor treats Markdown as a document, not as a tree of headings, paragraphs, lists, and tables. That means:
- The document body is preserved as a single `content` string
- YAML front matter, when present, is parsed into a `frontmatter` object
- Headings, lists, and tables are not inferred into separate JSON fields
- Controls operate on the normalized JSON model, not on a Markdown AST
Markdown normalization is intentionally document-oriented. This avoids inventing structure that may not match your document’s meaning and keeps the pipeline consistent with DataHarbor’s canonical JSON model.
Markdown As Input
When Markdown input is detected
DataHarbor recognizes Markdown input when any of the following is true:
- The upstream response uses `Content-Type: text/markdown`
- You explicitly set `input_format: markdown` in your Virtual API Configuration
- The upstream `Content-Type` is missing or unrecognized and body sniffing identifies the payload as Markdown
Use `input_format: markdown` when the upstream API serves Markdown with a missing, generic, or incorrect content type.
Normalized shape
Markdown normalizes to a top-level JSON object.
Document with front matter
Document without front matter
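As a rough illustration of the two shapes named above, the sketch below builds them as plain Python dictionaries. The helper `build_normalized_document` and the sample values are hypothetical. Only the `content` and `frontmatter` field names come from this page. How DataHarbor represents a document without front matter (key omitted, null, or empty object) is not stated here, so leaving the key out is an assumption of this sketch.

```python
import json


def build_normalized_document(body, frontmatter=None):
    """Illustrative only: mimic the normalized shape described on this page."""
    # The Markdown body is kept as one string, not split into sections.
    doc = {"content": body}
    if frontmatter is not None:
        # Assumption: the key is simply left out when there is no front matter.
        doc["frontmatter"] = frontmatter
    return doc


with_front_matter = build_normalized_document(
    "# Release notes\n\nFixed two bugs.\n",
    {"title": "Release notes", "tags": ["changelog", "v2"]},
)
without_front_matter = build_normalized_document("# Release notes\n\nFixed two bugs.\n")

print(json.dumps(with_front_matter, indent=2))
print(json.dumps(without_front_matter, indent=2))
```

In both cases the body stays a single string, which is why controls see `content` as one field.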
Front matter rules
DataHarbor only recognizes front matter when all of these rules are met:
- The opening fence must be `---` on the very first line of the document
- The closing fence must be `---` or `...`
- The front matter block must be a YAML mapping
- The Markdown body begins after the closing fence line
The following are not treated as front matter:
- A `---` horizontal rule later in the document
- An opening line like `--- extra`
- Front matter that appears after any body content
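The fence rules above can be sketched as a small splitter. This is a minimal approximation, not DataHarbor's implementation: `split_front_matter` is a hypothetical name, the exact-match treatment of fence lines (no trailing spaces) is an assumption, and the check that the block is a YAML mapping is left out because it needs a YAML parser.

```python
def split_front_matter(text):
    """Return (raw_front_matter, body); raw_front_matter is None if no block is found."""
    lines = text.splitlines(keepends=True)
    # The opening fence must be exactly `---` on the very first line.
    if not lines or lines[0].rstrip("\r\n") != "---":
        return None, text
    for i in range(1, len(lines)):
        # The closing fence must be `---` or `...`.
        if lines[i].rstrip("\r\n") in ("---", "..."):
            raw = "".join(lines[1:i])
            # The Markdown body begins after the closing fence line.
            body = "".join(lines[i + 1:])
            return raw, body
    # No closing fence found: treat the whole text as body.
    return None, text


raw, body = split_front_matter("---\ntitle: Hi\n---\n# Doc\n")
print(repr(raw), repr(body))
```

An opening line such as `--- extra`, or a `---` rule that appears after body content, fails the first-line check, so the whole text is returned as body.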
Front matter limitations
Front matter uses the same JSON-compatible YAML subset as DataHarbor’s YAML input normalizer. Supported front matter values include:
- Strings
- Numbers
- Booleans
- Nulls
- Nested mappings and sequences
frontmatter and still preserves the document body in content.
Examples of unsupported front matter constructs include:
- Anchors and aliases
- Tags
- Merge keys
- Non-finite float values such as `Infinity` and `NaN`
- Excessively deep nesting
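To make the value-level limits concrete, the sketch below checks an already-parsed Python value against the supported types. It is an approximation under stated assumptions: `is_supported_value` is a hypothetical helper, `max_depth=32` is an invented threshold (the real nesting limit is not given here), and requiring string keys is an assumption drawn from "JSON-compatible". Anchors, aliases, tags, and merge keys are resolved by the YAML parser, so they cannot be detected at this stage and are not covered.

```python
import math


def is_supported_value(value, max_depth=32, _depth=0):
    """Check a parsed value against the JSON-compatible subset (illustrative only)."""
    if _depth > max_depth:
        return False  # excessively deep nesting (hypothetical limit)
    if value is None or isinstance(value, (str, bool, int)):
        return True
    if isinstance(value, float):
        return math.isfinite(value)  # rejects Infinity and NaN
    if isinstance(value, dict):
        # Assumption: mapping keys must be strings, as in JSON.
        return all(
            isinstance(key, str) and is_supported_value(item, max_depth, _depth + 1)
            for key, item in value.items()
        )
    if isinstance(value, list):
        return all(is_supported_value(item, max_depth, _depth + 1) for item in value)
    return False


print(is_supported_value({"title": "Hi", "tags": ["a", "b"], "draft": False}))
print(is_supported_value({"score": float("nan")}))
```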
What controls can target
Once normalized, controls can target the Markdown payload like any other JSON object.
- Use `content` to operate on the entire Markdown body as a single string
- Use paths like `frontmatter.title` or `frontmatter.tags` to target structured metadata from front matter
What controls cannot target
Controls cannot address headings, paragraphs, list items, or table cells as first-class fields inside the Markdown body. For example, DataHarbor does not create JSON paths like these:
- `content.sections[0].heading`
- `content.lists[1].items[2]`
- `content.tables[0].rows[3].email`
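The difference between targetable and non-targetable paths can be seen with a toy path resolver. `resolve_path` and the sample document are hypothetical and do not reflect DataHarbor's control syntax. The sketch only shows that paths under `frontmatter` lead to real fields, while anything below `content` stops at a plain string.

```python
def resolve_path(document, path):
    """Follow a dotted path through nested dicts; return None when it does not resolve."""
    current = document
    for key in path.split("."):
        # A string such as `content` has no child fields to descend into.
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


doc = {
    "frontmatter": {"title": "Onboarding guide", "tags": ["hr", "internal"]},
    "content": "# Onboarding\n\n| name | email |\n| --- | --- |\n",
}

print(resolve_path(doc, "frontmatter.title"))
print(resolve_path(doc, "content.sections[0].heading"))
```

The table in the body is still present in `content`, but only as text inside the whole-document string.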
Markdown Detection Nuances
When the upstream `Content-Type` is missing or unrecognized, DataHarbor uses body sniffing.
Markdown is recognized when the body:
- Starts with a Markdown heading like `# Heading`, or
- Starts with a valid front matter block and has non-empty Markdown body content after the closing fence
If your documents may not match either pattern, set `input_format: markdown` or send `Content-Type: text/markdown`.
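The detection conditions on this page can be approximated as follows. This is a sketch, not DataHarbor's detector. The function names, the `RECOGNIZED_TYPES` set, and the heading pattern (ATX levels 1 to 6, no leading whitespace) are all assumptions. YAML validity of the front matter block is not checked.

```python
import re

# Assumption: media types this sketch counts as "recognized"; the real list is not given here.
RECOGNIZED_TYPES = {"application/json", "text/csv", "application/yaml", "text/markdown"}

# Assumption: a "Markdown heading" means an ATX heading at the very start of the body.
HEADING = re.compile(r"#{1,6} \S")


def looks_like_markdown(body):
    """Body sniffing: leading heading, or front matter followed by non-empty body."""
    if HEADING.match(body):
        return True
    lines = body.splitlines()
    if lines and lines[0] == "---":
        for i in range(1, len(lines)):
            if lines[i] in ("---", "..."):
                # Front matter alone is not enough; body content must follow.
                return any(line.strip() for line in lines[i + 1:])
    return False


def is_markdown_input(body, content_type=None, input_format=None):
    """Return True when any of the three detection conditions holds."""
    if input_format == "markdown":
        return True
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type == "text/markdown":
        return True
    if media_type in RECOGNIZED_TYPES:
        return False  # recognized non-Markdown type: sniffing does not apply
    return looks_like_markdown(body)


print(is_markdown_input("# Title\n\nText\n"))
print(is_markdown_input("Just a sentence.\n"))
```

A body that is plain prose with no leading heading or front matter is not sniffed as Markdown, which is the case where an explicit `input_format` or content type matters.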
Best Practices
- Send `Content-Type: text/markdown` whenever you control the upstream API
- Use front matter for structured metadata you want to target in controls
- Treat `content` as a whole-document field, not as parsed sections
- Use `input_format: markdown` when the upstream content type is missing, generic, or ambiguous
When To Use Markdown
Markdown works best when:
- Your upstream data is primarily document text
- You want to preserve human-authored prose
- You only need lightweight structured metadata in front matter
- Your consumers are people or agents that benefit from readable text output
Markdown is a weaker fit when:
- You need field-level controls inside the body itself
- You need stable machine-oriented structure throughout the document
- You expect a byte-for-byte round-trip from source to response
Next Steps
Input Normalization
See how Markdown fits into the broader normalization stage
Output Formatting
Return Markdown, CSV, YAML, or JSON after controls run
YAML Reference
Configure `input_format` and `default_output_format`
