Markdown Input

DataHarbor supports Markdown as an upstream input format. Before controls run, it normalizes the document into DataHarbor’s canonical JSON model.

Use this page when your source data is Markdown. If you want to return Markdown to clients, see Output Formatting.

Markdown input does not behave like a fully structured schema. DataHarbor treats it as a document with optional front matter, not as an AST of headings, lists, and tables.

How DataHarbor thinks about Markdown

Because DataHarbor does not parse the document into a tree of headings, paragraphs, lists, and tables:

The document body is preserved as a single content string
YAML front matter, when present, is parsed into a frontmatter object
Headings, lists, and tables are not inferred into separate JSON fields
Controls operate on the normalized JSON model, not on a Markdown AST

Markdown normalization is intentionally document-oriented. This avoids inventing structure that may not match your document’s meaning and keeps the pipeline consistent with DataHarbor’s canonical JSON model.

Markdown As Input

When Markdown input is detected

DataHarbor recognizes Markdown input when any of the following is true:

The upstream response uses Content-Type: text/markdown
You explicitly set input_format: markdown in your Virtual API Configuration
The upstream Content-Type is missing or unrecognized and body sniffing identifies the payload as Markdown

Use input_format: markdown when the upstream API serves Markdown with a missing, generic, or incorrect content type.

version: "0.3"
input_format: markdown
objects:
  _default:
    controls:
      - type: allow
        fields: [frontmatter.title, content]

Normalized shape

Markdown normalizes to a top-level JSON object with a content string and, when front matter is present and supported, a frontmatter object.

Document with front matter

---
title: API Guide
tags:
  - rest
  - auth
draft: false
---
# API Guide

This guide covers authentication.

Normalizes to:

{
  "frontmatter": {
    "title": "API Guide",
    "tags": ["rest", "auth"],
    "draft": false
  },
  "content": "# API Guide\n\nThis guide covers authentication."
}

Document without front matter

# Hello World

Some markdown content.

Normalizes to:

{
  "content": "# Hello World\n\nSome markdown content."
}
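Because frontmatter is present only for some documents, downstream code that reads the normalized object should treat it as optional. The following is a minimal sketch of a consumer, not part of DataHarbor itself; the describe helper and the "(untitled)" fallback are hypothetical names chosen for illustration.

```python
import json


def describe(normalized_json: str) -> str:
    """Summarize a normalized Markdown document (illustrative helper)."""
    doc = json.loads(normalized_json)
    # "frontmatter" is optional: it is absent when the source had no front matter.
    frontmatter = doc.get("frontmatter", {})
    title = frontmatter.get("title", "(untitled)")
    # "content" holds the whole Markdown body as one string.
    first_line = doc["content"].split("\n", 1)[0]
    return f"{title}: {first_line}"


# The two normalized shapes shown above, as JSON text.
with_fm = (
    '{"frontmatter": {"title": "API Guide", "tags": ["rest", "auth"], "draft": false}, '
    '"content": "# API Guide\\n\\nThis guide covers authentication."}'
)
without_fm = '{"content": "# Hello World\\n\\nSome markdown content."}'

print(describe(with_fm))     # API Guide: # API Guide
print(describe(without_fm))  # (untitled): # Hello World
```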

Front matter rules

DataHarbor recognizes front matter only when all of these rules are met:

The opening fence must be --- on the very first line of the document
The closing fence must be --- or ...
The front matter block must be a YAML mapping
The Markdown body begins after the closing fence line

The following are intentionally not treated as front matter:

A --- horizontal rule later in the document
An opening line like --- extra
Front matter that appears after any body content
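The fence rules above can be sketched as a small splitter. This is an illustration under stated assumptions, not DataHarbor's implementation: split_front_matter is a hypothetical name, the sketch requires an exact fence match (whether trailing whitespace is tolerated is not specified here), and it does not check that the block is a YAML mapping.

```python
from typing import Optional, Tuple


def split_front_matter(text: str) -> Tuple[Optional[str], str]:
    """Return (raw_yaml_or_None, body) following the fence rules above."""
    lines = text.split("\n")
    # Opening fence: exactly '---' on the very first line.
    if lines[0].rstrip("\r") != "---":
        return None, text
    for i in range(1, len(lines)):
        # Closing fence: '---' or '...'.
        if lines[i].rstrip("\r") in ("---", "..."):
            raw_yaml = "\n".join(lines[1:i])
            # The body begins after the closing fence line.
            body = "\n".join(lines[i + 1:])
            return raw_yaml, body
    # No closing fence found: treat the whole document as body.
    return None, text


doc = "---\ntitle: API Guide\n---\n# API Guide\n\nThis guide covers authentication."
raw_yaml, body = split_front_matter(doc)
print(raw_yaml)  # title: API Guide
```

A horizontal rule later in the document and an opening line such as --- extra both fall through to the no-front-matter path, because the first line is not exactly ---.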

Front matter limitations

Front matter uses the same JSON-compatible YAML subset as DataHarbor’s YAML input normalizer. Supported front matter values include:

Strings
Numbers
Booleans
Nulls
Nested mappings and sequences

Unsupported YAML constructs are not partially preserved. If front matter is malformed or uses YAML features that DataHarbor does not support, DataHarbor omits frontmatter and still preserves the document body in content. Examples of unsupported front matter constructs include:

Anchors and aliases
Tags
Merge keys
Non-finite float values such as Infinity and NaN
Excessively deep nesting

Front matter failures are intentionally soft. DataHarbor does not reject the whole Markdown document when front matter is malformed or unsupported. Instead, it drops frontmatter and continues with content only.
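The soft-failure behavior can be illustrated with a validator that runs on an already-parsed front matter value. This is a sketch under assumptions, not DataHarbor's code: is_supported, normalize, and MAX_DEPTH are hypothetical, the actual depth limit is not documented here, and the sketch assumes mapping keys must be strings to stay JSON-compatible.

```python
import math

MAX_DEPTH = 32  # hypothetical limit; the real threshold is not documented here


def is_supported(value, depth=0):
    """Check that a parsed value stays inside the JSON-compatible subset."""
    if depth > MAX_DEPTH:
        return False  # excessively deep nesting
    if value is None or isinstance(value, (str, bool, int)):
        return True
    if isinstance(value, float):
        return math.isfinite(value)  # rejects Infinity and NaN
    if isinstance(value, list):
        return all(is_supported(item, depth + 1) for item in value)
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_supported(item, depth + 1)
            for key, item in value.items()
        )
    return False  # any other type is outside the subset


def normalize(parsed_front_matter, body):
    """Build the normalized object, dropping unsupported front matter."""
    doc = {}
    # Front matter must be a mapping and fully supported, or it is omitted.
    if isinstance(parsed_front_matter, dict) and is_supported(parsed_front_matter):
        doc["frontmatter"] = parsed_front_matter
    # The body is always preserved.
    doc["content"] = body
    return doc


print(normalize({"score": float("nan")}, "# Report"))  # {'content': '# Report'}
```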

What controls can target

Once normalized, controls can target the Markdown payload like any other JSON object.

Use content to operate on the entire Markdown body as a single string
Use paths like frontmatter.title or frontmatter.tags to target structured metadata from front matter

version: "0.3"
input_format: markdown
objects:
  _default:
    controls:
      - type: allow
        fields: [frontmatter.title, frontmatter.tags, content]
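To make the path targeting concrete, here is a rough sketch of how dotted paths could resolve against the normalized object. It assumes that an allow control keeps only the listed paths; that reading, along with the get_path and allow helpers, is an assumption made for illustration and may not match DataHarbor's actual control semantics.

```python
def get_path(doc, path):
    """Return (found, value) for a dotted path such as 'frontmatter.title'."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return False, None
        current = current[key]
    return True, current


def allow(doc, fields):
    """Keep only the listed paths (assumed meaning of an allow control)."""
    result = {}
    for path in fields:
        found, value = get_path(doc, path)
        if not found:
            continue  # paths that do not resolve are skipped
        keys = path.split(".")
        target = result
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return result


doc = {
    "frontmatter": {"title": "API Guide", "tags": ["rest", "auth"], "draft": False},
    "content": "# API Guide\n\nThis guide covers authentication.",
}
filtered = allow(doc, ["frontmatter.title", "frontmatter.tags", "content"])
print(sorted(filtered["frontmatter"]))  # ['tags', 'title']
```

Because content is a plain string, a path that tries to descend below it resolves to nothing in this sketch.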

What controls cannot target

Controls cannot address headings, paragraphs, list items, or table cells as first-class fields inside the Markdown body. For example, DataHarbor does not create JSON paths like these:

content.sections[0].heading
content.lists[1].items[2]
content.tables[0].rows[3].email

If you need fine-grained structural control over document internals, use a structured upstream format such as JSON, CSV, or YAML instead of raw Markdown.

Markdown Detection Nuances

When the upstream Content-Type is missing or unrecognized, DataHarbor uses body sniffing. Markdown is recognized when the body:

Starts with a Markdown heading like # Heading, or
Starts with a valid front matter block and has non-empty Markdown body content after the closing fence

This means front-matter-only documents are ambiguous during sniffing: without body content after the closing fence, they do not meet the second condition. If the payload is only a fenced YAML block with no Markdown body, declare input_format: markdown or send Content-Type: text/markdown.
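The two sniffing conditions can be approximated as follows. This is a sketch under assumptions rather than DataHarbor's detector: sniff_markdown and body_after_front_matter are hypothetical names, "heading" is taken to mean an ATX heading (one to six # characters followed by a space), and the front matter block is not validated as YAML.

```python
import re

# Assumed heading form: 1-6 '#' characters, a space, then some text.
HEADING = re.compile(r"#{1,6} \S")


def body_after_front_matter(text):
    """Return the body after a leading front matter block, or None if absent."""
    lines = text.split("\n")
    if lines[0].rstrip("\r") != "---":
        return None
    for i in range(1, len(lines)):
        if lines[i].rstrip("\r") in ("---", "..."):
            return "\n".join(lines[i + 1:])
    return None


def sniff_markdown(text):
    """Apply the two body-sniffing conditions described above."""
    # Condition 1: the body starts with a Markdown heading.
    if HEADING.match(text):
        return True
    # Condition 2: front matter followed by non-empty body content.
    body = body_after_front_matter(text)
    return body is not None and body.strip() != ""


print(sniff_markdown("# Hello World\n\nSome markdown content."))  # True
print(sniff_markdown("---\ntitle: API Guide\n---\n"))             # False
```

The second call shows the front-matter-only case: the block is fenced correctly, but nothing follows it, so sniffing alone does not classify the payload as Markdown.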

Best Practices

Send Content-Type: text/markdown whenever you control the upstream API
Use front matter for structured metadata you want to target in controls
Treat content as a whole-document field, not as parsed sections
Use input_format: markdown when the upstream content type is missing, generic, or ambiguous

When To Use Markdown

Markdown works best when:

Your upstream data is primarily document text
You want to preserve human-authored prose
You only need lightweight structured metadata in front matter
Your consumers are people or agents that benefit from readable text output

Markdown is usually the wrong choice when:

You need field-level controls inside the body itself
You need stable machine-oriented structure throughout the document
You expect a byte-for-byte round-trip from source to response

Next Steps

Input Normalization: See how Markdown fits into the broader normalization stage.

Output Formatting: Return Markdown, CSV, YAML, or JSON after controls run.

YAML Reference: Configure input_format and default_output_format.
