Markdown Input

DataHarbor supports Markdown as an upstream input format. Before controls run, it normalizes the document into DataHarbor’s canonical JSON model.
Use this page when your source data is Markdown. If you want to return Markdown to clients, see Output Formatting.
Markdown input does not behave like a fully structured schema. DataHarbor treats it as a document with optional front matter, not as an AST of headings, lists, and tables.

How DataHarbor thinks about Markdown

DataHarbor treats Markdown as a document, not as a tree of headings, paragraphs, lists, and tables. That means:
  • The document body is preserved as a single content string
  • YAML front matter, when present, is parsed into a frontmatter object
  • Headings, lists, and tables are not inferred into separate JSON fields
  • Controls operate on the normalized JSON model, not on a Markdown AST
Markdown normalization is intentionally document-oriented. This avoids inventing structure that may not match your document’s meaning and keeps the pipeline consistent with DataHarbor’s canonical JSON model.

Markdown As Input

When Markdown input is detected

DataHarbor recognizes Markdown input when any of the following is true:
  • The upstream response uses Content-Type: text/markdown
  • You explicitly set input_format: markdown in your Virtual API Configuration
  • The upstream Content-Type is missing or unrecognized and body sniffing identifies the payload as Markdown
Use input_format: markdown when the upstream API serves Markdown with a missing, generic, or incorrect content type. For example:

version: "0.3"
input_format: markdown
objects:
  _default:
    controls:
      - type: allow
        fields: [frontmatter.title, content]

Normalized shape

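Markdown normalizes to a top-level JSON object with a content string and, when front matter is present and parses successfully, a frontmatter object.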

Document with front matter

---
title: API Guide
tags:
  - rest
  - auth
draft: false
---
# API Guide

This guide covers authentication.

Normalizes to:

{
  "frontmatter": {
    "title": "API Guide",
    "tags": ["rest", "auth"],
    "draft": false
  },
  "content": "# API Guide\n\nThis guide covers authentication."
}

Document without front matter

# Hello World

Some markdown content.

Normalizes to:

{
  "content": "# Hello World\n\nSome markdown content."
}

Front matter rules

DataHarbor only recognizes front matter when all of these rules are met:
  • The opening fence must be --- on the very first line of the document
  • The closing fence must be --- or ...
  • The front matter block must be a YAML mapping
  • The Markdown body begins after the closing fence line
The following are intentionally not treated as front matter:
  • A --- horizontal rule later in the document
  • An opening line like --- extra
  • Front matter that appears after any body content
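
The fence rules above can be sketched in a few lines of Python. This is an illustrative approximation, not DataHarbor's implementation: the function name split_front_matter is hypothetical, the sketch assumes fences must match exactly (apart from a trailing carriage return), and it does not check that the block is a YAML mapping.

```python
from typing import Optional, Tuple


def split_front_matter(text: str) -> Tuple[Optional[str], str]:
    """Return (raw_front_matter, body).

    raw_front_matter is None when no front matter block is recognized,
    in which case body is the unchanged input text.
    """
    lines = text.split("\n")
    # The opening fence must be '---' on the very first line.
    # A line such as '--- extra' does not count (assumption: exact match).
    if lines[0].rstrip("\r") != "---":
        return None, text
    for i in range(1, len(lines)):
        # The closing fence may be '---' or '...'.
        if lines[i].rstrip("\r") in ("---", "..."):
            raw = "\n".join(lines[1:i])
            # The Markdown body begins after the closing fence line.
            body = "\n".join(lines[i + 1:])
            return raw, body
    # No closing fence found: treat the whole text as body.
    return None, text
```

Because only the first line is tested for the opening fence, a --- horizontal rule later in the document, or a fenced block that follows body content, is never mistaken for front matter.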

Front matter limitations

Front matter uses the same JSON-compatible YAML subset as DataHarbor’s YAML input normalizer. Supported front matter values include:
  • Strings
  • Numbers
  • Booleans
  • Nulls
  • Nested mappings and sequences
Unsupported YAML constructs are not partially preserved. If front matter is malformed or uses YAML features that DataHarbor does not support, DataHarbor omits frontmatter and still preserves the document body in content. Examples of unsupported front matter constructs include:
  • Anchors and aliases
  • Tags
  • Merge keys
  • Non-finite float values such as Infinity and NaN
  • Excessively deep nesting
Front matter failures are intentionally soft. DataHarbor does not reject the whole Markdown document when front matter is malformed or unsupported. Instead, it drops frontmatter and continues with content only.
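
The soft-failure behavior can be pictured with a short Python sketch. Everything here is an assumption made for illustration: normalize_markdown and its parameters are hypothetical names, the parser is passed in as a callable that raises ValueError on malformed or unsupported input, and the usage lines use json.loads purely as a stand-in for DataHarbor's YAML subset parser.

```python
import json
from typing import Any, Callable, Dict, Optional


def normalize_markdown(
    raw_front_matter: Optional[str],
    body: str,
    parse_mapping: Callable[[str], Any],
) -> Dict[str, Any]:
    """Build the normalized object; drop frontmatter on any parse problem."""
    frontmatter = None
    if raw_front_matter is not None:
        try:
            parsed = parse_mapping(raw_front_matter)
        except ValueError:
            parsed = None  # malformed or unsupported: fail softly
        # Only a mapping counts as front matter.
        if isinstance(parsed, dict):
            frontmatter = parsed
    doc: Dict[str, Any] = {}
    if frontmatter is not None:
        doc["frontmatter"] = frontmatter
    # The body is always preserved, whatever happened above.
    doc["content"] = body
    return doc


# Stand-in parser only: json.loads is NOT the parser DataHarbor uses.
ok = normalize_markdown('{"title": "API Guide"}', "# API Guide", json.loads)
bad = normalize_markdown('{"title": ', "# API Guide", json.loads)
```

In this sketch, ok carries both frontmatter and content, while bad carries content only. The document itself is never rejected.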

What controls can target

Once normalized, controls can target the Markdown payload like any other JSON object.
  • Use content to operate on the entire Markdown body as a single string
  • Use paths like frontmatter.title or frontmatter.tags to target structured metadata from front matter

version: "0.3"
input_format: markdown
objects:
  _default:
    controls:
      - type: allow
        fields: [frontmatter.title, frontmatter.tags, content]

What controls cannot target

Controls cannot address headings, paragraphs, list items, or table cells as first-class fields inside the Markdown body. For example, DataHarbor does not create JSON paths like these:
  • content.sections[0].heading
  • content.lists[1].items[2]
  • content.tables[0].rows[3].email
If you need fine-grained structural control over document internals, use a structured upstream format such as JSON, CSV, or YAML instead of raw Markdown.
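
To see why such paths do not exist, it can help to walk the normalized object directly. The Python sketch below is only an illustration: has_path is a hypothetical helper, it handles plain dotted paths only, and it says nothing about how DataHarbor resolves field paths internally.

```python
from typing import Any, Dict


def has_path(doc: Dict[str, Any], path: str) -> bool:
    """Return True when a dotted path resolves inside the normalized object."""
    current: Any = doc
    for key in path.split("."):
        # A string such as content has no child fields to descend into.
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return True


# Normalized shape taken from the front matter example on this page.
doc = {
    "frontmatter": {"title": "API Guide", "tags": ["rest", "auth"], "draft": False},
    "content": "# API Guide\n\nThis guide covers authentication.",
}

title_found = has_path(doc, "frontmatter.title")    # metadata is addressable
body_found = has_path(doc, "content")               # the whole body is one field
section_found = has_path(doc, "content.sections")   # nothing exists below content
```

Only the last lookup fails, because content is a single string rather than a nested structure.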

Markdown Detection Nuances

When the upstream Content-Type is missing or unrecognized, DataHarbor uses body sniffing. Markdown is recognized when the body:
  • Starts with a Markdown heading like # Heading, or
  • Starts with a valid front matter block and has non-empty Markdown body content after the closing fence
This means front-matter-only documents are ambiguous during sniffing: with no Markdown body after the closing fence, neither condition above is met. If the payload is only a fenced YAML block with no Markdown body, declare input_format: markdown or send Content-Type: text/markdown.
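
The two sniffing conditions can be approximated with a small Python heuristic. This is a sketch under stated assumptions, not the actual detector: looks_like_markdown is a hypothetical name, a heading is assumed to be one to six # characters followed by a space and text, and the front matter block is located by its fences only, without validating the YAML inside it.

```python
import re


def looks_like_markdown(body: str) -> bool:
    """Rough approximation of the body-sniffing rules described above."""
    # Condition 1: the body starts with a heading like '# Heading'.
    if re.match(r"#{1,6} \S", body):
        return True
    # Condition 2: a front matter block followed by non-empty body content.
    lines = body.split("\n")
    if lines[0].rstrip("\r") != "---":
        return False
    for i in range(1, len(lines)):
        if lines[i].rstrip("\r") in ("---", "..."):
            rest = "\n".join(lines[i + 1:])
            # Front-matter-only payloads have nothing left after the fence.
            return rest.strip() != ""
    return False
```

A payload such as "---\ntitle: x\n---\n" returns False here, which is the case where an explicit input_format: markdown or Content-Type: text/markdown is needed.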

Best Practices

  • Send Content-Type: text/markdown whenever you control the upstream API
  • Use front matter for structured metadata you want to target in controls
  • Treat content as a whole-document field, not as parsed sections
  • Use input_format: markdown when the upstream content type is missing, generic, or ambiguous

When To Use Markdown

Markdown works best when:
  • Your upstream data is primarily document text
  • You want to preserve human-authored prose
  • You only need lightweight structured metadata in front matter
  • Your consumers are people or agents that benefit from readable text output
Markdown is usually the wrong choice when:
  • You need field-level controls inside the body itself
  • You need stable machine-oriented structure throughout the document
  • You expect a byte-for-byte round-trip from source to response

Next Steps

  • Input Normalization: See how Markdown fits into the broader normalization stage
  • Output Formatting: Return Markdown, CSV, YAML, or JSON after controls run
  • YAML Reference: Configure input_format and default_output_format