Input Normalization

Different source formats. One controls pipeline. Before DataHarbor can apply privacy controls or transforms, it normalizes the upstream payload into a canonical JSON model.
Upstream payload → Normalize to canonical JSON → Apply controls → Format output → Respond
Normalization is the stage that makes one controls pipeline work across JSON, CSV, YAML, and Markdown. See Data Pipeline for the full request flow.

Supported input formats

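| Format   | Content-Type                                   | Normalized shape                                      |
|----------|------------------------------------------------|-------------------------------------------------------|
| JSON     | application/json, application/*+json           | Parsed directly                                       |
| CSV      | text/csv                                       | Array of objects keyed by header columns              |
| YAML     | text/yaml, application/yaml, application/x-yaml | JSON-compatible mappings, sequences, and scalars      |
| Markdown | text/markdown                                  | Document object with content and optional frontmatter |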

When to set input_format

Most of the time, DataHarbor can infer the source format from the upstream Content-Type header. Set input_format only when the upstream service returns a missing, generic, or incorrect content type.
version: "0.3"
input_format: markdown
objects:
  _default:
    controls:
      - type: allow
        fields: [frontmatter.title, content]
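The resolution order implied above can be sketched in Python. This is an illustration only, not DataHarbor's implementation: resolve_format and CONTENT_TYPE_FORMATS are hypothetical names, and the sketch assumes that a declared input_format takes precedence over the header, since it is meant to correct an incorrect content type.

```python
from typing import Optional

# Content types taken from the "Supported input formats" table.
CONTENT_TYPE_FORMATS = {
    "application/json": "json",
    "text/csv": "csv",
    "text/yaml": "yaml",
    "application/yaml": "yaml",
    "application/x-yaml": "yaml",
    "text/markdown": "markdown",
}


def resolve_format(content_type: Optional[str],
                   input_format: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: choose the source format for an upstream response."""
    # Assumption: an explicit input_format overrides the upstream header.
    if input_format:
        return input_format
    if not content_type:
        return None  # the caller would fall back to body sniffing
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in CONTENT_TYPE_FORMATS:
        return CONTENT_TYPE_FORMATS[media_type]
    # application/*+json, for example application/problem+json.
    if media_type.startswith("application/") and media_type.endswith("+json"):
        return "json"
    return None  # generic or unrecognized type
```

A return value of None here stands for "not decided by header or config", which is the case the Body sniffing section covers.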

CSV normalization

CSV input is parsed as headered rectangular data. The first row defines the column names, and each later row becomes an object.
name,age,active
Alice,30,true
Bob,25,false
Normalizes to:
[
  { "name": "Alice", "age": 30, "active": true },
  { "name": "Bob", "age": 25, "active": false }
]
Type coercion is conservative: obvious booleans, integers, and decimals are converted. Everything else stays a string.
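A minimal Python sketch of this kind of conservative coercion follows. The exact rules DataHarbor applies are not specified here, so the patterns below are assumptions: only lowercase true/false count as booleans, and numbers with leading zeros (such as 007) stay strings. The names coerce and normalize_csv are hypothetical.

```python
import csv
import io
import re

# Assumed patterns for "obvious" numbers: no leading zeros, no exponents.
_INT = re.compile(r"-?(0|[1-9]\d*)")
_DECIMAL = re.compile(r"-?(0|[1-9]\d*)\.\d+")


def coerce(value):
    """Convert only unambiguous booleans, integers, and decimals."""
    if not isinstance(value, str):
        return value  # missing cells from ragged rows are passed through
    if value == "true":
        return True
    if value == "false":
        return False
    if _INT.fullmatch(value):
        return int(value)
    if _DECIMAL.fullmatch(value):
        return float(value)
    return value  # everything else stays a string


def normalize_csv(text: str) -> list:
    """Turn headered CSV text into a list of objects keyed by column name."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: coerce(cell) for key, cell in row.items()} for row in reader]
```

Applied to the three-line CSV above, this sketch produces the same array of objects shown in the normalized output.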

YAML normalization

YAML input is normalized using the JSON-compatible subset of YAML.
name: Alice
tags:
  - admin
  - owner
active: true
Normalizes to:
{
  "name": "Alice",
  "tags": ["admin", "owner"],
  "active": true
}
Advanced YAML constructs that do not map cleanly to JSON — including anchors, aliases, tags, merge keys, multi-document streams, and non-finite floats — are rejected with a descriptive error.
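Part of this check can be illustrated on an already parsed value. The sketch below is an assumption-laden example, not DataHarbor's validator: check_json_compatible is a hypothetical name, it only covers what is visible after parsing (non-finite floats, non-string keys, non-JSON types), and constructs such as anchors, aliases, and multi-document streams would have to be caught by the YAML parser itself.

```python
import math
from typing import Any


def check_json_compatible(value: Any, path: str = "$") -> None:
    """Raise ValueError naming the first value that JSON cannot represent."""
    if value is None or isinstance(value, (bool, int, str)):
        return
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError(f"{path}: non-finite float {value!r} is not valid JSON")
        return
    if isinstance(value, list):
        for index, item in enumerate(value):
            check_json_compatible(item, f"{path}[{index}]")
        return
    if isinstance(value, dict):
        for key, item in value.items():
            # Assumption: only string keys are treated as JSON-compatible.
            if not isinstance(key, str):
                raise ValueError(f"{path}: non-string key {key!r} is not valid JSON")
            check_json_compatible(item, f"{path}.{key}")
        return
    raise ValueError(f"{path}: unsupported type {type(value).__name__}")
```

Including the path in the message is one way to make the error descriptive enough to locate the offending field.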

Markdown normalization

Markdown input is handled in document mode. DataHarbor preserves the body as content and extracts YAML front matter into frontmatter when present.
---
title: API Guide
tags:
  - auth
---
# API Guide

Authentication details here.
Normalizes to:
{
  "frontmatter": {
    "title": "API Guide",
    "tags": ["auth"]
  },
  "content": "# API Guide\n\nAuthentication details here."
}
See Markdown Input for full front matter rules, limitations, and detection nuances.
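The split between front matter and body can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the documented algorithm: normalize_markdown is a hypothetical name, the YAML parser is passed in as parse_yaml so the example stays dependency-free, and the frontmatter key is simply omitted when no front matter is found.

```python
import re
from typing import Any, Callable, Dict

# Front matter: an opening "---" line, YAML text, and a closing "---" line.
_FRONT_MATTER = re.compile(
    r"\A---[ \t]*\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)", re.DOTALL
)


def normalize_markdown(text: str, parse_yaml: Callable[[str], Any]) -> Dict[str, Any]:
    """Return a document object with content and, when present, frontmatter."""
    match = _FRONT_MATTER.match(text)
    if not match:
        # No front matter: the whole input is the body.
        return {"content": text}
    return {
        "frontmatter": parse_yaml(match.group(1)),
        "content": text[match.end():],  # body is preserved as-is
    }
```

In practice parse_yaml could be any safe YAML loader, for example PyYAML's yaml.safe_load if that library is available.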

Body sniffing

When the upstream Content-Type header is missing or unrecognized and no input_format is declared, DataHarbor inspects the response body to infer the format.
  • Starts with { or [ → JSON
  • Two rows with matching comma-separated field counts → CSV
  • Starts with --- or a key: mapping → YAML
  • Starts with valid front matter plus non-empty body, or starts with # → Markdown
If detection is inconclusive, DataHarbor defaults to JSON.
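The heuristics above can be sketched as a small Python function. The order in which DataHarbor evaluates them is not stated, so the ordering here is an assumption: Markdown is checked before YAML so that front matter followed by a body is not classified as plain YAML. The sketch also assumes CSV needs at least two fields per row and counts commas naively, ignoring quoted fields. sniff_format is a hypothetical name.

```python
import re

# Front matter block followed by the rest of the document (captured).
_FRONT_MATTER = re.compile(
    r"\A---[ \t]*\r?\n.*?\r?\n---[ \t]*\r?\n(.*)\Z", re.DOTALL
)
# A leading "key:" mapping entry.
_YAML_KEY = re.compile(r"\A[A-Za-z_][\w.-]*:(\s|\Z)")


def sniff_format(body: str) -> str:
    """Guess the source format from the body when no other hint exists."""
    text = body.lstrip()
    if text.startswith(("{", "[")):
        return "json"
    # Assumed priority: Markdown before YAML (see lead-in).
    match = _FRONT_MATTER.match(text)
    if (match and match.group(1).strip()) or text.startswith("#"):
        return "markdown"
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) >= 2:
        first, second = (line.count(",") + 1 for line in lines[:2])
        if first == second and first > 1:
            return "csv"
    if text.startswith("---") or _YAML_KEY.match(text):
        return "yaml"
    return "json"  # inconclusive input defaults to JSON
```

Because these checks are heuristic, declaring input_format remains the more predictable option whenever the upstream format is known in advance.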

Why normalization matters

  • Controls always run against normalized JSON, not raw upstream bytes.
  • The same field targeting rules work across source formats.
  • Output formatting happens later, so you can normalize from one format and respond in another.

Next steps

  • Data Pipeline: See where normalization fits in the request flow
  • Output Formatting: Learn how the final governed payload is rendered
  • Markdown Input: Dive into Markdown-specific rules and limitations