Virtual APIs

One source of truth. Many governed views.

A Virtual API is a governed view of an existing dataset. Instead of cloning data or building custom pipelines for each consumer, you define a declarative Virtual API Configuration that applies redaction, tokenization, and transformations at the field level, automatically and per route.
A Virtual API is also a pipeline definition: DataHarbor fetches upstream data, normalizes it into canonical JSON, matches the correct object definition, runs your ordered controls, then formats the response for delivery. See Data Pipeline for the full request flow.

How it works

  1. Enroll your data source (HTTP and GraphQL today, data lakes coming soon)
  2. Define your Virtual API Configuration — object definitions, controls, and access rules
  3. Publish — your Virtual API is instantly available
  4. Monitor — observe usage and schema changes over time

Basic configuration

A Virtual API Configuration defines which fields to protect, organized by the source objects they apply to:
version: "0.3"
objects:
  customers:
    controls:
      - type: redact
        fields: [ssn, date_of_birth]
      - type: anonymize
        fields: [email]
This configuration redacts ssn and date_of_birth and anonymizes email for any request matching the customers route, such as /api/customers or /api/customers/456.
In the current runtime, controls live under named object definitions inside objects. There is no top-level controls list in the executed Virtual API Configuration.
You’ll see fields written two ways in our examples — both are valid YAML and produce identical results:
# Inline style
fields: [first_name, last_name]

# Multi-line style
fields:
  - first_name
  - last_name

Controls are a pipeline

Controls execute top-to-bottom as an ordered pipeline. This means you can create a field with a transform and then apply a privacy control to it in a later step:
version: "0.3"
objects:
  customers:
    controls:
      # Step 1: Combine name fields into a single field
      - type: combine
        fields: [first_name, last_name]
        into: full_name
        separator: " "
        remove_source: true

      # Step 2: Hash the combined name
      - type: hash
        fields: [full_name]

      # Step 3: Redact email
      - type: redact
        fields: [email]
In this example, the combine control creates full_name and, because remove_source is true, removes the original first_name and last_name fields. The hash control then hashes full_name, and the last step redacts email. Order matters: each control sees the result of the controls above it. This ordered controls list runs after input normalization and before output formatting.
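To make the ordering concrete, here is a minimal Python sketch of an ordered control pipeline. It is not DataHarbor's implementation: the `[REDACTED]` placeholder, the use of SHA-256 for `hash`, and the helper names (`combine`, `hash_fields`, `redact`, `run_controls`) are assumptions chosen for illustration.

```python
import hashlib

REDACTED = "[REDACTED]"  # assumed placeholder; the real redaction output may differ


def combine(record, fields, into, separator=" ", remove_source=False):
    """Join the source fields into a new field, optionally dropping the sources."""
    out = dict(record)
    out[into] = separator.join(str(out[f]) for f in fields if f in out)
    if remove_source:
        for f in fields:
            out.pop(f, None)
    return out


def hash_fields(record, fields):
    """Replace each present field with a SHA-256 hex digest (algorithm is an assumption)."""
    out = dict(record)
    for f in fields:
        if f in out:
            out[f] = hashlib.sha256(str(out[f]).encode("utf-8")).hexdigest()
    return out


def redact(record, fields):
    """Replace each present field with the placeholder value."""
    out = dict(record)
    for f in fields:
        if f in out:
            out[f] = REDACTED
    return out


CONTROL_TYPES = {"combine": combine, "hash": hash_fields, "redact": redact}


def run_controls(record, controls):
    """Apply controls top to bottom; each step sees the previous step's output."""
    for control in controls:
        params = {k: v for k, v in control.items() if k != "type"}
        record = CONTROL_TYPES[control["type"]](record, **params)
    return record


# Same three steps as the YAML example, written as plain dictionaries.
controls = [
    {"type": "combine", "fields": ["first_name", "last_name"], "into": "full_name",
     "separator": " ", "remove_source": True},
    {"type": "hash", "fields": ["full_name"]},
    {"type": "redact", "fields": ["email"]},
]

customer = {"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com"}
result = run_controls(customer, controls)
```

In this sketch, moving the hash step above the combine step leaves full_name unhashed, because the field does not exist yet when the hash step runs.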

Object definitions and route matching

Each key under objects is a named object definition that matches against URL path segments. DataHarbor matches the object name to the last segment of the request path, or to the second-to-last segment when the path ends in something else, such as the resource ID in /v1/addresses/1.
GraphQL sources are the exception: object names come from the response shape under data, not from URL path segments. See GraphQL Sources for the GraphQL-specific matching model.
version: "0.3"
objects:
  addresses:
    controls:
      - type: redact
        fields: [name]
  users:
    controls:
      - type: redact
        fields: [metadata.tag]
| Request path | Matched object |
| --- | --- |
| /v1/addresses | addresses |
| /v1/addresses/1 | addresses |
| /api/users/123 | users |
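
The path-based matching rule can be sketched in a few lines of Python. This is an illustration only: the function name `match_object` is hypothetical, and checking the last segment before the second-to-last is an assumption about tie-breaking that the text above does not specify. GraphQL sources are not covered by this sketch.

```python
def match_object(path, object_names):
    """Return the object name found in the last or second-to-last path segment, else None."""
    segments = [s for s in path.split("/") if s]
    # Try the last segment first, then the one before it (e.g. /v1/addresses/1).
    for candidate in reversed(segments[-2:]):
        if candidate in object_names:
            return candidate
    return None


object_names = {"addresses", "users"}

for path in ("/v1/addresses", "/v1/addresses/1", "/api/users/123", "/v1/orders"):
    print(path, "->", match_object(path, object_names))
```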

Targeting nested and array fields

Target nested fields using dot notation, and array items using [arrayName] syntax:
version: "0.3"
objects:
  orders:
    controls:
      - type: redact
        fields: [name, '[addresses].name', '[addresses].city']
This redacts the top-level name, plus the name and city fields inside every item in the addresses array, all within a single object definition.
In v0.3, paths use dot notation only; slash notation such as metadata/tag is rejected. If an endpoint returns a root array, the matched object’s controls run against each object element automatically. See Field Targeting for the shared path rules used by Data Control and Data Transform.
For nested objects, use dot notation:
controls:
  - type: redact
    fields: [metadata.tag, metadata.viewCount]
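
As a rough illustration of how these paths resolve, the following Python sketch walks dot-notation and [arrayName] segments and redacts the addressed fields. The helper names and the `[REDACTED]` placeholder are hypothetical, and the authoritative path rules are the ones described in Field Targeting; this sketch only covers the forms shown on this page.

```python
REDACTED = "[REDACTED]"  # assumed placeholder value


def redact_path(node, path):
    """Redact the field addressed by `path`, modifying `node` in place."""
    if "/" in path:
        # Mirrors the rule that slash notation is rejected in v0.3.
        raise ValueError(f"slash notation is not supported: {path!r}")
    _apply(node, path.split("."))


def _apply(node, segments):
    if not isinstance(node, dict):
        return  # only object nodes can hold named fields
    head, rest = segments[0], segments[1:]
    if head.startswith("[") and head.endswith("]"):
        # [arrayName]: descend into every item of the named array.
        items = node.get(head[1:-1])
        if isinstance(items, list) and rest:
            for item in items:
                _apply(item, rest)
    elif head in node:
        if rest:
            _apply(node[head], rest)  # dot notation: step into the nested object
        else:
            node[head] = REDACTED


def redact_fields(payload, fields):
    """Apply the paths to one object, or to each object element of a root array."""
    records = payload if isinstance(payload, list) else [payload]
    for record in records:
        for path in fields:
            redact_path(record, path)
    return payload


order = {
    "name": "Ada",
    "addresses": [
        {"name": "Home", "city": "London", "zip": "N1"},
        {"name": "Work", "city": "Paris", "zip": "75001"},
    ],
    "metadata": {"tag": "vip", "viewCount": 3},
}
redact_fields(order, ["name", "[addresses].name", "[addresses].city", "metadata.tag"])
```

After the call, the zip values and metadata.viewCount are unchanged, while every targeted field holds the placeholder.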

Multiple views from one source

One enrolled API can power many governed views, each with different control configurations:
# Partner View — redact identifiers
version: "0.3"
objects:
  customers:
    controls:
      - type: redact
        fields: [ssn, date_of_birth, email]
# Analytics View — tokenize for correlation
version: "0.3"
objects:
  customers:
    controls:
      - type: tokenize
        fields: [email, phone]
      - type: redact
        fields: [ssn]
# AI Training View — anonymize for model training
version: "0.3"
objects:
  customers:
    controls:
      - type: redact
        fields: ['[addresses].name']
      - type: anonymize
        fields: [metadata.viewCount]
No duplicated data. No cloned APIs.

The _default object

_default is an optional catch-all that applies when no named object matches the request route:
version: "0.3"
objects:
  addresses:
    controls:
      - type: redact
        fields: [name]
  _default:
    controls:
      - type: redact
        fields: [description]
Precedence: When a request matches a named object (e.g., addresses), only that object’s controls apply. When no named object matches, _default controls apply.
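
The precedence rule can be expressed as a small lookup. In this Python sketch, `select_controls` is a hypothetical helper and `matched_name` stands for the result of route matching (None when nothing matched). Returning an empty list when there is no match and no _default is an assumption, not documented behavior.

```python
def select_controls(config, matched_name):
    """Pick the controls list for a request: the named object wins, else _default."""
    objects = config.get("objects", {})
    if matched_name is not None and matched_name != "_default" and matched_name in objects:
        # A named match applies only its own controls; _default is not merged in.
        return objects[matched_name].get("controls", [])
    default = objects.get("_default")
    return default.get("controls", []) if default else []


config = {
    "version": "0.3",
    "objects": {
        "addresses": {"controls": [{"type": "redact", "fields": ["name"]}]},
        "_default": {"controls": [{"type": "redact", "fields": ["description"]}]},
    },
}

named = select_controls(config, "addresses")   # controls from the addresses object
fallback = select_controls(config, None)       # controls from _default
```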

Spec options

| Option | Default | Description |
| --- | --- | --- |
| input_format | Auto-detected | Expected upstream format. Values: json, csv, yaml, markdown |
| default_output_format | json | Desired response format. Values: json, markdown, csv, yaml |
| failOnUnmatchedObject | false | Reject requests that don’t match any defined object |
| useStrictNameMatching | true | When false, match field names case-insensitively |

version: "0.3"
default_output_format: markdown
failOnUnmatchedObject: true
objects:
  addresses:
    controls:
      - type: redact
        fields: [name]
Set default_output_format: markdown to make every response from this Virtual API agent-friendly. You can also request Markdown per-request by appending .md to the URL. See Output Formatting for details.
If your upstream API serves Markdown, set input_format: markdown when the upstream Content-Type is missing or unreliable. See Markdown Input for document-mode normalization, front matter rules, and limitations.
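
Two of these options are easy to picture in code. The Python sketch below shows one plausible reading of useStrictNameMatching and of the per-request .md suffix. Both function names are hypothetical, and stripping the suffix before further processing is an assumption; see Output Formatting for the actual rules.

```python
def find_field(record, name, use_strict_name_matching=True):
    """Return the key in `record` that `name` refers to, or None if there is none."""
    if name in record:
        return name
    if not use_strict_name_matching:
        # Relaxed mode: compare names without regard to case.
        lowered = name.lower()
        for key in record:
            if key.lower() == lowered:
                return key
    return None


def resolve_output_format(path, default_output_format="json"):
    """Return (path, format); a trailing .md requests Markdown for this request only."""
    if path.endswith(".md"):
        return path[: -len(".md")], "markdown"
    return path, default_output_format


record = {"Email": "ada@example.com"}
strict_hit = find_field(record, "email")                                    # None
relaxed_hit = find_field(record, "email", use_strict_name_matching=False)   # "Email"
per_request = resolve_output_format("/v1/addresses.md")
```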

Versioning

Every change to a Virtual API creates a new version. You can:
  • View the full version history
  • Compare configurations between versions
  • Roll back to a previous version
  • Audit which version was active at any point in time

Lifecycle

| State | Description |
| --- | --- |
| Active | Live and accepting requests |
| Expired | Past expiration date, no longer accepting requests |
| Inactive | Manually disabled, instant shutdown |

Next steps

  • Data Pipeline: See normalization, object matching, controls, and formatting end to end
  • Data Control: Redaction, tokenization, anonymization
  • YAML Reference: Look up config keys and pipeline options