Virtual APIs
One source of truth. Many governed views.
A Virtual API is a governed view of an existing dataset. Instead of cloning data or building custom pipelines for each consumer, you define a declarative Virtual API Configuration that applies redaction, tokenization, and transformations at the field level, automatically and per route.
A Virtual API is also a pipeline definition: DataHarbor fetches upstream data, normalizes it into canonical JSON, matches the correct object definition, runs your ordered controls, then formats the response for delivery. See Data Pipeline for the full request flow.
How it works
- Enroll your data source (HTTP and GraphQL today, data lakes coming soon)
- Define your Virtual API Configuration — object definitions, controls, and access rules
- Publish — your Virtual API is instantly available
- Monitor — observe usage and schema changes over time
Basic configuration
A Virtual API Configuration defines which fields to protect, organized by the source objects they apply to. For example, a customers object definition can protect ssn and date_of_birth and anonymize email for any request matching the customers route, such as /api/customers or /api/customers/456.
In the current runtime, controls live under named object definitions inside objects. There is no top-level controls list in the executed Virtual API Configuration.
Controls are a pipeline
Controls execute top-to-bottom as an ordered pipeline. This means you can create a field with a transform and then apply a privacy control to it in a later step. For example, a combine transform can create full_name, a later control hashes it, and the original first_name and last_name fields are removed. Order matters: each control sees the result of the controls above it.
This ordered controls list runs after input normalization and before output formatting.
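The ordering rule can be illustrated with a minimal Python sketch. This is not DataHarbor's implementation: the `combine`, `hash_field`, and `run_controls` helpers are hypothetical names, and the choice of SHA-256 and a space separator are assumptions made only for illustration.

```python
import hashlib
from typing import Any, Callable

Record = dict[str, Any]
Control = Callable[[Record], Record]


def combine(target: str, sources: list[str], sep: str = " ") -> Control:
    """Hypothetical transform: join source fields into `target` and drop the sources."""
    def apply(record: Record) -> Record:
        out = {k: v for k, v in record.items() if k not in sources}
        out[target] = sep.join(str(record[s]) for s in sources)
        return out
    return apply


def hash_field(field: str) -> Control:
    """Hypothetical privacy control: replace `field` with its SHA-256 hex digest."""
    def apply(record: Record) -> Record:
        out = dict(record)
        out[field] = hashlib.sha256(str(record[field]).encode("utf-8")).hexdigest()
        return out
    return apply


def run_controls(record: Record, controls: list[Control]) -> Record:
    """Apply controls top to bottom; each one sees the previous control's output."""
    for control in controls:
        record = control(record)
    return record


controls = [
    combine("full_name", ["first_name", "last_name"]),  # step 1: create full_name
    hash_field("full_name"),                            # step 2: hash the new field
]
result = run_controls(
    {"first_name": "Ada", "last_name": "Lovelace", "id": 7}, controls
)
```

In this sketch, reversing the two controls fails with a `KeyError`, because `full_name` does not exist yet when the hash step runs. That is the sense in which order matters.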
Object definitions and route matching
Each key under objects is a named object definition that matches against URL path segments. DataHarbor matches the object name to the last (or second-to-last) segment of the request path.
GraphQL sources are the exception: object names come from the response shape under data, not from URL path segments. See GraphQL Sources for the GraphQL-specific matching model.
| Request path | Matched object |
|---|---|
| /v1/addresses | addresses |
| /v1/addresses/1 | addresses |
| /api/users/123 | users |
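The path-segment matching in the table can be sketched in Python as follows. The function name `match_object` is hypothetical, and two details are assumptions rather than documented behavior: the last segment is tried before the second-to-last, and names are compared exactly.

```python
from typing import Optional


def match_object(path: str, object_names: set[str]) -> Optional[str]:
    """Return the object name matching the last or second-to-last path segment."""
    # Drop any query string, then split into non-empty segments.
    segments = [s for s in path.split("?")[0].split("/") if s]
    # Assumption: try the last segment first, then the second-to-last.
    for candidate in reversed(segments[-2:]):
        if candidate in object_names:
            return candidate
    return None


names = {"addresses", "users"}
matched = [
    match_object(p, names)
    for p in ("/v1/addresses", "/v1/addresses/1", "/api/users/123")
]
```

A path such as /v1/orders/9 returns `None` here, since neither of its last two segments is a defined object name.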
Targeting nested and array fields
Target nested fields using dot notation, and array items using [arrayName] syntax. For example, a single object definition can target the top-level name field, plus the name and city fields inside every item in the addresses array.
Use dot notation only in v0.3. Slash notation like metadata/tag is rejected.
If an endpoint returns a root array, the matched object’s controls run against each object element automatically.
See Field Targeting for the shared path rules used by Data Control and Data Transform.
For nested objects, use dot notation, for example metadata.tag.
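The targeting rules above can be sketched as a small Python path walker. This is an illustration, not the product's resolver: the helper names `apply_at` and `_walk` are hypothetical, and the exact path spelling `[addresses].city` is an assumption about how dot notation and array syntax combine.

```python
from typing import Any, Callable


def apply_at(data: Any, path: str, fn: Callable[[Any], Any]) -> None:
    """Apply `fn` in place to every value addressed by `path`."""
    if "/" in path:
        # Slash notation such as metadata/tag is rejected.
        raise ValueError(f"slash notation is not supported: {path!r}")
    parts = path.split(".")
    # A root array is handled by visiting each object element in turn.
    roots = data if isinstance(data, list) else [data]
    for root in roots:
        _walk(root, parts, fn)


def _walk(node: Any, parts: list[str], fn: Callable[[Any], Any]) -> None:
    if not isinstance(node, dict) or not parts:
        return
    head, rest = parts[0], parts[1:]
    if head.startswith("[") and head.endswith("]"):
        # Assumed syntax: [arrayName] fans out over every item in that array.
        items = node.get(head[1:-1])
        if isinstance(items, list):
            for item in items:
                _walk(item, rest, fn)
    elif head in node:
        if rest:
            _walk(node[head], rest, fn)
        else:
            node[head] = fn(node[head])


record = {
    "name": "Ada",
    "addresses": [
        {"name": "Home", "city": "London", "zip": "N1"},
        {"name": "Work", "city": "Paris", "zip": "75001"},
    ],
    "metadata": {"tag": "vip"},
}
for path in ("name", "[addresses].name", "[addresses].city", "metadata.tag"):
    apply_at(record, path, lambda _value: "[REDACTED]")
```

After the loop, every targeted field holds the placeholder while untargeted fields such as zip are unchanged.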
Multiple views from one source
One enrolled API can power many governed views, each with different control configurations.
The _default object
_default is an optional catch-all that applies when no named object matches the request route.
When a request matches a named object (such as addresses), only that object’s controls apply. When no named object matches, _default controls apply.
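The fallback rule can be written as a short Python sketch. The `select_controls` function and the string placeholders standing in for controls are hypothetical. Returning an empty list when neither a named object nor _default exists is also an assumption.

```python
from typing import Optional


def select_controls(matched: Optional[str], objects: dict[str, list[str]]) -> list[str]:
    """Pick the controls for a request: the named object's, else _default's."""
    if matched is not None and matched in objects:
        # A named match wins outright; _default controls are not merged in.
        return objects[matched]
    # Assumption: with no named match and no _default, no controls run.
    return objects.get("_default", [])


# Hypothetical object definitions; controls are plain labels for illustration.
objects = {
    "addresses": ["redact street"],
    "_default": ["redact ssn"],
}
for_addresses = select_controls("addresses", objects)
for_unmatched = select_controls(None, objects)
```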
Spec options
| Option | Default | Description |
|---|---|---|
| input_format | Auto-detected | Expected upstream format. Values: json, csv, yaml, markdown |
| default_output_format | json | Desired response format. Values: json, markdown, csv, yaml |
| failOnUnmatchedObject | false | Reject requests that don’t match any defined object |
| useStrictNameMatching | true | When false, match field names case-insensitively |
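As an illustration of what useStrictNameMatching changes, the following Python sketch resolves a configured field name against a record. The `find_field` helper is hypothetical, and preferring an exact match before falling back to a case-insensitive one is an assumption.

```python
from typing import Any, Optional


def find_field(record: dict[str, Any], name: str, strict: bool = True) -> Optional[str]:
    """Return the record key that `name` refers to, or None if nothing matches."""
    if name in record:
        return name
    if strict:
        # Strict mode (the default): only an exact-case match counts.
        return None
    wanted = name.casefold()
    for key in record:
        if key.casefold() == wanted:
            return key
    return None


upstream = {"SSN": "123-45-6789", "email": "ada@example.com"}
strict_hit = find_field(upstream, "ssn")                 # exact case required
relaxed_hit = find_field(upstream, "ssn", strict=False)  # case-insensitive
```

With the default strict setting, a control targeting ssn would not touch an upstream field named SSN in this sketch, which is worth checking when an upstream API is inconsistent about casing.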
If your upstream API serves Markdown, set input_format: markdown when the upstream Content-Type is missing or unreliable. See Markdown Input for document-mode normalization, front matter rules, and limitations.
Versioning
Every change to a Virtual API creates a new version. You can:
- View the full version history
- Compare configurations between versions
- Roll back to a previous version
- Audit which version was active at any point in time
Lifecycle
| State | Description |
|---|---|
| Active | Live and accepting requests |
| Expired | Past expiration date, no longer accepting requests |
| Inactive | Manually disabled, instant shutdown |
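The three states can be modeled with a small Python sketch. The `lifecycle_state` function and its `enabled` and `expires_at` parameters are hypothetical names. Checking the manual disable before the expiration date is an assumption about precedence, not documented behavior.

```python
from datetime import datetime, timezone
from typing import Optional


def lifecycle_state(enabled: bool, expires_at: Optional[datetime], now: datetime) -> str:
    """Derive a lifecycle state from a manual switch and an optional expiration."""
    if not enabled:
        # Assumption: a manual disable takes precedence over expiration.
        return "Inactive"
    if expires_at is not None and now >= expires_at:
        return "Expired"
    return "Active"


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
expiry = datetime(2025, 1, 1, tzinfo=timezone.utc)
states = [
    lifecycle_state(True, None, now),     # no expiration set
    lifecycle_state(True, expiry, now),   # past the expiration date
    lifecycle_state(False, expiry, now),  # manually disabled
]
```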
Next steps
Data Pipeline
See normalization, object matching, controls, and formatting end to end
Data Control
Redaction, tokenization, anonymization
YAML Reference
Look up config keys and pipeline options

