Core Concepts

Virtual API

A Virtual API is a published, governed view of an existing dataset. It’s the fundamental unit in DataHarbor. Each Virtual API:
  • Points to a single enrolled data source
  • Applies data controls, transforms, and access rules
  • Can be delivered via REST, MCP, or Data Lake
  • Is versioned, auditable, and revocable

Object Definitions

Virtual APIs use a YAML configuration with an objects key containing named object definitions. Each object definition specifies the ordered controls pipeline for that resource type. See Virtual APIs for a full example and Data Pipeline for how object matching fits into request processing.

URL Matching

DataHarbor selects the object definition to apply from the request's URL path. Object names must match URL segments exactly. For a request to /properties/{propertyId}/inspections/{inspectionId}/issues, the system:
  1. Parses the URL path segments
  2. Checks whether the last path segment matches a defined object name (in this case, issues)
  3. If the last segment does not match, checks the second-to-last path segment instead; this lets a request such as /properties/{propertyId}, whose last segment is an ID, match properties
Only one object definition is selected per request. If no named object matches, _default is used when present.
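The matching rule can be sketched in Python as follows. This illustrates the documented behavior and is not DataHarbor's implementation; the function name select_object is hypothetical, and the sketch assumes that query strings are ignored and that empty segments from leading or trailing slashes are dropped.

```python
from typing import Iterable, Optional


def select_object(path: str, object_names: Iterable[str]) -> Optional[str]:
    """Hypothetical sketch of object matching for one request path."""
    names = set(object_names)
    # 1. Parse the path into non-empty segments (assumption: query string ignored).
    segments = [s for s in path.split("?", 1)[0].split("/") if s]
    # 2-3. Check the last segment first, then the second-to-last one.
    for segment in reversed(segments[-2:]):
        if segment in names:  # exact string match
            return segment
    # No named object matched: fall back to _default when it is defined.
    return "_default" if "_default" in names else None


objects = ["properties", "issues", "_default"]
print(select_object("/properties/p-17/inspections/i-9/issues", objects))  # issues
print(select_object("/properties/p-17", objects))  # properties
print(select_object("/tenants/t-4/leases", objects))  # _default
```

The function returns a single name, mirroring the rule that only one object definition is selected per request.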

The _default Object

_default is an optional catch-all for any object not explicitly defined.
```yaml
version: "0.3"
objects:
  properties:
    controls:
      - type: redact
        fields: [ownerName]
  _default:
    controls:
      - type: redact
        fields: [createdBy]
```
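When a request targets an object that has no named definition, the _default controls apply. When a request matches a named object, only that object's controls apply; _default controls are not added on top. In the example above, a request for properties redacts ownerName but leaves createdBy intact.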
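The controls-selection rule amounts to a small lookup. The sketch below assumes the YAML example above has already been parsed into a Python dict (for example, with a YAML library); controls_for is a hypothetical helper, not part of any DataHarbor SDK.

```python
from typing import Any, Dict, List

# The YAML example above, as an already-parsed Python dict.
CONFIG: Dict[str, Any] = {
    "version": "0.3",
    "objects": {
        "properties": {"controls": [{"type": "redact", "fields": ["ownerName"]}]},
        "_default": {"controls": [{"type": "redact", "fields": ["createdBy"]}]},
    },
}


def controls_for(config: Dict[str, Any], object_name: str) -> List[Dict[str, Any]]:
    """Return the controls that apply to one object (hypothetical helper)."""
    objects = config.get("objects", {})
    if object_name in objects:
        # A named object uses only its own controls; _default is not merged in.
        return objects[object_name].get("controls", [])
    # Undefined object: use _default when present, otherwise apply no controls.
    return objects.get("_default", {}).get("controls", [])


print(controls_for(CONFIG, "properties"))   # redacts ownerName only
print(controls_for(CONFIG, "inspections"))  # falls back to _default: redacts createdBy
```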

Control Blocks

Control Blocks are declarative rules applied to a Virtual API. They fall into three categories:

| Block | Purpose |
|---|---|
| Data Control | Privacy operations: redact, tokenize, anonymize, mask, hash |
| Data Transform | Field transformations: combine, coalesce, delete |
| Access Control | Access rules: geo restrictions, expiration, shutdown |

Control Blocks are composable. Apply as many as needed to a single Virtual API.
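Because Control Blocks are composable, an ordered list of them behaves like a chain of functions, each receiving the previous one's output. The sketch below illustrates that idea for a subset of the listed operations (redact, mask, hash, coalesce, delete). The behavior of each operation and the parameters `target` and `keep_last` are assumptions made for illustration, not DataHarbor's documented semantics.

```python
import hashlib
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Spec = Dict[str, Any]


def redact(rec: Record, spec: Spec) -> Record:
    # Assumed behavior: replace each listed field's value with a placeholder.
    return {k: ("[REDACTED]" if k in spec["fields"] else v) for k, v in rec.items()}


def mask(rec: Record, spec: Spec) -> Record:
    # Assumed behavior: keep only the last `keep_last` characters visible (hypothetical parameter).
    keep = spec.get("keep_last", 4)
    out = dict(rec)
    for f in spec["fields"]:
        if out.get(f) is not None:
            s = str(out[f])
            visible = s[-keep:] if keep > 0 else ""
            out[f] = "*" * (len(s) - len(visible)) + visible
    return out


def hash_fields(rec: Record, spec: Spec) -> Record:
    # Assumed behavior: replace each listed value with its SHA-256 hex digest.
    out = dict(rec)
    for f in spec["fields"]:
        if out.get(f) is not None:
            out[f] = hashlib.sha256(str(out[f]).encode("utf-8")).hexdigest()
    return out


def coalesce(rec: Record, spec: Spec) -> Record:
    # Assumed behavior: write the first non-null listed value into `target` (hypothetical parameter).
    out = dict(rec)
    out[spec["target"]] = next(
        (rec[f] for f in spec["fields"] if rec.get(f) is not None), None
    )
    return out


def delete(rec: Record, spec: Spec) -> Record:
    # Remove the listed fields entirely.
    return {k: v for k, v in rec.items() if k not in spec["fields"]}


CONTROLS: Dict[str, Callable[[Record, Spec], Record]] = {
    "redact": redact,
    "mask": mask,
    "hash": hash_fields,
    "coalesce": coalesce,
    "delete": delete,
}


def run_controls(rec: Record, controls: List[Spec]) -> Record:
    # Apply controls in order; each one sees the previous control's output.
    for spec in controls:
        rec = CONTROLS[spec["type"]](rec, spec)
    return rec


record = {"ownerName": "Ada Lovelace", "phone": "5551234567",
          "mobile": None, "landline": "5550001111", "createdBy": "u-42"}
pipeline = [
    {"type": "redact", "fields": ["ownerName"]},
    {"type": "coalesce", "fields": ["mobile", "landline"], "target": "contact"},
    {"type": "mask", "fields": ["phone"], "keep_last": 4},
    {"type": "delete", "fields": ["mobile", "landline"]},
]
print(run_controls(record, pipeline))
```

Order matters in this sketch: if delete ran before coalesce, the contact field would end up as None because its source fields would already be gone.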

Data Pipeline

Every request flows through the same high-level pipeline:
  1. Authenticate and authorize the caller
  2. Fetch the upstream payload
  3. Normalize it into DataHarbor’s canonical JSON model
  4. Match the correct object definition
  5. Run the ordered controls pipeline
  6. Format the result for delivery
See Data Pipeline for the full end-to-end flow.
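To make the six stages concrete, here is a minimal end-to-end sketch. Everything in it is an assumption for illustration: the in-memory key set, the injected fetch function standing in for the upstream call, JSON-only normalization, and support for only the redact control. It is not how DataHarbor implements the pipeline.

```python
import json
from typing import Any, Callable, Dict, List

VALID_KEYS = {"demo-key"}  # hypothetical stand-in for real API key validation


def select_object(path: str, objects: Dict[str, Any]) -> str:
    # Last segment first, then second-to-last; otherwise fall back to _default.
    segments = [s for s in path.split("/") if s]
    for segment in reversed(segments[-2:]):
        if segment in objects:
            return segment
    return "_default"


def redact(record: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    return {k: ("[REDACTED]" if k in fields else v) for k, v in record.items()}


def handle_request(api_key: str, path: str,
                   fetch: Callable[[str], bytes], config: Dict[str, Any]) -> str:
    # 1. Authenticate the caller (a real system would also check authorization).
    if api_key not in VALID_KEYS:
        raise PermissionError("invalid API key")
    # 2. Fetch the upstream payload (injected so the sketch runs offline).
    raw = fetch(path)
    # 3. Normalize: this sketch assumes the upstream returns JSON objects.
    data = json.loads(raw)
    records = data if isinstance(data, list) else [data]
    # 4. Match the object definition for this path.
    objects = config.get("objects", {})
    definition = objects.get(select_object(path, objects), {})
    # 5. Run the ordered controls (only redact is handled here).
    for control in definition.get("controls", []):
        if control["type"] == "redact":
            records = [redact(r, control["fields"]) for r in records]
    # 6. Format the result for delivery as a JSON string.
    return json.dumps(records if isinstance(data, list) else records[0])


def fake_upstream(path: str) -> bytes:
    # Hypothetical upstream response used for the demo.
    return b'{"id": "p-17", "ownerName": "Ada Lovelace"}'


config = {"objects": {"properties": {"controls": [{"type": "redact", "fields": ["ownerName"]}]}}}
print(handle_request("demo-key", "/properties/p-17", fake_upstream, config))
# {"id": "p-17", "ownerName": "[REDACTED]"}
```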

Control Set

A Control Set is the complete declarative configuration of a Virtual API — source, controls, delivery, and access rules combined.

Enrollment

Enrollment is the process of connecting an existing data source to DataHarbor. DataHarbor currently supports REST APIs that return JSON, CSV, YAML, or Markdown payloads. Support for data lakes as sources is coming soon.
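Normalizing an enrolled source's payload into JSON (step 3 of the pipeline) can be pictured with Python's standard library. This sketch covers only JSON and CSV; the record shape it produces (a list of objects) is an assumption, not DataHarbor's canonical model, and YAML and Markdown handling is omitted because it needs format-specific parsing.

```python
import csv
import io
import json
from typing import Any, List


def normalize(payload: bytes, content_type: str) -> List[Any]:
    """Convert an upstream payload into a list of JSON-compatible records (sketch)."""
    # Drop parameters such as "; charset=utf-8" from the content type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    text = payload.decode("utf-8")
    if media_type == "application/json":
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if media_type == "text/csv":
        # Each row becomes an object keyed by the header row; values stay strings.
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"content type not handled in this sketch: {media_type}")


csv_payload = b"id,city\np-1,Oslo\np-2,Lima\n"
print(json.dumps(normalize(csv_payload, "text/csv")))
```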

Delivery

Delivery is how consumers access a Virtual API:
  • REST API — Standard HTTP endpoint with API key auth
  • MCP Server — Model Context Protocol endpoint for AI agents
  • Data Lake — Scheduled sync to Fabric, Databricks, Snowflake, BigQuery

Governance

Every Virtual API includes governance primitives:
  • Versioning — Track what changed, when, under which policy
  • Expiration — Set end dates for access
  • Revocation — Cut off access instantly
  • Audit trail — Correlate access with policy versions
  • Organization Authorizations — Authorize partner organizations by org ID for relay access to a Virtual API. The partner's existing marketplace key works automatically, and usage is billed against the partner's quota.
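The expiration, revocation, and organization-authorization rules can be combined into a single gate evaluated before each request. The sketch below is hypothetical: VirtualApiAccess, its fields (including the owner_org concept), and check_access are illustrative names, not DataHarbor's API. Returning the policy version mirrors the idea of correlating each access with a policy version in the audit trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Set


@dataclass
class VirtualApiAccess:
    # Hypothetical governance settings for one Virtual API.
    owner_org: str
    policy_version: str
    expires_at: Optional[datetime] = None
    revoked: bool = False
    authorized_orgs: Set[str] = field(default_factory=set)


def check_access(api: VirtualApiAccess, caller_org: str, now: datetime) -> str:
    """Return the policy version to log in the audit trail, or raise if access is denied."""
    if api.revoked:
        raise PermissionError("access has been revoked")
    if api.expires_at is not None and now >= api.expires_at:
        raise PermissionError("access has expired")
    if caller_org != api.owner_org and caller_org not in api.authorized_orgs:
        raise PermissionError(f"organization {caller_org} is not authorized")
    return api.policy_version


api = VirtualApiAccess(
    owner_org="org-owner",
    policy_version="v3",
    expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
    authorized_orgs={"org-partner"},
)
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(check_access(api, "org-partner", now))  # v3
```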