
Introduction

Welcome to the Tree Schema API!

The Tree Schema API gives you programmatic access to just about every resource within Tree Schema. It is designed to help you keep your data catalog up to date by integrating Tree Schema directly into your ETL jobs, model pipelines and analytical workflows.

Code examples are provided in Shell and Python (with more on the way!), but all of the interfaces are built around REST, so you can interact with Tree Schema from any language that can make HTTP requests.

All API requests are made to the following host:

https://api.treeschema.com/catalog

Make sure to properly authenticate!

API Overview

Authentication

Use base64 encoding to create the authentication string for your account. You will need to concatenate your email, a colon and your secret key before base64 encoding the full string. Add the Authorization key to your headers with the Basic prefix.

SECRET_KEY=your_secret_key
TREE_SCHEMA_EMAIL=your_email
# -A keeps the base64 output on a single line, even for long credentials
ENCODED_SECRET=$(echo -n "$TREE_SCHEMA_EMAIL:$SECRET_KEY" | openssl base64 -A)

curl -H "Authorization: Basic $ENCODED_SECRET" \
"https://api.treeschema.com/catalog/search?term=dev"

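import base64
import requests as r

# Placeholders: replace with your Tree Schema email and secret key
your_email = 'your_email'
your_secret_key = 'your_secret_key'

creds = (your_email + ':' + your_secret_key).encode('utf-8')
encoded_creds = base64.b64encode(creds).decode('utf-8')

headers = {
    'Authorization': 'Basic ' + encoded_creds
}

# Replace ... with the URL of the endpoint you are calling
resp = r.get(..., headers=headers)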

Authorization is done using a combination of the email for your Tree Schema account and your user secret key. Your organization owner must first enable programmatic access for your organization; once that is done, you can find your personal secret key in your user profile.

Tree Schema expects your encoded credentials to be included in every API request in an Authorization header that looks like the following:

Authorization: Basic your_encoded_secret

You can view detailed instructions on how to generate your API keys in our help and documentation.

Pagination

An example of a meta response object with a next page

{
  "meta": {
    "current_page": 2,
    "next_page": 3,
    "total_cnt": 123
  },
  ...
}

An example of a meta response object without a next page

{
  "meta": {
    "current_page": 1,
    "next_page": null,
    "total_cnt": 5
  },
  ...
}

When you retrieve a list of objects with a GET request, Tree Schema paginates the results.

Each paginated response returns up to 1000 results.

Meta Response Object

Field Data Type Description
current_page integer The number for the current page
next_page integer The number of the next page, or null if there is no next page
total_cnt integer The total number of objects available for the request, across all pages

Meta information is returned for every paginated query. The meta object contains the current page number, the next page number and the total count of objects.
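A minimal sketch of walking every page, assuming you keep requesting pages until meta.next_page is null. The helper name iter_paginated and the injected fetch_page callable are illustrative, not part of the Tree Schema API; fetch_page(page) should return the decoded JSON body for that page.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_paginated(
    fetch_page: Callable[[int], Dict[str, Any]], key: str
) -> Iterator[Dict[str, Any]]:
    """Yield every object from a paginated list response (hypothetical helper).

    fetch_page(page) returns the decoded JSON body for one page and
    key names the list field in that body, e.g. 'data_stores'.
    """
    page: Optional[int] = 1
    while page is not None:
        body = fetch_page(page)
        yield from body[key]
        # next_page is null (None in Python) on the last page
        page = body['meta']['next_page']
```

With requests, fetch_page could be `lambda page: r.get(BASE_URL + '/data-stores', params={'page': page}, headers=headers).json()`.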

Additional Headers

HTTP headers:

{ "Content-Type": "application/json" }

Every POST, PUT and DELETE request sent to the Tree Schema Public API must set the Content-Type header to application/json.
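The authentication and Content-Type rules can be combined in one small helper. This is a sketch: build_headers is a hypothetical name, and file uploads such as the dbt manifest upload need different headers that it does not cover.

```python
import base64


def build_headers(email: str, secret_key: str, json_body: bool = False) -> dict:
    """Return Tree Schema request headers (hypothetical helper).

    Set json_body=True for POST, PUT and DELETE requests, which must
    send Content-Type: application/json.
    """
    # Basic auth: base64 of "email:secret_key"
    token = base64.b64encode(f'{email}:{secret_key}'.encode('utf-8')).decode('ascii')
    headers = {'Authorization': 'Basic ' + token}
    if json_body:
        headers['Content-Type'] = 'application/json'
    return headers
```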

Data Stores

Data stores are containers for your data; they can be databases, file stores, dashboard tools and more. They are where your data physically (or virtually) resides. You can create and retrieve data stores.

Data Store Object

The data store object

{
  "data_store_id": 18,
  "name": "Kafka Prod Cluster",
  "type": "kafka",
  "other_type": null,
  "created_ts": "2020-09-23 18:16:16",
  "updated_ts": "2020-09-23 18:16:16",
  "description_markup": "<p>This is the Kafka cluster.</p>",
  "description_raw": "This is the Kafka cluster.",
  "steward": {
    "user_id": 1,
    "name": "Grant",
    "email": "grant@treeschema.com"
  },
  "tech_poc": {
    "user_id": 2,
    "name": "Asher",
    "email": "asher@treeschema.com"
  },
  "details": {
    "bootstrap_servers": "1.3.5.7:22"
  }
}

The Data Store object is returned when you GET one or more data stores and when you create a data store. An example of the data store object is shown above.

Data Store Object Fields

Field Data Type Description
data_store_id integer The ID that uniquely identifies the data store. The same ID appears in the Tree Schema GUI, where the data store's URL contains it.
name string The name of the data store
type string The type of the data store
other_type string The more detailed type, if provided
created_ts timestamp The timestamp when the data store was created
updated_ts timestamp The timestamp when the data store was last updated
description_markup string An HTML string that represents the full markup description
description_raw string The data store description with all markup removed
steward User Object The data steward assigned to the data store
tech_poc User Object The technical point of contact assigned to the data store
details object An object that can contain arbitrary key/value pairs for the data store. Details include information such as the host and port when the data store is connected to a database; any key/value pairs that users add are returned as well.

Get All Data Stores

import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores'

headers = {'Authorization': 'Basic your_encoded_secret'}
resp = r.get(url, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
"$BASE_URL/data-stores"

Retrieve all data stores in your organization.

Returns the object:

{
  "meta": {
    "current_page": 1,
    "next_page": null,
    "total_cnt": 5
  },
  "data_stores": [
    {
      "data_store_id": 18,
      "name": "Kafka Prod Cluster",
      "type": "kafka",
      "other_type": null,
      "created_ts": "2020-09-23 18:16:16",
      "updated_ts": "2020-09-23 18:16:16",
      "description_markup": "<p>This is the Kafka cluster.</p>",
      "description_raw": "This is the Kafka cluster.",
      "steward": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      },
      "tech_poc": {
        "user_id": 2,
        "name": "Asher",
        "email": "asher@treeschema.com"
      },
      "details": {
        "bootstrap_servers": "1.3.5.7:22"
      }
    }
  ]
}

HTTPS Request

GET /data-stores

Query Parameters

Parameter Default Description
page 1 The page to retrieve when paginating through data stores
name null The name of the data store
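The query parameters can be URL-encoded with the standard library. A sketch: list_url is a hypothetical helper, and it assumes name is sent exactly as given. The same approach works for GET /data-stores/{data_store_id}/schemas, which accepts the same page and name parameters.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = 'https://api.treeschema.com/catalog'


def list_url(path: str, page: int = 1, name: Optional[str] = None) -> str:
    """Build a list-endpoint URL with the page and optional name parameters."""
    params: dict = {'page': page}
    if name is not None:
        params['name'] = name
    # urlencode escapes spaces and special characters in the name
    return f'{BASE_URL}{path}?{urlencode(params)}'
```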

Path Parameters

There are no path parameters for this endpoint.

Body

There is no body for this endpoint.

Response

Field Data Type Description
meta Meta object A meta object for pagination
data_stores list[Data Store Object] A list of data store objects

Response Codes

Value Description
200 Retrieved all data stores

Get A Data Store

import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1'

headers = {'Authorization': 'Basic your_encoded_secret'}
resp = r.get(url, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
"$BASE_URL/data-stores/1"

Retrieve a specific data store from your organization.

Returns the object:

{
  "data_store": {
    "data_store_id": 1,
    "name": "Oracle DB",
    "type": "oracle",
    "other_type": "",
    "created_ts": "2020-08-15 17:15:24",
    "updated_ts": "2020-08-15 17:15:24",
    "description_markup": null,
    "description_raw": null,
    "steward": {
      "user_id": 2,
      "name": "Asher",
      "email": "asher@treeschema.com"
    },
    "tech_poc": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    },
    "details": {
      "host": "oracle.host",
      "port": 1521,
      "servicename": "dbschema"
    }
  }
}

HTTPS Request

GET /data-stores/{data_store_id}

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store to retrieve.

Body

There is no body for this endpoint.

Response

Field Data Type Description
data_store Data Store Object A data store object

Response Codes

Value Description
200 Successfully retrieved data store
404 The data store ID requested could not be found

Create A Data Store

To create a data store

import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores'

new_ds_data = {
    'name': "My API Data Store",
    'type': 'postgres',
    'tech_poc': 2,
    'description': 'This data store was created via an API'
}
resp = r.post(url, json=new_ds_data, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'

curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"name": "My API Data Store - From Shell", "type": "other", "other_type": "some other value", "tech_poc": 2, "description": "This data store was created via an API"}' \
$BASE_URL/data-stores

Create a new data store. If a data store with the same name already exists, the existing data store is returned.

Returns the object:

{
  "data_store": {
    "data_store_id": 20,
    "name": "My API Data Store",
    "type": "postgres",
    "other_type": null,
    "created_ts": "2020-09-29 14:51:14",
    "updated_ts": "2020-09-29 14:51:14",
    "description_markup": "<p>This data store was created via an API</p>",
    "description_raw": "This data store was created via an API",
    "steward": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    },
    "tech_poc": {
      "user_id": 2,
      "name": "Asher",
      "email": "asher@treeschema.com"
    },
    "details": {}
  }
}

HTTPS Request

POST /data-stores

Query Parameters

There are no query parameters for this endpoint

Path Parameters

There are no path parameters for this endpoint.

Body

Field Required Description
name Yes The name of the data store
type Yes The type of data store, must be one of: dynamodb, kafka, mongodb, mysql, oracle, other, postgres, redis, redshift or s3
other_type No A more descriptive type that augments the type field when the value other is chosen
description No The description to give the data store
tech_poc No The ID for the user to assign as the technical point of contact for this data store, if no value is provided the user executing the API will be used
steward No The ID for the user to assign as the steward for this data store, if no value is provided the user executing the API will be used
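Before sending, you could check the body against the rules in the table above. A sketch: new_data_store_body is a hypothetical helper, and fields left as None are omitted so the API can apply its documented defaults (the calling user as tech_poc and steward).

```python
from typing import Optional

# Allowed values for "type", copied from the table above
DATA_STORE_TYPES = {
    'dynamodb', 'kafka', 'mongodb', 'mysql', 'oracle',
    'other', 'postgres', 'redis', 'redshift', 's3',
}


def new_data_store_body(
    name: str,
    store_type: str,
    other_type: Optional[str] = None,
    description: Optional[str] = None,
    tech_poc: Optional[int] = None,
    steward: Optional[int] = None,
) -> dict:
    """Build a POST /data-stores body, validating the required fields."""
    if not name:
        raise ValueError('name is required')
    if store_type not in DATA_STORE_TYPES:
        raise ValueError(f'unsupported data store type: {store_type!r}')
    body = {'name': name, 'type': store_type}
    optional = {
        'other_type': other_type,
        'description': description,
        'tech_poc': tech_poc,
        'steward': steward,
    }
    # Skip unset optional fields so the server-side defaults apply
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

When you send the body, a 200 response means a data store with that name already existed and was returned, while 201 means a new data store was created.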

Response

Field Data Type Description
data_store Data Store Object A data store object

Response Codes

Value Description
200 A data store with the same name already exists
201 Data Store Created
400 A malformed request was made, descriptions of the error will be provided in the body

Get Tags for a Data Store

To get existing tags from a data store

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/20/tags'

resp = r.get(url, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
"$BASE_URL/data-stores/19/tags"

Get the existing tags for a data store.

Returns the object:

{
  "tags": [
    "api tag",
    "schema tag",
    "pii",
    "mktg"
  ]
}

HTTPS Request

GET /data-stores/{data_store_id}/tags

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
data_store_id The ID for the data store to get the tag(s) for

Body

There is no body object for this endpoint

Response

Field Data Type Description
tags List[string] The list of tags for the data store

Response Codes

Value Description
200 The list of tags was retrieved successfully
400 A malformed request was made, descriptions of the error will be provided in the body

Tag A Data Store

To tag a data store

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/20/tags'

tags = {'tags': ['api tag', 'schema tag', 'pii', 'mktg']}

resp = r.post(url, json=tags, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'

curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "schema tag", "pii", "mktg"]}' \
"$BASE_URL/data-stores/19/tags"

Add one or more tags to a data store.

Returns the object:

{
  "tags": [
    "api tag",
    "schema tag",
    "pii",
    "mktg"
  ],
  "tag_statuses": [
    "added",
    "added",
    "added",
    "added"
  ]
}

HTTPS Request

POST /data-stores/{data_store_id}/tags

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
data_store_id The ID for the data store to add the tag(s) to

Body

Field Required Description
tags Yes A list of string values to add as tags; each tag can be up to 32 characters

Response

Field Data Type Description
tags List[string] The list of tags that were processed
tag_statuses List[string] The status for each tag processed, statuses match the same index position as their corresponding tag. Values include added and exists.

Response Codes

Value Description
200 All of the tags requested already existed for the data store
201 At least one of the tags requested was added
400 A malformed request was made, descriptions of the error will be provided in the body
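Because tag_statuses lines up index by index with tags, the two lists can be zipped to see which tags were new. A sketch with hypothetical helper names; the 32-character check mirrors the documented limit and runs client-side only as a convenience. Tagging a schema returns the same response shape.

```python
from typing import Dict, List

MAX_TAG_LENGTH = 32  # documented per-tag limit


def tag_body(tags: List[str]) -> Dict[str, List[str]]:
    """Build a tag request body, rejecting tags over the documented length."""
    too_long = [t for t in tags if len(t) > MAX_TAG_LENGTH]
    if too_long:
        raise ValueError(f'tags longer than {MAX_TAG_LENGTH} characters: {too_long}')
    return {'tags': list(tags)}


def added_tags(response: dict) -> List[str]:
    """Return the tags whose status is 'added' (the rest already existed)."""
    return [
        tag
        for tag, status in zip(response['tags'], response['tag_statuses'])
        if status == 'added'
    ]
```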

Remove Tags from a Data Store

To remove one or more tags from a data store

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/tags'
tags = {'tags': ['api tag', 'mktg']}

resp = r.delete(url, json=tags, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'

curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "mktg"]}' \
$BASE_URL/data-stores/1/tags

Remove one or more tags from a data store.

Returns the object:

{
  "removed_tags": [
    "api tag",
    "mktg"
  ]
}

HTTPS Request

DELETE /data-stores/{data_store_id}/tags

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
data_store_id The ID for the data store to remove the tag(s) from

Body

Field Required Description
tags Yes A list of the string tags to remove from the data store

Response

Field Data Type Description
removed_tags List[string] The list of tags that were removed

Response Codes

Value Description
200 The tags were removed successfully
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store could not be found, descriptions of the error will be provided in the body
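To check which requested tags were actually removed, compare the request with removed_tags. A sketch: tags_not_removed is a hypothetical helper, and it assumes that a requested tag missing from removed_tags was not attached to the data store, which this page does not state explicitly.

```python
from typing import Iterable, List


def tags_not_removed(requested: Iterable[str], response: dict) -> List[str]:
    """Return requested tags that do not appear in the response's removed_tags."""
    removed = set(response['removed_tags'])
    return [tag for tag in requested if tag not in removed]
```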

Upload a dbt Manifest File

To upload a manifest file

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/46/dbt/parse-manifest'

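file_loc = "./sample_files/manifest.json"

# Note the additional 'Accept' header for the file upload!
# requests sets the multipart/form-data Content-Type (with its boundary) automatically
headers = {
    'Accept': 'application/octet-stream',
    'Authorization': 'Basic ' + encoded_creds
}

# Open the file in a context manager so it is closed after the upload
with open(file_loc, 'rb') as manifest:
    files = {'manifest_file': manifest}
    resp = r.post(url, files=files, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'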

curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Accept: application/octet-stream" \
-H "Content-Type: multipart/form-data" \
-F 'manifest_file=@./target/manifest.json' \
"$BASE_URL/data-stores/46/dbt/parse-manifest"

You can upload a dbt manifest file to a data store so that Tree Schema can automatically extract the schemas, fields, descriptions, tags and lineage from your dbt output. Saving the results in Tree Schema takes two steps: this step parses the dbt file, and the second step saves the results. Optionally, you can view the parsed results before saving.

Learn more about the dbt processing in the Tree Schema help documentation.

Returns the object:

{
  "dbt_process_id": "b4000641-eed7-4345-9ba0-7701f77ce568"
}

HTTPS Request

POST /data-stores/{data_store_id}/dbt/parse-manifest

Headers

This endpoint requires different header parameters than the other APIs since it uploads a file.

Header Description
Authorization The standard authorization header, as defined above
Accept Set to application/octet-stream
Content-Type Set to multipart/form-data

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
data_store_id The ID for the data store that the dbt manifest belongs to. dbt processes run within the context of a database and this should correspond to the same database that you have already defined in Tree Schema.

Body

Field Required Description
manifest_file Yes A file object of the dbt manifest.json

Response

Field Data Type Description
dbt_process_id string A unique ID for the process that will parse the dbt file. This will be used in subsequent calls to retrieve the status and to persist the results from the parsed file.

Response Codes

Value Description
201 The request created a new process to parse the file
400 A malformed request was made, descriptions of the error will be provided in the body

Data Schemas

Schemas are the heart and soul of a Data Catalog. They describe the shape, structure and format of the data. Data schemas are typically represented as a table, a JSON or Parquet file, or a CSV, but a Data Schema is really just a reference to a structured set of fields.

All schemas reside within a data store; therefore, to interact with a data schema you must know the data store it belongs to.
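Since every schema endpoint is nested under its data store, a small URL builder keeps the IDs in the right order. A sketch: schema_url is a hypothetical helper that only covers the schema paths documented in this section.

```python
from typing import Optional

BASE_URL = 'https://api.treeschema.com/catalog'


def schema_url(data_store_id: int, data_schema_id: Optional[int] = None, tags: bool = False) -> str:
    """Build a schema endpoint URL nested under its data store."""
    url = f'{BASE_URL}/data-stores/{data_store_id}/schemas'
    if data_schema_id is not None:
        url += f'/{data_schema_id}'
        if tags:
            url += '/tags'
    elif tags:
        # Tag endpoints always refer to one specific schema
        raise ValueError('tags require a data_schema_id')
    return url
```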

Data Schema Object

The data schema object

{
  "data_schema_id": 16,
  "name": "My API Schema",
  "type": "table",
  "schema_loc": null,
  "created_ts": "2020-09-23 14:56:02",
  "updated_ts": "2020-09-23 14:56:02",
  "description_markup": null,
  "description_raw": null,
  "steward": {
    "user_id": 1,
    "name": "Grant",
    "email": "grant@treeschema.com"
  },
  "tech_poc": {
    "user_id": 1,
    "name": "Grant",
    "email": "grant@treeschema.com"
  }
}

The Data Schema object is returned when you GET one or more data schemas from a data store and when you create a new data schema. An example of the data schema object is shown above.

Data Schema Object Fields

Field Data Type Description
data_schema_id integer The ID that uniquely identifies the data schema. The same ID appears in the Tree Schema GUI, where the data schema's URL contains it.
name string The name of the data schema
type string The type of the data schema
schema_loc string The location where the schema resides. This is used primarily for object data stores, such as s3, where the schema location is the path to the directory where the schema exists. For most schemas, the schema_loc will be the same as the name.
created_ts timestamp The timestamp when the data schema was created
updated_ts timestamp The timestamp when the data schema was last updated
description_markup string An HTML string that represents the full markup description
description_raw string The data schema description with all markup removed
steward User Object The data steward assigned to the data schema
tech_poc User Object The technical point of contact assigned to the data schema

Get All Schemas from Data Store

To get all schemas for a data store

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas'

resp = r.get(url, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
"$BASE_URL/data-stores/1/schemas"

List all schemas for a data store.

Returns the object:

{
  "meta": {
    "current_page": 1,
    "next_page": null,
    "total_cnt": 4
  },
  "data_schemas": [
    {
      "data_schema_id": 16,
      "name": "public.session_info",
      "type": "table",
      "schema_loc": "public.session_info",
      "created_ts": "2020-09-23 14:56:02",
      "updated_ts": "2020-09-23 14:56:02",
      "description_markup": null,
      "description_raw": null,
      "steward": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      },
      "tech_poc": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      }
    },
    {
      "data_schema_id": 7,
      "name": "public.device_info",
      "type": "table",
      "schema_loc": "public.device_info",
      "created_ts": "2020-08-15 22:10:17",
      "updated_ts": "2020-08-15 22:10:17",
      "description_markup": null,
      "description_raw": null,
      "steward": {
        "user_id": 2,
        "name": "Asher",
        "email": "asher@treeschema.com"
      },
      "tech_poc": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      }
    }
  ]
}

HTTPS Request

GET /data-stores/{data_store_id}/schemas

Query Parameters

Parameter Default Description
page 1 The page to retrieve when paginating through data schemas
name null The name of the data schema

Path Parameters

Parameter Description
data_store_id The ID for the data store that you are listing schemas for

Body

There is no body for this endpoint.

Response

Field Data Type Description
meta Meta object A meta object for pagination
data_schemas list[Data Schema Object] A list of data schema objects

Response Codes

Value Description
200 Retrieved all data schemas for the data store
404 The data store ID requested could not be found

Get a Schema

To get a single schema from a data store

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1'

resp = r.get(url, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
"$BASE_URL/data-stores/1/schemas/1"

Get a single schema from a data store.

Returns the object:

{
  "data_schema": {
    "data_schema_id": 1,
    "name": "public.session_info",
    "type": "table",
    "schema_loc": "public.session_info",
    "created_ts": "2020-08-15 17:16:10",
    "updated_ts": "2020-08-15 17:16:10",
    "description_markup": null,
    "description_raw": null,
    "steward": {
      "user_id": 1,
      "name": "Asher",
      "email": "asher@treeschema.com"
    },
    "tech_poc": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    }
  }
}

HTTPS Request

GET /data-stores/{data_store_id}/schemas/{data_schema_id}

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema
data_schema_id The ID for the data schema that exists within the data store

Body

There is no body for this endpoint.

Response

Field Data Type Description
data_schema Data Schema Object A data schema object

Response Codes

Value Description
200 Retrieved the data schema from the data store
404 The data store ID requested could not be found or the schema requested does not exist within the data store

Create a Schema

To create a schema in a data store

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas'
new_schema = {
    'name': "My API Schema",
    'type': 'table',
    'description': 'This schema is created via API'
}

resp = r.post(url, json=new_schema, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'

curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"name": "My API Schema - Shell", "type": "table", "description": "This schema is created via API"}' \
$BASE_URL/data-stores/1/schemas

Create a data schema. Because a schema must reside within a data store, the data store that will contain the schema must be specified in the URL path. If a schema with the same name (case insensitive) already exists within the data store, the existing schema is returned and no updates are made.

Returns the object:

{
  "data_schema": {
    "data_schema_id": 501,
    "name": "My New API Schema",
    "type": "table",
    "schema_loc": "My New API Schema",
    "created_ts": "2020-09-29 16:07:16",
    "updated_ts": "2020-09-29 16:07:16",
    "description_markup": "<p>This schema is created via API</p>",
    "description_raw": "This schema is created via API",
    "steward": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    },
    "tech_poc": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    }
  }
}

HTTPS Request

POST /data-stores/{data_store_id}/schemas

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store that will contain the schema being created

Body

Field Required Description
name Yes The name of the data schema
type Yes The type of data schema, must be one of: avro, csv, csv_other, json, parquet, other, table, view or tsv
description No The description to give the schema
schema_loc No The location where the schema resides. This is used primarily for object data stores, such as s3, where the schema location is the path to the directory where the schema exists. For most schemas, the schema_loc will be the same as the name. If a schema_loc is not provided, it is set to the value provided for name
tech_poc No The ID for the user to assign as the technical point of contact for this data schema, if no value is provided the user executing the API will be used
steward No The ID for the user to assign as the steward for this data schema, if no value is provided the user executing the API will be used

Response

Field Data Type Description
data_schema Data Schema Object A data schema object

Response Codes

Value Description
409 A data schema with the same name already exists and was returned instead of creating a new object
201 Data Schema Created
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store ID requested could not be found
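Because schema names are matched case-insensitively within a data store, you can predict whether a create call will return an existing schema by comparing names the same way. A sketch: find_schema_by_name is hypothetical, and it uses Python's str.casefold, which may differ from the server's exact comparison for some non-ASCII names.

```python
from typing import Iterable, Optional


def find_schema_by_name(schemas: Iterable[dict], name: str) -> Optional[dict]:
    """Return the first schema whose name matches, ignoring case, else None."""
    target = name.casefold()
    for schema in schemas:
        if schema['name'].casefold() == target:
            return schema
    return None
```

Here schemas could be the data_schemas list returned by GET /data-stores/{data_store_id}/schemas.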

Update A Schema

To update a single schema in a data store

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1'
updates = {
    'description': "A new description",
    'type': 'parquet',
    'tech_poc': '1',
    'steward': 2
}

resp = r.post(url, json=updates, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'

curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"description": "New Schema description", "type": "view", "tech_poc": "1", "steward": 2}' \
$BASE_URL/data-stores/1/schemas/1

Update a single schema in a data store. You can update the description, schema type, tech POC and steward.

Returns the updated object:

{
  "data_schema": {
    "data_schema_id": 1,
    "name": "DS1",
    "type": "view",
    "schema_loc": "DS1",
    "created_ts": "2021-01-29 14:39:10",
    "updated_ts": "2021-02-01 12:37:43",
    "description_markup": "<p>New Schema description</p>",
    "description_raw": "New Schema description",
    "data_store_id": 1,
    "steward": {
      "user_id": 2,
      "name": "Grant",
      "email": "grant@treeschema.com"
    },
    "tech_poc": {
      "user_id": 1,
      "name": "Asher",
      "email": "asher@treeschema.com"
    }
  }
}

HTTPS Request

POST /data-stores/{data_store_id}/schemas/{data_schema_id}

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema in the path
data_schema_id The ID for the data schema to update

Body

One or more of the following fields must be provided.

Field Required Description
type No The type of the schema; valid values are avro, csv, csv_other, json, parquet, other, table, view or tsv
description No The new description for the schema; this will override any existing description
tech_poc No The ID for the user to assign as the technical point of contact for this data schema, if no value is provided the user executing the API will be used
steward No The ID for the user to assign as the steward for this data schema, if no value is provided the user executing the API will be used
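Since at least one field is required, a helper can drop unset fields and refuse an empty update. A sketch: schema_update_body is hypothetical, and the allowed types are copied from the table above.

```python
from typing import Optional

SCHEMA_TYPES = {'avro', 'csv', 'csv_other', 'json', 'parquet', 'other', 'table', 'view', 'tsv'}


def schema_update_body(
    schema_type: Optional[str] = None,
    description: Optional[str] = None,
    tech_poc: Optional[int] = None,
    steward: Optional[int] = None,
) -> dict:
    """Build an update body for a schema containing only the fields that are set."""
    fields = {'type': schema_type, 'description': description,
              'tech_poc': tech_poc, 'steward': steward}
    body = {k: v for k, v in fields.items() if v is not None}
    if not body:
        raise ValueError('provide at least one of type, description, tech_poc or steward')
    if 'type' in body and body['type'] not in SCHEMA_TYPES:
        raise ValueError(f"unsupported schema type: {body['type']!r}")
    return body
```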

Response

Field Data Type Description
data_schema Data Schema Object A data schema object

Response Codes

Value Description
200 The data schema was updated successfully
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store ID requested could not be found or the data schema does not exist within the data store

Delete Schemas

To delete multiple schemas from a data store

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas'
delete_schemas = {'schema_ids': [501, 502]}

resp = r.delete(url, json=delete_schemas, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'

curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"schema_ids": [8, 9]}' \
$BASE_URL/data-stores/1/schemas

There is no response in the body for this request

Deprecates data schemas that exist within a data store. To be deleted, each schema ID must belong to the data store specified in the path parameters. If some of the schema IDs provided exist within the data store and others do not, only those that exist within the data store are deleted.

HTTPS Request

DELETE /data-stores/{data_store_id}/schemas

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schemas being deleted

Body

Field Required Description
schema_ids Yes A list of integer IDs for the schemas to be deleted

Response

There is no response body for this endpoint.

Response Codes

Value Description
200 The schemas provided were deleted from the data store
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store ID requested could not be found
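If you are not using requests, the same DELETE call with a JSON body can be built with Python's standard library. A sketch: delete_schemas_request is hypothetical, and headers is assumed to already contain your Authorization header.

```python
import json
import urllib.request
from typing import Dict, Iterable

BASE_URL = 'https://api.treeschema.com/catalog'


def delete_schemas_request(
    data_store_id: int, schema_ids: Iterable[int], headers: Dict[str, str]
) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for several schemas."""
    body = json.dumps({'schema_ids': list(schema_ids)}).encode('utf-8')
    return urllib.request.Request(
        f'{BASE_URL}/data-stores/{data_store_id}/schemas',
        data=body,
        headers={**headers, 'Content-Type': 'application/json'},
        method='DELETE',
    )
```

Send it with urllib.request.urlopen(req); a 200 status means the schemas that belong to the data store were deleted.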

Get Tags for a Data Schema

To get existing tags from a data schema

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/20/schemas/20/tags'

resp = r.get(url, headers=headers)
resp.json()

BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
"$BASE_URL/data-stores/19/schemas/19/tags"

Get the existing tags for a data schema.

Returns the object:

{
  "tags": [
    "api tag",
    "schema tag",
    "pii",
    "mktg"
  ]
}

HTTPS Request

GET /data-stores/{data_store_id}/schemas/{data_schema_id}/tags

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema
data_schema_id The ID for the data schema to get the tag(s) for

Body

There is no body object for this endpoint

Response

Field Data Type Description
tags List[string] The list of tags for the data schema

Response Codes

Value Description
200 The list of tags was retrieved successfully
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store or schema could not be found, descriptions of the error will be provided in the body

Tag A Schema

To tag a data schema

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/tags'
tags = {'tags': ['api tag', 'schema tag', 'pii', 'mktg']}

resp = r.post(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "schema tag", "pii", "mktg"]}' \
$BASE_URL/data-stores/1/schemas/1/tags

Add one or more tags to a data schema.

Returns the object:

{
  "tags": [
    "api tag",
    "schema tag",
    "pii",
    "mktg"
  ],
  "tag_statuses": [
    "added",
    "added",
    "added",
    "added"
  ]
}

HTTPs Request

POST /data-stores/{data_store_id}/schemas/{data_schema_id}/tags

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema to add tags to
data_schema_id The ID for the data schema to have the tags added to

Body

Field Required Description
tags Yes A list of string values (List[string]) to add as tags; each tag can be up to 32 characters

Response

Field Data Type Description
tags List[string] The list of tags that were processed
tag_statuses List[string] The status for each tag processed, statuses match the same index position as their corresponding tag

Response Codes

Value Description
200 All of the tags requested already existed for the data schema
201 At least one of the tags requested was added
400 A malformed request was made, descriptions of the error will be provided in the body
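Because `tag_statuses` lines up with `tags` by index position, the two lists can be zipped into a single lookup. The sketch below does this and also checks the documented 32-character limit on the client before a request is sent. Both helper names are hypothetical. Only the `added` status appears in this documentation, so the sketch passes status strings through without interpreting them.

```python
MAX_TAG_LENGTH = 32  # limit stated in the Body table for this endpoint


def validate_tags(tags):
    """Return tags unchanged, raising ValueError for any tag over 32 characters."""
    too_long = [t for t in tags if len(t) > MAX_TAG_LENGTH]
    if too_long:
        raise ValueError(f'tags longer than {MAX_TAG_LENGTH} characters: {too_long}')
    return tags


def tag_status_map(resp_json):
    """Pair each processed tag with its status using their shared index."""
    return dict(zip(resp_json['tags'], resp_json['tag_statuses']))


example = {'tags': ['api tag', 'pii'], 'tag_statuses': ['added', 'added']}
statuses = tag_status_map(example)
```

A 201 status means at least one tag was new. A 200 status means every tag already existed.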

Remove Tags from a Schema

To remove one or more tags from a schema

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/2/tags'
tags = {'tags': ['api tag', 'mktg']}

resp = r.delete(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "mktg"]}' \
$BASE_URL/data-stores/1/schemas/2/tags

Remove one or more tags from a schema.

Returns the object:

{
  "removed_tags": [
    "api tag",
    "mktg"
  ]
}

HTTPs Request

DELETE /data-stores/{data_store_id}/schemas/{data_schema_id}/tags

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema to remove tags from
data_schema_id The ID for the data schema to have the tags removed from

Body

Field Required Description
tags Yes A list of string values (List[string]) to remove from the schema's tags

Response

Field Data Type Description
removed_tags List[string] The list of tags that were removed

Response Codes

Value Description
200 The tags were removed successfully
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store or schema could not be found, descriptions of the error will be provided in the body
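The response lists only the tags that were actually removed. Comparing that list with the request shows which tags were not removed. The sketch below assumes, without confirmation from this documentation, that a requested tag is missing from `removed_tags` because it was not on the schema, and it compares tag strings exactly. The helper name is hypothetical.

```python
def tags_not_removed(requested, resp_json):
    """Return requested tags absent from removed_tags, preserving request order."""
    removed = set(resp_json.get('removed_tags', []))
    return [t for t in requested if t not in removed]


missing = tags_not_removed(
    ['api tag', 'mktg', 'old tag'],
    {'removed_tags': ['api tag', 'mktg']},
)
```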

Data Fields

Data Fields are the most granular part of your catalog and describe the format and data type of your underlying data. Whether your Fields are represented as columns in a table, keys in a JSON file, or Structs in a distributed Parquet data set, you can capture their meaning and definition with Tree Schema Fields.

All fields reside within a data schema. Therefore, to interact with a data field you must know the data schema and data store that it belongs to.
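Every field endpoint is nested under its data store and schema, so a small URL builder helps keep the three IDs consistent across calls. The `field_url` helper below is hypothetical. Its optional `suffix` covers trailing segments such as `tags` and `values` that appear in the endpoints in this section.

```python
BASE_URL = 'https://api.treeschema.com/catalog'


def field_url(data_store_id, data_schema_id, data_field_id=None, suffix=None):
    """Build a field endpoint URL from its parent data store and schema IDs.

    Hypothetical helper; suffix is an optional trailing path segment,
    for example 'tags' or 'values'.
    """
    parts = [BASE_URL, 'data-stores', str(data_store_id),
             'schemas', str(data_schema_id), 'fields']
    if data_field_id is not None:
        parts.append(str(data_field_id))
    if suffix:
        parts.append(suffix)
    return '/'.join(parts)
```

For example, `field_url(1, 2, 4, 'tags')` produces the URL used by the Tag A Field example.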

Data Fields Object

The data fields object

{
  "field_id": 1,
  "name": "FIRST_NAME",
  "parent_path": null,
  "full_path_name": "FIRST_NAME",
  "type": "scalar",
  "data_type": "string",
  "data_format": "VARCHAR2",
  "nullable": false,
  "created_ts": "2020-08-15 17:16:11",
  "updated_ts": "2020-08-15 17:16:11",
  "description_markup": null,
  "description_raw": null,
  "steward": {
    "user_id": 1,
    "name": "Grant",
    "email": "grant@treeschema.com"
  },
  "tech_poc": {
    "user_id": 2,
    "name": "Asher",
    "email": "asher@treeschema.com"
  }
}

The Data Fields object is returned when you GET a single or multiple data field(s) from a data schema. It is also returned when you create a new data field. An example of the data field object can be seen to the right.

Data Field Object Fields

Field Data Type Description
field_id integer The ID used to uniquely represent the data field; the same ID can be found in the Tree Schema GUI
name string The name of the field, for example, this would be the column name if the field is from a table or CSV or it could be a struct name if the field is from a Parquet file
parent_path string The dot-notation path for the parent of this field. This is only provided for fields that are contained within other fields; e.g. in {"parent_field": {"child_field": 1}} the parent_path of child_field is parent_field and its full_path_name is parent_field.child_field
full_path_name string This is a concatenation of the parent path and the name. If the parent path is null then this value is the same as the name
type string Valid values include scalar, object and list
data_type string A JSON compatible data type, values include array, boolean, bytes, null, number, object and string
data_format string A free-form field that describes the format of the data, this could be varchar(32), YYYY-MM-DD, float(16), etc.
nullable boolean Whether or not the field can be null
created_ts timestamp The timestamp that the field was created in Tree Schema
updated_ts timestamp The timestamp that the field was last updated in Tree Schema
description_markup string An HTML string that represents the full markup description
description_raw string The field description that has had all markup removed
steward User Object The data steward assigned to the field
tech_poc User Object The technical point of contact assigned to the field
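`name`, `parent_path` and `full_path_name` are related by dot notation. The Create A Field example in this section sends `my_field.sub_field` and receives back `name` `sub_field` with `parent_path` `my_field`. The hypothetical helper below reproduces that split on the client. It assumes that individual field names do not themselves contain dots.

```python
def split_full_path(full_path_name):
    """Split a dot-notation full_path_name into (parent_path, name).

    Top-level fields have no parent, so parent_path is None (null in JSON).
    Assumes individual field names contain no dots.
    """
    parent, sep, name = full_path_name.rpartition('.')
    return (parent if sep else None, name)
```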

Get All Fields from Schema

To get all fields for a data schema

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/data-stores/1/schemas/1/fields

List all fields for a data schema.

Returns the object:

{
  "meta": {
    "current_page": 1,
    "next_page": null,
    "total_cnt": 3
  },
  "data_fields": [
    {
      "field_id": 1,
      "name": "FIRST_NAME",
      "parent_path": null,
      "full_path_name": "FIRST_NAME",
      "type": "scalar",
      "data_type": "string",
      "data_format": "VARCHAR2",
      "nullable": false,
      "created_ts": "2020-08-15 17:16:11",
      "updated_ts": "2020-08-15 17:16:11",
      "description_markup": null,
      "description_raw": null,
      "steward": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      },
      "tech_poc": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      }
    },
    {
      "field_id": 2,
      "name": "LAST_NAME",
      "parent_path": null,
      "full_path_name": "LAST_NAME",
      "type": "scalar",
      "data_type": "string",
      "data_format": "VARCHAR2",
      "nullable": false,
      "created_ts": "2020-08-15 17:16:11",
      "updated_ts": "2020-08-15 17:16:11",
      "description_markup": null,
      "description_raw": null,
      "steward": {
        "user_id": 2,
        "name": "Asher",
        "email": "asher@treeschema.com"
      },
      "tech_poc": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      }
    }
  ]
}

HTTPs Request

GET /data-stores/{data_store_id}/schemas/{data_schema_id}/fields

Query Parameters

Parameter Default Description
page 1 The page to retrieve when paginating through data fields
name null The name of the field to retrieve

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema in the path
data_schema_id The ID for the data schema that contains the fields being requested

Body

There is no body for this endpoint.

Response

Field Data Type Description
meta Meta object A meta object for pagination
data_fields list[Data Field Object] A list of data field objects

Response Codes

Value Description
200 Retrieved all data fields for the data schema
404 The data store ID requested could not be found or the data schema does not exist within the data store
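Results are paginated through the `meta` object. The sketch below collects every field by requesting pages until `next_page` is null. It assumes that `next_page` holds the next page number to pass as the `page` query parameter. The page fetcher is injected so that the loop can be read and tested without network access. `fetch_all_fields` is a hypothetical name.

```python
def fetch_all_fields(get_page):
    """Collect data fields across every page.

    get_page(page) must return the parsed JSON for that page, e.g.
    r.get(url, params={'page': page}, headers=headers).json().
    Stops when meta.next_page is null (None in Python).
    """
    fields, page = [], 1
    while page is not None:
        body = get_page(page)
        fields.extend(body['data_fields'])
        page = body['meta']['next_page']
    return fields


# Usage with canned pages instead of live HTTP calls:
pages = {
    1: {'meta': {'current_page': 1, 'next_page': 2, 'total_cnt': 3},
        'data_fields': [{'field_id': 1}, {'field_id': 2}]},
    2: {'meta': {'current_page': 2, 'next_page': None, 'total_cnt': 3},
        'data_fields': [{'field_id': 3}]},
}
all_fields = fetch_all_fields(lambda p: pages[p])
```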

Get A Field

To get a single field from a data schema

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields/1'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/data-stores/1/schemas/1/fields/1

Get a single field from a schema.

Returns the object:

{ 
  "data_field": {
    "field_id": 1,
    "name": "FIRST_NAME",
    "parent_path": null,
    "full_path_name": "FIRST_NAME",
    "type": "scalar",
    "data_type": "string",
    "data_format": "VARCHAR2",
    "nullable": false,
    "created_ts": "2020-08-15 17:16:11",
    "updated_ts": "2020-08-15 17:16:11",
    "description_markup": null,
    "description_raw": null,
    "steward": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    },
    "tech_poc": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    }
  }
}

HTTPs Request

GET /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema in the path
data_schema_id The ID for the data schema that contains the fields being requested
data_field_id The ID for the data field being requested

Body

There is no body for this endpoint.

Response

Field Data Type Description
data_field Data Field Object A data field object

Response Codes

Value Description
200 Retrieved the data field for the schema
404 The data store ID requested could not be found or the data schema does not exist within the data store or the data field does not exist within the schema

Create A Field

To create a single field in a data schema

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields'
new_field = {
    'name': "my_field.sub_field",
    'type': 'scalar',
    'data_type': 'number',
    'data_format': 'integer(16)'
}

resp = r.put(url, json=new_field, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X PUT -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"name": "my_field.sub_field.from_shell", "type": "scalar", "data_type": "number", "data_format": "integer(16)"}' \
$BASE_URL/data-stores/1/schemas/1/fields

Create a single field in a schema. If a field with the same name (case insensitive) already exists within the schema then the existing field is returned and no updates are made.

Returns the object:

{
  "data_field": {
    "field_id": 5453,
    "name": "sub_field",
    "parent_path": "my_field",
    "full_path_name": "my_field.sub_field",
    "type": "scalar",
    "data_type": "number",
    "data_format": "integer(16)",
    "nullable": true,
    "created_ts": "2020-09-29 18:07:09",
    "updated_ts": "2020-09-29 18:07:09",
    "description_markup": null,
    "description_raw": null,
    "steward": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    },
    "tech_poc": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    }
  }
}

HTTPs Request

PUT /data-stores/{data_store_id}/schemas/{data_schema_id}/fields

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema in the path
data_schema_id The ID for the data schema that contains the fields being requested

Body

Field Required Description
name Yes The name of the field
type Yes The type of the field, valid values are scalar, list and object
data_type Yes The data type for the field, this is a representation of the field as a JSON compatible data type, must be one of array, boolean, bytes, null, number, object or string
data_format Yes A free-form field that describes the format of the data, this could be varchar(32), YYYY-MM-DD, float(16), etc.
nullable No Whether or not the field can be null, defaults to True
tech_poc No The ID for the user to assign as the technical point of contact for this data field, if no value is provided the user executing the API will be used
steward No The ID for the user to assign as the steward for this data field, if no value is provided the user executing the API will be used

Response

Field Data Type Description
data_field Data Field Object A data field object

Response Codes

Value Description
200 A data field with the same name already exists in the schema and was returned
201 The data field was created
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store ID requested could not be found or the data schema does not exist within the data store
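Checking the body against the allowed `type` and `data_type` values before sending turns a 400 response into an earlier local error. The sketch below builds a create-field body from the values listed in the Body table. The helper name `new_field_body` is hypothetical, and the allowed sets are copied from this documentation rather than fetched from the API.

```python
VALID_TYPES = {'scalar', 'list', 'object'}
VALID_DATA_TYPES = {'array', 'boolean', 'bytes', 'null', 'number', 'object', 'string'}
OPTIONAL_KEYS = {'nullable', 'tech_poc', 'steward'}


def new_field_body(name, type_, data_type, data_format, **optional):
    """Build a create-field body, checking values against the documented options.

    optional may include nullable (bool) and tech_poc / steward (user IDs).
    """
    if type_ not in VALID_TYPES:
        raise ValueError(f'invalid type: {type_}')
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(f'invalid data_type: {data_type}')
    unknown = set(optional) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f'unknown fields: {sorted(unknown)}')
    body = {'name': name, 'type': type_,
            'data_type': data_type, 'data_format': data_format}
    body.update(optional)
    return body
```

Because an existing field with the same name is returned with status 200, comparing `resp.status_code == 201` shows whether a new field was actually created.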

Update A Field

To update a single field in a data schema

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields/1'
updates = {
    'description': "A new description",
    'type': 'list',
    'data_type': 'array',
    'data_format': 'YYYY-MM-DD',
    'nullable': False,
    'tech_poc': 1,
    'steward': 2
}

resp = r.post(url, json=updates, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"description": "A new description", "type": "list", "data_type": "array", "data_format": "YYYY-MM-DD", "nullable": false, "tech_poc": 1, "steward": 2}' \
$BASE_URL/data-stores/1/schemas/1/fields/1

Update a single field in a schema. You can update any value for a field except for the name.

Returns the updated object:

{
  "data_field": {
    "field_id": 5453,
    "name": "sub_field",
    "parent_path": "my_field",
    "full_path_name": "my_field.sub_field",
    "type": "scalar",
    "data_type": "number",
    "data_format": "integer(16)",
    "nullable": true,
    "created_ts": "2020-09-29 18:07:09",
    "updated_ts": "2020-09-29 18:07:09",
    "description_markup": null,
    "description_raw": null,
    "steward": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    },
    "tech_poc": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    }
  }
}

HTTPs Request

POST /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema in the path
data_schema_id The ID for the data schema that contains the fields being requested
data_field_id The ID for the data field being requested

Body

Field Required Description
type No The type of the field, valid values are scalar, list and object
data_type No The data type for the field, this is a representation of the field as a JSON compatible data type, must be one of array, boolean, bytes, null, number, object or string
data_format No A free-form field that describes the format of the data, this could be varchar(32), YYYY-MM-DD, float(16), etc.
description No The new description for the field, this will override any existing description
nullable No Whether or not the field can be null, defaults to True
tech_poc No The ID for the user to assign as the technical point of contact for this data field, if no value is provided the user executing the API will be used
steward No The ID for the user to assign as the steward for this data field, if no value is provided the user executing the API will be used

Response

Field Data Type Description
data_field Data Field Object A data field object

Response Codes

Value Description
200 The data field was updated successfully
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store ID requested could not be found or the data schema does not exist within the data store or the data field does not exist within the schema
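Every body key is optional and `name` cannot be changed, so an update body usually carries only the keys being modified. The hypothetical helper below builds such a partial body. One design choice to note: it drops keys whose value is `None`. This makes it convenient to pass optional arguments through, but it also means the sketch cannot send an explicit null. Whether the API accepts null for these keys is not covered here.

```python
UPDATABLE = {'type', 'data_type', 'data_format', 'description',
             'nullable', 'tech_poc', 'steward'}


def field_update_body(**changes):
    """Build a partial update body containing only the keys being changed.

    Keys passed as None are dropped; 'name' cannot be updated.
    """
    if 'name' in changes:
        raise ValueError('the name of a field cannot be updated')
    unknown = set(changes) - UPDATABLE
    if unknown:
        raise ValueError(f'unknown fields: {sorted(unknown)}')
    return {k: v for k, v in changes.items() if v is not None}
```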

Delete Multiple Fields

To delete fields from a schema

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields'
delete_fields = {'field_ids': [5452, 5454]}

resp = r.delete(url, json=delete_fields, headers=headers)
resp.status_code  # 200 on success; this endpoint returns no response body
BASE_URL='https://api.treeschema.com/catalog'

curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"field_ids": [5453]}' \
$BASE_URL/data-stores/1/schemas/504/fields

There is no response in the body for this request

Deprecates data fields that exist within a data schema. In order to deprecate the fields, the field IDs must exist within the data schema specified in the path parameters. If multiple field IDs are provided and only some of them exist within the data schema, only those that exist within the data schema will be deprecated.

HTTPs Request

DELETE /data-stores/{data_store_id}/schemas/{data_schema_id}/fields

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store that will contain the field(s) being deleted
data_schema_id The ID for the data schema that will contain the field(s) being deleted

Body

Field Required Description
field_ids Yes A list of integer IDs (list[integer]) that correspond to the fields to be deleted.

Response

There is no response body for this endpoint.

Response Codes

Value Description
200 The fields provided were deleted from the data schema
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store ID requested could not be found or the data schema ID does not exist within the data store ID provided
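IDs that do not belong to the schema are silently skipped rather than rejected. To catch such IDs before deleting, compare the requested IDs with the schema's current field list. The sketch below assumes that `schema_fields` is the `data_fields` list from a prior Get All Fields from Schema call. The helper name `partition_field_ids` is hypothetical.

```python
def partition_field_ids(requested_ids, schema_fields):
    """Split requested IDs into those in the schema and those the API would skip.

    schema_fields is the data_fields list from
    GET /data-stores/{data_store_id}/schemas/{data_schema_id}/fields.
    """
    known = {f['field_id'] for f in schema_fields}
    in_schema = [i for i in requested_ids if i in known]
    skipped = [i for i in requested_ids if i not in known]
    return in_schema, skipped


in_schema, skipped = partition_field_ids(
    [5452, 5454, 9],
    [{'field_id': 5452}, {'field_id': 5454}],
)
```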

Delete A Field

To delete a field from a schema

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields/1'

resp = r.delete(url, headers=headers)
resp.status_code  # 200 on success; this endpoint returns no response body
BASE_URL='https://api.treeschema.com/catalog'

curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/data-stores/1/schemas/1/fields/1

There is no response in the body for this request

Deprecates the data field specified in the path.

HTTPs Request

DELETE /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store that will contain the field being deleted
data_schema_id The ID for the data schema that will contain the field being deleted
data_field_id The ID for the data field to be deleted

Body

There is no body for this endpoint.

Response

There is no response body for this endpoint.

Response Codes

Value Description
200 The field was deleted from the data schema
404 The data store ID requested could not be found or the data schema ID does not exist within the data store ID provided or the data field does not exist within the schema

Get Tags for a Field

To get existing tags for a field

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/20/schemas/20/fields/20/tags'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
"$BASE_URL/data-stores/19/schemas/19/fields/19/tags"

Get existing tags

Returns the object:

{
  "tags": [
    "api tag",
    "schema tag",
    "pii",
    "mktg"
  ]
}

HTTPs Request

GET /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/tags

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema
data_schema_id The ID for the data schema that contains the field
data_field_id The ID for the field whose tags are retrieved

Body

There is no body object for this endpoint

Response

Field Data Type Description
tags List[string] The list of tags for the data field

Response Codes

Value Description
200 The list of tags was retrieved successfully
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store, schema or field could not be found, descriptions of the error will be provided in the body

Tag A Field

To tag a field

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/2/fields/4/tags'
tags = {'tags': ['api tag', 'schema tag', 'pii', 'mktg']}

resp = r.post(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "schema tag", "pii", "mktg"]}' \
$BASE_URL/data-stores/1/schemas/2/fields/5/tags

Add one or more tags to a data field.

Returns the object:

{
  "tags": [
    "api tag",
    "schema tag",
    "pii",
    "mktg"
  ],
  "tag_statuses": [
    "added",
    "added",
    "added",
    "added"
  ]
}

HTTPs Request

POST /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/tags

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema
data_schema_id The ID for the data schema that contains the field
data_field_id The ID for the field to add the tags to

Body

Field Required Description
tags Yes A list of string values (List[string]) to add as tags; each tag can be up to 32 characters

Response

Field Data Type Description
tags List[string] The list of tags that were processed
tag_statuses List[string] The status for each tag processed, statuses match the same index position as their corresponding tag

Response Codes

Value Description
200 All of the tags requested already existed for the field
201 At least one of the tags requested was added
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store, schema or field could not be found, descriptions of the error will be provided in the body

Remove Tags from a Field

To remove one or more tags from a field

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/2/fields/4/tags'
tags = {'tags': ['api tag', 'mktg']}

resp = r.delete(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "mktg"]}' \
$BASE_URL/data-stores/1/schemas/2/fields/5/tags

Remove one or more tags from a field.

Returns the object:

{
  "removed_tags": [
    "api tag",
    "mktg"
  ]
}

HTTPs Request

DELETE /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/tags

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema
data_schema_id The ID for the data schema that contains the field
data_field_id The ID for the field to remove the tags from

Body

Field Required Description
tags Yes A list of string values (List[string]) to remove from the field's tags

Response

Field Data Type Description
removed_tags List[string] The list of tags that were removed

Response Codes

Value Description
200 The tags were removed successfully
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store, schema or field could not be found, descriptions of the error will be provided in the body

Field Values

Field values are just that - values for a field. For example, if your field is status_code you may have the values 01, 02, 03, etc. and each of these values has a specific meaning. Field values allow you to capture both the value and the meaning of the value.

All field values reside within a data field, therefore, in order to interact with a field value you must know the data field, data schema, and data store that it belongs to.

Field Value Object

The field value object

{
  "field_value_id": 396,
  "field_value": "01",
  "description_markup": "<p>New customer</p>",
  "description_raw": "New customer",
  "created_ts": "2020-08-15 22:10:18",
  "updated_ts": "2020-08-15 22:10:18"
}

The Field Value object is returned when you GET a single or multiple field value(s) from a data field. It is also returned when you create a new field value. An example of the field value object can be seen to the right.

Field Value Object Fields

Field Data Type Description
field_value_id integer The ID used to uniquely represent the field value
field_value string The value
description_markup string An HTML string that represents the full markup description, this can be null if no description has been provided
description_raw string The field description that has had all markup removed, this can be null if no description has been provided
created_ts timestamp The timestamp that the field value was created in Tree Schema
updated_ts timestamp The timestamp that the field value was updated in Tree Schema

Get All Values for Field

To get all values for a data field

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields/1/values'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/data-stores/1/schemas/1/fields/1/values

List all values for a data field.

Returns the object:

{
  "meta": {
    "current_page": 1,
    "next_page": null,
    "total_cnt": 4
  },
  "field_values": [
    {
      "field_value_id": 1,
      "field_value": "01",
      "description_markup": "<p>New customer</p>",
      "description_raw": "New customer",
      "created_ts": "2020-08-15 22:10:18",
      "updated_ts": "2020-08-15 22:10:18"
    },
    {
      "field_value_id": 2,
      "field_value": "02",
      "created_ts": "2020-08-15 22:10:18",
      "updated_ts": "2020-08-15 22:10:18",
      "description_markup": null,
      "description_raw": null
    }
  ]
}

HTTPs Request

GET /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/values

Query Parameters

Parameter Default Description
page 1 The page to retrieve when paginating through field values
value null A specific sample value to retrieve

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema in the path
data_schema_id The ID for the data schema that contains the fields being requested
data_field_id The ID for the data field that the values belong to

Body

There is no body for this endpoint.

Response

Field Data Type Description
meta Meta object A meta object for pagination
field_values list[Field Value Object] A list of field value objects

Response Codes

Value Description
200 Retrieved all field values for the field
404 The data store ID requested could not be found or the data schema does not exist within the data store or the field does not exist within the schema
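Field values exist to record what each value means, so a common next step is to turn the `field_values` list into a value-to-meaning lookup. The sketch below uses the plain-text `description_raw`, which is null when no description has been provided. If the results span several pages, concatenate the `field_values` lists from every page first. The helper name is hypothetical.

```python
def value_meanings(field_values):
    """Map each field value to its plain-text description (None if unset)."""
    return {v['field_value']: v.get('description_raw') for v in field_values}


meanings = value_meanings([
    {'field_value_id': 1, 'field_value': '01', 'description_raw': 'New customer'},
    {'field_value_id': 2, 'field_value': '02', 'description_raw': None},
])
```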

Get A Sample Value

To get a single value for a data field

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields/1/values/1'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/data-stores/1/schemas/1/fields/1/values/1

Get a single value for a data field.

Returns the object:

{
  "field_value": {
    "field_value_id": 1,
    "field_value": "01",
    "description_markup": "<p>New customer</p>",
    "description_raw": "New customer",
    "created_ts": "2020-08-15 22:10:18",
    "updated_ts": "2020-08-15 22:10:18"
  }
}

HTTPs Request

GET /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/values/{field_value_id}

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema in the path
data_schema_id The ID for the data schema that contains the fields being requested
data_field_id The ID for the data field that the values belong to
field_value_id The ID for the specific field value to retrieve

Body

There is no body for this endpoint.

Response

Field Data Type Description
field_value Field Value Object A field value object

Response Codes

Value Description
200 Retrieved the field value
404 The data store ID requested could not be found or the data schema does not exist within the data store or the field does not exist within the schema or the field value does not exist within the field

Create A Field Value

To create a value for a data field

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/7/fields/78/values'
new_field_value = {
  'field_value': 'a new value here', 
  'description': 'and a new description'
}

resp = r.put(url, json=new_field_value, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X PUT -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"field_value": "a second value here", "description": "and a new description"}' \
$BASE_URL/data-stores/1/schemas/7/fields/78/values

Create a new value for a field.

Returns the object:

{
  "field_value": {
    "field_value_id": 16323,
    "field_value": "a new value here",
    "created_ts": "2020-09-29 20:51:18",
    "updated_ts": "2020-09-29 20:51:18",
    "description_markup": "<p>and a new description</p>",
    "description_raw": "and a new description"
  }
}

HTTPs Request

PUT /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/values

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema in the path
data_schema_id The ID for the data schema that contains the fields being requested
data_field_id The ID for the data field that the values belong to

Body

Field Required Description
field_value Yes The sample value for the field
description No The description for the sample value; if omitted, the description is set to null

Response

Field Data Type Description
field_value Field Value Object A field value object

Response Codes

Value Description
201 Created the field value
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store ID requested could not be found or the data schema does not exist within the data store or the field does not exist within the schema
409 The field value already exists for the field provided
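Creating a value that already exists returns 409, so pipelines that re-run can treat 409 as "nothing to do" instead of a failure. The sketch below makes creation idempotent in this way. The HTTP call is injected as `put`, which is assumed to send the PUT request and return the status code, for example `lambda b: r.put(url, json=b, headers=headers).status_code`. `ensure_field_value` is a hypothetical name.

```python
def ensure_field_value(put, field_value, description=None):
    """Create a field value, treating 409 (already exists) as success.

    Returns True if the value was created and False if it already existed;
    any other status raises RuntimeError.
    """
    body = {'field_value': field_value}
    if description is not None:
        body['description'] = description
    status = put(body)
    if status == 201:
        return True
    if status == 409:
        return False
    raise RuntimeError(f'unexpected status {status} creating field value')


sent = []
created = ensure_field_value(lambda b: sent.append(b) or 201, '01', 'New customer')
```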

Update a Field Value

To update a value for a data field

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/7/fields/78/values/16324'
new_desc = {'description': 'new description goes here'}

resp = r.post(url, json=new_desc, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"field_value": "a second value here", "description": "and a new description"}' \
$BASE_URL/data-stores/1/schemas/7/fields/78/values/16324

Update a value for a field.

Returns the object:

{
  "field_value": {
    "field_value_id": 16323,
    "field_value": "a new value here",
    "created_ts": "2020-09-29 20:51:18",
    "updated_ts": "2020-09-29 20:51:18",
    "description_markup": "<p>and a new description</p>",
    "description_raw": "and a new description"
  }
}

HTTPs Request

POST /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/values/{field_value_id}

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
data_store_id The ID for the data store that contains the schema in the path
data_schema_id The ID for the data schema that contains the fields being requested
data_field_id The ID for the data field that the values belong to
field_value_id The ID of the field value to update

Body

Field Required Description
field_value No The sample value for the field, if omitted the existing field value will remain in place
description No The description for the sample value, if omitted the existing description will remain in place

Response

Field Data Type Description
field_value Field Value Object A field value object

Response Codes

Value Description
200 Updated the field value
400 A malformed request was made, descriptions of the error will be provided in the body
404 The data store ID requested could not be found or the data schema does not exist within the data store or the field does not exist within the schema
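Both body fields are optional on this endpoint, and an omitted field keeps its current value. A small helper that drops unset keys therefore avoids accidentally overwriting data. This is a minimal sketch. `field_value_url` and `build_update_body` are hypothetical helper names, not part of any Tree Schema client library. The commented-out request assumes the `headers` dict built in the Authentication section.

```python
BASE_URL = 'https://api.treeschema.com/catalog'


def field_value_url(data_store_id, data_schema_id, data_field_id, field_value_id):
    """Build the URL for a single field value (hypothetical helper)."""
    return (f'{BASE_URL}/data-stores/{data_store_id}/schemas/{data_schema_id}'
            f'/fields/{data_field_id}/values/{field_value_id}')


def build_update_body(field_value=None, description=None):
    """Include only the keys the caller wants to change.

    Omitted keys keep their existing values on the server, so an argument
    left as None is simply not sent.
    """
    body = {}
    if field_value is not None:
        body['field_value'] = field_value
    if description is not None:
        body['description'] = description
    return body


# Usage (needs network access and the `headers` dict from Authentication):
# import requests as r
# url = field_value_url(1, 7, 78, 16324)
# resp = r.post(url, json=build_update_body(description='new description goes here'),
#               headers=headers)
```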

Transformations

Creating transformations in Tree Schema is a critical part of unlocking the true value of your data. Transformations let you see how data moves from system to system, identify dependencies in your data flow and build your data lineage. Transformations describe data movement from field to field between schemas.

Transformation Object

The transformation object

{
  "transformation_id": 25,
  "name": "my api transform #2",
  "type": "some",
  "created_ts": "2020-09-22 17:20:38",
  "updated_ts": "2020-09-22 17:25:34",
  "description_markup": "<p>desc</p>",
  "description_raw": "desc",
  "steward": {
    "user_id": 1,
    "name": "Grant",
    "email": "grant@treeschema.com"
  },
  "tech_poc": {
    "user_id": 1,
    "name": "Grant",
    "email": "grant@treeschema.com"
  }
}

The transformation object by itself is a shell; it is only used to hold transformation links. Once a transformation object is created, add transformation links to it to build your data lineage!

Transformation Object Fields

Field Data Type Description
transformation_id integer The ID used to uniquely represent the transformation. The same ID can be found in the Tree Schema GUI, where the URL for the transformation contains the transformation ID
name string The name of the transformation
type string The type of the transformation, valid values are batch_process_scheduled, batch_process_triggered, other, pub_sub_event and sql_trigger
created_ts timestamp The timestamp that the transformation was created
updated_ts timestamp The timestamp that the transformation was last updated
description_markup string An HTML string that represents the full markup description
description_raw string The transformation description that has had all markup removed
steward User Object The data steward assigned to the transformation
tech_poc User Object The technical point of contact assigned to the transformation

Get All Transformations

To get all transformations

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/transformations

List all transformations.

Returns the object:

{
  "meta": {
    "current_page": 1,
    "next_page": null,
    "total_cnt": 2
  },
  "transformations": [
    {
      "transformation_id": 25,
      "name": "My Tansform",
      "type": "batch_process_triggered",
      "created_ts": "2020-09-22 17:20:38",
      "updated_ts": "2020-09-22 17:25:34",
      "description_markup": "<p>desc</p>",
      "description_raw": "desc",
      "steward": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      },
      "tech_poc": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      }
    },
    {
      "transformation_id": 28,
      "name": "My Second Transformation",
      "type": "other",
      "created_ts": "2020-09-22 18:06:17",
      "updated_ts": "2020-09-22 18:06:56",
      "description_markup": null,
      "description_raw": null,
      "steward": {
        "user_id": 2,
        "name": "Asher",
        "email": "asher@treeschema.com"
      },
      "tech_poc": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      }
    }
  ]
}

HTTPs Request

GET /transformations

Query Parameters

Parameter Default Description
page 1 The page to retrieve when paginating through transformations
name null The name of the transformation to return

Path Parameters

There are no path parameters for this endpoint.

Body

There is no body for this endpoint.

Response

Field Data Type Description
meta Meta object A meta object for pagination
transformations list[Transformation Object] A list of transformation objects

Response Codes

Value Description
200 Retrieved all transformations
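List endpoints such as this one return a `meta` object with `current_page`, `next_page` and `total_cnt`. The following minimal sketch walks every page. It assumes that `next_page` holds the next page number and is `null` on the last page, as the example response above suggests. `iter_pages` is a hypothetical helper. The page fetcher is passed in, so the loop can be exercised without network access.

```python
def iter_pages(fetch_page, items_key):
    """Yield every item across all pages of a paginated list response.

    fetch_page: callable taking a page number and returning the decoded JSON
    dict for that page.
    items_key: the list key in the response, e.g. 'transformations'.
    """
    page = 1
    while page is not None:
        body = fetch_page(page)
        yield from body[items_key]
        # Assumption: next_page is the next page number, or None when finished
        page = body['meta']['next_page']


# Offline example: two fake pages standing in for API responses
fake_pages = {
    1: {'meta': {'current_page': 1, 'next_page': 2, 'total_cnt': 3},
        'transformations': [{'transformation_id': 25}, {'transformation_id': 28}]},
    2: {'meta': {'current_page': 2, 'next_page': None, 'total_cnt': 3},
        'transformations': [{'transformation_id': 31}]},
}
ids = [t['transformation_id'] for t in iter_pages(fake_pages.get, 'transformations')]
print(ids)  # [25, 28, 31]
```

Against the live API, `fetch_page` could wrap `requests.get(BASE_URL + '/transformations', params={'page': page}, headers=headers).json()`.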

Get A Transformation

Get a single transformation

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/25'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/transformations/25

Get a single transformation

Returns the object:

{
  "transformation": {
    "transformation_id": 25,
    "name": "My Tansform",
    "type": "batch_process_triggered",
    "created_ts": "2020-09-22 17:20:38",
    "updated_ts": "2020-09-22 17:25:34",
    "description_markup": "<p>desc</p>",
    "description_raw": "desc",
    "steward": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    },
    "tech_poc": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    }
  }
}

HTTPs Request

GET /transformations/{transformation_id}

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
transformation_id The ID for the transformation being retrieved

Body

There is no body for this endpoint.

Response

Field Data Type Description
transformation Transformation Object A transformation object

Response Codes

Value Description
200 Retrieved the transformation
404 The transformation requested does not exist

Create A Transformation

Create a new transformation

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations'
new_transform = {
    'name': 'My API Transformation!',
    'type': 'other'
}

resp = r.put(url, json=new_transform, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X PUT -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"name": "My API Transformation2", "type": "other"}' \
$BASE_URL/transformations

Create a transformation

Returns the object:

{
  "transformation": {
    "transformation_id": 25,
    "name": "My Tansform",
    "type": "batch_process_triggered",
    "created_ts": "2020-09-22 17:20:38",
    "updated_ts": "2020-09-22 17:25:34",
    "description_markup": "<p>desc</p>",
    "description_raw": "desc",
    "steward": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    },
    "tech_poc": {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    }
  }
}

HTTPs Request

PUT /transformations

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

There are no path parameters for this endpoint.

Body

Field Required Description
name Yes The name of the transformation
type Yes The type of transformation, valid values are batch_process_scheduled, batch_process_triggered, other, pub_sub_event and sql_trigger
description No The description to give the transformation
tech_poc No The ID for the user to assign as the technical point of contact for this transformation, if no value is provided the user executing the API will be used
steward No The ID for the user to assign as the steward for this transformation, if no value is provided the user executing the API will be used

Response

Field Data Type Description
transformation Transformation Object A transformation object

Response Codes

Value Description
200 Existing transformation retrieved
201 Transformation created
404 The transformation requested does not exist
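Before calling `PUT /transformations`, you can check the request body locally against the rules in the table above: `name` and `type` are required, `type` must be one of the five listed values, and the optional user IDs are only sent when provided. This is a minimal sketch. `build_transformation_body` and `was_created` are hypothetical helper names.

```python
# Valid values for the transformation "type" field, as documented for this endpoint
VALID_TRANSFORMATION_TYPES = frozenset({
    'batch_process_scheduled',
    'batch_process_triggered',
    'other',
    'pub_sub_event',
    'sql_trigger',
})


def build_transformation_body(name, transform_type, description=None,
                              tech_poc=None, steward=None):
    """Build the PUT /transformations body, validating the required fields.

    tech_poc and steward are user IDs; when omitted, the API assigns the
    user making the request.
    """
    if not name:
        raise ValueError('name is required')
    if transform_type not in VALID_TRANSFORMATION_TYPES:
        raise ValueError(f'invalid transformation type: {transform_type!r}')
    body = {'name': name, 'type': transform_type}
    optional = {'description': description, 'tech_poc': tech_poc, 'steward': steward}
    # Only send optional fields that were actually provided
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


def was_created(status_code):
    """201 means a new transformation was created; 200 means an existing one was returned."""
    return status_code == 201
```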

Delete A Transformation

Delete a transformation

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/36'

resp = r.delete(url, headers=headers)
resp.status_code  # this endpoint returns no body; 200 means the transformation was deleted
BASE_URL='https://api.treeschema.com/catalog'

curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/transformations/31

Delete a transformation

HTTPs Request

DELETE /transformations/{transformation_id}

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
transformation_id The ID for the transformation to be deleted

Body

There is no body for this endpoint.

Response

There is no response body for this endpoint.

Response Codes

Value Description
200 Transformation deleted
404 The transformation requested does not exist

Get Tags for a Transformation

To get existing tags from a transformation

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/2/tags'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
"$BASE_URL/transformations/1/tags"

Get existing tags

Returns the object:

{
  "tags": [
    "api tag",
    "schema tag",
    "pii",
    "mktg"
  ]
}

HTTPs Request

GET /transformations/{transformation_id}/tags

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
transformation_id The ID for the transformation to retrieve the tags for

Body

There is no body object for this endpoint

Response

Field Data Type Description
tags List[string] The list of tags for the transformation

Response Codes

Value Description
200 The list of tags was retrieved successfully
400 A malformed request was made, descriptions of the error will be provided in the body
404 The transformation could not be found, descriptions of the error will be provided in the body

Tag A Transformation

To tag a transformation

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/30/tags'

tags = {'tags': ['api tag', 'transform tag', 'pii', 'mktg']}

resp = r.post(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "transform tag", "pii", "mktg"]}' \
"$BASE_URL/transformations/30/tags"

Add a tag to a transformation.

Returns the object:

{
  "tags": [
    "api tag",
    "schema tag",
    "pii",
    "mktg"
  ],
  "tag_statuses": [
    "added",
    "added",
    "added",
    "added"
  ]
}

HTTPs Request

POST /transformations/{transformation_id}/tags

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
transformation_id The ID for the transformation to add the tag(s) to

Body

Field Required Description
tags Yes A list of string values (List[string]) to add as tags, each tag can be up to 32 characters

Response

Field Data Type Description
tags List[string] The list of tags that were processed
tag_statuses List[string] The status for each tag processed, statuses match the same index position as their corresponding tag. Values include added and exists.

Response Codes

Value Description
200 All of the tags requested already existed for the transformation
201 At least one of the tags requested was added
400 A malformed request was made, descriptions of the error will be provided in the body
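`tag_statuses` lines up with `tags` by index, so zipping the two lists shows which tags were new. The sketch below also checks the documented 32-character limit before sending. It is a minimal illustration: `check_tags` and `newly_added` are hypothetical helper names.

```python
MAX_TAG_LENGTH = 32  # documented per-tag limit


def check_tags(tags):
    """Return the request body, or raise ValueError if any tag exceeds the limit."""
    too_long = [tag for tag in tags if len(tag) > MAX_TAG_LENGTH]
    if too_long:
        raise ValueError(f'tags longer than {MAX_TAG_LENGTH} characters: {too_long}')
    return {'tags': list(tags)}


def newly_added(response):
    """Return the tags whose status is 'added', pairing tags and statuses by index."""
    return [tag for tag, status in zip(response['tags'], response['tag_statuses'])
            if status == 'added']
```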

Remove Tags from a Transformation

To remove one or more tags from a transformation

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/30/tags'

tags = {'tags': ['api tag', 'mktg']}

resp = r.delete(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "mktg"]}' \
"$BASE_URL/transformations/30/tags"

Remove a tag from a transformation.

Returns the object:

{
  "removed_tags": [
    "api tag",
    "mktg"
  ]
}

HTTPs Request

DELETE /transformations/{transformation_id}/tags

Query Parameters

There are no query parameters for this endpoint

Path Parameters

Parameter Description
transformation_id The ID for the transformation to remove the tag(s) from

Body

Field Required Description
tags Yes A list of string values (List[string]) to remove from the transformation

Response

Field Data Type Description
removed_tags List[string] The list of tags that were removed

Response Codes

Value Description
200 The tags were removed successfully
400 A malformed request was made, descriptions of the error will be provided in the body
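The response lists only the tags that were actually removed. Comparing it with the request shows which tags were not removed. The section above does not say why a tag might be skipped; for example, it may not have been on the transformation, but that is an assumption. `not_removed` is a hypothetical helper name.

```python
def not_removed(requested, response):
    """Return requested tags missing from removed_tags, preserving request order."""
    removed = set(response['removed_tags'])
    return [tag for tag in requested if tag not in removed]
```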

Transformation Links

Transformation links capture how data moves from field to field between your schemas. A single transformation link represents a single field to field movement. A single transformation (which may represent a data pipeline, or ETL / ELT job) will likely contain many transformation links.

The transformation link object

{
  "transformation_link_id": 1,
  "created_ts": "2020-09-22 23:54:26",
  "updated_ts": "2020-09-22 23:54:26",
  "source_data_store_id": 3,
  "source_data_store_name": "Kafka Prod",
  "source_schema_id": 17,
  "source_schema_name": "users-topic.v1",
  "source_field_id": 200,
  "source_field_name": "user_id",
  "target_data_store_id": 4,
  "target_data_store_name": "Redshift",
  "target_schema_id": 469,
  "target_schema_name": "usr.user_info",
  "target_field_id": 5399,
  "target_field_name": "user_id"
}

The transformation link object contains references to all of the data stores, schemas and fields involved when data moves from one schema to another; these are referred to as the source and the target.

Field Data Type Description
transformation_link_id integer The ID used to uniquely represent the transformation link
source_data_store_id integer The unique ID for the data store for the source of the transformation.
source_schema_id integer The unique ID for the schema for the source of the transformation.
source_field_id integer The unique ID for the field for the source of the transformation.
target_data_store_id integer The unique ID for the data store for the target of the transformation.
target_schema_id integer The unique ID for the schema for the target of the transformation.
target_field_id integer The unique ID for the field for the target of the transformation.
created_ts timestamp The timestamp that the transformation link was created
updated_ts timestamp The timestamp that the transformation link was updated
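Each link is a single field-to-field hop, so two small views are often useful: a readable label for one link, and a count of links per source/target schema pair to get a schema-level picture of the lineage. The following minimal sketch uses the keys from the example object above. `describe_link` and `schema_level_lineage` are hypothetical helper names.

```python
from collections import Counter


def describe_link(link):
    """Render one transformation link as 'store / schema / field -> store / schema / field'."""
    source = ' / '.join([link['source_data_store_name'],
                         link['source_schema_name'],
                         link['source_field_name']])
    target = ' / '.join([link['target_data_store_name'],
                         link['target_schema_name'],
                         link['target_field_name']])
    return f'{source} -> {target}'


def schema_level_lineage(links):
    """Count field-level links per (source_schema_id, target_schema_id) pair."""
    return Counter((link['source_schema_id'], link['target_schema_id']) for link in links)


example_link = {
    'source_data_store_name': 'Kafka Prod', 'source_schema_id': 17,
    'source_schema_name': 'users-topic.v1', 'source_field_name': 'user_id',
    'target_data_store_name': 'Redshift', 'target_schema_id': 469,
    'target_schema_name': 'usr.user_info', 'target_field_name': 'user_id',
}
print(describe_link(example_link))
# Kafka Prod / users-topic.v1 / user_id -> Redshift / usr.user_info / user_id
```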

To get all links for a transformation

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/1/links'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/transformations/1/links

List all transformation links for a given transformation.

Returns the object:

{
  "meta": {
    "current_page": 1,
    "next_page": null,
    "total_cnt": 4
  },
  "transformation_links": [
    {
      "transformation_link_id": 1,
      "created_ts": "2020-09-22 23:54:26",
      "updated_ts": "2020-09-22 23:54:26",
      "source_data_store_id": 3,
      "source_data_store_name": "Kafka Prod",
      "source_schema_id": 17,
      "source_schema_name": "users-topic.v1",
      "source_field_id": 200,
      "source_field_name": "user_id",
      "target_data_store_id": 4,
      "target_data_store_name": "Redshift",
      "target_schema_id": 469,
      "target_schema_name": "usr.user_info",
      "target_field_id": 5399,
      "target_field_name": "user_id"
    },
    {
      "transformation_link_id": 1,
      "created_ts": "2020-09-22 23:54:26",
      "updated_ts": "2020-09-22 23:54:26",
      "source_data_store_id": 3,
      "source_data_store_name": "Kafka Prod",
      "source_schema_id": 17,
      "source_schema_name": "users-topic.v1",
      "source_field_id": 201,
      "source_field_name": "email",
      "target_data_store_id": 4,
      "target_data_store_name": "Redshift",
      "target_schema_id": 469,
      "target_schema_name": "usr.user_info",
      "target_field_id": 5400,
      "target_field_name": "email"
    }
  ]
}

HTTPs Request

GET /transformations/{transformation_id}/links

Query Parameters

Parameter Default Description
page 1 The page to retrieve when paginating through transformation links

Path Parameters

Parameter Description
transformation_id The ID for the transformation to retrieve the links

Body

There is no body for this endpoint.

Response

Field Data Type Description
meta Meta object A meta object for pagination
transformation_links list[Transformation Link Object] A list of transformation link objects

Response Codes

Value Description
200 Retrieved all transformation links

To get a single transformation link

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/1/links/1'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/transformations/1/links/1

Get a single transformation link for a given transformation.

Returns the object:

{
  "transformation_link": {
      "transformation_link_id": 1,
      "created_ts": "2020-09-22 23:54:26",
      "updated_ts": "2020-09-22 23:54:26",
      "source_data_store_id": 3,
      "source_data_store_name": "Kafka Prod",
      "source_schema_id": 17,
      "source_schema_name": "users-topic.v1",
      "source_field_id": 200,
      "source_field_name": "user_id",
      "target_data_store_id": 4,
      "target_data_store_name": "Redshift",
      "target_schema_id": 469,
      "target_schema_name": "usr.user_info",
      "target_field_id": 5399,
      "target_field_name": "user_id"
    }
}

HTTPs Request

GET /transformations/{transformation_id}/links/{transformation_link_id}

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
transformation_id The ID for the transformation to retrieve the links
transformation_link_id The ID for the transformation link

Body

There is no body for this endpoint.

Response

Field Data Type Description
transformation_link Transformation Link Object A transformation link object

Response Codes

Value Description
200 Retrieved the transformation link

Create links for a transformation

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/1/links'

new_links = {
    'links': [
        {
            'source_field_id': 89,
            'target_field_id': 5399
        },
        {
            'source_field_id': 200,
            'target_field_id': 5399
        }
    ]
}

resp = r.post(url, json=new_links, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X POST \
-H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"links": [{"source_field_id": 89, "target_field_id": 5399}, {"source_field_id": 200, "target_field_id": 5399}]}' \
$BASE_URL/transformations/1/links

Create links for a transformation.

When creating links you only need to link two fields together: a source field and a target field. Tree Schema infers the schema and data store directly from the field IDs.

Returns the object:

{
  "links": [
    {
      "source_field_id": 89,
      "target_field_id": 5399
    },
    {
      "source_field_id": 200,
      "target_field_id": 5399
    }
  ],
  "link_statuses": [
    "exists",
    "exists"
  ],
  "updated_links": [
    {
      "transformation_link_id": 205,
      "source_field_id": 89,
      "source_field_name": "account_type",
      "source_schema_id": 8,
      "source_schema_name": "public.accounts",
      "source_data_store_id": 3,
      "source_data_store_name": "Postgres Prod",
      "target_field_id": 5399,
      "target_field_name": "acct_type",
      "target_schema_id": 469,
      "target_schema_name": "acct.dvc.raw.01",
      "target_data_store_id": 4,
      "target_data_store_name": "Kafka"
    },
    {
      "transformation_link_id": 206,
      "source_field_id": 200,
      "source_field_name": "user_id",
      "source_schema_id": 17,
      "source_schema_name": "public.users",
      "source_data_store_id": 3,
      "source_data_store_name": "Postgres Prod",
      "target_field_id": 5399,
      "target_field_name": "acct_type",
      "target_schema_id": 469,
      "target_schema_name": "acct.dvc.raw.01",
      "target_data_store_id": 4,
      "target_data_store_name": "Kafka"
    }
  ]
}

HTTPs Request

POST /transformations/{transformation_id}/links

Query Parameters

Parameter Default Description
set_state False If True, the state of the transformation will be set to the links provided: any existing links in the transformation that are not part of the input will be deprecated and any links that are provided but do not exist in the transformation will be created

Path Parameters

Parameter Description
transformation_id The ID for the transformation to add the links

Body

Transformation links are created by providing a list of source to target fields.

Field Required Description
links Yes List[Transformation source to target mapping] that represents the source and target for each transformation link

Transformation source to target mapping

Field Required Description
source_field_id Yes The field_id for the source field where data moves from
target_field_id Yes The field_id for the target field where data moves to

Response

Field Data Type Description
links list[Transformation source to target mapping] The same source to target mapping inputs provided as the input
link_statuses list[string] The status for each link processed, statuses match the same index position as their corresponding link. Values include created, exists and could_not_create.
updated_links list[Transformation Link Object] A list of transformation link objects for each transformation link requested that was created or already exists

Response Codes

Value Description
200 All transformation links processed
201 At least one transformation link was created
400 A malformed request was made, descriptions of the error will be provided in the body
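Because `link_statuses` matches `links` by index, the links that failed can be recovered directly from the response. The sketch below also turns plain `(source_field_id, target_field_id)` pairs into the request body. It is a minimal illustration: `build_links_body` and `failed_links` are hypothetical helper names.

```python
def build_links_body(pairs):
    """Turn (source_field_id, target_field_id) pairs into the POST body."""
    return {'links': [{'source_field_id': source, 'target_field_id': target}
                      for source, target in pairs]}


def failed_links(response):
    """Return the input links whose status is could_not_create (matched by index)."""
    return [link for link, status in zip(response['links'], response['link_statuses'])
            if status == 'could_not_create']


body = build_links_body([(89, 5399), (200, 5399)])
# Usage (needs network access and `headers`):
# resp = r.post(BASE_URL + '/transformations/1/links', json=body, headers=headers)
# problems = failed_links(resp.json())
```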

Check Transformation for Breaking Change

Check for breaking changes to a Transformation

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/1/links/check-breaking-change'

links = {
    'link_state': [
        {
            'source_field_id': 1,
            'target_field_id': 2
        },
        {
            'source_field_id': 2,
            'target_field_id': 3
        }
    ]
}

resp = r.post(url, json=links, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X POST \
-H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"link_state": [{"source_field_id": 1, "target_field_id": 2}, {"source_field_id": 2, "target_field_id": 3}]}' \
$BASE_URL/transformations/1/links/check-breaking-change

Check to see if a change to the links in a transformation will cause a breaking change by passing in a mock "state" of your new transformation. For example, if your transformation contains the following data lineage: A -> B -> C -> D, then it would have the following links: (A, B), (B, C), (C, D).

With this API, check the impact to the data lineage in your entire catalog, not only the data lineage within this single transformation, by creating a new "state" of your transformation. Continuing with the example above, if you check for breaking changes with the new mock state [(B, C), (C, D)] then you would have removed the link (A, B). In this scenario, the user is explicitly removing the link going into Field B, therefore B is not considered broken. However, everything that is downstream from B would be considered broken.

Returns the object:

{
  "breaking": true,
  "impact_summary": {
    "fields": 1,
    "schemas": 1,
    "data_stores": 1
  },
  "impacted_assets": [
    {
      "field_id": 7,
      "schema_id": 7,
      "data_store_id": 1,
      "impact_chain": [
        {
          "field_id": 6,
          "schema_id": 6,
          "data_store_id": 1
        },
        {
          "field_id": 3,
          "schema_id": 3,
          "data_store_id": 1
        }
      ]
    }
  ]
}
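The A -> B -> C -> D example above can be made concrete with a small local sketch. It applies the rule as described: a removed link leaves its own target unbroken, but every field downstream of that target is marked broken. This only illustrates the stated semantics over the links you pass in. The API itself evaluates lineage across your whole catalog, and `broken_fields` is a hypothetical helper, not how Tree Schema implements the check.

```python
from collections import defaultdict, deque


def broken_fields(current_links, new_state):
    """Return fields broken by moving from current_links to new_state.

    Both arguments are iterables of (source, target) pairs. For each dropped
    link, the target itself is not counted (the removal is explicit), but
    every field reachable downstream of it through the remaining links is.
    """
    new_state = set(new_state)
    removed = set(current_links) - new_state

    # Adjacency list built from the links that remain in the new state
    downstream = defaultdict(list)
    for source, target in new_state:
        downstream[source].append(target)

    broken = set()
    for _, removed_target in removed:
        queue = deque(downstream[removed_target])
        while queue:
            field = queue.popleft()
            if field not in broken:  # guard against revisiting (and cycles)
                broken.add(field)
                queue.extend(downstream[field])
    return broken


current = [('A', 'B'), ('B', 'C'), ('C', 'D')]
print(sorted(broken_fields(current, [('B', 'C'), ('C', 'D')])))  # ['C', 'D']
```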

HTTPs Request

POST /transformations/{transformation_id}/links/check-breaking-change

Query Parameters

Parameter Default Description
max_depth 5 The maximum downstream depth to return results for the branches of the impacted assets. This can be set to 1 in order to see the immediate downstream results. The maximum depth is 10. Larger values will cause longer processing time.

Path Parameters

Parameter Description
transformation_id The ID for the transformation to check the breaking changes

Body

Check whether a given state of links will break data lineage by providing a list of source to target fields.

Field Required Description
link_state Yes List[Transformation source to target mapping] that represents the source and target for each transformation link for the new state of the transformation.

Transformation source to target mapping

Field Required Description
source_field_id Yes The field_id for the source field where data moves from
target_field_id Yes The field_id for the target field where data moves to

Response

Field Data Type Description
breaking Boolean Whether or not there are any breaking changes to the transformation given the state provided
impact_summary Dict High level metrics on the total number of assets impacted by the breaking changes. This will contain the keys fields, schemas and data_stores. The values for each key are the total number of unique impacted assets for the given type.
impacted_assets list[Impacted Asset] A list of Impacted Asset objects, one for each data field that would be broken by the new state of the transformation links

Impacted Asset

Impacted assets are the data assets that are impacted by a breaking change to data lineage. Tree Schema captures data lineage at the field level, therefore, all Impacted Assets contain the field level identifiers, along with the corresponding schema and data store identifiers.

Impacted Asset Object

Field Data Type Required Description
field_id Integer Yes The ID for the data field that is impacted
schema_id Integer Yes The ID for the data schema associated to the data field
data_store_id Integer Yes The ID for the data store associated to the schema
impact_chain list[Impacted Asset] No This field is only present on the top-level entries of the Impacted Asset list. It is the list of data assets that would be considered broken by the new state of the transformation links, sorted in the order of the data lineage. For example, if field E has an impact chain of [B, C, D] then the actual lineage would be B -> C -> D -> E.

Response Codes

Value Description
200 The request was successful
400 A malformed request was made, descriptions of the error will be provided in the body
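A common use of this endpoint is to stop a deployment or ETL job when a proposed link state would break lineage. The following minimal sketch rebuilds each impacted field's lineage path from `impact_chain`, following the ordering rule in the Impacted Asset table (chain first, impacted field last). It also raises when `breaking` is true. `impact_paths` and `fail_on_breaking_change` are hypothetical helper names.

```python
def impact_paths(check_response):
    """Return one lineage path (list of field IDs) per impacted asset.

    An asset with impact chain [B, C, D] has lineage B -> C -> D -> asset,
    so the impacted field itself goes last.
    """
    paths = []
    for asset in check_response.get('impacted_assets', []):
        chain = [step['field_id'] for step in asset.get('impact_chain', [])]
        paths.append(chain + [asset['field_id']])
    return paths


def fail_on_breaking_change(check_response):
    """Raise RuntimeError if the proposed link state would break lineage."""
    if check_response.get('breaking'):
        summary = check_response.get('impact_summary', {})
        raise RuntimeError(
            f"breaking change: {summary.get('fields', 0)} field(s), "
            f"{summary.get('schemas', 0)} schema(s), "
            f"{summary.get('data_stores', 0)} data store(s) impacted"
        )


# Trimmed copy of the example response above
example = {
    'breaking': True,
    'impact_summary': {'fields': 1, 'schemas': 1, 'data_stores': 1},
    'impacted_assets': [{'field_id': 7, 'schema_id': 7, 'data_store_id': 1,
                         'impact_chain': [{'field_id': 6}, {'field_id': 3}]}],
}
print(impact_paths(example))  # [[6, 3, 7]]
```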

Delete transformation links

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/1/links'

delete_links = {
    'transform_link_ids': [
        206, 205
    ]
}

resp = r.delete(url, json=delete_links, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X DELETE \
-H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"transform_link_ids": [142, 144]}' \
$BASE_URL/transformations/1/links

Delete links for a transformation.

Returns the object:

{
  "links": [
    205,
    206
  ],
  "link_statuses": [
    "deleted",
    "deleted"
  ]
}

HTTPs Request

DELETE /transformations/{transformation_id}/links

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
transformation_id The ID for the transformation to delete the links

Body

Transformation links are deleted by providing a list of transformation link IDs.

Field Required Description
transform_link_ids Yes List[integer] The list of transformation link IDs to delete

Response

Field Data Type Description
links List[integer] The list of transformation link IDs submitted to delete
link_statuses list[string] The status for each link processed, statuses match the same index position as their corresponding link. Values include deleted and could_not_delete.

Response Codes

Value Description
200 All transformation links processed
400 A malformed request was made, descriptions of the error will be provided in the body


Users

Access your teammates and assign them as technical points of contact (tech POCs) and data stewards.

User object

The user object

{
  "user_id": 2,
  "name": "Asher",
  "email": "asher@treeschema.com"
}

User Object Fields

Field Data Type Description
user_id integer The ID used to uniquely represent the user
name string The name of the user
email string The user's email

Get All Users

Get all users in your organization

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/users'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/users

Get all users in your organization

Returns the object:

{
  "meta": {
    "current_page": 1,
    "next_page": null,
    "total_cnt": 2
  },
  "users": [
    {
      "user_id": 2,
      "name": "Asher",
      "email": "asher@treeschema.com"
    },
    {
      "user_id": 1,
      "name": "Grant",
      "email": "grant@treeschema.com"
    }
  ]
}

HTTPs Request

GET /users

Query Parameters

Parameter Default Description
page 1 The page to retrieve when paginating through users
email null A user's email address

Path Parameters

There are no path parameters for this endpoint.

Body

There is no body for this endpoint.

Response

Field Data Type Description
meta Meta object A meta object for pagination
users list[User Object] A list of users

Response Codes

Value Description
200 Retrieved the users
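The `tech_poc` and `steward` fields on transformations take user IDs, so a common step is resolving a teammate's email to an ID. This is a minimal sketch. It builds a URL using the documented `email` query parameter, with the email percent-encoded, and looks up an ID in an already-fetched response. `users_by_email_url` and `find_user_id` are hypothetical helper names, and the lookup uses an exact string match.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.treeschema.com/catalog'


def users_by_email_url(email):
    """Build a GET /users URL filtered by the `email` query parameter."""
    return f"{BASE_URL}/users?{urlencode({'email': email})}"


def find_user_id(users_response, email):
    """Return the user_id whose email matches exactly, or None if absent."""
    for user in users_response['users']:
        if user['email'] == email:
            return user['user_id']
    return None
```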

Get a User

Get a user in your organization

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/users/1'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/users/1

Get a single user in your organization

Returns the object:

{
"user": {
    "user_id": 1,
    "name": "Grant",
    "email": "grant@treeschema.com"
  }
}

HTTPs Request

GET /users/{user_id}

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

Parameter Description
user_id The ID of the user to retrieve

Body

There is no body for this endpoint.

Response

Field Data Type Description
user User Object The requested user

Response Codes

Value Description
200 Retrieved the user
404 The user requested was not found
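Since this endpoint returns 404 when the user does not exist, a caller can treat that case as "no such user" rather than as a failure. A minimal sketch, assuming a requests-style session object (for example `requests.Session`); `get_user` is a hypothetical helper, not part of any Tree Schema library.

```python
from typing import Optional

BASE_URL = "https://api.treeschema.com/catalog"


def get_user(session, user_id: int, headers: dict) -> Optional[dict]:
    """Return the user object for `user_id`, or None if the API returns 404.

    `session` is anything with a requests-style .get(), e.g. requests.Session.
    """
    resp = session.get(f"{BASE_URL}/users/{user_id}", headers=headers)
    if resp.status_code == 404:
        return None  # the user was not found in your organization
    resp.raise_for_status()  # surface any other error code as an exception
    return resp.json()["user"]
```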

Full Catalog Search

Search your entire data catalog from a single place. Can't remember what data store your user_analytics schema sits in? Need a refresher on where that pesky usr_start_dt field is? Search the catalog!

Catalog Search Object

The search result object

{
  "entity_id": 5405,
  "schema_id": 469,
  "data_store_id": 4,
  "name": "device_id",
  "entity_type": "field"
}

The search API searches the following entity types in your catalog: data stores, data schemas, fields and transformations.

The search results provide a quick way to find the key IDs needed to work with the rest of your Tree Schema catalog.

Search Result Object Fields

Field Data Type Description
entity_id integer The ID that uniquely represents the entity; combine it with entity_type to find a specific item in the catalog
entity_type string The type of entity that the entity_id relates to; possible values include data_store, data_schema, field and transformation
name string The name of the object
data_store_id integer Populated if the entity resides within a data store (for example, a data_schema or field); otherwise null
schema_id integer Populated if the entity resides within a data schema (for example, a field); otherwise null

Search the Catalog

Search the catalog

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/search?term=usr'

resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
"$BASE_URL/search?term=usr"

Search the entire catalog

Returns the object:

{
  "meta": {
    "current_page": 1,
    "next_page": null,
    "total_cnt": 7
  },
  "results": [
    {
      "entity_id": 5405,
      "schema_id": 469,
      "data_store_id": 4,
      "name": "device_id",
      "entity_type": "field"
    },
    {
      "entity_id": 79,
      "schema_id": 7,
      "data_store_id": 1,
      "name": "DEVICE_ID",
      "entity_type": "field"
    }
  ]
}

HTTPS Request

GET /search

Query Parameters

Parameter Default Description
page 1 The page to retrieve when paginating through search results
term None The search term to look for in the data catalog
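The examples above place the term directly in the URL. If a search term can contain spaces or characters such as `&`, it should be URL-encoded; `requests` does this for you when you pass `params={'term': ...}`, and with other clients you can use Python's standard `urllib.parse.urlencode`. `search_url` is a hypothetical helper shown for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.treeschema.com/catalog"


def search_url(term: str, page: int = 1) -> str:
    """Build a GET /search URL with the term and page safely encoded."""
    query = urlencode({"term": term, "page": page})  # spaces become '+', '&' becomes %26
    return f"{BASE_URL}/search?{query}"


print(search_url("usr start_dt", page=2))
# https://api.treeschema.com/catalog/search?term=usr+start_dt&page=2
```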

Path Parameters

There are no path parameters for this endpoint.

Body

There is no body for this endpoint.

Response

Field Data Type Description
meta Meta object A meta object for pagination
results list[Search Results Object] A list of search results

Response Codes

Value Description
200 Retrieved the search results

Batch Load Assets

Batch requests let you load multiple data stores, schemas and fields in a single call, saving the API overhead of requesting each item individually.

Batch Load Response Object

The batch load result object

{
  "data_stores": [
    {"DATA_STORE_OBJECT"},
    {"DATA_STORE_OBJECT"},
    ...
  ],
  "data_schemas": [
    {"DATA_SCHEMA_OBJECT"},
    {"DATA_SCHEMA_OBJECT"},
    ...
  ],
  "data_fields": [
    {"DATA_FIELD_OBJECT"},
    {"DATA_FIELD_OBJECT"},
    ...
  ]
}

The batch load API retrieves information about data stores, schemas and fields when you already have the IDs of those assets. The response always includes the parent assets of each asset you request. For example, if you batch load three fields, the response contains those three fields, the schemas they belong to and the data stores that contain those schemas.

Consider the following example. The data assets in Tree Schema are as follows:

The request is made to Tree Schema to batch load FIELD_1, FIELD_2, FIELD_3 and SCHEMA_3. In this example, the following assets would be returned:

Batch Request Assets

Batch requests to Tree Schema

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/batch-assets'

assets = {
    'assets': [
        {'type': 'schema', 'id': 2},
        {'type': 'data_store', 'id': 1},
        {'type': 'field', 'id': 1}
    ]
}

resp = r.post(url, json=assets, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'


curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"assets": [{"type": "schema", "id": 2}, {"type": "data_store", "id": 1}, {"type": "field", "id": 1}]}' \
$BASE_URL/batch-assets

Batch request objects from Tree Schema

Returns the object:

{
  "data_stores": [
    {
      "data_store_id": 1,
      "name": "My API DS #3 AWS S3",
      "type": "s3",
      "other_type": null,
      "created_ts": "2021-02-01 04:32:53",
      "updated_ts": "2021-02-01 04:32:53",
      "description_markup": null,
      "description_raw": null,
      "steward": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      },
      "tech_poc": {
        "user_id": 2,
        "name": "Asher",
        "email": "asher@treeschema.com"
      },
      "details": {}
    }
  ],
  "data_schemas": [
    {
      "data_schema_id": 1,
      "name": "My API Schema #1",
      "type": "csv",
      "schema_loc": "My API Schema #1",
      "created_ts": "2021-02-01 04:56:43",
      "updated_ts": "2021-02-01 04:59:16",
      "description_markup": "<p>This is an updated description</p>",
      "description_raw": "This is an updated description",
      "data_store_id": 1,
      "steward": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      },
      "tech_poc": {
        "user_id": 2,
        "name": "Asher",
        "email": "asher@treeschema.com"
      }
    }
  ],
  "data_fields": [
    {
      "field_id": 1,
      "name": "my_field",
      "parent_path": null,
      "full_path_name": "my_field",
      "type": "list",
      "data_type": "array",
      "data_format": "YYYY-MM-DD",
      "nullable": true,
      "created_ts": "2021-02-01 05:14:13",
      "updated_ts": "2021-02-01 15:09:27",
      "description_markup": "<p>--- NEW DESC ---</p>",
      "description_raw": "--- NEW DESC ---",
      "data_schema_id": 1,
      "steward": {
        "user_id": 1,
        "name": "Grant",
        "email": "grant@treeschema.com"
      },
      "tech_poc": {
        "user_id": 2,
        "name": "Asher",
        "email": "asher@treeschema.com"
      }
    }
  ]
}

HTTPS Request

POST /batch-assets

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

There are no path parameters for this endpoint.

Body

Field Type Description
assets list[Batch Asset Request Object] A list of batch asset request objects

Batch Asset Request Object

Field Type Description
type String Must be of type data_store, schema or field
id Integer The ID of the data asset
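Search and batch loading fit together: search to find IDs, then batch load the details. Note that search results report schemas with the `entity_type` `data_schema`, while this endpoint expects `schema`, and transformations are not a batch asset type. A minimal sketch, assuming that a search result's `entity_id` is the same ID this endpoint expects for that asset type; `batch_request_from_search` is a hypothetical helper.

```python
from typing import Dict, List

# Search results say "data_schema" where the batch endpoint expects "schema".
# Transformations are intentionally absent: they are not batch asset types.
SEARCH_TO_BATCH_TYPE: Dict[str, str] = {
    "data_store": "data_store",
    "data_schema": "schema",
    "field": "field",
}


def batch_request_from_search(results: List[dict]) -> dict:
    """Turn catalog search results into a POST /batch-assets request body."""
    assets = []
    seen = set()
    for result in results:
        batch_type = SEARCH_TO_BATCH_TYPE.get(result["entity_type"])
        if batch_type is None:
            continue  # skip entity types the batch endpoint does not accept
        key = (batch_type, result["entity_id"])
        if key not in seen:  # request each asset only once
            seen.add(key)
            assets.append({"type": batch_type, "id": result["entity_id"]})
    return {"assets": assets}


# Usage (hypothetical): body = batch_request_from_search(search_json["results"])
# resp = r.post(BASE_URL + '/batch-assets', json=body, headers=headers)
```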

Response

Field Data Type Description
data_stores list[Data Store Object] A list of data stores requested, or parents of other data assets requested
data_schemas list[Data Schema Object] A list of data schemas requested, or parents of other data assets requested
data_fields list[Data Field Object] A list of data fields requested
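Because parents are always included, a single batch response is enough to place each field within its schema and data store. A minimal sketch, assuming the response shape shown above (fields reference `data_schema_id`, schemas reference `data_store_id`); `field_locations` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def field_locations(batch: dict) -> List[Tuple[str, str, str]]:
    """Return (data store name, schema name, field path) for each field."""
    stores: Dict[int, dict] = {s["data_store_id"]: s for s in batch.get("data_stores", [])}
    schemas: Dict[int, dict] = {s["data_schema_id"]: s for s in batch.get("data_schemas", [])}
    locations = []
    for field in batch.get("data_fields", []):
        # Parent assets are returned alongside each field, so these lookups resolve.
        schema = schemas[field["data_schema_id"]]
        store = stores[schema["data_store_id"]]
        locations.append((store["name"], schema["name"], field["full_path_name"]))
    return locations
```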

Response Codes

Value Description
200 Retrieved the batch response
400 Invalid request; the errors are provided in the response body

dbt

Tree Schema can process your dbt manifest file to ingest your existing dbt metadata. All dbt processing occurs within a data store, so the first step is to upload a manifest file to a data store using the data store dbt endpoint.

Once the file has been uploaded you can retrieve the status of the parsing process and trigger the step to save the underlying results.

Get dbt Parse Status

Retrieve the status

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/dbt/parse-results'

params = {'dbt_process_id': dbt_process_id}

resp = r.get(url, params=params, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -H "Authorization: Basic $ENCODED_SECRET" \
"$BASE_URL/dbt/parse-results?dbt_process_id=$DBT_PROCESS_ID"

This status check uses the dbt_process_id that is returned from the data store dbt endpoint.

Returns the object:

{
  "dbt_process_id": "72c83743-cad1-48b4-9916-78bc02083772",
  "found": true,
  "status": "success",
  "error_msg": null,
  "dbt_schemas": [
    {
      "schema_name": "TS_SCHEMA1.cust_mkt_segment",
      "schema_type": "view",
      "schema_status": "exists"
    },
    {
      "schema_name": "TS_SCHEMA1.cnt_segment",
      "schema_type": "view",
      "schema_status": "exists"
    }
  ],
  "dbt_lineage": [
    {
      "source_schema_name": "TS_SCHEMA1.cust_mkt_segment",
      "target_schema_name": "TS_SCHEMA1.max_segment"
    },
    {
      "source_schema_name": "TS_SCHEMA1.cust_mkt_segment",
      "target_schema_name": "TS_SCHEMA1.segment_total"
    }
  ]
}


HTTPS Request

GET /dbt/parse-results

Query Parameters

Parameter Default Description
dbt_process_id None The dbt process ID that was returned when starting the parsing process

Path Parameters

There are no path parameters for this endpoint.

Body

There is no body for this endpoint.

Response

Field Data Type Description
dbt_process_id string The same dbt_process_id provided as part of the request
found boolean Whether the dbt_process_id was found. Because file processing occurs asynchronously, the dbt_process_id may not be found if you request the parsing status immediately after uploading the file.
status string The status of the upload: waiting if found is false or if file processing has not yet completed, success if parsing completed successfully, error if an error occurred, or processed if the data has already been saved.
error_msg string The error that occurred during processing; only provided if the status is error, otherwise null
dbt_schemas List[Dict] A list of schema objects found in the dbt manifest file. Each object provides the name and type of the schema as well as whether the schema already exists in Tree Schema.
dbt_lineage List[Dict] A list of lineage objects found in the dbt manifest file.
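Because parsing runs asynchronously, a pipeline usually polls this endpoint until `status` leaves `waiting`. A minimal sketch built on the status values in the table above; the polling interval, the attempt limit and the `wait_for_dbt_parse` / `fetch_status` names are assumptions, not values defined by Tree Schema.

```python
import time
from typing import Callable


def wait_for_dbt_parse(
    fetch_status: Callable[[], dict],
    poll_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll GET /dbt/parse-results until parsing finishes.

    `fetch_status` returns the parsed JSON body for your dbt_process_id.
    """
    for _ in range(max_attempts):
        body = fetch_status()
        # "waiting" covers both found == false and parsing still in progress.
        if body.get("found") and body.get("status") != "waiting":
            if body["status"] == "error":
                raise RuntimeError(body.get("error_msg") or "dbt parsing failed")
            return body  # "success" or "processed"
        sleep(poll_seconds)
    raise TimeoutError("dbt parsing did not finish in time")


# Usage (hypothetical):
# fetch = lambda: r.get(BASE_URL + '/dbt/parse-results',
#                       params={'dbt_process_id': dbt_process_id}, headers=headers).json()
# parsed = wait_for_dbt_parse(fetch)
```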

Response Codes

Value Description
200 Retrieved the status results

Save dbt Results

Save the results from parsing the dbt manifest file

import requests as r

BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/dbt/save-results'

data = {
  'dbt_process_id': dbt_process_id,
  'add_schemas_fields': False,
  'update_descriptions': True,
  'update_tags': True,
  'add_lineage': True
}
resp = r.post(url, json=data, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'

curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"dbt_process_id": "'"$DBT_PROCESS_ID"'", "add_schemas_fields": false, "update_descriptions": true, "update_tags": true, "add_lineage": true}' \
$BASE_URL/dbt/save-results

Saves the results from the parsed manifest.json file using the options you provide. You can choose to add schemas and fields, update descriptions, update tags and add lineage from the manifest into Tree Schema. For more details on these options, see the dbt documentation in Tree Schema.

Returns the object:

{
   "dbt_process_id": "b4000641-eed7-4345-9ba0-7701f77ce568"
}

HTTPS Request

POST /dbt/save-results

Query Parameters

There are no query parameters for this endpoint.

Path Parameters

There are no path parameters for this endpoint.

Body

The body is a JSON object with the following fields:

Field Required Description
dbt_process_id Yes The dbt process ID that was returned when starting the parsing process
add_schemas_fields No Whether or not to add schemas and fields if they do not exist within Tree Schema. Defaults to False as it is generally better to first allow Tree Schema to auto discover what exists.
update_descriptions No Whether or not to update descriptions of the schemas and fields in Tree Schema. This only applies if the corresponding schemas and fields in the manifest file contain descriptions. Defaults to False as this could overwrite descriptions that have been updated in Tree Schema. It is generally a good idea to set this to True on the initial load to bootstrap your documentation.
update_tags No Whether or not to update tags for the schemas and fields in Tree Schema. This only applies if the corresponding schemas and fields in the manifest file contain tags. Defaults to True.
add_lineage No Whether or not to add the data lineage for the schemas and fields in Tree Schema. This only applies if the manifest file contains lineage for the corresponding schemas and fields. Defaults to True.
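Sending every option explicitly makes the import behaviour visible in your pipeline code. The sketch below mirrors the defaults documented in the table; `dbt_save_body` is a hypothetical helper, and spelling out the defaults is a convenience rather than a requirement of the API.

```python
def dbt_save_body(
    dbt_process_id: str,
    add_schemas_fields: bool = False,
    update_descriptions: bool = False,
    update_tags: bool = True,
    add_lineage: bool = True,
) -> dict:
    """Build the JSON body for POST /dbt/save-results.

    Keyword defaults mirror the documented server-side defaults.
    """
    if not dbt_process_id:
        raise ValueError("dbt_process_id is required")
    return {
        "dbt_process_id": dbt_process_id,
        "add_schemas_fields": add_schemas_fields,
        "update_descriptions": update_descriptions,
        "update_tags": update_tags,
        "add_lineage": add_lineage,
    }


# On an initial load, bootstrap documentation from the manifest descriptions:
initial_load = dbt_save_body("72c83743-cad1-48b4-9916-78bc02083772", update_descriptions=True)
# resp = r.post(BASE_URL + '/dbt/save-results', json=initial_load, headers=headers)
```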

Response

Field Data Type Description
dbt_process_id string The same dbt_process_id provided as part of the request

Response Codes

Value Description
201 Request successfully received

Errors

These are common errors that span across Tree Schema API requests.

Error Code Meaning
400 Bad Request -- Your request is invalid.
401 Unauthorized -- Your email or secret key is invalid.
403 Forbidden -- The requested resource is available to administrators only.
404 Not Found -- The specified resource could not be found.
406 Not Acceptable -- You requested a format that isn't JSON.
429 Too Many Requests -- You've made too many requests!
500 Internal Server Error -- We had a problem with our server. Try again later.
503 Service Unavailable -- We're temporarily offline for maintenance. Please try again later.
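The table marks 429, 500 and 503 as conditions worth retrying later. One common approach is exponential backoff, sketched below; the attempt count and delays are assumptions rather than documented Tree Schema limits, and `with_retries` is a hypothetical helper that works with any requests-style response.

```python
import time
from typing import Any, Callable

# Status codes the error table describes as temporary conditions.
RETRYABLE_CODES = {429, 500, 503}


def with_retries(
    send: Callable[[], Any],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() and retry 429/500/503 responses with exponential backoff.

    send() must return an object with a .status_code attribute, such as a
    requests.Response. The last response is returned even if it failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in RETRYABLE_CODES or attempt == max_attempts - 1:
            return resp
        sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default


# Usage (hypothetical): resp = with_retries(lambda: r.get(url, headers=headers))
```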