Introduction
Welcome to the Tree Schema API!
The Tree Schema API gives you programmatic access to just about every resource within Tree Schema. It is designed to help you keep your data catalog up to date by integrating Tree Schema directly into your ETL jobs, model pipelines and analytical workflows.
We provide examples in Shell and Python (with more on the way!), but all of the interfaces are built around REST, so you can interact with Tree Schema from any language that can make HTTPS requests. You can view code examples in the dark area to the right, and you can switch the programming language of the examples with the tabs in the top right.
All API requests are made to the following host:
https://api.treeschema.com/catalog
Make sure to properly authenticate!
API Overview
Authentication
Use base64 encoding to create the authentication string for your account: concatenate your email, a colon and your secret key, then base64-encode the full string. Add the Authorization key to your headers with the Basic prefix.
SECRET_KEY=your_secret_key
TREE_SCHEMA_EMAIL=your_email
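# -A keeps the base64 output on one line; without it, openssl wraps output longer than 64 characters
ENCODED_SECRET=$(echo -n "$TREE_SCHEMA_EMAIL:$SECRET_KEY" | openssl base64 -A)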
curl -H "Authorization: Basic $ENCODED_SECRET" \
"https://api.treeschema.com/catalog/search?term=dev"
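import base64
import requests as r
# Replace these placeholders with your account email and secret key
your_email = 'your_email'
your_secret_key = 'your_secret_key'
creds = (your_email + ':' + your_secret_key).encode('utf-8')
encoded_creds = base64.b64encode(creds).decode('utf-8')
headers = {
    'Authorization': 'Basic ' + encoded_creds
}
resp = r.get('https://api.treeschema.com/catalog/search?term=dev', headers=headers)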
Authorization uses a combination of the email for your Tree Schema account and your user secret key. Your organization owner must first enable programmatic access for your organization; once that is done, you can access your personal secret key from your user profile.
Tree Schema expects your encoded credentials to be included in every API request, in an Authorization header that looks like the following:
Authorization: Basic your_encoded_secret
You can view detailed instructions on how to generate your API keys in our help and documentation.
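Whatever tool you use, the encoded value must fit on one line: openssl base64, for instance, wraps its output every 64 characters unless given -A, and a wrapped value breaks the header. A minimal standard-library sketch that always produces a single-line header; build_auth_header is a hypothetical helper name, not part of any Tree Schema client:

```python
import base64


def build_auth_header(email: str, secret_key: str) -> dict:
    """Return an Authorization header using the Basic scheme described above.

    base64.b64encode never inserts line breaks, so the value stays on one
    line regardless of how long the email and secret key are.
    """
    raw = f"{email}:{secret_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": "Basic " + token}


if __name__ == "__main__":
    # Placeholder credentials; substitute your own email and secret key.
    print(build_auth_header("you@example.com", "your_secret_key"))
```

The returned dict can be passed directly as headers= in the requests examples throughout this page.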
Pagination
An example of a meta response object with a next page
{
"meta": {
"current_page": 2,
"next_page": 3,
"total_cnt": 123
},
...
}
An example of a meta response object without a next page
{
"meta": {
"current_page": 1,
"next_page": null,
"total_cnt": 5
},
...
}
When you retrieve a list of objects with a GET request, Tree Schema paginates the results.
Each paginated response returns up to 1000 results per page.
Meta Response Object
Field | Data Type | Description |
---|---|---|
current_page | integer | The number for the current page |
next_page | integer | The number of the next page, or null if there is no next page |
total_cnt | integer | The total count of objects returned for the given API |
Meta information is returned for all paginated queries. The meta object contains the current page number, the page number for the next page and the total count of objects.
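Because each response reports next_page, and next_page is null on the last page, a client can keep requesting pages until it sees null. A minimal sketch, assuming a caller-supplied fetch_page(page) function (hypothetical) that performs the GET with ?page=N and returns the parsed JSON body; results_key names the list field, such as data_stores or data_schemas:

```python
from typing import Callable, Iterator


def iter_results(fetch_page: Callable[[int], dict], results_key: str) -> Iterator[dict]:
    """Yield every object across all pages by following meta.next_page."""
    page = 1
    while page is not None:
        body = fetch_page(page)
        # Each page carries its objects under results_key, e.g. "data_stores".
        yield from body.get(results_key, [])
        # next_page is JSON null (Python None) on the last page.
        page = body["meta"]["next_page"]
```

With requests, fetch_page could be, for example, lambda p: r.get(url, params={'page': p}, headers=headers).json().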
Additional Headers
HTTP headers:
{ "Content-Type": "application/json" }
Every POST, PUT and DELETE HTTP request sent to the Tree Schema Public API must set the Content-Type entity header to application/json. The dbt manifest upload, which sends a file, is the exception and uses its own headers.
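A minimal standard-library sketch of a request that satisfies this rule. build_json_request is a hypothetical helper; it only constructs the request, which you would pass to urllib.request.urlopen to actually send it:

```python
import json
import urllib.request

BASE_URL = "https://api.treeschema.com/catalog"


def build_json_request(method: str, path: str, payload: dict,
                       auth_header: dict) -> urllib.request.Request:
    """Build a POST, PUT or DELETE request with a JSON body and Content-Type header."""
    if method not in ("POST", "PUT", "DELETE"):
        raise ValueError("JSON bodies are only sent with POST, PUT or DELETE")
    headers = {"Content-Type": "application/json", **auth_header}
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method=method,
    )
```

With requests, passing json= sets this header for you, which is why the Python examples on this page do not set it explicitly.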
Data Stores
Data stores are containers for your data; they can be databases, file stores, dashboard tools and more. They are where your data physically (or virtually) resides. You can create and retrieve data stores.
Data Store Object
The data store object
{
"data_store_id": 18,
"name": "Kafka Prod Cluster",
"type": "kafka",
"other_type": null,
"created_ts": "2020-09-23 18:16:16",
"updated_ts": "2020-09-23 18:16:16",
"description_markup": "<p>This is the Kafka cluster.</p>",
"description_raw": "This is the Kafka cluster.",
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
},
"details": {
"bootstrap_servers": "1.3.5.7:22"
}
}
The Data Store object is returned when you GET a single or multiple data store(s). It is also returned when you create a data store. An example of the data store object can be seen to the right.
Data Store Object Fields
Field | Data Type | Description |
---|---|---|
data_store_id | integer | The ID used to uniquely represent the data store. The same ID can be found in the Tree Schema GUI, where the URL for the data store contains the data store ID |
name | string | The name of the data store |
type | string | The type of the data store |
other_type | string | The more detailed type, if provided |
created_ts | timestamp | The timestamp that the data store was created |
updated_ts | timestamp | The timestamp that the data store was updated |
description_markup | string | An HTML string that represents the full markup description |
description_raw | string | The data store description that has had all markup removed |
steward | User Object | The data steward assigned to the data store |
tech_poc | User Object | The technical point of contact assigned to the data store |
details | object | An object that can contain any arbitrary key/value pairs for the data store. Details include information such as host and port if the data store is connected to a database, but users can also add arbitrary key/value pairs of information, and these are returned as well. |
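The created_ts and updated_ts values in the examples use the form YYYY-MM-DD HH:MM:SS with no time zone. A minimal sketch that converts them to datetime objects, assuming that format holds for every response; the time zone is not documented here, so the results are naive datetimes, and parse_timestamps is a hypothetical helper:

```python
from datetime import datetime

# Format observed in the example responses on this page (an assumption for others).
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamps(obj: dict) -> dict:
    """Return a copy of a Data Store (or Data Schema) object with parsed timestamps."""
    parsed = dict(obj)
    for key in ("created_ts", "updated_ts"):
        if parsed.get(key):
            parsed[key] = datetime.strptime(parsed[key], TS_FORMAT)
    return parsed
```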
Get All Data Stores
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores'
headers = {'Authorization': 'Basic your_encoded_secret'}
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
"$BASE_URL/data-stores"
Retrieve all data stores in your organization.
Returns the object:
{
"meta": {
"current_page": 1,
"next_page": null,
"total_cnt": 5
},
"data_stores": [
{
"data_store_id": 18,
"name": "Kafka Prod Cluster",
"type": "kafka",
"other_type": null,
"created_ts": "2020-09-23 18:16:16",
"updated_ts": "2020-09-23 18:16:16",
"description_markup": "<p>This is the Kafka cluster.</p>",
"description_raw": "This is the Kafka cluster.",
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
},
"details": {
"bootstrap_servers": "1.3.5.7:22"
}
}
]
}
HTTPs Request
GET /data-stores
Query Parameters
Parameter | Default | Description |
---|---|---|
page | 1 | The page to retrieve when paginating through data stores |
name | null | The name of the data store |
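Both query parameters can be combined on the URL. A minimal sketch using urllib.parse.urlencode, which also escapes spaces in names; data_stores_url is a hypothetical helper, and whether name is an exact or partial match is not specified on this page, so treat that as something to verify:

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.treeschema.com/catalog"


def data_stores_url(page: int = 1, name: Optional[str] = None) -> str:
    """Build the GET /data-stores URL with optional page and name parameters."""
    params = {"page": page}
    if name is not None:
        params["name"] = name
    return f"{BASE_URL}/data-stores?{urlencode(params)}"
```

With requests, passing params={'page': 2, 'name': 'Kafka Prod Cluster'} to r.get produces the same encoding.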
Path Parameters
There are no path parameters for this endpoint.
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
meta | Meta object | A meta object for pagination |
data_stores | list[Data Store Object] | A list of data store objects |
Response Codes
Value | Description |
---|---|
200 | Retrieved all data stores |
Get A Data Store
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1'
headers = {'Authorization': 'Basic your_encoded_secret'}
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
"$BASE_URL/data-stores/1"
Retrieve a specific data store from your organization.
Returns the object:
{
"data_store": {
"data_store_id": 1,
"name": "Oracle DB",
"type": "oracle",
"other_type": "",
"created_ts": "2020-08-15 17:15:24",
"updated_ts": "2020-08-15 17:15:24",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"details": {
"host": "oracle.host",
"port": 1521,
"servicename": "dbschema"
}
}
}
HTTPs Request
GET /data-stores/{data_store_id}
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store to retrieve. |
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
data_store | Data Store Object | A data store object |
Response Codes
Value | Description |
---|---|
200 | Successfully retrieved data store |
404 | The data store ID requested could not be found |
Create A Data Store
To create the data store
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores'
headers = {'Authorization': 'Basic your_encoded_secret'}
new_ds_data = {
'name': "My API Data Store",
'type': 'postgres',
'tech_poc': 2,
'description': 'This data store was created via an API'
}
resp = r.post(url, json=new_ds_data, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"name": "My API Data Store - From Shell", "type": "other", "other_type": "some other value", "tech_poc": 2, "description": "This data store was created via an API"}' \
$BASE_URL/data-stores
Create a new data store. If a data store with the same name already exists, the existing data store is returned instead.
Returns the object:
{
"data_store": {
"data_store_id": 20,
"name": "My API Data Store",
"type": "postgres",
"other_type": null,
"created_ts": "2020-09-29 14:51:14",
"updated_ts": "2020-09-29 14:51:14",
"description_markup": "<p>This data store was created via an API</p>",
"description_raw": "This data store was created via an API",
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
},
"details": {}
}
}
HTTPs Request
POST /data-stores
Query Parameters
There are no query parameters for this endpoint
Path Parameters
There are no path parameters for this endpoint.
Body
Field | Required | Description |
---|---|---|
name | Yes | The name of the data store |
type | Yes | The type of data store, must be one of: dynamodb , kafka , mongodb , mysql , oracle , other , postgres , redis , redshift or s3 |
other_type | No | A more descriptive type of data store that can augment the field type if the value other is chosen |
description | No | The description to give the data store |
tech_poc | No | The ID for the user to assign as the technical point of contact for this data store; if no value is provided, the user executing the API will be used |
steward | No | The ID for the user to assign as the steward for this data store; if no value is provided, the user executing the API will be used |
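Because type must be one of a fixed set of values, a client can catch typos before the request is sent. A minimal sketch that builds the POST /data-stores body; data_store_payload is a hypothetical helper, the allowed types are copied from the table above and may change over time, and the server remains the authority on validation:

```python
from typing import Optional

# Allowed values for "type", from the Body table above.
DATA_STORE_TYPES = {
    "dynamodb", "kafka", "mongodb", "mysql", "oracle",
    "other", "postgres", "redis", "redshift", "s3",
}


def data_store_payload(name: str, type_: str, other_type: Optional[str] = None,
                       description: Optional[str] = None,
                       tech_poc: Optional[int] = None,
                       steward: Optional[int] = None) -> dict:
    """Build a POST /data-stores body, rejecting types the API does not list."""
    if type_ not in DATA_STORE_TYPES:
        raise ValueError(f"unsupported data store type: {type_!r}")
    payload = {"name": name, "type": type_}
    optional = {"other_type": other_type, "description": description,
                "tech_poc": tech_poc, "steward": steward}
    # Omit optional fields that were not given so the server applies its defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, r.post(url, json=data_store_payload("My API Data Store", "postgres", tech_poc=2), headers=headers).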
Response
Field | Data Type | Description |
---|---|---|
data_store | Data Store Object | A data store object |
Response Codes
Value | Description |
---|---|
200 | A data store with the same name already exists |
201 | Data Store Created |
400 | A malformed request was made, descriptions of the error will be provided in the body |
Get Tags for a Data Store
To get existing tags from a data store
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/20/tags'
headers = {'Authorization': 'Basic your_encoded_secret'}
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
"$BASE_URL/data-stores/19/tags"
Get the existing tags for a data store.
Returns the object:
{
"tags": [
"api tag",
"schema tag",
"pii",
"mktg"
]
}
HTTPs Request
GET /data-stores/{data_store_id}/tags
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store to get the tag(s) for |
Body
There is no body object for this endpoint
Response
Field | Data Type | Description |
---|---|---|
tags | List[string] | The list of tags for the data store |
Response Codes
Value | Description |
---|---|
200 | The list of tags was retrieved successfully |
400 | A malformed request was made, descriptions of the error will be provided in the body |
Tag A Data Store
To tag a data store
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/20/tags'
headers = {'Authorization': 'Basic your_encoded_secret'}
tags = {'tags': ['api tag', 'schema tag', 'pii', 'mktg']}
resp = r.post(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "schema tag", "pii", "mktg"]}' \
"$BASE_URL/data-stores/19/tags"
Add one or more tags to a data store.
Returns the object:
{
"tags": [
"api tag",
"schema tag",
"pii",
"mktg"
],
"tag_statuses": [
"added",
"added",
"added",
"added"
]
}
HTTPs Request
POST /data-stores/{data_store_id}/tags
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store to add the tag(s) to |
Body
Field | Required | Description |
---|---|---|
tags | Yes | A list of string values to add as tags; each tag can be up to 32 characters |
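Since each tag may be at most 32 characters, a client can check tags before sending them. A minimal sketch; tags_body is a hypothetical helper, and whether the server counts the limit in characters or bytes, or trims or case-folds tags, is not documented here, so the tags are passed through unchanged:

```python
from typing import List

MAX_TAG_LENGTH = 32  # limit stated in the Body table


def tags_body(tags: List[str]) -> dict:
    """Build the {"tags": [...]} request body, rejecting over-long tags."""
    too_long = [t for t in tags if len(t) > MAX_TAG_LENGTH]
    if too_long:
        raise ValueError(f"tags longer than {MAX_TAG_LENGTH} characters: {too_long}")
    return {"tags": list(tags)}
```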
Response
Field | Data Type | Description |
---|---|---|
tags | List[string] | The list of tags that were processed |
tag_statuses | List[string] | The status for each tag processed, statuses match the same index position as their corresponding tag. Values include added and exists . |
Response Codes
Value | Description |
---|---|
200 | All of the tags requested already existed for the data store |
201 | At least one of the tags requested was added |
400 | A malformed request was made, descriptions of the error will be provided in the body |
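Because tag_statuses lines up index by index with tags, the two lists can be zipped to see which tags were new. A minimal sketch that groups tags by status; tag_results is a hypothetical helper:

```python
from typing import Dict, List


def tag_results(response: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group the tags from a tagging response by status ("added" or "exists")."""
    grouped: Dict[str, List[str]] = {}
    # tag_statuses[i] is the status of tags[i], so zip pairs them positionally.
    for tag, status in zip(response["tags"], response["tag_statuses"]):
        grouped.setdefault(status, []).append(tag)
    return grouped
```

A 201 response with grouped.get("added") empty would be unexpected, since 201 signals that at least one tag was added.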
Remove Tags from a Data Store
To remove one or more tags from a data store
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/tags'
headers = {'Authorization': 'Basic your_encoded_secret'}
tags = {'tags': ['api tag', 'mktg']}
resp = r.delete(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "mktg"]}' \
$BASE_URL/data-stores/1/tags
Remove one or more tags from a data store.
Returns the object:
{
"removed_tags": [
"api tag",
"mktg"
]
}
HTTPs Request
DELETE /data-stores/{data_store_id}/tags
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store to remove the tag(s) from |
Body
Field | Required | Description |
---|---|---|
tags | Yes | A list of the tags to remove from the data store |
Response
Field | Data Type | Description |
---|---|---|
removed_tags | List[string] | The list of tags that were removed |
Response Codes
Value | Description |
---|---|
200 | The tags were removed successfully |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store could not be found, descriptions of the error will be provided in the body |
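The response lists only the tags that were actually removed. A minimal sketch that reports which requested tags are missing from removed_tags; not_removed is a hypothetical helper, and reading a missing tag as "was not attached to the data store" is an assumption, since this page does not say why a tag would be absent:

```python
from typing import List


def not_removed(requested: List[str], response: dict) -> List[str]:
    """Return requested tags that do not appear in the response's removed_tags."""
    removed = set(response.get("removed_tags", []))
    return [tag for tag in requested if tag not in removed]
```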
Upload a dbt Manifest File
To upload a manifest file
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/46/dbt/parse-manifest'
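file_loc = "./sample_files/manifest.json"
# Note the additional 'Accept' header for the file upload!
# encoded_creds comes from the authentication example above
headers = {
    'Accept': 'application/octet-stream',
    'Authorization': 'Basic ' + encoded_creds
}
# The context manager closes the file after the upload; requests builds
# the multipart/form-data body (including its boundary) from files=
with open(file_loc, 'rb') as manifest:
    files = {'manifest_file': manifest}
    resp = r.post(url, files=files, headers=headers)
resp.json()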
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Accept: application/octet-stream" \
-H "Content-Type: multipart/form-data" \
-F 'manifest_file=@./target/manifest.json' \
"$BASE_URL/data-stores/46/dbt/parse-manifest"
You may upload a dbt manifest file to a data store so that Tree Schema can automatically extract the schemas, fields, descriptions, tags and lineage from your dbt output. Saving the results in Tree Schema takes two required steps: this step parses the dbt file, and the second step saves the results. Optionally, you can view the parsed results before saving them.
Learn more about the dbt processing in the Tree Schema help documentation.
Returns the object:
{
"dbt_process_id": "b4000641-eed7-4345-9ba0-7701f77ce568"
}
HTTPs Request
POST /data-stores/{data_store_id}/dbt/parse-manifest
Headers
This endpoint requires different header parameters than the other APIs since it uploads a file.
Header | Description |
---|---|
Authorization | The standard authorization header, as defined above |
Accept | Set to application/octet-stream |
Content-Type | Set to multipart/form-data |
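A multipart/form-data body is split into parts by a boundary string, and the Content-Type header has to carry that boundary (multipart/form-data; boundary=...) for the server to parse the upload; requests adds it automatically when you pass files=. A minimal standard-library sketch of the encoding, only to show what is sent; encode_multipart is a hypothetical helper, and the application/json type for the file part is an assumption:

```python
import uuid
from typing import Tuple


def encode_multipart(field: str, filename: str, content: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    # A random boundary makes a collision with the file contents very unlikely.
    boundary = uuid.uuid4().hex
    parts = [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"'.encode(),
        b"Content-Type: application/json",
        b"",  # blank line separates the part headers from the file bytes
        content,
        f"--{boundary}--".encode(),  # closing delimiter
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"
```

For this endpoint, field would be manifest_file, matching the Body table.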
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that the dbt manifest belongs to. dbt processes run within the context of a database and this should correspond to the same database that you have already defined in Tree Schema. |
Body
Field | Required | Description |
---|---|---|
manifest_file | Yes | A file object of the dbt manifest.json |
Response
Field | Data Type | Description |
---|---|---|
dbt_process_id | string | A unique ID for the process that will parse the dbt file. This will be used in subsequent calls to retrieve the status and to persist the results from the parsed file. |
Response Codes
Value | Description |
---|---|
201 | The request created a new process to parse the file |
400 | A malformed request was made, descriptions of the error will be provided in the body |
Data Schemas
Schemas are the heart and soul of a data catalog. They describe the shape, structure and format of the data. Data schemas are often represented as a table, a JSON or Parquet file, or a CSV, but a data schema is really just a reference to a structured set of fields.
All schemas reside within a data store; therefore, to interact with a data schema you must know the data store it belongs to.
Data Schema Object
The data schema object
{
"data_schema_id": 16,
"name": "My API Schema",
"type": "table",
"schema_loc": null,
"created_ts": "2020-09-23 14:56:02",
"updated_ts": "2020-09-23 14:56:02",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
The Data Schema object is returned when you GET a single or multiple data schema(s) from a data store. It is also returned when you create a new data schema. An example of the data schema object can be seen to the right.
Data Schema Object Fields
Field | Data Type | Description |
---|---|---|
data_schema_id | integer | The ID used to uniquely represent the data schema, the same ID can be found in the Tree Schema GUI, the URL for the data schema will contain the data schema ID |
name | string | The name of the data schema |
type | string | The type of the data schema |
schema_loc | string | The location where the schema resides. This is used primarily for object data stores, such as s3, where the schema location is the path to the directory where the schema exists. For most schemas, the schema_loc will be the same as the name. |
created_ts | timestamp | The timestamp that the data schema was created |
updated_ts | timestamp | The timestamp that the data schema was updated |
description_markup | string | An HTML string that represents the full markup description |
description_raw | string | The data schema description with all markup removed |
steward | User Object | The data steward assigned to the data schema |
tech_poc | User Object | The technical point of contact assigned to the data schema |
Get All Schemas from Data Store
To get all schemas for a data store
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas'
headers = {'Authorization': 'Basic your_encoded_secret'}
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
"$BASE_URL/data-stores/1/schemas"
List all schemas for a data store.
Returns the object:
{
"meta": {
"current_page": 1,
"next_page": null,
"total_cnt": 4
},
"data_schemas": [
{
"data_schema_id": 16,
"name": "public.session_info",
"type": "table",
"schema_loc": "public.session_info",
"created_ts": "2020-09-23 14:56:02",
"updated_ts": "2020-09-23 14:56:02",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
},
{
"data_schema_id": 7,
"name": "public.device_info",
"type": "table",
"schema_loc": "public.device_info",
"created_ts": "2020-08-15 22:10:17",
"updated_ts": "2020-08-15 22:10:17",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
]
}
HTTPs Request
GET /data-stores/{data_store_id}/schemas
Query Parameters
Parameter | Default | Description |
---|---|---|
page | 1 | The page to retrieve when paginating through data schemas |
name | null | The name of the data schema |
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that you are listing schemas for |
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
meta | Meta object | A meta object for pagination |
data_schemas | list[Data Schema Object] | A list of data schema objects |
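Schema names are treated as case insensitive when a schema is created, so a case-insensitive lookup over the listed schemas mirrors that rule on the client side. A minimal sketch, assuming you have already collected data_schemas from every page; find_schema_id is a hypothetical helper, and for large data stores the name query parameter may be the better option:

```python
from typing import Dict, Iterable, Optional


def index_by_name(schemas: Iterable[dict]) -> Dict[str, int]:
    """Map lower-cased schema names to their data_schema_id."""
    return {s["name"].lower(): s["data_schema_id"] for s in schemas}


def find_schema_id(schemas: Iterable[dict], name: str) -> Optional[int]:
    """Return the ID of the schema whose name matches case-insensitively, if any."""
    return index_by_name(schemas).get(name.lower())
```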
Response Codes
Value | Description |
---|---|
200 | Retrieved all data schemas for the data store |
404 | The data store ID requested could not be found |
Get a Schema
To get a single schema from a data store
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1'
headers = {'Authorization': 'Basic your_encoded_secret'}
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
"$BASE_URL/data-stores/1/schemas/1"
Get a single schema from a data store.
Returns the object:
{
"data_schema": {
"data_schema_id": 1,
"name": "public.session_info",
"type": "table",
"schema_loc": "public.session_info",
"created_ts": "2020-08-15 17:16:10",
"updated_ts": "2020-08-15 17:16:10",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 1,
"name": "Asher",
"email": "asher@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
}
HTTPs Request
GET /data-stores/{data_store_id}/schemas/{data_schema_id}
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema |
data_schema_id | The ID for the data schema that exists within the data store |
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
data_schema | Data Schema Object | A data schema object |
Response Codes
Value | Description |
---|---|
200 | Retrieved the data schema from the data store |
404 | The data store ID requested could not be found or the schema requested does not exist within the data store |
Create a Schema
To create a schema in a data store
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas'
headers = {'Authorization': 'Basic your_encoded_secret'}
new_schema = {
'name': "My API Schema",
'type': 'table',
'description': 'This schema is created via API'
}
resp = r.post(url, json=new_schema, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"name": "My API Schema - Shell", "type": "table", "description": "This schema is created via API"}' \
$BASE_URL/data-stores/1/schemas
Create a data schema. Since a schema must reside within a data store, the data store that will contain the schema must be specified in the URL path. If a schema with the same name (case insensitive) already exists within the data store, the existing schema is returned and no updates are made.
Returns the object:
{
"data_schema": {
"data_schema_id": 501,
"name": "My New API Schema",
"type": "table",
"schema_loc": "My New API Schema",
"created_ts": "2020-09-29 16:07:16",
"updated_ts": "2020-09-29 16:07:16",
"description_markup": "<p>This schema is created via API</p>",
"description_raw": "This schema is created via API",
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
}
HTTPs Request
POST /data-stores/{data_store_id}/schemas
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that will contain the schema being created |
Body
Field | Required | Description |
---|---|---|
name | Yes | The name of the data schema |
type | Yes | The type of data schema, must be one of: avro , csv , csv_other , json , parquet , other , table , view or tsv |
description | No | The description to give the schema |
schema_loc | No | The location where the schema resides. This is used primarily for object data stores, such as s3, where the schema location is the path to the directory where the schema exists. For most schemas, the schema_loc will be the same as the name. If a schema_loc is not provided, it is set to the value provided for name |
tech_poc | No | The ID for the user to assign as the technical point of contact for this data schema; if no value is provided, the user executing the API will be used |
steward | No | The ID for the user to assign as the steward for this data schema, if no value is provided the user executing the API will be used |
Response
Field | Data Type | Description |
---|---|---|
data_schema | Data Schema Object | A data schema object |
Response Codes
Value | Description |
---|---|
409 | A data schema with the same name already exists and was returned instead of creating a new object |
201 | Data Schema Created |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store ID requested could not be found |
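Unlike data store creation, which signals an existing store with 200, schema creation returns an existing schema with 409. A client that treats every 4xx as a failure (for example by calling requests' raise_for_status()) would discard a perfectly usable schema. A minimal sketch of handling the two success cases; create_schema_result is a hypothetical helper, and it assumes the 409 body uses the same data_schema key as the 201 body:

```python
from typing import Tuple


def create_schema_result(status_code: int, body: dict) -> Tuple[dict, bool]:
    """Return (data_schema, created) for a POST .../schemas response.

    201 means a new schema was created; 409 means a schema with the same
    name already existed and was returned unchanged.
    """
    if status_code == 201:
        return body["data_schema"], True
    if status_code == 409:
        return body["data_schema"], False
    raise RuntimeError(f"schema creation failed with HTTP {status_code}: {body}")
```

For example, schema, created = create_schema_result(resp.status_code, resp.json()).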
Update A Schema
To update a single schema in a data store
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1'
headers = {'Authorization': 'Basic your_encoded_secret'}
updates = {
    'description': "A new description",
    'type': 'parquet',
    'tech_poc': 1,
    'steward': 2
}
resp = r.post(url, json=updates, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"description": "New Schema description", "type": "view", "tech_poc": "1", "steward": 2}' \
$BASE_URL/data-stores/1/schemas/1
Update a single schema in a data store. You can update the description, schema type, tech POC and steward.
Returns the updated object:
{
"data_schema": {
"data_schema_id": 1,
"name": "DS1",
"type": "view",
"schema_loc": "DS1",
"created_ts": "2021-01-29 14:39:10",
"updated_ts": "2021-02-01 12:37:43",
"description_markup": "<p>New Schema description</p>",
"description_raw": "New Schema description",
"data_store_id": 1,
"steward": {
"user_id": 2,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Asher",
"email": "asher@treeschema.com"
}
}
}
HTTPs Request
POST /data-stores/{data_store_id}/schemas/{data_schema_id}
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema in the path |
data_schema_id | The ID for the data schema to update |
Body
One or more of the following fields must be provided.
Field | Required | Description |
---|---|---|
type | No | The type of the schema; valid values are avro , csv , csv_other , json , parquet , other , table , view or tsv |
description | No | The new description for the schema; this will override any existing description |
tech_poc | No | The ID for the user to assign as the technical point of contact for this data schema; if no value is provided, the user executing the API will be used |
steward | No | The ID for the user to assign as the steward for this data schema; if no value is provided, the user executing the API will be used |
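Since at least one field is required and only four fields are updatable, a client can reject empty or unsupported updates up front. A minimal sketch; schema_update_body is a hypothetical helper, the field and type lists are copied from the table above, and name is not in that list, so renaming is not covered:

```python
# Fields listed as updatable above, and the allowed values for "type".
UPDATABLE_FIELDS = {"type", "description", "tech_poc", "steward"}
SCHEMA_TYPES = {"avro", "csv", "csv_other", "json", "parquet",
                "other", "table", "view", "tsv"}


def schema_update_body(**changes) -> dict:
    """Build a POST body for updating a schema; at least one field is required."""
    unknown = set(changes) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"fields cannot be updated: {sorted(unknown)}")
    if not changes:
        raise ValueError("provide at least one of: " + ", ".join(sorted(UPDATABLE_FIELDS)))
    if "type" in changes and changes["type"] not in SCHEMA_TYPES:
        raise ValueError(f"unsupported schema type: {changes['type']!r}")
    return dict(changes)
```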
Response
Field | Data Type | Description |
---|---|---|
data_schema | Data Schema Object | A data schema object |
Response Codes
Value | Description |
---|---|
200 | The data schema was updated successfully |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store ID requested could not be found or the data schema does not exist within the data store |
Delete Schemas
To delete multiple schemas from a data store
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas'
headers = {'Authorization': 'Basic your_encoded_secret'}
delete_schemas = {'schema_ids': [501, 502]}
resp = r.delete(url, json=delete_schemas, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"schema_ids": [8, 9]}' \
$BASE_URL/data-stores/1/schemas
There is no response in the body for this request
Deprecates (deletes) data schemas that exist within a data store. The schema IDs must belong to the data store specified in the path parameters. If some of the schema IDs provided do not exist within the data store, only those that do exist within it are deleted.
HTTPs Request
DELETE /data-stores/{data_store_id}/schemas
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schemas being deleted |
Body
Field | Required | Description |
---|---|---|
schema_ids | Yes | A list of integer IDs corresponding to the schemas to be deleted |
Response
There is no response body for this endpoint.
Response Codes
Value | Description |
---|---|
200 | The schemas provided were deleted from the data store |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store ID requested could not be found |
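Because IDs outside the data store are skipped silently and the response has no body, it can help to check the IDs against the store's schemas before deleting. A minimal sketch, assuming store_schema_ids has been collected from GET /data-stores/{data_store_id}/schemas; partition_schema_ids is a hypothetical helper:

```python
from typing import Iterable, List, Tuple


def partition_schema_ids(requested: Iterable[int],
                         store_schema_ids: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Split requested IDs into (will_delete, ignored) for one data store.

    IDs that are not in store_schema_ids would be skipped without error,
    so listing them up front makes that visible to the caller.
    """
    in_store = set(store_schema_ids)
    requested = list(requested)
    will_delete = [i for i in requested if i in in_store]
    ignored = [i for i in requested if i not in in_store]
    return will_delete, ignored
```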
Get Tags for a Data Schema
To get existing tags from a data schema
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/20/schemas/20/tags'
headers = {'Authorization': 'Basic your_encoded_secret'}
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
"$BASE_URL/data-stores/19/schemas/19/tags"
Get the existing tags for a data schema.
Returns the object:
{
"tags": [
"api tag",
"schema tag",
"pii",
"mktg"
]
}
HTTPs Request
GET /data-stores/{data_store_id}/schemas/{data_schema_id}/tags
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema |
data_schema_id | The ID for the data schema to get the tag(s) for |
Body
There is no body object for this endpoint
Response
Field | Data Type | Description |
---|---|---|
tags | List[string] | The list of tags for the data schema |
Response Codes
Value | Description |
---|---|
200 | The list of tags was retrieved successfully |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store or schema could not be found, descriptions of the error will be provided in the body |
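The tag endpoints for data stores and data schemas share one URL pattern, with the schema segment nested under its data store, and both use GET, POST and DELETE. A minimal sketch that builds either URL; tags_url is a hypothetical helper:

```python
from typing import Optional

BASE_URL = "https://api.treeschema.com/catalog"


def tags_url(data_store_id: int, data_schema_id: Optional[int] = None) -> str:
    """Return the tags endpoint for a data store, or for a schema within it."""
    path = f"/data-stores/{data_store_id}"
    if data_schema_id is not None:
        path += f"/schemas/{data_schema_id}"
    return BASE_URL + path + "/tags"
```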
Tag A Schema
To tag a data schema
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/tags'
headers = {'Authorization': 'Basic your_encoded_secret'}
tags = {'tags': ['api tag', 'schema tag', 'pii', 'mktg']}
resp = r.post(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "schema tag", "pii", "mktg"]}' \
$BASE_URL/data-stores/1/schemas/1/tags
Add one or more tags to a data schema.
Returns the object:
{
"tags": [
"api tag",
"schema tag",
"pii",
"mktg"
],
"tag_statuses": [
"added",
"added",
"added",
"added"
]
}
HTTPs Request
POST /data-stores/{data_store_id}/schemas/{data_schema_id}/tags
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema to add tags to |
data_schema_id | The ID for the data schema to have the tags added to |
Body
Field | Required | Description |
---|---|---|
tags | Yes | A list of strings to add as tags; each tag can be up to 32 characters |
Response
Field | Data Type | Description |
---|---|---|
tags | List[string] | The list of tags that were processed |
tag_statuses | List[string] | The status for each tag processed, statuses match the same index position as their corresponding tag |
Response Codes
Value | Description |
---|---|
200 | All of the requested tags already existed for the data schema |
201 | At least one of the tags requested was added |
400 | A malformed request was made, descriptions of the error will be provided in the body |
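Each tag can be up to 32 characters, so a client may want to check tags before sending them. A minimal sketch, assuming you want to fail fast locally; validate_tags is a hypothetical helper, and rejecting an empty list is our own assumption rather than a documented server rule.

```python
MAX_TAG_LENGTH = 32  # per-tag limit stated in the Body table


def validate_tags(tags):
    """Hypothetical helper: return a request body for the tag endpoints, or raise ValueError."""
    # Assumption: an empty list is treated as a client error
    if not isinstance(tags, list) or not tags:
        raise ValueError("tags must be a non-empty list of strings")
    for tag in tags:
        if not isinstance(tag, str):
            raise ValueError(f"tag {tag!r} is not a string")
        if len(tag) > MAX_TAG_LENGTH:
            raise ValueError(f"tag {tag!r} exceeds {MAX_TAG_LENGTH} characters")
    return {"tags": tags}


# Usage: resp = r.post(url, json=validate_tags(['api tag', 'pii']), headers=headers)
```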
Remove Tags from a Schema
To remove one or more tags from a schema
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/2/tags'
tags = {'tags': ['api tag', 'mktg']}
resp = r.delete(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "mktg"]}' \
$BASE_URL/data-stores/1/schemas/2/tags
Remove one or more tags from a schema.
Returns the object:
{
"removed_tags": [
"api tag",
"mktg"
]
}
HTTPs Request
DELETE /data-stores/{data_store_id}/schemas/{data_schema_id}/tags
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema |
data_schema_id | The ID for the data schema to remove the tags from |
Body
Field | Required | Description |
---|---|---|
tags | Yes | A list of the tags to remove from the schema |
Response
Field | Data Type | Description |
---|---|---|
removed_tags | List[string] | The list of tags that were removed |
Response Codes
Value | Description |
---|---|
200 | The tags were removed successfully |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store or schema could not be found, descriptions of the error will be provided in the body |
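The response lists only the tags that were actually removed. A sketch for spotting requested tags that are missing from removed_tags, assuming (this is not documented here) that a tag which was never attached to the schema is simply absent from the response; tags_not_removed is a hypothetical helper.

```python
def tags_not_removed(requested, response_body):
    """Hypothetical helper: list requested tags that the API did not report in removed_tags."""
    removed = set(response_body.get("removed_tags", []))
    # Preserve the caller's order so the result is easy to log or retry
    return [tag for tag in requested if tag not in removed]


example = {"removed_tags": ["api tag", "mktg"]}
print(tags_not_removed(["api tag", "mktg", "legacy"], example))  # ['legacy']
```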
Data Fields
Data Fields are the most granular part of your catalog; they describe the format and data type of your underlying data. Whether your fields are represented as columns in a table, keys in a JSON file, or structs in a distributed Parquet data set, you can capture their meaning and definition with Tree Schema Fields.
All fields reside within a data schema; therefore, to interact with a data field you must know the data schema and data store that it belongs to.
Data Fields Object
The data fields object
{
"field_id": 1,
"name": "FIRST_NAME",
"parent_path": null,
"full_path_name": "FIRST_NAME",
"type": "scalar",
"data_type": "string",
"data_format": "VARCHAR2",
"nullable": false,
"created_ts": "2020-08-15 17:16:11",
"updated_ts": "2020-08-15 17:16:11",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
}
}
The Data Fields object is returned when you GET one or more data fields from a data schema. It is also returned when you create a new data field. An example of the data field object is shown above.
Data Field Object Fields
Field | Data Type | Description |
---|---|---|
field_id | integer | The ID used to uniquely represent the data field |
name | string | The name of the field; for example, the column name if the field is from a table or CSV, or a struct name if the field is from a Parquet file |
parent_path | string | The dot-notation path to the parent of this field, only provided for fields that are nested within other fields; e.g. in {"parent_field": {"child_field": 1}} the parent path of child_field is parent_field |
full_path_name | string | The parent path and the name joined with a dot, e.g. parent_field.child_field. If the parent path is null then this value is the same as the name |
type | string | Valid values are scalar, object and list |
data_type | string | A JSON compatible data type; values include array, boolean, bytes, null, number, object and string |
data_format | string | A free-form field that describes the format of the data, such as varchar(32), YYYY-MM-DD or float(16) |
nullable | boolean | Whether or not the field can be null |
created_ts | timestamp | The timestamp that the field was created in Tree Schema |
updated_ts | timestamp | The timestamp that the field was last updated in Tree Schema |
description_markup | string | An HTML string that represents the full markup description |
description_raw | string | The field description with all markup removed |
steward | User Object | The data steward assigned to the field |
tech_poc | User Object | The technical point of contact assigned to the field |
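The parent_path, name and full_path_name fields are related by dot notation, as the Create A Field example shows (my_field.sub_field is returned with the name sub_field and the parent path my_field). A sketch of that relationship, assuming individual field names never contain a dot themselves; both functions are hypothetical helpers.

```python
from typing import Optional, Tuple


def split_full_path(full_path_name: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: split a dot-notation path into (parent_path, name)."""
    parent, sep, name = full_path_name.rpartition(".")
    # rpartition returns ('', '', s) when there is no dot, i.e. a top-level field
    return (parent if sep else None), name


def join_full_path(parent_path: Optional[str], name: str) -> str:
    """Hypothetical helper: rebuild full_path_name; a null parent path yields the bare name."""
    return f"{parent_path}.{name}" if parent_path else name


print(split_full_path("FIRST_NAME"))          # (None, 'FIRST_NAME')
print(split_full_path("my_field.sub_field"))  # ('my_field', 'sub_field')
```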
Get All Fields from Schema
To get all fields for a data schema
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/data-stores/1/schemas/1/fields
List all fields for a data schema.
Returns the object:
{
"meta": {
"current_page": 1,
"next_page": null,
"total_cnt": 3
},
"data_fields": [
{
"field_id": 1,
"name": "FIRST_NAME",
"parent_path": null,
"full_path_name": "FIRST_NAME",
"type": "scalar",
"data_type": "string",
"data_format": "VARCHAR2",
"nullable": false,
"created_ts": "2020-08-15 17:16:11",
"updated_ts": "2020-08-15 17:16:11",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
},
{
"field_id": 2,
"name": "LAST_NAME",
"parent_path": null,
"full_path_name": "LAST_NAME",
"type": "scalar",
"data_type": "string",
"data_format": "VARCHAR2",
"nullable": false,
"created_ts": "2020-08-15 17:16:11",
"updated_ts": "2020-08-15 17:16:11",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
]
}
HTTPs Request
GET /data-stores/{data_store_id}/schemas/{data_schema_id}/fields
Query Parameters
Parameter | Default | Description |
---|---|---|
page | 1 | The page to retrieve when paginating through data fields |
name | null | The name of the field to retrieve |
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema in the path |
data_schema_id | The ID for the data schema that contains the fields being requested |
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
meta | Meta object | A meta object for pagination |
data_fields | list[Data Field Object] | A list of data field objects |
Response Codes
Value | Description |
---|---|
200 | Retrieved all data fields for the data schema |
404 | The data store ID requested could not be found or the data schema does not exist within the data store |
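The meta object supports paging through schemas with many fields. A sketch of a pagination loop, assuming next_page holds the next page number and is null on the last page (only the null case appears in the example above). The fetch function is passed in so the loop can run without network access; iter_paginated is a hypothetical helper.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_paginated(fetch_page: Callable[[int], Dict], key: str) -> Iterator[Dict]:
    """Hypothetical helper: yield every item under `key`, following meta.next_page."""
    page: Optional[int] = 1
    while page is not None:
        body = fetch_page(page)
        yield from body.get(key, [])
        # Assumption: next_page is an int while more pages exist and null (None) on the last page
        page = body["meta"]["next_page"]
```

With requests, fetch_page could be lambda page: r.get(url, params={'page': page}, headers=headers).json(), called with the key 'data_fields'.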
Get A Field
To get a single field from a data schema
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields/1'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/data-stores/1/schemas/1/fields/1
Get a single field from a schema.
Returns the object:
{
"data_field": {
"field_id": 1,
"name": "FIRST_NAME",
"parent_path": null,
"full_path_name": "FIRST_NAME",
"type": "scalar",
"data_type": "string",
"data_format": "VARCHAR2",
"nullable": false,
"created_ts": "2020-08-15 17:16:11",
"updated_ts": "2020-08-15 17:16:11",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
}
HTTPs Request
GET /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema in the path |
data_schema_id | The ID for the data schema that contains the field being requested |
data_field_id | The ID for the data field being requested |
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
data_field | Data Field Object | A data field object |
Response Codes
Value | Description |
---|---|
200 | Retrieved the data field for the schema |
404 | The data store ID requested could not be found or the data schema does not exist within the data store or the data field does not exist within the schema |
Create A Field
To create a single field in a data schema
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields'
new_field = {
'name': "my_field.sub_field",
'type': 'scalar',
'data_type': 'number',
'data_format': 'integer(16)'
}
resp = r.put(url, json=new_field, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X PUT -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"name": "my_field.sub_field.from_shell", "type": "scalar", "data_type": "number", "data_format": "integer(16)"}' \
$BASE_URL/data-stores/1/schemas/1/fields
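Create a single field in a schema. If a field with the same name (case insensitive) already exists within the schema, the existing field is returned and no updates are made. A dot-notation name sets the parent path: in the example, my_field.sub_field is stored with the name sub_field and the parent path my_field.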
Returns the object:
{
"data_field": {
"field_id": 5453,
"name": "sub_field",
"parent_path": "my_field",
"full_path_name": "my_field.sub_field",
"type": "scalar",
"data_type": "number",
"data_format": "integer(16)",
"nullable": true,
"created_ts": "2020-09-29 18:07:09",
"updated_ts": "2020-09-29 18:07:09",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
}
HTTPs Request
PUT /data-stores/{data_store_id}/schemas/{data_schema_id}/fields
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema in the path |
data_schema_id | The ID for the data schema that will contain the new field |
Body
Field | Required | Description |
---|---|---|
name | Yes | The name of the field |
type | Yes | The type of the field; valid values are scalar, list and object |
data_type | Yes | The data type for the field, expressed as a JSON compatible data type; must be one of array, boolean, bytes, null, number, object or string |
data_format | Yes | A free-form field that describes the format of the data, such as varchar(32), YYYY-MM-DD or float(16) |
nullable | No | Whether or not the field can be null; defaults to true |
tech_poc | No | The ID of the user to assign as the technical point of contact for this data field; if no value is provided, the user calling the API is assigned |
steward | No | The ID of the user to assign as the steward for this data field; if no value is provided, the user calling the API is assigned |
Response
Field | Data Type | Description |
---|---|---|
data_field | Data Field Object | A data field object |
Response Codes
Value | Description |
---|---|
200 | A data field with the same name already exists in the schema and was returned |
201 | The data field was created |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store ID requested could not be found or the data schema does not exist within the data store |
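Because type and data_type accept only the values listed in the Body table, a client can reject a bad payload before calling the API. A sketch under that assumption; new_field_payload is a hypothetical helper, and leaving unset optional keys out of the body is our choice so that the server-side defaults apply.

```python
VALID_TYPES = {"scalar", "list", "object"}
VALID_DATA_TYPES = {"array", "boolean", "bytes", "null", "number", "object", "string"}


def new_field_payload(name, type_, data_type, data_format,
                      nullable=None, tech_poc=None, steward=None):
    """Hypothetical helper: build a body for PUT .../fields after checking the documented values."""
    if type_ not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(f"data_type must be one of {sorted(VALID_DATA_TYPES)}")
    payload = {"name": name, "type": type_,
               "data_type": data_type, "data_format": data_format}
    # Optional keys are sent only when set, so the documented defaults apply otherwise
    optional = {"nullable": nullable, "tech_poc": tech_poc, "steward": steward}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


# Usage: resp = r.put(url, json=new_field_payload('my_field.sub_field', 'scalar', 'number', 'integer(16)'), headers=headers)
```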
Update A Field
To update a single field in a data schema
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields/1'
updates = {
'description': "A new description",
'type': 'list',
'data_type': 'array',
'data_format': 'YYYY-MM-DD',
'nullable': False,
'tech_poc': 1,
'steward': 2
}
resp = r.post(url, json=updates, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"description": "A new description", "type": "list", "data_type": "array", "data_format": "YYYY-MM-DD", "nullable": false, "tech_poc": 1, "steward": 2}' \
$BASE_URL/data-stores/1/schemas/1/fields/1
Update a single field in a schema. You can update any value for a field except for the name.
Returns the updated object:
{
"data_field": {
"field_id": 5453,
"name": "sub_field",
"parent_path": "my_field",
"full_path_name": "my_field.sub_field",
"type": "scalar",
"data_type": "number",
"data_format": "integer(16)",
"nullable": true,
"created_ts": "2020-09-29 18:07:09",
"updated_ts": "2020-09-29 18:07:09",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
}
HTTPs Request
POST /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema in the path |
data_schema_id | The ID for the data schema that contains the field being updated |
data_field_id | The ID for the data field being updated |
Body
Field | Required | Description |
---|---|---|
type | No | The type of the field; valid values are scalar, list and object |
data_type | No | The data type for the field, expressed as a JSON compatible data type; must be one of array, boolean, bytes, null, number, object or string |
data_format | No | A free-form field that describes the format of the data, such as varchar(32), YYYY-MM-DD or float(16) |
description | No | The new description for the field; this overrides any existing description |
nullable | No | Whether or not the field can be null; defaults to true |
tech_poc | No | The ID of the user to assign as the technical point of contact for this data field; if no value is provided, the user calling the API is assigned |
steward | No | The ID of the user to assign as the steward for this data field; if no value is provided, the user calling the API is assigned |
Response
Field | Data Type | Description |
---|---|---|
data_field | Data Field Object | A data field object |
Response Codes
Value | Description |
---|---|
200 | The data field was updated successfully |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store ID requested could not be found or the data schema does not exist within the data store or the data field does not exist within the schema |
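Every body field on this endpoint is optional, and the name cannot be changed. A hypothetical helper that builds an update body from keyword arguments, sending only the attributes you set and refusing name; rejecting unknown keys is our own safeguard, not documented API behavior.

```python
UPDATABLE_KEYS = {"type", "data_type", "data_format", "description",
                  "nullable", "tech_poc", "steward"}


def field_update_payload(**changes):
    """Hypothetical helper: build a body for POST .../fields/{data_field_id}."""
    if "name" in changes:
        # The name is the one attribute this endpoint cannot change
        raise ValueError("a field's name cannot be updated")
    unknown = set(changes) - UPDATABLE_KEYS
    if unknown:
        raise ValueError(f"unsupported keys: {sorted(unknown)}")
    # Drop unset values so only the attributes you pass are sent
    return {k: v for k, v in changes.items() if v is not None}


# Usage: resp = r.post(url, json=field_update_payload(description='A new description', steward=2), headers=headers)
```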
Delete Multiple Fields
To delete fields from a schema
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields'
delete_fields = {'field_ids': [5452, 5454]}
resp = r.delete(url, json=delete_fields, headers=headers)
resp.status_code  # this endpoint returns no body, so check the status code instead of calling resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"field_ids": [5453]}' \
$BASE_URL/data-stores/1/schemas/504/fields
There is no response in the body for this request
Deprecates data fields that exist within a data schema. To be deprecated, a field ID must belong to the data schema specified in the path parameters. If multiple field IDs are provided and only some of them exist within the data schema, only those that exist within the data schema are deleted; the rest are ignored.
HTTPs Request
DELETE /data-stores/{data_store_id}/schemas/{data_schema_id}/fields
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the field(s) being deleted |
data_schema_id | The ID for the data schema that contains the field(s) being deleted |
Body
Field | Required | Description |
---|---|---|
field_ids | Yes | A list of integer IDs corresponding to the fields to be deleted |
Response
There is no response body for this endpoint.
Response Codes
Value | Description |
---|---|
200 | The fields provided were deleted from the data schema |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store ID requested could not be found or the data schema ID does not exist within the data store ID provided |
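Since field IDs that do not belong to the schema are skipped without an error, you may want to know in advance which IDs will actually be deleted. A sketch, assuming you have already retrieved the schema's fields with Get All Fields from Schema; partition_field_ids is a hypothetical helper.

```python
def partition_field_ids(requested_ids, schema_fields):
    """Hypothetical helper: split requested IDs into (present in schema, not present)."""
    existing = {field["field_id"] for field in schema_fields}
    present = [fid for fid in requested_ids if fid in existing]
    missing = [fid for fid in requested_ids if fid not in existing]
    return present, missing


fields = [{"field_id": 5452}, {"field_id": 5453}]
print(partition_field_ids([5452, 5454], fields))  # ([5452], [5454])
```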
Delete A Field
To delete a field from a schema
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields/1'
resp = r.delete(url, headers=headers)
resp.status_code  # this endpoint returns no body, so check the status code instead of calling resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/data-stores/1/schemas/1/fields/1
There is no response in the body for this request
Deprecates the data field identified by the path parameters.
HTTPs Request
DELETE /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the field being deleted |
data_schema_id | The ID for the data schema that contains the field being deleted |
data_field_id | The ID for the data field to be deleted |
Body
There is no body for this endpoint.
Response
There is no response body for this endpoint.
Response Codes
Value | Description |
---|---|
200 | The field was deleted from the data schema |
404 | The data store ID requested could not be found or the data schema ID does not exist within the data store ID provided or the data field does not exist within the schema |
Get Tags for a Field
To get existing tags for a field
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/20/schemas/20/fields/20/tags'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
"$BASE_URL/data-stores/19/schemas/19/fields/19/tags"
Get the existing tags for a data field.
Returns the object:
{
"tags": [
"api tag",
"schema tag",
"pii",
"mktg"
]
}
HTTPs Request
GET /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/tags
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema |
data_schema_id | The ID for the data schema that contains the field |
data_field_id | The ID for the field whose tags are retrieved |
Body
There is no body object for this endpoint
Response
Field | Data Type | Description |
---|---|---|
tags | List[string] | The list of tags for the data field |
Response Codes
Value | Description |
---|---|
200 | The list of tags was retrieved successfully |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store, schema or field could not be found, descriptions of the error will be provided in the body |
Tag A Field
To tag a field
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/2/fields/4/tags'
tags = {'tags': ['api tag', 'schema tag', 'pii', 'mktg']}
resp = r.post(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "schema tag", "pii", "mktg"]}' \
$BASE_URL/data-stores/1/schemas/2/fields/5/tags
Add one or more tags to a data field.
Returns the object:
{
"tags": [
"api tag",
"schema tag",
"pii",
"mktg"
],
"tag_statuses": [
"added",
"added",
"added",
"added"
]
}
HTTPs Request
POST /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/tags
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema |
data_schema_id | The ID for the data schema that contains the field |
data_field_id | The ID for the field to add the tags to |
Body
Field | Required | Description |
---|---|---|
tags | Yes | A list of strings to add as tags; each tag can be up to 32 characters |
Response
Field | Data Type | Description |
---|---|---|
tags | List[string] | The list of tags that were processed |
tag_statuses | List[string] | The status for each tag processed, statuses match the same index position as their corresponding tag |
Response Codes
Value | Description |
---|---|
200 | All of the tags requested already existed for the field |
201 | At least one of the tags requested was added |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store, schema or field could not be found, descriptions of the error will be provided in the body |
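The tag_statuses list lines up with tags by index, and the status code tells you whether anything new was attached. A sketch that pairs the two lists; only the status value added appears in this documentation, so the helpers make no assumptions about other status strings. tag_statuses_by_tag and any_tag_added are hypothetical helpers.

```python
def tag_statuses_by_tag(body):
    """Hypothetical helper: map each processed tag to its status (lists align by index)."""
    tags, statuses = body["tags"], body["tag_statuses"]
    if len(tags) != len(statuses):
        raise ValueError("tags and tag_statuses differ in length")
    return dict(zip(tags, statuses))


def any_tag_added(status_code):
    """Per the Response Codes table: 201 if at least one tag was added, 200 if all existed."""
    return status_code == 201


example = {"tags": ["api tag", "schema tag", "pii", "mktg"],
           "tag_statuses": ["added", "added", "added", "added"]}
print(tag_statuses_by_tag(example))
# {'api tag': 'added', 'schema tag': 'added', 'pii': 'added', 'mktg': 'added'}
```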
Remove Tags from a Field
To remove one or more tags from a field
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/2/fields/4/tags'
tags = {'tags': ['api tag', 'mktg']}
resp = r.delete(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "mktg"]}' \
$BASE_URL/data-stores/1/schemas/2/fields/5/tags
Remove one or more tags from a field.
Returns the object:
{
"removed_tags": [
"api tag",
"mktg"
]
}
HTTPs Request
DELETE /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/tags
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema |
data_schema_id | The ID for the data schema that contains the field |
data_field_id | The ID for the field to remove the tags from |
Body
Field | Required | Description |
---|---|---|
tags | Yes | A list of the tags to remove from the field |
Response
Field | Data Type | Description |
---|---|---|
removed_tags | List[string] | The list of tags that were removed |
Response Codes
Value | Description |
---|---|
200 | The tags were removed successfully |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store, schema or field could not be found, descriptions of the error will be provided in the body |
Field Values
Field values are just that: values for a field. For example, if your field is status_code you may have the values 01, 02, 03, etc., and each of these values has a specific meaning. Field values allow you to capture both the value and the meaning of the value.
All field values reside within a data field, therefore, in order to interact with a field value you must know the data field, data schema, and data store that it belongs to.
Field Value Object
The field value object
{
"field_value_id": 396,
"field_value": "01",
"description_markup": "<p>New customer</p>",
"description_raw": "New customer",
"created_ts": "2020-08-15 22:10:18",
"updated_ts": "2020-08-15 22:10:18"
}
The Field Value object is returned when you GET one or more field values from a data field. It is also returned when you create a new field value. An example of the field value object is shown above.
Field Value Object Fields
Field | Data Type | Description |
---|---|---|
field_value_id | integer | The ID used to uniquely represent the field value |
field_value | string | The value itself, for example 01 |
description_markup | string | An HTML string that represents the full markup description; this can be null if no description has been provided |
description_raw | string | The description with all markup removed; this can be null if no description has been provided |
created_ts | timestamp | The timestamp that the field value was created in Tree Schema |
updated_ts | timestamp | The timestamp that the field value was last updated in Tree Schema |
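Field values often serve as a code table that maps each stored value to its meaning. A sketch that builds such a lookup from a list of field value objects, using description_raw and keeping None for values that have no description yet; value_lookup is a hypothetical helper.

```python
def value_lookup(field_values):
    """Hypothetical helper: map each field value to its plain-text description (None if unset)."""
    return {fv["field_value"]: fv.get("description_raw") for fv in field_values}


values = [
    {"field_value_id": 1, "field_value": "01", "description_raw": "New customer"},
    {"field_value_id": 2, "field_value": "02", "description_raw": None},
]
print(value_lookup(values))  # {'01': 'New customer', '02': None}
```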
Get All Values for Field
To get all values for a data field
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields/1/values'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/data-stores/1/schemas/1/fields/1/values
List all values for a data field.
Returns the object:
{
"meta": {
"current_page": 1,
"next_page": null,
"total_cnt": 4
},
"field_values": [
{
"field_value_id": 1,
"field_value": "01",
"description_markup": "<p>New customer</p>",
"description_raw": "New customer",
"created_ts": "2020-08-15 22:10:18",
"updated_ts": "2020-08-15 22:10:18"
},
{
"field_value_id": 2,
"field_value": "02",
"created_ts": "2020-08-15 22:10:18",
"updated_ts": "2020-08-15 22:10:18",
"description_markup": null,
"description_raw": null
}
]
}
HTTPs Request
GET /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/values
Query Parameters
Parameter | Default | Description |
---|---|---|
page | 1 | The page to retrieve when paginating through field values |
value | null | A specific field value to retrieve |
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema in the path |
data_schema_id | The ID for the data schema that contains the field |
data_field_id | The ID for the data field that the values belong to |
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
meta | Meta object | A meta object for pagination |
field_values | list[Field Value Object] | A list of field value objects |
Response Codes
Value | Description |
---|---|
200 | Retrieved all field values for the field |
404 | The data store ID requested could not be found or the data schema does not exist within the data store or the field does not exist within the schema |
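The page and value query parameters can be combined in one request. A sketch that assembles the URL and query parameters without making the call; values_request is a hypothetical helper, and leaving value out when it is unset is our assumption about how to request all values.

```python
from typing import Dict, Optional, Tuple

BASE_URL = "https://api.treeschema.com/catalog"


def values_request(data_store_id: int, data_schema_id: int, data_field_id: int,
                   page: int = 1, value: Optional[str] = None) -> Tuple[str, Dict]:
    """Hypothetical helper: return (url, params) for GET .../fields/{data_field_id}/values."""
    url = (f"{BASE_URL}/data-stores/{data_store_id}/schemas/{data_schema_id}"
           f"/fields/{data_field_id}/values")
    params = {"page": page}
    # Assumption: omit the value filter entirely when no specific value is wanted
    if value is not None:
        params["value"] = value
    return url, params


# Usage: url, params = values_request(1, 1, 1, value='01')
#        resp = r.get(url, params=params, headers=headers)
```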
Get A Sample Value
To get a single value for a data field
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/1/fields/1/values/1'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/data-stores/1/schemas/1/fields/1/values/1
Get a single value for a data field.
Returns the object:
{
"field_value": {
"field_value_id": 1,
"field_value": "01",
"description_markup": "<p>New customer</p>",
"description_raw": "New customer",
"created_ts": "2020-08-15 22:10:18",
"updated_ts": "2020-08-15 22:10:18"
}
}
HTTPs Request
GET /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/values/{field_value_id}
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema in the path |
data_schema_id | The ID for the data schema that contains the field |
data_field_id | The ID for the data field that the values belong to |
field_value_id | The ID for the specific field value to retrieve |
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
field_value | Field Value Object | A field value object |
Response Codes
Value | Description |
---|---|
200 | Retrieved the field value |
404 | The data store ID requested could not be found, the data schema does not exist within the data store, the field does not exist within the schema, or the field value does not exist for the field |
Create A Field Value
To create a value for a data field
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/7/fields/78/values'
new_field_value = {
'field_value': 'a new value here',
'description': 'and a new description'
}
resp = r.put(url, json=new_field_value, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X PUT -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"field_value": "a second value here", "description": "and a new description"}' \
$BASE_URL/data-stores/1/schemas/7/fields/78/values
Create a new value for a field.
Returns the object:
{
"field_value": {
"field_value_id": 16323,
"field_value": "a new value here",
"created_ts": "2020-09-29 20:51:18",
"updated_ts": "2020-09-29 20:51:18",
"description_markup": "<p>and a new description</p>",
"description_raw": "and a new description"
}
}
HTTPs Request
PUT /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/values
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema in the path |
data_schema_id | The ID for the data schema that contains the field |
data_field_id | The ID for the data field that the values belong to |
Body
Field | Required | Description |
---|---|---|
field_value | Yes | The sample value for the field |
description | No | The description for the sample value; if omitted, the description is set to null |
Response
Field | Data Type | Description |
---|---|---|
field_value | Field Value Object | A field value object |
Response Codes
Value | Description |
---|---|
201 | Created the field value |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store ID requested could not be found or the data schema does not exist within the data store or the field does not exist within the schema |
409 | The field value already exists for the field provided |
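Creating a value that already exists returns 409, so a sync job may prefer to update existing values instead. A sketch that chooses between PUT (create) and POST (update) from a list of values already fetched with Get All Values for Field; plan_value_upsert is hypothetical, and it assumes values match exactly and case-sensitively, which this documentation does not specify.

```python
def plan_value_upsert(existing_values, field_value, description=None):
    """Hypothetical helper: return (method, path_suffix, body) to create or update a value.

    path_suffix is appended to .../fields/{data_field_id}.
    """
    body = {"field_value": field_value}
    if description is not None:
        body["description"] = description
    for fv in existing_values:
        # Assumption: exact, case-sensitive comparison mirrors the server's duplicate check
        if fv["field_value"] == field_value:
            # Creating it again would return 409, so update the existing value instead
            return "POST", f"/values/{fv['field_value_id']}", body
    return "PUT", "/values", body
```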
Update a Field Value
To update a value for a data field
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/data-stores/1/schemas/7/fields/78/values/16324'
new_desc = {'description': 'new description goes here'}
resp = r.post(url, json=new_desc, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"field_value": "a second value here", "description": "and a new description"}' \
$BASE_URL/data-stores/1/schemas/7/fields/78/values/16324
Update a value for a field.
Returns the object:
{
"field_value": {
"field_value_id": 16323,
"field_value": "a new value here",
"created_ts": "2020-09-29 20:51:18",
"updated_ts": "2020-09-29 20:51:18",
"description_markup": "<p>and a new description</p>",
"description_raw": "and a new description"
}
}
HTTPs Request
POST /data-stores/{data_store_id}/schemas/{data_schema_id}/fields/{data_field_id}/values/{field_value_id}
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
data_store_id | The ID for the data store that contains the schema in the path |
data_schema_id | The ID for the data schema that contains the fields being requested |
data_field_id | The ID for the data field that the values belong to |
field_value_id | The ID of the field value to update |
Body
Field | Required | Description |
---|---|---|
field_value | No | The sample value for the field, if omitted the existing field value will remain in place |
description | No | The description for the sample value, if omitted the existing description will remain in place |
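Both body fields are optional, and an omitted field keeps its current value. One way to use this is to send only the keys you actually want to change. The sketch below builds such a partial body; build_field_value_update is a hypothetical helper, not part of any Tree Schema client library.

```python
from typing import Optional


def build_field_value_update(field_value: Optional[str] = None,
                             description: Optional[str] = None) -> dict:
    """Build a partial-update body for POST .../values/{field_value_id}.

    Hypothetical helper: arguments left as None are omitted from the body,
    so the API keeps the existing value for those fields.
    """
    body = {}
    if field_value is not None:
        body['field_value'] = field_value
    if description is not None:
        body['description'] = description
    if not body:
        # An empty body has nothing to update, so fail early on the client side.
        raise ValueError('Provide field_value and/or description')
    return body


# Only the description changes; the sample value itself stays as-is.
body = build_field_value_update(description='new description goes here')
# resp = r.post(url, json=body, headers=headers)
```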
Response
Field | Data Type | Description |
---|---|---|
field_value | Field Value Object | A field value object |
Response Codes
Value | Description |
---|---|
200 | The field value was updated |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The data store ID requested could not be found or the data schema does not exist within the data store or the field does not exist within the schema |
Transformations
Creating transformations in Tree Schema is a critical part of unlocking the true value of your data: transformations let you see how data moves from system to system, identify dependencies in your data flow and build your data lineage. Transformations describe data movement from field to field between schemas.
Transformation Object
The transformation object
{
"transformation_id": 25,
"name": "my api transform #2",
"type": "batch_process_triggered",
"created_ts": "2020-09-22 17:20:38",
"updated_ts": "2020-09-22 17:25:34",
"description_markup": "<p>desc</p>",
"description_raw": "desc",
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
The transformation object by itself is a shell; it only holds transformation links. Once a transformation is created, add transformation links to it to build your data lineage!
Transformation Object Fields
Field | Data Type | Description |
---|---|---|
transformation_id | integer | The ID used to uniquely represent the transformation. The same ID appears in the Tree Schema GUI, where the URL for the transformation contains the transformation ID |
name | string | The name of the transformation |
type | string | The type of the transformation; valid values are batch_process_scheduled, batch_process_triggered, other, pub_sub_event and sql_trigger |
created_ts | timestamp | The timestamp that the transformation was created |
updated_ts | timestamp | The timestamp that the transformation was last updated |
description_markup | string | An HTML string that represents the full markup description |
description_raw | string | The transformation description that has had all markup removed |
steward | User Object | The data steward assigned to the transformation |
tech_poc | User Object | The technical point of contact assigned to the transformation |
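If you consume transformation objects in your own pipelines, it can help to convert them into a typed structure and reject unexpected type values early. The sketch below is one way to do that; the Transformation dataclass and parse_transformation are hypothetical client-side helpers, and only the five type values listed in the table are treated as valid.

```python
from dataclasses import dataclass
from typing import Optional

# Valid values for the transformation "type" field, taken from the fields table.
VALID_TRANSFORMATION_TYPES = {
    'batch_process_scheduled',
    'batch_process_triggered',
    'other',
    'pub_sub_event',
    'sql_trigger',
}


@dataclass
class Transformation:
    """Hypothetical client-side model of a transformation object."""
    transformation_id: int
    name: str
    type: str
    description_raw: Optional[str]
    steward_email: Optional[str]
    tech_poc_email: Optional[str]


def parse_transformation(obj: dict) -> Transformation:
    """Convert a transformation JSON object into a Transformation."""
    if obj['type'] not in VALID_TRANSFORMATION_TYPES:
        raise ValueError(f"Unknown transformation type: {obj['type']!r}")
    # steward / tech_poc are user objects; tolerate them being null.
    steward = obj.get('steward') or {}
    tech_poc = obj.get('tech_poc') or {}
    return Transformation(
        transformation_id=obj['transformation_id'],
        name=obj['name'],
        type=obj['type'],
        description_raw=obj.get('description_raw'),
        steward_email=steward.get('email'),
        tech_poc_email=tech_poc.get('email'),
    )
```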
Get All Transformations
To get all transformations
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/transformations
List all transformations.
Returns the object:
{
"meta": {
"current_page": 1,
"next_page": null,
"total_cnt": 2
},
"transformations": [
{
"transformation_id": 25,
"name": "My Tansform",
"type": "batch_process_triggered",
"created_ts": "2020-09-22 17:20:38",
"updated_ts": "2020-09-22 17:25:34",
"description_markup": "<p>desc</p>",
"description_raw": "desc",
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
},
{
"transformation_id": 28,
"name": "My Second Transformation",
"type": "other",
"created_ts": "2020-09-22 18:06:17",
"updated_ts": "2020-09-22 18:06:56",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
]
}
HTTPs Request
GET /transformations
Query Parameters
Parameter | Default | Description |
---|---|---|
page | 1 | The page to retrieve when paginating through transformations |
name | null | The name of the transformation to return |
Path Parameters
There are no path parameters for this endpoint.
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
meta | Meta object | A meta object for pagination |
transformations | list[Transformation Object] | A list of transformation objects |
Response Codes
Value | Description |
---|---|
200 | Retrieved all transformations |
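List endpoints such as this one return their results one page at a time, with a meta object describing the pagination. A minimal sketch of walking every page, assuming meta.next_page holds the next page number and is null on the last page (as in the example response); iter_paginated is a hypothetical helper, and the fetch function shown in the comments reuses r, BASE_URL and headers from the examples in this document.

```python
from typing import Callable, Iterator


def iter_paginated(fetch_page: Callable[[int], dict], key: str) -> Iterator[dict]:
    """Yield every item from a paginated list endpoint.

    fetch_page(page) must return the parsed JSON body for that page.
    key is the list field in the body, e.g. 'transformations'.
    Assumes meta.next_page is the next page number, or None when done.
    """
    page = 1
    while page is not None:
        body = fetch_page(page)
        yield from body.get(key, [])
        page = body['meta']['next_page']


# Usage sketch against GET /transformations:
# url = BASE_URL + '/transformations'
# fetch = lambda p: r.get(url, params={'page': p}, headers=headers).json()
# for t in iter_paginated(fetch, 'transformations'):
#     print(t['transformation_id'], t['name'])
```

The same helper would apply to other paginated lists in this document, for example key='transformation_links' or key='users'.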
Get A Transformation
Get a single transformation
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/25'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/transformations/25
Get a single transformation
Returns the object:
{
"transformation": {
"transformation_id": 25,
"name": "My Tansform",
"type": "batch_process_triggered",
"created_ts": "2020-09-22 17:20:38",
"updated_ts": "2020-09-22 17:25:34",
"description_markup": "<p>desc</p>",
"description_raw": "desc",
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
}
HTTPs Request
GET /transformations/{transformation_id}
Query Parameters
Parameter | Default | Description |
---|---|---|
page | 1 | The page to retrieve when paginating through data stores |
Path Parameters
Parameter | Description |
---|---|
transformation_id | The ID for the transformation being retrieved |
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
transformation | Transformation Object | A transformation object |
Response Codes
Value | Description |
---|---|
200 | Retrieved the transformation |
404 | The transformation requested does not exist |
Create A Transformation
Create a new transformation
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations'
new_transform = {
'name': 'My API Transformation!',
'type': 'other'
}
resp = r.put(url, json=new_transform, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X PUT -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"name": "My API Transformation2", "type": "other"}' \
$BASE_URL/transformations
Create a transformation
Returns the object:
{
"transformation": {
"transformation_id": 25,
"name": "My Tansform",
"type": "batch_process_triggered",
"created_ts": "2020-09-22 17:20:38",
"updated_ts": "2020-09-22 17:25:34",
"description_markup": "<p>desc</p>",
"description_raw": "desc",
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
}
HTTPs Request
PUT /transformations
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
There are no path parameters for this endpoint.
Body
Field | Required | Description |
---|---|---|
name | Yes | The name of the transformation |
type | Yes | The type of transformation; valid values are batch_process_scheduled, batch_process_triggered, other, pub_sub_event and sql_trigger |
description | No | The description to give the transformation |
tech_poc | No | The ID of the user to assign as the technical point of contact for this transformation; if no value is provided, the user executing the API will be used |
steward | No | The ID of the user to assign as the steward for this transformation; if no value is provided, the user executing the API will be used |
Response
Field | Data Type | Description |
---|---|---|
transformation | Transformation Object | A transformation object |
Response Codes
Value | Description |
---|---|
200 | Existing transformation retrieved |
201 | Transformation Created |
404 | The transformation requested does not exist |
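Since name and type are required and type must be one of five values, a small client-side check can catch mistakes before the request is sent. The sketch below builds the PUT body and interprets the 200/201 distinction from the response codes table; build_transformation_body and was_created are hypothetical helpers.

```python
from typing import Optional

VALID_TRANSFORMATION_TYPES = {
    'batch_process_scheduled', 'batch_process_triggered',
    'other', 'pub_sub_event', 'sql_trigger',
}


def build_transformation_body(name: str, transformation_type: str,
                              description: Optional[str] = None,
                              tech_poc: Optional[int] = None,
                              steward: Optional[int] = None) -> dict:
    """Build the PUT /transformations body, omitting unset optional fields."""
    if not name:
        raise ValueError('name is required')
    if transformation_type not in VALID_TRANSFORMATION_TYPES:
        raise ValueError(f'Invalid type: {transformation_type!r}')
    body = {'name': name, 'type': transformation_type}
    optional = {'description': description, 'tech_poc': tech_poc, 'steward': steward}
    # tech_poc and steward are user IDs; omitted ones default to the API caller.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


def was_created(status_code: int) -> bool:
    """201 means a new transformation was created; 200 means an existing one was returned."""
    return status_code == 201


# resp = r.put(url, json=build_transformation_body('My API Transformation!', 'other'), headers=headers)
# if not was_created(resp.status_code): print('Transformation already existed')
```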
Delete A Transformation
Delete a transformation
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/36'
resp = r.delete(url, headers=headers)
resp.status_code  # 200 when deleted; this endpoint returns no response body
BASE_URL='https://api.treeschema.com/catalog'
curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/transformations/31
Delete a transformation
HTTPs Request
DELETE /transformations/{transformation_id}
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
transformation_id | The ID for the transformation to be deleted |
Body
There is no body for this endpoint.
Response
There is no response body for this endpoint.
Response Codes
Value | Description |
---|---|
200 | Transformation deleted |
404 | The transformation requested does not exist |
Get Tags for a Transformation
To get existing tags from a transformation
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/2/tags'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
"$BASE_URL/transformations/1/tags"
Get existing tags
Returns the object:
{
"tags": [
"api tag",
"schema tag",
"pii",
"mktg"
]
}
HTTPs Request
GET /transformations/{transformation_id}/tags
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
transformation_id | The ID for the transformation to retrieve the tags for |
Body
There is no body object for this endpoint
Response
Field | Data Type | Description |
---|---|---|
tags | List[string] | The list of tags for the transformation |
Response Codes
Value | Description |
---|---|
200 | The list of tags was retrieved successfully |
400 | A malformed request was made, descriptions of the error will be provided in the body |
404 | The transformation could not be found, descriptions of the error will be provided in the body |
Tag A Transformation
To tag a transformation
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/30/tags'
tags = {'tags': ['api tag', 'transform tag', 'pii', 'mktg']}
resp = r.post(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "transform tag", "pii", "mktg"]}' \
"$BASE_URL/transformations/30/tags"
Add a tag to a transformation.
Returns the object:
{
"tags": [
"api tag",
"transform tag",
"pii",
"mktg"
],
"tag_statuses": [
"added",
"added",
"added",
"added"
]
}
HTTPs Request
POST /transformations/{transformation_id}/tags
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
transformation_id | The ID for the transformation to add the tag(s) to |
Body
Field | Required | Description |
---|---|---|
tags | Yes | A list of strings (List[string]) to add as tags; each tag can be up to 32 characters |
Response
Field | Data Type | Description |
---|---|---|
tags | List[string] | The list of tags that were processed |
tag_statuses | List[string] | The status for each tag processed; each status shares the index position of its corresponding tag. Values include added and exists. |
Response Codes
Value | Description |
---|---|
200 | All of the tags requested already existed for the transformation |
201 | At least one of the tags requested was added |
400 | A malformed request was made, descriptions of the error will be provided in the body |
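Because tags are limited to 32 characters and tag_statuses lines up index by index with tags, both can be handled with a few lines of client code. A minimal sketch; validate_tags and newly_added are hypothetical helpers, and the 32-character limit is the one stated in the Body table.

```python
from typing import List

MAX_TAG_LENGTH = 32  # limit stated for each tag in the request body


def validate_tags(tags: List[str]) -> List[str]:
    """Reject tags longer than the documented limit before sending the request."""
    too_long = [t for t in tags if len(t) > MAX_TAG_LENGTH]
    if too_long:
        raise ValueError(f'Tags longer than {MAX_TAG_LENGTH} characters: {too_long}')
    return tags


def newly_added(resp_body: dict) -> List[str]:
    """Pair each tag with its status (same index) and keep only the added ones."""
    pairs = zip(resp_body['tags'], resp_body['tag_statuses'])
    return [tag for tag, status in pairs if status == 'added']


# resp = r.post(url, json={'tags': validate_tags(['api tag', 'pii'])}, headers=headers)
# print(newly_added(resp.json()))
```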
Remove Tags from a Transformation
To remove one or more tags from a transformation
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/30/tags'
tags = {'tags': ['api tag', 'mktg']}
resp = r.delete(url, json=tags, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X DELETE -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"tags": ["api tag", "mktg"]}' \
"$BASE_URL/transformations/30/tags"
Remove a tag from a transformation.
Returns the object:
{
"removed_tags": [
"api tag",
"mktg"
]
}
HTTPs Request
DELETE /transformations/{transformation_id}/tags
Query Parameters
There are no query parameters for this endpoint
Path Parameters
Parameter | Description |
---|---|
transformation_id | The ID for the transformation to remove the tag(s) from |
Body
Field | Required | Description |
---|---|---|
tags | Yes | A list of strings (List[string]) naming the tags to remove; each tag can be up to 32 characters |
Response
Field | Data Type | Description |
---|---|---|
removed_tags | List[string] | The list of tags that were removed |
Response Codes
Value | Description |
---|---|
200 | The tags were removed successfully |
400 | A malformed request was made, descriptions of the error will be provided in the body |
Transformation Links
Transformation links capture how data moves from field to field between your schemas. A single transformation link represents a single field to field movement. A single transformation (which may represent a data pipeline, or ETL / ELT job) will likely contain many transformation links.
Transformation Link Object
The transformation link object
{
"transformation_link_id": 1,
"created_ts": "2020-09-22 23:54:26",
"updated_ts": "2020-09-22 23:54:26",
"source_data_store_id": 3,
"source_data_store_name": "Kafka Prod",
"source_schema_id": 17,
"source_schema_name": "users-topic.v1",
"source_field_id": 200,
"source_field_name": "user_id",
"target_data_store_id": 4,
"target_data_store_name": "Redshift",
"target_schema_id": 469,
"target_schema_name": "usr.user_info",
"target_field_id": 5399,
"target_field_name": "user_id"
}
The transformation link object contains references to all of the data stores, schemas and fields involved when data moves from one schema to another; these associations are referred to as the source and the target.
Transformation Link Object Fields
Field | Data Type | Description |
---|---|---|
transformation_link_id | integer | The ID used to uniquely represent the transformation link |
source_data_store_id | integer | The unique ID for the data store for the source of the transformation. |
source_schema_id | integer | The unique ID for the schema for the source of the transformation. |
source_field_id | integer | The unique ID for the field for the source of the transformation. |
target_data_store_id | integer | The unique ID for the data store for the target of the transformation. |
target_schema_id | integer | The unique ID for the schema for the target of the transformation. |
target_field_id | integer | The unique ID for the field for the target of the transformation. |
created_ts | timestamp | The timestamp that the transformation link was created |
updated_ts | timestamp | The timestamp that the transformation link was last updated |
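Since each link is a single field to field movement, a list of link objects can be folded into a field-level lineage graph. The sketch below maps each source field ID to the set of target field IDs it feeds; build_field_lineage is a hypothetical helper that only reads source_field_id and target_field_id from each link object.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_field_lineage(links: List[dict]) -> Dict[int, Set[int]]:
    """Map each source field ID to the set of target field IDs it feeds.

    links is a list of transformation link objects, for example the
    transformation_links list returned by GET /transformations/{id}/links.
    """
    downstream: Dict[int, Set[int]] = defaultdict(set)
    for link in links:
        downstream[link['source_field_id']].add(link['target_field_id'])
    return dict(downstream)
```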
Get Links for A Transformation
To get all links for a transformation
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/1/links'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/transformations/1/links
List all transformation links for a given transformation.
Returns the object:
{
"meta": {
"current_page": 1,
"next_page": null,
"total_cnt": 4
},
"transformation_links": [
{
"transformation_link_id": 1,
"created_ts": "2020-09-22 23:54:26",
"updated_ts": "2020-09-22 23:54:26",
"source_data_store_id": 3,
"source_data_store_name": "Kafka Prod",
"source_schema_id": 17,
"source_schema_name": "users-topic.v1",
"source_field_id": 200,
"source_field_name": "user_id",
"target_data_store_id": 4,
"target_data_store_name": "Redshift",
"target_schema_id": 469,
"target_schema_name": "usr.user_info",
"target_field_id": 5399,
"target_field_name": "user_id"
},
{
"transformation_link_id": 2,
"created_ts": "2020-09-22 23:54:26",
"updated_ts": "2020-09-22 23:54:26",
"source_data_store_id": 3,
"source_data_store_name": "Kafka Prod",
"source_schema_id": 17,
"source_schema_name": "users-topic.v1",
"source_field_id": 201,
"source_field_name": "email",
"target_data_store_id": 4,
"target_data_store_name": "Redshift",
"target_schema_id": 469,
"target_schema_name": "usr.user_info",
"target_field_id": 5400,
"target_field_name": "email"
}
]
}
HTTPs Request
GET /transformations/{transformation_id}/links
Query Parameters
Parameter | Default | Description |
---|---|---|
page | 1 | The page to retrieve when paginating through transformation links |
Path Parameters
Parameter | Description |
---|---|
transformation_id | The ID for the transformation whose links are retrieved |
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
meta | Meta object | A meta object for pagination |
transformation_links | list[Transformation Link Object] | A list of transformation link objects |
Response Codes
Value | Description |
---|---|
200 | Retrieved all transformation links |
Get a Transformation Link
To get a transformation link
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/1/links/1'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/transformations/1/links/1
Get a single transformation link for a given transformation.
Returns the object:
{
"transformation_link": {
"transformation_link_id": 1,
"created_ts": "2020-09-22 23:54:26",
"updated_ts": "2020-09-22 23:54:26",
"source_data_store_id": 3,
"source_data_store_name": "Kafka Prod",
"source_schema_id": 17,
"source_schema_name": "users-topic.v1",
"source_field_id": 200,
"source_field_name": "user_id",
"target_data_store_id": 4,
"target_data_store_name": "Redshift",
"target_schema_id": 469,
"target_schema_name": "usr.user_info",
"target_field_id": 5399,
"target_field_name": "user_id"
}
}
HTTPs Request
GET /transformations/{transformation_id}/links/{transformation_link_id}
Query Parameters
Parameter | Default | Description |
---|---|---|
page | 1 | The page to retrieve when paginating through data stores |
Path Parameters
Parameter | Description |
---|---|
transformation_id | The ID for the transformation that contains the link |
transformation_link_id | The ID for the transformation link |
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
transformation_link | Transformation Link Object | A transformation link object |
Response Codes
Value | Description |
---|---|
200 | Retrieved the transformation link |
Create Transformation Links
Create links for a transformation
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/1/links'
new_links = {
'links': [
{
'source_field_id': 89,
'target_field_id': 5399
},
{
'source_field_id': 200,
'target_field_id': 5399
}
]
}
resp = r.post(url, json=new_links, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST \
-H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"links": [{"source_field_id": 89, "target_field_id": 5399}, {"source_field_id": 200, "target_field_id": 5399}]}' \
$BASE_URL/transformations/1/links
Create links for a transformation.
When creating links you only need to link two fields together: a source field and a target field. Tree Schema infers the schema and data store directly from the field IDs.
Returns the object:
{
"links": [
{
"source_field_id": 89,
"target_field_id": 5399
},
{
"source_field_id": 200,
"target_field_id": 5399
}
],
"link_statuses": [
"exists",
"exists"
],
"updated_links": [
{
"transformation_link_id": 205,
"source_field_id": 89,
"source_field_name": "account_type",
"source_schema_id": 8,
"source_schema_name": "public.accounts",
"source_data_store_id": 3,
"source_data_store_name": "Postgres Prod",
"target_field_id": 5399,
"target_field_name": "acct_type",
"target_schema_id": 469,
"target_schema_name": "acct.dvc.raw.01",
"target_data_store_id": 4,
"target_data_store_name": "Kafka"
},
{
"transformation_link_id": 206,
"source_field_id": 200,
"source_field_name": "user_id",
"source_schema_id": 17,
"source_schema_name": "public.users",
"source_data_store_id": 3,
"source_data_store_name": "Postgres Prod",
"target_field_id": 5399,
"target_field_name": "acct_type",
"target_schema_id": 469,
"target_schema_name": "acct.dvc.raw.01",
"target_data_store_id": 4,
"target_data_store_name": "Kafka"
}
]
}
HTTPs Request
POST /transformations/{transformation_id}/links
Query Parameters
Parameter | Default | Description |
---|---|---|
set_state | False | If True, the state of the transformation will be set to the links provided: any existing links in the transformation that are not part of the input will be deprecated, and any links that are provided but do not exist in the transformation will be created |
Path Parameters
Parameter | Description |
---|---|
transformation_id | The ID for the transformation to add the links |
Body
Transformation links are created by providing a list of source to target fields.
Field | Required | Description |
---|---|---|
links | Yes | List[Transformation source to target mapping] that represents the source and target for each transformation link |
Transformation source to target mapping
Field | Required | Description |
---|---|---|
source_field_id | Yes | The field_id for the source field where data moves from |
target_field_id | Yes | The field_id for the target field where data moves to |
Response
Field | Data Type | Description |
---|---|---|
links | list[Transformation source to target mapping] | The same source to target mapping inputs provided as the input |
link_statuses | list[string] | The status for each link processed; each status shares the index position of its corresponding link. Values include created, exists and could_not_create. |
updated_links | list[Transformation Link Object] | A list of transformation link objects for each transformation link requested that was created or already exists |
Response Codes
Value | Description |
---|---|
200 | All transformation links processed |
201 | At least one transformation link was created |
400 | A malformed request was made, descriptions of the error will be provided in the body |
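The request body is just a list of source and target field IDs, and link_statuses lines up index by index with links, so creating links from (source, target) pairs and spotting failures can be sketched as below. build_links_payload and failed_links are hypothetical helpers. To replace the transformation's full set of links rather than add to it, the set_state query parameter described above would also be supplied; its exact value format is not shown here, so it is left out of the sketch.

```python
from typing import Iterable, List, Tuple


def build_links_payload(pairs: Iterable[Tuple[int, int]]) -> dict:
    """Turn (source_field_id, target_field_id) pairs into the POST body."""
    return {'links': [{'source_field_id': s, 'target_field_id': t} for s, t in pairs]}


def failed_links(resp_body: dict) -> List[dict]:
    """Return the requested links whose status is could_not_create."""
    pairs = zip(resp_body['links'], resp_body['link_statuses'])
    return [link for link, status in pairs if status == 'could_not_create']


# payload = build_links_payload([(89, 5399), (200, 5399)])
# resp = r.post(url, json=payload, headers=headers)
# print(failed_links(resp.json()))
```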
Check Transformation for Breaking Change
Check for breaking changes to a Transformation
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/1/links/check-breaking-change'
links = {
'link_state': [
{
'source_field_id': 1,
'target_field_id': 2
},
{
'source_field_id': 2,
'target_field_id': 3
}
]
}
resp = r.post(url, json=links, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST \
-H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"link_state": [{"source_field_id": 1, "target_field_id": 2}, {"source_field_id": 2, "target_field_id": 3}]}' \
$BASE_URL/transformations/1/links/check-breaking-change
Check whether a change to the links in a transformation will cause a breaking change by passing in a mock "state" of your new transformation. For example, if your transformation contains the data lineage A -> B -> C -> D, then it has the links (A, B), (B, C) and (C, D).
With this API, check the impact on the data lineage in your entire catalog, not only the data lineage within this single transformation, by creating a new "state" of your transformation. Continuing with the example above, if you check for breaking changes with the new mock state [(B, C), (C, D)], then you have removed the link (A, B). In this scenario, the user is explicitly removing the link going into field B, therefore B is not considered broken. However, everything downstream from B would be considered broken.
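To make this rule concrete, here is a small in-memory sketch of it. It is not how Tree Schema computes breaking changes (the API evaluates lineage across your whole catalog and honors max_depth); locally_broken_fields is a hypothetical helper that only mirrors the stated rule: the target of a removed link is not counted as broken, while every field downstream of that target is.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]  # (source_field, target_field)


def locally_broken_fields(catalog_links: Iterable[Link],
                          current_state: Iterable[Link],
                          new_state: Iterable[Link]) -> Set[str]:
    """Illustrative only: fields downstream of any link removed by new_state."""
    downstream = defaultdict(set)
    for src, tgt in catalog_links:
        downstream[src].add(tgt)

    removed = set(current_state) - set(new_state)
    # Start from each removed link's target; the target itself is not broken.
    queue = deque(tgt for _, tgt in removed)
    seen = set(queue)
    broken: Set[str] = set()
    while queue:  # breadth-first walk over everything downstream
        node = queue.popleft()
        for child in downstream[node]:
            if child not in seen:
                seen.add(child)
                broken.add(child)
                queue.append(child)
    return broken
```

With the example above, links [(A, B), (B, C), (C, D)] and new state [(B, C), (C, D)], the sketch reports C and D as broken and leaves B out.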
Returns the object:
{
"breaking": true,
"impact_summary": {
"fields": 1,
"schemas": 1,
"data_stores": 1
},
"impacted_assets": [
{
"field_id": 7,
"schema_id": 7,
"data_store_id": 1,
"impact_chain": [
{
"field_id": 6,
"schema_id": 6,
"data_store_id": 1
},
{
"field_id": 3,
"schema_id": 3,
"data_store_id": 1
}
]
}
]
}
HTTPs Request
POST /transformations/{transformation_id}/links/check-breaking-change
Query Parameters
Parameter | Default | Description |
---|---|---|
max_depth | 5 | The maximum downstream depth to return results for along the branches of the impacted assets. Set this to 1 to see only the immediate downstream results. The maximum allowed value is 10; larger values increase processing time. |
Path Parameters
Parameter | Description |
---|---|
transformation_id | The ID for the transformation to check the breaking changes |
Body
Check whether a given state of links will break data lineage by providing a list of source to target fields.
Field | Required | Description |
---|---|---|
link_state | Yes | List[Transformation source to target mapping] that represents the source and target for each transformation link for the new state of the transformation. |
Transformation source to target mapping
Field | Required | Description |
---|---|---|
source_field_id | Yes | The field_id for the source field where data moves from |
target_field_id | Yes | The field_id for the target field where data moves to |
Response
Field | Data Type | Description |
---|---|---|
breaking | Boolean | Whether or not there are any breaking changes to the transformation given the state provided |
impact_summary | Dict | High-level metrics on the total number of assets impacted by the breaking changes. This contains the keys fields, schemas and data_stores; the value for each key is the total number of unique impacted assets of that type. |
impacted_assets | list[Impacted Asset] | A list of the data assets impacted by the breaking changes, described below |
Impacted Asset
Impacted assets are the data assets that are impacted by a breaking change to data lineage. Tree Schema captures data lineage at the field level, therefore, all Impacted Assets contain the field level identifiers, along with the corresponding schema and data store identifiers.
Impacted Asset Object
Field | Data Type | Required | Description |
---|---|---|---|
field_id | Integer | Yes | The ID for the data field that is impacted |
schema_id | Integer | Yes | The ID for the data schema associated to the data field |
data_store_id | Integer | Yes | The ID for the data store associated to the schema |
impact_chain | list[Impacted Asset] | No | Only present on the top-level entries of the impacted assets list. This is the list of data assets that would be considered broken by the new state of the transformation links, sorted in data lineage order. For example, if field E has an impact chain of [B, C, D], then the actual lineage is B -> C -> D -> E. |
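Following the ordering rule in the table (an impact chain of [B, C, D] on field E means B -> C -> D -> E), an impacted asset can be rendered as a readable lineage string. describe_impact is a hypothetical helper that uses field IDs only.

```python
from typing import List


def describe_impact(asset: dict) -> str:
    """Render an impacted asset and its impact chain as a lineage of field IDs.

    The chain comes first, in lineage order, and the impacted field last.
    """
    chain: List[dict] = asset.get('impact_chain') or []
    field_ids = [str(a['field_id']) for a in chain] + [str(asset['field_id'])]
    return ' -> '.join(field_ids)


# for asset in resp.json()['impacted_assets']:
#     print(describe_impact(asset))
```

For the example response, the single impacted asset renders as 6 -> 3 -> 7.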
Response Codes
Value | Description |
---|---|
200 | The request was successful |
400 | A malformed request was made, descriptions of the error will be provided in the body |
Delete Transformation Link
Delete one or more transformation links
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/transformations/1/links'
delete_links = {
'transform_link_ids': [
206, 205
]
}
resp = r.delete(url, json=delete_links, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X DELETE \
-H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"transform_link_ids": [142, 144]}' \
$BASE_URL/transformations/1/links
Delete links for a transformation.
Returns the object:
{
"links": [
205,
206
],
"link_statuses": [
"deleted",
"deleted"
]
}
HTTPs Request
DELETE /transformations/{transformation_id}/links
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
transformation_id | The ID for the transformation to delete the links |
Body
Transformation links are deleted by providing a list of transformation link IDs.
Field | Required | Description |
---|---|---|
transform_link_ids | Yes | List[integer]: the list of transformation link IDs to delete |
Response
Field | Data Type | Description |
---|---|---|
links | List[integer] | The list of transformation link IDs submitted for deletion |
link_statuses | list[string] | The status for each link processed; each status shares the index position of its corresponding link. Values include deleted and could_not_delete. |
Response Codes
Value | Description |
---|---|
200 | All transformation links processed |
400 | A malformed request was made, descriptions of the error will be provided in the body |
Users
Access your teammates and assign them as technical points of contact (tech POCs) and data stewards.
User object
The user object
{
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
}
User Object Fields
Field | Data Type | Description |
---|---|---|
user_id | integer | The ID used to uniquely represent the user |
name | string | The name of the user |
email | string | The user's email address |
Get All Users
Get all users in your organization
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/users'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/users
Get all users in your organization
Returns the object:
{
"meta": {
"current_page": 1,
"next_page": null,
"total_cnt": 2
},
"users": [
{
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
},
{
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
]
}
HTTPs Request
GET /users
Query Parameters
Parameter | Default | Description |
---|---|---|
page | 1 | The page to retrieve when paginating through users |
email | null | A user's email address |
Path Parameters
There are no path parameters for this endpoint.
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
meta | Meta object | A meta object for pagination |
users | list[User Object] | A list of users |
Response Codes
Value | Description |
---|---|
200 | Retrieved the users |
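A common reason to list users is to find the user_id to pass as steward or tech_poc when creating a transformation. A minimal sketch that matches on email case-insensitively; find_user_id is a hypothetical helper. Filtering server-side with the email query parameter is an alternative.

```python
from typing import Iterable, Optional


def find_user_id(users: Iterable[dict], email: str) -> Optional[int]:
    """Return the user_id whose email matches (case-insensitively), else None."""
    target = email.strip().lower()
    for user in users:
        if (user.get('email') or '').lower() == target:
            return user['user_id']
    return None


# users = r.get(BASE_URL + '/users', headers=headers).json()['users']
# steward_id = find_user_id(users, 'asher@treeschema.com')
```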
Get a User
Get a user in your organization
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/users/1'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
$BASE_URL/users/1
Get a single user in your organization
Returns the object:
{
"user": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
}
}
HTTPs Request
GET /users/{user_id}
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
Parameter | Description |
---|---|
user_id | The ID for the user being retrieved |
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
user | User Object | A user object |
Response Codes
Value | Description |
---|---|
200 | Retrieved the users |
404 | The user requested was not found |
Full Catalog Search
Search your entire data catalog from a single place. Can't remember what data store your user_analytics schema sits in? Need a refresher on where that pesky usr_start_dt field is? Search the catalog!
Catalog Search Object
The search result object
{
"entity_id": 5405,
"schema_id": 469,
"data_store_id": 4,
"name": "device_id",
"entity_type": "field"
}
The search API is intended to search the following:
- Data stores
- Data schemas
- Data fields
- Transformations
The search results provide a quick way to find the key IDs you need to work with the rest of your Tree Schema catalog.
Search Result Object Fields
Field | Data Type | Description |
---|---|---|
entity_id | integer | The ID of the entity; together with entity_type, it identifies a specific item in the catalog |
entity_type | string | The type of entity that the entity_id refers to; possible values include data_store, data_schema, field, and transformation |
name | string | The name of the entity |
data_store_id | integer | The ID of the parent data store when the entity resides within one (for example, a data_schema or field); otherwise null |
schema_id | integer | The ID of the parent data schema when the entity resides within one (for example, a field); otherwise null |
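Because an entity_id is meant to be read together with its entity_type, a client can key search results on that pair. The sketch below uses two hypothetical helpers that are not part of the API. The first indexes results by (entity_type, entity_id). The second pulls the data_store_id, schema_id and entity_id of every field hit so those IDs can be reused in other requests. The two field results mirror the example response for GET /search. The data_store result named my_store is made up for illustration.

```python
from typing import Any, Dict, List, Tuple

SearchResult = Dict[str, Any]


def index_results(results: List[SearchResult]) -> Dict[Tuple[str, int], SearchResult]:
    """Key each result on (entity_type, entity_id), which together identify an item."""
    return {(r["entity_type"], r["entity_id"]): r for r in results}


def field_locations(results: List[SearchResult]) -> List[Tuple[int, int, int, str]]:
    """Return (data_store_id, schema_id, field_id, name) for every field result."""
    return [
        (r["data_store_id"], r["schema_id"], r["entity_id"], r["name"])
        for r in results
        if r["entity_type"] == "field"
    ]


results = [
    {"entity_id": 5405, "schema_id": 469, "data_store_id": 4,
     "name": "device_id", "entity_type": "field"},
    {"entity_id": 79, "schema_id": 7, "data_store_id": 1,
     "name": "DEVICE_ID", "entity_type": "field"},
    # Hypothetical data store hit: it has no parents, so both parent IDs are null.
    {"entity_id": 4, "schema_id": None, "data_store_id": None,
     "name": "my_store", "entity_type": "data_store"},
]
print(field_locations(results))  # [(4, 469, 5405, 'device_id'), (1, 7, 79, 'DEVICE_ID')]
print(sorted(index_results(results)))  # [('data_store', 4), ('field', 79), ('field', 5405)]
```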
Search the Catalog
Search the catalog
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/search?term=usr'
resp = r.get(url, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
"$BASE_URL/search?term=usr"
Search the entire catalog
Returns the object:
{
"meta": {
"current_page": 1,
"next_page": null,
"total_cnt": 7
},
"results": [
{
"entity_id": 5405,
"schema_id": 469,
"data_store_id": 4,
"name": "device_id",
"entity_type": "field"
},
{
"entity_id": 79,
"schema_id": 7,
"data_store_id": 1,
"name": "DEVICE_ID",
"entity_type": "field"
}
]
}
HTTPS Request
GET /search
Query Parameters
Parameter | Default | Description |
---|---|---|
page | 1 | The page to retrieve when paginating through search results |
term | None | The search term to look for in the data catalog |
Path Parameters
There are no path parameters for this endpoint.
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
meta | Meta object | A meta object for pagination |
results | list[Search Results Object] | A list of search results |
Response Codes
Value | Description |
---|---|
200 | Retrieved the search results |
Batch Load Assets
You can batch load data stores, schemas and fields in a single request to avoid the API overhead of requesting each item individually.
Batch Load Response Object
The batch load result object
{
"data_stores": [
{"DATA_STORE_OBJECT"},
{"DATA_STORE_OBJECT"},
...
],
"data_schemas": [
{"DATA_SCHEMA_OBJECT"},
{"DATA_SCHEMA_OBJECT"},
...
],
"data_fields": [
{"DATA_FIELD_OBJECT"},
{"DATA_FIELD_OBJECT"},
...
]
}
The batch load API retrieves the information you need about data stores, schemas and fields when you already have the IDs of the corresponding assets. The API always returns the parent assets of each data asset that you request. For example, if you batch load three fields, the response contains the following:
- The three data fields requested
- A list of schemas such that the schema for each of the fields above is contained in the response
- A list of data stores such that the data store for each of the schemas above is contained in the response
Consider the following example. The data assets in Tree Schema are as follows:
- Data Store: DS_1, Schema: SCHEMA_1, Field: FIELD_1
- Data Store: DS_1, Schema: SCHEMA_2, Field: FIELD_2
- Data Store: DS_1, Schema: SCHEMA_2, Field: FIELD_3
- Data Store: DS_2, Schema: SCHEMA_3, Field: FIELD_4
The request is made to Tree Schema to batch load FIELD_1, FIELD_2, FIELD_3 and SCHEMA_3. In this example, the following assets would be returned:
- Data Stores: DS_1, DS_2 (because DS_2 is a parent of SCHEMA_3)
- Schemas: SCHEMA_1, SCHEMA_2, SCHEMA_3
- Fields: FIELD_1, FIELD_2, FIELD_3
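The parent-expansion rule described above can be modelled in a few lines. This is a hypothetical client-side sketch that reproduces the documented example. It is not Tree Schema's server implementation, and the FIELD_PARENT and SCHEMA_PARENT dictionaries are an assumed in-memory copy of the example hierarchy. Requested fields pull in their schemas, and every schema (requested or implied) pulls in its data store. Children are never added, which is why FIELD_4 does not appear.

```python
from typing import Dict, List, Set, Tuple

# Assumed copy of the example hierarchy: child name -> parent name.
FIELD_PARENT = {"FIELD_1": "SCHEMA_1", "FIELD_2": "SCHEMA_2",
                "FIELD_3": "SCHEMA_2", "FIELD_4": "SCHEMA_3"}
SCHEMA_PARENT = {"SCHEMA_1": "DS_1", "SCHEMA_2": "DS_1", "SCHEMA_3": "DS_2"}


def expand_with_parents(requested: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return the requested assets plus every parent they need."""
    stores: Set[str] = set()
    schemas: Set[str] = set()
    fields: Set[str] = set()
    for asset_type, name in requested:
        if asset_type == "field":
            fields.add(name)
            schemas.add(FIELD_PARENT[name])  # a field brings its schema
        elif asset_type == "schema":
            schemas.add(name)
        elif asset_type == "data_store":
            stores.add(name)
        else:
            raise ValueError(f"unknown asset type: {asset_type!r}")
    # Every schema, requested or implied by a field, brings its data store.
    stores.update(SCHEMA_PARENT[s] for s in schemas)
    return {"data_stores": sorted(stores),
            "data_schemas": sorted(schemas),
            "data_fields": sorted(fields)}


print(expand_with_parents([("field", "FIELD_1"), ("field", "FIELD_2"),
                           ("field", "FIELD_3"), ("schema", "SCHEMA_3")]))
# {'data_stores': ['DS_1', 'DS_2'], 'data_schemas': ['SCHEMA_1', 'SCHEMA_2', 'SCHEMA_3'],
#  'data_fields': ['FIELD_1', 'FIELD_2', 'FIELD_3']}
```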
Batch Request Assets
Batch requests to Tree Schema
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/batch-assets'
assets = {
'assets': [
{'type': 'schema', 'id': 2},
{'type': 'data_store', 'id': 1},
{'type': 'field', 'id': 1}
]
}
resp = r.post(url, json=assets, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"assets": [{"type": "schema", "id": 2}, {"type": "data_store", "id": 1}, {"type": "field", "id": 1}]}' \
$BASE_URL/batch-assets
Batch request objects from Tree Schema
Returns the object:
{
"data_stores": [
{
"data_store_id": 1,
"name": "My API DS #3 AWS S3",
"type": "s3",
"other_type": null,
"created_ts": "2021-02-01 04:32:53",
"updated_ts": "2021-02-01 04:32:53",
"description_markup": null,
"description_raw": null,
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
},
"details": {}
}
],
"data_schemas": [
{
"data_schema_id": 1,
"name": "My API Schema #1",
"type": "csv",
"schema_loc": "My API Schema #1",
"created_ts": "2021-02-01 04:56:43",
"updated_ts": "2021-02-01 04:59:16",
"description_markup": "<p>This is an updated description</p>",
"description_raw": "This is an updated description",
"data_store_id": 1,
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
}
}
],
"data_fields": [
{
"field_id": 1,
"name": "my_field",
"parent_path": null,
"full_path_name": "my_field",
"type": "list",
"data_type": "array",
"data_format": "YYYY-MM-DD",
"nullable": true,
"created_ts": "2021-02-01 05:14:13",
"updated_ts": "2021-02-01 15:09:27",
"description_markup": "<p>--- NEW DESC ---</p>",
"description_raw": "--- NEW DESC ---",
"data_schema_id": 1,
"steward": {
"user_id": 1,
"name": "Grant",
"email": "grant@treeschema.com"
},
"tech_poc": {
"user_id": 2,
"name": "Asher",
"email": "asher@treeschema.com"
}
}
]
}
HTTPS Request
POST /batch-assets
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
There are no path parameters for this endpoint.
Body
Field | Type | Description |
---|---|---|
assets | list[Batch Asset Request Object] | A list of batch asset request objects |
Batch Asset Request Object
Field | Type | Description |
---|---|---|
type | String | Must be one of data_store, schema or field |
id | Integer | The ID of the data asset |
Response
Field | Data Type | Description |
---|---|---|
data_stores | list[Data Store Object] | A list of data stores requested, or parents of other data assets requested |
data_schemas | list[Data Schema Object] | A list of data schemas requested, or parents of other data assets requested |
data_fields | list[Data Field Object] | A list of data fields requested |
Response Codes
Value | Description |
---|---|
200 | Retrieved the batch response |
400 | Invalid request; errors are provided in the response body |
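When the IDs come from elsewhere, for example from search results, it can help to assemble the request body programmatically. The build_batch_request function below is a hypothetical helper and not part of the API. It emits one Batch Asset Request Object per ID using the documented type values data_store, schema and field, and it rejects non-integer IDs before anything is sent.

```python
import json
from typing import Any, Dict, Iterable, List


def build_batch_request(data_store_ids: Iterable[int] = (),
                        schema_ids: Iterable[int] = (),
                        field_ids: Iterable[int] = ()) -> Dict[str, Any]:
    """Build the JSON body for POST /batch-assets."""
    assets: List[Dict[str, Any]] = []
    groups = (("data_store", data_store_ids),
              ("schema", schema_ids),
              ("field", field_ids))
    for asset_type, ids in groups:
        for asset_id in ids:
            # bool is a subclass of int, so exclude it explicitly.
            if not isinstance(asset_id, int) or isinstance(asset_id, bool):
                raise TypeError(f"{asset_type} id must be an int, got {asset_id!r}")
            assets.append({"type": asset_type, "id": asset_id})
    return {"assets": assets}


body = build_batch_request(data_store_ids=[1], schema_ids=[2], field_ids=[1])
print(json.dumps(body))
# {"assets": [{"type": "data_store", "id": 1}, {"type": "schema", "id": 2}, {"type": "field", "id": 1}]}
```

The body can then be sent the same way as in the example above, with `r.post(BASE_URL + '/batch-assets', json=body, headers=headers)`.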
dbt
Tree Schema can process your dbt manifest file to ingest your existing metadata. All dbt processing occurs within a data store, so the first step is to upload the manifest file to a data store using the data store dbt endpoint.
Once the file has been uploaded, you can retrieve the status of the parsing process and then trigger the step that saves the results.
Get dbt Parse Status
Retrieve the status
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/dbt/parse-results'
params = {'dbt_process_id': dbt_process_id}
resp = r.get(url, params=params, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -H "Authorization: Basic $ENCODED_SECRET" \
"$BASE_URL/dbt/parse-results?dbt_process_id=$DBT_PROCESS_ID"
This status check uses the dbt_process_id that is returned from the data store dbt endpoint.
Returns the object:
{
"dbt_process_id": "72c83743-cad1-48b4-9916-78bc02083772",
"found": true,
"status": "success",
"error_msg": null,
"dbt_schemas": [
{
"schema_name": "TS_SCHEMA1.cust_mkt_segment",
"schema_type": "view",
"schema_status": "exists"
},
{
"schema_name": "TS_SCHEMA1.cnt_segment",
"schema_type": "view",
"schema_status": "exists"
}
],
"dbt_lineage": [
{
"source_schema_name": "TS_SCHEMA1.cust_mkt_segment",
"target_schema_name": "TS_SCHEMA1.max_segment"
},
{
"source_schema_name": "TS_SCHEMA1.cust_mkt_segment",
"target_schema_name": "TS_SCHEMA1.segment_total"
}
]
}
HTTPS Request
GET /dbt/parse-results
Query Parameters
Parameter | Default | Description |
---|---|---|
dbt_process_id | None | The dbt process ID that was returned when starting the parsing process |
Path Parameters
There are no path parameters for this endpoint.
Body
There is no body for this endpoint.
Response
Field | Data Type | Description |
---|---|---|
dbt_process_id | string | The same dbt_process_id provided as part of the request |
found | boolean | Whether the dbt_process_id was found. Because the file is processed asynchronously, the dbt_process_id may not be found yet if the status is requested immediately after the file is uploaded. |
status | string | The status of the upload: waiting if found is False or the file processing has not yet completed, success if parsing completed successfully, error if an error occurred, or processed if the results have already been saved. |
error_msg | string | The error that occurred during processing; only provided if the status is error, otherwise null |
dbt_schemas | List[Dict] | A list of schema objects found in the dbt manifest file. Each object provides the name and type of the schema, as well as whether the schema already exists in Tree Schema. |
dbt_lineage | List[Dict] | A list of lineage objects found in the dbt manifest file. |
Response Codes
Value | Description |
---|---|
200 | Retrieved the status results |
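Because parsing happens asynchronously, a client will typically poll this endpoint until the status is no longer waiting. The sketch below is hypothetical. The get_status callable stands in for the GET /dbt/parse-results call, and the 2-second interval and 120-second timeout are arbitrary choices, not documented limits. The function returns on success or processed and raises on error, using error_msg as the message.

```python
import time
from typing import Any, Callable, Dict


def wait_for_dbt_parse(get_status: Callable[[], Dict[str, Any]],
                       interval_s: float = 2.0,
                       timeout_s: float = 120.0,
                       sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll the parse status until it is success, processed or error."""
    waited = 0.0
    while True:
        body = get_status()
        status = body.get("status")
        if status in ("success", "processed"):
            return body  # ready to save, or already saved
        if status == "error":
            raise RuntimeError(f"dbt parsing failed: {body.get('error_msg')}")
        # Anything else, e.g. 'waiting' (also reported while found is False).
        if waited >= timeout_s:
            raise TimeoutError(f"dbt parsing still {status!r} after {waited}s")
        sleep(interval_s)
        waited += interval_s


# Offline demo: the first two polls are still waiting.
fake_responses = iter([
    {"found": False, "status": "waiting", "error_msg": None},
    {"found": True, "status": "waiting", "error_msg": None},
    {"found": True, "status": "success", "error_msg": None},
])
result = wait_for_dbt_parse(lambda: next(fake_responses), sleep=lambda s: None)
print(result["status"])  # success
```

Against the live API, get_status could be `lambda: r.get(url, params=params, headers=headers).json()`, reusing url and params from the example above.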
Save dbt Results
Save the results from parsing the dbt manifest file
import requests as r
BASE_URL = 'https://api.treeschema.com/catalog'
url = BASE_URL + '/dbt/save-results'
data = {
'dbt_process_id': dbt_process_id,
'add_schemas_fields': False,
'update_descriptions': True,
'update_tags': True,
'add_lineage': True
}
resp = r.post(url, json=data, headers=headers)
resp.json()
BASE_URL='https://api.treeschema.com/catalog'
curl -X POST -H "Authorization: Basic $ENCODED_SECRET" \
-H "Content-Type: application/json" \
-d '{"dbt_process_id": "'"$DBT_PROCESS_ID"'", "add_schemas_fields": false, "update_descriptions": true, "update_tags": true, "add_lineage": false}' \
$BASE_URL/dbt/save-results
Saves the results from the parsed manifest.json file using the options provided by the user. The user can choose to add the schemas and fields, update descriptions, update tags and add lineage from the manifest into Tree Schema. For more details on these options, see the dbt documentation in Tree Schema.
Returns the object:
{
"dbt_process_id": "b4000641-eed7-4345-9ba0-7701f77ce568"
}
HTTPS Request
POST /dbt/save-results
Query Parameters
There are no query parameters for this endpoint.
Path Parameters
There are no path parameters for this endpoint.
Body
The body is a JSON object with the following fields:
Field | Required | Description |
---|---|---|
dbt_process_id | Yes | The dbt process ID that was returned when starting the parsing process |
add_schemas_fields | No | Whether or not to add schemas and fields if they do not exist within Tree Schema. Defaults to False, as it is generally better to first allow Tree Schema to auto-discover what exists. |
update_descriptions | No | Whether or not to update descriptions of the schemas and fields in Tree Schema. This only applies if the corresponding schemas and fields in the manifest file contain descriptions. Defaults to False, as this could overwrite descriptions that have been updated in Tree Schema. It is generally good to set this to True on the initial load to bootstrap your documentation. |
update_tags | No | Whether or not to update tags for the schemas and fields in Tree Schema. This only applies if the corresponding schemas and fields in the manifest file contain tags. Defaults to True. |
add_lineage | No | Whether or not to add the data lineage for the schemas and fields in Tree Schema. This only applies if the manifest file contains lineage for the corresponding schemas and fields. Defaults to True. |
Response
Field | Data Type | Description |
---|---|---|
dbt_process_id | string | The same dbt_process_id provided as part of the request |
Response Codes
Value | Description |
---|---|
201 | Request successfully received |
Errors
These are common errors across all Tree Schema API requests.
Error Code | Meaning |
---|---|
400 | Bad Request -- Your request is invalid. |
401 | Unauthorized -- Your email or secret key is wrong. |
403 | Forbidden -- The resource requested is hidden for administrators only. |
404 | Not Found -- The specified resource could not be found. |
406 | Not Acceptable -- You requested a format that isn't JSON. |
429 | Too Many Requests -- You've made too many requests! |
500 | Internal Server Error -- We had a problem with our server. Try again later. |
503 | Service Unavailable -- We're temporarily offline for maintenance. Please try again later. |
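Of these, 429, 500 and 503 suggest waiting and trying again. The other 4xx codes point to a problem with the request itself that a retry will not fix. The sketch below shows a minimal retry loop with exponential backoff. It assumes a send callable that returns an object with a status_code attribute, such as a requests.Response. The send_with_retry name, the attempt count and the delays are illustrative choices, not Tree Schema recommendations.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable

# 429 (rate limited) plus the two codes the table says to try again later.
RETRYABLE = {429, 500, 503}


def send_with_retry(send: Callable[[], Any],
                    max_attempts: int = 4,
                    base_delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call `send` until the response is not retryable or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        resp = send()
        if resp.status_code not in RETRYABLE or attempt == max_attempts:
            return resp  # success, a non-retryable error, or the last try
        sleep(base_delay_s * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...


# Offline demo: two temporary failures, then success.
codes = iter([429, 503, 200])
delays = []
resp = send_with_retry(lambda: SimpleNamespace(status_code=next(codes)),
                       sleep=delays.append)
print(resp.status_code, delays)  # 200 [1.0, 2.0]
```

With requests, a call would look like `send_with_retry(lambda: r.get(url, headers=headers))`.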