treeschema.catalog.transformation

class Transformation(transformation_inputs: [<class 'int'>, typing.Dict], *args, **kwargs)

Bases: treeschema.catalog.base_serializer.TreeSchemaSerializer

An object that represents a single transformation.

Create a transformation object with either the ID of a transformation or the fully defined transformation object as a dictionary.

Parameters
  • id – the ID for the transformation

  • inputs – a dictionary of inputs that can fully serialize a transformation

add_tags(tags: [<class 'str'>, typing.List[str]]) → Dict

Adds one or more tags to the transformation

Parameters

tags – a single tag or a list of tags

Returns

the API response

>>> my_transformation = ts.transformation('my transformation')
>>> my_transformation.add_tags(['new_tag', 'a second new tag'])
>>> my_transformation.add_tags('single tag')
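
Since add_tags accepts either a single string or a list of strings, caller code that builds tags dynamically may want to normalize its input first. The sketch below is not part of the treeschema library; normalize_tags is a hypothetical helper that only illustrates the two accepted input shapes.

```python
from typing import List, Union


def normalize_tags(tags: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: return the tags as a list, whichever form was passed."""
    if isinstance(tags, str):
        # A bare string is one tag, not a sequence of characters.
        return [tags]
    return list(tags)


single = normalize_tags('single tag')
several = normalize_tags(['new_tag', 'a second new tag'])
```

Checking for str before converting to a list matters because a string is itself iterable and would otherwise be split into characters.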

check_breaking_change(link_state: [typing.Dict, typing.List[typing.Dict], typing.Tuple[treeschema.catalog.data_field.DataField, treeschema.catalog.data_field.DataField], typing.List[typing.Tuple[treeschema.catalog.data_field.DataField, treeschema.catalog.data_field.DataField]]], max_depth: int = 5)

Checks to see if the link_state provided will cause a breaking change. The link_state is compared to the existing links in Tree Schema in order to see which links have been removed. When a link is removed, the data assets downstream from that link are considered to be broken.

For example, consider the following group of connected data assets (data moves from left to right):

A -- B -- C -- D
         /
E ------

In this example, passing the following values for link_state:

[(B,C), (C,D), (E,C)]

would result in the link (A,B) being removed. The data assets that would be considered broken are C and D, because those are the downstream assets from the removed link. The data asset B is not considered broken because Tree Schema assumes that the user removing the link to asset B will also be removing the underlying data dependency. Similarly, data asset E is not broken because it is not dependent on the (A,B) link that was removed.
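
The rule described above can be modeled with plain Python sets and a breadth-first walk. This is a minimal sketch of the idea, not Tree Schema's implementation: find_broken_assets is a hypothetical function, assets are represented as strings rather than DataField objects, and max_depth is assumed to mean the number of hops followed downstream from the target of a removed link.

```python
from collections import deque


def find_broken_assets(existing_links, link_state, max_depth=5):
    """Return (removed_links, broken_assets) for a proposed link state."""
    # Links that exist today but are missing from the proposed state.
    removed = set(existing_links) - set(link_state)

    # Adjacency map of the existing lineage: source -> set of targets.
    downstream = {}
    for source, target in existing_links:
        downstream.setdefault(source, set()).add(target)

    broken = set()
    for _, removed_target in removed:
        # The target of the removed link itself is assumed to be handled
        # by the user, so only assets further downstream are collected.
        seen = {removed_target}
        queue = deque([(removed_target, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth >= max_depth:
                continue  # assumed meaning of max_depth: stop expanding here
            for nxt in downstream.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    broken.add(nxt)
                    queue.append((nxt, depth + 1))
    return removed, broken


existing = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('E', 'C')]
proposed = [('B', 'C'), ('C', 'D'), ('E', 'C')]
removed, broken = find_broken_assets(existing, proposed)
```

With the example graph, removed contains only ('A', 'B') and broken contains 'C' and 'D'; neither 'B' nor 'E' is reported.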

Creates a TransformationLink between two DataFields. The TransformationLink is the building block for data lineage and describes how data moves from one schema to another.

Parameters

links – a single dictionary containing data to create or retrieve a link or a list of dictionaries

>>> src_schema = ts.data_store('my 1st data store').schema('my.schema1')
>>> tgt_schema = ts.data_store('another data store').schema('schema.num2')
>>>
>>> t = ts.transformation('my transform')
>>> transform_links = [
...     (src_schema.field('field_1'), tgt_schema.field('target_field'))
... ]
>>> t.create_links(transform_links)

Deletes (deprecates) a transformation link, or list of links, from a transformation.

Parameters

remove_links – a single link or a list of links (these can be link IDs or TransformationLink objects)

Returns

True if the links are deprecated

>>> my_transformation = ts.transformation('my transform')
>>> delete_link1 = my_transformation.link(1)
>>> delete_link2 = my_transformation.link(2)
>>> my_transformation.delete_links([delete_link1, delete_link2])
True

Retrieves all transformation links for the transformation. After this is called for the first time the links are cached locally.

Parameters

refresh – defaults to False; if True, forces all links to be retrieved from Tree Schema rather than from the local cache

Returns

a list of TransformationLink objects that belong to this transformation
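
The fetch-once-then-cache behavior with an optional refresh can be pictured as follows. This is only a sketch of the pattern under stated assumptions: CachedLinks and fake_fetch are hypothetical stand-ins, and nothing here reflects how the library stores its cache internally.

```python
from typing import Callable, List, Optional


class CachedLinks:
    """Hypothetical stand-in illustrating a locally cached link list."""

    def __init__(self, fetch: Callable[[], List[dict]]):
        self._fetch = fetch  # assumed callable that queries the remote service
        self._cache: Optional[List[dict]] = None

    def get_links(self, refresh: bool = False) -> List[dict]:
        # Fetch on the first call, or whenever the caller forces a refresh.
        if self._cache is None or refresh:
            self._cache = self._fetch()
        return self._cache


calls = []


def fake_fetch():
    """Pretend remote call that records how often it is invoked."""
    calls.append(1)
    return [{'id': 1}, {'id': 2}]


links = CachedLinks(fake_fetch)
first = links.get_links()               # triggers a fetch
second = links.get_links()              # served from the local cache
third = links.get_links(refresh=True)   # triggers a second fetch
```

In this sketch the remote call runs twice across the three get_links calls: once for the initial load and once for the forced refresh.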

Creates or retrieves a transformation link object. The input can be an integer (the transformation link ID) or a dictionary of values used to create the link.

Parameters
  • link_inputs – the inputs used to create or retrieve the link

  • refresh – whether to force a refresh from the database; the default is False

  • raise_if_not_exist – defaults to False; if True, raises a treeschema.exceptions.DataAssetDoesNotExist exception if the link does not exist. When False, None is returned for links that do not exist

Returns

a TransformationLink object

>>> t = ts.transformation('my transform')
>>> link1 = t.link(1)
>>> link2 = t.link({'source_field_id': 1, 'target_field_id': 2})
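
The two outcomes controlled by raise_if_not_exist (return None, or raise) can be sketched as below. The exception class here is a local stand-in with the same name as the library's exception, and find_link and links_by_id are hypothetical names used only for illustration.

```python
class DataAssetDoesNotExist(Exception):
    """Local stand-in for the library exception of the same name."""


def find_link(links_by_id, link_id, raise_if_not_exist=False):
    """Hypothetical lookup mirroring the documented None-or-raise behavior."""
    link = links_by_id.get(link_id)
    if link is None and raise_if_not_exist:
        raise DataAssetDoesNotExist(f'link {link_id} does not exist')
    return link


known_links = {1: 'link-1'}
found = find_link(known_links, 1)
missing = find_link(known_links, 2)  # None, because raise_if_not_exist is False
```

Returning None by default suits exploratory use, while passing raise_if_not_exist=True lets a pipeline fail fast when an expected link is absent.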

A dictionary of links that have been retrieved from Tree Schema. This will not have all of the links for a given transformation until get_links() is called to fetch the existing links.

remove_tags(tags: [<class 'str'>, typing.List[str]]) → Dict

Removes one or more tags from the transformation

Parameters

tags – a single tag or a list of tags

Returns

the API response

>>> my_transformation = ts.transformation('my transformation')
>>> my_transformation.remove_tags(['new_tag', 'a second new tag'])
>>> my_transformation.remove_tags('single tag')

Sets the current state of the transformation to have exactly the links provided as input. Any existing links that are not provided in the input but exist within the transformation will be deprecated, and any new links provided that do not exist within the transformation will be created.

Parameters

links – a single dictionary containing data to create or retrieve a link or a list of dictionaries

>>> src_schema = ts.data_store('my 1st data store').schema('my.schema1')
>>> tgt_schema = ts.data_store('another data store').schema('schema.num2')
>>>
>>> t = ts.transformation('my transform')
>>> transform_links = [
...     (src_schema.field('field_1'), tgt_schema.field('target_field'))
... ]
>>> t.set_links_state(transform_links)
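
The create-and-deprecate behavior described for set_links_state amounts to two set differences. The sketch below shows that reconciliation with plain tuples of field IDs; plan_links_state is a hypothetical function, not a library call, and the integer field IDs are made up for the example.

```python
def plan_links_state(existing, desired):
    """Hypothetical planner: decide which links to create, deprecate, or keep."""
    existing_set, desired_set = set(existing), set(desired)
    to_create = desired_set - existing_set      # new links not yet present
    to_deprecate = existing_set - desired_set   # present links left out of the input
    unchanged = existing_set & desired_set      # links in both states
    return to_create, to_deprecate, unchanged


# Each link is a (source_field_id, target_field_id) pair; the IDs are illustrative.
existing_links = [(1, 10), (2, 20)]
desired_links = [(2, 20), (3, 30)]
to_create, to_deprecate, unchanged = plan_links_state(existing_links, desired_links)
```

Here (3, 30) would be created, (1, 10) would be deprecated, and (2, 20) is left untouched.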

property tags

Retrieves the tags for a given transformation. If the tags have not already been retrieved for the transformation, the existing tags are fetched from Tree Schema.

>>> my_transformation = ts.transformation('my transformation')
>>> my_transformation.tags
['tag_1', 'tag_2']