treeschema.catalog.lineage

class ImpactedAsset(raw_asset: Dict[Any, Any])

Bases: object

A single impacted object. Generated from a raw asset, which is a dictionary that contains the IDs required to build a data asset. These data assets will be serialized once the LineageImpact has batch retrieved all of the assets.

Each raw asset may have an impact chain that depicts the broken data lineage that leads to the given asset. For example, if this ImpactedAsset is for field E and the impact chain is [B, C, D] then the full broken lineage will be B -> C -> D -> E.
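The relationship between the impact chain and the asset can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name `full_lineage` is hypothetical, and letters stand in for the real ID dictionaries.

```python
from typing import Any, Dict, List


def full_lineage(raw_asset: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the impact chain followed by the impacted asset itself.

    Hypothetical helper: it assumes the chain is stored under the
    'impact_chain' key, as in the raw asset described above.
    """
    chain = list(raw_asset.get("impact_chain", []))
    # The impacted asset is the final link, so drop its nested chain.
    asset = {k: v for k, v in raw_asset.items() if k != "impact_chain"}
    return chain + [asset]


raw_asset = {
    "field_id": "E",
    "impact_chain": [{"field_id": "B"}, {"field_id": "C"}, {"field_id": "D"}],
}
lineage_text = " -> ".join(str(a["field_id"]) for a in full_lineage(raw_asset))
print(lineage_text)  # B -> C -> D -> E
```

An asset with no impact chain yields a lineage that contains only the asset itself.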

Note: to create an impacted asset you must first instantiate a TreeSchema() object.

Parameters

raw_asset – A dictionary of key/value pairs that identifies a unique data asset within Tree Schema. It may include an impact_chain entry that describes the broken lineage leading to this asset.

>>> from treeschema import TreeSchema
>>> from treeschema.catalog.lineage import ImpactedAsset
>>> ts = TreeSchema('<your_email>', '<your_secret_key>')
>>> asset = {
        'data_store_id': 1,
        'schema_id': 1,
        'field_id': 1,
        'impact_chain': [
            {
                'data_store_id': 1,
                'schema_id': 1,
                'field_id': 1,
            }
        ]
    }
>>> ia = ImpactedAsset(asset)
>>> # Once created, an ImpactedAsset should be serialized
>>> ia.try_serialize_self()
>>> ia
    ImpactedAsset(Data Store: Ds1 (1), Schema: DS1 (1), , Field: field_1a (1))
pretty_print_impact(show_by='field') → str

Generates a pretty-printed string that visualizes the impact chain that led to the broken asset.

Parameters

show_by – Allowed values are: field or schema. If field is provided then the impact will be shown at the field level. If schema is provided then the impact will be shown at the schema level. The schema view gives a higher-level picture, which helps when a large number of field-level breaking changes would clutter the output.

Returns

A string containing the pretty-printed impact.

>>> print(impacted_asset.pretty_print_impact(show_by='schema'))
    Data Store: Ds1 (1), Schema: ds2 (2), 
        └-->Data Store: Ds1 (1), Schema: ds3 (3), 
            └-->Data Store: Ds1 (1), Schema: ds4 (4) 
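The indented layout above can be reproduced with a short standalone sketch. It does not use the library's internals: the helpers `label` and `render_chain`, and the `data_store`, `schema` and `field` dictionary keys, are hypothetical names chosen for illustration, and the four-space indent per level is inferred from the example output.

```python
from typing import Dict, List


def label(asset: Dict[str, str], show_by: str = "field") -> str:
    """Build one display label; the dictionary keys are hypothetical."""
    if show_by not in ("field", "schema"):
        raise ValueError("show_by must be 'field' or 'schema'")
    text = f"Data Store: {asset['data_store']}, Schema: {asset['schema']}"
    if show_by == "field":
        # Field-level view appends the field to the schema-level label.
        text += f", Field: {asset['field']}"
    return text


def render_chain(labels: List[str]) -> str:
    """Indent each successive link by four more spaces, prefixed with an arrow."""
    lines = []
    for depth, text in enumerate(labels):
        prefix = "" if depth == 0 else " " * (4 * depth) + "└-->"
        lines.append(prefix + text)
    return "\n".join(lines)


chain = [
    {"data_store": "Ds1 (1)", "schema": "ds2 (2)", "field": "field_1b (2)"},
    {"data_store": "Ds1 (1)", "schema": "ds3 (3)", "field": "field_1c (3)"},
]
rendered = render_chain([label(a, show_by="schema") for a in chain])
print(rendered)
```

This sketch does not merge consecutive links that share a schema; whether the library does so at the schema level is not stated here.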
pretty_print_string(show_by: str = 'field') → str

Creates a pretty printed string for this impacted asset.

Parameters

show_by – Allowed values are: field or schema. If field is provided then the impact will be shown at the field level. If schema is provided then the impact will be shown at the schema level. The schema view gives a higher-level picture, which helps when a large number of field-level breaking changes would clutter the output.

>>> print(impacted_asset.pretty_print_string(show_by='schema'))
    Data Store: Ds1 (1), Schema: ds4 (4)
try_serialize_self()

Serializes the response into Tree Schema objects.

class LineageImpact(lineage_impact: Dict[Any, Any])

Bases: object

Represents the impact to data lineage that may occur from a breaking change.

Converts the response from the check-breaking-changes endpoint into TreeSchema serialized objects. Lineage impacts allow you to see if there is an expected breaking change as well as the full lineage from the original broken chain through each impacted asset.

>>> from treeschema import TreeSchema
>>> from treeschema.catalog.lineage import LineageImpact
>>> ts = TreeSchema('<email>', '<ts_secret_key>')   
>>> impact = {
        "breaking": True,
        "impact_summary": {
            "fields": 1,
            "schemas": 1,
            "data_stores": 1
        },
        "impacted_assets": [
            {
                "field_id": 4,
                "schema_id": 4,
                "data_store_id": 1,
                "impact_chain": [
                    {
                        "field_id": 2,
                        "schema_id": 2,
                        "data_store_id": 1
                    },
                ]
            }
        ]
    }
>>> li = LineageImpact(impact)
>>> li.breaking
    True
Parameters

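lineage_impact – A dictionary holding the response from the check-breaking-changes endpoint, with the keys breaking, impact_summary and impacted_assets as shown in the example above.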

all_impact_strings(show_by: str = 'field', show: int = 25) → str

Creates a string that includes all of the full lineage impact for each impacted asset. For example:

>>> Data Store: Ds1 (1), Schema: ds2 (2), Field: field_1b (2)
    └-->Data Store: Ds1 (1), Schema: ds3 (3), Field: field_1c (3)
        └-->Data Store: Ds1 (1), Schema: ds4 (4), Field: field_1d (4)
Parameters
  • show_by – Allowed values are: field or schema. If field is provided then the impact will be shown at the field level. If schema is provided then the impact will be shown at the schema level. The schema view gives a higher-level picture, which helps when a large number of field-level breaking changes would clutter the output.

  • show – The number of assets to show at one time. Default is 25.

>>> ts = TreeSchema('<email>', '<ts_secret_key>')   
>>> t = ts.transformation(1)
>>> links = [ 
        ({'source_field_id': 2, 'target_field_id': 3}),
        ({'source_field_id': 3, 'target_field_id': 4}),
        ({'source_field_id': 4, 'target_field_id': 5})
    ]
>>> li = t.check_breaking_change(link_state=links)
>>> printed_impacts = li.all_impact_strings(show=50)
>>> print(printed_impacts)
>>> # Lineage for Each Breaking Change
    # --------------------------------
    #
    # Data Store: Ds1 (1), Schema: ds2 (2), Field: field_1b (2)
    #     └-->Data Store: Ds1 (1), Schema: ds3 (3), Field: field_1c (3)
    #
    # -----
    # 
    # Data Store: Ds1 (1), Schema: ds2 (2), Field: field_1b (2)
    #     └-->Data Store: Ds1 (1), Schema: ds3 (3), Field: field_1c (3)
    #         └-->Data Store: Ds1 (1), Schema: ds4 (4), Field: field_1d (4)
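How the per-asset strings might be combined into the report shown above can be sketched as follows. This is an assumption-laden illustration rather than the library's code: the helper `join_impacts` is hypothetical, and both the separator and the truncation to the first `show` assets are inferred from the example output and the parameter description.

```python
from typing import List


def join_impacts(impact_strings: List[str], show: int = 25) -> str:
    """Combine per-asset lineage strings under a header (hypothetical helper)."""
    header = "Lineage for Each Breaking Change"
    # Assumed behavior: only the first `show` assets are included.
    shown = impact_strings[:show]
    body = "\n\n-----\n\n".join(shown)
    return f"{header}\n{'-' * len(header)}\n\n{body}"


report = join_impacts(["chain-1", "chain-2", "chain-3"], show=2)
print(report)
```

With `show=2`, the third lineage string is left out of the report.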
class LineageImpactSummary(impact_summary: Dict[Any, Any])

Bases: object

Holds a summary of the total number of data assets impacted.

Parameters

impact_summary – A dictionary containing the summary output from the data lineage impact. This should have three keys: data_stores, schemas, and fields. The values for each will be integers representing the total number of unique assets of the given type that are impacted.

>>> impacts = {'data_stores': 1, 'schemas': 2, 'fields': 5}
>>> LineageImpactSummary(impacts)
    LineageImpactSummary(Data Stores: 1, Schemas: 2, Fields: 5)
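For intuition about what "unique assets of the given type" means, the counts could be derived from a list of impacted assets roughly as below. This is a sketch under assumptions: the helper `summarize` is hypothetical, in practice the summary comes from the endpoint response, and the sketch counts only the impacted assets themselves, not the assets in their impact chains.

```python
from typing import Any, Dict, Iterable


def summarize(impacted_assets: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count unique data stores, schemas and fields (hypothetical helper)."""
    data_stores, schemas, fields = set(), set(), set()
    for asset in impacted_assets:
        # Sets discard duplicates, so each ID is counted once.
        data_stores.add(asset["data_store_id"])
        schemas.add(asset["schema_id"])
        fields.add(asset["field_id"])
    return {
        "data_stores": len(data_stores),
        "schemas": len(schemas),
        "fields": len(fields),
    }


assets = [
    {"data_store_id": 1, "schema_id": 3, "field_id": 3},
    {"data_store_id": 1, "schema_id": 4, "field_id": 4},
    {"data_store_id": 1, "schema_id": 4, "field_id": 5},
]
summary = summarize(assets)
print(summary)  # {'data_stores': 1, 'schemas': 2, 'fields': 3}
```

The resulting dictionary has the three keys that impact_summary is expected to contain.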