treeschema.catalog.data_schema

class DataSchema(data_schema_inputs: [<class 'int'>, <class 'str'>, typing.Dict], data_store_id: int, *args, **kwargs)

Bases: treeschema.catalog.base_serializer.TreeSchemaSerializer

An object that represents a single data schema.

Create a data schema object with either the ID of a data schema, the name of the data schema, or the fully defined data schema object as a dictionary.

Parameters
  • data_schema_inputs – the inputs to create or retrieve the data schema

  • data_store_id – The ID of the data store that this schema belongs to

add_tags(tags: List[str]) → Dict

Adds one or more tags to the data schema

Parameters

tags – a list of tags; a single tag can also be passed as a string

Returns

the API response

>>> my_schema = ts.data_store('my data store').schema('some schema')
>>> my_schema.add_tags(['new_tag', 'a second new tag'])
>>> my_schema.add_tags('single tag')
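
Because tags may be given either as a single string or as a list, a caller-side helper can normalize the value before passing it on. The sketch below is illustrative only: normalize_tags is a hypothetical name, not part of the treeschema client, and the client may handle this differently internally.

```python
from typing import List, Union


def normalize_tags(tags: Union[str, List[str]]) -> List[str]:
    """Return tags as a list, wrapping a single string (hypothetical helper)."""
    if isinstance(tags, str):
        # A bare string would otherwise be iterated character by character.
        return [tags]
    return list(tags)
```

With this helper, normalize_tags('single tag') and normalize_tags(['new_tag', 'a second new tag']) both yield lists.
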

delete_fields(remove_fields: [typing.List[int], <class 'int'>, typing.List[treeschema.catalog.data_field.DataField], <class 'treeschema.catalog.data_field.DataField'>]) → bool

Deletes (deprecates) a single field or a list of fields from the data schema.

Parameters

remove_fields – the fields to remove, passed as field IDs or DataField objects. Either a single field or a list of fields is accepted.

Returns

True if the fields are deprecated

>>> my_schema = ts.data_store('my data store').schema('some schema')
>>> delete_field = my_schema.field('some_field')
>>> my_schema.delete_fields(delete_field)
True
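
The accepted input shapes (a single ID, a single DataField, or a list of either) can be reduced to one list of IDs. The following is a minimal sketch under stated assumptions: FieldStub is a hypothetical stand-in for DataField, the id attribute is assumed, and to_field_ids is not a function of the treeschema client.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FieldStub:
    """Hypothetical stand-in for a DataField; it only carries an ID."""
    id: int


FieldRef = Union[int, FieldStub]


def to_field_ids(remove_fields: Union[FieldRef, List[FieldRef]]) -> List[int]:
    """Flatten a single field or a list of fields into a list of field IDs."""
    if not isinstance(remove_fields, list):
        # Wrap a lone ID or field object so both cases share one code path.
        remove_fields = [remove_fields]
    return [f if isinstance(f, int) else f.id for f in remove_fields]
```
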

field(field_inputs: [<class 'int'>, typing.Dict], refresh: bool = False, pre_fetch: bool = True, raise_if_not_exist: bool = False)

Creates or retrieves a field object. Inputs can be an integer (for the field ID), a string (for the field name), or a dictionary of values used to create the field.

Parameters
  • field_inputs – the inputs used to create or retrieve the field

  • refresh – whether or not to force a refresh from the database; the default is False

  • pre_fetch – whether or not to pre-fetch all of the fields for this schema during the initial load. Disabling this is primarily useful when the inputs are a dictionary and you have already batch-retrieved the data assets required. Note that you can skip the pre-fetch and request one later.

  • raise_if_not_exist – default is False. If True, a treeschema.exceptions.DataAssetDoesNotExist exception is raised when the field does not exist; if False, None is returned for fields that do not exist.

Returns

a Data Field object

>>> my_schema = ts.data_store('my data store').schema('some schema')
>>> field_1 = my_schema.field(1)
>>> field_2 = my_schema.field('second_field')
>>> field_inputs = {
...     'name': 'new_field', 'type': 'scalar', 'data_type': 'number',
...     'data_format': 'bigint', 'description': 'My python description'
... }
>>> field_3 = my_schema.field(field_inputs)

A data field can also be created by passing a native Python type for the type, data_type and data_format inputs. Of these three, only type is required. For example, a field can be created as:

>>> field_inputs = {
...     'name': 'new_field', 'type': str, 'data_type': str, 'data_format': str
... }
>>> my_schema.field(field_inputs)

Or with as little as just the name and type:

>>> my_schema.field({'name': 'new_field', 'type': float})

The field inputs are managed by the API. All required fields for the data field REST endpoint can be found in the body of the API to Create a Field. This Python client only requires name and type if the type is a native Python type (e.g. str, float, int, bool, bytes, list or dict); in that case it will try to infer the values of the remaining fields from the native type.
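
The rule above can be checked on the caller side before a dictionary is handed to field(). This is only a sketch: check_field_inputs and can_infer_remaining are hypothetical helpers, not part of the treeschema client, and the assumption that data_type and data_format must be given explicitly for non-native types is taken from the first dictionary example above rather than from the REST API definition.

```python
# Native types named in the text above as ones the client can infer from.
NATIVE_TYPES = (str, float, int, bool, bytes, list, dict)


def can_infer_remaining(field_inputs: dict) -> bool:
    """True if 'type' is one of the native Python types listed above."""
    return field_inputs.get('type') in NATIVE_TYPES


def check_field_inputs(field_inputs: dict) -> list:
    """Return the keys still missing from a field input dictionary.

    Assumption: when 'type' is a string such as 'scalar', 'data_type' and
    'data_format' are supplied explicitly by the caller.
    """
    required = ['name', 'type']
    if not can_infer_remaining(field_inputs):
        required += ['data_type', 'data_format']
    return [key for key in required if key not in field_inputs]
```

An empty list means the dictionary satisfies the assumed rule; anything else names the keys that still need values.
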

property fields
get_fields(refresh: bool = False) → List[treeschema.catalog.data_field.DataField]

Retrieves all fields from the data schema. After this is called for the first time the fields are cached locally.

Parameters

refresh – default is False; if True, forces all fields to be retrieved from Tree Schema rather than from the local cache

Returns

a list of DataField objects that belong to this schema
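
The cache-then-refresh behaviour described here can be pictured with a small stand-in. This is a sketch of the general pattern, not the client's implementation: CachedFields and fake_fetch are hypothetical, and plain strings take the place of DataField objects.

```python
from typing import Callable, List, Optional


class CachedFields:
    """Hypothetical stand-in that mimics the cache-then-refresh behaviour."""

    def __init__(self, fetch: Callable[[], List[str]]):
        self._fetch = fetch          # stands in for the call to Tree Schema
        self._cache: Optional[List[str]] = None

    def get_fields(self, refresh: bool = False) -> List[str]:
        # Fetch on first use, or whenever the caller forces a refresh.
        if self._cache is None or refresh:
            self._cache = self._fetch()
        return self._cache


calls = []


def fake_fetch() -> List[str]:
    calls.append(1)                  # record each simulated round trip
    return ['field_a', 'field_b']


schema = CachedFields(fake_fetch)
schema.get_fields()                  # first call: fetches
schema.get_fields()                  # second call: served from the cache
schema.get_fields(refresh=True)      # forced refresh: fetches again
print(len(calls))                    # 2
```
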

remove_tags(tags: [<class 'str'>, typing.List[str]]) → Dict

Removes one or more tags from the data schema

Parameters

tags – a list of tags; a single tag can also be passed as a string

Returns

the API response

>>> my_schema = ts.data_store('my data store').schema('some schema')
>>> my_schema.remove_tags(['new_tag'])
>>> my_schema.remove_tags('single tag')

property tags

Retrieves the tags for a given data schema. If the tags have not already been retrieved for the data schema, the existing tags are fetched from Tree Schema.

>>> my_schema = ts.data_store('my data store').schema('some schema')
>>> my_schema.tags
    # ['tag_1', 'tag_2']

update(*, _type: str = None, description: str = None, tech_poc: [<class 'int'>, <class 'treeschema.catalog.user.TreeSchemaUser'>] = None, steward: [<class 'int'>, <class 'treeschema.catalog.user.TreeSchemaUser'>] = None)

Updates an existing schema. Only keyword arguments can be provided; positional arguments are not allowed.

Parameters
  • _type – the type of schema, must be a Tree Schema field type

  • description – The description for the schema

  • tech_poc – The technical point of contact

  • steward – The data steward

Returns

a DataSchema, an updated version of itself

>>> my_schema = ts.data_store('my data store').schema('some schema')
>>> # As few as one argument can be provided or all 4 can be provided at once
>>> my_schema.update(
...     description='This is a new description - updated!',
...     steward=1,
...     tech_poc=1,
...     _type='avro'
... )
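
The keyword-only restriction comes from the bare * in the signature, which is standard Python behaviour. The sketch below uses a hypothetical stand-in function with a reduced signature (it is not the client's update method) to show that a positional call is rejected with a TypeError.

```python
def update(*, _type: str = None, description: str = None) -> dict:
    """Hypothetical stand-in with the same keyword-only style of signature."""
    # Keep only the arguments the caller actually supplied.
    supplied = {'_type': _type, 'description': description}
    return {key: value for key, value in supplied.items() if value is not None}


changes = update(description='updated')

try:
    update('updated')                # positional: rejected by Python itself
except TypeError:
    rejected = True
else:
    rejected = False
```
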