aito.schema.AitoTableSchema

class aito.schema.AitoTableSchema(columns: Dict[str, aito.schema.AitoColumnTypeSchema])

Bases: aito.schema.AitoSchema

An Aito table schema contains the table's columns and their schemas

Can be thought of as a dict-like container for AitoColumnTypeSchema objects

Infer AitoTableSchema from a Pandas DataFrame

>>> import pandas as pd
>>> df = pd.DataFrame(data={'id': [1, 2], 'name': ['Neil', 'Buzz']})
>>> table_schema = AitoTableSchema.infer_from_pandas_data_frame(df)
>>> print(table_schema.to_json_string(indent=2, sort_keys=True))
{
  "columns": {
    "id": {
      "nullable": false,
      "type": "Int"
    },
    "name": {
      "nullable": false,
      "type": "String"
    }
  },
  "type": "table"
}
>>> print(table_schema['name'])
{"type": "String", "nullable": false}

change the property of a column

>>> table_schema['name'].nullable = True
>>> print(table_schema['name'])
{"type": "String", "nullable": true}

add a column to the table schema

>>> table_schema['description'] = AitoColumnTypeSchema(AitoTextType(), nullable=True)
>>> print(table_schema.to_json_string(indent=2, sort_keys=True))
{
  "columns": {
    "description": {
      "nullable": true,
      "type": "Text"
    },
    "id": {
      "nullable": false,
      "type": "Int"
    },
    "name": {
      "nullable": true,
      "type": "String"
    }
  },
  "type": "table"
}

delete a column in the table schema

>>> del table_schema['description']
>>> table_schema.columns
['id', 'name']

check if a column exists in the table schema

>>> 'id' in table_schema
True

iterate over the table schema

>>> for col in table_schema:
...     table_schema[col].nullable = False
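The dict-like operations in the examples above (indexing, assignment, `del`, `in`, iteration) correspond to Python's container protocol. The following is a minimal sketch of that protocol, not aito's implementation: `TableSchemaSketch` is a hypothetical stand-in for `AitoTableSchema`, and plain dicts stand in for `AitoColumnTypeSchema` objects.

```python
from typing import Dict, Iterator, List


class TableSchemaSketch:
    """Hypothetical stand-in for AitoTableSchema; column schemas are plain dicts."""

    def __init__(self, columns: Dict[str, dict]):
        self._columns = dict(columns)

    @property
    def columns(self) -> List[str]:
        # Column names, in insertion order.
        return list(self._columns)

    def has_column(self, column_name: str) -> bool:
        return column_name in self._columns

    def __getitem__(self, column_name: str) -> dict:
        # Enables: schema['name']
        return self._columns[column_name]

    def __setitem__(self, column_name: str, column_schema: dict) -> None:
        # Enables: schema['description'] = ...
        self._columns[column_name] = column_schema

    def __delitem__(self, column_name: str) -> None:
        # Enables: del schema['description']
        del self._columns[column_name]

    def __contains__(self, column_name: str) -> bool:
        # Enables: 'id' in schema
        return self.has_column(column_name)

    def __iter__(self) -> Iterator[str]:
        # Enables: for col in schema
        return iter(self._columns)


schema = TableSchemaSketch({
    "id": {"type": "Int", "nullable": False},
    "name": {"type": "String", "nullable": False},
})
schema["description"] = {"type": "Text", "nullable": True}
del schema["description"]
for col in schema:
    schema[col]["nullable"] = True
```

Iterating yields column names rather than schemas, which is why the loop body indexes back into the container.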

Parameters

columns (Dict[str, AitoColumnTypeSchema]) – a dictionary mapping each column name to its schema

Methods

from_deserialized_object(obj)

create a class object from a JSON deserialized object

from_json_string(json_string, **kwargs)

create a class object from a JSON string

has_column(column_name)

check if the table has the specified column

infer_from_pandas_data_frame(df[, …])

Infer a TableSchema from a Pandas DataFrame

json_schema()

the JSON schema of the class

json_schema_validate(obj)

Validate an object with the class json_schema. Returns the object if validation succeeds, else raises JsonValidationError

json_schema_validate_with_schema(obj, schema)

Validate an object with the given schema

to_json_serializable()

convert the object to an object that can be serialized to a JSON formatted string

to_json_string(**kwargs)

convert the object to a JSON string

Attributes

column_link_pattern

column_name_pattern

columns

list of the table’s column names

columns_schemas

a dictionary mapping each column name to its corresponding schema

comparison_properties

properties of the schema object that will be used for comparison operation

links

a dictionary mapping each column name to its corresponding link

table_name_pattern

type

the type of the schema component

uuid_pattern

property columns

list of the table’s column names

Return type

List[str]

property columns_schemas

a dictionary mapping each column name to its corresponding schema

Return type

Dict[str, AitoColumnTypeSchema]

property comparison_properties

properties of the schema object that will be used for comparison operation

Return type

Iterable[str]
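This reference does not show how `comparison_properties` is consumed. One common pattern, sketched below as an assumption, is to derive equality from the listed property names so that other attributes are ignored. `ComparableSketch` and its `note` attribute are hypothetical and not part of aito.

```python
from typing import Iterable


class ComparableSketch:
    """Hypothetical class whose equality is driven by comparison_properties."""

    def __init__(self, data_type: str, nullable: bool, note: str = ""):
        self.type = data_type
        self.nullable = nullable
        self.note = note  # deliberately excluded from comparison

    @property
    def comparison_properties(self) -> Iterable[str]:
        return ["type", "nullable"]

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ComparableSketch):
            return NotImplemented
        # Two objects are equal when every listed property matches.
        return all(
            getattr(self, prop) == getattr(other, prop)
            for prop in self.comparison_properties
        )

    def __hash__(self) -> int:
        # Keep hashing consistent with the equality definition above.
        return hash(tuple(getattr(self, prop) for prop in self.comparison_properties))


a = ComparableSketch("String", False, note="first")
b = ComparableSketch("String", False, note="second")
c = ComparableSketch("String", True)
```

Here `a == b` holds although the notes differ, while `a == c` does not, because `nullable` is one of the compared properties.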

classmethod from_deserialized_object(obj)

create a class object from a JSON deserialized object

classmethod from_json_string(json_string: str, **kwargs)

create a class object from a JSON string

Parameters
  • json_string (str) – the JSON string

  • kwargs – the keyword arguments for json.loads method
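Because `kwargs` is described as keyword arguments for `json.loads`, the sketch below shows what one such argument does when used with the standard library directly. It does not call aito. The JSON string is sample data in the shape used in the examples above, and `reject_duplicate_keys` is a hypothetical helper. Going by the parameter description, the same keyword could presumably be passed to `from_json_string`, but that pass-through is not verified here.

```python
import json

json_string = (
    '{"type": "table", "columns": {'
    '"id": {"type": "Int", "nullable": false}, '
    '"name": {"type": "String", "nullable": false}}}'
)


def reject_duplicate_keys(pairs):
    """Hypothetical hook: fail loudly instead of silently keeping the last duplicate."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in JSON object: {keys}")
    return dict(pairs)


# object_pairs_hook is a standard json.loads keyword argument; it is called
# with the list of (key, value) pairs of every decoded JSON object.
obj = json.loads(json_string, object_pairs_hook=reject_duplicate_keys)

try:
    json.loads('{"columns": {"id": {}, "id": {}}}', object_pairs_hook=reject_duplicate_keys)
except ValueError as error:
    duplicate_error = str(error)
```

Without the hook, `json.loads` keeps only the last value for a repeated key, so a JSON schema that defines the same column twice would lose one definition silently.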

has_column(column_name: str) → bool

check if the table has the specified column

Parameters

column_name (str) – the name of the column

Returns

true if the table has the specified column

Return type

bool

classmethod infer_from_pandas_data_frame(df: pandas.DataFrame, max_sample_size: int = 100000) → aito.schema.AitoTableSchema

Infer a TableSchema from a Pandas DataFrame

Parameters
  • df (pd.DataFrame) – input Pandas DataFrame

  • max_sample_size (int, optional) – maximum number of rows that will be used for inference, defaults to 100000

Raises

Exception – an error occurred during column type inference

Returns

inferred table schema

Return type

AitoTableSchema

classmethod json_schema()

the JSON schema of the class

Return type

Dict

classmethod json_schema_validate(obj: Any)

Validate an object with the class json_schema. Returns the object if validation succeeds, else raises JsonValidationError

Parameters

obj (Any) – the object to be validated

Returns

the object if validation succeeds

Return type

Any

json_schema_validate_with_schema(obj: Any, schema: Dict)

Validate an object with the given schema

Parameters
  • obj (Any) – the object to be validated

  • schema (Dict) – the schema to validate against

Returns

the object if validation succeeds

Return type

Any
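The return-the-object-or-raise contract described above lets a caller validate and assign in one expression. The sketch below illustrates only that contract with a toy validator. It is not aito's validator: `validate_with_schema_sketch` and `ValidationErrorSketch` are hypothetical names, and the function understands just three JSON Schema keywords (`type`, `required`, `properties`) and three type names.

```python
from typing import Any, Dict


class ValidationErrorSketch(ValueError):
    """Hypothetical stand-in for JsonValidationError."""


# Only these JSON Schema type names are supported by this toy validator.
_PY_TYPES = {"object": dict, "string": str, "boolean": bool}


def validate_with_schema_sketch(obj: Any, schema: Dict) -> Any:
    """Return obj unchanged if it satisfies schema, else raise."""
    expected = schema.get("type")
    if expected is not None and not isinstance(obj, _PY_TYPES[expected]):
        raise ValidationErrorSketch(
            f"expected {expected}, got {type(obj).__name__}"
        )
    if isinstance(obj, dict):
        for key in schema.get("required", []):
            if key not in obj:
                raise ValidationErrorSketch(f"missing required property {key!r}")
        # Recurse into the properties that are present.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in obj:
                validate_with_schema_sketch(obj[key], sub_schema)
    return obj


column_schema = {
    "type": "object",
    "required": ["type"],
    "properties": {
        "type": {"type": "string"},
        "nullable": {"type": "boolean"},
    },
}

# Validation succeeds, so the same object comes back and can be used directly.
column = validate_with_schema_sketch({"type": "String", "nullable": False}, column_schema)
```

Passing `{"nullable": "no"}` instead would raise `ValidationErrorSketch`, because the required `type` property is missing.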

property links

a dictionary mapping each column name to its corresponding link

Return type

Dict[str, AitoColumnLinkSchema]

to_json_serializable()

convert the object to an object that can be serialized to a JSON formatted string

to_json_string(**kwargs)

convert the object to a JSON string

Parameters

kwargs – the keyword arguments for json.dumps method

Return type

str
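Because `kwargs` is described as keyword arguments for `json.dumps`, the effect of arguments such as `indent` and `sort_keys` can be previewed with the standard library alone. The sketch below does not call aito. The `serializable` dict is sample data assumed to resemble what `to_json_serializable()` returns, based on the JSON printed in the examples above.

```python
import json

# Sample data in the shape of the JSON output shown in the class examples.
serializable = {
    "type": "table",
    "columns": {
        "name": {"type": "String", "nullable": False},
        "id": {"type": "Int", "nullable": False},
    },
}

# indent pretty-prints; sort_keys orders keys alphabetically at every level,
# which makes the output stable for diffs and tests.
pretty = json.dumps(serializable, indent=2, sort_keys=True)

# separators without spaces gives the most compact form.
compact = json.dumps(serializable, separators=(",", ":"))

print(pretty)
print(compact)
```

With `sort_keys=True`, `"columns"` is emitted before `"type"` and `"id"` before `"name"`, matching the ordering in the printed examples.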

property type

the type of the schema component

Return type

str