aito.schema.AitoLanguageAnalyzerSchema

class aito.schema.AitoLanguageAnalyzerSchema(language: str, use_default_stop_words: bool = None, custom_stop_words: List[str] = None, custom_key_words: List[str] = None)

Bases: aito.schema.AitoAnalyzerSchema

Aito LanguageAnalyzer schema

Parameters
  • language (str) – the name or the ISO code of the language

  • use_default_stop_words (bool, defaults to False) – whether to filter the language's default stop words

  • custom_stop_words (List[str], defaults to []) – words that will be filtered

  • custom_key_words (List[str], defaults to []) – words that will not be featurized
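To make the stop-word parameters concrete, the sketch below imitates the filtering step in plain Python. It is an illustration under stated assumptions, not Aito's implementation: the regular-expression tokenizer, the tiny DEFAULT_STOP_WORDS set, and the function filter_stop_words are all hypothetical stand-ins. custom_key_words is left out because this page does not describe how it is applied.

```python
import re
from typing import Iterable, List

# Hypothetical stand-in for a language's built-in stop-word list.
DEFAULT_STOP_WORDS = {"a", "an", "the", "is", "of"}


def filter_stop_words(
    text: str,
    use_default_stop_words: bool = False,
    custom_stop_words: Iterable[str] = (),
) -> List[str]:
    """Lowercase and tokenize the text, then drop the configured stop words."""
    stop_words = {word.lower() for word in custom_stop_words}
    if use_default_stop_words:
        stop_words |= DEFAULT_STOP_WORDS
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stop_words]


tokens = filter_stop_words(
    "The price of an Aito license",
    use_default_stop_words=True,
    custom_stop_words=["aito"],
)
print(tokens)  # ['price', 'license']
```

With use_default_stop_words left at False and no custom words, every token would pass through unchanged.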

Methods

from_deserialized_object(obj)

create a class object from a JSON deserialized object

from_json_string(json_string, **kwargs)

create a class object from a JSON string

infer_from_samples(samples[, max_sample_size])

Infer an analyzer from the given samples

json_schema()

the JSON schema of the class

json_schema_validate(obj)

Validate an object against the class's json_schema. Returns the object if validation succeeds; otherwise raises JsonValidationError.

json_schema_validate_with_schema(obj, schema)

Validate an object with the given schema

to_json_serializable()

convert the object to an object that can be serialized to a JSON formatted string

to_json_string(**kwargs)

convert the object to a JSON string

Attributes

analyzer_type

the type of the analyzer

column_link_pattern

column_name_pattern

comparison_properties

properties of the schema object that will be used for comparison operation

custom_key_words

list of words that will not be featurized

custom_stop_words

list of words that will be filtered

language

the language of the analyzer

table_name_pattern

type

the type of the schema component

use_default_stop_words

whether the language's default stop words are filtered

uuid_pattern

property analyzer_type

the type of the analyzer

Return type

str

property comparison_properties

properties of the schema object that will be used for comparison operation

Return type

Iterable[str]
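One plausible reading of comparison_properties is that two schema objects are considered equal when every listed property matches. The sketch below shows that pattern with a hypothetical stand-in class; the class name, the note field, and the __eq__ implementation are assumptions for illustration, not code from the aito package.

```python
from typing import Iterable, List, Optional


class ComparableAnalyzerSketch:
    """Hypothetical stand-in: equality looks only at comparison_properties."""

    def __init__(
        self,
        language: str,
        custom_stop_words: Optional[List[str]] = None,
        note: str = "",
    ):
        self.language = language
        self.custom_stop_words = custom_stop_words or []
        self.note = note  # hypothetical field that is deliberately not compared

    @property
    def comparison_properties(self) -> Iterable[str]:
        return ["language", "custom_stop_words"]

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ComparableAnalyzerSketch):
            return NotImplemented
        return all(
            getattr(self, name) == getattr(other, name)
            for name in self.comparison_properties
        )


a = ComparableAnalyzerSketch("english", ["aito"], note="first")
b = ComparableAnalyzerSketch("english", ["aito"], note="second")
c = ComparableAnalyzerSketch("finnish")
print(a == b, a == c)  # True False
```

Here a and b compare equal although their note values differ, because note is not among the compared properties.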

property custom_key_words

list of words that will not be featurized

Return type

List[str]

property custom_stop_words

list of words that will be filtered

Return type

List[str]

classmethod from_deserialized_object(obj: Dict)

create a class object from a JSON deserialized object

classmethod from_json_string(json_string: str, **kwargs)

create a class object from a JSON string

Parameters
  • json_string (str) – the JSON string

  • kwargs – the keyword arguments for json.loads method
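The forwarding of kwargs to json.loads can be sketched with a small stand-in class. Everything below is hypothetical except json.loads and its object_pairs_hook argument: LanguageAnalyzerSketch is not the aito class, and the JSON key names (language, useDefaultStopWords, customStopWords) are assumed for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LanguageAnalyzerSketch:
    """Hypothetical stand-in; not the aito class."""

    language: str
    use_default_stop_words: bool = False
    custom_stop_words: List[str] = field(default_factory=list)

    @classmethod
    def from_deserialized_object(cls, obj: Dict) -> "LanguageAnalyzerSketch":
        # The JSON key names used here are assumptions.
        return cls(
            language=obj["language"],
            use_default_stop_words=obj.get("useDefaultStopWords", False),
            custom_stop_words=list(obj.get("customStopWords", [])),
        )

    @classmethod
    def from_json_string(cls, json_string: str, **kwargs) -> "LanguageAnalyzerSketch":
        # Extra keyword arguments are passed straight to json.loads.
        return cls.from_deserialized_object(json.loads(json_string, **kwargs))


def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails on repeated keys instead of keeping the last one."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in JSON object: {keys}")
    return dict(pairs)


analyzer = LanguageAnalyzerSketch.from_json_string(
    '{"type": "language", "language": "english", "customStopWords": ["aito"]}',
    object_pairs_hook=reject_duplicate_keys,
)
print(analyzer)
```

Passing a hook like this is one reason to expose kwargs: the caller can tighten or customize parsing without the class needing a dedicated option for it.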

classmethod infer_from_samples(samples: Iterable[str], max_sample_size: int = 10000)

Infer an analyzer from the given samples

Parameters
  • samples (Iterable[str]) – an iterable of sample strings

  • max_sample_size (int) – at most the first max_sample_size samples are used for inference; defaults to 10000

Returns

inferred Analyzer or None if no analyzer is applicable

Return type

Optional[AitoAnalyzerSchema]
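The max_sample_size cap means that only a prefix of the iterable has to be read, so generators and other lazy sources are reasonable inputs. The sketch below shows one way such a cap can be applied with itertools.islice; head_of_samples and endless_samples are hypothetical helpers, and the inference step itself is not reproduced here.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def head_of_samples(samples: Iterable[str], max_sample_size: int = 10000) -> List[str]:
    """Return at most the first max_sample_size items without reading further."""
    return list(islice(samples, max_sample_size))


def endless_samples() -> Iterator[str]:
    """A generator that never ends, to show that only a prefix is consumed."""
    n = 0
    while True:
        yield f"sample text {n}"
        n += 1


used = head_of_samples(endless_samples(), max_sample_size=3)
print(used)  # ['sample text 0', 'sample text 1', 'sample text 2']
```

If the first max_sample_size items are not representative of the column (for example, sorted data), shuffling or sampling before the call may give a better basis for inference.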

classmethod json_schema()

the JSON schema of the class

Return type

Dict

classmethod json_schema_validate(obj: Any)

Validate an object against the class's json_schema. Returns the object if validation succeeds; otherwise raises JsonValidationError.

Parameters

obj (Any) – the object to be validated

Returns

the object if validation succeeds

Return type

Any

json_schema_validate_with_schema(obj: Any, schema: Dict)

Validate an object with the given schema

Parameters
  • obj (Any) – the object to be validated

  • schema (Dict) – the schema to validate against

Returns

the object if validation succeeds

Return type

Any

property language

the language of the analyzer

Return type

str

to_json_serializable() → Dict

convert the object to an object that can be serialized to a JSON formatted string

to_json_string(**kwargs)

convert the object to a JSON string

Parameters

kwargs – the keyword arguments for json.dumps method

Return type

str
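The relationship between to_json_serializable and to_json_string can be sketched as follows: the first produces a plain dict, and the second hands that dict to json.dumps together with any kwargs. SerializableAnalyzerSketch is a hypothetical stand-in, and the key names in the dict are assumed for illustration rather than taken from the aito package.

```python
import json
from typing import Dict, List, Optional


class SerializableAnalyzerSketch:
    """Hypothetical stand-in; not the aito class."""

    def __init__(self, language: str, custom_stop_words: Optional[List[str]] = None):
        self.language = language
        self.custom_stop_words = custom_stop_words or []

    def to_json_serializable(self) -> Dict:
        # Only built-in types, so json.dumps can handle the result directly.
        # The key names are assumptions.
        return {
            "type": "language",
            "language": self.language,
            "customStopWords": self.custom_stop_words,
        }

    def to_json_string(self, **kwargs) -> str:
        # Extra keyword arguments are passed straight to json.dumps.
        return json.dumps(self.to_json_serializable(), **kwargs)


analyzer = SerializableAnalyzerSketch("english", ["aito"])
compact = analyzer.to_json_string(sort_keys=True)
pretty = analyzer.to_json_string(indent=2, sort_keys=True)
print(compact)
print(pretty)
```

Options such as indent and sort_keys change only the formatting; parsing either string back with json.loads yields the same dict.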

property type

the type of the schema component

Return type

str

property use_default_stop_words

whether the language's default stop words are filtered

Return type

bool