Aito SDK

Quickstart guide to upload data

Load a Data File to Pandas DataFrame

The Aito Python SDK uses Pandas DataFrames for several operations.

The example below shows how to load a csv file into a DataFrame; see the official pandas guide for further instructions. You can download an example csv file reddit_sample.csv here and run the code below:

import pandas as pd

reddit_df = pd.read_csv('reddit_sample.csv')

Infer a Table Schema

An Aito table schema describes how the table should be constructed and processed internally. You can read more about the Aito schema here.

The AitoTableSchema can be inferred from a Pandas DataFrame. The example below assumes that you already have a DataFrame named reddit_df from Load a Data File to Pandas DataFrame.

from aito.schema import AitoTableSchema, AitoStringType, AitoTokenNgramAnalyzerSchema, AitoAliasAnalyzerSchema
reddit_schema = AitoTableSchema.infer_from_pandas_data_frame(reddit_df)

# Feel free to change the schema as you see fit. For example:
# Change the data type of the `label` column to `String` instead of `Int`
reddit_schema['label'].data_type = AitoStringType()
# Change the analyzer of the `comment` column
reddit_schema['comment'].analyzer = AitoTokenNgramAnalyzerSchema(
  source=AitoAliasAnalyzerSchema('en'),
  min_gram=1,
  max_gram=3
)
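
To give a feel for what min_gram=1 and max_gram=3 mean, here is a minimal sketch of word-level n-grams in plain Python. It is an illustration of the general technique, not Aito's implementation: the helper name token_ngrams is hypothetical, and it splits on whitespace only, whereas the real analyzer first runs the text through its source analyzer ('en' in the example above).

```python
def token_ngrams(text, min_gram=1, max_gram=3):
    """Return word-level n-grams of length min_gram..max_gram (hypothetical helper)."""
    # Assumption: simple lowercase + whitespace tokenization, for illustration only
    tokens = text.lower().split()
    ngrams = []
    for n in range(min_gram, max_gram + 1):
        # Slide a window of n tokens over the token list
        for start in range(len(tokens) - n + 1):
            ngrams.append(' '.join(tokens[start:start + n]))
    return ngrams


features = token_ngrams('it was not great')
```

With four tokens this yields four 1-grams, three 2-grams, and two 3-grams, so a larger max_gram captures longer phrases at the cost of more features.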

Create a Table

Once you have a table schema, you can create a table with the AitoClient.

Your AitoClient must be set up with the READ-WRITE API key.

The example below assumes that you already have a table schema named reddit_schema from Infer a Table Schema.

from aito.client import AitoClient
aito_client = AitoClient(instance_url="your_aito_instance_url", api_key="your_rw_api_key")
aito_client.create_table(table_name='reddit', table_schema=reddit_schema)

# Check your table schema in Aito
aito_client.get_table_schema(table_name='reddit')
  • To create a database schema:

    # Aito DB schema example
    from aito.schema import AitoDatabaseSchema
    database_schema = AitoDatabaseSchema(tables={'reddit': reddit_schema})
    aito_client.create_database(database_schema=database_schema)
    
    # Check your DB schema in Aito
    aito_client.get_database_schema()
    

Upload Data

You can upload data to a table with the AitoClient.

Your AitoClient must be set up with the READ-WRITE API key.

from aito.client import AitoClient
aito_client = AitoClient(instance_url="your_aito_instance_url", api_key="your_rw_api_key")
  • Upload a list of table entries with upload_entries()

    entries = [
      {
        'label': 0,
        'comment': 'it was.',
        'author': 'renden123',
        'subreddit': 'CFB',
        'score': 4,
        'ups': -1,
        'downs': -1,
        'date': '2016-11',
        'created_utc': '2016-11-22 21:32:03',
        'parent_comment': "Wasn't it 2010?"
      }
    ]
    aito_client.upload_entries(table_name='reddit', entries=entries)
    
  • Upload a Pandas DataFrame

    # convert the DataFrame (reddit_df from the loading step) to a list of entries
    entries = reddit_df.to_dict(orient="records")
    aito_client.upload_entries(table_name='reddit', entries=entries)
    
  • Upload a gzipped ndjson file with upload_file()

    # file_path is the path to your gzipped ndjson file
    aito_client.upload_file(table_name='table_name', file_path=file_path)
    
  • Upload using a generator

    def entries_generator(start, end):
        for idx in range(start, end):
            entry = {'id': idx}
            yield entry
    
    aito_client.upload_entries(
        table_name="table_name",
        entries=entries_generator(start=0, end=4),
        batch_size=2,
        optimize_on_finished=False
    )
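
The upload_file() option above expects a gzipped ndjson file, that is, one JSON object per line, compressed with gzip. If you do not have such a file yet, the sketch below shows one way to produce it using only the Python standard library. The helper name write_ndjson_gz and the file name entries.ndjson.gz are hypothetical and not part of the SDK; the generator is the same toy generator used in the last bullet.

```python
import gzip
import json
import os
import tempfile


def entries_generator(start, end):
    # Same toy generator as in the upload example: one entry per index
    for idx in range(start, end):
        yield {'id': idx}


def write_ndjson_gz(entries, file_path):
    """Write an iterable of dict entries as gzipped newline-delimited JSON.

    Hypothetical helper; returns the number of entries written.
    """
    count = 0
    with gzip.open(file_path, 'wt', encoding='utf-8') as f:
        for entry in entries:
            # One JSON document per line
            f.write(json.dumps(entry) + '\n')
            count += 1
    return count


# Write to a temporary directory so the sketch leaves no files behind in the working directory
file_path = os.path.join(tempfile.mkdtemp(), 'entries.ndjson.gz')
n_written = write_ndjson_gz(entries_generator(0, 4), file_path)
```

The resulting file_path can then be passed to upload_file() as in the bullet above. Because the helper consumes any iterable, entries are streamed to disk one at a time rather than held in memory.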
    

Delete Data

You can delete data with the AitoClient.

Your AitoClient must be set up with the READ-WRITE API key.

Execute Queries

You can execute queries with the AitoClient.

Your AitoClient can be set up with the READ-ONLY API key.

Request to an Endpoint

The example below shows how to send a predict query to Aito:

aito_client.request(
  method='POST',
  endpoint='/api/v1/_predict',
  query={
    'from': 'invoice',
    'where': {
      'description': 'a very long invoice description'
    },
    'predict': 'sales_rep'
  }
)

Query Table Entries

# query the first 10 entries of a table
aito_client.query_entries(table_name='table_name')

Execute Multiple Queries Asynchronously

# predict with different descriptions

descriptions = ['first description', 'second description', 'third description']

responses = aito_client.async_requests(
  methods=['POST'] * len(descriptions),
  endpoints=['/api/v1/_predict'] * len(descriptions),
  queries=[
    {
      'from': 'invoice',
      'where': {
        'description': desc
      },
      'predict': 'sales_rep'
    }
    for desc in descriptions
  ]
)

Sending a job request for a query that takes longer than 30 seconds

Some queries might take longer than 30 seconds to run (e.g., Evaluate). You can send these queries as job requests. For example:

response = aito_client.job_request(
  job_endpoint='/api/v1/jobs/_evaluate',
  query={
    "test": {
      "$index": {
        "$mod": [4, 0]
      }
    },
    "evaluate": {
      "from": "invoice",
      "where": {
        "description": { "$get": "description" }
      },
      "predict": "sales_rep"
    }
  }
)
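
The "test" part of the query above selects entries whose index leaves remainder 0 when divided by 4, in other words every fourth entry. The sketch below mimics that selection in plain Python to make the split concrete. It assumes, as is usual for this kind of evaluation, that the entries not selected for testing serve as training data; the helper name split_by_index_mod is hypothetical and not part of the SDK.

```python
def split_by_index_mod(entries, divisor, remainder):
    """Split entries into (test, train) by index modulo (hypothetical helper).

    Mirrors the idea behind {"$index": {"$mod": [divisor, remainder]}}.
    """
    test, train = [], []
    for index, entry in enumerate(entries):
        if index % divisor == remainder:
            test.append(entry)
        else:
            # Assumption: everything not selected for testing is used for training
            train.append(entry)
    return test, train


entries = [{'id': i} for i in range(8)]
test_set, train_set = split_by_index_mod(entries, divisor=4, remainder=0)
```

With eight entries, a quarter of them (indices 0 and 4) end up in the test set. Changing the remainder, for example to [4, 1], selects a different quarter, which is one way to evaluate on several disjoint folds.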