Quickstart

This section explains how to upload data to Aito and send your first query with either the CLI or the Python SDK.

Essentially, uploading data into Aito can be broken down into the following steps:

  1. Infer a Table Schema

  2. Change the inferred schema if needed

  3. Create a table

  4. Convert the data

  5. Upload the data

  6. Send a query to an Aito Endpoint

Note

Skip steps 1, 2, and 3 if you upload data to an existing table. Skip step 4 if you already have the data in the appropriate format for uploading or the data matches the table schema.

If you don’t have a data file, you can download our example file and follow the guide.

Upload data and send your first query with the CLI

Setup Aito credentials

The easiest way to set up the credentials is with the configure command:

$ aito configure
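
Alternatively, the credentials can be supplied through environment variables. The variable names below are an assumption based on the SDK's naming and should be verified against your CLI version's help output:

$ # assumed variable names; verify with: aito configure -h
$ export AITO_INSTANCE_URL=your_aito_instance_url
$ export AITO_API_KEY=your_aito_api_key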

Note

You can use the Quick Add Table operation instead of doing the upload step-by-step if you want to upload to a new table and don't think you need to adjust the inferred schema.
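
As a sketch, a single command of roughly this shape infers the schema, creates the table, and uploads the file in one go (the exact subcommand name may differ between CLI versions, so check aito database -h):

$ # command name assumed; verify with: aito database -h
$ aito database quick-add-table path/to/myCSVFile.csv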

The CLI supports all steps needed to upload data:

Infer a Table Schema

For example, infer a table schema from a csv file:

$ aito infer-table-schema csv < path/to/myCSVFile.csv > path/to/inferredSchema.json

Change the Schema

You might want to change the ColumnType, e.g. the id column should be of type String instead of Int, or add an Analyzer to a Text column. In that case, just make the changes to the inferred schema JSON file.

The example below uses jq to change the id column type:

$ jq '.columns.id.type = "String"' < path/to/schemaFile.json > path/to/updatedSchemaFile.json
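
Adding an analyzer works the same way. As a sketch, the command below sets the english language analyzer on a hypothetical comment column, using the string-alias form of the analyzer field:

$ # 'comment' is a hypothetical column name for illustration
$ jq '.columns.comment.analyzer = "english"' < path/to/schemaFile.json > path/to/updatedSchemaFile.json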

Create a Table

You need a table name and a table schema to create a table:

$ aito database create-table tableName path/to/tableSchema.json

Convert the Data

If you made changes to the inferred schema or have an existing schema, pass the schema with the -s flag to make sure that the converted data matches the schema:

$ aito convert csv -s path/to/updatedSchema.json path/to/myCSVFile.csv > path/to/myConvertedFile.ndjson

You can convert the data to either:

  • A list of entries in JSON format for Batch Upload:

    $ aito convert csv --json path/to/myCSVFile.csv > path/to/myConvertedFile.json
    
  • A NDJSON file for File Upload:

    $ aito convert csv < path/to/myFile.csv > path/to/myConvertedFile.ndjson
    

    Remember to gzip the NDJSON file:

    $ gzip path/to/myConvertedFile.ndjson
    

Upload the Data

You can upload the data by either Batch Upload (a JSON list of entries) or File Upload (a gzipped NDJSON file).
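The command sketches below assume the upload operations live under the same database subcommand group as create-table; verify the exact names with aito database -h:

  • Batch Upload:

    $ # command name assumed; verify with: aito database -h
    $ aito database upload-entries tableName < path/to/myConvertedFile.json
    
  • File Upload:

    $ # command name assumed; verify with: aito database -h
    $ aito database upload-file tableName path/to/myConvertedFile.ndjson.gz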

Send your first query

You can send a query to an Aito endpoint with:

$ aito <endpoint> <query>

For example:

$ aito search '{"from": "products"}'
$ aito predict '{"from": "products", "where": {"name": {"$match": "rye bread"}}, "predict": "tags"}'

Upload data and send your first query with the SDK

The Aito Python SDK uses Pandas DataFrames for multiple operations.

The example below shows how you can load a csv file into a DataFrame; please read the official pandas guide for further instructions. You can download the example csv file reddit_sample.csv and run the code below:

import pandas
reddit_df = pandas.read_csv("reddit_sample.csv")

Infer a table schema

You can infer an AitoTableSchema from a Pandas DataFrame:

from aito.schema import AitoTableSchema
reddit_schema = AitoTableSchema.infer_from_pandas_data_frame(reddit_df)
print(reddit_schema.to_json_string(indent=2, sort_keys=True))
{
  "columns": {
    "author": {
      "nullable": false,
      "type": "String"
    },
    "comment": {
      "analyzer": {
        "customKeyWords": [],
        "customStopWords": [],
        "language": "english",
        "type": "language",
        "useDefaultStopWords": false
      },
      "nullable": false,
      "type": "Text"
    },
    "created_utc": {
      "analyzer": {
        "delimiter": ":",
        "trimWhitespace": true,
        "type": "delimiter"
      },
      "nullable": false,
      "type": "Text"
    },
    "date": {
      "analyzer": {
        "delimiter": "-",
        "trimWhitespace": true,
        "type": "delimiter"
      },
      "nullable": false,
      "type": "Text"
    },
    "downs": {
      "nullable": false,
      "type": "Int"
    },
    "label": {
      "nullable": false,
      "type": "Int"
    },
    "parent_comment": {
      "analyzer": {
        "customKeyWords": [],
        "customStopWords": [],
        "language": "english",
        "type": "language",
        "useDefaultStopWords": false
      },
      "nullable": false,
      "type": "Text"
    },
    "score": {
      "nullable": false,
      "type": "Int"
    },
    "subreddit": {
      "nullable": false,
      "type": "String"
    },
    "ups": {
      "nullable": false,
      "type": "Int"
    }
  },
  "type": "table"
}

Change the Schema

You might want to change the ColumnType, e.g. the id column should be of type String instead of Int, or add an Analyzer to a Text column.

You can access and update the column schema by using the column name as the key:

from aito.schema import AitoStringType, AitoTokenNgramAnalyzerSchema, AitoAliasAnalyzerSchema

# Change the label type to String instead of Int
reddit_schema['label'].data_type = AitoStringType()

# Change the analyzer of the `comment` column
reddit_schema['comment'].analyzer = AitoTokenNgramAnalyzerSchema(
  source=AitoAliasAnalyzerSchema('en'),
  min_gram=1,
  max_gram=3
)
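
To verify the changes, you can print the updated schema the same way as before:

print(reddit_schema.to_json_string(indent=2, sort_keys=True))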

Create a table

You can create the table with create_table(), using an AitoClient and specifying the table name and the table schema:

Note

The example is not directly copy-pastable. Please use your own Aito environment credentials.

from aito.client import AitoClient
from aito.api import create_table
aito_client = AitoClient(instance_url=YOUR_AITO_INSTANCE_URL, api_key=YOUR_AITO_INSTANCE_API_KEY)
create_table(client=aito_client, table_name='reddit', schema=reddit_schema)
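
If you want to confirm that the table now exists, something along these lines should work (this assumes get_table_schema is exposed by aito.api in the same way as create_table and get_table_size; adjust to your SDK version):

# get_table_schema is assumed to exist in aito.api; verify against your SDK version
from aito.api import get_table_schema
print(get_table_schema(aito_client, table_name='reddit'))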

Convert the Data

The DataFrameHandler can convert a DataFrame to match an existing schema:

from aito.utils.data_frame_handler import DataFrameHandler
data_frame_handler = DataFrameHandler()
converted_reddit_df = data_frame_handler.convert_df_using_aito_table_schema(
  df=reddit_df,
  table_schema=reddit_schema
)

A DataFrame can be converted to:

  • A list of entries in JSON format for Batch Upload:

    reddit_entries = converted_reddit_df.to_dict(orient="records")
    
  • A gzipped NDJSON file for File Upload using the DataFrameHandler:

    data_frame_handler.df_to_format(
      df=converted_reddit_df,
      out_format='ndjson',
      write_output='reddit_sample.ndjson.gz',
      convert_options={'compression': 'gzip'}
    )
    

Upload the Data

You can upload the data with an AitoClient, using upload_entries() for Batch Upload or upload_file() for File Upload:

  • Batch Upload:

    from aito.api import upload_entries
    upload_entries(aito_client, table_name='reddit', entries=reddit_entries)
    
  • File Upload:

    from pathlib import Path
    from aito.api import upload_file, get_table_size
    
    upload_file(aito_client, table_name='reddit', file_path=Path('reddit_sample.ndjson.gz'))
    
    # Check that the data has been uploaded
    print(get_table_size(aito_client, 'reddit'))
    
    10000
    

The Batch Upload can also be done using a generator:

def entries_generator(start, end):
  for idx in range(start, end):
    entry = {'id': idx}
    yield entry

upload_entries(
  aito_client,
  table_name="table_name",
  entries=entries_generator(start=0, end=4),
  batch_size=2,                # number of entries uploaded per request
  optimize_on_finished=False   # don't trigger table optimization when the upload finishes
)

Send your first query

You can send a query to an Aito endpoint by using the corresponding API function with an AitoClient:

from aito.client import AitoClient
from aito.api import search, predict
aito_client = AitoClient(instance_url=INSTANCE_URL, api_key=INSTANCE_API_KEY)
search(client=aito_client, query={
  "from": "products",
  "where": {"name": {"$match": "rye bread"}}
})

predict(client=aito_client, query={
  "from": "products",
  "where": {"name": "rye bread"},
  "predict": "tags"
})