Quickstart

This section explains how to upload data to Aito with either the CLI or the Python SDK.

Essentially, uploading data into Aito can be broken down into the following steps:

  1. Infer a Table Schema

  2. Change the inferred schema if needed

  3. Create a table

  4. Convert the data

  5. Upload the data

Note

Skip steps 1, 2, and 3 if you upload data to an existing table. Skip step 4 if you already have the data in the appropriate format for uploading, or if the data already matches the table schema.

If you don’t have a data file, you can download our example file and follow the guide.

Upload Data with the CLI

Setup Aito credentials

The easiest way to set up the credentials is with the configure command:

$ aito configure
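Alternatively, the credentials can be provided as environment variables. The variable names below follow the Aito CLI documentation; double-check them against your installed version, and replace the values with your own instance URL and read-write API key:

```shell
export AITO_INSTANCE_URL=https://your-instance.aito.app
export AITO_API_KEY=your_rw_api_key
```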

Note

You can use the Quick Add Table Operation instead of doing upload step-by-step if you want to upload to a new table and don’t think you need to adjust the inferred schema.
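As a sketch, the one-shot operation looks like the command below. The subcommand name `quick-add-table` is taken from the Aito CLI; verify it with `aito -h` on your version:

```shell
$ aito quick-add-table path/to/myCSVFile.csv
```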

The CLI supports all steps needed to upload data:

Infer a Table Schema

For example, to infer a table schema from a CSV file:

$ aito infer-table-schema csv < path/to/myCSVFile.csv > path/to/inferredSchema.json
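The output is a JSON table schema. For a file with columns such as id and comments, the inferred schema could look roughly like the sketch below; the exact types, nullability, and analyzer depend on your data:

```json
{
  "type": "table",
  "columns": {
    "id": { "type": "Int", "nullable": false },
    "comments": { "type": "Text", "nullable": false, "analyzer": "english" }
  }
}
```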

Change the Schema

You might want to change a ColumnType, e.g., the id column should be of type String instead of Int, or add an Analyzer to a Text column. In that case, just edit the inferred schema JSON file.

The example below uses jq to change the id column type:

$ jq '.columns.id.type = "String"' < path/to/schemaFile.json > path/to/updatedSchemaFile.json
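If jq is not available, the same edit can be done with Python's standard library. The schema below is inlined for illustration; in practice you would json.load the inferred schema file and json.dump the result back out:

```python
import json

# Stand-in for the inferred schema loaded from path/to/schemaFile.json
schema = {
    "type": "table",
    "columns": {"id": {"type": "Int", "nullable": False}}
}

# The id column should be String instead of Int
schema["columns"]["id"]["type"] = "String"

print(json.dumps(schema, indent=2))
```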

Create a Table

You need a table name and a table schema to create a table:

$ aito database create-table tableName path/to/tableSchema.json

Convert the Data

If you made changes to the inferred schema or have an existing schema, pass the schema with the -s flag to make sure that the converted data matches it:

$ aito convert csv -s path/to/updatedSchema.json path/to/myCSVFile.csv > path/to/myConvertedFile.ndjson

You can either convert the data to:

  • A list of entries in JSON format for Batch Upload:

    $ aito convert csv --json path/to/myCSVFile.csv > path/to/myConvertedFile.json
    
  • A NDJSON file for File Upload:

    $ aito convert csv < path/to/myFile.csv > path/to/myConvertedFile.ndjson
    

    Remember to gzip the NDJSON file:

    $ gzip path/to/myConvertedFile.ndjson
    

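If the gzip command is not available, the same compression can be done with Python's standard library. The example below writes a small stand-in NDJSON file to a temporary directory first so it is self-contained; in practice you would point it at your converted file:

```python
import gzip
import os
import shutil
import tempfile

# Create a small NDJSON file standing in for path/to/myConvertedFile.ndjson
tmp_dir = tempfile.mkdtemp()
ndjson_path = os.path.join(tmp_dir, 'myConvertedFile.ndjson')
with open(ndjson_path, 'w') as f:
    f.write('{"id": 1}\n{"id": 2}\n')

# Compress it to myConvertedFile.ndjson.gz
with open(ndjson_path, 'rb') as src, gzip.open(ndjson_path + '.gz', 'wb') as dst:
    shutil.copyfileobj(src, dst)
```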
Upload the Data

You can upload the data by either Batch Upload (using the JSON entries) or File Upload (using the gzipped NDJSON file).
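A sketch of the corresponding commands, assuming the `upload-entries` and `upload-file` subcommands under `aito database`; verify the exact names with `aito database -h` on your CLI version:

```shell
# Batch Upload from a JSON list of entries
$ aito database upload-entries tableName < path/to/myConvertedFile.json

# File Upload from a gzipped NDJSON file
$ aito database upload-file tableName path/to/myConvertedFile.ndjson.gz
```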

Upload Data with the SDK

The Aito Python SDK uses Pandas DataFrames for multiple operations.

The example below shows how to load a CSV file into a DataFrame; please read the official pandas guide for further instructions. You can download the example file reddit_sample.csv and run the code below:

import pandas as pd

reddit_df = pd.read_csv('reddit_sample.csv')

Infer a Table Schema

You can infer an AitoTableSchema from a Pandas DataFrame:

from aito.schema import AitoTableSchema
reddit_schema = AitoTableSchema.infer_from_pandas_dataframe(reddit_df)

Change the Schema

You might want to change a ColumnType, e.g., the id column should be of type String instead of Int, or add an Analyzer to a Text column.

You can access and update the column schema by using the column name as the key:

from aito.schema import AitoStringType, AitoTokenNgramAnalyzerSchema, AitoAliasAnalyzerSchema

# Change the type of the `label` column
reddit_schema['label'].data_type = AitoStringType()
# Change the analyzer of the `comments` column
reddit_schema['comments'].analyzer = AitoTokenNgramAnalyzerSchema(
  source=AitoAliasAnalyzerSchema('en'),
  min_gram=1,
  max_gram=3
)

Create a Table

The AitoClient can create a table using a table name and a table schema:

from aito.client import AitoClient
aito_client = AitoClient(instance_url="your_aito_instance_url", api_key="your_rw_api_key")
aito_client.create_table(table_name='reddit', table_schema=reddit_schema)

Convert the Data

The DataFrameHandler can convert a DataFrame to match an existing schema:

from aito.utils.data_frame_handler import DataFrameHandler
data_frame_handler = DataFrameHandler()
converted_data_frame = data_frame_handler.convert_df_from_aito_table_schema(
  df=reddit_df,
  table_schema=reddit_schema
)

A DataFrame can be converted to:

  • A list of entries in JSON format for Batch Upload:

    entries = converted_data_frame.to_dict(orient="records")
    
  • A gzipped NDJSON file for File Upload using the DataFrameHandler:

    from aito.utils.data_frame_handler import DataFrameHandler
    data_frame_handler = DataFrameHandler()
    data_frame_handler.df_to_format(
      df=converted_data_frame,
      out_format='ndjson',
      write_output='path/to/myConvertedFile.ndjson.gz',
      convert_options={'compression': 'gzip'}
    )
    

Upload the Data

The AitoClient can upload the data with either Batch Upload or File Upload:

from aito.client import AitoClient
aito_client = AitoClient(instance_url="your_aito_instance_url", api_key="your_rw_api_key")

# Batch upload
aito_client.upload_entries(table_name='reddit', entries=entries)

# File Upload
aito_client.upload_file(table_name='table_name', file_path=file_path)

The Batch Upload can also be done using a generator:

def entries_generator(start, end):
  for idx in range(start, end):
    entry = {'id': idx}
    yield entry

aito_client.upload_entries(
  table_name="table_name",
  entries=entries_generator(start=0, end=4),
  batch_size=2,
  optimize_on_finished=False
)
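Generators are useful when the data does not fit in memory. For example, entries can be streamed from an NDJSON file one line at a time and passed to upload_entries exactly like the generator above; the file path here is a placeholder:

```python
import json

def ndjson_entries(file_path):
    """Yield one entry per NDJSON line without loading the whole file."""
    with open(file_path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```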