Quickstart¶
This section explains how to upload data to Aito with either the CLI or the Python SDK.
Essentially, uploading data into Aito can be broken down into the following steps:

1. Infer a table schema
2. Change the schema if needed
3. Create a table
4. Convert the data
5. Upload the data
Note
Skip steps 1, 2, and 3 if you are uploading data to an existing table. Skip step 4 if you already have the data in the appropriate format for uploading or the data already matches the table schema.
If you don’t have a data file, you can download our example file and follow the guide.
Upload Data with the CLI¶
Setup Aito credentials¶
The easiest way to set up the credentials is with the configure command:
$ aito configure
Note
You can use the Quick Add Table Operation instead of doing upload step-by-step if you want to upload to a new table and don’t think you need to adjust the inferred schema.
The CLI supports all steps needed to upload data:
Infer a Table Schema¶
For example, to infer a table schema from a CSV file:
$ aito infer-table-schema csv < path/to/myCSVFile.csv > path/to/inferredSchema.json
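The inferred schema is a plain JSON document. As an illustration (not the exact output for any particular file), a schema for a table with an id and a name column might look like this:

```json
{
  "type": "table",
  "columns": {
    "id": {"type": "Int", "nullable": false},
    "name": {"type": "String", "nullable": false}
  }
}
```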
Change the Schema¶
You might want to change the ColumnType, e.g., the id column should be of type String instead of Int, or add an Analyzer to a Text column. In that case, just make the changes to the inferred schema JSON file.
The example below uses jq to change the id column type:
$ jq '.columns.id.type = "String"' < path/to/schemaFile.json > path/to/updatedSchemaFile.json
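If jq is not available, the same change can be made with a short Python sketch using only the standard library; the schema dict below is an illustrative stand-in for the contents of the inferred schema file:

```python
import json

# An inferred schema where the `id` column came out as Int (illustrative)
schema = {
    "type": "table",
    "columns": {
        "id": {"type": "Int", "nullable": False},
        "name": {"type": "String", "nullable": False},
    },
}

# Same change as the jq one-liner: make the `id` column a String
schema["columns"]["id"]["type"] = "String"

# Serialize back to JSON, ready to be written to updatedSchemaFile.json
updated_schema_json = json.dumps(schema, indent=2)
print(updated_schema_json)
```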
Create a Table¶
You need a table name and a table schema to create a table:
$ aito database create-table tableName path/to/tableSchema.json
Convert the Data¶
If you made changes to the inferred schema or have an existing schema, use the schema with the -s flag to make sure that the converted data matches the schema:
$ aito convert csv -s path/to/updatedSchema.json path/to/myCSVFile.csv > path/to/myConvertedFile.ndjson
You can convert the data to either:
A list of entries in JSON format for Batch Upload:
$ aito convert csv --json path/to/myCSVFile.csv > path/to/myConvertedFile.json
An NDJSON file for File Upload:
$ aito convert csv < path/to/myFile.csv > path/to/myConvertedFile.ndjson
Remember to gzip the NDJSON file:
$ gzip path/to/myConvertedFile.ndjson
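To make the two output formats concrete, here is a small standard-library sketch (not part of the CLI) showing the same two rows serialized as a JSON list and as NDJSON:

```python
import json

rows = [{"id": 1, "label": "cat"}, {"id": 2, "label": "dog"}]

# Batch Upload format: one JSON array containing all entries
as_json = json.dumps(rows)

# File Upload format: NDJSON, one JSON object per line
as_ndjson = "\n".join(json.dumps(row) for row in rows)

print(as_json)
print(as_ndjson)
```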
Upload the Data¶
You can upload the data by either:
$ aito upload-entries tableName < tableEntries.json
$ aito upload-file tableName tableEntries.ndjson.gz
Upload Data with the SDK¶
The Aito Python SDK uses pandas DataFrames for many of its operations. The example below shows how to load a CSV file into a DataFrame; please read the official pandas guide for further instructions.
You can download an example CSV file, reddit_sample.csv, here and run the code below:
import pandas as pd

reddit_df = pd.read_csv('reddit_sample.csv')
Infer a Table Schema¶
You can infer an AitoTableSchema from a pandas DataFrame:
from aito.schema import AitoTableSchema

reddit_schema = AitoTableSchema.infer_from_pandas_dataframe(reddit_df)
Change the Schema¶
You might want to change the ColumnType, e.g., the id column should be of type String instead of Int, or add an Analyzer to a Text column.
You can access and update the column schema by using the column name as the key:
from aito.schema import AitoAliasAnalyzerSchema, AitoStringType, AitoTokenNgramAnalyzerSchema

# Change the type of the `label` column
reddit_schema['label'].data_type = AitoStringType()
# Change the analyzer of the `comments` column
reddit_schema['comments'].analyzer = AitoTokenNgramAnalyzerSchema(
    source=AitoAliasAnalyzerSchema('en'),
    min_gram=1,
    max_gram=3
)
Create a Table¶
The AitoClient can create a table using a table name and a table schema:
from aito.client import AitoClient

aito_client = AitoClient(instance_url="your_aito_instance_url", api_key="your_rw_api_key")
aito_client.create_table(table_name='reddit', table_schema=reddit_schema)
Convert the Data¶
The DataFrameHandler can convert a DataFrame to match an existing schema:
from aito.utils.data_frame_handler import DataFrameHandler

data_frame_handler = DataFrameHandler()
converted_data_frame = data_frame_handler.convert_df_from_aito_table_schema(
    df=data_frame,
    table_schema=table_schema_content
)
A DataFrame can be converted to:
A list of entries in JSON format for Batch Upload:
entries = data_frame.to_dict(orient="records")
A gzipped NDJSON file for File Upload using the DataFrameHandler:
from aito.utils.data_frame_handler import DataFrameHandler

data_frame_handler = DataFrameHandler()
data_frame_handler.df_to_format(
    df=data_frame,
    out_format='ndjson',
    write_output='path/to/myConvertedFile.ndjson.gz',
    convert_options={'compression': 'gzip'}
)
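For comparison, a gzipped NDJSON file can also be produced with just the standard library, assuming the entries are already plain dicts (e.g. from to_dict(orient="records")):

```python
import gzip
import json

entries = [{"id": 1, "label": "cat"}, {"id": 2, "label": "dog"}]

# Write one JSON object per line into a gzip-compressed text file
with gzip.open("myConvertedFile.ndjson.gz", "wt", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```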
Upload the Data¶
The AitoClient can upload the data with either Batch Upload or File Upload:
from aito.client import AitoClient

aito_client = AitoClient(instance_url="your_aito_instance_url", api_key="your_rw_api_key")

# Batch Upload
aito_client.upload_entries(table_name='reddit', entries=entries)

# File Upload
aito_client.upload_file(table_name='table_name', file_path=file_path)
The Batch Upload can also be done using a generator:
def entries_generator(start, end):
    for idx in range(start, end):
        entry = {'id': idx}
        yield entry

aito_client.upload_entries(
    table_name="table_name",
    entries=entries_generator(start=0, end=4),
    batch_size=2,
    optimize_on_finished=False
)
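The batching behaviour itself can be sketched with the standard library alone. The helper below is not part of the SDK; it simply splits a generator's output into lists of at most batch_size entries, illustrating how batch_size=2 would group the four generated entries above:

```python
from itertools import islice

def entries_generator(start, end):
    for idx in range(start, end):
        yield {'id': idx}

def batched(iterable, batch_size):
    """Yield lists of at most batch_size items from the iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

batches = list(batched(entries_generator(start=0, end=4), batch_size=2))
print(batches)  # [[{'id': 0}, {'id': 1}], [{'id': 2}, {'id': 3}]]
```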