Quickstart¶
This section explains how to upload data to Aito and send your first query with either the CLI or the Python SDK.
Essentially, uploading data into Aito can be broken down into the following steps:

1. Infer a table schema from the data
2. Change the schema if needed
3. Create a table
4. Convert the data
5. Upload the data

Note

Skip steps 1, 2, and 3 if you upload data to an existing table. Skip step 4 if the data is already in the appropriate format for uploading or already matches the table schema.
If you don’t have a data file, you can download our example file and follow the guide.
Upload data and send your first query with the CLI¶
Setup Aito credentials¶
The easiest way to set up the credentials is by using the configure command:
$ aito configure
Note
You can use the Quick Add Table Operation instead of doing the upload step by step if you want to upload to a new table and don't expect to adjust the inferred schema.
The CLI supports all steps needed to upload data:
Infer a Table Schema¶
For example, infer a table schema from a CSV file:
$ aito infer-table-schema csv < path/to/myCSVFile.csv > path/to/inferredSchema.json
Change the Schema¶
You might want to change the ColumnType, e.g., the id column should be of type String instead of Int, or you might want to add an Analyzer to a Text column. In that case, just make the changes in the inferred schema JSON file.
The example below uses jq to change the id column type:
$ jq '.columns.id.type = "String"' < path/to/schemaFile.json > path/to/updatedSchemaFile.json
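If jq is not available, the same edit can be done with Python's standard json module. A minimal sketch; the set_column_type helper and the schema fragment below are illustrative, not part of the Aito CLI or SDK:

```python
import json

def set_column_type(schema: dict, column: str, new_type: str) -> dict:
    """Return a copy of a table schema with one column's type changed."""
    updated = json.loads(json.dumps(schema))  # cheap deep copy via a JSON round-trip
    updated["columns"][column]["type"] = new_type
    return updated

# Minimal schema fragment in the format produced by `aito infer-table-schema`.
schema = {"type": "table", "columns": {"id": {"type": "Int", "nullable": False}}}
updated = set_column_type(schema, "id", "String")
print(updated["columns"]["id"]["type"])  # String
```

Reading the schema file with json.load, applying the edit, and writing it back with json.dump achieves the same result as the jq one-liner.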
Create a Table¶
You need a table name and a table schema to create a table:
$ aito database create-table tableName path/to/tableSchema.json
Convert the Data¶
If you made changes to the inferred schema or have an existing schema, use the schema with the -s flag to make sure that the converted data matches the schema:
$ aito convert csv -s path/to/updatedSchema.json path/to/myCSVFile.csv > path/to/myConvertedFile.ndjson
You can convert the data to either:

A list of entries in JSON format for Batch Upload:

$ aito convert csv --json path/to/myCSVFile.csv > path/to/myConvertedFile.json

An NDJSON file for File Upload:

$ aito convert csv < path/to/myFile.csv > path/to/myConvertedFile.ndjson

Remember to gzip the NDJSON file:

$ gzip path/to/myConvertedFile.ndjson
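For context, the NDJSON format the CLI produces is simply one JSON object per line. A minimal pure-Python sketch of the same conversion and gzip step, using made-up sample entries:

```python
import gzip
import json

# Hypothetical sample entries; in practice these come from your CSV file.
entries = [
    {"id": 1, "name": "rye bread"},
    {"id": 2, "name": "sourdough"},
]

# NDJSON: one JSON object per line.
ndjson = "\n".join(json.dumps(entry) for entry in entries) + "\n"

# Gzip the result, as the CLI's gzip step does, ready for file upload.
with gzip.open("myConvertedFile.ndjson.gz", "wt", encoding="utf-8") as f:
    f.write(ndjson)
```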
Upload the Data¶
You can upload the data by either:

Batch Upload:

$ aito upload-entries tableName < tableEntries.json

File Upload:

$ aito upload-file tableName tableEntries.ndjson.gz
Send your first query¶
You can send a query to an Aito endpoint by:
$ aito <endpoint> <query>
For example:
$ aito search '{"from": "products"}'
$ aito predict '{"from": "products", "where": {"name": {"$match": "rye bread"}}, "predict": "tags"}'
Upload data and send your first query with the SDK¶
The Aito Python SDK uses Pandas DataFrame for multiple operations.
The example below shows how you can load a CSV file into a DataFrame; please read the official pandas guide for further instructions.
You can download an example CSV file reddit_sample.csv here and run the code below:
import pandas
reddit_df = pandas.read_csv("reddit_sample.csv")
Infer a table schema¶
You can infer an AitoTableSchema from a Pandas DataFrame:
from aito.schema import AitoTableSchema
from pprint import pprint
reddit_schema = AitoTableSchema.infer_from_pandas_data_frame(reddit_df)
print(reddit_schema.to_json_string(indent=2, sort_keys=True))
{
"columns": {
"author": {
"nullable": false,
"type": "String"
},
"comment": {
"analyzer": {
"customKeyWords": [],
"customStopWords": [],
"language": "english",
"type": "language",
"useDefaultStopWords": false
},
"nullable": false,
"type": "Text"
},
"created_utc": {
"analyzer": {
"delimiter": ":",
"trimWhitespace": true,
"type": "delimiter"
},
"nullable": false,
"type": "Text"
},
"date": {
"analyzer": {
"delimiter": "-",
"trimWhitespace": true,
"type": "delimiter"
},
"nullable": false,
"type": "Text"
},
"downs": {
"nullable": false,
"type": "Int"
},
"label": {
"nullable": false,
"type": "Int"
},
"parent_comment": {
"analyzer": {
"customKeyWords": [],
"customStopWords": [],
"language": "english",
"type": "language",
"useDefaultStopWords": false
},
"nullable": false,
"type": "Text"
},
"score": {
"nullable": false,
"type": "Int"
},
"subreddit": {
"nullable": false,
"type": "String"
},
"ups": {
"nullable": false,
"type": "Int"
}
},
"type": "table"
}
Change the Schema¶
You might want to change the ColumnType, e.g., the id column should be of type String instead of Int, or you might want to add an Analyzer to a Text column.
You can access and update the column schema by using the column name as the key:
from aito.schema import AitoStringType, AitoTokenNgramAnalyzerSchema, AitoAliasAnalyzerSchema
# Change the label type to String instead of Int
reddit_schema['label'].data_type = AitoStringType()
# Change the analyzer of the `comments` column
reddit_schema['comment'].analyzer = AitoTokenNgramAnalyzerSchema(
source=AitoAliasAnalyzerSchema('en'),
min_gram=1,
max_gram=3
)
Create a table¶
You can create_table() using an AitoClient by specifying the table name and the table schema.
Note
The example is not directly copy-pastable. Please use your own Aito environment credentials.
from aito.client import AitoClient
from aito.api import create_table
aito_client = AitoClient(instance_url=YOUR_AITO_INSTANCE_URL, api_key=YOUR_AITO_INSTANCE_API_KEY)
create_table(client=aito_client, table_name='reddit', schema=reddit_schema)
Convert the Data¶
The DataFrameHandler can convert a DataFrame to match an existing schema:
from aito.utils.data_frame_handler import DataFrameHandler
data_frame_handler = DataFrameHandler()
converted_reddit_df = data_frame_handler.convert_df_using_aito_table_schema(
df=reddit_df,
table_schema=reddit_schema
)
A DataFrame can be converted to:

A list of entries in JSON format for Batch Upload:

reddit_entries = converted_reddit_df.to_dict(orient="records")

A gzipped NDJSON file for File Upload using the DataFrameHandler:

data_frame_handler.df_to_format(
    df=converted_reddit_df,
    out_format='ndjson',
    write_output='reddit_sample.ndjson.gz',
    convert_options={'compression': 'gzip'}
)
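Conceptually, converting a DataFrame to match a schema is similar to casting each column to the type the schema declares. A simplified pandas sketch of that idea, not the SDK's actual implementation; the toy frame and dtype map below are illustrative:

```python
import pandas as pd

# Toy frame where `label` was read as int, but the schema declares String.
df = pd.DataFrame({"label": [0, 1, 1], "score": [10, 3, 7]})

# Hypothetical map of column -> pandas dtype derived from an Aito schema.
schema_dtypes = {"label": "str", "score": "int64"}

# Cast each column to the dtype the schema expects.
converted = df.astype(schema_dtypes)
print(converted["label"].tolist())  # ['0', '1', '1']
```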
Upload the Data¶
You can upload_entries() using an AitoClient.
Batch Upload:

from aito.api import upload_entries

upload_entries(aito_client, table_name='reddit', entries=reddit_entries)

File Upload:

from pathlib import Path
from aito.api import upload_file, get_table_size

upload_file(aito_client, table_name='reddit', file_path=Path('reddit_sample.ndjson.gz'))
# Check that the data has been uploaded
print(get_table_size(aito_client, 'reddit'))

10000
The Batch Upload can also be done using a generator:
def entries_generator(start, end):
    for idx in range(start, end):
        entry = {'id': idx}
        yield entry

upload_entries(
    aito_client,
    table_name="table_name",
    entries=entries_generator(start=0, end=4),
    batch_size=2,
    optimize_on_finished=False
)
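With a batch_size, the client consumes the generator in chunks instead of materializing all entries at once. The chunking idea can be sketched with itertools; this illustrates the concept and is not the SDK's internal code:

```python
from itertools import islice

def batches(entries, batch_size):
    """Yield lists of up to `batch_size` entries from any iterable."""
    it = iter(entries)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def entries_generator(start, end):
    for idx in range(start, end):
        yield {"id": idx}

for batch in batches(entries_generator(0, 4), batch_size=2):
    print(batch)
# [{'id': 0}, {'id': 1}]
# [{'id': 2}, {'id': 3}]
```

Because only one batch is held in memory at a time, this pattern lets you upload datasets far larger than available RAM.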
Send your first query¶
You can send a query to an Aito endpoint by using an AitoClient with the corresponding API function:
from aito.client import AitoClient
from aito.api import search, predict

aito_client = AitoClient(instance_url=INSTANCE_URL, api_key=INSTANCE_API_KEY)

search(client=aito_client, query={
    "from": "products",
    "where": {"name": {"$match": "rye bread"}}
})

predict(client=aito_client, query={
    "from": "products",
    "where": {"name": "rye bread"},
    "predict": "tags"
})