aporia
Initializes the Aporia SDK.
Arguments:
- token: Authentication token.
- host: Controller host.
- environment: Environment in which Aporia is initialized (e.g. production, staging).
- port: Controller port. Defaults to 443.
- verbose: True to enable verbose error messages. Defaults to False.
- throw_errors: True to cause errors to be raised as exceptions. Defaults to False.
- debug: True to enable debug logs and stack traces in log messages. Defaults to False.
- http_timeout_seconds: HTTP timeout in seconds. Defaults to 30.
- verify_ssl: True to verify SSL certificates. Defaults to True.
Notes:
- The token, host and environment parameters are required.
- All of the parameters here can also be defined as environment variables:
- token -> APORIA_TOKEN
- host -> APORIA_HOST
- environment -> APORIA_ENVIRONMENT
- port -> APORIA_PORT
- verbose -> APORIA_VERBOSE
- throw_errors -> APORIA_THROW_ERRORS
- debug -> APORIA_DEBUG
- http_timeout_seconds -> APORIA_HTTP_TIMEOUT_SECONDS
- Values passed as parameters to aporia.init() override the values from the corresponding environment variables.
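A minimal initialization sketch using the parameters above (the token and host values are placeholders):

```python
import aporia

# Initialize the SDK once, at application startup.
# token, host and environment are required; the remaining parameters fall back
# to their defaults or to the APORIA_* environment variables listed above.
aporia.init(
    token="<your-authentication-token>",  # placeholder
    host="<controller-host>",             # placeholder
    environment="production",
    port=443,
    verbose=True,
    http_timeout_seconds=30,
)
```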
Creates a new model.
Arguments:
- model_id: A unique identifier for the new model, which will be used in all future operations
- name: A name for the new model, which will be displayed in Aporia's dashboard
- description: A description of the model
- owner: The email of the model owner
- color: A color to distinguish the model in Aporia's dashboard. Defaults to blue
- icon: An icon that indicates the model's designation. Defaults to general
- tags: A mapping of tag keys to tag values
Returns:
Model ID.
Notes:
- If a model with this model_id already exists, no exception will be raised; the existing model ID is returned instead.
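A usage sketch, assuming this function is exposed at module level as aporia.create_model (the exact name is not shown in this reference):

```python
# Hypothetical sketch - assumes the function above is exposed as aporia.create_model.
model_id = aporia.create_model(
    model_id="fraud-detection",
    name="Fraud Detection",
    description="Detects fraudulent transactions",
    owner="ml-team@example.com",
    tags={"team": "risk"},
)
# If "fraud-detection" already exists, the existing model ID is returned
# instead of an exception being raised.
```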
Creates a new model version, and defines a schema for it.
Arguments:
- model_id: Model identifier, as received from the Aporia dashboard.
- model_version: Model version - this can be any string that represents the model version, such as "v1" or a git commit hash.
- model_type: Model type (also known as objective - see notes).
- features: Schema for model features (See notes).
- predictions: Schema for prediction results (See notes).
- raw_inputs: Schema for raw inputs (See notes).
- metrics: Schema for prediction metrics (See notes).
- model_data_type: Model data type.
- labels: Labels of a multi-label, multiclass or binary model. Deprecated.
- multiclass_labels: Labels of a multi-label, multiclass or binary model. Same as "labels"; deprecated.
- feature_importance: Features' importance.
- mapping: General mapping (See notes).
Notes:
A schema is a dict, in which the keys are the fields you wish to report, and the values are the types of those fields. For example:
{ "feature1": "numeric", "feature2": "datetime" }
The supported model types are:
- "regression" - for regression models
- "binary" - for binary classification models
- "multiclass" - for multiclass classification models
- "multi-label" - for multi-label classification models
- "ranking" - for ranking models
The valid field types (and corresponding Python types) are:
- "numeric": float, int
- "categorical": int
- "boolean": bool
- "string": str
- "datetime": datetime.datetime, or a str representing a datetime in ISO-8601 format
- "vector": list of floats
- "text": str (to be used as free text)
- "dict": dict[str, int]
The supported data types are:
- "tabular"
- "nlp"
The feature_importance is a mapping from feature name to its importance (float). For example:
{ "feature1": 1, "feature2": 2 }
The mapping allowed fields are:
- batch_id_column_name: The name of the key in the raw_inputs dict that holds the value of the batch_id.
- relevance_column_name: The name of the key in the predictions dict that holds the value of the relevance score of models of type ranking.
- actual_relevance_column_name: The name of the key in the actuals dict that holds the value of the relevance score of models of type ranking.
Returns:
Model object for the new version.
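A sketch of a version definition for a binary classification model, assuming create_model_version is exposed at module level as aporia.create_model_version; the field names are illustrative:

```python
model = aporia.create_model_version(
    model_id="fraud-detection",
    model_version="v1",
    model_type="binary",
    features={
        "amount": "numeric",
        "merchant_category": "categorical",
        "transaction_time": "datetime",
    },
    predictions={
        "is_fraud": "boolean",
        "fraud_score": "numeric",
    },
    feature_importance={"amount": 0.7, "merchant_category": 0.3},
)
```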
Returns a features schema for use in version creation, derived from an ndarray shape.
Arguments:
- features_shape (Tuple): the shape of the features array.
Returns:
OrderedDict: The object to pass to features in the version schema.
Deletes a model.
Arguments:
- model_id: ID of the model to delete
Shuts down the Aporia SDK.
Notes:
- It is advised to call flush() before calling shutdown(), to ensure that all of the data that was sent reaches the controller.
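A sketch of the recommended shutdown sequence, assuming shutdown() is exposed at module level and `model` is a model object created earlier:

```python
# Flush buffered data before shutting down, so nothing in flight is lost.
model.flush()
aporia.shutdown()  # assumes shutdown() is exposed as aporia.shutdown()
```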
Deletes a model tag.
Arguments:
- model_id: Model ID
- tag_key: Tag key to delete
Notes:
- This function is best-effort; it will not fail if the tag doesn't exist.
Model object for logging model events.
Initializes a model object.
Arguments:
- model_id: Model identifier, as received from the Aporia dashboard.
- model_version: Model version - this can be any string that represents the model version, such as "v1" or a git commit hash.
Logs aggregations of the whole training set and, optionally, a sample of the data.
Arguments:
- features: Training set features
- predictions: Training set predictions
- labels: Training set labels
- raw_inputs: Training set raw inputs.
- log_sample: Whether to log a sample of the data.
- sample_size: Number of records to sample.
Notes:
- Each dataframe corresponds to a field category defined in create_model_version:
- features -> features
- predictions -> predictions
- labels -> predictions
- raw_inputs -> raw_inputs
- Each column in the dataframe should match a field defined in create_model_version
- Missing fields will be handled as missing values
- Columns that do not match a defined field will be ignored
- The column name must match the field name
- This function is blocking and may take a while to finish running.
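A hypothetical sketch of reporting a training set, assuming this method is exposed on the model object as log_training_set (the name is not shown in this reference) and that the dataframes are pandas DataFrames whose columns match the schema:

```python
import pandas as pd

features_df = pd.DataFrame({"amount": [12.5, 80.0], "merchant_category": [3, 7]})
predictions_df = pd.DataFrame({"fraud_score": [0.1, 0.9]})
labels_df = pd.DataFrame({"is_fraud": [False, True]})

# Hypothetical method name - adjust to your SDK version.
model.log_training_set(
    features=features_df,
    predictions=predictions_df,
    labels=labels_df,
    log_sample=True,
    sample_size=1000,
)
```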
Logs a sample of the training data.
Arguments:
- features: Training set features
- labels: Training set labels
- raw_inputs: Training set raw_inputs
- sample_size: Number of records to sample
Notes:
- Each dataframe corresponds to a field category defined in create_model_version:
- features -> features
- labels -> predictions
- Each column in the dataframe should match a field defined in create_model_version
- Missing fields will be handled as missing values
- Columns that do not match a defined field will be ignored
- The column name must match the field name
- This function is blocking and may take a while to finish running.
Logs test data.
Arguments:
- features: Test set features
- predictions: Test set predictions
- labels: Test set labels
- raw_inputs: Test set raw inputs.
- confidences: Confidence values for the test predictions.
Notes:
- Each dataframe corresponds to a field category defined in create_model_version:
- features -> features
- predictions -> predictions
- labels -> predictions
- raw_inputs -> raw_inputs
- Each column in the dataframe should match a field defined in create_model_version
- Missing fields will be handled as missing values
- Columns that do not match a defined field will be ignored
- The column name must match the field name
- This function is blocking and may take a while to finish running.
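A hypothetical sketch of reporting a test set, under the same assumptions as the training example above (the method name log_test_set is assumed, not shown in this reference):

```python
import pandas as pd

test_features = pd.DataFrame({"amount": [25.0, 210.0], "merchant_category": [3, 9]})
test_predictions = pd.DataFrame({"is_fraud": [False, True], "fraud_score": [0.2, 0.95]})
test_labels = pd.DataFrame({"is_fraud": [False, True]})

# Hypothetical method name - adjust to your SDK version.
model.log_test_set(
    features=test_features,
    predictions=test_predictions,
    labels=test_labels,
    confidences=[0.8, 0.97],  # confidence values for the test predictions
)
```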
Logs raw inputs of multiple predictions.
Arguments:
- ids: Prediction identifiers
- raw_inputs: Raw inputs of each prediction
Notes:
- The ids dataframe must contain exactly one column
- The ids and raw_inputs dataframes must have the same number of rows
Logs actual values of multiple predictions.
Arguments:
- ids: Prediction identifiers
- actuals: Actual prediction results of each prediction
Notes:
- The ids dataframe must contain exactly one column
- The ids and actuals dataframes must have the same number of rows
Logs multiple predictions.
Arguments:
- data: PySpark dataframe
- id_column: Optional dataframe column to use as an id.
- timestamp_column: Optional dataframe column to use as a timestamp.
- features: A mapping of feature names (from the schema) to dataframe columns
- predictions: A mapping of predictions names (from the schema) to dataframe columns
- raw_inputs: A mapping of raw input names (from the schema) to dataframe columns
- labels: A mapping of label names (from the schema) to dataframe columns
- spark_options: Optional configuration extension for spark elastic connector. See https://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html
Logs stream of predictions.
Arguments:
- data: PySpark dataframe
- id_column: Optional dataframe column to use as an id.
- timestamp_column: Optional dataframe column to use as a timestamp.
- features: A mapping of feature names (from the schema) to dataframe columns
- predictions: A mapping of predictions names (from the schema) to dataframe columns
- raw_inputs: A mapping of raw input names (from the schema) to dataframe columns
- labels: A mapping of label names (from the schema) to dataframe columns
- spark_options: Optional configuration extension for spark elastic connector. See https://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html
Logs training data from PySpark DataFrames.
Arguments:
- data: PySpark DataFrame containing all the data
- features: Optional[Mapping[str, str]] mapping of features as configured in the schema; the key is the name in the schema and the value is the name of the corresponding dataframe column
- predictions: Optional[Mapping[str, str]] mapping of predictions as configured in the schema; the key is the name in the schema and the value is the name of the corresponding dataframe column
- raw_inputs: Optional[Mapping[str, str]] mapping of raw inputs as configured in the schema; the key is the name in the schema and the value is the name of the corresponding dataframe column
Logs test data from PySpark DataFrames.
Arguments:
- features: Test set features
- predictions: Test set predictions
- labels: Test set labels
- raw_inputs: Test set raw inputs.
Notes:
- Each dataframe corresponds to a field category defined in create_model_version:
- features -> features
- predictions -> predictions
- labels -> predictions
- raw_inputs -> raw_inputs
- Each column in the dataframe should match a field defined in create_model_version
- Missing fields will be handled as missing values
- Columns that do not match a defined field will be ignored
- The column name must match the field name
- This function is blocking and may take a while to finish running.
Inherited Members
- InferenceModel
- log_prediction
- log_batch_prediction
- log_raw_inputs
- log_batch_raw_inputs
- log_actuals
- log_batch_actuals
- log_json
- upload_model_artifact
- set_feature_importance
- log_index_to_word_mapping
- connect_serving
- connect_actuals
- connect_training
- connect_testing
- flush
- aporia.core.base_model.BaseModel
- handle_error
Model object for logging inference events.
Initializes an inference model object.
Arguments:
- model_id: Model identifier, as received from the Aporia dashboard.
- model_version: Model version - this can be any string that represents the model version, such as "v1" or a git commit hash.
Logs a single prediction.
Arguments:
- id: Prediction identifier.
- features: Values for all the features in the prediction
- predictions: Prediction result
- metrics: Prediction metrics.
- occurred_at: Prediction timestamp.
- confidence: Prediction confidence.
- raw_inputs: Raw inputs of the prediction.
- actuals: Actual prediction results.
Note:
- If occurred_at is None, it will be reported as datetime.now()
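A sketch of a single-prediction report, reusing the illustrative binary-classification schema from the create_model_version example above:

```python
from datetime import datetime, timezone

model.log_prediction(
    id="prediction-1234",
    features={"amount": 12.5, "merchant_category": 3},
    predictions={"is_fraud": False, "fraud_score": 0.1},
    occurred_at=datetime.now(timezone.utc),  # omit to default to datetime.now()
    confidence=0.92,
)
```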
Logs multiple predictions.
Arguments:
batch_predictions: An iterable that produces prediction dicts.
Each prediction dict MUST contain the following keys:
- features (Dict[str, FieldValue]): Values for all the features in the prediction
- predictions (Dict[str, FieldValue]): Prediction result
Each prediction dict MAY also contain the following keys:
- id (str): Prediction identifier.
- occurred_at (datetime): Prediction timestamp.
- metrics (Dict[str, FieldValue]): Prediction metrics
- confidence (Union[float, List[float]]): Prediction confidence.
- raw_inputs (Dict[str, FieldValue]): Raw inputs of the prediction.
- actuals (Dict[str, FieldValue]): Actual prediction results.
Notes:
- If occurred_at is None in any of the predictions, it will be reported as datetime.now()
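A sketch of batch reporting with the same illustrative schema; only features and predictions are required in each dict:

```python
batch = [
    {
        "id": f"prediction-{i}",
        "features": {"amount": amount, "merchant_category": 3},
        "predictions": {"is_fraud": amount > 100, "fraud_score": min(amount / 200, 1.0)},
    }
    for i, amount in enumerate([12.5, 80.0, 150.0])
]
model.log_batch_prediction(batch_predictions=batch)
```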
Logs raw inputs of a single prediction.
Arguments:
- id: Prediction identifier.
- raw_inputs: Raw inputs of the prediction.
Logs raw inputs of multiple predictions.
Arguments:
batch_raw_inputs: An iterable that produces raw_inputs dicts.
- Each dict MUST contain the following keys:
- id (str): Prediction identifier.
- raw_inputs (Dict[str, FieldValue]): Raw inputs of the prediction.
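A sketch of batch raw-input reporting; the raw input field names are illustrative and must match the raw_inputs schema defined in create_model_version:

```python
model.log_batch_raw_inputs(
    batch_raw_inputs=[
        {"id": "prediction-0", "raw_inputs": {"transaction_text": "coffee shop #42"}},
        {"id": "prediction-1", "raw_inputs": {"transaction_text": "electronics store"}},
    ]
)
```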
Logs actual values of a single prediction.
Arguments:
- id: Prediction identifier.
- actuals: Actual prediction results.
Note:
- The fields reported in actuals must be a subset of the fields reported in predictions.
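A sketch of reporting an actual once the ground truth is known; the keys must be a subset of the fields reported in predictions:

```python
model.log_actuals(
    id="prediction-1234",
    actuals={"is_fraud": True},
)
```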
Logs actual values of multiple predictions.
Arguments:
batch_actuals: An iterable that produces actuals dicts.
- Each dict MUST contain the following keys:
- id (str): Prediction identifier.
- actuals (Dict[str, FieldValue]): Actual prediction results.
Note:
- The fields reported in actuals must be a subset of the fields reported in predictions.
Logs arbitrary data.
Arguments:
- data: Data to log, must be JSON serializable
Uploads binary model artifact.
Arguments:
- model_artifact: Binary model artifact.
- artifact_type: The type of model artifact (see below)
Model Artifact Types:
- onnx
- h5
Update the features' importance of the model.
Arguments:
- feature_importance: feature name to importance mapping
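A brief sketch, reusing the illustrative feature names from earlier examples:

```python
model.set_feature_importance(
    feature_importance={"amount": 0.7, "merchant_category": 0.3}
)
```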
Logs index to word mapping.
Arguments:
- index_to_word_mapping: A mapping from a numeric index to a word.
Connect to external serving data set.
Arguments:
- data_source: The data source to fetch the data set from.
- id_column: The name of the id column.
- timestamp_column: The name of the timestamp column.
- features: Mapping from feature name to column name.
- predictions: Mapping from prediction name to column name.
- labels: Mapping from actual name to column name.
- raw_inputs: Mapping from raw input name to column name.
- http_timeout_seconds: HTTP timeout in seconds. Defaults to 10 minutes.
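A sketch of connecting a serving data set through a JDBC source (documented below); the import path, connection details and column names are placeholders:

```python
# Assumes JDBCDataSource is importable from the top-level aporia package;
# adjust the import to match your SDK layout.
from aporia import JDBCDataSource

data_source = JDBCDataSource(
    url="jdbc:postgresql://db-host:5432/serving",  # placeholder connection URL
    query="SELECT * FROM serving_predictions",
    user="reader",
    password="<password>",
)

model.connect_serving(
    data_source=data_source,
    id_column="prediction_id",
    timestamp_column="occurred_at",
    features={"amount": "amount", "merchant_category": "merchant_category"},
    predictions={"is_fraud": "is_fraud", "fraud_score": "fraud_score"},
)
```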
Connect to external actual data set.
Arguments:
- data_source: The data source to fetch the data set from.
- id_column: The name of the id column.
- timestamp_column: The name of a column that contains the time the actual was updated.
- labels: Mapping from prediction name to the column holding the actual value.
- http_timeout_seconds: HTTP timeout in seconds. Defaults to 10 minutes.
Connect to external training data set.
Arguments:
- data_source: The data source to fetch the data set from.
- id_column: The name of the id column.
- timestamp_column: The name of the timestamp column.
- features: Mapping from feature name to column name.
- predictions: Mapping from prediction name to column name.
- labels: Mapping from prediction name to column name.
- raw_inputs: Mapping from raw input name to column name.
- http_timeout_seconds: HTTP timeout in seconds. Defaults to 10 minutes.
Connect to external test data set.
Arguments:
- data_source: The data source to fetch the data set from.
- id_column: The name of the id column.
- timestamp_column: The name of the timestamp column.
- features: Mapping from feature name to column name.
- predictions: Mapping from prediction name to column name.
- labels: Mapping from prediction name to column name.
- raw_inputs: Mapping from raw input name to column name.
- http_timeout_seconds: HTTP timeout in seconds. Defaults to 10 minutes.
Waits for all currently scheduled tasks to finish.
Arguments:
- timeout: Maximum number of seconds to wait for tasks to complete. Defaults to None (no timeout).
Returns:
Number of tasks that haven't finished running.
Inherited Members
- aporia.core.base_model.BaseModel
- handle_error
Model colors.
Inherited Members
- enum.Enum
- name
- value
Model Icons.
Inherited Members
- enum.Enum
- name
- value
AWS Athena data source.
Initializes a JDBCDataSource.
Arguments:
- url: Database connection URL
- query: SQL query to read data from the database
- s3_output_location: Path to S3 bucket for storing query results
- user: Database user
- password: Database password
- sample_size: Fraction of data to sample
- select_expr: Select expressions to apply to the dataframe after reading
- read_options: Additional spark read options
Inherited Members
- aporia.core.types.data_source.DataSource
- serialize
BigQuery data source.
Initializes a SparkDataSource.
Arguments:
- credentials_base64: Base64 encoded JSON string containing GCP service account details.
- table: Table to query
- dataset: Dataset to query
- project: Project name
- parent_project: Parent project name
- sample_size: Fraction of data to sample
- select_expr: Select expressions to apply to the dataframe after reading
- read_options: Additional spark read options
Inherited Members
- aporia.core.types.data_source.DataSource
- serialize
Generic JDBC data source.
Initializes a JDBCDataSource.
Arguments:
- url: Database connection URL
- query: SQL query to read data from the database
- user: Database user
- password: Database password
- sample_size: Fraction of data to sample
- select_expr: Select expressions to apply to the dataframe after reading
- read_options: Additional spark read options
Inherited Members
- aporia.core.types.data_source.DataSource
- serialize
Postgres (via JDBC) data source.
Inherited Members
- aporia.core.types.data_source.DataSource
- serialize
S3 data source.
Initializes a S3DataSource.
Arguments:
- object_path: The path in S3 to the object, excluding the s3:// prefix (e.g. bucket-name/file.parquet)
- object_format: Type of the input file (Parquet, CSV, JSON, etc.)
- sample_size: Fraction of data to sample
- select_expr: Select expressions to apply to the dataframe after reading
- read_options: Additional spark read options
Inherited Members
- aporia.core.types.data_source.DataSource
- serialize
Snowflake data source.
Initializes a SnowflakeDataSource.
Arguments:
- url: The full Snowflake URL of your instance
- query: SQL query to read data from the database
- user: Username for database connection
- password: Password for database connection
- database: Database name
- schema: Schema name
- warehouse: The default virtual warehouse to use
- sample_size: Fraction of data to sample
- select_expr: Select expressions to apply to the dataframe after reading
- read_options: Additional spark read options
Inherited Members
- aporia.core.types.data_source.DataSource
- serialize
Glue data source.
Initializes a GlueDataSource.
Arguments:
- query: SQL query to read data from the database
- sample_size: Fraction of data to sample
- select_expr: Select expressions to apply to the dataframe after reading
Inherited Members
- aporia.core.types.data_source.HiveDataSource
- serialize_config
- aporia.core.types.data_source.DataSource
- serialize