
Datasource

Datasource pydantic-model

Config:

  • frozen: True

Fields:

id pydantic-field

id = None

An optional unique identifier for the datasource defined by the user.

name pydantic-field

name

Name representing the datasource in the data set catalog and other pages, such as incidents and notifications. This name can be modified to provide a more meaningful identifier than the one assigned to the physical storage.

owners pydantic-field

owners

Assign users to a data source by choosing from the list of platform-accessible users. This functionality promotes collaboration by clearly designating responsibility for each data source. When a user is assigned as an owner, they are tagged and notified of any incidents related to their data source, ensuring prompt attention and resolution.

data_connection_configuration pydantic-field

data_connection_configuration

contract_generation_configuration pydantic-field

contract_generation_configuration

incident_configuration pydantic-field

incident_configuration

pivot_fields pydantic-field

pivot_fields = ()

Pivot fields used to split the data into different contracts. As of this SDK version’s release, Upriver supports a maximum of 2 pivot fields.
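Conceptually, pivot fields partition incoming rows into per-value groups, each monitored under its own derived contract. A minimal sketch of that partitioning (`split_by_pivot_fields` is illustrative, not part of the SDK):

```python
from collections import defaultdict

MAX_PIVOT_FIELDS = 2  # maximum supported by Upriver at this SDK release


def split_by_pivot_fields(rows, pivot_fields):
    """Group rows by their pivot-field values; each group maps to one contract."""
    if len(pivot_fields) > MAX_PIVOT_FIELDS:
        raise ValueError(f"at most {MAX_PIVOT_FIELDS} pivot fields are supported")
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row.get(field) for field in pivot_fields)
        groups[key].append(row)
    return dict(groups)


rows = [
    {"region": "eu", "event": "click"},
    {"region": "us", "event": "click"},
    {"region": "eu", "event": "view"},
]
by_region = split_by_pivot_fields(rows, ("region",))  # two groups: ("eu",) and ("us",)
```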

tags pydantic-field

tags = []

Tags allow you to organize and classify data sources for more efficient management and enhanced access control.

is_key_asset pydantic-field

is_key_asset = False

If true, the datasource is a key asset.

upriver_id pydantic-field

upriver_id

The unique identifier of the datasource in Upriver's system.

status pydantic-field

status

root_id pydantic-field

root_id = None

Identifier of the parent datasource.

children pydantic-field

children

List of child datasources, if available.

model_config class-attribute instance-attribute

model_config = ConfigDict(frozen=True)

DatasourceContractGenerationConfiguration pydantic-model

Configurations that alter the contract generation process.

Fields:

update_interval_minutes pydantic-field

update_interval_minutes = None

The time in minutes between contract updates. If None, the system will determine and update the interval automatically as needed.

sampling_rate pydantic-field

sampling_rate = None

Sampling rate for the data to be used to generate the contract, between 0 and 1. If None, the system will determine and update the sampling rate automatically based on the volume of data.

cardinality_threshold pydantic-field

cardinality_threshold = 10

The threshold for the cardinality of a column to be considered a categorical column. If the column has fewer than cardinality_threshold unique values, it will be considered a categorical column. Categorical columns generate an exact histogram of the observed values, monitoring both the existence of specific values and their distribution.
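The categorical rule above can be sketched as follows (illustrative only; the SDK's internal implementation may differ):

```python
def is_categorical(values, cardinality_threshold=10):
    """A column is categorical when it has fewer unique values than the threshold."""
    return len(set(values)) < cardinality_threshold


def exact_histogram(values):
    """Exact value counts, as generated for categorical columns."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    return counts


statuses = ["ok", "ok", "error", "ok"]
is_categorical(statuses)   # 2 unique values, below the default threshold of 10
exact_histogram(statuses)  # {"ok": 3, "error": 1}
```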

nullify_empty_strings pydantic-field

nullify_empty_strings = False

If true, Upriver will treat an empty string as null, reducing completeness without affecting the distribution of field lengths.

nullify_empty_arrays pydantic-field

nullify_empty_arrays = False

If true, Upriver will treat an empty array as null, reducing completeness without affecting the distribution of field lengths.
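The two nullify flags amount to a simple value-normalization step, which can be sketched like this (a stand-in for the behavior described above, not SDK code):

```python
def normalize_value(value, nullify_empty_strings=False, nullify_empty_arrays=False):
    """Map empty strings/arrays to None when the corresponding flag is set."""
    if nullify_empty_strings and value == "":
        return None
    if nullify_empty_arrays and isinstance(value, list) and not value:
        return None
    return value


normalize_value("", nullify_empty_strings=True)   # treated as null
normalize_value([], nullify_empty_arrays=True)    # treated as null
normalize_value("", nullify_empty_strings=False)  # left as-is
```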

filter pydantic-field

filter

A map from field name (case sensitive) to the Filter to be applied to the field. Only rows where all the filtered fields match their filter values will be considered for the contract.

staleness_threshold_days pydantic-field

staleness_threshold_days = 7

The number of days without new data before the datasource is considered stale.

timestamp_column pydantic-field

timestamp_column = None

The name of the column in the data representing the timestamp. Currently this is only used in BigQuery- or Delta-table-based datasources.

primary_key_columns pydantic-field

primary_key_columns

The names of the columns in the data representing the primary key. If defined, these columns will be included alongside examples whenever examples are provided.

manual_trigger pydantic-field

manual_trigger = False

If true, Upriver will not update the contract automatically and will only monitor when triggered by the user.

pause_new_derived_datasources pydantic-field

pause_new_derived_datasources = False

If true, Upriver will set derived datasources (derived by pivot field value) to paused state upon creation.

DatasourceId pydantic-model

A class representing the identity of a datasource. Used to reference other datasources without loading the full datasource object.

Fields:

model_config class-attribute instance-attribute

model_config = frozen_config

upriver_id pydantic-field

upriver_id

The unique identifier of the datasource in Upriver's system.

name pydantic-field

name

Name representing the datasource in the data set catalog and other pages, such as incidents and notifications. This name can be modified to provide a more meaningful identifier than the one assigned to the physical storage.

DatasourceIncidentConfiguration pydantic-model

Configurations that alter the incidents raised by the datasource.

Fields:

default_severity pydantic-field

default_severity = UNSET

The default severity of incidents.

notification_webhook_url pydantic-field

notification_webhook_url = None

The webhook URL to be called when an incident is raised.

DatasourceMetadataExpectations pydantic-model

Metadata expectations configurations that can be set over the datasource. All the expectations are optional, and can be set independently. Ranged expectations can be set with a lower bound, an upper bound, or both. Setting any bound to None means that the expectation will not be set (or will be removed if it was set before).

Fields:

row_count pydantic-field

row_count = None

Row count expectation over the datasource.

freshness pydantic-field

freshness = None

Freshness expectation over the datasource. Freshness is the time, in seconds, between the data's last modified time and Upriver's task run time. For example, if the data should be written at least once a day, the freshness upper bound should be 24 * 60 * 60.

row_uniqueness pydantic-field

row_uniqueness = None

Row uniqueness expectation over the datasource. Row uniqueness is the percentage of unique rows in the data. Possible values are between 0 and 1, where 1 means every row is unique.
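How a ranged bound is evaluated can be sketched with a stand-in mirroring the RangedExpectation fields documented below (the `is_met` helper is hypothetical, not an SDK method):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangedExpectation:
    """Stand-in: a None bound means the bound is not set."""
    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None

    def is_met(self, value: float) -> bool:
        if self.lower_bound is not None and value < self.lower_bound:
            return False
        if self.upper_bound is not None and value > self.upper_bound:
            return False
        return True


# Data expected at least daily: freshness (seconds) must stay under 24h.
freshness = RangedExpectation(upper_bound=24 * 60 * 60)
freshness.is_met(3600)   # one hour old -> met
freshness.is_met(90000)  # more than 24h old -> violated
```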

DatasourceSchema pydantic-model

Fields:

title pydantic-field

title = None

properties pydantic-field

properties

type pydantic-field

type = 'object'

DatasourceSettings pydantic-model

Configurations that can be edited after the datasource is created.

Fields:

id pydantic-field

id = None

An optional unique identifier for the datasource defined by the user.

name pydantic-field

name = None

Name representing the datasource in the data set catalog and other pages, such as incidents and notifications.

connection_configuration pydantic-field

connection_configuration = None

The connection configuration for the datasource. Note: For data integrations, only the id field can be modified to change which integration the datasource connects to. All other integration fields are read-only and cannot be edited.

owners pydantic-field

owners = None

Assign users to a data source by choosing from the list of platform-accessible users. This functionality promotes collaboration by clearly designating responsibility for each data source. When a user is assigned as an owner, they are tagged and notified of any incidents related to their data source, ensuring prompt attention and resolution.

tags pydantic-field

tags = None

Tags to be associated with the datasource. Tags can be used to filter datasources in the UI.

sampling_rate pydantic-field

sampling_rate = None

Sampling rate for the data to be used to generate the contract, between 0 and 1. If None, the system will determine and update the sampling rate automatically based on the volume of data.

update_interval_minutes pydantic-field

update_interval_minutes = None

The time in minutes between contract updates. If None, the system will determine and update the interval automatically as needed.

manual_trigger pydantic-field

manual_trigger = None

If true, Upriver will not update the contract automatically and will only monitor when triggered by the user.

pivot_fields pydantic-field

pivot_fields = None

Pivot fields used to split the data into different contracts. As of this SDK version’s release, Upriver supports a maximum of 2 pivot fields.

pause_new_derived_datasources pydantic-field

pause_new_derived_datasources = None

If true, Upriver will set derived datasources (derived by pivot field value) to paused state upon creation.

filter pydantic-field

filter = None

A map from field name (case sensitive) to the Filter to be applied to the field. Only rows where all the filtered fields match their filter values will be considered for the contract.

cardinality_threshold pydantic-field

cardinality_threshold = None

The threshold for the cardinality of a column to be considered a categorical column. If the column has fewer than cardinality_threshold unique values, it will be considered a categorical column. Categorical columns generate an exact histogram of the observed values, monitoring both the existence of specific values and their distribution.

default_severity pydantic-field

default_severity = None

The default severity of incidents.

staleness_threshold_days pydantic-field

staleness_threshold_days = None

The number of days without new data before the datasource is considered stale.

timestamp_column pydantic-field

timestamp_column = None

The name of the column in the data representing the timestamp. Currently this is only used in BigQuery- or Delta-table-based datasources.

notification_webhook_url pydantic-field

notification_webhook_url = None

The webhook URL to be called when an incident is raised.

nullify_empty_strings pydantic-field

nullify_empty_strings = None

If true, Upriver will treat an empty string as null, reducing completeness without affecting the distribution of field lengths.

nullify_empty_arrays pydantic-field

nullify_empty_arrays = None

If true, Upriver will treat an empty array as null, reducing completeness without affecting the distribution of field lengths.
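Since every field of DatasourceSettings defaults to None, a plausible reading is that None means "leave unchanged", so an update only carries the fields you explicitly set. A plain-dict sketch of that partial-update semantics (`build_update_payload` is illustrative, not an SDK function):

```python
def build_update_payload(settings: dict) -> dict:
    """Keep only the fields that were explicitly set; None means 'leave unchanged'."""
    return {key: value for key, value in settings.items() if value is not None}


payload = build_update_payload({
    "name": "orders-prod",
    "sampling_rate": 0.1,
    "tags": None,            # unchanged
    "manual_trigger": None,  # unchanged
})
# Only "name" and "sampling_rate" remain in the payload.
```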

MonitoringStatus

The current status of the datasource.

INITIALIZING class-attribute instance-attribute

INITIALIZING = 'INITIALIZING'

The data source has been identified but has not yet completed a single run.

RUNNING class-attribute instance-attribute

RUNNING = 'RUNNING'

The data source is continuously monitored by Upriver.

ERROR class-attribute instance-attribute

ERROR = 'ERROR'

The datasource has encountered an error and is not currently capable of running on the data.

PAUSED class-attribute instance-attribute

PAUSED = 'PAUSED'

The data source is not being monitored because the user explicitly decided to stop the monitor.

STALE class-attribute instance-attribute

STALE = 'STALE'

The data source has not been updated for more than the specified threshold.

LEARNING class-attribute instance-attribute

LEARNING = 'LEARNING'

The data source has completed at least one run but is still waiting for more update iterations to establish a stable baseline.

NewDatasource pydantic-model

Represents a request to create a new datasource.

Fields:

id pydantic-field

id = None

An optional unique identifier for the datasource defined by the user.

name pydantic-field

name

Name representing the datasource in the data set catalog and other pages, such as incidents and notifications. This name can be modified to provide a more meaningful identifier than the one assigned to the physical storage.

owners pydantic-field

owners

Assign users to a data source by choosing from the list of platform-accessible users. This functionality promotes collaboration by clearly designating responsibility for each data source. When a user is assigned as an owner, they are tagged and notified of any incidents related to their data source, ensuring prompt attention and resolution.

data_connection_configuration pydantic-field

data_connection_configuration

contract_generation_configuration pydantic-field

contract_generation_configuration

incident_configuration pydantic-field

incident_configuration

pivot_fields pydantic-field

pivot_fields = ()

Pivot fields used to split the data into different contracts. As of this SDK version’s release, Upriver supports a maximum of 2 pivot fields.

tags pydantic-field

tags = []

Tags allow you to organize and classify data sources for more efficient management and enhanced access control.

is_key_asset pydantic-field

is_key_asset = False

If true, the datasource is a key asset.

pause_on_creation pydantic-field

pause_on_creation = False

If true, the datasource will be paused upon creation.
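Putting the fields together, a NewDatasource request might look like the following plain-dict sketch. All values are illustrative, and the SDK presumably accepts typed models rather than raw dicts; only the field names come from the model above:

```python
new_datasource_request = {
    "name": "clickstream-events",
    "owners": ["data-team@example.com"],        # platform-accessible users
    "data_connection_configuration": {          # e.g. a Kinesis-backed source
        "type": "kinesis",
        "stream_name": "clickstream",
        "region": "us-east-1",
    },
    "pivot_fields": ("region",),                # at most 2 supported
    "tags": ["analytics"],
    "is_key_asset": False,
    "pause_on_creation": False,
}
```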

NewDatasourceResponse pydantic-model

Fields:

  • id (str)

id pydantic-field

id

The unique identifier of the datasource in Upriver's system.

RangedExpectation pydantic-model

Fields:

lower_bound pydantic-field

lower_bound = None

upper_bound pydantic-field

upper_bound = None

SchemaField pydantic-model

Fields:

type pydantic-field

type = None

format pydantic-field

format = None

properties pydantic-field

properties = None

BigQueryConfiguration pydantic-model

Fields:

type pydantic-field

type = BIG_QUERY

data_integration pydantic-field

data_integration = None

dataset pydantic-field

dataset

table pydantic-field

table

DataFormat

The format of the stored data.

JSON class-attribute instance-attribute

JSON = 'json'

PARQUET class-attribute instance-attribute

PARQUET = 'parquet'

CSV class-attribute instance-attribute

CSV = 'csv'

DELTA_TABLE class-attribute instance-attribute

DELTA_TABLE = 'delta_table'

ORC class-attribute instance-attribute

ORC = 'orc'

ICEBERG class-attribute instance-attribute

ICEBERG = 'iceberg'

DatasourceConnectionConfiguration module-attribute

DatasourceConnectionConfiguration = Union[
    KinesisConfiguration,
    FileSystemConfiguration,
    BigQueryConfiguration,
    SnowflakeConfiguration,
    RedshiftConfiguration,
]

DatasourceType

KINESIS class-attribute instance-attribute

KINESIS = 'kinesis'

S3 class-attribute instance-attribute

S3 = 's3'

GCS class-attribute instance-attribute

GCS = 'gcs'

BIG_QUERY class-attribute instance-attribute

BIG_QUERY = 'big_query'

SNOWFLAKE class-attribute instance-attribute

SNOWFLAKE = 'snowflake'

REDSHIFT class-attribute instance-attribute

REDSHIFT = 'redshift'

is_filesystem

is_filesystem()
Source code in upriver/sdk/datasource/datasource_connection_configuration.py
def is_filesystem(self) -> bool:
    return self in {DatasourceType.S3, DatasourceType.GCS}
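Usage can be illustrated with a stand-in enum mirroring the DatasourceType values documented above (a sketch, not the SDK class):

```python
from enum import Enum


class DatasourceType(str, Enum):
    """Stand-in mirroring the enum above, to illustrate is_filesystem()."""
    KINESIS = "kinesis"
    S3 = "s3"
    GCS = "gcs"
    BIG_QUERY = "big_query"

    def is_filesystem(self) -> bool:
        return self in {DatasourceType.S3, DatasourceType.GCS}


DatasourceType.S3.is_filesystem()       # True
DatasourceType.KINESIS.is_filesystem()  # False
```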

FileSystemConfiguration pydantic-model

Fields:

data_integration pydantic-field

data_integration = None

type pydantic-field

type

bucket_path pydantic-field

bucket_path

The path of the bucket.

bucket_prefix pydantic-field

bucket_prefix

The prefix in the bucket to scan for the datasource's data.

region pydantic-field

region

The region in which the bucket is located.

data_format pydantic-field

data_format

The format of the data stored in the bucket.

additional_format_settings pydantic-field

additional_format_settings = None

Additional format settings for the data stored in the bucket.

FileSystemDatasourceType module-attribute

FileSystemDatasourceType = Literal[S3, GCS]

KinesisConfiguration pydantic-model

Fields:

data_integration pydantic-field

data_integration = None

type pydantic-field

type = KINESIS

stream_name pydantic-field

stream_name

The name of the Kinesis stream.

region pydantic-field

region

The AWS region of the Kinesis stream.

SnowflakeConfiguration pydantic-model

Fields:

data_integration pydantic-field

data_integration = None

type pydantic-field

type = SNOWFLAKE

host pydantic-field

host

database pydantic-field

database

schema_name pydantic-field

schema_name

table pydantic-field

table

DatasourceStatus pydantic-model

Fields:

monitoring_status pydantic-field

monitoring_status

incident_status pydantic-field

incident_status

children_incident_status pydantic-field

children_incident_status

The aggregated status of this datasource and all of its child datasources, if it has any.

has_sla pydantic-field

has_sla

unresolved_incidents_count pydantic-field

unresolved_incidents_count

last_data_detected_at pydantic-field

last_data_detected_at

Filter

operator instance-attribute

operator

value instance-attribute

value

FilterOperator

Currently Upriver only supports equality-based filters (equals and not-equals). Additional filters are planned for future releases.

EQUALS class-attribute instance-attribute

EQUALS = 'equals'

NOT_EQUALS class-attribute instance-attribute

NOT_EQUALS = 'not_equals'

FilterValue module-attribute

FilterValue = int | float | str | bool | None
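The filter semantics described earlier (all filtered fields must match for a row to be kept) can be sketched as follows. The tuple-based stand-in for the Filter object and `row_matches` helper are illustrative, not SDK code:

```python
EQUALS = "equals"
NOT_EQUALS = "not_equals"


def row_matches(row: dict, filters: dict) -> bool:
    """A row is kept only when every filtered field satisfies its filter
    (field names are case sensitive)."""
    for field, (operator, value) in filters.items():
        actual = row.get(field)
        if operator == EQUALS and actual != value:
            return False
        if operator == NOT_EQUALS and actual == value:
            return False
    return True


filters = {"env": (EQUALS, "prod"), "deleted": (NOT_EQUALS, True)}
row_matches({"env": "prod", "deleted": False}, filters)  # True: both filters pass
row_matches({"env": "dev", "deleted": False}, filters)   # False: env mismatch
```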

RunResponse pydantic-model

Fields:

run_id pydantic-field

run_id

RunStatus

IN_PROGRESS class-attribute instance-attribute

IN_PROGRESS = 'in progress'

RUN_FAILED class-attribute instance-attribute

RUN_FAILED = 'run failed'

EMPTY_RUN class-attribute instance-attribute

EMPTY_RUN = 'empty run'

NO_INCIDENTS class-attribute instance-attribute

NO_INCIDENTS = 'no incidents'

MINOR_INCIDENTS class-attribute instance-attribute

MINOR_INCIDENTS = 'minor incidents'

MAJOR_INCIDENTS class-attribute instance-attribute

MAJOR_INCIDENTS = 'major incidents'

RunStatusResponse pydantic-model

Fields:

run_result pydantic-field

run_result