Datasource
Datasource
pydantic-model
Config:
frozen: True
Fields:
- upriver_id (str)
- status (MonitoringStatus)
- root_id (Optional[DatasourceId])
- children (List[DatasourceId])
id
pydantic-field
id = None
An optional unique identifier for the datasource defined by the user.
name
pydantic-field
name
Name representing the datasource in the data set catalog and other pages, such as incidents and notifications. This name can be modified to provide a more meaningful identifier than the one assigned to the physical storage.
owners
pydantic-field
owners
Assign users to a data source by choosing from the list of platform-accessible users. This functionality promotes collaboration by clearly designating responsibility for each data source. When a user is assigned as an owner, they are tagged and notified of any incidents related to their data source, ensuring prompt attention and resolution.
data_connection_configuration
pydantic-field
data_connection_configuration
contract_generation_configuration
pydantic-field
contract_generation_configuration
incident_configuration
pydantic-field
incident_configuration
pivot_fields
pydantic-field
pivot_fields = ()
Pivot fields used to split the data into different contracts. As of this SDK version’s release, Upriver supports a maximum of 2 pivot fields.
tags
pydantic-field
tags = []
Tags allow you to organize and classify data sources for more efficient management and enhanced access control.
is_key_asset
pydantic-field
is_key_asset = False
If true, the datasource is a key asset.
upriver_id
pydantic-field
upriver_id
The unique identifier of the datasource in Upriver's system.
status
pydantic-field
status
root_id
pydantic-field
root_id = None
Identifier of parent datasource.
children
pydantic-field
children
List of children datasources, if available.
model_config
class-attribute
instance-attribute
model_config = ConfigDict(frozen=True)
DatasourceContractGenerationConfiguration
pydantic-model
Configurations that alter the contract generation process.
Fields:
- update_interval_minutes (Optional[float])
- sampling_rate (Optional[float])
- cardinality_threshold (int)
- nullify_empty_strings (bool)
- nullify_empty_arrays (bool)
- filter (Dict[str, Filter])
- staleness_threshold_days (PositiveInt)
- timestamp_column (Optional[str])
- primary_key_columns (List[str])
- manual_trigger (bool)
- pause_new_derived_datasources (bool)
update_interval_minutes
pydantic-field
update_interval_minutes = None
The time in minutes between contract updates. If None, the system will determine and update the interval automatically as needed.
sampling_rate
pydantic-field
sampling_rate = None
Sampling rate for the data to be used to generate the contract, between 0 and 1. If None, the system will determine and update the sampling rate automatically based on the volume of data.
cardinality_threshold
pydantic-field
cardinality_threshold = 10
The threshold for a column's cardinality to be considered categorical. If the column has fewer than cardinality_threshold unique values, it is treated as a categorical column. Categorical columns generate an exact histogram of the observed values, and Upriver monitors both the existence of specific values and their distribution.
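The rule above can be sketched in a few lines. This is a standalone illustration of the documented behavior, not the SDK's internal implementation; the helper names `is_categorical` and `exact_histogram` are hypothetical.

```python
from collections import Counter

def is_categorical(values, cardinality_threshold=10):
    # A column is treated as categorical when it has fewer than
    # cardinality_threshold unique values (the documented default is 10).
    return len(set(values)) < cardinality_threshold

def exact_histogram(values):
    # Categorical columns get an exact histogram of observed values.
    return dict(Counter(values))

statuses = ["ok", "ok", "error", "ok", "retry"]
print(is_categorical(statuses))   # 3 unique values, below the threshold
print(exact_histogram(statuses))  # {'ok': 3, 'error': 1, 'retry': 1}
```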
nullify_empty_strings
pydantic-field
nullify_empty_strings = False
If true, Upriver treats an empty string as null, which reduces completeness but does not affect the distribution of the field's lengths.
nullify_empty_arrays
pydantic-field
nullify_empty_arrays = False
If true, Upriver treats an empty array as null, which reduces completeness but does not affect the distribution of the field's lengths.
filter
pydantic-field
filter
A map from field name (case sensitive) to the Filter applied to that field. Only rows where every filtered field matches its filter value are considered for the contract.
staleness_threshold_days
pydantic-field
staleness_threshold_days = 7
The number of days without new data before the datasource is considered stale.
timestamp_column
pydantic-field
timestamp_column = None
The name of the column representing the timestamp in the data. Currently this is only used in BigQuery- or Delta-table-based datasources.
primary_key_columns
pydantic-field
primary_key_columns
The names of the columns representing the primary key. If defined, these values are included alongside examples whenever examples are provided.
manual_trigger
pydantic-field
manual_trigger = False
If true, Upriver will not update the contract automatically and will only monitor when triggered by the user.
pause_new_derived_datasources
pydantic-field
pause_new_derived_datasources = False
If true, Upriver will set derived datasources (derived by pivot field value) to paused state upon creation.
DatasourceId
pydantic-model
A class representing the identity of a datasource. Used to reference other datasources without loading the full datasource object.
Fields:
- upriver_id (str)
- name (str)
model_config
class-attribute
instance-attribute
model_config = frozen_config
upriver_id
pydantic-field
upriver_id
The unique identifier of the datasource in Upriver's system.
name
pydantic-field
name
Name representing the datasource in the data set catalog and other pages, such as incidents and notifications. This name can be modified to provide a more meaningful identifier than the one assigned to the physical storage.
DatasourceIncidentConfiguration
pydantic-model
Configurations that alter the incidents raised by the datasource.
Fields:
- default_severity (IncidentSeverity)
- notification_webhook_url (Optional[str])
notification_webhook_url
pydantic-field
notification_webhook_url = None
The webhook URL to be called when an incident is raised.
DatasourceMetadataExpectations
pydantic-model
Metadata expectations configurations that can be set over the datasource. All the expectations are optional, and can be set independently. Ranged expectations can be set with a lower bound, an upper bound, or both. Setting any bound to None means that the expectation will not be set (or will be removed if it was set before).
Fields:
- row_count (Optional[RangedExpectation])
- freshness (Optional[RangedExpectation])
- row_uniqueness (Optional[RangedExpectation])
row_count
pydantic-field
row_count = None
Row count expectation over the datasource.
freshness
pydantic-field
freshness = None
Freshness expectation over the datasource. Freshness is a measure of the time, in seconds, between the last modified time of the data and Upriver's task run time. For example, if the data should be written at most once a day, the freshness upper bound should be 24 * 60 * 60.
row_uniqueness
pydantic-field
row_uniqueness = None
Row uniqueness expectation over the datasource. Row uniqueness is the percentage of unique rows in the data, with possible values between 0 and 1, where 1 means every row is unique.
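The bound semantics described above (either bound may be None, meaning unset) can be sketched with a minimal stand-in for RangedExpectation. The `is_met` helper is hypothetical and not part of the SDK; the field names mirror the documented model.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class RangedExpectation:
    # Either bound may be None, which means that side is unconstrained.
    lower_bound: Optional[Union[float, int]] = None
    upper_bound: Optional[Union[float, int]] = None

    def is_met(self, observed: float) -> bool:
        # Hypothetical check: observed value must fall within the set bounds.
        if self.lower_bound is not None and observed < self.lower_bound:
            return False
        if self.upper_bound is not None and observed > self.upper_bound:
            return False
        return True

# Data written at most once a day: freshness upper bound of 24h, in seconds.
freshness = RangedExpectation(upper_bound=24 * 60 * 60)
print(freshness.is_met(3600))     # one hour old: within bounds
print(freshness.is_met(100_000))  # ~27.8 hours old: expectation violated

# Row uniqueness between 0 and 1; require at least 95% unique rows.
row_uniqueness = RangedExpectation(lower_bound=0.95)
print(row_uniqueness.is_met(0.99))
```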
DatasourceSchema
pydantic-model
Fields:
- title (Optional[str])
- properties (Dict[str, SchemaField])
- type (str)
title
pydantic-field
title = None
properties
pydantic-field
properties
type
pydantic-field
type = 'object'
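Since SchemaField is recursive (object-typed fields carry their own properties), a DatasourceSchema can be walked like a JSON-Schema tree. The sketch below uses plain dicts mirroring the documented type/format/properties shape; `flatten_schema` is a hypothetical helper, not an SDK function.

```python
def flatten_schema(properties, prefix=""):
    # Yield (dotted_path, type) pairs from a nested schema, recursing
    # into object-typed fields that carry their own properties.
    for name, field in properties.items():
        path = f"{prefix}{name}"
        if field.get("type") == "object" and field.get("properties"):
            yield from flatten_schema(field["properties"], prefix=path + ".")
        else:
            yield path, field.get("type")

schema = {
    "title": "events",
    "type": "object",
    "properties": {
        "user": {
            "type": "object",
            "properties": {"id": {"type": "string"}},
        },
        "ts": {"type": "string", "format": "date-time"},
    },
}
print(list(flatten_schema(schema["properties"])))
# [('user.id', 'string'), ('ts', 'string')]
```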
DatasourceSettings
pydantic-model
Configurations that can be edited after the datasource is created.
Fields:
- id (Optional[str])
- name (Optional[str])
- connection_configuration (Optional[DatasourceConnectionConfiguration])
- owners (Optional[List[str]])
- tags (Optional[List[str]])
- sampling_rate (Optional[float])
- update_interval_minutes (Optional[float])
- manual_trigger (Optional[bool])
- pivot_fields (Optional[Tuple[str, ...]])
- pause_new_derived_datasources (Optional[bool])
- filter (Optional[Dict[str, Filter]])
- cardinality_threshold (Optional[int])
- default_severity (Optional[IncidentSeverity])
- staleness_threshold_days (Optional[PositiveInt])
- timestamp_column (Optional[str])
- notification_webhook_url (Optional[str])
- nullify_empty_strings (Optional[bool])
- nullify_empty_arrays (Optional[bool])
id
pydantic-field
id = None
An optional unique identifier for the datasource defined by the user.
name
pydantic-field
name = None
Name representing the datasource in the data set catalog and other pages, such as incidents and notifications.
connection_configuration
pydantic-field
connection_configuration = None
The connection configuration for the datasource. Note: For data integrations, only the id field can be modified to change which integration the datasource connects to. All other integration fields are read-only and cannot be edited.
owners
pydantic-field
owners = None
Assign users to a data source by choosing from the list of platform-accessible users. This functionality promotes collaboration by clearly designating responsibility for each data source. When a user is assigned as an owner, they are tagged and notified of any incidents related to their data source, ensuring prompt attention and resolution.
tags
pydantic-field
tags = None
Tags to be associated with the datasource. Tags can be used to filter datasources in the UI.
sampling_rate
pydantic-field
sampling_rate = None
Sampling rate for the data to be used to generate the contract, between 0 and 1. If None, the system will determine and update the sampling rate automatically based on the volume of data.
update_interval_minutes
pydantic-field
update_interval_minutes = None
The time in minutes between contract updates. If None, the system will determine and update the interval automatically as needed.
manual_trigger
pydantic-field
manual_trigger = None
If true, Upriver will not update the contract automatically and will only monitor when triggered by the user.
pivot_fields
pydantic-field
pivot_fields = None
Pivot fields used to split the data into different contracts. As of this SDK version’s release, Upriver supports a maximum of 2 pivot fields.
pause_new_derived_datasources
pydantic-field
pause_new_derived_datasources = None
If true, Upriver will set derived datasources (derived by pivot field value) to paused state upon creation.
filter
pydantic-field
filter = None
A map from field name (case sensitive) to the Filter applied to that field. Only rows where every filtered field matches its filter value are considered for the contract.
cardinality_threshold
pydantic-field
cardinality_threshold = None
The threshold for a column's cardinality to be considered categorical. If the column has fewer than cardinality_threshold unique values, it is treated as a categorical column. Categorical columns generate an exact histogram of the observed values, and Upriver monitors both the existence of specific values and their distribution.
default_severity
pydantic-field
default_severity = None
The default severity of incidents.
staleness_threshold_days
pydantic-field
staleness_threshold_days = None
The number of days without new data before the datasource is considered stale.
timestamp_column
pydantic-field
timestamp_column = None
The name of the column representing the timestamp in the data. Currently this is only used in BigQuery- or Delta-table-based datasources.
notification_webhook_url
pydantic-field
notification_webhook_url = None
The webhook URL to be called when an incident is raised.
nullify_empty_strings
pydantic-field
nullify_empty_strings = None
If true, Upriver treats an empty string as null, which reduces completeness but does not affect the distribution of the field's lengths.
nullify_empty_arrays
pydantic-field
nullify_empty_arrays = None
If true, Upriver treats an empty array as null, which reduces completeness but does not affect the distribution of the field's lengths.
MonitoringStatus
The current status of the datasource.
INITIALIZING
class-attribute
instance-attribute
INITIALIZING = 'INITIALIZING'
The data source has been identified but has not yet completed a single run.
RUNNING
class-attribute
instance-attribute
RUNNING = 'RUNNING'
The data source is continuously monitored by Upriver.
ERROR
class-attribute
instance-attribute
ERROR = 'ERROR'
The datasource has encountered an error and is not currently capable of running on the data.
PAUSED
class-attribute
instance-attribute
PAUSED = 'PAUSED'
The data source is not being monitored because the user explicitly paused monitoring.
STALE
class-attribute
instance-attribute
STALE = 'STALE'
The data source has not been updated for more than the specified threshold.
LEARNING
class-attribute
instance-attribute
LEARNING = 'LEARNING'
The data source has completed at least one run, but is still waiting for more update iterations to establish a stable baseline.
NewDatasource
pydantic-model
Represents a request to create a new datasource.
Fields:
- pause_on_creation (bool)
id
pydantic-field
id = None
An optional unique identifier for the datasource defined by the user.
name
pydantic-field
name
Name representing the datasource in the data set catalog and other pages, such as incidents and notifications. This name can be modified to provide a more meaningful identifier than the one assigned to the physical storage.
owners
pydantic-field
owners
Assign users to a data source by choosing from the list of platform-accessible users. This functionality promotes collaboration by clearly designating responsibility for each data source. When a user is assigned as an owner, they are tagged and notified of any incidents related to their data source, ensuring prompt attention and resolution.
data_connection_configuration
pydantic-field
data_connection_configuration
contract_generation_configuration
pydantic-field
contract_generation_configuration
incident_configuration
pydantic-field
incident_configuration
pivot_fields
pydantic-field
pivot_fields = ()
Pivot fields used to split the data into different contracts. As of this SDK version’s release, Upriver supports a maximum of 2 pivot fields.
tags
pydantic-field
tags = []
Tags allow you to organize and classify data sources for more efficient management and enhanced access control.
is_key_asset
pydantic-field
is_key_asset = False
If true, the datasource is a key asset.
pause_on_creation
pydantic-field
pause_on_creation = False
If true, the datasource will be paused upon creation.
NewDatasourceResponse
pydantic-model
RangedExpectation
pydantic-model
Fields:
- lower_bound (Optional[float | int])
- upper_bound (Optional[float | int])
lower_bound
pydantic-field
lower_bound = None
upper_bound
pydantic-field
upper_bound = None
SchemaField
pydantic-model
Fields:
- type (Optional[str])
- format (Optional[str])
- properties (Optional[Dict[str, SchemaField]])
type
pydantic-field
type = None
format
pydantic-field
format = None
properties
pydantic-field
properties = None
BigQueryConfiguration
pydantic-model
DataFormat
The format of the stored data.
JSON
class-attribute
instance-attribute
JSON = 'json'
PARQUET
class-attribute
instance-attribute
PARQUET = 'parquet'
CSV
class-attribute
instance-attribute
CSV = 'csv'
DELTA_TABLE
class-attribute
instance-attribute
DELTA_TABLE = 'delta_table'
ORC
class-attribute
instance-attribute
ORC = 'orc'
ICEBERG
class-attribute
instance-attribute
ICEBERG = 'iceberg'
DatasourceConnectionConfiguration
module-attribute
DatasourceConnectionConfiguration = Union[
KinesisConfiguration,
FileSystemConfiguration,
BigQueryConfiguration,
SnowflakeConfiguration,
RedshiftConfiguration,
]
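Consumers of this union typically dispatch on the concrete configuration type. The sketch below uses minimal dataclass stand-ins (not the SDK's models; the BigQuery fields shown are hypothetical, since this reference does not list them), illustrating the pattern with only two of the five members.

```python
from dataclasses import dataclass
from typing import Union

# Minimal stand-ins for two of the documented configuration models.
@dataclass
class KinesisConfiguration:
    stream_name: str
    region: str

@dataclass
class BigQueryConfiguration:
    project: str  # hypothetical fields; the reference does not list them
    table: str

DatasourceConnectionConfiguration = Union[
    KinesisConfiguration, BigQueryConfiguration
]

def describe(config: DatasourceConnectionConfiguration) -> str:
    # Dispatch on the concrete member of the union.
    if isinstance(config, KinesisConfiguration):
        return f"kinesis:{config.stream_name}@{config.region}"
    if isinstance(config, BigQueryConfiguration):
        return f"big_query:{config.project}.{config.table}"
    raise TypeError(f"unsupported configuration: {type(config).__name__}")

print(describe(KinesisConfiguration("clicks", "us-east-1")))
# kinesis:clicks@us-east-1
```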
DatasourceType
KINESIS
class-attribute
instance-attribute
KINESIS = 'kinesis'
S3
class-attribute
instance-attribute
S3 = 's3'
GCS
class-attribute
instance-attribute
GCS = 'gcs'
BIG_QUERY
class-attribute
instance-attribute
BIG_QUERY = 'big_query'
SNOWFLAKE
class-attribute
instance-attribute
SNOWFLAKE = 'snowflake'
REDSHIFT
class-attribute
instance-attribute
REDSHIFT = 'redshift'
is_filesystem
is_filesystem()
Source code in upriver/sdk/datasource/datasource_connection_configuration.py
FileSystemConfiguration
pydantic-model
Fields:
- type (FileSystemDatasourceType)
- bucket_path (str)
- bucket_prefix (str)
- region (str)
- data_format (DataFormat)
- additional_format_settings (Optional[AdditionalFormatSettings])
data_integration
pydantic-field
data_integration = None
type
pydantic-field
type
bucket_path
pydantic-field
bucket_path
The path of the bucket
bucket_prefix
pydantic-field
bucket_prefix
The prefix within the bucket to scan for this datasource
region
pydantic-field
region
The region in which the bucket is located
data_format
pydantic-field
data_format
The format of the data stored in the bucket
additional_format_settings
pydantic-field
additional_format_settings = None
Additional format settings for the data stored in the bucket.
KinesisConfiguration
pydantic-model
Fields:
- type (Literal[KINESIS])
- stream_name (str)
- region (str)
data_integration
pydantic-field
data_integration = None
stream_name
pydantic-field
stream_name
The name of the Kinesis stream
region
pydantic-field
region
The AWS region of the Kinesis stream
SnowflakeConfiguration
pydantic-model
Fields:
data_integration
pydantic-field
data_integration = None
host
pydantic-field
host
database
pydantic-field
database
schema_name
pydantic-field
schema_name
table
pydantic-field
table
DatasourceStatus
pydantic-model
Fields:
- monitoring_status (MonitoringStatus)
- incident_status (IncidentStatus)
- children_incident_status (Optional[IncidentStatus])
- has_sla (bool)
- unresolved_incidents_count (int)
- last_data_detected_at (Optional[datetime])
monitoring_status
pydantic-field
monitoring_status
incident_status
pydantic-field
incident_status
children_incident_status
pydantic-field
children_incident_status
The aggregated status of this datasource and all of its children datasources, if it has any.
has_sla
pydantic-field
has_sla
unresolved_incidents_count
pydantic-field
unresolved_incidents_count
last_data_detected_at
pydantic-field
last_data_detected_at
Filter
operator
instance-attribute
operator
value
instance-attribute
value
FilterOperator
Currently Upriver only supports equality filters; additional filters are planned for future releases.
EQUALS
class-attribute
instance-attribute
EQUALS = 'equals'
NOT_EQUALS
class-attribute
instance-attribute
NOT_EQUALS = 'not_equals'
FilterValue
module-attribute
FilterValue = int | float | str | bool | None
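The Filter, FilterOperator, and FilterValue types above combine with the contract-generation `filter` map ("only rows where every filtered field matches") as sketched below. The stand-in classes mirror the documented shapes; `row_matches` is a hypothetical helper, not an SDK function.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Union

FilterValue = Union[int, float, str, bool, None]

class FilterOperator(str, Enum):
    EQUALS = "equals"
    NOT_EQUALS = "not_equals"

@dataclass
class Filter:
    operator: FilterOperator
    value: FilterValue

def row_matches(row: dict, filters: Dict[str, Filter]) -> bool:
    # A row is kept only if every filtered field satisfies its filter.
    for field, f in filters.items():
        if f.operator is FilterOperator.EQUALS and row.get(field) != f.value:
            return False
        if f.operator is FilterOperator.NOT_EQUALS and row.get(field) == f.value:
            return False
    return True

filters = {"env": Filter(FilterOperator.EQUALS, "prod")}
rows = [{"env": "prod", "v": 1}, {"env": "dev", "v": 2}]
print([r for r in rows if row_matches(r, filters)])
# [{'env': 'prod', 'v': 1}]
```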
RunResponse
pydantic-model
RunStatus
IN_PROGRESS
class-attribute
instance-attribute
IN_PROGRESS = 'in progress'
RUN_FAILED
class-attribute
instance-attribute
RUN_FAILED = 'run failed'
EMPTY_RUN
class-attribute
instance-attribute
EMPTY_RUN = 'empty run'
NO_INCIDENTS
class-attribute
instance-attribute
NO_INCIDENTS = 'no incidents'
MINOR_INCIDENTS
class-attribute
instance-attribute
MINOR_INCIDENTS = 'minor incidents'
MAJOR_INCIDENTS
class-attribute
instance-attribute
MAJOR_INCIDENTS = 'major incidents'