API Reference
- class datalake.Datalake(config={})
- Parameters
config (dict) – configuration parameters in key/value pairs
- property csv_dialect
Returns the configured csv.Dialect instance
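As a reminder of how a csv.Dialect instance is typically consumed, here is a minimal sketch using only the standard library. The `PipeDialect` class and the `roundtrip` helper are hypothetical names made up for this illustration; the dialect actually returned by `csv_dialect` depends on the configuration, which this page does not describe.

```python
import csv
import io


class PipeDialect(csv.Dialect):
    """Hypothetical dialect for illustration only (not taken from datalake)."""
    delimiter = "|"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_MINIMAL


def roundtrip(rows, dialect):
    """Write rows with the given dialect, then read them back with it."""
    buffer = io.StringIO()
    csv.writer(buffer, dialect=dialect).writerows(rows)
    text = buffer.getvalue()
    parsed = list(csv.reader(io.StringIO(text), dialect=dialect))
    return text, parsed


# A field containing the delimiter is quoted on write and restored on read.
text, parsed = roundtrip([["a", "b"], ["1", "x|y"]], PipeDialect())
print(text)
print(parsed)
```

Any object returned by the `csv_dialect` property could presumably be passed as the `dialect` argument in the same way.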
- download(store, key, filepath, path_params=None)
Downloads the specified catalog entry from a store to a local file
- Parameters
store (str) – the name of the store
key (str) – the catalog key for the file to download
filepath (str) – the local file path
path_params (dict) – a map of key/value pairs used to fill the placeholders in the key path
- get_entry_path(key, path_params=None, strict=False)
Builds a path for the specified entry with the given parameters
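To make the idea of filling path placeholders concrete, the following self-contained sketch shows one possible approach. It is not the library's implementation: the `{name}` placeholder syntax, the `fill_path` helper, and the reading of `strict` as "fail when a placeholder has no value" are all assumptions made for this example.

```python
class _KeepMissing(dict):
    """Leave unknown placeholders untouched instead of raising KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"


def fill_path(template, path_params=None, strict=False):
    """Fill `{name}` placeholders in `template` (hypothetical helper).

    With strict=True a missing value raises KeyError; otherwise the
    placeholder is kept as-is in the result.
    """
    params = dict(path_params or {})
    if strict:
        return template.format(**params)
    return template.format_map(_KeepMissing(params))


print(fill_path("sales/{year}/{month}/data.csv", {"year": 2024, "month": "01"}))
print(fill_path("sales/{year}/{month}.csv", {"year": 2024}))
```

In this sketch the second call leaves `{month}` in place because no value was supplied and `strict` is off.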
- get_entry_path_resolved(store, key, path_params=None, strict=False)
Returns the resolved path for the specified entry with the given parameters and a storage for the specified store
- get_secret(name)
Get a Secret instance for the provided secret name
- Parameters
name (str) – the name of the secret (depending on the underlying provider)
- Returns
A concrete instance of datalake.interface.ISecret
- get_storage(bucket)
Get a Storage instance for the provided bucket
- Parameters
bucket (str) – the name of the bucket (depending on the underlying provider)
- Returns
A concrete instance of datalake.interface.IStorage
- identify(path)
Returns a tuple with the catalog entry that matches the path and a dict with the path’s placeholders
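The reverse operation, matching a concrete path back to a catalog entry and extracting its placeholder values, can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the `{name}` placeholder syntax, the dict-based `catalog`, and the helper names `template_to_regex` and `identify_path` are hypothetical, and placeholder values are assumed not to contain `/`.

```python
import re


def template_to_regex(template):
    """Turn 'a/{x}/b' into a compiled regex with one named group per placeholder."""
    # re.split with a capture group alternates: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)          # literal text
        else:
            pattern += f"(?P<{part}>[^/]+)"     # placeholder value
    return re.compile(pattern)


def identify_path(path, catalog):
    """Return (key, placeholders) for the first template that fully matches."""
    for key, template in catalog.items():
        match = template_to_regex(template).fullmatch(path)
        if match:
            return key, match.groupdict()
    return None, {}


catalog = {
    "sales": "sales/{year}/{month}/data.csv",
    "users": "users/{id}.json",
}
print(identify_path("sales/2024/01/data.csv", catalog))
```

A template that repeats the same placeholder name would need extra handling, since a regex cannot define the same named group twice.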
- list_entry_files(store, key, path_params=None)
Returns the list of files in a store for the specified catalog entry
- property monitor
Returns the concrete implementation for datalake.interface.IMonitor
- property provider
Returns the configured name of the provider
- class datalake.ServiceDiscovery(provider, monitoring=None, *args, **kwargs)
Cloud resources resolver
- Parameters
provider (str) – the cloud provider ("aws", "azure" or "gcp") or "local"
monitoring (dict) – the monitoring implementation spec
Example
Local provider with console monitoring:
    from datalake import ServiceDiscovery

    monitoring_spec = {
        "class": "NoMonitor",
        "params": {"quiet": False},
    }
    service_discovery = ServiceDiscovery("local", monitoring_spec)
Typical Google Cloud setup:
    monitoring_spec = {
        "class": "datalake.provider.gcp.GoogleMonitor",
        "params": {"project_id": "my-google-project-id"},
    }
    service_discovery = ServiceDiscovery("gcp", monitoring_spec)
- Raises
DatalakeError – when the provider is invalid
BadConfiguration – when monitoring spec is invalid
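A spec of the form `{"class": ..., "params": ...}` is commonly resolved by importing the dotted class path and calling the class with the parameters as keyword arguments. The sketch below shows that general pattern with the standard library only. It is an assumption about how such a spec might be handled, not the library's actual loader: `instantiate_from_spec` is a hypothetical helper, and it does not guess how a bare name such as "NoMonitor" is resolved.

```python
import importlib


def instantiate_from_spec(spec):
    """Instantiate the class named by spec["class"] with spec["params"].

    Hypothetical helper: expects a fully qualified "package.module.Class" path.
    """
    class_path = spec["class"]
    params = spec.get("params", {})
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted class path, got {class_path!r}")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(**params)


# Demonstrated with a standard-library class so the example runs anywhere.
delta = instantiate_from_spec({"class": "datetime.timedelta", "params": {"hours": 1}})
print(delta.total_seconds())  # 3600.0
```

A real loader would likely wrap import and attribute errors in a configuration-specific exception, in the spirit of the BadConfiguration error listed above.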
- get_secret(name)
Returns a datalake.interface.ISecret instance
- Parameters
name (str) – the name of the secret to fetch
- get_storage(bucket)
Returns a datalake.interface.IStorage instance
- Parameters
bucket (str) – the name of the bucket to fetch
- property monitor
Returns a datalake.interface.IMonitor instance