API Reference

class datalake.Datalake(config={})
Parameters

config (dict) – configuration parameters in key/value pairs

property csv_dialect

Returns the configured csv.Dialect instance
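Because the property returns a standard `csv.Dialect`, the object can be passed directly to the standard library's `csv.reader` and `csv.writer`. The sketch below is minimal: the `SemicolonDialect` class is a hypothetical stand-in for whatever `csv_dialect` returns under your configuration.

```python
import csv
import io


class SemicolonDialect(csv.Dialect):
    """Hypothetical stand-in for the dialect returned by Datalake.csv_dialect."""
    delimiter = ";"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_MINIMAL


dialect = SemicolonDialect()  # in practice: dialect = datalake_instance.csv_dialect

# Write rows using the dialect; "a;b" is quoted because it contains the delimiter.
buffer = io.StringIO()
writer = csv.writer(buffer, dialect=dialect)
writer.writerow(["id", "label"])
writer.writerow([1, "a;b"])
text = buffer.getvalue()

# Read the rows back with the same dialect.
rows = list(csv.reader(io.StringIO(text), dialect=dialect))
print(rows)  # [['id', 'label'], ['1', 'a;b']]
```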

download(store, key, filepath, path_params=None)

Downloads the specified catalog entry from a store to a local file

Parameters
  • store (str) – the name of the store

  • key (str) – the catalog key for the file to download

  • filepath (str) – the local file path

  • path_params (dict) – a map of key/value pairs used to fill the placeholders in the key's path

get_entry(key)

Get the catalog definition for an entry key

Returns

a dict

get_entry_path(key, path_params=None, strict=False)

Builds a path for the specified entry with the given parameters
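Catalog entries map to path templates whose placeholders are filled from `path_params`. This reference does not show the template syntax, so the sketch below assumes Python-style `{name}` placeholders and uses hypothetical `build_path` and `placeholders` helpers. It illustrates the idea rather than the library's implementation, and it does not model the `strict` flag.

```python
import string


def build_path(template: str, path_params: dict) -> str:
    """Hypothetical helper: fill {name} placeholders in a catalog path template.

    Raises KeyError if a placeholder has no value in path_params.
    """
    return template.format_map(path_params)


def placeholders(template: str) -> list:
    """List the placeholder names found in a template, in order of appearance."""
    return [name for _, name, _, _ in string.Formatter().parse(template) if name]


template = "raw/sales/{country}/{date}/sales.csv"  # hypothetical catalog entry path
print(placeholders(template))  # ['country', 'date']
print(build_path(template, {"country": "fr", "date": "2024-01-31"}))
# raw/sales/fr/2024-01-31/sales.csv
```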

get_entry_path_resolved(store, key, path_params=None, strict=False)

Returns the resolved path for the specified entry, built with the given parameters, along with a storage instance for the specified store

get_secret(name)

Get a Secret instance for the provided secret name

Parameters

name (str) – the name of the secret (depending on the underlying provider)

Returns

A concrete instance of datalake.interface.ISecret

get_storage(bucket)

Get a Storage instance for the provided bucket

Parameters

bucket (str) – the name of the bucket (depending on the underlying provider)

Returns

A concrete instance of datalake.interface.IStorage

identify(path)

Returns a tuple with the catalog entry that matches the path and a dict of the path’s placeholder values
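`identify` works in the opposite direction from `get_entry_path`. Given a concrete path, it finds the catalog entry whose template matches and recovers the placeholder values. The sketch below is a minimal illustration of that idea. It assumes a hypothetical in-memory catalog of `{name}`-style templates and a hypothetical `identify_path` helper, and the real catalog format and matching rules may differ.

```python
import re
import string

# Hypothetical catalog: entry key -> path template with {name} placeholders.
CATALOG = {
    "sales": "raw/sales/{country}/{date}/sales.csv",
    "users": "raw/users/{date}/users.json",
}


def template_to_regex(template: str) -> "re.Pattern":
    """Turn a {name} template into an anchored regex with one named group per placeholder."""
    parts = []
    for literal, name, _, _ in string.Formatter().parse(template):
        parts.append(re.escape(literal))
        if name:
            # Assumption: a placeholder matches exactly one path segment (no slashes).
            parts.append(f"(?P<{name}>[^/]+)")
    return re.compile("^" + "".join(parts) + "$")


def identify_path(path: str):
    """Hypothetical helper: return (entry_key, placeholder_values), or (None, None) if nothing matches."""
    for key, template in CATALOG.items():
        match = template_to_regex(template).match(path)
        if match:
            return key, match.groupdict()
    return None, None


print(identify_path("raw/sales/fr/2024-01-31/sales.csv"))
# ('sales', {'country': 'fr', 'date': '2024-01-31'})
```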

list_entry_files(store, key, path_params=None)

Returns the list of files in a store for the specified catalog entry

property monitor

Returns the concrete implementation of datalake.interface.IMonitor

new_dataset_builder(key, path=None, lang='en_US', date_formats=None, ciphered=False)

new_dataset_reader(store, key, path_params=None, ciphered=False)

property provider

Returns the configured name of the provider

resolve_path(store, path)

Resolves a path in the named store to a fully qualified bucket path

Parameters
  • store (str) – the name of a store

  • path (str) – the path to resolve in the store

Returns

a tuple (bucket, path, uri) with the bucket name, the full path and a fully qualified URI
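The sketch below shows the shape of the `(bucket, path, uri)` tuple. It assumes a hypothetical store configuration that maps each store name to a bucket and a path prefix, and a hypothetical `resolve_store_path` helper. It uses the `s3://` URI scheme by default, with `gs://` shown as an alternative. Both are common bucket URI conventions, but the actual store configuration and URI format depend on the provider.

```python
import posixpath

# Hypothetical store configuration: store name -> bucket and path prefix.
STORES = {
    "raw": {"bucket": "acme-datalake-raw", "prefix": "landing"},
    "curated": {"bucket": "acme-datalake-curated", "prefix": ""},
}


def resolve_store_path(store: str, path: str, scheme: str = "s3"):
    """Hypothetical helper: return (bucket, full_path, uri) for a path inside the named store."""
    spec = STORES[store]  # raises KeyError for an unknown store
    # Strip leading slashes so posixpath.join keeps the store prefix.
    full_path = posixpath.join(spec["prefix"], path.lstrip("/"))
    uri = f"{scheme}://{spec['bucket']}/{full_path}"
    return spec["bucket"], full_path, uri


print(resolve_store_path("raw", "sales/fr/sales.csv"))
# ('acme-datalake-raw', 'landing/sales/fr/sales.csv', 's3://acme-datalake-raw/landing/sales/fr/sales.csv')
```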

upload(filepath, store, key, path_params=None, content_type='text/plain', encoding='utf-8', metadata={})

Uploads a local file to a store as the specified catalog entry

class datalake.ServiceDiscovery(provider, monitoring=None, *args, **kwargs)

Cloud resources resolver

Parameters
  • provider (str) – the cloud provider ("aws", "azure" or "gcp") or "local"

  • monitoring (dict) – the monitoring implementation spec

Example

Local provider with console monitoring:

from datalake import ServiceDiscovery

monitoring_spec = {
    "class": "NoMonitor",
    "params": {
        "quiet": False
    }
}
service_discovery = ServiceDiscovery("local", monitoring_spec)

Typical Google Cloud setup:

from datalake import ServiceDiscovery

monitoring_spec = {
    "class": "datalake.provider.gcp.GoogleMonitor",
    "params": {
        "project_id": "my-google-project-id"
    }
}
service_discovery = ServiceDiscovery("gcp", monitoring_spec)

Raises
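Both examples describe the monitor as a `class` name plus constructor `params`. This reference does not document how `ServiceDiscovery` turns that spec into an object. A common pattern is to import the class dynamically and call it with the params as keyword arguments. The sketch below shows that pattern using a hypothetical `instantiate` helper and a hypothetical default module for bare names such as `NoMonitor`. It is not taken from datalake.

```python
import importlib


def instantiate(spec: dict, default_module: str = "datalake.provider"):
    """Hypothetical helper: build an object from a {"class": ..., "params": {...}} spec.

    A dotted "class" value is treated as "module.ClassName". A bare name is
    looked up in default_module, which is a hypothetical default, not datalake's.
    """
    module_name, _, class_name = spec["class"].rpartition(".")
    module = importlib.import_module(module_name or default_module)
    cls = getattr(module, class_name)
    return cls(**spec.get("params", {}))


# Demonstrated with a standard-library class so the sketch runs without datalake installed.
delta = instantiate({"class": "datetime.timedelta", "params": {"days": 1, "hours": 6}})
print(delta)  # 1 day, 6:00:00
```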

get_secret(name)

Returns a datalake.interface.ISecret instance

Parameters

name (str) – the name of the secret to fetch

get_storage(bucket)

Returns a datalake.interface.IStorage instance

Parameters

bucket (str) – the name of the bucket to fetch

property monitor

Returns a datalake.interface.IMonitor instance