Preprocess

class datalake_ingestion.preprocess.Preprocessor(datalake)

Abstract class for file preprocessing.

Parameters

datalake (datalake.Datalake) – a datalake framework instance

abstract action(metric, storage, path, path_extracts, catalog_entry, **kwargs)

Main method, called by the Collector. It is the entry point of the preprocessor: subclasses must implement their logic starting from this method.

Parameters
  • metric (datalake.telemetry.Measurement) – a measurement point that will be sent after the action has finished. Custom labels and metrics can be added to it while the action runs

  • storage (datalake.interface.IStorage) – the storage interface bound to the input file

  • path (str) – the path of the input file to process

  • path_extracts (dict) – custom parameters extracted from the file path

  • catalog_entry (dict) – the catalog entry (may be None)

  • kwargs (dict) – additional parameters from the collect configuration (see action_params)

Raises

property logger

Returns a logger for this preprocessor.
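
The subclassing pattern described for action can be sketched as follows. This is a minimal, self-contained illustration, not the library's implementation: the Preprocessor class below is a stand-in that only mirrors the documented constructor, action signature, and logger property, and SummaryPreprocessor, its returned dictionary, and the suffix parameter are hypothetical. The documentation above does not state what action should return, so the return value here exists only to make the example observable. In real code the base class would be imported from datalake_ingestion.preprocess.

```python
import logging
from abc import ABC, abstractmethod


class Preprocessor(ABC):
    """Stand-in base class (assumption) mirroring the documented interface."""

    def __init__(self, datalake):
        self._datalake = datalake

    @property
    def logger(self):
        # Assumption: one logger per preprocessor class name
        return logging.getLogger(type(self).__name__)

    @abstractmethod
    def action(self, metric, storage, path, path_extracts, catalog_entry, **kwargs):
        """Entry point called by the Collector."""


class SummaryPreprocessor(Preprocessor):
    """Hypothetical preprocessor that only summarizes its arguments."""

    def action(self, metric, storage, path, path_extracts, catalog_entry, **kwargs):
        # catalog_entry may be None, so guard before reading from it
        entry_name = catalog_entry.get("name") if catalog_entry else None
        self.logger.info("Processing %s", path)
        # Returning a dict is an illustration choice, not documented behavior
        return {
            "path": path,
            "entry": entry_name,
            "extracts": dict(path_extracts),
            "params": dict(kwargs),
        }


# Placeholders stand in for the real datalake, metric, and storage objects
preprocessor = SummaryPreprocessor(datalake=None)
result = preprocessor.action(
    metric=None,
    storage=None,
    path="raw/2024/file.csv",
    path_extracts={"year": "2024"},
    catalog_entry=None,
    suffix=".bak",  # hypothetical extra parameter passed through **kwargs
)
```

Because action is abstract, instantiating the base class directly raises TypeError; only subclasses that implement action can be created.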