Preprocess
- class datalake_ingestion.preprocess.Preprocessor(datalake)
Abstract class for file preprocessing.
- Parameters
datalake (datalake.Datalake) – a datalake framework instance
- abstract action(metric, storage, path, path_extracts, catalog_entry, **kwargs)
Main method, called by the Collector. The preprocessor logic must be implemented starting from this method.
- Parameters
metric (datalake.telemetry.Measurement) – a measurement point that will be sent after the action has finished. Custom labels and metrics can be added to it during the action
storage (datalake.interface.IStorage) – the storage interface bound to the input file
path (str) – the path of the input file to process
path_extracts (dict) – custom parameters extracted from the file path
catalog_entry (dict) – the catalog entry (may be None)
kwargs (dict) – additional parameters from the collect configuration (see action_params)
- Raises
PreprocessFailed – when any error occurs during the action
HackDetected – when suspicious content is detected
- property logger
Returns a logger for this preprocessor
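As a rough illustration of the interface described above, the following sketch subclasses Preprocessor and implements action. To stay runnable without the package installed, it defines minimal stand-ins for Preprocessor and PreprocessFailed that only mirror the documented signatures; they are not the library's implementation. The SuffixCheck class and the allowed_suffixes parameter are hypothetical. The metric and storage arguments are left untouched because their APIs are not described in this section.

```python
import logging
from abc import ABC, abstractmethod


class PreprocessFailed(Exception):
    """Stand-in for the documented PreprocessFailed exception (assumption)."""


class Preprocessor(ABC):
    """Stand-in that mirrors the documented interface only (assumption)."""

    def __init__(self, datalake):
        self._datalake = datalake

    @property
    def logger(self):
        # The real property may be configured differently.
        return logging.getLogger(type(self).__name__)

    @abstractmethod
    def action(self, metric, storage, path, path_extracts, catalog_entry, **kwargs):
        """Entry point called by the Collector."""


class SuffixCheck(Preprocessor):
    """Hypothetical preprocessor that rejects unexpected file suffixes."""

    def action(self, metric, storage, path, path_extracts, catalog_entry, **kwargs):
        # 'allowed_suffixes' is a made-up parameter that would arrive through
        # kwargs from the collect configuration.
        allowed = tuple(kwargs.get("allowed_suffixes", (".csv",)))
        self.logger.info("Checking %s", path)

        # catalog_entry is documented as possibly None, so handle that case.
        if catalog_entry is None:
            raise PreprocessFailed(f"no catalog entry for {path}")

        if not path.endswith(allowed):
            raise PreprocessFailed(f"unexpected suffix for {path}")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    check = SuffixCheck(datalake=None)
    check.action(None, None, "incoming/data.csv", {}, {"id": "demo"})
    try:
        check.action(None, None, "incoming/data.bin", {}, {"id": "demo"})
    except PreprocessFailed as exc:
        print(f"rejected: {exc}")
```

In this sketch every failure is reported by raising PreprocessFailed, following the Raises entry above; a real implementation would presumably also read the file through storage and add labels to metric.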