Collect
- class datalake_ingestion.collect.Collector(datalake, collect_config)
Runs the main ingestion workflow.
- Parameters
datalake (datalake.Datalake) – a datalake framework instance
collect_config (list(dict)) – a collect configuration
- identify(path)
Searches the collect configuration for an entry that matches the given file path.
- Parameters
path (str) – the file path to identify
- Returns
the configuration entry
dict if an entry is found, None otherwise. The values captured from the path are stored in the dict under the pattern_extract key.
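The lookup described above can be illustrated with a minimal, self-contained sketch. It does not call the library: the `pattern` and `domain` keys, the regex-based matching, and the `identify_sketch` function are assumptions made for illustration. Only the `pattern_extract` key comes from this documentation.

```python
import re
from typing import Optional

# Hypothetical collect configuration: the "pattern" and "domain" keys are
# illustrative assumptions, not the library's documented schema.
COLLECT_CONFIG = [
    {"pattern": r"sales/(?P<year>\d{4})/(?P<name>[^/]+)\.csv", "domain": "sales"},
    {"pattern": r"logs/(?P<host>[^/]+)/.*\.log", "domain": "logs"},
]


def identify_sketch(collect_config: list, path: str) -> Optional[dict]:
    """Return a copy of the first matching entry, or None if nothing matches."""
    for entry in collect_config:
        match = re.fullmatch(entry["pattern"], path)
        if match:
            found = dict(entry)  # copy so the configuration is not mutated
            # Store the values captured from the path, as the docs describe.
            found["pattern_extract"] = match.groupdict()
            return found
    return None


entry = identify_sketch(COLLECT_CONFIG, "sales/2023/orders.csv")
missing = identify_sketch(COLLECT_CONFIG, "unknown/file.bin")
```

Returning a copy rather than the configuration entry itself is a design choice of this sketch, so that captured values from one path do not leak into later lookups; the real method may behave differently.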
- process(storage, path)
Identifies the file path and runs the preprocessor.
Also builds a Measurement and sends it to the telemetry backend.
- Parameters
storage (datalake.interface.IStorage) – the input storage
path (str) – the file path to process
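The identify, preprocess, and report sequence described for process can be sketched as follows. This is not the library's implementation: `MeasurementSketch`, `process_sketch`, the injected callables, and the `status` and `rows` fields are all hypothetical stand-ins, and the storage argument is left out to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MeasurementSketch:
    """Stand-in for a telemetry measurement (hypothetical fields)."""
    name: str
    labels: dict = field(default_factory=dict)
    values: dict = field(default_factory=dict)


def process_sketch(
    identify: Callable[[str], Optional[dict]],
    preprocess: Callable[[dict, str], int],
    publish: Callable[[MeasurementSketch], None],
    path: str,
) -> MeasurementSketch:
    """Identify the path, run the preprocessor, then publish a measurement."""
    measurement = MeasurementSketch("ingestion", labels={"path": path})
    entry = identify(path)
    if entry is None:
        # No configuration entry matched: report it and skip preprocessing.
        measurement.labels["status"] = "unknown"
    else:
        measurement.values["rows"] = preprocess(entry, path)
        measurement.labels["status"] = "success"
    publish(measurement)  # assumed hand-off to the telemetry backend
    return measurement


sent = []
result = process_sketch(
    identify=lambda p: {"pattern_extract": {}} if p.endswith(".csv") else None,
    preprocess=lambda entry, p: 3,  # pretend three rows were processed
    publish=sent.append,
    path="sales/2023/orders.csv",
)
```

Passing the collaborators in as callables keeps the sketch testable without a datalake instance; in the documented class these roles are presumably filled by the Collector and the datalake it was constructed with.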