enb.compression package

Submodules

enb.compression.codec module

Codecs implement the compress and decompress methods, as well as the name and label properties used to identify and represent them.

A param_dict describing the configuration of each codec instance is passed on initialization. Each codec may choose the number and names of its parameters.

class enb.compression.codec.AbstractCodec(param_dict=None)

Bases: ExperimentTask

Base class for all codecs.

__init__(param_dict=None)

Parameters: param_dict – dictionary of parameters for this codec instance.

compress(original_path: str, compressed_path: str, original_file_info=None) → CompressionResults

Compress original_path into compressed_path using param_dict as params.

Parameters:

original_path – path to the original file to be compressed
compressed_path – path to the compressed file to be created
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None.

Returns:

(optional) a CompressionResults instance, or None (see self.compression_results_from_paths).

compression_results_from_paths(original_path, compressed_path): Get the default CompressionResults instance corresponding to the compression of original_path into compressed_path.

decompress(compressed_path, reconstructed_path, original_file_info=None)

Decompress compressed_path into reconstructed_path using param_dict as params (if needed).

Parameters:

compressed_path – path to the input compressed file
reconstructed_path – path to the output reconstructed file
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None. It should only be used in special cases, since codecs are expected to store all needed metainformation in the compressed file.

Returns:

(optional) a DecompressionResults instance, or None (see self.decompression_results_from_paths).

decompression_results_from_paths(compressed_path, reconstructed_path): Return an enb.compression.compression.DecompressionResults instance given the compressed and reconstructed paths.

property label: Label to be displayed for the codec. May not be strictly unique nor fully informative. By default, self’s class name is returned.

property name: Name of the codec. Subclasses are expected to yield different values when different parameters are used. By default, the class name is folled by all elements in self.param_dict sorted alphabetically are included in the name.

class enb.compression.codec.LosslessCodec(param_dict=None)

Bases: AbstractCodec

An AbstractCodec that identifies itself as lossless.

class enb.compression.codec.LossyCodec(param_dict=None)

Bases: AbstractCodec

An AbstractCodec that identifies itself as lossy.

class enb.compression.codec.NearLosslessCodec(param_dict=None)

Bases: LossyCodec

An AbstractCodec that identifies itself as near lossless.

class enb.compression.codec.PassthroughCodec

Bases: LosslessCodec

Codec that simply copies the input into the output in both compression and decompression.
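The passthrough behavior can be sketched without enb. PassthroughSketch is a hypothetical stand-in with the same compress/decompress signatures; like the documented methods, it may return None instead of a results object.

```python
import os
import shutil
import tempfile


class PassthroughSketch:
    """Hypothetical stand-in for a passthrough codec (not enb's class)."""

    def compress(self, original_path, compressed_path, original_file_info=None):
        # "Compression" is a byte-for-byte copy of the input file.
        shutil.copyfile(original_path, compressed_path)

    def decompress(self, compressed_path, reconstructed_path, original_file_info=None):
        # "Decompression" is another byte-for-byte copy.
        shutil.copyfile(compressed_path, reconstructed_path)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    original = os.path.join(tmp_dir, "original.raw")
    compressed = os.path.join(tmp_dir, "compressed.bin")
    reconstructed = os.path.join(tmp_dir, "reconstructed.raw")
    with open(original, "wb") as f:
        f.write(bytes(range(256)))
    codec = PassthroughSketch()
    codec.compress(original, compressed)
    codec.decompress(compressed, reconstructed)
    with open(original, "rb") as f1, open(reconstructed, "rb") as f2:
        assert f1.read() == f2.read()  # lossless by construction
```

Such a codec is useful as a baseline: its compression ratio is exactly 1 and its reconstruction is always lossless.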

__init__()

This constructor takes no parameters (unlike AbstractCodec.__init__, it does not accept a param_dict).

compress(original_path: str, compressed_path: str, original_file_info=None)

Compress original_path into compressed_path using param_dict as params.

Parameters:

original_path – path to the original file to be compressed
compressed_path – path to the compressed file to be created
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None.

Returns:

(optional) a CompressionResults instance, or None (see self.compression_results_from_paths).

decompress(compressed_path, reconstructed_path, original_file_info=None)

Decompress compressed_path into reconstructed_path using param_dict as params (if needed).

Parameters:

compressed_path – path to the input compressed file
reconstructed_path – path to the output reconstructed file
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None. It should only be used in special cases, since codecs are expected to store all needed metainformation in the compressed file.

Returns:

(optional) a DecompressionResults instance, or None (see self.decompression_results_from_paths).

enb.compression.compression module

Data compression tools common to any compression modality.

exception enb.compression.compression.CompressionException(original_path=None, compressed_path=None, file_info=None, status=None, output=None)

Bases: Exception

Base class for exceptions raised during a compression instance.

__init__(original_path=None, compressed_path=None, file_info=None, status=None, output=None)

class enb.compression.compression.CompressionExperiment(codecs, dataset_paths=None, csv_experiment_path=None, csv_dataset_path=None, dataset_info_table=None, overwrite_file_properties=False, reconstructed_dir_path=None, compressed_copy_dir_path=None, task_families=None)

Bases: Experiment

This class allows seamless execution of compression experiments.

In the functions decorated with @atable.column_function, the row argument has two magic properties, compression_results and decompression_results. These give access to the CompressionResults and DecompressionResults instances resulting, respectively, from compressing and decompressing according to the row index parameters. The paths referenced in these results are valid only while the row is being processed and are disposed of afterwards. In addition, the image_info_row attribute gives access to the image metainformation (e.g., geometry).

class CompressionDecompressionWrapper(file_path, codec, image_info_row, reconstructed_copy_dir=None, compressed_copy_dir=None)

Bases: object

This class is instantiated for each row of the table, and added to a temporary column row_wrapper_column_name. Column-setting methods can then access this wrapper, and in particular its compression_results and decompression_results properties, which will run compression and decompression at most once. This way, many columns can be defined independently without needing to compress and decompress for each one.
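The run-at-most-once behavior described above is the lazy, cached property pattern. A minimal sketch using functools.cached_property follows; LazyCodecWrapper and its byte-reversal placeholder "codec" are hypothetical, and enb may implement the caching differently.

```python
from functools import cached_property


class LazyCodecWrapper:
    """Hypothetical stand-in for CompressionDecompressionWrapper.

    Each expensive step runs only the first time its property is read.
    """

    def __init__(self, data: bytes):
        self.data = data
        self.compression_runs = 0
        self.decompression_runs = 0

    @cached_property
    def compression_results(self):
        self.compression_runs += 1
        # Placeholder "compression": reverse the bytes.
        return self.data[::-1]

    @cached_property
    def decompression_results(self):
        self.decompression_runs += 1
        # Decompression needs the compressed data, so reading this property
        # triggers compression first if it has not happened yet.
        return self.compression_results[::-1]


wrapper = LazyCodecWrapper(b"abc")
# Several column functions may read the same properties...
for _ in range(3):
    _ = wrapper.compression_results
    _ = wrapper.decompression_results
# ...but each step ran only once.
print(wrapper.compression_runs, wrapper.decompression_runs)  # → 1 1
```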

__init__(file_path, codec, image_info_row, reconstructed_copy_dir=None, compressed_copy_dir=None)

Parameters:

file_path – path to the original image being compressed
codec – AbstractCodec instance to be used for compression/decompression
image_info_row – dict-like object with geometry and data type information about file_path
reconstructed_copy_dir – if not None, a copy of the reconstructed images is stored, based on the class of codec.
compressed_copy_dir – if not None, a copy of the compressed images is stored, based on the class of codec.

property compression_results: CompressionResults: Perform the actual compression experiment for the selected row.

property decompression_results: DecompressionResults: Perform the actual decompression experiment for the selected row.

property numpy_dtype: Get the numpy dtype corresponding to the original image’s data format

__init__(codecs, dataset_paths=None, csv_experiment_path=None, csv_dataset_path=None, dataset_info_table=None, overwrite_file_properties=False, reconstructed_dir_path=None, compressed_copy_dir_path=None, task_families=None)

Parameters:

codecs – list of AbstractCodec instances. Note that codecs are compatible with the interface of ExperimentTask.
dataset_paths – list of paths to the files to be used as input for compression. If it is None, this list is obtained automatically from the configured base dataset dir.
csv_experiment_path – if not None, path to the CSV file giving persistence support to this experiment. If None, it is automatically determined within options.persistence_dir.
csv_dataset_path – if not None, path to the CSV file giving persistence support to the dataset file properties. If None, it is automatically determined within options.persistence_dir.
dataset_info_table – if not None, it must be an ImagePropertiesTable instance or subclass instance that can be used to obtain dataset file metainformation, and/or gather it from csv_dataset_path. If None, a new ImagePropertiesTable instance is created and used for this purpose.
overwrite_file_properties – if True, file properties are recomputed before starting the experiment. Useful for temporary and/or random datasets. Note that overwriting of the experiment results themselves is controlled in the call to get_df.
reconstructed_dir_path – if not None, a directory where reconstructed images are to be stored.
compressed_copy_dir_path – if not None, the directory where a copy of the compressed images is to be stored. The copy may not be generated for images for which all columns are already known.
task_families – if not None, it must be a list of TaskFamily instances. It is used to set the “family_label” column for each row. If the codec is not found within the families, a default label is set indicating so.

property codecs

Returns: an iterable of the defined codecs

property codecs_by_name: Alias for tasks_by_name

column_to_properties: dictionary mapping each column name to its ColumnProperties instance. Unless noted otherwise, every entry has semilog_x=False, semilog_y=False, semilog_x_base=10, semilog_y_base=10, has_dict_values=False, has_iterable_values=False and has_object_values=False. Entries are listed as column name – setter function – label (other properties):

bpppc – CompressionExperiment.set_bpppc – 'Compressed data rate (bpppc)' (plot_min=0)
compressed_file_sha256 – CompressionExperiment.set_comparison_results – "Compressed file's SHA256"
compressed_size_bytes – CompressionExperiment.set_compressed_data_size – 'Compressed data size (Bytes)' (plot_min=0)
compression_efficiency_1byte_entropy – CompressionExperiment.set_efficiency – 'Compression efficiency 1byte entropy' (plot_min=0, plot_max=1, labytesel='Compression efficiency (1bytes entropy)')
compression_efficiency_2byte_entropy – CompressionExperiment.set_efficiency – 'Compression efficiency 2byte entropy' (plot_min=0, plot_max=2, labytesel='Compression efficiency (2bytes entropy)')
compression_efficiency_4byte_entropy – CompressionExperiment.set_efficiency – 'Compression efficiency 4byte entropy' (plot_min=0, plot_max=4, labytesel='Compression efficiency (4bytes entropy)')
compression_memory_kb – CompressionExperiment.set_comparison_results – 'Compression memory usage (KB)' (plot_min=0)
compression_ratio – CompressionExperiment.set_comparison_results – 'Compression ratio' (plot_min=0)
compression_ratio_dr – CompressionExperiment.set_compression_ratio_dr – 'Compression ratio' (plot_min=0)
compression_time_seconds – CompressionExperiment.set_comparison_results – 'Compression time (s)' (plot_min=0)
decompression_memory_kb – CompressionExperiment.set_comparison_results – 'Decompression memory usage (KB)' (plot_min=0)
decompression_time_seconds – CompressionExperiment.set_comparison_results – 'Decompression time (s)' (plot_min=0)
family_label – Experiment.set_family_label – 'Family label'
lossless_reconstruction – CompressionExperiment.set_comparison_results – 'Lossless?'
param_dict – Experiment.set_param_dict – 'Param dict' (has_dict_values=True)
repetitions – CompressionExperiment.set_comparison_results – 'Number of compression/decompression repetitions' (plot_min=0)
task_apply_time – Experiment.set_task_apply_time – 'Task apply time'
task_label – Experiment.set_task_label – 'Task label'
task_name – Experiment.set_task_name – 'Task name'

This attribute keeps track of which columns have been defined and of the methods that need to be called to compute them. Its keys can be used to determine the columns defined in a given class or instance. Its values are ColumnProperties instances, which can be set manually after definition and before calling the get_df method of Analyzer subclasses.

property compression_results: CompressionResults: Get the current compression results from self.codec_results. This property is intended to be read from functions that set columns of a row. It triggers the compression of that row’s sample with that row’s codec if it hasn’t been compressed yet. Otherwise, None is returned.

compute_one_row(filtered_df, index, loc, column_fun_tuples, overwrite)

Process a single row of an ATable instance, returning a Series object corresponding to that row. If an error is detected, an exception is returned instead of a Series. Note that the exception is not raised here; it is intended to be detected by compute_target_rows(), i.e., the dispatcher function.

Parameters:

filtered_df – DataFrame retrieved from persistent storage, with an index compatible with loc. The loc argument itself need not be present in filtered_df, but it is used to avoid recomputation when overwrite is not True and the columns have already been set.
index – index value or values corresponding to the row to be processed.
loc – location compatible with .loc of filtered_df (although it might not be present there), and that will be set into the full loaded_df also using its .loc accessor.
column_fun_tuples – a list of (column, fun) tuples, where fun is to be invoked to fill column
overwrite – if True, existing values are overwritten with newly computed data. Otherwise, only missing or None columns are populated (and therefore only their column functions are called).

Returns:

a pandas.Series instance corresponding to this row, with a column named as given by self.private_index_column set to the loc argument passed to this function.
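Returning the exception instead of raising it lets the dispatcher attempt every row before deciding how to report failures. A minimal sketch of that pattern follows; process_row and dispatch_rows are hypothetical names with simplified signatures, not enb's compute_one_row and compute_target_rows.

```python
def process_row(fun, row_input):
    """Return fun(row_input), or the exception it raised (not re-raised)."""
    try:
        return fun(row_input)
    except Exception as ex:  # deliberately broad: any failure is reported
        return ex


def dispatch_rows(fun, row_inputs):
    """Hypothetical dispatcher: process every row, then raise on failures."""
    results = [process_row(fun, r) for r in row_inputs]
    errors = [r for r in results if isinstance(r, Exception)]
    if errors:
        # All rows were attempted before the first error is reported.
        raise RuntimeError(f"{len(errors)} row(s) failed") from errors[0]
    return results


print(dispatch_rows(lambda x: 10 // x, [1, 2, 5]))  # → [10, 5, 2]
```

This design also suits parallel execution, where a worker that raised would otherwise hide the results of the rows that did succeed.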

dataset_files_extension = 'raw': Default input sample extension. It affects the result of enb.atable.get_all_test_files.

property decompression_results: DecompressionResults: Get the current decompression results from self.codec_results. This property is intended to be read from functions that set columns of a row. It triggers the compression and decompression of that row’s sample with that row’s codec if they have not been performed yet. If no row wrapper is available (i.e., self.codec_results is None), None is returned.

default_file_properties_table_class: alias of ImagePropertiesTable

row_wrapper_column_name = '_codec_wrapper'

set_bpppc(index, row)
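bpppc stands for bits per pixel per component. Assuming it is computed as the compressed size in bits divided by the total number of samples (width * height * component_count), a worked example looks as follows; the function bpppc is a hypothetical sketch, not enb's set_bpppc.

```python
def bpppc(compressed_size_bytes: int, width: int, height: int,
          component_count: int) -> float:
    """Compressed data rate in bits per pixel per component (sketch)."""
    samples = width * height * component_count
    return 8 * compressed_size_bytes / samples


# A 512x512 image with 3 components compressed to 196,608 bytes:
print(bpppc(196_608, 512, 512, 3))  # → 2.0
```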

set_comparison_results(index, row): Perform a compression-decompression cycle and store the comparison results.

set_compressed_data_size(index, row)

set_compression_ratio_dr(index, row): Set the compression ratio calculated based on the dynamic range of the input samples, as opposed to 8*bytes_per_sample.
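Assuming compression_ratio is the original file size divided by the compressed size, so that each original sample counts as 8 * bytes_per_sample bits, counting dynamic_range_bits per sample instead rescales the ratio by dynamic_range_bits / (8 * bytes_per_sample). The helper below is a hypothetical sketch of that arithmetic, not enb's method.

```python
def compression_ratio_dr(compression_ratio: float, dynamic_range_bits: int,
                         bytes_per_sample: int) -> float:
    """Rescale a byte-based compression ratio to the samples' dynamic range.

    Assumes compression_ratio = original size / compressed size, where the
    original size counts 8 * bytes_per_sample bits per sample.
    """
    return compression_ratio * dynamic_range_bits / (8 * bytes_per_sample)


# 12-bit data stored in 2-byte samples, compressed 4:1 by file size:
print(compression_ratio_dr(4.0, 12, 2))  # → 3.0
```

For data that uses the full container width (e.g., 16-bit samples in 2 bytes), both ratios coincide.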

set_efficiency(index, row)

class enb.compression.compression.CompressionResults(codec_name=None, codec_param_dict=None, original_path=None, compressed_path=None, compression_time_seconds=None, maximum_memory_kb=None)

Bases: object

Base class that defines the minimal fields that are returned by a call to a coder’s compress() method (or produced by the CompressionExperiment instance).

__init__(codec_name=None, codec_param_dict=None, original_path=None, compressed_path=None, compression_time_seconds=None, maximum_memory_kb=None)

Parameters:

codec_name – codec’s reported_name
codec_param_dict – dictionary of parameters to the codec
original_path – path to the input original file
compressed_path – path to the output compressed file
compression_time_seconds – effective average compression time in seconds
maximum_memory_kb – maximum resident memory in kilobytes
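To illustrate what a compress() implementation may return, the sketch below compresses a file with Python's zlib module and fills the same fields as the CompressionResults constructor. CompressionResultsSketch and zlib_compress are hypothetical, the codec name format is an assumption, the time is measured for a single run rather than averaged, and memory usage is not measured.

```python
import os
import tempfile
import time
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompressionResultsSketch:
    """Hypothetical mirror of the CompressionResults constructor fields."""
    codec_name: Optional[str] = None
    codec_param_dict: Optional[dict] = None
    original_path: Optional[str] = None
    compressed_path: Optional[str] = None
    compression_time_seconds: Optional[float] = None
    maximum_memory_kb: Optional[float] = None


def zlib_compress(original_path, compressed_path, level=6):
    """Compress a file with zlib and report the results (sketch only)."""
    start = time.perf_counter()
    with open(original_path, "rb") as f:
        data = f.read()
    with open(compressed_path, "wb") as f:
        f.write(zlib.compress(data, level))
    elapsed = time.perf_counter() - start  # single run, not an average
    return CompressionResultsSketch(
        codec_name=f"zlib__level_{level}",  # hypothetical name format
        codec_param_dict={"level": level},
        original_path=original_path,
        compressed_path=compressed_path,
        compression_time_seconds=elapsed,
        maximum_memory_kb=None,  # not measured in this sketch
    )


# Usage: compress 10,000 zero bytes and report the size ratio.
with tempfile.TemporaryDirectory() as tmp_dir:
    original = os.path.join(tmp_dir, "original.raw")
    with open(original, "wb") as f:
        f.write(b"\x00" * 10_000)
    results = zlib_compress(original, os.path.join(tmp_dir, "compressed.zlib"))
    ratio = os.path.getsize(results.original_path) / os.path.getsize(results.compressed_path)
    print(f"{results.codec_name}: ratio {ratio:.1f}")
```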

exception enb.compression.compression.DecompressionException(compressed_path=None, reconstructed_path=None, file_info=None, status=None, output=None)

Bases: Exception

Base class for exceptions raised during a decompression instance.

__init__(compressed_path=None, reconstructed_path=None, file_info=None, status=None, output=None)

class enb.compression.compression.DecompressionResults(codec_name=None, codec_param_dict=None, compressed_path=None, reconstructed_path=None, decompression_time_seconds=None, maximum_memory_kb=None)

Bases: object

Base class that defines the minimal fields that are returned by a call to a coder’s decompress() method (or produced by the CompressionExperiment instance).

__init__(codec_name=None, codec_param_dict=None, compressed_path=None, reconstructed_path=None, decompression_time_seconds=None, maximum_memory_kb=None)

Parameters:

codec_name – codec’s reported_name
codec_param_dict – dictionary of parameters to the codec
compressed_path – path to the input compressed file
reconstructed_path – path to the reconstructed file after decompression
decompression_time_seconds – effective decompression time in seconds
maximum_memory_kb – maximum resident memory in kilobytes

class enb.compression.compression.GeneralLosslessExperiment(codecs, dataset_paths=None, csv_experiment_path=None, csv_dataset_path=None, dataset_info_table=None, overwrite_file_properties=False, reconstructed_dir_path=None, compressed_copy_dir_path=None, task_families=None)

Bases: LosslessCompressionExperiment

Lossless compression experiment for general data contents.

codec_results: CompressionDecompressionWrapper | None

column_to_properties: dictionary mapping each column name to its ColumnProperties instance. Unless noted otherwise, every entry has semilog_x=False, semilog_y=False, semilog_x_base=10, semilog_y_base=10, has_dict_values=False, has_iterable_values=False and has_object_values=False. Entries are listed as column name – setter function – label (other properties):

bpppc – CompressionExperiment.set_bpppc – 'Compressed data rate (bpppc)' (plot_min=0)
compressed_file_sha256 – LosslessCompressionExperiment.set_comparison_results – "Compressed file's SHA256"
compressed_size_bytes – CompressionExperiment.set_compressed_data_size – 'Compressed data size (Bytes)' (plot_min=0)
compression_efficiency_1byte_entropy – CompressionExperiment.set_efficiency – 'Compression efficiency 1byte entropy' (plot_min=0, plot_max=1, labytesel='Compression efficiency (1bytes entropy)')
compression_efficiency_2byte_entropy – CompressionExperiment.set_efficiency – 'Compression efficiency 2byte entropy' (plot_min=0, plot_max=2, labytesel='Compression efficiency (2bytes entropy)')
compression_efficiency_4byte_entropy – CompressionExperiment.set_efficiency – 'Compression efficiency 4byte entropy' (plot_min=0, plot_max=4, labytesel='Compression efficiency (4bytes entropy)')
compression_memory_kb – LosslessCompressionExperiment.set_comparison_results – 'Compression memory usage (KB)' (plot_min=0)
compression_ratio – LosslessCompressionExperiment.set_comparison_results – 'Compression ratio' (plot_min=0)
compression_ratio_dr – CompressionExperiment.set_compression_ratio_dr – 'Compression ratio' (plot_min=0)
compression_time_seconds – LosslessCompressionExperiment.set_comparison_results – 'Compression time (s)' (plot_min=0)
decompression_memory_kb – LosslessCompressionExperiment.set_comparison_results – 'Decompression memory usage (KB)' (plot_min=0)
decompression_time_seconds – LosslessCompressionExperiment.set_comparison_results – 'Decompression time (s)' (plot_min=0)
family_label – Experiment.set_family_label – 'Family label'
lossless_reconstruction – LosslessCompressionExperiment.set_comparison_results – 'Lossless?'
param_dict – Experiment.set_param_dict – 'Param dict' (has_dict_values=True)
repetitions – LosslessCompressionExperiment.set_comparison_results – 'Number of compression/decompression repetitions' (plot_min=0)
task_apply_time – Experiment.set_task_apply_time – 'Task apply time'
task_label – Experiment.set_task_label – 'Task label'
task_name – Experiment.set_task_name – 'Task name'

This attribute keeps track of which columns have been defined and of the methods that need to be called to compute them. Its keys can be used to determine the columns defined in a given class or instance. Its values are ColumnProperties instances, which can be set manually after definition and before calling the get_df method of Analyzer subclasses.

dataset_files_extension = '': Default input sample extension. It affects the result of enb.atable.get_all_test_files.

default_file_properties_table_class: alias of GenericFilePropertiesTable

class enb.compression.compression.GenericFilePropertiesTable(csv_support_path=None, base_dir=None)

Bases: ImagePropertiesTable

File properties table that considers the input path as a 1D, u8be array.
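Treating a file as a 1D array of unsigned 8-bit samples (byte order is irrelevant for single-byte samples) makes per-sample statistics straightforward. The sketch below computes values analogous to some of this table's columns directly from raw bytes. The function byte_stats is hypothetical, and using the Shannon entropy of the byte histogram for entropy_1B_bps is an assumption about what that column measures.

```python
import math
from collections import Counter


def byte_stats(data: bytes) -> dict:
    """Treat non-empty data as a 1D array of u8 samples and summarize it.

    The keys echo some GenericFilePropertiesTable columns, but the
    computation is an independent sketch, not enb's implementation.
    """
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    total = len(data)
    # Shannon entropy of the byte histogram, in bits per 1-byte sample.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "samples": total,
        "sample_min": min(data),
        "sample_max": max(data),
        "unique_sample_count": len(counts),
        "entropy_1B_bps": entropy,
    }


print(byte_stats(bytes([0, 0, 255, 255])))
# → {'samples': 4, 'sample_min': 0, 'sample_max': 255, 'unique_sample_count': 2, 'entropy_1B_bps': 1.0}
```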

column_to_properties: This attribute keeps track of which columns have been defined and which methods need to be called to compute them. Its keys can be used to determine the columns defined in a given class or instance. Its values are ColumnProperties instances, which can be set manually after definition and before calling the get_df method of Analyzer subclasses. The columns defined for this class are (name – label; setter method; plot limits):

big_endian – label 'Big endian'; computed by GenericFilePropertiesTable.set_image_geometry
bytes_per_sample – label 'Bytes per sample'; computed by GenericFilePropertiesTable.set_bytes_per_sample; plot_min=0
component_count – label 'Components'; computed by GenericFilePropertiesTable.set_image_geometry; plot_min=1
corpus – label 'Corpus name'; computed by FilePropertiesTable.set_corpus
dtype – label 'Numpy dtype'; computed by ImageGeometryTable.set_column_dtype
dynamic_range_bits – label 'Dynamic range (bits)'; computed by ImagePropertiesTable.set_dynamic_range_bits
entropy_1B_bps – label 'Entropy (bits, 1-byte samples)'; computed by ImagePropertiesTable.set_file_entropy; plot_min=0, plot_max=8
entropy_2B_bps – label 'Entropy (bits, 2-byte samples)'; computed by ImagePropertiesTable.set_file_entropy; plot_min=0, plot_max=16
entropy_4B_bps – label 'Entropy (bits, 4-byte samples)'; computed by ImagePropertiesTable.set_file_entropy; plot_min=0, plot_max=32
float – label 'Float'; computed by GenericFilePropertiesTable.set_image_geometry
height – label 'Height'; computed by GenericFilePropertiesTable.set_image_geometry; plot_min=1
sample_max – label 'Max sample value (byte samples)'; computed by GenericFilePropertiesTable.set_sample_stats
sample_min – label 'Min sample value (byte samples)'; computed by GenericFilePropertiesTable.set_sample_stats
samples – label 'Sample count'; computed by ImageGeometryTable.set_samples; plot_min=0
sha256 – label 'sha256 hex digest'; computed by FilePropertiesTable.set_hash_digest
signed – label 'Signed samples'; computed by ImageGeometryTable.set_signed
size_bytes – label 'File size (bytes)'; computed by FilePropertiesTable.set_file_size
type_name – label 'Type name usable in file names'; computed by ImageGeometryTable.set_type_name
unique_sample_count – label 'Number of different sample values'; computed by GenericFilePropertiesTable.set_sample_stats
width – label 'Width'; computed by GenericFilePropertiesTable.set_image_geometry; plot_min=1

Unless noted above, all columns use semilog_x=False, semilog_y=False, semilog_x_base=10, semilog_y_base=10, has_dict_values=False, has_iterable_values=False and has_object_values=False.
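
The bookkeeping described above can be illustrated with a minimal, self-contained sketch. ColumnSpec, column and compute_row are hypothetical names used only for this illustration: they mimic the idea of mapping each column name to the method that computes it, and are not enb's actual ColumnProperties or table implementation.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ColumnSpec:
    """Hypothetical, reduced stand-in for a ColumnProperties instance."""
    name: str
    fun: Callable[[str, dict], None]
    label: str
    plot_min: Optional[float] = None
    plot_max: Optional[float] = None


# Maps each column name to the spec (and thus the method) that computes it.
column_to_properties: Dict[str, ColumnSpec] = {}


def column(name: str, label: str, plot_min=None, plot_max=None):
    """Decorator registering the decorated function as the setter for `name`."""
    def decorator(fun):
        column_to_properties[name] = ColumnSpec(name, fun, label, plot_min, plot_max)
        return fun
    return decorator


@column("size_bytes", "File size (bytes)", plot_min=0)
def set_file_size(file_path: str, row: dict) -> None:
    row["size_bytes"] = os.path.getsize(file_path)


def compute_row(file_path: str) -> dict:
    """Fill one row by calling the setter of every registered column."""
    row: dict = {}
    for spec in column_to_properties.values():
        # Skip columns already filled by a setter that computes several columns.
        if spec.name not in row:
            spec.fun(file_path, row)
    return row


# Properties can still be adjusted after definition, e.g. before plotting.
column_to_properties["size_bytes"].plot_max = 1e6
```

The keys of the registry are the column names, which is what makes it possible to list the columns of a class without computing anything.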

set_bytes_per_sample(file_path, row): Infer the number of bytes per sample from the file path.

set_image_geometry(file_path, row): Obtain the image’s geometry (width, height and number of components) from the tags in the file name (and possibly from the file size).
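
Both setters rely on tags embedded in the file name. The sketch below assumes a naming scheme such as scene-u16be-3x512x640.raw, that is, a sample type tag (u, s or f followed by the bit depth and an optional be/le endianness suffix) and a components x height x width geometry tag. The exact tag grammar and the helper names are assumptions made for illustration only.

```python
import os
import re

# Assumed (hypothetical) type tag grammar: "u16be", "s8", "f32le", ...
TYPE_TAG = re.compile(r"(?<![a-z])([usf])(8|16|32|64)(be|le)?(?![a-z0-9])")
# Assumed geometry tag: components x height x width, e.g. "3x512x640".
GEOMETRY_TAG = re.compile(r"(\d+)x(\d+)x(\d+)")


def bytes_per_sample_from_path(file_path: str) -> int:
    """Infer the number of bytes per sample from the type tag in the file name."""
    match = TYPE_TAG.search(os.path.basename(file_path))
    if match is None:
        raise ValueError(f"No sample type tag found in {file_path!r}")
    return int(match.group(2)) // 8


def geometry_from_path(file_path: str) -> dict:
    """Infer width, height and component count from the geometry tag."""
    match = GEOMETRY_TAG.search(os.path.basename(file_path))
    if match is None:
        raise ValueError(f"No geometry tag found in {file_path!r}")
    component_count, height, width = (int(g) for g in match.groups())
    return {"component_count": component_count, "height": height, "width": width}
```

For scene-u16be-3x512x640.raw this yields 2 bytes per sample and a 3 x 512 x 640 geometry, so a consistent file would be 3 * 512 * 640 * 2 = 1966080 bytes long; comparing against the actual file size is one way to use "possibly its size".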

set_sample_stats(file_path, row): Set basic file statistics (unique sample count, minimum and maximum sample values).
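
Since the corresponding column labels refer to byte samples, a minimal sketch of these statistics can treat every byte of the file as one sample. The function name is hypothetical, and an actual implementation might instead interpret the samples using the file's data type.

```python
def byte_sample_stats(file_path: str) -> dict:
    """Unique count, minimum and maximum of the file's bytes, each read as a sample."""
    with open(file_path, "rb") as f:
        data = f.read()
    if not data:
        raise ValueError(f"{file_path!r} is empty")
    # Iterating over a bytes object yields ints in the range 0..255.
    return {
        "unique_sample_count": len(set(data)),
        "sample_min": min(data),
        "sample_max": max(data),
    }
```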

verify_file_size = False

class enb.compression.compression.LosslessCompressionExperiment(codecs, dataset_paths=None, csv_experiment_path=None, csv_dataset_path=None, dataset_info_table=None, overwrite_file_properties=False, reconstructed_dir_path=None, compressed_copy_dir_path=None, task_families=None)

Bases: CompressionExperiment

Lossless data compression experiment. It fails if lossless reconstruction is not achieved.

codec_results: CompressionDecompressionWrapper | None

column_to_properties: This attribute keeps track of which columns have been defined and which methods need to be called to compute them. Its keys can be used to determine the columns defined in a given class or instance. Its values are ColumnProperties instances, which can be set manually after definition and before calling the get_df method of Analyzer subclasses. The columns defined for this class are (name – label; setter method; plot limits):

bpppc – label 'Compressed data rate (bpppc)'; computed by CompressionExperiment.set_bpppc; plot_min=0
compressed_file_sha256 – label "Compressed file's SHA256"; computed by LosslessCompressionExperiment.set_comparison_results
compressed_size_bytes – label 'Compressed data size (Bytes)'; computed by CompressionExperiment.set_compressed_data_size; plot_min=0
compression_efficiency_1byte_entropy – label 'Compression efficiency 1byte entropy'; computed by CompressionExperiment.set_efficiency; plot_min=0, plot_max=1; labytesel='Compression efficiency (1bytes entropy)'
compression_efficiency_2byte_entropy – label 'Compression efficiency 2byte entropy'; computed by CompressionExperiment.set_efficiency; plot_min=0, plot_max=2; labytesel='Compression efficiency (2bytes entropy)'
compression_efficiency_4byte_entropy – label 'Compression efficiency 4byte entropy'; computed by CompressionExperiment.set_efficiency; plot_min=0, plot_max=4; labytesel='Compression efficiency (4bytes entropy)'
compression_memory_kb – label 'Compression memory usage (KB)'; computed by LosslessCompressionExperiment.set_comparison_results; plot_min=0
compression_ratio – label 'Compression ratio'; computed by LosslessCompressionExperiment.set_comparison_results; plot_min=0
compression_ratio_dr – label 'Compression ratio'; computed by CompressionExperiment.set_compression_ratio_dr; plot_min=0
compression_time_seconds – label 'Compression time (s)'; computed by LosslessCompressionExperiment.set_comparison_results; plot_min=0
decompression_memory_kb – label 'Decompression memory usage (KB)'; computed by LosslessCompressionExperiment.set_comparison_results; plot_min=0
decompression_time_seconds – label 'Decompression time (s)'; computed by LosslessCompressionExperiment.set_comparison_results; plot_min=0
family_label – label 'Family label'; computed by Experiment.set_family_label
lossless_reconstruction – label 'Lossless?'; computed by LosslessCompressionExperiment.set_comparison_results
param_dict – label 'Param dict'; computed by Experiment.set_param_dict; has_dict_values=True
repetitions – label 'Number of compression/decompression repetitions'; computed by LosslessCompressionExperiment.set_comparison_results; plot_min=0
task_apply_time – label 'Task apply time'; computed by Experiment.set_task_apply_time
task_label – label 'Task label'; computed by Experiment.set_task_label
task_name – label 'Task name'; computed by Experiment.set_task_name

Unless noted above, all columns use semilog_x=False, semilog_y=False, semilog_x_base=10, semilog_y_base=10, has_dict_values=False, has_iterable_values=False and has_object_values=False.

set_comparison_results(index, row): Perform a compression-decompression cycle and store the comparison results.
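
A compression-decompression cycle of this kind can be sketched with Python's zlib module standing in for a codec. The helper name, the use of zlib and the wall-clock timing are assumptions for illustration. bpppc is computed here as compressed bits divided by the number of samples (pixels times components), and the function raises an error when reconstruction is not lossless, mirroring the behavior described for LosslessCompressionExperiment.

```python
import hashlib
import time
import zlib


def comparison_results(original: bytes, sample_count: int, level: int = 9) -> dict:
    """Compress and decompress `original` with zlib and collect comparison columns."""
    start = time.perf_counter()
    compressed = zlib.compress(original, level)
    compression_time = time.perf_counter() - start

    start = time.perf_counter()
    reconstructed = zlib.decompress(compressed)
    decompression_time = time.perf_counter() - start

    lossless = reconstructed == original
    if not lossless:
        raise ValueError("Lossless reconstruction not achieved")

    return {
        "compressed_size_bytes": len(compressed),
        "compression_ratio": len(original) / len(compressed),
        "bpppc": 8 * len(compressed) / sample_count,
        "lossless_reconstruction": lossless,
        "compressed_file_sha256": hashlib.sha256(compressed).hexdigest(),
        "compression_time_seconds": compression_time,
        "decompression_time_seconds": decompression_time,
    }
```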

enb.compression.fits module

FITS format manipulation tools. See https://fits.gsfc.nasa.gov/fits_documentation.html.

class enb.compression.fits.FITSVersionTable(original_base_dir, version_base_dir)

Bases: FileVersionTable, FilePropertiesTable

Read FITS files and convert them to raw files, sorting them by sample type (integer or float) and by bits per pixel.
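
The sorting criterion comes from the FITS BITPIX keyword: per the FITS standard, 8 denotes unsigned bytes, 16, 32 and 64 denote signed integers, and -32 and -64 denote IEEE floating-point samples, all stored big-endian. The following stdlib-only sketch reads the primary header (80-character cards in 2880-byte blocks, terminated by an END card) and classifies the samples. The function names are hypothetical and the parser only handles simple numeric or logical values; a real converter would normally rely on a dedicated FITS library.

```python
from typing import Dict, Tuple

CARD_SIZE = 80
BLOCK_SIZE = 2880


def read_primary_header(path: str) -> Dict[str, str]:
    """Return keyword -> raw value string for the primary FITS header."""
    cards: Dict[str, str] = {}
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if len(block) < BLOCK_SIZE:
                raise ValueError(f"Truncated FITS header in {path!r}")
            for offset in range(0, BLOCK_SIZE, CARD_SIZE):
                card = block[offset:offset + CARD_SIZE].decode("ascii")
                keyword = card[:8].strip()
                if keyword == "END":
                    return cards
                # The value indicator "= " occupies columns 9-10; drop any "/ comment".
                if card[8:10] == "= ":
                    cards[keyword] = card[10:].split("/")[0].strip()


def sample_type(bitpix: int) -> Tuple[str, int]:
    """Map BITPIX to ("integer" or "float", bits per sample)."""
    if bitpix in (8, 16, 32, 64):
        return ("integer", bitpix)
    if bitpix in (-32, -64):
        return ("float", -bitpix)
    raise ValueError(f"Invalid BITPIX value: {bitpix}")
```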

__init__(original_base_dir, version_base_dir)

Parameters:

original_base_dir – path to the original directory (it must contain all indices requested later with self.get_df()). If None, options.base_dataset_dir is used.
version_base_dir – path to the versioned base directory (versioned directories preserve names and structure within the base dir).

allowed_extensions = ['fit', 'fits']

column_to_properties: This attribute keeps track of which columns have been defined and which methods need to be called to compute them. Its keys can be used to determine the columns defined in a given class or instance. Its values are ColumnProperties instances, which can be set manually after definition and before calling the get_df method of Analyzer subclasses. The columns defined for this class are (name – label; setter method):

corpus – label 'Corpus name'; computed by FileVersionTable.set_corpus
original_file_path – label 'Original file path'; computed by FileVersionTable.set_original_file_path
sha256 – label 'sha256 hex digest'; computed by FilePropertiesTable.set_hash_digest
size_bytes – label 'File size (bytes)'; computed by FilePropertiesTable.set_file_size
version_name – label 'Version name'; computed by FileVersionTable.column_version_name
version_time – label 'Versioning time (s)'; computed by FileVersionTable.set_version_time

All columns use semilog_x=False, semilog_y=False, semilog_x_base=10, semilog_y_base=10, has_dict_values=False, has_iterable_values=False and has_object_values=False.

fits_extension = 'fit'

get_default_target_indices(): Get the list of samples in self.original_base_dir and its subdirs that have extension self.dataset_files_extension.

original_to_versioned_path(original_path): Get the path of the versioned file corresponding to original_path. This function will replicate the folder structure within self.original_base_dir.
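
The two path helpers above can be sketched with os.walk and os.path.relpath. The function names and the explicit extension list are assumptions for illustration (this class declares allowed_extensions = ['fit', 'fits']), and the actual table may additionally change the name or extension of the versioned copy.

```python
import os
from typing import Iterable, List


def default_target_indices(original_base_dir: str, extensions: Iterable[str]) -> List[str]:
    """List files below original_base_dir (recursively) whose extension is allowed."""
    allowed = {ext.lower().lstrip(".") for ext in extensions}
    paths = []
    for dirpath, _dirnames, filenames in os.walk(original_base_dir):
        for filename in filenames:
            extension = os.path.splitext(filename)[1][1:].lower()
            if extension in allowed:
                paths.append(os.path.join(dirpath, filename))
    return sorted(paths)


def original_to_versioned_path(original_path: str, original_base_dir: str,
                               version_base_dir: str) -> str:
    """Mirror original_path's location below original_base_dir into version_base_dir."""
    relative_path = os.path.relpath(original_path, original_base_dir)
    return os.path.join(version_base_dir, relative_path)
```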

set_version_repetitions(file_path, row): Set the number of times the versioning process is performed.

version(input_path, output_path, row)

Create a version of input_path and write it into output_path.

Parameters:

input_path – path to the file to be versioned
output_path – path where the version should be saved
row – metainformation available using super().get_df for input_path

Returns:

If not None, the time in seconds it took to perform the (forward) versioning.

version_name = 'FitsToRaw'

class enb.compression.fits.FITSWrapperCodec(compressor_path, decompressor_path, param_dict=None, output_invocation_dir=None, signature_in_name=False)

Bases: WrapperCodec

Raw images are coded into FITS before compression with the wrapper, and FITS is decoded to raw after decompression.

compress(original_path: str, compressed_path: str, original_file_info=None)

Compress original_path into compressed_path using param_dict as params.

Parameters:

original_path – path to the original file to be compressed
compressed_path – path to the compressed file to be created
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None.

Returns:

(optional) a CompressionResults instance, or None (see self.compression_results_from_paths)

decompress(compressed_path, reconstructed_path, original_file_info=None)

Decompress compressed_path into reconstructed_path using param_dict as params (if needed).

Parameters:

compressed_path – path to the input compressed file
reconstructed_path – path to the output reconstructed file
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None. It should only be used in special cases, since codecs are expected to store all needed metainformation in the compressed file.

Returns:

(optional) a DecompressionResults instance, or None (see self.decompression_results_from_paths)

enb.compression.icompression module

Image compression experiment module.

class enb.compression.icompression.GiciLibHelper

Bases: object

Definition of helper methods that can be used with software based on the GiciLibs (see gici.uab.cat/GiciWebPage/downloads.php).

file_info_to_data_str(original_file_info)

file_info_to_endianness_str(original_file_info)

get_gici_geometry_str(original_file_info): Get a string to be passed to the -ig or -og parameters. The ‘-ig’ or ‘-og’ part is not included in the returned string.

class enb.compression.icompression.LossyCompressionExperiment(codecs, dataset_paths=None, csv_experiment_path=None, csv_dataset_path=None, dataset_info_table=None, overwrite_file_properties=False, reconstructed_dir_path=None, compressed_copy_dir_path=None, task_families=None)

Bases: CompressionExperiment

Lossy compression of raw image files.

codec_results: CompressionDecompressionWrapper | None

column_to_properties: This attribute keeps track of which columns have been defined and which methods need to be called to compute them. Its keys can be used to determine the columns defined in a given class or instance. Its values are ColumnProperties instances, which can be set manually after definition and before calling the get_df method of Analyzer subclasses. The columns defined for this class are (name – label; setter method; plot limits):

bpppc – label 'Compressed data rate (bpppc)'; computed by CompressionExperiment.set_bpppc; plot_min=0
compressed_file_sha256 – label "Compressed file's SHA256"; computed by CompressionExperiment.set_comparison_results
compressed_size_bytes – label 'Compressed data size (Bytes)'; computed by CompressionExperiment.set_compressed_data_size; plot_min=0
compression_efficiency_1byte_entropy – label 'Compression efficiency 1byte entropy'; computed by CompressionExperiment.set_efficiency; plot_min=0, plot_max=1; labytesel='Compression efficiency (1bytes entropy)'
compression_efficiency_2byte_entropy – label 'Compression efficiency 2byte entropy'; computed by CompressionExperiment.set_efficiency; plot_min=0, plot_max=2; labytesel='Compression efficiency (2bytes entropy)'
compression_efficiency_4byte_entropy – label 'Compression efficiency 4byte entropy'; computed by CompressionExperiment.set_efficiency; plot_min=0, plot_max=4; labytesel='Compression efficiency (4bytes entropy)'
compression_memory_kb – label 'Compression memory usage (KB)'; computed by CompressionExperiment.set_comparison_results; plot_min=0
compression_ratio – label 'Compression ratio'; computed by CompressionExperiment.set_comparison_results; plot_min=0
compression_ratio_dr – label 'Compression ratio'; computed by CompressionExperiment.set_compression_ratio_dr; plot_min=0
compression_time_seconds – label 'Compression time (s)'; computed by CompressionExperiment.set_comparison_results; plot_min=0
decompression_memory_kb – label 'Decompression memory usage (KB)'; computed by CompressionExperiment.set_comparison_results; plot_min=0
decompression_time_seconds – label 'Decompression time (s)'; computed by CompressionExperiment.set_comparison_results; plot_min=0
family_label – label 'Family label'; computed by Experiment.set_family_label
lossless_reconstruction – label 'Lossless?'; computed by CompressionExperiment.set_comparison_results
mse – label 'MSE'; computed by LossyCompressionExperiment.set_MSE; plot_min=0
pae – label 'PAE'; computed by LossyCompressionExperiment.set_PAE; plot_min=0
param_dict – label 'Param dict'; computed by Experiment.set_param_dict; has_dict_values=True
psnr_bps – label 'PSNR (dB)'; computed by LossyCompressionExperiment.set_PSNR_nominal; plot_min=0
psnr_dr – label 'PSNR (dB)'; computed by LossyCompressionExperiment.set_PSNR_dynamic_range; plot_min=0
repetitions – label 'Number of compression/decompression repetitions'; computed by CompressionExperiment.set_comparison_results; plot_min=0
task_apply_time – label 'Task apply time'; computed by Experiment.set_task_apply_time
task_label – label 'Task label'; computed by Experiment.set_task_label
task_name – label 'Task name'; computed by Experiment.set_task_name

Unless noted above, all columns use semilog_x=False, semilog_y=False, semilog_x_base=10, semilog_y_base=10, has_dict_values=False, has_iterable_values=False and has_object_values=False.

set_MSE(index, row): Set the mean squared error of the reconstructed image.

set_PAE(index, row): Set the peak absolute error (maximum absolute pixelwise difference) of the reconstructed image.

set_PSNR_dynamic_range(index, row): Set the PSNR assuming the dynamic range given by dynamic_range_bits.

set_PSNR_nominal(index, row): Set the PSNR assuming the nominal dynamic range given by bytes_per_sample.
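
These four metrics follow standard definitions, sketched below on flat sequences of samples. The helper names are hypothetical, and the peak values assume unsigned samples: 2^(8 * bytes_per_sample) - 1 for the nominal PSNR and 2^dynamic_range_bits - 1 for the dynamic range PSNR. The library may handle signed or floating-point data differently.

```python
import math
from typing import Sequence


def mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between two equally long sample sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("Inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def pae(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Peak absolute error: maximum absolute sample-wise difference."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("Inputs must be non-empty and of equal length")
    return max(abs(a - b) for a, b in zip(original, reconstructed))


def psnr(mse_value: float, peak: float) -> float:
    """PSNR in dB; infinite for a perfect (lossless) reconstruction."""
    if mse_value == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse_value)


def psnr_nominal(mse_value: float, bytes_per_sample: int) -> float:
    # Peak assumed to be the largest unsigned value representable in the sample size.
    return psnr(mse_value, 2 ** (8 * bytes_per_sample) - 1)


def psnr_dynamic_range(mse_value: float, dynamic_range_bits: int) -> float:
    # Peak assumed to be the largest unsigned value within the dynamic range.
    return psnr(mse_value, 2 ** dynamic_range_bits - 1)
```

For example, original samples [0, 10, 20, 30] reconstructed as [1, 10, 18, 30] give MSE = (1 + 0 + 4 + 0) / 4 = 1.25 and PAE = 2.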

class enb.compression.icompression.SpectralAngleTable(codecs, dataset_paths=None, csv_experiment_path=None, csv_dataset_path=None, dataset_info_table=None, overwrite_file_properties=False, reconstructed_dir_path=None, compressed_copy_dir_path=None, task_families=None)

Bases: LossyCompressionExperiment

Lossy compression experiment that computes spectral angle “distance” measures between the original and the reconstructed images.

Subclasses of LossyCompressionExperiment may inherit from this one to automatically add the data columns defined here.

codec_results: CompressionDecompressionWrapper | None

column_to_properties = {'bpppc': ColumnProperties('name'='bpppc', 'fun'=<function CompressionExperiment.set_bpppc>, 'label'='Compressed data rate (bpppc)', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'compressed_file_sha256': ColumnProperties('name'='compressed_file_sha256', 'fun'=<function CompressionExperiment.set_comparison_results>, 'label'="Compressed file's SHA256", 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'compressed_size_bytes': ColumnProperties('name'='compressed_size_bytes', 'fun'=<function CompressionExperiment.set_compressed_data_size>, 'label'='Compressed data size (Bytes)', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'compression_efficiency_1byte_entropy': ColumnProperties('name'='compression_efficiency_1byte_entropy', 'fun'=<function CompressionExperiment.set_efficiency>, 'label'='Compression efficiency 1byte entropy', 'plot_min'=0, 'plot_max'=1, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False, 'labytesel'='Compression efficiency (1bytes entropy)'), 'compression_efficiency_2byte_entropy': ColumnProperties('name'='compression_efficiency_2byte_entropy', 'fun'=<function CompressionExperiment.set_efficiency>, 'label'='Compression efficiency 2byte entropy', 'plot_min'=0, 'plot_max'=2, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False, 'labytesel'='Compression efficiency (2bytes entropy)'), 'compression_efficiency_4byte_entropy': 
ColumnProperties('name'='compression_efficiency_4byte_entropy', 'fun'=<function CompressionExperiment.set_efficiency>, 'label'='Compression efficiency 4byte entropy', 'plot_min'=0, 'plot_max'=4, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False, 'labytesel'='Compression efficiency (4bytes entropy)'), 'compression_memory_kb': ColumnProperties('name'='compression_memory_kb', 'fun'=<function CompressionExperiment.set_comparison_results>, 'label'='Compression memory usage (KB)', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'compression_ratio': ColumnProperties('name'='compression_ratio', 'fun'=<function CompressionExperiment.set_comparison_results>, 'label'='Compression ratio', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'compression_ratio_dr': ColumnProperties('name'='compression_ratio_dr', 'fun'=<function CompressionExperiment.set_compression_ratio_dr>, 'label'='Compression ratio', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'compression_time_seconds': ColumnProperties('name'='compression_time_seconds', 'fun'=<function CompressionExperiment.set_comparison_results>, 'label'='Compression time (s)', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'decompression_memory_kb': ColumnProperties('name'='decompression_memory_kb', 'fun'=<function CompressionExperiment.set_comparison_results>, 'label'='Decompression memory usage (KB)', 'plot_min'=0, 
'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'decompression_time_seconds': ColumnProperties('name'='decompression_time_seconds', 'fun'=<function CompressionExperiment.set_comparison_results>, 'label'='Decompression time (s)', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'family_label': ColumnProperties('name'='family_label', 'fun'=<function Experiment.set_family_label>, 'label'='Family label', 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'lossless_reconstruction': ColumnProperties('name'='lossless_reconstruction', 'fun'=<function CompressionExperiment.set_comparison_results>, 'label'='Lossless?', 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'max_spectral_angle_deg': ColumnProperties('name'='max_spectral_angle_deg', 'fun'=<function SpectralAngleTable.set_spectral_distances>, 'label'='Max spectral angle (deg)', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'mean_spectral_angle_deg': ColumnProperties('name'='mean_spectral_angle_deg', 'fun'=<function SpectralAngleTable.set_spectral_distances>, 'label'='Mean spectral angle (deg)', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'mse': ColumnProperties('name'='mse', 'fun'=<function LossyCompressionExperiment.set_MSE>, 'label'='MSE', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 
'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'pae': ColumnProperties('name'='pae', 'fun'=<function LossyCompressionExperiment.set_PAE>, 'label'='PAE', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'param_dict': ColumnProperties('name'='param_dict', 'fun'=<function Experiment.set_param_dict>, 'label'='Param dict', 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=True, 'has_iterable_values'=False, 'has_object_values'=False), 'psnr_bps': ColumnProperties('name'='psnr_bps', 'fun'=<function LossyCompressionExperiment.set_PSNR_nominal>, 'label'='PSNR (dB)', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'psnr_dr': ColumnProperties('name'='psnr_dr', 'fun'=<function LossyCompressionExperiment.set_PSNR_dynamic_range>, 'label'='PSNR (dB)', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'repetitions': ColumnProperties('name'='repetitions', 'fun'=<function CompressionExperiment.set_comparison_results>, 'label'='Number of compression/decompression repetitions', 'plot_min'=0, 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'task_apply_time': ColumnProperties('name'='task_apply_time', 'fun'=<function Experiment.set_task_apply_time>, 'label'='Task apply time', 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'task_label': 
ColumnProperties('name'='task_label', 'fun'=<function Experiment.set_task_label>, 'label'='Task label', 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False), 'task_name': ColumnProperties('name'='task_name', 'fun'=<function Experiment.set_task_name>, 'label'='Task name', 'semilog_x'=False, 'semilog_y'=False, 'semilog_x_base'=10, 'semilog_y_base'=10, 'has_dict_values'=False, 'has_iterable_values'=False, 'has_object_values'=False)}: The column_to_properties attribute keeps track of which columns have been defined and of the methods that need to be called to compute them. The keys of this attribute can be used to determine the columns defined in a given class or instance. The values are |ColumnProperties| instances, which can be set manually after definition and before calling |Analyzer| subclasses’ get_df.

get_spectral_angles_deg(index, row): Return a sequence of spectral angles (in degrees), one per (x,y) position in the image, flattened in raster order.

set_spectral_distances(index, row)

class enb.compression.icompression.StructuralSimilarity(codecs, dataset_paths=None, csv_experiment_path=None, csv_dataset_path=None, dataset_info_table=None, overwrite_file_properties=False, reconstructed_dir_path=None, compressed_copy_dir_path=None, task_families=None)

Bases: CompressionExperiment

Compute the Structural Similarity (SSIM) and Multi-Scale Structural Similarity (MS-SSIM) metrics, which measure the similarity between two images.

Authors:

codec_results: CompressionDecompressionWrapper | None

column_to_properties: maps each column name to the |ColumnProperties| instance that defines it. Unless noted otherwise, each column uses semilog_x=False, semilog_y=False, semilog_x_base=10, semilog_y_base=10, has_dict_values=False, has_iterable_values=False and has_object_values=False.

bpppc – computed by CompressionExperiment.set_bpppc; label 'Compressed data rate (bpppc)'; plot_min=0
compressed_file_sha256 – computed by CompressionExperiment.set_comparison_results; label "Compressed file's SHA256"
compressed_size_bytes – computed by CompressionExperiment.set_compressed_data_size; label 'Compressed data size (Bytes)'; plot_min=0
compression_efficiency_1byte_entropy – computed by CompressionExperiment.set_efficiency; label 'Compression efficiency 1byte entropy'; plot_min=0, plot_max=1; 'labytesel'='Compression efficiency (1bytes entropy)'
compression_efficiency_2byte_entropy – computed by CompressionExperiment.set_efficiency; label 'Compression efficiency 2byte entropy'; plot_min=0, plot_max=2; 'labytesel'='Compression efficiency (2bytes entropy)'
compression_efficiency_4byte_entropy – computed by CompressionExperiment.set_efficiency; label 'Compression efficiency 4byte entropy'; plot_min=0, plot_max=4; 'labytesel'='Compression efficiency (4bytes entropy)'
compression_memory_kb – computed by CompressionExperiment.set_comparison_results; label 'Compression memory usage (KB)'; plot_min=0
compression_ratio – computed by CompressionExperiment.set_comparison_results; label 'Compression ratio'; plot_min=0
compression_ratio_dr – computed by CompressionExperiment.set_compression_ratio_dr; label 'Compression ratio'; plot_min=0
compression_time_seconds – computed by CompressionExperiment.set_comparison_results; label 'Compression time (s)'; plot_min=0
decompression_memory_kb – computed by CompressionExperiment.set_comparison_results; label 'Decompression memory usage (KB)'; plot_min=0
decompression_time_seconds – computed by CompressionExperiment.set_comparison_results; label 'Decompression time (s)'; plot_min=0
family_label – computed by Experiment.set_family_label; label 'Family label'
lossless_reconstruction – computed by CompressionExperiment.set_comparison_results; label 'Lossless?'
ms_ssim – computed by StructuralSimilarity.set_StructuralSimilarity; label 'MS-SSIM'; plot_max=1
param_dict – computed by Experiment.set_param_dict; label 'Param dict'; has_dict_values=True
repetitions – computed by CompressionExperiment.set_comparison_results; label 'Number of compression/decompression repetitions'; plot_min=0
ssim – computed by StructuralSimilarity.set_StructuralSimilarity; label 'SSIM'; plot_max=1
task_apply_time – computed by Experiment.set_task_apply_time; label 'Task apply time'
task_label – computed by Experiment.set_task_label; label 'Task label'
task_name – computed by Experiment.set_task_name; label 'Task name'

The column_to_properties attribute keeps track of which columns have been defined and of the methods that need to be called to compute them. The keys of this attribute can be used to determine the columns defined in a given class or instance. The values are |ColumnProperties| instances, which can be set manually after definition and before calling |Analyzer| subclasses’ get_df.

compute_SSIM(img1, img2, max_val=255, filter_size=11, filter_sigma=1.5, k1=0.01, k2=0.03, full=False)

Return the Structural Similarity Map between img1 and img2.

This function attempts to match the functionality of ssim_index_new.m by Zhou Wang: http://www.cns.nyu.edu/~lcv/ssim/msssim.zip

Author’s Python implementation: https://github.com/dashayushman/TAC-GAN/blob/master/msssim.py

Parameters:

img1 – Numpy array holding the first RGB image batch.
img2 – Numpy array holding the second RGB image batch.
max_val – the dynamic range of the images (i.e., the difference between the maximum and the minimum allowed values).
filter_size – Size of blur kernel to use (will be reduced for small images).
filter_sigma – Standard deviation for Gaussian blur kernel (will be reduced for small images).
k1 – Constant used to maintain stability in the SSIM calculation (0.01 in the original paper).
k2 – Constant used to maintain stability in the SSIM calculation (0.03 in the original paper).
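SSIM combines luminance, contrast and structure comparisons, stabilized by the constants C1 = (k1·max_val)² and C2 = (k2·max_val)². As a minimal sketch of that formula, the function below evaluates it once over the whole image, as a single global window. compute_SSIM instead applies it over a Gaussian-weighted window of size filter_size and standard deviation filter_sigma. For brevity, the images here are flat lists of pixel values (an assumption; compute_SSIM takes NumPy image batches), and the function name is hypothetical.

```python
def global_ssim(img1, img2, max_val=255, k1=0.01, k2=0.03):
    """SSIM evaluated over a single window covering two flat lists of pixel values."""
    if len(img1) != len(img2) or not img1:
        raise ValueError("images must be non-empty and have the same number of pixels")
    n = len(img1)
    c1 = (k1 * max_val) ** 2   # stabilizes the luminance term
    c2 = (k2 * max_val) ** 2   # stabilizes the contrast/structure term
    mu1, mu2 = sum(img1) / n, sum(img2) / n
    # Population variances and covariance around the means
    var1 = sum((a - mu1) * (a - mu1) for a in img1) / n
    var2 = sum((b - mu2) * (b - mu2) for b in img2) / n
    cov = sum((a - mu1) * (b - mu2) for a, b in zip(img1, img2)) / n
    numerator = (2 * mu1 * mu2 + c1) * (2 * cov + c2)
    denominator = (mu1 * mu1 + mu2 * mu2 + c1) * (var1 + var2 + c2)
    return numerator / denominator
```

Identical inputs give 1 and anti-correlated inputs give negative values. For 16-bit data, pass max_val=65535 so that C1 and C2 scale with the dynamic range, as the max_val description above implies.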

cumpute_MSSIM(img1, img2, max_val=255, filter_size=11, filter_sigma=1.5, k1=0.01, k2=0.03, weights=None)

Return the MS-SSIM score between img1 and img2.

This function implements Multi-Scale Structural Similarity (MS-SSIM) Image Quality Assessment according to Zhou Wang’s paper, “Multi-scale structural similarity for image quality assessment” (2003). Link: https://ece.uwaterloo.ca/~z70wang/publications/msssim.pdf

Author’s MATLAB implementation: http://www.cns.nyu.edu/~lcv/ssim/msssim.zip

Author’s Python implementation: https://github.com/dashayushman/TAC-GAN/blob/master/msssim.py

Author’s documentation:

Parameters:

img1 – Numpy array holding the first RGB image batch.
img2 – Numpy array holding the second RGB image batch.
max_val – the dynamic range of the images (i.e., the difference between the maximum and the minimum allowed values).
filter_size – Size of blur kernel to use (will be reduced for small images).
filter_sigma – Standard deviation for Gaussian blur kernel (will be reduced for small images).
k1 – Constant used to maintain stability in the SSIM calculation (0.01 in the original paper).
k2 – Constant used to maintain stability in the SSIM calculation (0.03 in the original paper).
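For reference, the score defined in Wang et al. (2003) combines the contrast comparison c_j and the structure comparison s_j at each of M scales, where each scale is obtained by low-pass filtering and downsampling by 2. It also includes the luminance comparison l_M at the coarsest scale. The weights below are the values reported in the paper for M = 5. That weights=None falls back to these values is an assumption based on the referenced TAC-GAN implementation; it is not stated here.

```latex
\mathrm{MS\text{-}SSIM}(x, y) = \left[l_M(x, y)\right]^{\alpha_M} \prod_{j=1}^{M} \left[c_j(x, y)\right]^{\beta_j} \left[s_j(x, y)\right]^{\gamma_j}

M = 5, \qquad \beta_j = \gamma_j, \qquad (\beta_1, \ldots, \beta_5) = (0.0448,\ 0.2856,\ 0.3001,\ 0.2363,\ 0.1333), \qquad \alpha_5 = \beta_5
```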

set_StructuralSimilarity(index, row)

enb.compression.jpg module

JPEG manipulation (e.g., curation) tools.

class enb.compression.jpg.JPEGCurationTable(original_base_dir, version_base_dir, csv_support_path=None)

Bases: PNGCurationTable

Given a directory tree containing JPEG images, copy those images into a new directory tree in raw BSQ format, adding to the output file names the geometry information tags recognized by enb.isets.load_array_bsq.

column_to_properties: maps each column name to the |ColumnProperties| instance that defines it. All columns use semilog_x=False, semilog_y=False, semilog_x_base=10, semilog_y_base=10, has_dict_values=False, has_iterable_values=False and has_object_values=False.

corpus – computed by FileVersionTable.set_corpus; label 'Corpus name'
original_file_path – computed by FileVersionTable.set_original_file_path; label 'Original file path'
sha256 – computed by FilePropertiesTable.set_hash_digest; label 'sha256 hex digest'
size_bytes – computed by FilePropertiesTable.set_file_size; label 'File size (bytes)'
version_name – computed by FileVersionTable.column_version_name; label 'Version name'
version_time – computed by FileVersionTable.set_version_time; label 'Versioning time (s)'

The column_to_properties attribute keeps track of which columns have been defined and of the methods that need to be called to compute them. The keys of this attribute can be used to determine the columns defined in a given class or instance. The values are |ColumnProperties| instances, which can be set manually after definition and before calling |Analyzer| subclasses’ get_df.

dataset_files_extension = 'jpg': Default input sample extension. It affects the result of enb.atable.get_all_test_files.

enb.compression.pgm module

Module to handle PGM (P5) and PPM (P6) images

class enb.compression.pgm.PGMCurationTable(original_base_dir, version_base_dir, csv_support_path=None)

Bases: PNGCurationTable

Given a directory tree containing PGM images, copy those images into a new directory tree in raw BSQ format, adding to the output file names the geometry information tags recognized by enb.isets.load_array_bsq.

column_to_properties: maps each column name to the |ColumnProperties| instance that defines it. All columns use semilog_x=False, semilog_y=False, semilog_x_base=10, semilog_y_base=10, has_dict_values=False, has_iterable_values=False and has_object_values=False.

corpus – computed by FileVersionTable.set_corpus; label 'Corpus name'
original_file_path – computed by FileVersionTable.set_original_file_path; label 'Original file path'
sha256 – computed by FilePropertiesTable.set_hash_digest; label 'sha256 hex digest'
size_bytes – computed by FilePropertiesTable.set_file_size; label 'File size (bytes)'
version_name – computed by FileVersionTable.column_version_name; label 'Version name'
version_time – computed by FileVersionTable.set_version_time; label 'Versioning time (s)'

The column_to_properties attribute keeps track of which columns have been defined and of the methods that need to be called to compute them. The keys of this attribute can be used to determine the columns defined in a given class or instance. The values are |ColumnProperties| instances, which can be set manually after definition and before calling |Analyzer| subclasses’ get_df.

dataset_files_extension = 'pgm': Default input sample extension. It affects the result of enb.atable.get_all_test_files.

class enb.compression.pgm.PGMWrapperCodec(compressor_path, decompressor_path, param_dict=None, output_invocation_dir=None, signature_in_name=False)

Bases: WrapperCodec

Raw images are coded into PGM before compression with the wrapper, and PGM is decoded to raw after decompression.

compress(original_path: str, compressed_path: str, original_file_info=None)

Compress original_path into compressed_path using param_dict as params.

Parameters:

original_path – path to the original file to be compressed
compressed_path – path to the compressed file to be created
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None.

Returns:: (optional) a CompressionResults instance, or None (see self.compression_results_from_paths)

decompress(compressed_path, reconstructed_path, original_file_info=None)

Decompress compressed_path into reconstructed_path using param_dict as params (if needed).

Parameters:

compressed_path – path to the input compressed file
reconstructed_path – path to the output reconstructed file
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None. It should only be used in special cases, since codecs are expected to store all needed metainformation in the compressed file.

Returns:

(optional) a DecompressionResults instance, or None (see self.decompression_results_from_paths)

class enb.compression.pgm.PPMCurationTable(original_base_dir, version_base_dir, csv_support_path=None)

Bases: PNGCurationTable

Given a directory tree containing PPM images, copy those images into a new directory tree in raw BSQ format, adding to the output file names the geometry information tags recognized by enb.isets.load_array_bsq.

column_to_properties: maps each column name to the |ColumnProperties| instance that defines it. All columns use semilog_x=False, semilog_y=False, semilog_x_base=10, semilog_y_base=10, has_dict_values=False, has_iterable_values=False and has_object_values=False.

corpus – computed by FileVersionTable.set_corpus; label 'Corpus name'
original_file_path – computed by FileVersionTable.set_original_file_path; label 'Original file path'
sha256 – computed by FilePropertiesTable.set_hash_digest; label 'sha256 hex digest'
size_bytes – computed by FilePropertiesTable.set_file_size; label 'File size (bytes)'
version_name – computed by FileVersionTable.column_version_name; label 'Version name'
version_time – computed by FileVersionTable.set_version_time; label 'Versioning time (s)'

The column_to_properties attribute keeps track of which columns have been defined and of the methods that need to be called to compute them. The keys of this attribute can be used to determine the columns defined in a given class or instance. The values are |ColumnProperties| instances, which can be set manually after definition and before calling |Analyzer| subclasses’ get_df.

dataset_files_extension = 'ppm': Default input sample extension. It affects the result of enb.atable.get_all_test_files.

enb.compression.pgm.pgm_to_raw(input_path, output_path): Read a file in PGM format and write its contents in raw format, which does not include any geometry or data type information.

enb.compression.pgm.ppm_to_raw(input_path, output_path): Read a file in PPM format and write its contents in raw format, which does not include any geometry or data type information.

enb.compression.pgm.read_pgm(input_path, byteorder='>')

Return image data from a raw (binary) PGM file as a NumPy array. Format specification: http://netpbm.sourceforge.net/doc/pgm.html

(Adapted from this answer: https://stackoverflow.com/questions/7368739/numpy-and-16-bit-pgm)

enb.compression.pgm.read_ppm(input_path, byteorder='>')

Return image data from a raw (binary) PPM file as a NumPy array. Format specification: http://netpbm.sourceforge.net/doc/ppm.html

(Adapted from this answer: https://stackoverflow.com/questions/7368739/numpy-and-16-bit-pgm)

enb.compression.pgm.write_pgm(array, bytes_per_sample, output_path, byteorder='>'): Write a 2D array indexed with [x,y] into output_path in PGM format.

enb.compression.pgm.write_ppm(array, bytes_per_sample, output_path): Write a 3-component 3D array indexed with [x,y,z] into output_path in PPM format.
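A binary PGM (P5) file consists of a short ASCII header followed by samples stored row by row. The header holds the magic number, the width, the height and the maximum sample value, and samples wider than one byte are stored big-endian. The pure-Python sketch below writes and reads that layout for an image indexed as array[x][y], which is why the loops visit y in the outer loop. The function names are hypothetical. They do not reproduce enb's read_pgm/write_pgm, which work with NumPy arrays.

```python
import io


def write_pgm_sketch(array, bytes_per_sample, stream):
    """Write array (indexed as array[x][y]) to a binary stream as a P5 PGM image."""
    if bytes_per_sample not in (1, 2):
        raise ValueError("PGM supports 1 or 2 bytes per sample")
    width, height = len(array), len(array[0])
    max_value = (1 << (8 * bytes_per_sample)) - 1
    stream.write(f"P5\n{width} {height}\n{max_value}\n".encode("ascii"))
    for y in range(height):          # rows, top to bottom
        for x in range(width):       # samples within a row, left to right
            stream.write(array[x][y].to_bytes(bytes_per_sample, "big"))


def _read_header_token(stream):
    """Return the next whitespace-delimited header token, skipping # comments."""
    token = b""
    while True:
        char = stream.read(1)
        if not char:
            return token
        if char == b"#":
            stream.readline()        # a comment runs to the end of the line
            if token:
                return token
            continue
        if char.isspace():
            if token:
                return token         # consumes exactly one whitespace after the token
            continue
        token += char


def read_pgm_sketch(stream):
    """Read a P5 PGM image from a binary stream into a nested list indexed [x][y]."""
    if _read_header_token(stream) != b"P5":
        raise ValueError("not a binary (P5) PGM image")
    width, height, max_value = (int(_read_header_token(stream)) for _ in range(3))
    bytes_per_sample = 1 if max_value < 256 else 2
    array = [[0] * height for _ in range(width)]
    for y in range(height):
        for x in range(width):
            chunk = stream.read(bytes_per_sample)
            if len(chunk) != bytes_per_sample:
                raise ValueError("truncated PGM data")
            array[x][y] = int.from_bytes(chunk, "big")
    return array


# Round trip through an in-memory buffer: width 2, height 3, 16-bit samples
buffer = io.BytesIO()
write_pgm_sketch([[0, 1, 2], [300, 400, 65535]], 2, buffer)
buffer.seek(0)
image = read_pgm_sketch(buffer)
```

Dropping the header and keeping only the sample bytes is, in essence, what converting such a file to raw format means.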

enb.compression.png module

PNG manipulation (e.g., curation) tools.

class enb.compression.png.PDFToPNG(input_pdf_dir, output_png_dir, csv_support_path=None)

Bases: FileVersionTable

Take all .pdf files in input_pdf_dir and save them as .png files into output_png_dir, maintaining the relative folder structure.

__init__(input_pdf_dir, output_png_dir, csv_support_path=None)

Parameters:

version_base_dir – path to the versioned base directory (versioned directories preserve names and structure within the base dir)
version_name – arbitrary name of this file version
original_base_dir – path to the original directory (it must contain all indices requested later with self.get_df()). If None, enb.config.options.base_dataset_dir is used
original_properties_table – instance of the file properties subclass to be used when reading the original data to be versioned. If None, a FilePropertiesTable is instanced automatically.
csv_support_path – path to the file where results (of the versioned data) are to be long-term stored. If None, one is assigned by default based on options.persistence_dir.
check_generated_files – if True, the table checks that each call to version() produces a file at output_path. Set to False to allow arbitrarily named output files.

column_to_properties: maps each column name to the |ColumnProperties| instance that defines it. All columns use semilog_x=False, semilog_y=False, semilog_x_base=10, semilog_y_base=10, has_dict_values=False, has_iterable_values=False and has_object_values=False.

corpus – computed by FileVersionTable.set_corpus; label 'Corpus name'
original_file_path – computed by FileVersionTable.set_original_file_path; label 'Original file path'
sha256 – computed by FilePropertiesTable.set_hash_digest; label 'sha256 hex digest'
size_bytes – computed by FilePropertiesTable.set_file_size; label 'File size (bytes)'
version_name – computed by FileVersionTable.column_version_name; label 'Version name'
version_time – computed by FileVersionTable.set_version_time; label 'Versioning time (s)'

The column_to_properties attribute keeps track of which columns have been defined and of the methods that need to be called to compute them. The keys of this attribute can be used to determine the columns defined in a given class or instance. The values are |ColumnProperties| instances, which can be set manually after definition and before calling |Analyzer| subclasses’ get_df.

dataset_files_extension = 'pdf': Default input sample extension. It affects the result of enb.atable.get_all_test_files.

version(input_path, output_path, row)

Create a version of input_path and write it into output_path.

Parameters:

input_path – path to the file to be versioned
output_path – path where the version should be saved
row – metainformation available using super().get_df for input_path

Returns:

If not None, the time in seconds it took to perform the (forward) versioning.

class enb.compression.png.PNGCurationTable(original_base_dir, version_base_dir, csv_support_path=None)

Bases: FileVersionTable

Given a directory tree containing PNG images, copy those images into a new directory tree in raw BSQ format, adding to the output file names the geometry information tags recognized by enb.isets.load_array_bsq.

__init__(original_base_dir, version_base_dir, csv_support_path=None)

Parameters:

original_base_dir – path to the original directory (it must contain all indices requested later with self.get_df()). If None, options.base_dataset_dir is used.
version_base_dir – path to the versioned base directory (versioned directories preserve names and structure within the base dir)
csv_support_path – path to the file where results (of the versioned data) are to be long-term stored. If None, one is assigned by default based on options.persistence_dir.

column_to_properties: maps each column name to the |ColumnProperties| instance that defines it. All columns use semilog_x=False, semilog_y=False, semilog_x_base=10, semilog_y_base=10, has_dict_values=False, has_iterable_values=False and has_object_values=False.

corpus – computed by FileVersionTable.set_corpus; label 'Corpus name'
original_file_path – computed by FileVersionTable.set_original_file_path; label 'Original file path'
sha256 – computed by FilePropertiesTable.set_hash_digest; label 'sha256 hex digest'
size_bytes – computed by FilePropertiesTable.set_file_size; label 'File size (bytes)'
version_name – computed by FileVersionTable.column_version_name; label 'Version name'
version_time – computed by FileVersionTable.set_version_time; label 'Versioning time (s)'

The column_to_properties attribute keeps track of which columns have been defined and of the methods that need to be called to compute them. The keys of this attribute can be used to determine the columns defined in a given class or instance. The values are |ColumnProperties| instances, which can be set manually after definition and before calling |Analyzer| subclasses’ get_df.

dataset_files_extension = 'png': Default input sample extension. It affects the result of enb.atable.get_all_test_files.

version(input_path, output_path, row): Transform PNG files into raw images with name tags recognized by isets.

class enb.compression.png.PNGWrapperCodec(compressor_path, decompressor_path, param_dict=None, output_invocation_dir=None, signature_in_name=False)

Bases: WrapperCodec

Raw images are coded into PNG before compression with the wrapper, and PNG is decoded to raw after decompression.

compress(original_path: str, compressed_path: str, original_file_info=None)

Compress original_path into compressed_path using param_dict as params.

Parameters:

original_path – path to the original file to be compressed
compressed_path – path to the compressed file to be created
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None.

Returns:: (optional) a CompressionResults instance, or None (see self.compression_results_from_paths)

decompress(compressed_path, reconstructed_path, original_file_info=None)

Decompress compressed_path into reconstructed_path using param_dict as params (if needed).

Parameters:

compressed_path – path to the input compressed file
reconstructed_path – path to the output reconstructed file
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None. It should only be used in special cases, since codecs are expected to store all needed metainformation in the compressed file.

Returns:

(optional) a DecompressionResults instance, or None (see self.decompression_results_from_paths)
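PNGWrapperCodec follows a general pattern. Before compression, it converts the raw input to the format the wrapped codec expects and runs the wrapped compressor. On decompression, it undoes both steps in reverse order. The class below is a hypothetical, self-contained illustration of that flow. It is not WrapperCodec and does not call any external binary. zlib stands in for the external compressor, and a toy header stands in for the PNG conversion.

```python
import os
import tempfile
import zlib


class ConvertThenCompressSketch:
    """Hypothetical illustration of the wrapper flow (not an enb class)."""

    def __init__(self, to_format, from_format, inner_compress, inner_decompress):
        self.to_format = to_format                  # raw -> wrapped codec's input format
        self.from_format = from_format              # wrapped codec's output format -> raw
        self.inner_compress = inner_compress        # stand-in for the compressor binary
        self.inner_decompress = inner_decompress    # stand-in for the decompressor binary

    def compress(self, original_path, compressed_path, original_file_info=None):
        with tempfile.TemporaryDirectory() as tmp_dir:
            formatted_path = os.path.join(tmp_dir, "formatted")
            self.to_format(original_path, formatted_path, original_file_info)
            self.inner_compress(formatted_path, compressed_path)

    def decompress(self, compressed_path, reconstructed_path, original_file_info=None):
        with tempfile.TemporaryDirectory() as tmp_dir:
            formatted_path = os.path.join(tmp_dir, "formatted")
            self.inner_decompress(compressed_path, formatted_path)
            self.from_format(formatted_path, reconstructed_path, original_file_info)


def file_transform(transform):
    """Turn a bytes -> bytes function into an (input_path, output_path, ...) step."""
    def run(input_path, output_path, *_):
        with open(input_path, "rb") as src, open(output_path, "wb") as dst:
            dst.write(transform(src.read()))
    return run


HEADER = b"FMT\n"  # toy "format" header standing in for a real PNG encoding
codec = ConvertThenCompressSketch(
    to_format=file_transform(lambda data: HEADER + data),
    from_format=file_transform(lambda data: data[len(HEADER):]),
    inner_compress=file_transform(zlib.compress),
    inner_decompress=file_transform(zlib.decompress),
)
```

Keeping the intermediate file in a temporary directory means the caller only ever sees raw input and raw reconstruction. This mirrors how the documented compress/decompress signatures expose only those two paths.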

enb.compression.png.pdf_to_png(input_dir, output_dir)

Take all .pdf files in input_dir and save them as .png files into output_dir, maintaining the relative folder structure.

It is perfectly valid for input_dir and output_dir to point to the same location, but input_dir must exist beforehand.
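Maintaining the relative folder structure amounts to mapping each input path to the same relative path under the output directory, with the extension swapped. The sketch below computes that mapping with os.walk. The function name is hypothetical, and the PDF rasterization step itself is out of scope.

```python
import os


def pdf_to_png_paths(input_dir, output_dir):
    """Map every .pdf under input_dir to a .png path with the same relative location."""
    mapping = {}
    for dir_path, _, file_names in os.walk(input_dir):
        for name in file_names:
            if name.lower().endswith(".pdf"):
                pdf_path = os.path.join(dir_path, name)
                relative_path = os.path.relpath(pdf_path, input_dir)
                # Same relative location, .pdf replaced by .png
                png_path = os.path.join(output_dir, os.path.splitext(relative_path)[0] + ".png")
                mapping[pdf_path] = png_path
    return mapping
```

When input_dir and output_dir are the same directory, each PNG simply lands next to its PDF. This is consistent with the note above that both may point to the same location.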

enb.compression.png.raw_path_to_png(raw_path, png_path, image_properties_row=None)

Render a uint8 or uint16 raw image with 1, 3 or 4 components.

Parameters:

raw_path – path to the image in raw format to render in png.
png_path – path where the png file is to be stored.
image_properties_row – if raw_path does not contain geometry information, this parameter should be a dict-like object that indicates the width, height, number of components, bytes per sample, signedness and, if applicable, endianness.
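When raw_path carries no geometry tags, image_properties_row must supply everything needed to decode the bytes. The sketch below shows how those fields determine the sample format, using Python's struct module. The key names width, height, component_count, bytes_per_sample, signed and big_endian are assumptions about the row's columns, and the helper name is hypothetical. Samples are returned in file order, without rearranging bands.

```python
import struct

# struct codes for signed samples of 1, 2 and 4 bytes (upper case = unsigned)
_SIGNED_CODES = {1: "b", 2: "h", 4: "i"}


def unpack_raw_samples(data, row):
    """Decode raw sample bytes using the geometry and sample format described by row."""
    count = row["width"] * row["height"] * row["component_count"]
    code = _SIGNED_CODES[row["bytes_per_sample"]]
    if not row["signed"]:
        code = code.upper()
    # '>' / '<' also select standard sizes, so 'h' is always 2 bytes and 'i' 4 bytes
    byte_order = ">" if row["big_endian"] else "<"
    fmt = f"{byte_order}{count}{code}"
    if struct.calcsize(fmt) != len(data):
        raise ValueError("data size does not match the described geometry")
    return struct.unpack(fmt, data)
```

The same four bytes decode differently depending on endianness. For a 2x1 uint16 image, b"\x01\x00\x00\x02" decodes to (256, 2) if big-endian and to (1, 512) if little-endian. This is why the row must state endianness for multi-byte samples.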

enb.compression.png.render_array_png(img, png_path)

Render a uint8 or uint16 image with 1, 3 or 4 components.

Parameters:

img – image array indexed by [x,y,z].
png_path – path where the png file is to be stored.

enb.compression.tarlite module

Lite archiving format to write several files into a single one.

class enb.compression.tarlite.TarliteReader(tarlite_path)

Bases: object

Extract files created by TarliteWriter.

__init__(tarlite_path)

extract_all(output_dir_path): Extract all files to output_dir_path.

class enb.compression.tarlite.TarliteWriter(initial_input_paths=None)

Bases: object

Input a series of file paths and output a single file with the contents of all inputs, plus some meta-information to reconstruct them. Files are stored flatly, i.e., only their names are stored, discarding any information about their directory structure. Therefore, it is not possible to store two files with the same name, even if their full paths point to different files.

__init__(initial_input_paths=None)

add_file(input_path): Add a file path to the list of pending ones. Note that files are not read until the write() method is invoked.

write(output_path): Save the current list of input paths into output_path.

enb.compression.tarlite.tarlite_files(input_paths, output_tarlite_path): Take a list of input paths and combine them into a single tarlite file.

enb.compression.tarlite.untarlite_files(input_tarlite_path, output_dir_path): Take a tarlite file and output the contents into the given directory. The file names are preserved.
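To illustrate the flat storage scheme described above (names only, no directories), the following sketch writes and reads a minimal archive. The byte layout used here, a length-prefixed name followed by a length-prefixed payload, is an assumption made for this example and is not the actual tarlite format; write_flat_archive() and extract_flat_archive() are hypothetical names.

```python
import os
import struct


def write_flat_archive(input_paths, output_path):
    """Concatenate files into one archive, keeping only their base names.

    Hypothetical layout: for each file, a 4-byte big-endian name length,
    the UTF-8 name, an 8-byte big-endian content length and the contents.
    """
    names = [os.path.basename(p) for p in input_paths]
    if len(set(names)) != len(names):
        # Directory information is discarded, so duplicate names would collide.
        raise ValueError("two input files share the same base name")
    with open(output_path, "wb") as out:
        for path, name in zip(input_paths, names):
            with open(path, "rb") as f:
                contents = f.read()
            encoded = name.encode("utf-8")
            out.write(struct.pack(">I", len(encoded)) + encoded)
            out.write(struct.pack(">Q", len(contents)) + contents)


def extract_flat_archive(archive_path, output_dir_path):
    """Recreate every stored file inside output_dir_path and return the names."""
    os.makedirs(output_dir_path, exist_ok=True)
    extracted = []
    with open(archive_path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # End of archive.
            (name_length,) = struct.unpack(">I", header)
            name = f.read(name_length).decode("utf-8")
            (size,) = struct.unpack(">Q", f.read(8))
            with open(os.path.join(output_dir_path, name), "wb") as out:
                out.write(f.read(size))
            extracted.append(name)
    return extracted
```

Rejecting duplicate base names up front reflects the limitation stated for TarliteWriter: once directories are discarded, two files named alike cannot both be restored.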

enb.compression.wrapper module

Wrapper codec classes.

Existing codec implementations (including non-Python binaries) can easily be added to enb via WrapperCodec (sub)classes.

class enb.compression.wrapper.JavaWrapperCodec(compressor_jar, decompressor_jar, param_dict=None)

Bases: WrapperCodec

Wrapper for *.jar codecs. The compression and decompression parameters are those that need to be passed to the ‘java’ command.

The compressor_jar and decompressor_jar attributes are added upon initialization based on the params to __init__.
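Because the parameters are passed to the 'java' command rather than to the codec directly, a parameter string would typically start by selecting the jar file. The sketch below shows one way to build such a string and turn it into an argument list; the -i/-o options and the helper names java_compression_params() and java_command() are hypothetical, and the exact command enb assembles may differ.

```python
import shlex


def java_compression_params(compressor_jar, original_path, compressed_path):
    """Build a shell-style parameter string meant for the 'java' command.

    The -i/-o options are hypothetical; a real wrapped codec defines its own.
    """
    return " ".join([
        "-jar", shlex.quote(compressor_jar),
        "-i", shlex.quote(original_path),
        "-o", shlex.quote(compressed_path),
    ])


def java_command(params):
    """Turn the parameter string into an argument list for subprocess.run()."""
    return ["java"] + shlex.split(params)
```

Quoting with shlex.quote() and splitting with shlex.split() keeps paths containing spaces intact; the resulting list could then be passed to subprocess.run(..., check=True) on a system with Java installed.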

__init__(compressor_jar, decompressor_jar, param_dict=None)

Parameters:

compressor_jar – path to the .jar file to be used for compression
decompressor_jar – path to the .jar file to be used for decompression
param_dict – name-value mapping of the parameters to be used for compression

class enb.compression.wrapper.LittleEndianWrapper(compressor_path, decompressor_path, param_dict=None, output_invocation_dir=None, signature_in_name=False)

Bases: WrapperCodec

Wrapper with the same semantics as WrapperCodec, but it performs a big-endian to little-endian conversion for (big-endian) 2-byte and 4-byte samples. If the input is flagged as little endian, e.g., if -u16le- is in the original file name, no transformation is performed.

Codecs inheriting from this class automatically receive little-endian samples and are expected to reconstruct little-endian files, which are then translated back to big endian if and only if the original image was flagged as big endian.
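The byte-order conversion described above amounts to reversing the bytes of each sample. A minimal stand-alone sketch, assuming the whole file is held in memory and using the hypothetical name to_little_endian(), could look like this; it is not LittleEndianWrapper's actual code.

```python
def to_little_endian(data, bytes_per_sample, big_endian_input):
    """Return the bytes of 2- or 4-byte samples in little-endian order.

    Input flagged as little endian (or 1-byte samples) is returned unchanged.
    """
    if bytes_per_sample not in (1, 2, 4):
        raise ValueError(f"unsupported bytes_per_sample: {bytes_per_sample}")
    if not big_endian_input or bytes_per_sample == 1:
        return bytes(data)
    if len(data) % bytes_per_sample:
        raise ValueError("data length is not a multiple of the sample width")
    out = bytearray(len(data))
    for start in range(0, len(data), bytes_per_sample):
        # Reverse the byte order of one sample.
        out[start:start + bytes_per_sample] = data[start:start + bytes_per_sample][::-1]
    return bytes(out)
```

Reversing bytes is its own inverse, so applying the same function to a reconstructed little-endian file restores the original big-endian layout.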

compress(original_path: str, compressed_path: str, original_file_info=None)

Compress original_path into compressed_path using param_dict as params.

Parameters:

original_path – path to the original file to be compressed
compressed_path – path to the compressed file to be created
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None.

Returns:: (optional) a CompressionResults instance, or None (see self.compression_results_from_paths)

decompress(compressed_path, reconstructed_path, original_file_info=None)

Decompress compressed_path into reconstructed_path using param_dict as params (if needed).

Parameters:

compressed_path – path to the input compressed file
reconstructed_path – path to the output reconstructed file
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None. It should only be used in special cases, since codecs are expected to store all needed meta-information in the compressed file.

Returns:

(optional) a DecompressionResults instance, or None (see self.decompression_results_from_paths)

class enb.compression.wrapper.QuantizationWrapperCodec(codec: AbstractCodec, qstep: int)

Bases: NearLosslessCodec

Perform uniform scalar quantization before compressing and after decompressing with a wrapped codec instance. Midpoint reconstruction is used in the dequantization stage.
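A minimal sketch of uniform scalar quantization with midpoint reconstruction for integer samples, assuming Python lists instead of image arrays and rounding the interval midpoint to an integer (quantize() and dequantize() are hypothetical names, not the wrapper's internals):

```python
def quantize(samples, qstep):
    """Uniform scalar quantization: map each sample to its interval index."""
    if qstep < 1:
        raise ValueError("qstep must be a positive integer")
    # Floor division keeps intervals of length qstep, also for negative samples.
    return [x // qstep for x in samples]


def dequantize(indices, qstep):
    """Midpoint reconstruction: return the (integer) centre of each interval."""
    return [q * qstep + qstep // 2 for q in indices]
```

With this choice, the absolute reconstruction error never exceeds qstep // 2, and qstep = 1 is lossless.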

__init__(codec: AbstractCodec, qstep: int)

Parameters:

codec – The codec instance used to compress and decompress the quantized data.
qstep – The quantization interval length

compress(original_path: str, compressed_path: str, original_file_info=None)

Compress original_path into compressed_path using param_dict as params.

Parameters:

original_path – path to the original file to be compressed
compressed_path – path to the compressed file to be created
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None.

Returns:: (optional) a CompressionResults instance, or None (see self.compression_results_from_paths)

decompress(compressed_path, reconstructed_path, original_file_info=None)

Decompress compressed_path into reconstructed_path using param_dict as params (if needed).

Parameters:

compressed_path – path to the input compressed file
reconstructed_path – path to the output reconstructed file
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None. It should only be used in special cases, since codecs are expected to store all needed meta-information in the compressed file.

Returns:

(optional) a DecompressionResults instance, or None (see self.decompression_results_from_paths)

property label: Return the original codec label and the quantization parameter.

property name: Return the original codec name and the quantization parameter

class enb.compression.wrapper.ReindexWrapper(codec: AbstractCodec, width_bytes: int)

Bases: AbstractCodec

Input samples are first reindexed to a contiguous support, preserving their ordering (if x and y are two sample values present in the input file, then x < y <=> reindex(x) < reindex(y)).

Reindexed data are stored as unsigned, big-endian samples of the width configured on initialization. After reindexing, the codec passed to the initializer is used for compression.

The user is responsible for using a codec compatible with the type of the reindexed data, and a data type that can hold the number of unique samples present in the input file.

Note that only integer input samples are currently supported.
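The order-preserving reindexing described above can be sketched with a sorted table of distinct values: each sample is replaced by its rank, and the table is what a decoder would need to invert the mapping. How ReindexWrapper actually stores that table is not described here; the helper names below are hypothetical.

```python
def build_reindex_table(samples):
    """Return the sorted distinct values; position i holds the value mapped to i."""
    return sorted(set(samples))


def reindex(samples, table):
    """Replace each sample by its rank among the distinct values (order preserving)."""
    rank = {value: index for index, value in enumerate(table)}
    return [rank[x] for x in samples]


def unreindex(indices, table):
    """Invert reindex() using the stored table."""
    return [table[i] for i in indices]


def fits_width(table, width_bytes):
    """Check that unsigned samples of width_bytes bytes can hold every index."""
    # Indices range from 0 to len(table) - 1.
    return len(table) <= 256 ** width_bytes
```

fits_width() expresses the user's responsibility mentioned above: the chosen sample width must be able to represent as many indices as there are unique input values.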

__init__(codec: AbstractCodec, width_bytes: int)

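Parameters:

codec – the codec instance used to compress and decompress the reindexed data.
width_bytes – width, in bytes, of the unsigned big-endian samples used to store the reindexed data.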

compress(original_path: str, compressed_path: str, original_file_info=None) → CompressionResults

Compress original_path into compressed_path using param_dict as params.

Parameters:

original_path – path to the original file to be compressed
compressed_path – path to the compressed file to be created
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None.

Returns:: (optional) a CompressionResults instance, or None (see self.compression_results_from_paths)

decompress(compressed_path: str, reconstructed_path: str, original_file_info=None) → DecompressionResults

Decompress compressed_path into reconstructed_path using param_dict as params (if needed).

Parameters:

compressed_path – path to the input compressed file
reconstructed_path – path to the output reconstructed file
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None. It should only be used in special cases, since codecs are expected to store all needed meta-information in the compressed file.

Returns:

(optional) a DecompressionResults instance, or None (see self.decompression_results_from_paths)

property label: Return the original codec label and the quantization parameter.

property name: Return the original codec name and the quantization parameter

class enb.compression.wrapper.WrapperCodec(compressor_path, decompressor_path, param_dict=None, output_invocation_dir=None, signature_in_name=False)

Bases: AbstractCodec

A codec that uses an external process to compress and decompress.

__init__(compressor_path, decompressor_path, param_dict=None, output_invocation_dir=None, signature_in_name=False)

Parameters:

compressor_path – path to the executable to be used for compression
decompressor_path – path to the executable to be used for decompression
param_dict – name-value mapping of the parameters to be used for compression
output_invocation_dir – if not None, invocation strings are stored in this directory with name based on the codec and the sample’s full path.
signature_in_name – if True, the default codec name includes part of the hexdigest of the compressor and decompressor binaries being used

compress(original_path: str, compressed_path: str, original_file_info=None)

Compress original_path into compressed_path using param_dict as params.

Parameters:

original_path – path to the original file to be compressed
compressed_path – path to the compressed file to be created
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None.

Returns:: (optional) a CompressionResults instance, or None (see self.compression_results_from_paths)

decompress(compressed_path, reconstructed_path, original_file_info=None)

Decompress compressed_path into reconstructed_path using param_dict as params (if needed).

Parameters:

compressed_path – path to the input compressed file
reconstructed_path – path to the output reconstructed file
original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None. It should only be used in special cases, since codecs are expected to store all needed meta-information in the compressed file.

Returns:

(optional) a DecompressionResults instance, or None (see self.decompression_results_from_paths)

static get_binary_signature(binary_path): Return a string with a (hopefully) unique signature for the contents of binary_path. By default, the first 5 characters of the SHA-256 hexdigest are returned.
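The default signature described above can be reproduced with hashlib. This is a stand-alone sketch, not enb's method; the function name binary_signature() and its digits parameter are assumptions added for illustration.

```python
import hashlib


def binary_signature(binary_path, digits=5):
    """Return the first `digits` hex characters of the SHA-256 of a file."""
    sha256 = hashlib.sha256()
    with open(binary_path, "rb") as f:
        # Read in 64 KiB chunks so large binaries are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            sha256.update(chunk)
    return sha256.hexdigest()[:digits]
```

Any change to the binary's contents changes the digest, which is why a short prefix of it is a practical (though not collision-proof) way of detecting that a reference binary has been replaced.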

get_compression_params(original_path, compressed_path, original_file_info)

Return a string (shell style) with the parameters to be passed to the compressor.

Same parameter semantics as AbstractCodec.compress().

Parameters:: original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None

get_decompression_params(compressed_path, reconstructed_path, original_file_info)

Return a string (shell style) with the parameters to be passed to the decompressor. Same parameter semantics as AbstractCodec.decompress().

Parameters:: original_file_info – a dict-like object describing original_path’s properties (e.g., geometry), or None

property name: Return the codec’s name and parameters, also including the encoder and decoder hash summaries (so that changes in the reference binaries can be easily detected)

Module contents

enb.compression: data compression in enb.

The compression and icompression modules implement enb.experiment.Experiment classes and other basic tools to facilitate them.

Several other modules are declared for specific compressed data formats.