z5py package

Submodules

z5py.attribute_manager module

class z5py.attribute_manager.AttributeManager(path, is_zarr)[source]

Bases: collections.abc.MutableMapping

Provides access to custom user attributes.

Attributes are saved as json, in attributes.json for n5 and in .zattrs for zarr. Supports the default python dict api. Note that n5 stores the dataset attributes (like shape and chunks) in the same file; these attributes are NOT mutable via the AttributeManager.
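
A minimal usage sketch (file and attribute names are hypothetical):

    import z5py

    f = z5py.File('example.n5')
    ds = f.create_dataset('data', shape=(100, 100), chunks=(10, 10), dtype='float32')
    # attributes behave like a dict and are stored as json next to the dataset
    ds.attrs['resolution'] = [4.0, 4.0]
    print(ds.attrs['resolution'])
    print(list(ds.attrs.keys()))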

z5py.converter module

z5py.converter.convert_from_h5(in_path, out_path, in_path_in_file, out_path_in_file, out_chunks, n_threads, out_blocks=None, use_zarr_format=None, **z5_kwargs)[source]

Convert hdf5 dataset to n5 or zarr dataset.

The chunks of the output dataset must be specified. The dataset is converted in parallel over the chunks. Datatype and compression can be specified; otherwise defaults will be used.

Parameters:
  • in_path (str) – path to hdf5 file.
  • out_path (str) – path to output zarr or n5 file.
  • in_path_in_file (str) – name of input dataset.
  • out_path_in_file (str) – name of output dataset.
  • out_chunks (tuple) – chunks of output dataset.
  • n_threads (int) – number of threads used for converting.
  • out_blocks (tuple) – block size used for converting, must be a multiple of out_chunks. If None, the chunk size will be used (default: None).
  • use_zarr_format (bool) – flag to indicate zarr format. If None, an attempt will be made to infer the format from the file extension, otherwise zarr will be used (default: None).
  • **z5_kwargs – keyword arguments for z5py dataset, e.g. datatype or compression.
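
A usage sketch with hypothetical paths and dataset names; per the parameter list above, compression is passed through as a z5py dataset keyword:

    from z5py.converter import convert_from_h5

    # convert the hdf5 dataset 'data' to an n5 dataset with 64**3 chunks,
    # using 8 threads and gzip compression for the output
    convert_from_h5('input.h5', 'output.n5',
                    in_path_in_file='data', out_path_in_file='data',
                    out_chunks=(64, 64, 64), n_threads=8,
                    compression='gzip')
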
z5py.converter.convert_to_h5(in_path, out_path, in_path_in_file, out_path_in_file, out_chunks, n_threads, out_blocks=None, **h5_kwargs)[source]

Convert n5 or zarr dataset to hdf5 dataset.

The chunks of the output dataset must be specified. The dataset is converted to hdf5 in parallel over the chunks. Note that hdf5 does not support parallel write access, so more threads may not speed up the conversion. Datatype and compression can be specified; otherwise defaults will be used.

Parameters:
  • in_path (str) – path to n5 or zarr file.
  • out_path (str) – path to output hdf5 file.
  • in_path_in_file (str) – name of input dataset.
  • out_path_in_file (str) – name of output dataset.
  • out_chunks (tuple) – chunks of output dataset.
  • n_threads (int) – number of threads used for converting.
  • out_blocks (tuple) – block size used for converting, must be a multiple of out_chunks. If None, the chunk size will be used (default: None).
  • **h5_kwargs – keyword arguments for h5py dataset, e.g. datatype or compression.
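
The inverse conversion follows the same pattern (paths hypothetical); extra keywords go to the h5py dataset:

    from z5py.converter import convert_to_h5

    # convert the n5 dataset back to hdf5 with 64**3 chunks
    convert_to_h5('output.n5', 'roundtrip.h5',
                  in_path_in_file='data', out_path_in_file='data',
                  out_chunks=(64, 64, 64), n_threads=4)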

z5py.dataset module

class z5py.dataset.Dataset(path, dset_impl, n_threads=1)[source]

Bases: object

Dataset for access to data on disc.

Should not be instantiated directly, but rather be created or opened via create_dataset, require_dataset or the [] operator of File or Group.
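
A short sketch, assuming a container 'example.n5' that already holds a 2d dataset 'data'; as in h5py, numpy-style slicing reads the requested region from disc:

    import z5py

    f = z5py.File('example.n5')
    ds = f['data']        # open an existing dataset via the [] operator
    sub = ds[:10, :10]    # read a sub-region as a numpy array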

array_to_format(array)[source]

Convert array to serialization.

Convert an array to the (1d) binary data that would be serialized to disc for the format of the dataset.

Parameters:array (np.ndarray) – array to be converted to serialization.
Returns:np.ndarray
attrs

The AttributeManager of this dataset.

chunk_exists(chunk_indices)[source]

Check if chunk has data.

Check for the given indices if the chunk has data.

Parameters:chunk_indices (tuple) – chunk indices.
Returns:bool
chunks

Chunks of this dataset.

chunks_per_dimension

Number of chunks in each dimension of this dataset.

compression_options

Compression library options of this dataset.

compressors_n5 = ['raw', 'gzip', 'bzip2', 'xz', 'lz4']

Compression libraries supported by n5 format

compressors_zarr = ['raw', 'blosc', 'zlib', 'bzip2']

Compression libraries supported by zarr format

dtype

Datatype of this dataset.

find_maximum_coordinates(dim)[source]

Find the start coordinates of the chunk with the largest coordinate along the given dimension.

Only considers chunks that contain data.

Parameters:dim (int) – query dimension.
Returns:start coordinates of the chunk.
Return type:tuple
find_minimum_coordinates(dim)[source]

Find the start coordinates of the chunk with the smallest coordinate along the given dimension.

Only considers chunks that contain data.

Parameters:dim (int) – query dimension.
Returns:start coordinates of the chunk.
Return type:tuple
index_to_roi(index)[source]

Convert index to region of interest.

Convert an index, which can be a slice or a tuple of slices / ellipsis to a region of interest. The roi consists of the region offset and the region shape.

Parameters:index (slice or tuple) – index into dataset.
Returns:offset of the region of interest and shape of the region of interest.
Return type:tuple, tuple
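
For example (a sketch, assuming a 2d dataset and that the return value unpacks into offset and shape as described above):

    import numpy as np

    offset, shape = ds.index_to_roi(np.s_[5:15, 20:30])
    # offset == (5, 20), shape == (10, 10)
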
is_zarr

Flag to indicate zarr or n5 format of this dataset.

n5_default_compressor = 'gzip'

Default compression for n5 format

ndim

Number of dimensions of this dataset.

number_of_chunks

Total number of chunks of this dataset.

read_subarray(start, stop)[source]

Read subarray from region of interest.

Region of interest is defined by start and stop and must be in bounds of the dataset.

Parameters:
  • start (tuple) – start coordinates of the roi.
  • stop (tuple) – stop coordinates of the roi.
Returns:

np.ndarray
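
For example (a sketch; the region must be in bounds of the dataset):

    # read the region [0:100, 0:100] of a 2d dataset
    data = ds.read_subarray(start=(0, 0), stop=(100, 100))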

shape

Shape of this dataset.

size

Size (total number of elements) of this dataset.

write_subarray(start, data)[source]

Write subarray to dataset.

data is written to region of interest, defined by start and the shape of data. The region of interest must be in bounds of the dataset and the datatype must agree with the dataset.

Parameters:
  • start (tuple) – offset of the roi to write.
  • data (np.ndarray) – data to write; shape determines the roi shape.
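
For example (a sketch; the dtype of data must agree with the dataset):

    import numpy as np

    # write a 10 x 10 block at offset (50, 50)
    block = np.ones((10, 10), dtype=ds.dtype)
    ds.write_subarray((50, 50), block)
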
zarr_default_compressor = 'blosc'

Default compression for zarr format

z5py.file module

class z5py.file.File(path, use_zarr_format=None, mode='a')[source]

Bases: z5py.group.Group

File to access zarr or n5 containers on disc.

The container corresponds to a directory on the filesystem. Groups are subdirectories and datasets are subdirectories that contain multi-dimensional data stored in binary format. Supports python dict api.

Parameters:
  • path (str) – path on filesystem that holds the container.
  • use_zarr_format (bool) – flag to determine if container is zarr or n5 (default: None).
  • mode (str) – file mode used to open / create the file (default: ‘a’).
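
A usage sketch with hypothetical paths; the format is inferred from the file extension unless use_zarr_format is given:

    import z5py

    f = z5py.File('example.n5')               # inferred as n5
    g = z5py.File('example.zarr', mode='r')   # inferred as zarr, opened read-only
    # without a telling extension, state the format explicitly
    h = z5py.File('example', use_zarr_format=False)
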
classmethod infer_format(path)[source]

Infer the file format from the file extension.

Returns:True for zarr, False for n5 and None if the format could not be inferred.
Return type:bool
n5_exts = {'.n5'}

File extensions that are inferred as an n5 file

zarr_exts = {'.zr', '.zarr'}

File extensions that are inferred as a zarr file

class z5py.file.N5File(path, mode='a')[source]

Bases: z5py.file.File

File to access n5 containers on disc.

Parameters:
  • path (str) – path on filesystem that holds the container.
  • mode (str) – file mode used to open / create the file (default: ‘a’).
class z5py.file.ZarrFile(path, mode='a')[source]

Bases: z5py.file.File

File to access zarr containers on disc.

Parameters:
  • path (str) – path on filesystem that holds the container.
  • mode (str) – file mode used to open / create the file (default: ‘a’).

z5py.group module

class z5py.group.Group(path, is_zarr=True, mode='a')[source]

Bases: collections.abc.Mapping

Group inside of a z5py container.

Corresponds to a directory on the filesystem. Supports python dict api. Should not be instantiated directly, but rather be created or opened via the create_group, require_group or [] operators of Group or File.

attrs

Access additional attributes.

Returns:AttributeManager.
create_dataset(name, shape=None, dtype=None, data=None, chunks=None, compression=None, fillvalue=0, n_threads=1, **compression_options)[source]

Create a new dataset.

Create a new dataset in the group. Syntax and behaviour are similar to the corresponding h5py functionality. In contrast to h5py, there is no option to store a dataset without chunking (if no chunks are given, default values suitable for the dimension of the dataset will be used). Also, if a dataset is created from data and a dtype different from the data's dtype is specified, the function throws a RuntimeError instead of converting the data. A usage sketch follows the parameter list below.

Parameters:
  • name (str) – name of the new dataset.
  • shape (tuple) – shape of the new dataset. If no shape is given, the data argument must be given. (default: None).
  • dtype (str or np.dtype) – datatype of the new dataset. If no dtype is given, the data argument must be given (default: None).
  • data (np.ndarray) – data used to infer shape, dtype and fill the dataset upon creation (default: None).
  • chunks (tuple) – chunk sizes of the new dataset. If no chunks are given, a suitable default value for the number of dimensions will be used (default: None).
  • compression (str) – name of the compression library used to compress chunks. If no compression is given, the default for the current format is used (default: None).
  • fillvalue (float) – fillvalue for empty chunks (only zarr) (default: 0).
  • n_threads (int) – number of threads used for chunk I/O (default: 1).
  • **compression_options – options for the compression library.
Returns:

the new dataset.

Return type:

Dataset
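
A usage sketch (names hypothetical); additional compression options are passed as keywords:

    import numpy as np
    import z5py

    f = z5py.File('example.n5')
    # create from explicit shape, chunks and dtype
    ds = f.create_dataset('volume', shape=(512, 512), chunks=(128, 128),
                          dtype='uint8', compression='gzip')
    # or infer shape and dtype from data
    ds2 = f.create_dataset('from_data', data=np.zeros((64, 64), dtype='float32'),
                           chunks=(32, 32))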

create_group(name)[source]

Create a new group.

Create new (sub-)group of the group. Fails if a group of this name already exists.

Parameters:name (str) – name of the new group.
Returns:group of the requested name.
Return type:Group
file_modes = {'a': FileMode.a, 'r': FileMode.r, 'r+': FileMode.r_p, 'w': FileMode.w, 'w-': FileMode.w_m, 'x': FileMode.w_m}

Available modes for opening files. These correspond to the h5py file modes.

require_dataset(name, shape, dtype=None, chunks=None, n_threads=1, **kwargs)[source]

Require dataset.

Require a dataset in the group. The dataset will be created if it does not exist; otherwise the existing dataset is returned. If the dataset already exists, consistency with the arguments shape, dtype (if given) and chunks (if given) is enforced.

Parameters:
  • name (str) – name of the dataset.
  • shape (tuple) – shape of the dataset.
  • dtype (str or np.dtype) – datatype of dataset (default: None).
  • chunks (tuple) – chunk sizes of the dataset (default: None).
  • n_threads (int) – number of threads used for chunk I/O (default: 1).
  • **kwargs – additional arguments that will only be used for creation if the dataset does not exist.
Returns:

the required dataset.

Return type:

Dataset
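
For example (a sketch): the first call creates the dataset, the second returns the existing one; a mismatching shape would raise instead:

    ds = f.require_dataset('data', shape=(100, 100), dtype='float32', chunks=(10, 10))
    same = f.require_dataset('data', shape=(100, 100))  # returns the existing dataset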

require_group(name)[source]

Require group.

Require that a group of the given name exists. The group will be created if it does not already exist.

Parameters:name (str) – name of the required group.
Returns:group of the requested name.
Return type:Group

z5py.util module

class z5py.util.Timer[source]

Bases: object

elapsed

start()[source]

stop()[source]
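
A usage sketch, assuming start / stop semantics and that elapsed holds the measured time:

    from z5py.util import Timer

    t = Timer()
    t.start()
    # ... timed work goes here ...
    t.stop()
    print(t.elapsed)
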
z5py.util.blocking(shape, block_shape)[source]

Generator for nd blocking.
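
A sketch of iterating over the blocking; the exact type of the yielded blocks is not documented above, so this only iterates and prints:

    from z5py.util import blocking

    # cover a (100, 100) shape with (64, 64) blocks
    for block in blocking((100, 100), (64, 64)):
        print(block)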

z5py.util.fetch_test_data()[source]

z5py.util.fetch_test_data_stent()[source]
z5py.util.rechunk(in_path, out_path, in_path_in_file, out_path_in_file, out_chunks, n_threads, out_blocks=None, dtype=None, use_zarr_format=None, **new_compression)[source]

Copy and rechunk a dataset.

The input dataset will be copied to the output dataset chunk by chunk. The datatype, file format and compression can be changed as well.

Parameters:
  • in_path (str) – path to the input file.
  • out_path (str) – path to the output file.
  • in_path_in_file (str) – name of input dataset.
  • out_path_in_file (str) – name of output dataset.
  • out_chunks (tuple) – chunks of the output dataset.
  • n_threads (int) – number of threads used for copying.
  • out_blocks (tuple) – block size used for copying, must be a multiple of out_chunks. If None, the chunk size will be used (default: None).
  • dtype (str) – datatype of the output dataset, default does not change datatype (default: None).
  • use_zarr_format (bool) – file format of the output file, default does not change format (default: None).
  • **new_compression – compression library and options for output dataset. If not given, the same compression as in the input is used.
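
A usage sketch with hypothetical paths, changing only the chunking:

    from z5py.util import rechunk

    rechunk('input.n5', 'output.n5',
            in_path_in_file='data', out_path_in_file='data',
            out_chunks=(32, 32, 32), n_threads=8)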

Module contents