z5py package¶
Submodules¶
z5py.attribute_manager module¶
-
class
z5py.attribute_manager.
AttributeManager
(path, is_zarr)[source]¶ Bases:
_abcoll.MutableMapping
Provides access to custom user attributes.
Attributes will be saved as json in attributes.json for n5, .zattributes for zarr. Supports the default python dict api. N5 stores the dataset attributes in the same file; these attributes are NOT mutable via the AttributeManager.
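Since the attributes are stored as plain json, the dict-style access described above can be illustrated without z5py itself. This is a sketch of the storage layout only; the attribute name `resolution` and the helper variables are hypothetical, and the real AttributeManager handles the file I/O for you.

```python
import json
import os
import tempfile

# Illustration of the storage layout only (z5py itself is not used here):
# n5 keeps attributes in "attributes.json" inside the group directory,
# zarr in ".zattributes"; the dict api maps onto json load/dump.
root = tempfile.mkdtemp()
attr_file = os.path.join(root, 'attributes.json')  # '.zattributes' for zarr

attrs = {'resolution': [4.0, 4.0, 40.0]}  # hypothetical attribute
with open(attr_file, 'w') as f:
    json.dump(attrs, f)

with open(attr_file) as f:
    loaded = json.load(f)
```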
z5py.converter module¶
-
z5py.converter.
convert_from_h5
(in_path, out_path, in_path_in_file, out_path_in_file, out_chunks, n_threads, out_blocks=None, use_zarr_format=None, **z5_kwargs)[source]¶ Convert hdf5 dataset to n5 or zarr dataset.
The chunks of the output dataset must be specified. The dataset is converted in parallel over the chunks. Datatype and compression can be specified, otherwise defaults will be used.
Parameters: - in_path (str) – path to hdf5 file.
- out_path (str) – path to output zarr or n5 file.
- in_path_in_file (str) – name of input dataset.
- out_path_in_file (str) – name of output dataset.
- out_chunks (tuple) – chunks of output dataset.
- n_threads (int) – number of threads used for converting.
- out_blocks (tuple) – block size used for converting, must be a multiple of out_chunks. If None, the chunk size will be used (default: None).
- use_zarr_format (bool) – flag to indicate zarr format. If None, an attempt will be made to infer the format from the file extension, otherwise zarr will be used (default: None).
- **z5_kwargs – keyword arguments for z5py dataset, e.g. datatype or compression.
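The "converted in parallel over the chunks" pattern can be sketched on in-memory arrays. The function `copy_in_blocks` and its arguments are illustrative, not z5py internals; the real converter reads from hdf5 and writes compressed chunks to disc.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

# Hypothetical sketch of the conversion pattern (not z5py's actual code):
# the output is filled in parallel, one block at a time.
def copy_in_blocks(src, dst, blocks, n_threads):
    steps = [range(0, s, b) for s, b in zip(src.shape, blocks)]
    offsets = [(i, j) for i in steps[0] for j in steps[1]]  # 2d for brevity

    def copy_block(offset):
        # clip the block at the dataset border
        sl = tuple(slice(o, min(o + b, s))
                   for o, b, s in zip(offset, blocks, src.shape))
        dst[sl] = src[sl]

    with ThreadPoolExecutor(n_threads) as tp:
        list(tp.map(copy_block, offsets))

src = np.arange(64, dtype='float32').reshape(8, 8)
dst = np.zeros_like(src)
copy_in_blocks(src, dst, blocks=(3, 3), n_threads=2)
```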
-
z5py.converter.
convert_to_h5
(in_path, out_path, in_path_in_file, out_path_in_file, out_chunks, n_threads, out_blocks=None, **h5_kwargs)[source]¶ Convert n5 or zarr dataset to hdf5 dataset.
The chunks of the output dataset must be specified. The dataset is converted to hdf5 in parallel over the chunks. Note that hdf5 does not support parallel write access, so more threads may not speed up the conversion. Datatype and compression can be specified, otherwise defaults will be used.
Parameters: - in_path (str) – path to n5 or zarr file.
- out_path (str) – path to output hdf5 file.
- in_path_in_file (str) – name of input dataset.
- out_path_in_file (str) – name of output dataset.
- out_chunks (tuple) – chunks of output dataset.
- n_threads (int) – number of threads used for converting.
- out_blocks (tuple) – block size used for converting, must be a multiple of out_chunks. If None, the chunk size will be used (default: None).
- **h5_kwargs – keyword arguments for h5py dataset, e.g. datatype or compression.
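The note about hdf5 lacking parallel write access can be made concrete: reads can proceed in parallel, but writes must be serialized, for example behind a lock. This is an illustrative sketch on numpy arrays, not z5py's implementation; `dst` merely stands in for the hdf5 output dataset.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
import numpy as np

# Hypothetical sketch: reads happen in parallel, but writes to the hdf5
# output are serialized with a lock, which is why more threads may not help.
write_lock = threading.Lock()
src = np.arange(100).reshape(10, 10)
dst = np.zeros_like(src)  # stands in for the hdf5 output dataset

def copy_rows(i):
    block = src[i:i + 2].copy()   # parallel read from n5/zarr
    with write_lock:              # serialized write to hdf5
        dst[i:i + 2] = block

with ThreadPoolExecutor(4) as tp:
    list(tp.map(copy_rows, range(0, 10, 2)))
```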
z5py.dataset module¶
-
class
z5py.dataset.
Dataset
(path, dset_impl, n_threads=1)[source]¶ Bases:
object
Dataset for access to data on disc.
Should not be instantiated directly, but rather be created or opened via create_dataset, require_dataset or the [] operator of File or Group.
-
array_to_format
(array)[source]¶ Convert array to serialization.
Convert an array to the (1d) binary data that would be serialized to disc for the format of the dataset.
Parameters: array (np.ndarray) – array to be converted to serialization. Returns: np.ndarray
-
attrs
¶ The AttributeManager of this dataset.
-
chunk_exists
(chunk_indices)[source]¶ Check if chunk has data.
Check for the given indices if the chunk has data.
Parameters: chunk_indices (tuple) – chunk indices. Returns: bool
-
chunks
¶ Chunks of this dataset.
-
chunks_per_dimension
¶ Number of chunks in each dimension of this dataset.
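The relationship between shape, chunks, chunks_per_dimension and number_of_chunks can be sketched in pure python. The function names mirror the properties above, but this is an illustration, not z5py's internal code.

```python
# Pure-python sketch of the chunk grid; illustrative, not z5py internals.
def chunks_per_dimension(shape, chunks):
    # ceil division: a partial chunk at the border still counts as a chunk
    return tuple((s + c - 1) // c for s, c in zip(shape, chunks))

def number_of_chunks(shape, chunks):
    n = 1
    for c in chunks_per_dimension(shape, chunks):
        n *= c
    return n
```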
-
compression_options
¶ Compression library options of this dataset.
-
compressors_n5
= ['raw', 'gzip', 'bzip2', 'xz', 'lz4']¶ Compression libraries supported by n5 format
-
compressors_zarr
= ['raw', 'blosc', 'zlib', 'bzip2']¶ Compression libraries supported by zarr format
-
dtype
¶ Datatype of this dataset.
-
find_maximum_coordinates
(dim)[source]¶ Find coordinates of chunk with largest coordinate along dimension.
Only considers chunks that contain data.
Parameters: dim (int) – query dimension. Returns: start coordinates of the chunk. Return type: tuple
-
find_minimum_coordinates
(dim)[source]¶ Find coordinates of chunk with smallest coordinate along dimension.
Only considers chunks that contain data.
Parameters: dim (int) – query dimension. Returns: start coordinates of the chunk. Return type: tuple
-
index_to_roi
(index)[source]¶ Convert index to region of interest.
Convert an index, which can be a slice or a tuple of slices / ellipsis to a region of interest. The roi consists of the region offset and the region shape.
Parameters: index (slice or tuple) – index into dataset. Returns: offset of the region of interest. tuple: shape of the region of interest. Return type: tuple
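The slice-to-roi conversion can be sketched as follows. Ellipsis and step handling are omitted for brevity, and the function is an illustration of the documented behaviour rather than the actual implementation.

```python
# Illustrative sketch of the slice -> (offset, shape) conversion; ellipsis
# and step handling are omitted for brevity.
def index_to_roi(index, shape):
    if not isinstance(index, tuple):
        index = (index,)
    # pad missing trailing dimensions with full slices
    index = index + (slice(None),) * (len(shape) - len(index))
    offset = tuple(0 if sl.start is None else sl.start for sl in index)
    roi_shape = tuple((dim if sl.stop is None else sl.stop) - off
                      for sl, off, dim in zip(index, offset, shape))
    return offset, roi_shape
```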
-
is_zarr
¶ Flag to indicate zarr or n5 format of this dataset.
-
n5_default_compressor
= 'gzip'¶ Default compression for n5 format
-
ndim
¶ Number of dimensions of this dataset.
-
number_of_chunks
¶ Total number of chunks of this dataset.
-
read_subarray
(start, stop)[source]¶ Read subarray from region of interest.
Region of interest is defined by start and stop and must be in bounds of the dataset.
Parameters: - start (tuple) – start coordinates of the roi.
- stop (tuple) – stop coordinates of the roi.
Returns: np.ndarray
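On an in-memory array, the start/stop semantics correspond to plain slicing, as this sketch shows; the real method additionally reads only the chunks overlapping the roi from disc. `read_roi` is a hypothetical helper, not the z5py API.

```python
import numpy as np

# Sketch of read_subarray semantics on an in-memory array (illustrative).
def read_roi(data, start, stop):
    # the roi must be in bounds of the dataset
    assert all(0 <= b and e <= s for b, e, s in zip(start, stop, data.shape))
    return data[tuple(slice(b, e) for b, e in zip(start, stop))]

data = np.arange(24).reshape(4, 6)
roi = read_roi(data, start=(1, 2), stop=(3, 5))
```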
-
shape
¶ Shape of this dataset.
-
size
¶ Size (total number of elements) of this dataset.
-
write_subarray
(start, data)[source]¶ Write subarray to dataset.
data is written to the region of interest defined by start and the shape of data. The region of interest must be in bounds of the dataset and the datatype must agree with the dataset.
Parameters: - start (tuple) – offset of the roi to write.
- data (np.ndarray) – data to write; shape determines the roi shape.
-
zarr_default_compressor
= 'blosc'¶ Default compression for zarr format
-
z5py.file module¶
-
class
z5py.file.
File
(path, use_zarr_format=None, mode='a')[source]¶ Bases:
z5py.group.Group
File to access zarr or n5 containers on disc.
The container corresponds to a directory on the filesystem. Groups are subdirectories and datasets are subdirectories that contain multi-dimensional data stored in binary format. Supports python dict api.
Parameters: - path (str) – path on filesystem that holds the container.
- use_zarr_format (bool) – flag to determine if container is zarr or n5 (default: None).
- mode (str) – file mode used to open / create the file (default: ‘a’).
-
classmethod
infer_format
(path)[source]¶ Infer the file format from the file extension.
Returns: True for zarr, False for n5 and None if the format could not be inferred. Return type: bool
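Using the n5_exts and zarr_exts values documented below, the inference logic can be sketched in a few lines; this mirrors the documented return values but is a reimplementation sketch, not the library code.

```python
import os

# Extension sets as documented for z5py.file.File
N5_EXTS = {'.n5'}
ZARR_EXTS = {'.zr', '.zarr'}

def infer_format(path):
    # True for zarr, False for n5, None if the extension is unknown
    ext = os.path.splitext(path)[1]
    if ext in ZARR_EXTS:
        return True
    if ext in N5_EXTS:
        return False
    return None
```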
-
n5_exts
= {'.n5'}¶ file extensions that are inferred as n5 file
-
zarr_exts
= {'.zr', '.zarr'}¶ file extensions that are inferred as zarr file
-
class
z5py.file.
N5File
(path, mode='a')[source]¶ Bases:
z5py.file.File
File to access n5 containers on disc.
Parameters: - path (str) – path on filesystem that holds the container.
- mode (str) – file mode used to open / create the file (default: ‘a’).
-
class
z5py.file.
ZarrFile
(path, mode='a')[source]¶ Bases:
z5py.file.File
File to access zarr containers on disc.
Parameters: - path (str) – path on filesystem that holds the container.
- mode (str) – file mode used to open / create the file (default: ‘a’).
z5py.group module¶
-
class
z5py.group.
Group
(path, is_zarr=True, mode='a')[source]¶ Bases:
_abcoll.Mapping
Group inside of a z5py container.
Corresponds to a directory on the filesystem. Supports python dict api. Should not be instantiated directly, but rather be created or opened via the create_group, require_group or [] operators of Group or File.
-
attrs
¶ Access additional attributes.
Returns: AttributeManager.
-
create_dataset
(name, shape=None, dtype=None, data=None, chunks=None, compression=None, fillvalue=0, n_threads=1, **compression_options)[source]¶ Create a new dataset.
Create a new dataset in the group. Syntax and behaviour are similar to the corresponding h5py functionality. In contrast to h5py, there is no option to store a dataset without chunking (if no chunks are given, default values suitable for the dimension of the dataset will be used). Also, if a dataset is created with data and a dtype different from the data's is specified, the function throws a RuntimeError instead of converting the data.
Parameters: - name (str) – name of the new dataset.
- shape (tuple) – shape of the new dataset. If no shape is given, the data argument must be given (default: None).
- dtype (str or np.dtype) – datatype of the new dataset. If no dtype is given, the data argument must be given (default: None).
- data (np.ndarray) – data used to infer shape, dtype and fill the dataset upon creation (default: None).
- chunks (tuple) – chunk sizes of the new dataset. If no chunks are given, a suitable default value for the number of dimensions will be used (default: None).
- compression (str) – name of the compression library used to compress chunks. If no compression is given, the default for the current format is used (default: None).
- fillvalue (float) – fillvalue for empty chunks (only zarr) (default: 0).
- n_threads (int) – number of threads used for chunk I/O (default: 1).
- **compression_options – options for the compression library.
Returns: the new dataset.
Return type: Dataset
-
create_group
(name)[source]¶ Create a new group.
Create new (sub-)group of the group. Fails if a group of this name already exists.
Parameters: name (str) – name of the new group. Returns: group of the requested name. Return type: Group
-
file_modes
= {'a': FileMode.a, 'r': FileMode.r, 'r+': FileMode.r_p, 'w': FileMode.w, 'w-': FileMode.w_m, 'x': FileMode.w_m}¶ available modes for opening files. These correspond to the h5py file modes.
-
require_dataset
(name, shape, dtype=None, chunks=None, n_threads=1, **kwargs)[source]¶ Require dataset.
Require dataset in the group. Will create the dataset if it does not exist, otherwise returns the existing dataset. If the dataset already exists, consistency with the arguments shape, dtype (if given) and chunks (if given) is enforced.
Parameters: - name (str) – name of the dataset.
- shape (tuple) – shape of the dataset.
- dtype (str or np.dtype) – datatype of dataset (default: None).
- chunks (tuple) – chunk sizes of the dataset (default: None).
- n_threads (int) – number of threads used for chunk I/O (default: 1).
- **kwargs – additional arguments that will only be used for creation if the dataset does not exist.
Returns: the required dataset.
Return type: Dataset
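The create-or-check-consistency behaviour can be sketched with a plain dict playing the role of the group; the function and the metadata layout are illustrative stand-ins, not the z5py implementation.

```python
# In-memory stand-in for the consistency check; 'store' is a plain dict
# playing the role of the group, not a real z5py object.
def require_dataset(store, name, shape, dtype=None, chunks=None):
    if name not in store:
        store[name] = {'shape': shape, 'dtype': dtype, 'chunks': chunks}
        return store[name]
    ds = store[name]
    if ds['shape'] != shape:
        raise RuntimeError("shape does not match existing dataset")
    if dtype is not None and ds['dtype'] != dtype:
        raise RuntimeError("dtype does not match existing dataset")
    if chunks is not None and ds['chunks'] != chunks:
        raise RuntimeError("chunks do not match existing dataset")
    return ds

store = {}
require_dataset(store, 'seg', (64, 64), dtype='uint32', chunks=(32, 32))
```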
-
z5py.util module¶
-
z5py.util.
rechunk
(in_path, out_path, in_path_in_file, out_path_in_file, out_chunks, n_threads, out_blocks=None, dtype=None, use_zarr_format=None, **new_compression)[source]¶ Copy and rechunk a dataset.
The input dataset will be copied to the output dataset chunk by chunk. The datatype, file format and compression can be changed as well.
Parameters: - in_path (str) – path to the input file.
- out_path (str) – path to the output file.
- in_path_in_file (str) – name of input dataset.
- out_path_in_file (str) – name of output dataset.
- out_chunks (tuple) – chunks of the output dataset.
- n_threads (int) – number of threads used for copying.
- out_blocks (tuple) – blocks used for copying. Must be a multiple of out_chunks, which are used by default (default: None).
- dtype (str) – datatype of the output dataset, default does not change datatype (default: None).
- use_zarr_format (bool) – file format of the output file, default does not change format (default: None).
- **new_compression – compression library and options for output dataset. If not given, the same compression as in the input is used.
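The chunk-by-chunk copy with an optional datatype change can be sketched on in-memory arrays; `rechunk_array` and its 2d loop are illustrative, not the z5py.util API, and the real function also changes the on-disc chunk layout and compression.

```python
import numpy as np

# In-memory sketch of the chunk-by-chunk copy with an optional dtype change;
# rechunk_array is illustrative, not the z5py.util API.
def rechunk_array(src, out_chunks, dtype=None):
    out = np.zeros(src.shape, dtype=src.dtype if dtype is None else dtype)
    for i in range(0, src.shape[0], out_chunks[0]):      # 2d for brevity
        for j in range(0, src.shape[1], out_chunks[1]):
            sl = np.s_[i:i + out_chunks[0], j:j + out_chunks[1]]
            out[sl] = src[sl].astype(out.dtype)
    return out

src = np.arange(64, dtype='float64').reshape(8, 8)
out = rechunk_array(src, out_chunks=(3, 3), dtype='float32')
```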