The fill rate of a matrix is the ratio between non-zero and zero elements. If the latter significantly outweigh the former, we speak of sparse matrices. Depending on the sparsity pattern, some storage formats are more efficient than others. Nevertheless, a sparse matrix is an object composed of multiple fields, as opposed to a single contiguous memory location with a homogeneous type.
Netlib considers the following sparse storage formats:
|Compressed Sparse Row||
|Compressed Sparse Column||
|Block Compressed Sparse Storage||
|Compressed Diagonal Storage||
|Jagged Diagonal Storage||
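To make the "multiple fields" point concrete, here is a minimal sketch that converts a small dense matrix to Compressed Sparse Row form using only the Python standard library. The helper names `dense_to_csr` and `fill_ratio` and the sample matrix are hypothetical, chosen for illustration; the three-array layout (`data`, `indices`, `indptr`) follows the usual CSR convention.

```python
def dense_to_csr(dense):
    """Convert a list-of-lists dense matrix to CSR arrays.

    Hypothetical helper for illustration. Returns (data, indices, indptr, shape).
    """
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # the non-zero value itself
                indices.append(col)   # its zero-based column index
        indptr.append(len(data))      # where the next row starts in data
    n_cols = len(dense[0]) if dense else 0
    return data, indices, indptr, (len(dense), n_cols)


def fill_ratio(data, shape):
    """Non-zero count divided by zero count, as defined in the text."""
    zeros = shape[0] * shape[1] - len(data)
    return len(data) / zeros if zeros else float("inf")


# Made-up 4 x 4 example with four non-zero entries.
dense = [
    [5, 0, 0, 0],
    [0, 0, 8, 0],
    [0, 0, 0, 0],
    [0, 3, 0, 6],
]
data, indices, indptr, shape = dense_to_csr(dense)
print(data)     # [5, 8, 3, 6]
print(indices)  # [0, 2, 1, 3]
print(indptr)   # [0, 1, 2, 2, 4]
print(fill_ratio(data, shape))  # 4 non-zeros over 12 zeros
```

Note that the empty third row shows up only as a repeated value in `indptr`. This is why the format needs several arrays of different lengths rather than one homogeneous block.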
Multi Dataset Storage Format
Single Dataset Storage Format
TODO: write code and documentation
Interop With Other Systems
has no direct support to save / load sparse matrices
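
    import scipy.sparse as sp_sparse
    import tables

    # `filename` is the path to the HDF5 file and must be defined beforehand
    with tables.open_file(filename, 'r') as f:
        # all components of the sparse matrix live under the `matrix` group
        mat_group = f.get_node(f.root, 'matrix')
        data = getattr(mat_group, 'data').read()
        indices = getattr(mat_group, 'indices').read()
        indptr = getattr(mat_group, 'indptr').read()
        shape = getattr(mat_group, 'shape').read()
        # reassemble the compressed sparse column matrix from its fields
        matrix = sp_sparse.csc_matrix((data, indices, indptr), shape=shape)
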
`_refs` directory. The screenshot shows the sparse matrices A and B saved in Julia, and a Python `h5sparse` counterpart for comparison. On the bright side, the Julia HDF5 package is full-featured, and it is possible to load the sparse matrices into h5py.

    using JLD, SparseArrays

    A = sprand(Float64, 10,20, 0.1)
    B = sprand(Float64, 10,20, 0.1)
    @save "interop.h5" "data-01/A" A "data-02/B" B

Loom is an efficient file format for large omics datasets. Loom files contain a main matrix, optional additional layers, a variable number of row and column annotations, and sparse graph objects. Under the hood, Loom files are HDF5 and can be opened from many programming languages, including Python, R, C, C++, Java, MATLAB, Mathematica, and Julia.
The top level of the file contains a single HDF5 group, called matrix, and metadata stored as HDF5 attributes. Within the matrix group are datasets containing the dimensions of the matrix, the matrix entries, as well as the features and cell-barcodes associated with the matrix rows and columns, respectively.
| Name | Type | Description |
|---|---|---|
| barcodes | string | Barcode sequences and their corresponding GEM wells (e.g. AAACGGGCAGCTCGAC-1) |
| data | uint32 | Nonzero UMI counts in column-major order |
| indices | uint32 | Zero-based row index of corresponding element in data |
| indptr | uint32 | Zero-based index into data / indices of the start of each column, i.e., the data corresponding to each barcode sequence |
| shape | uint64 | Tuple of (# rows, # columns) indicating the matrix dimensions |
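
The way `data`, `indices`, and `indptr` work together can be sketched in plain Python. The function name `csc_to_triples` and the sample arrays below are made up for illustration; only the field semantics are taken from the descriptions above.

```python
def csc_to_triples(data, indices, indptr, shape):
    """Expand CSC arrays into (row, column, value) triples.

    Column j owns the slice data[indptr[j]:indptr[j + 1]]; the matching
    entries of `indices` hold the zero-based row numbers.
    """
    n_rows, n_cols = shape
    if len(indptr) != n_cols + 1:
        raise ValueError("indptr must have one entry per column plus one")
    triples = []
    for col in range(n_cols):
        for k in range(indptr[col], indptr[col + 1]):
            triples.append((indices[k], col, data[k]))
    return triples


# Made-up example: 3 features (rows) x 4 barcodes (columns).
data = [2, 1, 7, 4]
indices = [0, 2, 1, 0]
indptr = [0, 2, 2, 3, 4]
shape = (3, 4)

triples = csc_to_triples(data, indices, indptr, shape)
print(triples)  # [(0, 0, 2), (2, 0, 1), (1, 2, 7), (0, 3, 4)]
```

The second barcode (column 1) has no counts, which appears as two equal consecutive values in `indptr`.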
Material based on Netlib documentation.