P03 - Pylspack: Fast Parallel Algorithms, Data Structures and Software for Sparse Matrix Sketching, Column Subset Selection, Regression and Leverage Scores
Presenter
Description
In recent work, we developed novel parallel algorithms, data structures and software implementations for three fundamental operations in Numerical Linear Algebra: (i) matrix sketching, (ii) computation of the Gram matrix and (iii) computation of the squared row norms of the product of two matrices. This presentation focuses on the ubiquitous Gaussian and CountSketch random projections, as well as their combination. We present data structures for storing such random projection matrices that are memory efficient and fully parallelizable, both to construct and to multiply with a dense or sparse input matrix. We show how these results can be applied to other important problems, namely column subset selection, least squares regression and leverage score estimation. We also present details of our publicly available implementation (https://github.com/IBM/pylspack), the Pylspack Python package, whose core is written in C++ and parallelized with OpenMP. We show that our implementations outperform existing state-of-the-art libraries, namely the corresponding implementations from libskylark (https://xdata-skylark.github.io/libskylark/) and scikit-learn (https://scikit-learn.org) for the same tasks. Pylspack is fully compatible with standard numerical packages like SciPy and NumPy and is easily obtained via a single command: pip install git+https://github.com/IBM/pylspack.
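To make the operations named in the description concrete, the following is a minimal serial sketch in plain NumPy and SciPy. It is not Pylspack's API or its parallel C++ implementation: all function names, the sketch sizes r and m, and the test matrix are assumptions chosen for illustration. It builds a CountSketch matrix and a Gaussian matrix, applies their combination to a sparse input, and uses the result to estimate leverage scores as squared row norms of a matrix product.

```python
import numpy as np
import scipy.sparse as sp


def countsketch_matrix(r, n, rng):
    """Sparse r x n CountSketch: each column holds a single +1 or -1 in a random row."""
    rows = rng.integers(0, r, size=n)
    signs = rng.choice(np.array([-1.0, 1.0]), size=n)
    return sp.csr_matrix((signs, (rows, np.arange(n))), shape=(r, n))


def gaussian_matrix(m, r, rng):
    """Dense m x r Gaussian projection, scaled by 1/sqrt(m)."""
    return rng.standard_normal((m, r)) / np.sqrt(m)


def sketch(A, r, m, rng):
    """Compute G @ (S @ A): CountSketch first, then a Gaussian projection."""
    S = countsketch_matrix(r, A.shape[0], rng)
    SA = S @ A
    if sp.issparse(SA):
        SA = SA.toarray()  # r x d intermediate, small enough to densify
    return gaussian_matrix(m, r, rng) @ SA


def squared_row_norms(A, B):
    """Squared Euclidean norms of the rows of the product A @ B."""
    AB = A @ B
    return np.einsum("ij,ij->i", AB, AB)


def approx_leverage_scores(A, r, m, rng):
    """Estimate leverage scores as squared row norms of A @ inv(R),
    where R is the triangular QR factor of the sketched matrix."""
    R = np.linalg.qr(sketch(A, r, m, rng), mode="r")
    return squared_row_norms(A, np.linalg.inv(R))


# Hypothetical example input: a tall, sparse 2000 x 5 matrix.
rng = np.random.default_rng(0)
A = sp.random(2000, 5, density=0.3, format="csr", random_state=0)
scores = approx_leverage_scores(A, r=400, m=100, rng=rng)
```

The exact leverage scores of a full-column-rank matrix sum to its number of columns, so the sum of `scores` should land near 5 here, with the deviation shrinking as r and m grow.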
Time
Tuesday, June 28, 9:00 - 11:00 CEST
Location
Foyer 2nd Floor
Event Type
Poster