1. Introduction
Elimination and inversion reduce to panel factorizations and trailing matrix–matrix updates [1, 2]. Our bit-sliced (bitplane) GEMM reconstructs integer products from Boolean plane products using AND+POPCNT and power-of-two shifts, thus removing all scalar multiplications from matrix products. This paper applies the same mechanism to Schur-complement updates in elimination/inversion and combines it with exact pivot regimes: Gauss–Jordan over GF(2), Bareiss over $\mathbb{Z}$, and modular LU + CRT. We emphasize exactness and reproducibility with simple Python reference code.
Contributions.
(1) Bit-sliced Boolean trailing updates inside blocked LU (zero scalar multiplications for all GEMMs); (2) a pure-Boolean Gauss–Jordan over GF(2); (3) fraction-free Bareiss with bit-sliced numerators; (4) modular LU + CRT path with bit-sliced updates modulo primes; (5) executable code and verified small-case numerics.
2. Related Work
Foundational accuracy and stability are treated by Higham and by Golub–Van Loan [1, 2]. Bareiss introduced fraction-free elimination [3]. CRT and residue-number reconstruction for exact linear algebra are classical [4, 5]. Bit-sliced/bit-serial multiplication appears in hardware/software works such as BISMO [6] and in popcount-oriented references [7, 8].
3. Method
3.1. Bit-Sliced Boolean GEMM (Recap)
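Let $A \in \mathbb{Z}_{\ge 0}^{m \times k}$, $B \in \mathbb{Z}_{\ge 0}^{k \times n}$, with bitplane expansions $A = \sum_{s} 2^{s} A^{(s)}$ and $B = \sum_{t} 2^{t} B^{(t)}$, where each $A^{(s)}$ and $B^{(t)}$ is a 0/1 matrix. With row-packing for $A^{(s)}$ and column-packing for $B^{(t)}$ along the shared inner dimension,
$$C_{ij} = (AB)_{ij} = \sum_{s,t} 2^{s+t}\, \operatorname{popcount}\big(A^{(s)}_{i,:} \wedge B^{(t)}_{:,j}\big),$$
using only Boolean operations and shifts. Exactness holds in integer/modular settings provided accumulator widths (or CRT) are sufficient.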
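The recap above can be illustrated with a minimal pure-Python sketch. It is not the paper's appendix code: the names `popcount`, `pack_planes`, and `bitsliced_gemm` are hypothetical, Python integers stand in for packed machine words, and entries are assumed to be non-negative and below `2**bits`.

```python
def popcount(x):
    """Count set bits of a non-negative int (portable across Python 3 versions)."""
    return bin(x).count("1")


def pack_planes(vec, bits):
    """Return `bits` masks; bit t of mask s equals bit s of vec[t]."""
    masks = [0] * bits
    for t, v in enumerate(vec):
        for s in range(bits):
            if (v >> s) & 1:
                masks[s] |= 1 << t
    return masks


def bitsliced_gemm(A, B, bits=8):
    """Product of non-negative integer matrices with entries below 2**bits."""
    k, n = len(B), len(B[0])
    # Row-pack A and column-pack B along the shared inner dimension.
    rows = [pack_planes(row, bits) for row in A]
    cols = [pack_planes([B[t][j] for t in range(k)], bits) for j in range(n)]
    C = []
    for ra in rows:
        out = []
        for cb in cols:
            acc = 0
            for s in range(bits):
                for t in range(bits):
                    # AND + POPCNT is the 0/1 dot product of two planes;
                    # the shift restores the weight 2**(s + t).
                    acc += popcount(ra[s] & cb[t]) << (s + t)
            out.append(acc)
        C.append(out)
    return C


print(bitsliced_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Python integers are unbounded, so the accumulator-width condition is met automatically here; a fixed-width implementation would have to size `acc` for the largest possible sum.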
3.2. Where GEMM Appears in Elimination
In a blocked LU, after forming a panel with pivots, the trailing update is
$$A_{22} \leftarrow A_{22} - L_{21} U_{12},$$
a GEMM addressed by the bit-sliced core. Pivoting (row swaps) is compatible; we keep all panel solves and scalar inverses outside the matrix-product core.
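To make the placement of the GEMM concrete, here is a hedged sketch of a right-looking blocked LU modulo a prime, in which only the trailing update goes through a bit-sliced product. Several choices are assumptions made for illustration: the names `bool_gemm` and `blocked_lu_mod_p`, the modular setting (chosen so that all operands are non-negative), and the absence of pivoting (nonzero pivots are assumed).

```python
def bool_gemm(A, B, bits):
    """Bit-sliced product of non-negative integer matrices (AND, popcount, shifts)."""
    k, n = len(B), len(B[0])

    def planes(vec):
        # Mask s collects bit s of every entry of vec.
        return [sum(((v >> s) & 1) << t for t, v in enumerate(vec)) for s in range(bits)]

    rows = [planes(r) for r in A]
    cols = [planes([B[t][j] for t in range(k)]) for j in range(n)]
    return [[sum(bin(ra[s] & cb[t]).count("1") << (s + t)
                 for s in range(bits) for t in range(bits))
             for cb in cols] for ra in rows]


def blocked_lu_mod_p(A, p, block=2):
    """Blocked LU modulo a prime p without pivoting (assumes nonzero pivots).

    Returns one matrix holding unit-lower L below the diagonal and U on and above it.
    """
    n = len(A)
    M = [[x % p for x in row] for row in A]
    bits = p.bit_length()
    for k0 in range(0, n, block):
        k1 = min(k0 + block, n)
        # Panel factorization: scalar work, kept outside the GEMM core.
        for j in range(k0, k1):
            if M[j][j] == 0:
                raise ZeroDivisionError("zero pivot; this sketch does not pivot")
            inv = pow(M[j][j], p - 2, p)  # modular inverse via Fermat (p prime)
            for i in range(j + 1, n):
                M[i][j] = M[i][j] * inv % p
                for c in range(j + 1, k1):
                    M[i][c] = (M[i][c] - M[i][j] * M[j][c]) % p
        # Panel solve: U12 = L11^{-1} A12 by forward substitution.
        for j in range(k0, k1):
            for i in range(j + 1, k1):
                for c in range(k1, n):
                    M[i][c] = (M[i][c] - M[i][j] * M[j][c]) % p
        # Trailing update A22 <- A22 - L21 U12: the only GEMM in the loop.
        if k1 < n:
            L21 = [row[k0:k1] for row in M[k1:]]
            U12 = [row[k1:] for row in M[k0:k1]]
            P = bool_gemm(L21, U12, bits)
            for i in range(n - k1):
                for c in range(n - k1):
                    M[k1 + i][k1 + c] = (M[k1 + i][k1 + c] - P[i][c]) % p
    return M


A = [[2, 1, 1, 0], [4, 3, 3, 1], [8, 7, 9, 5], [6, 7, 9, 8]]
print(blocked_lu_mod_p(A, 101))
# [[2, 1, 1, 0], [2, 1, 1, 1], [4, 3, 2, 2], [3, 4, 1, 2]]
```

The scalar multiplications that remain sit in the panel factorization and the panel solve, which matches the split described above; only the matrix product itself is multiplication-free.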
3.3. Exact Regimes
GF(2) Gauss–Jordan. Addition is XOR and the only nonzero pivot is 1 (with $1^{-1} = 1$), so row scaling vanishes; the algorithm is purely Boolean.
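A minimal sketch of this regime, assuming each matrix row is stored as a Python integer whose bit `j` holds entry `(i, j)`; the function name `gf2_inverse` and this packing convention are illustrative choices, not the paper's appendix code.

```python
def gf2_inverse(rows, n):
    """Invert an n x n matrix over GF(2) using only shifts, AND, and XOR.

    rows[i] is an int whose bit j is entry (i, j). Raises ValueError if singular.
    """
    # Augment [A | I]: the identity lives in bits n .. 2n-1 of each row.
    aug = [rows[i] | (1 << (n + i)) for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if (aug[r] >> col) & 1), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # The pivot is already 1, so no row scaling is needed; eliminate by XOR.
        for r in range(n):
            if r != col and (aug[r] >> col) & 1:
                aug[r] ^= aug[col]
    return [a >> n for a in aug]


# A = [[1,1,0],[0,1,1],[0,0,1]] packed as rows 0b011, 0b110, 0b100.
print(gf2_inverse([0b011, 0b110, 0b100], 3))  # [7, 6, 4]
```

The result `[7, 6, 4]` is the all-ones upper-triangular matrix, which is the GF(2) inverse of the bidiagonal input.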
Bareiss (fraction-free). Divisions are exact by construction; the numerators are bilinear in the matrix entries and can therefore be formed by the bit-sliced core; exactness follows from Bareiss divisibility [3].
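The Bareiss step can be sketched as follows. This is an illustrative determinant routine with the hypothetical name `bareiss_det`; it forms the bilinear numerator with ordinary integer multiplication for brevity (the bit-sliced substitution is not shown) and checks the exact divisibility explicitly. A non-empty square integer matrix is assumed.

```python
def bareiss_det(A):
    """Determinant of a non-empty square integer matrix by fraction-free elimination."""
    M = [row[:] for row in A]
    n = len(M)
    sign, prev = 1, 1  # prev is the previous pivot (1 before the first step)
    for k in range(n - 1):
        if M[k][k] == 0:
            # Row swap to bring a nonzero entry into the pivot position.
            swap = next((r for r in range(k + 1, n) if M[r][k] != 0), None)
            if swap is None:
                return 0
            M[k], M[swap] = M[swap], M[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # Bilinear numerator; Bareiss guarantees prev divides it exactly.
                num = M[i][j] * M[k][k] - M[i][k] * M[k][j]
                assert num % prev == 0
                M[i][j] = num // prev
        prev = M[k][k]
    return sign * M[n - 1][n - 1]


print(bareiss_det([[2, 1, 1, 0], [4, 3, 3, 1], [8, 7, 9, 5], [6, 7, 9, 8]]))  # 8
```

Every intermediate entry stays an integer, so no rational arithmetic is needed at any point.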
Modular LU + CRT. Perform LU modulo several primes; panel inverses are modular scalars; updates are bit-sliced modulo $p$; reconstruct over $\mathbb{Z}$ via CRT once the product of primes exceeds a bound [4, 5].
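As a hedged illustration of the reconstruction step, the sketch below recovers an integer determinant (chosen only because it is the simplest quantity to reconstruct) from elimination modulo several primes. The names `det_mod_p`, `crt_pair`, and `det_via_crt`, the prime list, and the bound (product of row 1-norms, a valid but loose upper bound on $|\det A|$) are all assumptions made for this example; the elimination uses ordinary modular arithmetic rather than bit-sliced updates.

```python
def det_mod_p(A, p):
    """Determinant of an integer matrix modulo a prime p, by pivoted elimination."""
    M = [[x % p for x in row] for row in A]
    n = len(M)
    det = 1
    for k in range(n):
        pivot = next((r for r in range(k, n) if M[r][k]), None)
        if pivot is None:
            return 0
        if pivot != k:
            M[k], M[pivot] = M[pivot], M[k]
            det = -det
        det = det * M[k][k] % p
        inv = pow(M[k][k], p - 2, p)  # modular scalar inverse (p prime)
        for i in range(k + 1, n):
            f = M[i][k] * inv % p
            for j in range(k, n):
                M[i][j] = (M[i][j] - f * M[k][j]) % p
    return det % p


def crt_pair(r1, m1, r2, p2):
    """Combine x = r1 (mod m1) and x = r2 (mod p2); p2 is a prime not dividing m1."""
    t = (r2 - r1) * pow(m1 % p2, p2 - 2, p2) % p2
    return r1 + m1 * t, m1 * p2


def det_via_crt(A, primes):
    """Reconstruct det(A) over the integers from residues modulo distinct primes."""
    bound = 1
    for row in A:
        bound *= sum(abs(x) for x in row)  # |det A| <= product of row 1-norms
    r, m = 0, 1
    for p in primes:
        r, m = crt_pair(r, m, det_mod_p(A, p), p)
        if m > 2 * bound:
            # Map the residue into the symmetric range to recover the sign.
            return r - m if r > m // 2 else r
    raise ValueError("product of primes does not exceed twice the bound")


A = [[2, 1, 1, 0], [4, 3, 3, 1], [8, 7, 9, 5], [6, 7, 9, 8]]
print(det_via_crt(A, [101, 103, 107]))  # 8
```

The factor of two in the stopping test leaves room for negative results, which are read off from the upper half of the residue range.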
4. Results: Multiplication Counts
We count matrix–matrix multiplications (GEMMs) only. Trailing updates in a blocked LU are 1 GEMM per block traditionally and 0 in the bit-sliced model (Boolean GEMM). Gauss–Jordan over GF(2) uses row ops/XOR only (0 GEMMs).
| Task | Traditional GEMMs | Bit-sliced GEMMs |
|---|---|---|
| Trailing update | 1 per block | 0 |
| Blocked LU (integer/mod p) | many | 0 (all trailing) |
| Gauss–Jordan over GF(2) | 0 | 0 |
5. Numerical Illustrations
Executable tests: (i) bit-sliced GEMM equals NumPy integer GEMM; (ii) GF(2) inverse on an invertible example; (iii) blocked LU driver that calls Boolean GEMM for each trailing update and reports GEMM counts; (iv) modular CRT inverse sanity check; and (v) Bareiss step with exact divisibility.
6. Discussion and Limitations
The bit-sliced model eliminates scalar multiplications from all matrix products within elimination/inversion. Exactness over $\mathbb{Z}$ follows by Bareiss or modular LU + CRT. Practical speed depends on POPCNT throughput and bandwidth. Large problems need careful blocking, packing, and prime/bound selection for CRT.
Funding
No external funding was received.
Data and Code Availability
All illustrative code is in the Appendix.
AI Assistance Disclosure
Drafting and editing were assisted by an AI system; the author is responsible for all content and claims.
Conflicts of Interest
The author declares no conflicts of interest.
Appendix A Python Reference (Executable)
Appendix A.1. Bit-Sliced Boolean GEMM, GF(2) Inverse, Blocked LU Driver, CRT Inverse, Bareiss, Tests




References
- N. J. Higham. Accuracy and Stability of Numerical Algorithms, 2nd ed. SIAM, 2002.
- G. H. Golub and C. F. Van Loan. Matrix Computations, 4th ed. Johns Hopkins, 2013.
- E. H. Bareiss. Sylvester’s Identity and Multistep Integer-Preserving Gaussian Elimination. Math. Comp. 1968, 22, 565–578.
- H. Garner. The Residue Number System. IRE Trans. Electronic Computers 1959, EC-8, 140–147.
- R. Crandall and C. Pomerance. Prime Numbers: A Computational Perspective, 2nd ed. Springer, 2005.
- Y. Umuroglu and M. Jahre. BISMO: A Scalable Bit-Serial Matrix Multiplication Overlay for Reconfigurable Computing. FPL, 2018.
- H. S. Warren. Hacker’s Delight, 2nd ed. Addison-Wesley, 2012.
- D. E. Knuth. The Art of Computer Programming, Vol. 4A. Addison-Wesley, 2011.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).