Arithmetic circuits and methods that perform efficient matrix multiplication for hardware acceleration of neural networks, machine learning, web search and other applications are disclosed herein. Various arrays of multiplier-accumulators may be coupled to form a matrix multiplier which processes data using high precision, fixed point residue number arithmetic.