Patent attributes
An apparatus includes a memory and a circuit. The memory may be configured to store data. The circuit generally includes a local buffer. The circuit may be configured to (i) fetch all or a portion of a first array of values from the memory to the local buffer, (ii) fetch all or a portion of a second array of values from the memory to the local buffer, (iii) calculate an intermediate array of values by multiplying a converted version of the first array by a converted version of the second array, and (iv) calculate an output array comprising a plurality of output values based on values of the intermediate array and a predefined dimensional reduction.
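The four steps (i) through (iv) can be illustrated with a small software sketch. The abstract does not say what the "converted version" of each array is or which "predefined dimensional reduction" is used, so the choices here are assumptions made only for illustration: each two-dimensional array is converted to a three-dimensional array by replication, the intermediate array is their element-wise product, and the reduction is a sum over the shared middle dimension. Under those assumptions the output equals an ordinary matrix product. All function names and the dictionary standing in for the local buffer are hypothetical, and nothing here describes the actual circuit.

```python
# Illustrative sketch only. The conversion (replication to 3-D) and the
# reduction (sum over the shared dimension) are assumptions, not taken
# from the abstract.

def convert_first(a, n):
    """Assumed conversion: M x K array -> M x K x N by replicating each value N times."""
    return [[[value] * n for value in row] for row in a]

def convert_second(b, m):
    """Assumed conversion: K x N array -> M x K x N by replicating the whole array M times."""
    return [[row[:] for row in b] for _ in range(m)]

def multiply_elementwise(x, y):
    """Intermediate array: element-wise product of two equally shaped 3-D arrays."""
    return [
        [[xv * yv for xv, yv in zip(x_row, y_row)] for x_row, y_row in zip(x_plane, y_plane)]
        for x_plane, y_plane in zip(x, y)
    ]

def reduce_sum_middle(t):
    """Assumed dimensional reduction: M x K x N -> M x N by summing over K."""
    k, n = len(t[0]), len(t[0][0])
    return [[sum(plane[kk][j] for kk in range(k)) for j in range(n)] for plane in t]

def compute_output(memory_first, memory_second):
    # Steps (i) and (ii): copy both arrays from "memory" into a stand-in local buffer.
    local_buffer = {
        "first": [row[:] for row in memory_first],
        "second": [row[:] for row in memory_second],
    }
    m = len(local_buffer["first"])
    n = len(local_buffer["second"][0])
    # Step (iii): multiply the converted arrays to form the intermediate array.
    intermediate = multiply_elementwise(
        convert_first(local_buffer["first"], n),
        convert_second(local_buffer["second"], m),
    )
    # Step (iv): reduce the intermediate array to the output array.
    return reduce_sum_middle(intermediate)

result = compute_output([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)  # [[19, 22], [43, 50]]
```

With these assumed choices, the 2 x 2 x 2 intermediate array holds every pairwise product, and summing over the middle dimension collapses it to the 2 x 2 output. A different conversion or reduction would yield a different operation from the same four-step structure.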