Some embodiments provide a method for a neural network inference circuit (NNIC) that implements a neural network including multiple computation nodes at multiple layers. Each computation node computes a dot product of input values and weight values and then applies a set of post-processing operations to the result. The method retrieves a set of weight values and a set of input values for a computation node from a set of memories of the NNIC, computes the dot product of the retrieved weight values and input values, and performs the node's post-processing operations on the dot product result to compute the node's output value. The method then stores the output value in the set of memories. No intermediate results of the dot product computation or of the post-processing operations are stored in any RAM of the NNIC during this computation.
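The data flow described above can be illustrated with a minimal Python sketch. It is a software model under stated assumptions, not the patented circuit. The class `ToyNNIC` and its methods are hypothetical. A `dict` named `memory` stands in for the NNIC's set of memories. The post-processing chain (bias add, scale, ReLU, clamp to an unsigned 4-bit range) is an assumed example of "a set of post-processing operations". The dot-product accumulator and the post-processing values live only in local variables, which models the statement that no intermediate results are written to RAM. A write log confirms that only the final output value is stored.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNNIC:
    """Hypothetical software model of an NNIC (illustration only).

    `memory` stands in for the NNIC's set of memories; `ram_writes`
    logs every key written so we can check that only outputs are stored.
    """
    memory: dict = field(default_factory=dict)
    ram_writes: list = field(default_factory=list)

    def write(self, key, value):
        # Every store to the modeled memory goes through here and is logged.
        self.ram_writes.append(key)
        self.memory[key] = value

    def compute_node(self, weight_key, input_key, output_key,
                     bias=0, scale=1.0, out_bits=4):
        # Step 1: retrieve the node's weight and input values from memory.
        weights = self.memory[weight_key]
        inputs = self.memory[input_key]
        if len(weights) != len(inputs):
            raise ValueError("weight and input vectors must have equal length")

        # Step 2: dot product accumulated in a local variable, which plays
        # the role of an on-circuit register; nothing is written to memory.
        acc = 0
        for w, x in zip(weights, inputs):
            acc += w * x

        # Step 3: assumed post-processing chain applied directly to the
        # dot-product result: bias, scale, ReLU, then round and clamp.
        y = (acc + bias) * scale
        y = max(0.0, y)                           # ReLU
        y = min(round(y), (1 << out_bits) - 1)    # clamp to [0, 2**out_bits - 1]

        # Step 4: only the final output value is stored back to memory.
        self.write(output_key, y)
        return y
```

For example, with weights `[1, -2, 3]`, inputs `[2, 1, 1]`, and `bias=1`, the dot product is 3, the biased value is 4, and the node writes 4 to `memory["y0"]`. The only entry in `ram_writes` is `"y0"`.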