In one example, an apparatus comprises: a local on-chip memory; a computation engine configured to generate local data and to store the local data at the local on-chip memory; and a controller. The apparatus is configured to be coupled with a second device via an interconnect, the second device comprising a local memory. The controller is configured to: fetch the local data from the local on-chip memory; fetch remote data generated by another device from a local off-chip memory; generate output data based on combining the local data and the remote data; and store, via the interconnect, the output data at the local memory of the second device.