Most AI accelerator hardware, such as GPUs and TPUs, focuses on increasing the throughput and efficiency of training and inference by massively parallelizing computation across many distinct computing units and processors. Massive parallelization, in turn, must be powered by the continuous shrinking of transistor nodes. Unfortunately, an increasing number of barriers are inhibiting the scaling of processing and energy densities. Fundamentally, as transistors shrink, short-circuit current and leakage become more pronounced, causing energy dissipation to hit the power-density ceiling that an IC can reasonably remove. A further consequence of optimizing for parallelization is a compromise in latency: the TPU classification engine can process 280,000 inferences per second, but its response time remains around 10 ms.

Defense applications that require real-time processing, such as GPS-less navigation on a jammed battlefield, expose three issues with digital electronics such as the TPU. The first issue is processing latency: real-time feedback at ~10 ms latency would limit the processing bandwidth to only 100 Hz, whereas hypersonic aircraft and missile tracking, for example, require a feedback-loop latency of under 1 ms. The second issue is the bandwidth of each data channel. Command of the radio spectrum, for example, requires real-time processing of signals spanning many GHz, e.g., suppressing jamming and interference in adversarial environments; this is only possible with reconfigurable hardware that does not depend on synchronous clocks. The third issue is that defense applications need edge devices with low size, weight, and power (SWaP).

Implementing a neural network involves two primary operations: data movement (i.e., interconnect) and linear operations such as matrix-vector multiplications (MVMs). In highly parallel processors such as the TPU, data movement and MVMs account for as much as 90% (or more) of the total energy cost. Photonic circuits are well suited to implementing neural networks for exactly these two reasons, interconnectivity and linear operations, and promise to fundamentally alter the bandwidth and interconnectivity tradeoffs of electronics.

We propose to investigate and develop integrated photonic hardware for linear vector, matrix, and tensor operations based on silicon microring resonator (MRR) weight banks with forward-biased PIN junctions. This photonic hardware would enable high-performance computing with an ultrafast matrix-loading speed of >1 GHz, a computation density of >10 TOPS/mm², and a power consumption of only 0.2 mW per computing unit (i.e., per MRR), with each unit offering over 20 GOPS. Our photonic accelerator is built from high-performance, mature silicon photonic devices available in mainstream foundries, and is therefore ready for rapid prototype development in Phase I as well as scale-up to volume fabrication in Phase II.
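
For reference, two short back-of-the-envelope calculations underlie the figures above (the symbols are introduced here only for illustration, and the per-unit throughput is read as roughly 20 giga-operations per second per MRR):

\[
f_{\text{feedback}} = \frac{1}{t_{\text{latency}}} = \frac{1}{10\,\text{ms}} = 100\,\text{Hz},
\qquad
E_{\text{op}} = \frac{P_{\text{MRR}}}{R_{\text{MRR}}} = \frac{0.2\,\text{mW}}{20\times 10^{9}\,\text{op/s}} = 10\,\text{fJ per operation}.
\]

That is, a digital feedback loop with ~10 ms latency caps real-time control at about 100 Hz, while an MRR weight operating at 0.2 mW and ~20 GOPS corresponds to roughly 10 fJ per operation.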