
A New Kind of Computer

Foreword: This blog post outlines our breakthrough photonic AI acceleration research. This body of work is essential to the future of computing and offers a look ahead at what may come tomorrow. Today, the problem to solve is interconnecting millions of chips, and at Lightmatter, we’re fully committed to addressing that challenge for our customers.

Computing stands at an inflection point unlike anything we’ve seen since the transistor was invented. Artificial intelligence workloads are driving computational demands beyond what traditional scaling laws—Moore’s Law, Dennard scaling, and memory scaling—can deliver. All three have effectively stalled, particularly on a per-silicon-area basis.

Photonic processor PCIe card shown in top and side views, along with a bottom view of the photonic processor chip package.

To double the performance of a chip package in your next generation, you double the silicon area in the package. To double its memory, you double the number of DRAM chips. This is mostly great news for silicon foundries. While doubling silicon area is reasonable in the near term, it has a major side effect: exponentially growing cost. Soon, you will not be able to afford your computer. Consumer GPUs are already prohibitively expensive.

It wasn’t like this in the past. The fundamental units of compute (and memory) were shrinking, and their energy consumption fell commensurately with the density improvements.

Zooming out, the future of computing requires breakthroughs in three critical areas: memory, interconnect, and compute. Developing a scalable, DRAM-like memory solution remains a significant and unresolved challenge without clear practical solutions. For interconnect, there is growing awareness of Lightmatter’s groundbreaking Passage photonic interconnect and advanced packaging technology. Today, we’re excited to unveil our latest achievement in the area of compute—a revolutionary photonic processor detailed in our publication in Nature.

A Fundamental Departure

Over the past decade, accelerated matrix computations have become foundational to advances in artificial intelligence (AI) and deep learning, driving demand for improved energy efficiency and computational performance. Photonics has emerged as a compelling alternative to traditional electronics by offering inherent advantages such as high bandwidth, low latency, color-enabled parallelism, and the potential for superior energy efficiency through light-based computation. Recent developments, including photonic accelerators utilizing interleaved time-wavelength modulation and photoelectric multiplication, have demonstrated significant progress toward practical photonic processors for AI tasks. However, substantial challenges remain, particularly in achieving the necessary computational precision, scalability, seamless system integration, and compatibility with modern AI architectures. Addressing these hurdles is crucial for photonic processors to become competitive with electronic accelerators, unlocking the potential for substantial enhancements in computational speed and efficiency.

Interestingly, the evolution of AI computing has continually reduced computational precision requirements from 32-bit floating-point down to as low as 4-bit operations today. Given that photonic computation’s energy cost scales exponentially (and super-exponentially in some cases) with increased precision demands, this reduction in precision requirements presents a significant opportunity. Nonetheless, no photonic chip has yet achieved the precision required for practical AI applications, and previous demonstrations have been limited to simplified benchmark tasks.

Today, we introduce a photonic processor capable of executing advanced AI models such as ResNet, BERT, and the Atari deep reinforcement learning algorithm pioneered by DeepMind. To our knowledge, this represents the first photonic processor capable of accurately executing state-of-the-art neural networks—including transformers, convolutional network classification and segmentation tasks, and reinforcement learning algorithms—without modifications. Critically, this processor achieves accuracies approaching those of conventional 32-bit floating-point digital systems “out-of-the-box,” without relying on advanced methods such as fine-tuning or quantization-aware training. This breakthrough validates the computational robustness of photonic processing, marking a significant milestone for photonic computing as it competes with established electronic AI accelerators and moves closer to realizing post-transistor computing technologies.

The processor integrates six chips within a single package, employing high-speed interconnects between vertically aligned photonic tensor cores and control dies, delivering exceptional efficiency and scalability in AI computations. Despite several hardware non-idealities discussed in the supplementary materials, the processor performs 65.5 trillion Adaptive Block Floating-Point 16-bit (ABFP) operations per second while consuming only 78 watts of electrical power and 1.6 watts of optical power. The 3D packaging involved is a technological marvel: 50 billion transistors spanning six chips with one million photonic components, a level of integration not previously achieved in photonic processing.

Beyond the Laboratory

(Left) Photonic processor rack with eight servers. (Right) Server housing up to eight photonic processors and two host processors (in this case Intel Xeon processors).

To fully appreciate the significance of this breakthrough, let’s consider how it differs from other alternative computing technologies that researchers have pursued for decades. Various non-traditional computing methods have aimed to address limitations inherent to transistor-based architectures:

  • Quantum computing promises exponential speedups for certain problems but currently struggles with error correction, scalability, and maintaining coherence. It’s also hard to create new algorithms for quantum computers that are provably more efficient than algorithms that can be executed on classical computers.
  • DNA computing leverages molecular-scale parallelism but faces significant practical barriers, including slow operational speeds and difficulties in interfacing with conventional computing systems.
  • Neuromorphic and analog computing methods provide unique ways of processing information inspired by biological neural networks but often lack flexibility, general-purpose applicability, and compatibility with existing algorithms.
  • Carbon nanotube-based processors aim to replace silicon transistors, yet remain fundamentally limited by the energy and time costs of charging and discharging the electrical wires that interconnect nanotube compute elements. Because interconnects consume approximately half the total energy, these processors may offer only about a 2x improvement in computational efficiency over traditional transistor-based systems. It’s important to note that this theoretical maximum efficiency gain applies only to the computational energy budget, not to data movement or memory access.
  • Earlier generations of photonic and optical computing have shown theoretical potential, yet consistently failed to evolve beyond proof-of-concept demonstrations in controlled environments.

Our advancement stands out because it transcends these limitations, crossing the crucial threshold from theoretical promise to practical reality. Unlike previous photonic computing efforts, our photonic processor isn’t merely a lab prototype; it’s a fully operational system capable of running advanced neural network models actively used in production today.

(Left) Micrograph of the photonic tensor core with bumps and waveguides visible. Metallization obscures much of the intricate photonic structure. (Right) Micrograph of the digital control interface dies. Metallization and bumps are visible.

For the first time in computing history, we’ve demonstrated a non-transistor-based technology capable of running complex, real-world workloads with accuracy and efficiency comparable to existing electronic systems. This shift from theoretical possibility to practical implementation signals a new chapter in computing, validating photonics as a viable solution that can significantly impact the future of AI processing.

Behind the Scenes

Running large AI models demands high numerical precision, a significant challenge for analog processors like our photonic chip. Standard fixed-point numbers, often used in analog systems, struggle with the vast range of values encountered in deep neural networks, leading to potential accuracy loss. To overcome this, we utilized and refined the ABFP numerical format mentioned earlier. The core idea is to group numbers (like segments of weight or activation vectors) into blocks. Instead of assigning an exponent to every single number (like standard floating-point), ABFP assigns a single, shared exponent (or scale factor) to the entire block, determined by the largest absolute value within that block. These vectors are normalized using their respective scales, quantized to fixed-point format for the analog computation, and the result is then rescaled using the appropriate block exponents. This adaptive scaling, applied independently to input and weight blocks feeding into each computation tile, drastically reduces quantization errors compared to simpler fixed-point or non-adaptive block schemes, preserving crucial information even with limited bit precision.
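To make this concrete, here is a minimal NumPy sketch of the ABFP idea described above. It is an illustration rather than our production implementation; the block size, mantissa width, and rounding scheme are assumptions chosen for clarity.

    import numpy as np

    def abfp_quantize(block, mantissa_bits=8):
        # One shared scale per block, set by the largest absolute value in the block.
        scale = np.max(np.abs(block))
        if scale == 0.0:
            return np.zeros_like(block, dtype=np.int64), 1.0
        levels = 2 ** (mantissa_bits - 1) - 1  # signed fixed-point range
        q = np.round(block / scale * levels).astype(np.int64)
        return q, scale

    def abfp_dot(weights, activations, block_size=64, mantissa_bits=8):
        # Dot product computed block by block: quantize each block with its own
        # scale, accumulate in fixed point, then rescale with the block exponents.
        levels = 2 ** (mantissa_bits - 1) - 1
        total = 0.0
        for start in range(0, len(weights), block_size):
            w_blk = weights[start:start + block_size]
            x_blk = activations[start:start + block_size]
            wq, w_scale = abfp_quantize(w_blk, mantissa_bits)
            xq, x_scale = abfp_quantize(x_blk, mantissa_bits)
            acc = np.dot(wq, xq)  # the fixed-point multiply-accumulate the analog tile performs
            total += acc * (w_scale * x_scale) / (levels * levels)
        return total

    w = np.random.randn(256)
    x = np.random.randn(256)
    print(abfp_dot(w, x), np.dot(w, x))  # the two values should agree closely

Because each block carries its own scale, an unusually large value in one block cannot wash out the small values in another, which is what keeps quantization error low at modest bit widths.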

FP32 accuracy for various tensor tile widths and AI models as a function of gain factor. Tensor tile widths and gain play a large role in the accuracy of AI model execution.

A key innovation in our work was combining ABFP with analog gain control. When performing matrix multiplication in analog, summing many terms (determined by the tile size, N) increases the potential range of the output signal. If the analog-to-digital converter (ADC) has fixed precision, this larger range means the smallest variations (the least significant bits) can get lost. We introduced a technique where the analog signal resulting from the computation is physically amplified by a gain factor, G, before it hits the ADC. Mathematically, the quantized output y_ij^q becomes Q(G · w_ij^q · x_j^q; …). This gain boosts the lower-order bits, allowing the ADC to capture them, effectively increasing precision without needing a higher-resolution (and more power-hungry) ADC. The gain is then divided out digitally (y_i = Σ (…) s_ij^y / G). While this risks saturating the largest values, careful management allows a net gain in representational accuracy, proving critical for achieving the precision needed for complex AI models on our photonic hardware.
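The effect of the gain factor is easy to see in a toy model of the readout path. The ADC resolution and gain values below are made-up numbers used purely for illustration:

    import numpy as np

    def adc(signal, full_scale=1.0, bits=10):
        # Idealized ADC: clip to the converter's input range, then round to discrete codes.
        levels = 2 ** (bits - 1) - 1
        clipped = np.clip(signal, -full_scale, full_scale)
        return np.round(clipped / full_scale * levels) / levels * full_scale

    def readout(analog_sum, gain=1.0, bits=10):
        # Amplify the analog result by G before the ADC, then divide G back out digitally.
        return adc(gain * analog_sum, bits=bits) / gain

    small_outputs = np.array([0.003, 0.011, 0.020])
    print(readout(small_outputs, gain=1.0))  # coarse: quantization step is full_scale / 511
    print(readout(small_outputs, gain=8.0))  # finer: effective step shrinks by the gain factor

The trade-off shows up in the clipping step: anything larger than full_scale / gain saturates, which is why the gain must be managed carefully for each tile.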

Photonic components are inherently sensitive to environmental factors like temperature – a trait valuable for sensors, but a formidable obstacle for stable computing. Achieving the high precision needed for AI calculations meant actively controlling and stabilizing every single one of the roughly one million photonic elements in our processor using dedicated mixed-signal circuits.

Evolution Rather Than Revolution

Our approach doesn’t require abandoning existing infrastructure and algorithms. Our processor leverages a hybrid photonic-electronic design, integrating photonic tensor cores with conventional electronic control and memory systems. The system is programmed using standard AI frameworks like PyTorch and TensorFlow, making adoption straightforward for developers.
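To illustrate what that adoption story looks like from a developer’s seat, the snippet below is ordinary PyTorch inference code. Nothing in it is photonics-specific; the idea is that routing matrix products to photonic tensor cores happens in a vendor backend rather than in user code. The device name in the comment is a hypothetical placeholder, not an API this post describes.

    import torch
    from torchvision.models import resnet50, ResNet50_Weights

    # A standard, unmodified model and inference pass.
    device = torch.device("cpu")  # hypothetically, e.g. a vendor-provided "photonic" device
    model = resnet50(weights=ResNet50_Weights.DEFAULT).eval().to(device)

    x = torch.randn(1, 3, 224, 224, device=device)  # stand-in for a preprocessed image
    with torch.no_grad():
        logits = model(x)
    print(logits.argmax(dim=1))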

This pragmatic approach mirrors how other transformative technologies gained traction. Early GPUs didn’t replace CPUs but complemented them for graphics processing before expanding to general computation. Similarly, photonic processors may initially serve as specialized accelerators for AI workloads before potentially expanding to broader applications.

The integration of photonic technology also offers a promising roadmap for continued improvement. Techniques such as quantization-aware training have already shown significant accuracy improvements. Additionally, wavelength division multiplexing could dramatically increase computational density without proportional increases in size or power.

A New Computing Pathway

Today, memory access and data movement dominate the energy consumption and execution time of AI workloads as much as—or more than—the computation itself. Even if computational units consumed nearly zero energy, overall efficiency would still be constrained by data-transfer overhead. Lightmatter’s breakthrough photonic interconnect technology, Passage™, addresses this critical bottleneck by dramatically reducing energy spent on data movement and by providing ultra-high bandwidth for today’s cutting-edge AI computing infrastructure. Our publication highlights that continued innovation in compute technology remains critically valuable: we’ve introduced a photonic processor that unveils a path towards commercially viable alternative processors (especially with modern workloads executing with 4 bits of precision). These advances in interconnect and compute are the first major steps towards fundamentally reshaping how computers work.

Make no mistake—this work represents a historic moment in computing. For the first time, a non-transistor-based technology can execute complex, real-world workloads with accuracy comparable to traditional digital electronics. As we approach the end of fundamental electronic scaling improvements, our work demonstrates a viable path forward that doesn’t depend solely on transistor scaling. This doesn’t mean electronic computing will vanish; rather, we’re entering an era where multiple computing paradigms coexist, each optimized for specific tasks.

The invention of the integrated circuit, the microprocessor, or the transistor itself—none of these innovations immediately replaced their predecessors, but each fundamentally changed what was achievable. At Lightmatter, we’ve demonstrated that computing’s next chapter need not remain bound by transistor limitations. For an industry accustomed to continual reinvention, photonics represents an exciting and necessary new frontier.

Humanity must innovate at the fundamental level of computing technology or risk approaching a computational cost singularity.

Read “Universal photonic artificial intelligence acceleration” over at Nature.

Nick Harris, Ph.D.
Founder and CEO