Reference: Byrd, G. T. & Delagi, B. A. Support for Fine-Grained Message Passing in Shared Memory Multiprocessors. 1989.
Abstract: Recent high-performance multiprocessors exploit cut-through routing, in which packets are routed as their first bytes arrive. Hardware-supported multicast can benefit the many parallel programs in which producers provide each value to multiple consumers. We describe several cut-through multicast protocols, including a restrictive (yet adaptive) routing scheme for deadlock avoidance. Simulations using synthetic and application-driven loads show that this scheme performs significantly better than either multicast emulation or deadlock detection and resolution. The scheme provides cut-through multicast without requiring dedicated storage in the communication facilities for a full packet. We thus extend ideas considered for efficient cut-through routing in multiprocessor systems to include multicast.
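
To make the cut-through idea concrete, the sketch below compares contention-free latency under store-and-forward and cut-through routing. It is a simplified textbook model, not the simulation used in the paper; the function names, the parameters (`hops`, `packet_bytes`, `header_bytes`, `bytes_per_cycle`), and the assumption that a node can forward a packet once its header has arrived are all illustrative.

```python
def store_and_forward_latency(hops, packet_bytes, bytes_per_cycle=1):
    """Cycles to deliver a packet when every node buffers the full packet
    before forwarding it (assumed model, no contention)."""
    per_link = packet_bytes / bytes_per_cycle
    return hops * per_link


def cut_through_latency(hops, packet_bytes, header_bytes, bytes_per_cycle=1):
    """Cycles to deliver a packet when each node starts forwarding as soon
    as the header has arrived (assumed model, no contention).

    Each intermediate node adds only the header time; the full packet is
    serialized once, on the final link.
    """
    header_time = header_bytes / bytes_per_cycle
    packet_time = packet_bytes / bytes_per_cycle
    return (hops - 1) * header_time + packet_time


if __name__ == "__main__":
    # Hypothetical numbers: 4 links, 64-byte packet, 2-byte header.
    print(store_and_forward_latency(4, 64))   # 256.0
    print(cut_through_latency(4, 64, 2))      # 70.0
```

Under this model the per-node cost drops from a full packet time to a header time, which is why a multicast scheme that avoids buffering a full packet at each node is attractive.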