DataflowBibliography

From PublicWiki
Jump to: navigation, search

Conference Papers

Lays out two "fundamental" requirements for good parallel processing: The ability to tolerate long-latency memory operations and fast synchronization. they critique a range of multiprocessors and describe their own. I think it's the TTDA.
swanson 16:46, 8 March 2006 (PST)
  • Nikhil, R. S., G. M. Papadopoulos, Arvind, "*T: a multithreaded massively parallel architecture" Proceedings of the 19th annual international symposium on Computer architecture table of contents, 1992. http://doi.acm.org/10.1145/139669.139715
This is a coars-grain dataflow machine. It's basically an implementation of TAM. A multi-threaded processor with fast synchronization support. Messages for memory access and communication carry continuations with them. The introduction would serve well as the beginnings of an outline of a survey of dataflow machines. swanson 19:12, 11 March 2006 (PST)
  • Culler, David E., Klaus Erik Schauser, Thorsten von Eicken "Two Fundamental Limits on Dataflow Multiprocessing", Proceedings of the IFIP Working Group 10.3 (Concurrent Systems) Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, Jan. 1993. http://www.eecs.berkeley.edu/Pubs/TechRpts/1992/CSD-92-716.pdf
This is the "dataflow is dead" paper. There are two arguments. First, you can't keep enough threads close to the processor to cover the latency of communication, because the top of the memory hierarchy is small. Second, even if you have smarter local scheduler that does its best to exploit locality (via TAM), performance is still not good enough. They argue that you need a global scheduling strategy instead of the local strategy that the dataflow firing rule defines. If you apply their analytic model to WaveScalar, it shows that performance is fine as long as we can localize communication pretty well and we can fit the working set of instructions into the array.
swanson 16:46, 8 March 2006 (PST)
  • Dennis, Jack B. and David P. Misunas. "A preliminary architecture for a basic dataflow processor." In the Proceedings of the 2nd Annual Symposium on Computer Architecture (ISCA), 1975. http://doi.acm.org/10.1145/642089.642111
Description of static dataflow and an architecture for executing programs composed of static dataflow graphs. This is a very early paper on the topic of dataflow architectures, if not the earliest. The first few pages provide a reasonable introduction to the fundamentals of dataflow computation, though they are limited (I believe) to static dataflow models. The architecture described foreshadows that of many future dataflow machines, and the instruction caching problem that became the matching table overflow problem of later machines is discussed (or at least mentioned).
Schwerin 13:25, 5 January 2006 (PST)
  • Arvind and Kathail. "A multiple processor data flow machine that supports generalized procedures." In the Proceedings of the 8th Annual Symposium on Computer Architecture (ISCA), 1981. ACM lists no DOI for this object, but it's in the digital library.
I believe this paper describes the architecture of the MIT tagged-token dataflow machine.
Schwerin 14:37, 9 January 2006 (PST)
  • Grafe, V.G., G.S. Davidson, J.E. Hoch, and V.P. Holmes. "The Epsilon Dataflow Processor." In the Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA), 1989. http://doi.acm.org/10.1145/74925.74930
Eplison is a tagged token machine. It has an indirect send-style instruction. It uses a matching algorthim vagualy similar to WaveScalar's, although without the caching aspect. Each processor in the machine has private tagged memory of finite size, so there was no recursion or dynamic loop unrolling. A follow on was planned the esp'88. it was a dynamic machine. swanson 14:31, 9 March 2006 (PST)
  • Nikhil, R.S. "Can dataflow subsume von Neumann computing?" In the Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA), 1989. http://doi.acm.org/10.1145/74925.74955
This paper describes the P-RISC architecture, which adds dataflow features to RISC-style multiprocessors. The features come from the addition of fine grain threading and synchronization operations at the ISA level.
Schwerin 14:19, 9 January 2006 (PST)
  • T. Shimada, K. Hiraki, K. Nishida, S. Sekiguchi "Evaluation of a Prototype Data Flow Processor of the SIGMA-1 for Scientific Computations" Proceedings of the 13th annual international symposium on Computer architecture, 1983. http://doi.acm.org/10.1145/17407.17383
Describes the Sigma-1, an early dataflow machine from Japan. They have a single processing element from a 128 element system. The proposed system uses a hierarchical network to allow the PEs to communicate. They evaluate its performance on a bunch of small loops and compare it to a single von Neumann core. They measure the extra loop and control overhead that dataflow requires and add some special instructions to reduce it.
swanson 17:27, 8 March 2006 (PST)
  • Sakai, S., Y. Yamaguchi, K. Hiraki, Y. Kodama and T. Yuba. "An architecture of a dataflow single chip processor." In the Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA), 1989. http://doi.acm.org/10.1145/74925.74931
Description of the EM-4, a follow-on to the Sigma-1. EM-4 was to be a first dataflow multiprocessor in which every processing node occupied no more than a single chip. Describes a strongly connected arc model, which is broadly similar to the Monsoon frame model and its ilk. Another similarity to Monsoon is a frame-based matching scheme implemented with normal memories. Discusses shortcomings of prior dataflow machines. Attempts to address these in the new architecture.
Schwerin 13:45, 5 January 2006 (PST)
  • Papadopoulos, Gregory M., and Kenneth R. Traub. "Multithreading: a revisionist view of dataflow architectures." In the Proceedings of the 18th Annual International Symposium on Computer Architecture (ISCA), 1991. http://doi.acm.org/10.1145/115952.115986
This paper looks at the Monsoon dataflow architecture as a multithreaded computer with fine-grain threads.
Schwerin 14:19, 9 January 2006 (PST)
The discussion in the conclusion about the relationship between von Neumann multithreading and coarse-grain dataflow is pretty interesting. swanson 12:25, 17 March 2006 (PST)


The Manchester machine is dynamic dataflow machine. It uses a tag-matching scheme quite similar to WaveScalar's. Tokens are stored in a hash table, and there is an overflow unit to hold extra token in case of conflicts in the hash-table. swanson 15:09, 9 March 2006 (PST)
  • Papadopoulos, Gregory M., David E. Culler, "Monsoon: an explicit token-store architecture" in Proceedings of the 17th annual international symposium on Computer Architecture table of contents, 1990. http://doi.acm.org/10.1145/325164.325117
The key contribution of Monsoon is the explicit token store (ETS). A frame of memory is allocated for each function call. It contains an entry (marked with a pressence bit) for each arc in the function's dataflow graph. Match checking is a read to see if the other token is there. If it is, there is a match, so fire the instruction. Otherwise, there is no match, so just write the new token and set the bit. swanson 22:02, 11 March 2006 (PST)
This is a pretty thin description of the Epsilon-2 dataflow machine. The intro says it's a coarse-grain machine, although the description of architecture seems to suggest otherwise. swanson 13:18, 17 March 2006 (PST)
  • Davis, A. L. 1978. The architecture and system method of DDM1: A recursively structured Data Driven Machine. In Proceedings of the 5th Annual Symposium on Computer Architecture (April 03 - 05, 1978). ISCA '78. ACM Press, New York, NY, 210-215. DOI= http://doi.acm.org/10.1145/800094.803050
This paper describes the DDM1. It is, as near as I can tell, a non-tagged dynamic dataflow machine. Multiple inputs values can reside on each dataflow arc. They are consumed in FIFO order. The overall machine structure is hierarchical, with processing elements and memory at each level. The processing elements at one level can be made up of other processing elements and more memory. There is a poorly described process of dynamically decomposing dataflow programs to fit the structure of the machine.swanson 13:55, 17 March 2006 (PST)


Refereed Journal Papers

The TTDA processing element has a very similar structure to WaveScalar's. They even remark that matching could be done using hashing, although they do not describe how to deal with overflow. They also provide rudimentary tag-management instructions for function calls. The machine is very similar to the Manchester machine. The paper provides a nice discussion of the pregression from the U-interpreter to TTDA to monsoon. swanson 19:54, 11 March 2006 (PST)

Theses

Describes resource management issue in TTDA. Also provides the best description I have found of the TTDA machine. swanson 14:18, 16 March 2006 (PST)