Difference between revisions of "DataflowBibliography"

From PublicWiki
Jump to: navigation, search
 
(30 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
== Conference Papers ==
 
== Conference Papers ==
 +
* Arvind, Robert A. lannucci "A Critique of Multiprocessing von Neumann Style" In Proceedings of the 10th annual international symposium on Computer architecture, 1983. http://portal.acm.org/citation.cfm?id=801684
 +
: Lays out two "fundamental" requirements for good parallel processing:  The ability to tolerate long-latency memory operations and fast synchronization.  they critique a range of multiprocessors and describe their own.  I think it's the TTDA. <br> [[User:Swanson|swanson]] 16:46, 8 March 2006 (PST)
 +
 +
*  Nikhil, R. S., G. M. Papadopoulos, Arvind, "*T: a multithreaded massively parallel architecture" Proceedings of the 19th annual international symposium on Computer architecture table of contents, 1992. http://doi.acm.org/10.1145/139669.139715
 +
: This is a coars-grain dataflow machine.  It's basically an implementation of TAM.  A multi-threaded processor with fast synchronization support.  Messages for memory access and communication carry continuations with them. The introduction would serve well as the beginnings of an outline of a survey of dataflow machines. [[User:Swanson|swanson]] 19:12, 11 March 2006 (PST)
 +
 +
* Culler, David E., Klaus Erik Schauser, Thorsten von Eicken "Two Fundamental Limits on Dataflow Multiprocessing",  Proceedings of the IFIP Working Group 10.3 (Concurrent Systems) Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, Jan. 1993.  http://www.eecs.berkeley.edu/Pubs/TechRpts/1992/CSD-92-716.pdf
 +
: This is the "dataflow is dead" paper.  There are two arguments.  First, you can't keep enough threads close to the processor to cover the latency of communication, because the top of the memory hierarchy is small.  Second, even if you have smarter local scheduler that does its best to exploit locality (via TAM), performance is still not good enough.  They argue that you need a global scheduling strategy instead of the local strategy that the dataflow firing rule defines.  If you apply their analytic model to WaveScalar, it shows that performance is fine as long as we can localize communication pretty well and we can fit the working set of instructions into the array. <br>[[User:Swanson|swanson]] 16:46, 8 March 2006 (PST)
 +
 
*Dennis, Jack B. and David P. Misunas.  "A preliminary architecture for a basic dataflow processor."  In the Proceedings of the 2nd Annual Symposium on Computer Architecture (ISCA), 1975.  http://doi.acm.org/10.1145/642089.642111
 
*Dennis, Jack B. and David P. Misunas.  "A preliminary architecture for a basic dataflow processor."  In the Proceedings of the 2nd Annual Symposium on Computer Architecture (ISCA), 1975.  http://doi.acm.org/10.1145/642089.642111
 
: ''Description of static dataflow and an architecture for executing programs composed of static dataflow graphs.  This is a very early paper on the topic of dataflow architectures, if not the earliest.  The first few pages provide a reasonable introduction to the fundamentals of dataflow computation, though they are limited (I believe) to static dataflow models.  The architecture described foreshadows that of many future dataflow machines, and the instruction caching problem that became the matching table overflow problem of later machines is discussed (or at least mentioned).'' <br>[[User:Schwerin|Schwerin]] 13:25, 5 January 2006 (PST)
 
: ''Description of static dataflow and an architecture for executing programs composed of static dataflow graphs.  This is a very early paper on the topic of dataflow architectures, if not the earliest.  The first few pages provide a reasonable introduction to the fundamentals of dataflow computation, though they are limited (I believe) to static dataflow models.  The architecture described foreshadows that of many future dataflow machines, and the instruction caching problem that became the matching table overflow problem of later machines is discussed (or at least mentioned).'' <br>[[User:Schwerin|Schwerin]] 13:25, 5 January 2006 (PST)
Line 7: Line 16:
  
 
*Grafe, V.G., G.S. Davidson, J.E. Hoch, and V.P. Holmes.  "The Epsilon Dataflow Processor."  In the Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA), 1989.  http://doi.acm.org/10.1145/74925.74930
 
*Grafe, V.G., G.S. Davidson, J.E. Hoch, and V.P. Holmes.  "The Epsilon Dataflow Processor."  In the Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA), 1989.  http://doi.acm.org/10.1145/74925.74930
 +
:Eplison is a tagged token machine.  It has an indirect send-style instruction.  It uses a matching algorthim vagualy similar to WaveScalar's, although without the caching aspect.  Each processor in the machine has private tagged memory of finite size, so there was no recursion or dynamic loop unrolling.  A follow on was planned the esp'88.  it was a dynamic machine.  [[User:Swanson|swanson]] 14:31, 9 March 2006 (PST) 
  
Nikhil, R.S.  "Can dataflow subsume von Neumann computing?"  In the Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA), 1989.  http://doi.acm.org/10.1145/74925.74955
+
* Nikhil, R.S.  "Can dataflow subsume von Neumann computing?"  In the Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA), 1989.  http://doi.acm.org/10.1145/74925.74955
 
: ''This paper describes the P-RISC architecture, which adds dataflow features to RISC-style multiprocessors.  The features come from the addition of fine grain threading and synchronization operations at the ISA level.'' <br> [[User:Schwerin|Schwerin]] 14:19, 9 January 2006 (PST)
 
: ''This paper describes the P-RISC architecture, which adds dataflow features to RISC-style multiprocessors.  The features come from the addition of fine grain threading and synchronization operations at the ISA level.'' <br> [[User:Schwerin|Schwerin]] 14:19, 9 January 2006 (PST)
 +
 
 +
* T. Shimada, K. Hiraki, K. Nishida, S. Sekiguchi "Evaluation of a Prototype Data Flow Processor of the SIGMA-1 for Scientific Computations" Proceedings of the 13th annual international symposium on Computer architecture, 1983. http://doi.acm.org/10.1145/17407.17383
 +
: Describes the Sigma-1, an early dataflow machine from Japan.  They have a single processing element from a 128 element system.  The proposed system uses a hierarchical network to allow the PEs to communicate.  They evaluate its performance on a bunch of small loops and compare it to a single von Neumann core.  They measure the extra loop and control overhead that dataflow requires and add some special instructions to reduce it. <br> [[User:Swanson|swanson]] 17:27, 8 March 2006 (PST)
  
 
*Sakai, S., Y. Yamaguchi, K. Hiraki, Y. Kodama and T. Yuba.  "An architecture of a dataflow single chip processor."  In the Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA), 1989.  http://doi.acm.org/10.1145/74925.74931
 
*Sakai, S., Y. Yamaguchi, K. Hiraki, Y. Kodama and T. Yuba.  "An architecture of a dataflow single chip processor."  In the Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA), 1989.  http://doi.acm.org/10.1145/74925.74931
: ''Description of the EM-4, a follow-on to the Sigma-1.  EM-4 was to be a first dataflow multiprocessor in which every processing node occupied no more than a single chip.  Describes a ''strongly connected arc'' model, which is broadly similar to the Monsoon frame model and its ilk.  Discusses shortcomings of prior dataflow machines.  Attempts to address these in the new architecture.'' <br>[[User:Schwerin|Schwerin]] 13:45, 5 January 2006 (PST)
+
: Description of the EM-4, a follow-on to the Sigma-1.  EM-4 was to be a first dataflow multiprocessor in which every processing node occupied no more than a single chip.  Describes a ''strongly connected arc'' model, which is broadly similar to the Monsoon frame model and its ilk.  Another similarity to  Monsoon is a frame-based matching scheme implemented with normal memories.  Discusses shortcomings of prior dataflow machines.  Attempts to address these in the new architecture. <br>[[User:Schwerin|Schwerin]] 13:45, 5 January 2006 (PST)
  
 
*Papadopoulos, Gregory M., and Kenneth R. Traub.  "Multithreading: a revisionist view of dataflow architectures."  In the Proceedings of the 18th Annual International Symposium on Computer Architecture (ISCA), 1991.  http://doi.acm.org/10.1145/115952.115986
 
*Papadopoulos, Gregory M., and Kenneth R. Traub.  "Multithreading: a revisionist view of dataflow architectures."  In the Proceedings of the 18th Annual International Symposium on Computer Architecture (ISCA), 1991.  http://doi.acm.org/10.1145/115952.115986
: ''This paper looks at the Monsoon dataflow architecture as a multithreaded computer with fine-grain threads.'' <br>[[User:Schwerin|Schwerin]] 14:19, 9 January 2006 (PST)
+
: This paper looks at the Monsoon dataflow architecture as a multithreaded computer with fine-grain threads. <br>[[User:Schwerin|Schwerin]] 14:19, 9 January 2006 (PST)
 +
 
 +
: The discussion in the conclusion about the relationship between von Neumann multithreading and coarse-grain dataflow is pretty interesting. [[User:Swanson|swanson]] 12:25, 17 March 2006 (PST)
 +
 
 +
 
 +
* Gurd, J. R, C. C Kirkham, I. Watson "The Manchester prototype dataflow computer", Communications of the ACM, Volume 28,Issue 1.  http://doi.acm.org/10.1145/2465.2468
 +
: The Manchester machine is dynamic dataflow machine.  It uses a tag-matching scheme quite similar to WaveScalar's.  Tokens are stored in a hash table, and there is an overflow unit to hold extra token in case of conflicts in the hash-table. [[User:Swanson|swanson]] 15:09, 9 March 2006 (PST)
 +
 
 +
*Papadopoulos, Gregory M., David E. Culler, "Monsoon: an explicit token-store architecture" in Proceedings of the 17th annual international symposium on Computer Architecture table of contents, 1990. http://doi.acm.org/10.1145/325164.325117
 +
: The key contribution of Monsoon is the explicit token store (ETS).  A frame of memory is allocated for each function call.  It contains an entry (marked with a pressence bit) for each arc in the function's dataflow graph.  Match checking is a read to see if the other token is there.  If it is, there is a match, so fire the instruction.  Otherwise, there is no match, so just write the new token and set the bit. [[User:Swanson|swanson]] 22:02, 11 March 2006 (PST)
 +
 
 +
* V. G. Grafe and J. E. Hoch, "The Epsilon-2 hybrid dataflow architecture," in Proceedings of the 35 th IEEE Computer Society Conference. Feb 26--Mar 2, 1990. San Francisco, CA, pp. 88--93, 1990. http://ieeexplore.ieee.org/iel2/276/2320/00063658.pdf?arnumber=63658
 +
:  This is a pretty thin description of the Epsilon-2 dataflow machine.  The intro says it's a coarse-grain machine, although the description of architecture seems to suggest otherwise. [[User:Swanson|swanson]] 13:18, 17 March 2006 (PST)
 +
 
 +
*Davis, A. L. 1978. The architecture and system method of DDM1: A recursively structured Data Driven Machine. In Proceedings of the 5th Annual Symposium on Computer Architecture (April 03 - 05, 1978). ISCA '78. ACM Press, New York, NY, 210-215. DOI= http://doi.acm.org/10.1145/800094.803050
 +
: This paper describes the DDM1.  It is, as near as I can tell, a non-tagged dynamic dataflow machine.  Multiple inputs values can reside on each dataflow arc.  They are consumed in FIFO order.  The overall machine structure is hierarchical, with processing elements and memory at each level.  The processing elements at one level can be made up of other processing elements and more memory.  There is a poorly described process of dynamically decomposing dataflow programs to fit the structure of the machine.[[User:Swanson|swanson]] 13:55, 17 March 2006 (PST)
 +
 
  
 
== Refereed Journal Papers ==
 
== Refereed Journal Papers ==
*Arvind and Nikhil, V.C.  "Executing a program on the MIT tagged-token dataflow architecture."  In IEEE Transactions on Computers, March, 1990. http://doi.ieeecomputersociety.org/10.1109/12.48862
+
*Arvind and Nikhil, V.C.  "Executing a program on the MIT tagged-token dataflow architecture."  In IEEE Transactions on Computers, March, 1990. http://doi.ieeecomputersociety.org/10.1109/12.48862 http://www.cs.wisc.edu/~isca2005/ttda.pdf
 +
:  The TTDA processing element has a very similar structure to WaveScalar's.  They even remark that matching could be done using hashing, although they do not describe how to deal with overflow.  They also provide rudimentary tag-management instructions for function calls.  The machine is very similar to the Manchester machine. The paper provides a nice discussion of the pregression from the U-interpreter to TTDA to monsoon. [[User:Swanson|swanson]] 19:54, 11 March 2006 (PST)
 +
 
 +
== Theses ==
 +
 
 +
* Culler, D.E, "Resource Management for the Tagged Token Dataflow Architecture" Ph.D. Thesis, MIT, 1985. http://www.lcs.mit.edu/specpub.php?id=900
 +
:  Describes resource management issue in TTDA.  Also provides the best description I have found of the TTDA machine. [[User:Swanson|swanson]] 14:18, 16 March 2006 (PST)

Latest revision as of 21:55, 17 March 2006

Conference Papers

Lays out two "fundamental" requirements for good parallel processing: The ability to tolerate long-latency memory operations and fast synchronization. they critique a range of multiprocessors and describe their own. I think it's the TTDA.
swanson 16:46, 8 March 2006 (PST)
  • Nikhil, R. S., G. M. Papadopoulos, Arvind, "*T: a multithreaded massively parallel architecture" Proceedings of the 19th annual international symposium on Computer architecture table of contents, 1992. http://doi.acm.org/10.1145/139669.139715
This is a coars-grain dataflow machine. It's basically an implementation of TAM. A multi-threaded processor with fast synchronization support. Messages for memory access and communication carry continuations with them. The introduction would serve well as the beginnings of an outline of a survey of dataflow machines. swanson 19:12, 11 March 2006 (PST)
  • Culler, David E., Klaus Erik Schauser, Thorsten von Eicken "Two Fundamental Limits on Dataflow Multiprocessing", Proceedings of the IFIP Working Group 10.3 (Concurrent Systems) Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, Jan. 1993. http://www.eecs.berkeley.edu/Pubs/TechRpts/1992/CSD-92-716.pdf
This is the "dataflow is dead" paper. There are two arguments. First, you can't keep enough threads close to the processor to cover the latency of communication, because the top of the memory hierarchy is small. Second, even if you have smarter local scheduler that does its best to exploit locality (via TAM), performance is still not good enough. They argue that you need a global scheduling strategy instead of the local strategy that the dataflow firing rule defines. If you apply their analytic model to WaveScalar, it shows that performance is fine as long as we can localize communication pretty well and we can fit the working set of instructions into the array.
swanson 16:46, 8 March 2006 (PST)
  • Dennis, Jack B. and David P. Misunas. "A preliminary architecture for a basic dataflow processor." In the Proceedings of the 2nd Annual Symposium on Computer Architecture (ISCA), 1975. http://doi.acm.org/10.1145/642089.642111
Description of static dataflow and an architecture for executing programs composed of static dataflow graphs. This is a very early paper on the topic of dataflow architectures, if not the earliest. The first few pages provide a reasonable introduction to the fundamentals of dataflow computation, though they are limited (I believe) to static dataflow models. The architecture described foreshadows that of many future dataflow machines, and the instruction caching problem that became the matching table overflow problem of later machines is discussed (or at least mentioned).
Schwerin 13:25, 5 January 2006 (PST)
  • Arvind and Kathail. "A multiple processor data flow machine that supports generalized procedures." In the Proceedings of the 8th Annual Symposium on Computer Architecture (ISCA), 1981. ACM lists no DOI for this object, but it's in the digital library.
I believe this paper describes the architecture of the MIT tagged-token dataflow machine.
Schwerin 14:37, 9 January 2006 (PST)
  • Grafe, V.G., G.S. Davidson, J.E. Hoch, and V.P. Holmes. "The Epsilon Dataflow Processor." In the Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA), 1989. http://doi.acm.org/10.1145/74925.74930
Eplison is a tagged token machine. It has an indirect send-style instruction. It uses a matching algorthim vagualy similar to WaveScalar's, although without the caching aspect. Each processor in the machine has private tagged memory of finite size, so there was no recursion or dynamic loop unrolling. A follow on was planned the esp'88. it was a dynamic machine. swanson 14:31, 9 March 2006 (PST)
  • Nikhil, R.S. "Can dataflow subsume von Neumann computing?" In the Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA), 1989. http://doi.acm.org/10.1145/74925.74955
This paper describes the P-RISC architecture, which adds dataflow features to RISC-style multiprocessors. The features come from the addition of fine grain threading and synchronization operations at the ISA level.
Schwerin 14:19, 9 January 2006 (PST)
  • T. Shimada, K. Hiraki, K. Nishida, S. Sekiguchi "Evaluation of a Prototype Data Flow Processor of the SIGMA-1 for Scientific Computations" Proceedings of the 13th annual international symposium on Computer architecture, 1983. http://doi.acm.org/10.1145/17407.17383
Describes the Sigma-1, an early dataflow machine from Japan. They have a single processing element from a 128 element system. The proposed system uses a hierarchical network to allow the PEs to communicate. They evaluate its performance on a bunch of small loops and compare it to a single von Neumann core. They measure the extra loop and control overhead that dataflow requires and add some special instructions to reduce it.
swanson 17:27, 8 March 2006 (PST)
  • Sakai, S., Y. Yamaguchi, K. Hiraki, Y. Kodama and T. Yuba. "An architecture of a dataflow single chip processor." In the Proceedings of the 16th Annual International Symposium on Computer Architecture (ISCA), 1989. http://doi.acm.org/10.1145/74925.74931
Description of the EM-4, a follow-on to the Sigma-1. EM-4 was to be a first dataflow multiprocessor in which every processing node occupied no more than a single chip. Describes a strongly connected arc model, which is broadly similar to the Monsoon frame model and its ilk. Another similarity to Monsoon is a frame-based matching scheme implemented with normal memories. Discusses shortcomings of prior dataflow machines. Attempts to address these in the new architecture.
Schwerin 13:45, 5 January 2006 (PST)
  • Papadopoulos, Gregory M., and Kenneth R. Traub. "Multithreading: a revisionist view of dataflow architectures." In the Proceedings of the 18th Annual International Symposium on Computer Architecture (ISCA), 1991. http://doi.acm.org/10.1145/115952.115986
This paper looks at the Monsoon dataflow architecture as a multithreaded computer with fine-grain threads.
Schwerin 14:19, 9 January 2006 (PST)
The discussion in the conclusion about the relationship between von Neumann multithreading and coarse-grain dataflow is pretty interesting. swanson 12:25, 17 March 2006 (PST)


The Manchester machine is dynamic dataflow machine. It uses a tag-matching scheme quite similar to WaveScalar's. Tokens are stored in a hash table, and there is an overflow unit to hold extra token in case of conflicts in the hash-table. swanson 15:09, 9 March 2006 (PST)
  • Papadopoulos, Gregory M., David E. Culler, "Monsoon: an explicit token-store architecture" in Proceedings of the 17th annual international symposium on Computer Architecture table of contents, 1990. http://doi.acm.org/10.1145/325164.325117
The key contribution of Monsoon is the explicit token store (ETS). A frame of memory is allocated for each function call. It contains an entry (marked with a pressence bit) for each arc in the function's dataflow graph. Match checking is a read to see if the other token is there. If it is, there is a match, so fire the instruction. Otherwise, there is no match, so just write the new token and set the bit. swanson 22:02, 11 March 2006 (PST)
This is a pretty thin description of the Epsilon-2 dataflow machine. The intro says it's a coarse-grain machine, although the description of architecture seems to suggest otherwise. swanson 13:18, 17 March 2006 (PST)
  • Davis, A. L. 1978. The architecture and system method of DDM1: A recursively structured Data Driven Machine. In Proceedings of the 5th Annual Symposium on Computer Architecture (April 03 - 05, 1978). ISCA '78. ACM Press, New York, NY, 210-215. DOI= http://doi.acm.org/10.1145/800094.803050
This paper describes the DDM1. It is, as near as I can tell, a non-tagged dynamic dataflow machine. Multiple inputs values can reside on each dataflow arc. They are consumed in FIFO order. The overall machine structure is hierarchical, with processing elements and memory at each level. The processing elements at one level can be made up of other processing elements and more memory. There is a poorly described process of dynamically decomposing dataflow programs to fit the structure of the machine.swanson 13:55, 17 March 2006 (PST)


Refereed Journal Papers

The TTDA processing element has a very similar structure to WaveScalar's. They even remark that matching could be done using hashing, although they do not describe how to deal with overflow. They also provide rudimentary tag-management instructions for function calls. The machine is very similar to the Manchester machine. The paper provides a nice discussion of the pregression from the U-interpreter to TTDA to monsoon. swanson 19:54, 11 March 2006 (PST)

Theses

Describes resource management issue in TTDA. Also provides the best description I have found of the TTDA machine. swanson 14:18, 16 March 2006 (PST)