Chart uncertainties

From PublicWiki
Jump to: navigation, search

Head back to Chart_research for some perspective.

Here is a space for our goals, challengs, ideas, restrictions, thoughts, worries, and anything else.

  • Streams vs. single arguments?
  • How do you preserve state?
  • How do you accumulate transparently when a node's execution is split?

More Recent

We want to allow "simple things" to be wrapped up into more "complex things" by something that understands their simple interface and the interface of the more complex thing. So a function of a single value can be wrapped by some foreach operator to create a function on a collection. In this way we encapsulate more primitive operations, and though this cannot be done as a chart the result is itself a valid node. Statefulness comes from the wrapper.

Because the wrapper can be implemented arbitrarily it can be built to take advantage of networks. It can itself be built as a chart system, I believe. The implementation of map/reduce as wrappers is evident: ->map(x1)->reduce(x2)->, where the I/O types of x1 and of x2 must match as in map/reduce because these wrappers produce operators on collections.

The final purpose is to build a bridge from semantics to implementation, but without externally offering any large system. Instead we want to offer a small system (a chart runner) that can, in a sense, split itself into ever more detailed and system-specific chart runners as it follows its chart and its table.

Recent Work

A stream is an ordered collection [if order doesn't matter: ignore it, but it's there]. A collection is a single data element; even an empty stream can fulfil an input or output as marked on a chart. As a practical matter a stream may be cut from either end; error handling is implicit in the system as it exists, as will be seen.

State must exist during evaluation of any procedure; that is, while it is evaluating we regard it as a procedure and it has state, but outside of its scope it is regarded as a pure mathematical function. Within a chart there is state; outside of a chart, and looking in, there is no state. Streams are a suficient mechanism for the imposition of time onto the system: consider that the computer evaluates one function in its running, but that this function is on a collection and is implemented as a procedure on a stream; state endures for the length of the evaluation, which is the runtime of the computer.

In this model there is no illusion that a procedure evaluates itself: it is always evaluated from outside. Indeed it is always put together from outside, because until the chart and its realization are combined there is no procedure to execute (as will be seen). As a consequence the primary chart runner is the system, but it may run routines which themselves behave as chart runners (scoping rules prohibit them from seeing outside except as permitted; they cannot therefore distinguish whether they are primary). Such a routine manifestly needs a chart to run; this is the general form of chart nesting.

If I want to realize one node of a chart as a chart, this means I want to realize it as a chartrunner running a chart. Let's introduce a scale of constructs:

  • chartrunner (processor)
  • chartrunner + table (evaluator)
  • chartrunner + chart (function)
  • chartrunner + table + chart (procedure)
  • chartrunner + table + chart + partial input (closure)
  • chartrunner + table + chart + complete input (evaluation [yields output])

The table in all these cases is the information required by the chartrunner to realize the chart. That is, the chart may contain tokens: it might have ids at each node, or a block of code, or a filename: and the table is the means by which the chartrunner interprets the data stored in the chart. Notice that we construct procedures at a finer granularity than a (naive) applicative language. We have a complete bridge between the theory and the implementation.

LISP's eval is an example of our chartrunner+table; eval(x) is chartrunner+table(chart+complete input).

Now back to streams and state. A stream is an admission that because time and state are irrelevant to a function a procedure can exist in time and with state however it wants, so long as it implements the function. Consider a function that accepts a stream of integers and squares them all. If we have ->square->, this can run arbitrarily many times: but we have only one object coming in; we need a way to logically regroup the stream. A foreach operator would do this, but we don't want a foreach primitive: instead we make this transformation the duty of the table (viz., the evaluator). So a foreach operator exists at the level of the table and can invoke a new chartrunner to run a procedure: it looks like st<x>->foreach[chart]->st<y>, but the type for chart is x->chart->y.

Iteration, recursion, streaming, and so on should all be handled at the level of the table and the chartrunner: which means that, in sum, the top-level chartrunner handles all state. Splendid, and (although getting it to work the first time might be nasty) quite simple.

Object orientation can now be dealt with: an object can be described reasonably as a (stateless) function obj which represents the object's behavior in any state. The runner only needs to add a node that accepts a stream as input and produces the inputs for obj based on the elements of the stream. In this way the state (the stream consumer) is factored completely away from the object behavior (the function obj). The object dies when its input stream is gone. An output stream can be produced or the function's state at its death can be the output, as desired.

Now, programs can be passed as data anywhere that they can be executed: generally an evaluator (chartrunner+table) can evaluate any chart that its evaluation produces. So we have a highly controlled ability to produce and evaluate code, treated entirely as data.

It is the chartrunner's prerogative to give input to the procedures it evaluates. This means that it can, for instance, run one chart with one set of inputs, use this in a conditional to choose one of two tables to use on another chart, evaluate that, pass its output to another chart, and so on. This is the next notational adventure: to find a way to represent a chartrunner's behavior as a chart. When this is done, theory is ready for implementation.

I think this can handle everything and a little more, and in a nice orderly fashion, with modularity and some clever tricks I've never seen made available.

Deadlock

Because every necessary resource is known when a chart is first executed, and known in finer detail at the finer scales of recursive charts, deadlock is impossible. A node will not be executed if its needs are not met, and if it exceeds its stated needs it will (in the most likely implementation) be terminated. Failure is possible, deadlock is not.

Circuits do not need to be forbidden for this reason, but there is another: because we don't want nodes to "run" and "stop", as they would have to for a circuit ever to produce a result, we cannot have naive circuits. Circuits the properties and completion conditions of which are precisely known may be permissible. How will we represent this? It comes down to the same problem, I think, but is another approach.

Control Charts

What happens if we use charts, with their current limitations, to describe program flow instead of data flow? If each node executes after its parents and before its children -- and that's the only guarantee -- what do we see? It's intimately related to our larger problem.

To reduce a dataflow chart to a control chart, realize it with nodes that take unary input (no information is exchanged).

Control charts can work with global data, particularly if each node needs a guarantee of state before it runs and gives a guarantee of state after it runs. Now think of a program in a classic language, where not every line really depends on the lines before it, but will still be run in order. What if you marked off the dependencies of lines of code, and structured the program that way? It would be convenient, among other things, for debugging and efficient compilation. It might also offer interesting ways to do conditional expressions.

I think I might study these as a simpler case.

Network Charts

I once listened to a lecture where declaritive overlay networks were being constructed using SQL like query commands. This greatly allows for the simplification of creating overlay networks, or at least reducing the amount of code required per system as so much became abstrated away.

Might a chart be used in some way to describe how nodes interact and are connected?

Ideally we need to find exactly what operations on a Network_chart would be necessary to make this useful. Once we know that, we can apply that knowledge in studying more general computational charts: what operations on a network chart turn it into a computational chart?

Computation Charts

Charts can be used to declare areas of computation and the interactions/dependancies between them. This is the obvious applicaiton of charts, but is significant. It would allow for increased modularity of code, parallelism and diversity in code base. Predominantly the chart abstracts the computation further away from the computer it is being processed on.

See Charts and the Lambda Calculus for more.

Layering

Thus far we have described charts as residing above all other code, no matter how nested the charts themselves are. Could perhaps the charts themselves be used as programming level constructs? Similar to a function or an event handler a programmer could set up inputs or outputs to a chart.

This is, in general, a difficult problem. Consider the following line of arguments:

  • Most input systems to the computer require output to trigger. Certainly a node that is considered to be input can be fulfilled with a program that technically performs both input and output, but if we find a way to describe both aspects of this, it will be lovely. [incidentally: a program's output is needed, so its input is needed; its input needs to give an output, so it gives an output. This is a subset of the problems we already have. it's not all too bad.]
  • Charts themselves must be loaded into memory. Therefore the solution to the previous point can be applied to the loading of a chart. Simpler methods can be applied, too, ultimately.
  • In general, chart meta-programming is acceptable. We need to understand what properties can be known in such a case.

Cycles and Reversibility

It seems that running different programs on the reversal of a chart and applying those results to the forward execution of the same chart will allow the same functionality offered by graph cycles without the potential deadlocks.

That is, you can more accurately explain why you are trading data back and forth, and when those conditions are no longer met, you can pull out. It seems better to me.

This applies strongly to Network Charts and do the issue of metaprogramming ("Layering", above). After all, in both cases it takes output to get an input. The uni-directionality is broken -- but it is only partially broken, not as completely as by cycles. I don't think we need to represent relationships with cycles that are not full cycles.

Incompleteness

As a further question: might we actually gain by making the language selectively Turing incomplete? Obviously the language will be Turing incomplete if there are no deadlocks, because we can decide the halting problem ("every program will halt"). Every program halting is a Good Thing.

This is reality. No computer ever built will be truly Turing complete unless we drop one into the singularity at Universe End. No computer can ever know whether it can run a truly Turing complete program. So our results are much cleaner if we can select precisely what abilities we will lose.

I say, ban: running forever and infinite memory. These don't need to be arbitrarily bounded by the language but the language should guarantee that a bound exists. What do we do to ensure this?

(I recognize that, as each component is implemented separately, any may freeze up. But this is a bug, not a language feature.)

Object Orientation

A chart with some of its inputs set constant represents a simple form of object; more properly, this is a curried function.

A chart with some subset of its internal channels preserved, however, is an object. Pair a chart with constistent memory and an object results. We need a way to determine which data channels need not (or even may not) be overwritten.

Prior work

  • Sawzall
  • Cilk is simple, solid, and down-to-earth and there is likely to be some work we can draw from it. It gives us one more model for program interaction. It requires POSIX threads and the general form does not allow distributed memory. There is information about a distributed memory version, developed as part of Keith Randall's PhD thesis, at a page deep in the Cilk website.