Friday, November 4, 2011

Almost all programming is metaprogramming

Any sufficiently large program involves metaprogramming, whether the program's author meant it or not. By that I mean the program:
  • Creates a 'program' in some specialized encoding, or 'language'
  • Includes an interpreter for that language
  • At runtime, builds a new program in that language and executes it.
Or in other words, a sufficiently advanced data structure is indistinguishable from code.
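To make the three points above concrete, here is a hypothetical sketch (the rule names and the pricing domain are invented for illustration): a list of tuples is the 'program', and a small loop is its 'interpreter', with the program itself built at runtime.

```python
# A made-up discount-rule 'language': each rule is a plain tuple,
# and apply_rules is the interpreter that executes the rule program.
def apply_rules(rules, price):
    for op, arg in rules:
        if op == "multiply":
            price *= arg          # e.g. a percentage discount
        elif op == "subtract":
            price -= arg          # e.g. a flat coupon
    return price

# The 'program' is ordinary data, assembled at runtime:
program = [("multiply", 0.9), ("subtract", 5)]
print(apply_rules(program, 100))  # 85.0
```

Nothing here looks like metaprogramming at first glance, yet `program` is exactly a program in a tiny specialized language.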

Let's think of this in terms of real-world (and some not so real-world) examples:

1- Think of implementing a word processor: you have a specialized 'language' to describe paragraphs, lines, formatting, and so on, and one or more 'interpreters' that take the program and render it to the screen or print it. Sometimes the language is very real, for example PostScript.
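A minimal sketch of that idea (the `paragraph`/`lines` schema is an assumption, not any real word processor's format): the document tree is the 'program', and `render` is one of its 'interpreters'.

```python
# The document 'language': a list of blocks, each a plain dict.
doc = [
    {"type": "paragraph", "lines": ["Hello world", "Second line"]},
    {"type": "paragraph", "lines": ["Another paragraph"]},
]

def render(document):
    """One 'interpreter' for the document language: plain-text output.
    A screen or PostScript renderer would be another interpreter
    for the very same program."""
    out = []
    for block in document:
        if block["type"] == "paragraph":
            out.extend(block["lines"])
            out.append("")          # blank line between paragraphs
    return "\n".join(out)

print(render(doc))
```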

2- Also, parser combinators: you are essentially creating a program out of lambdas or objects. Consider the parsing primitives to be like instructions of a virtual machine, and the resulting parser as an AST that knits those instructions together. Running the parser amounts to feeding those instructions to the VM.
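A bare-bones version of this (a sketch, not any particular combinator library): each parser is a function from input to `(result, rest)` or `None`, and the combinators `seq` and `alt` knit those 'instructions' into a larger program.

```python
def char(c):
    """Primitive 'instruction': match one literal character."""
    def p(s):
        return (c, s[1:]) if s.startswith(c) else None
    return p

def seq(p1, p2):
    """Combinator: run p1, then p2 on the remaining input."""
    def p(s):
        r1 = p1(s)
        if r1 is None:
            return None
        v1, rest = r1
        r2 = p2(rest)
        if r2 is None:
            return None
        v2, rest = r2
        return ((v1, v2), rest)
    return p

def alt(p1, p2):
    """Combinator: try p1, fall back to p2."""
    def p(s):
        return p1(s) or p2(s)
    return p

# The resulting parser is a data structure of closures -- a program
# we built at runtime and now execute by calling it:
ab = seq(char("a"), alt(char("b"), char("c")))
print(ab("ac!"))  # (('a', 'c'), '!')
```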

3- Similarly, the reason a Turing machine is so powerful is that it has an infinite tape that can be freely accessed. If you study TM programs, you'll find a lot of them generating intermediate data on the tape and then traversing that intermediate data using the fixed circuitry of their transition diagrams; in other words, interpreting it.
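The fixed-circuitry-over-free-data split can be sketched in a few lines (this toy machine, which flips every bit until it hits a blank, is my own illustration): the transition table is the fixed circuitry, while the tape holds arbitrary data the machine generates and traverses.

```python
# Transition table: (state, symbol) -> (write, head_move, next_state).
# This is the machine's fixed 'circuitry'.
table = {
    ("flip", "0"): ("1", 1, "flip"),
    ("flip", "1"): ("0", 1, "flip"),
    ("flip", "_"): ("_", 0, "halt"),   # '_' marks a blank cell
}

def run(tape, state="flip", pos=0):
    """Interpret the tape contents using the fixed transition table."""
    tape = list(tape)
    while state != "halt":
        write, move, state = table[(state, tape[pos])]
        tape[pos] = write
        pos += move
    return "".join(tape)

print(run("0110_"))  # 1001_
```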

All of this seems rather obvious, and somewhat too philosophical. What practical benefit do we get out of it? I think if we realize that our programming is mostly about creating and executing more specialized programs, we'd start thinking about tooling support for those specialized programs.

Whether we're programming in C, Java, or your favorite functional language, you probably have debugging, refactoring, and other support for the first-level program (the Java code itself, say); the higher-level program, however, is neglected as mere 'data', so you end up working at a lower abstraction level.

There's ongoing research into tooling support for domain-specific languages and making them easy to integrate with the host language's debugger, etc. I suggest going to the next level: make it easy to treat any data structure as a DSL.

That would probably require the host language to take homoiconicity very seriously: in an OOP language, the program itself would be composed of object literals; ML-like languages would have their code composed of calls to data constructors; Prolog-like languages would represent the program as a set of facts; and so on. Basically, if the tools in the IDE work on code, they should work similarly on anything else.
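As a rough sketch of the code-as-data-constructors idea (the node tags `lit`/`var`/`add`/`mul` are invented for illustration): an expression is just nested tuples, so any generic tool that walks data structures can walk this 'code' too, and an interpreter over it is an ordinary function.

```python
def eval_expr(e, env):
    """Interpret an expression encoded as nested tuples.
    Because the program IS a data structure, a debugger or
    'safe delete' tool could traverse it the same way this does."""
    tag = e[0]
    if tag == "lit":
        return e[1]
    if tag == "var":
        return env[e[1]]
    if tag == "add":
        return eval_expr(e[1], env) + eval_expr(e[2], env)
    if tag == "mul":
        return eval_expr(e[1], env) * eval_expr(e[2], env)
    raise ValueError(f"unknown node {tag!r}")

# x*3 + 1, written entirely with data constructors:
expr = ("add", ("mul", ("var", "x"), ("lit", 3)), ("lit", 1))
print(eval_expr(expr, {"x": 4}))  # 13
```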

After all this is done, imagine again working on our word processor: we could step over the rendering of each paragraph, then step into a given line to troubleshoot a bug. Tracing the run of a parser built from combinators would happen rule by rule. The 'safe delete' refactoring could have a lot in common with the language's garbage collector. On an error, the IDE would stop and show which piece of input caused the problem, instead of pointing at the troubled code in the main program.

Or maybe I'm wrong. Maybe no one has come up with the idea because there's some obvious flaw in this reasoning, or the idea is vaguely defined and falls apart when studied in detail. I don't know!
