الاثنين، 11 مارس 2013

Kalimat: an educational programming language with a twist

I've been working on a language for teaching programming to children. One sentence summary: It looks like the child of QBasic and Google's Go, with influences from other languages.



The  syntax and programming environment are geared towards educational use, but the VM on which programs run (called SmallVM) has support for microthreads, CSP channels, and an FFI. The language itself has things like lambda expressions, a limited form of destructuring, and opt-in tail call elimination.

This article will be divided into two main parts, the first of which discusses Kalimat as an educational language for beginners, and the second shows some of its more advanced features, talks about the implementation, and mentions some ideas about taking the language forwards.

A little background

Kalimat started as an Arabic-based programming language, but I've decided to turn it into a bilingual one; with a compiler switch you can have Arabic or English-based syntax with a corresponding IDE and standard library. I'm doing what I can to make both languages equal citizens, but Arabic still has a higher precedence since the goal of Kalimat is to help the Arab-speaking world learn programming, and to provide a means of "computational empowerment" to non-programmers.

Kalimat was made bilingual because:
  • Often, the novelty of an Arabic-based language dominated the conversation when I wanted to talk about the design and features of the language itself. Having an English version would make it simpler to talk about the technical details.
  • I wanted to interact more with the global PL community, the majority of which are English speakers.
  • To help with scientific studies about teaching programming with PLs in native vs. non-native human language.
  • As a bonus, working with an English version of Kalimat showed me some verbosity in the original syntax which I hadn't noticed since I was too used to the original syntax I had designed.

The language can be downloaded from here, with pre-built English and Arabic versions for Windows. Non-Windows users can check out the source and build it with Qt Creator and the Qt library (version 4.8.x), but they would need to uncomment the following line in both of the files 'kalimat.pro' and 'smallvm.pro' to Enable English syntax/IDE:

#DEFINES+= ENGLISH_PL

Moving on....

Basic syntax

This is a word-count sample in Kalimat:

read "Enter some text:", str 
words = split (str, " ")
result = {}
for word in words:
    if not containsKey (result, word):
        result[word]= 1 
    else:
        result[word]= result[word]+ 1 
    done 
loop 

for pair in result:
    print key pair, " : ", value pair 
loop 

Looks kind of Pythonic, and the code is self-explanatory I think! We have the basics: Simple input and output, iteration, dictionaries (arrays are also supprted), and conditionals.

Here is another quick example:

proc lines (a):
    for i = 1 to a :
        drawLine(random (800), random(600))-(random(800), random(600)), random (16)
        wait (100)
    loop 
end 

lines (200)

In Kalimat a user can define a proc or a func, with the latter being able to return values to the caller (similar to the Sub/Function dichotomy in Microsoft's *-Basic languages). In my opinion this makes the language easier to teach since we can explain procs as 'pieces of code with a name', and worry later about passing parameters and using return values. Experimentation would tell if this was the right decision.

As seen from the drawLine statement, Kalimat has graphics commands that were taken as they are from 80s BASIC, we also have drawPixel, drawCircle, drawRect, drawImage and drawSprite.

We have classes and objects in Kalimat too:

class Point :
    has x, y 
    responds to draw ()
end 

response Point p to draw ():
    drawPixel (x p, y p)
end 

p = new Point having x = 120, y = 150 
p : draw ()

Like the proc/function division, an object can respond to a message sent to it, or it can reply with a value.

The syntax is a little verbose but I have a hypothesis that it would pay off when teaching OOP to beginners. In my brief period of teaching OOP (to college students) I have often found that the syntax of a language like C++ has too many implicit meanings (what 'this' means, what happens in a line like a.b(c)...etc) and that when the students see this syntax they need constant reminding of what it actually represents.

I've tried to make the stuff that happen behind the scene explicit in the syntax, things like 'I'm sending a request to the object, the object shall respond, the response has a reference to the actual object...). When they have fully grasped the semantics the learners can move on to more mainstream languages (or an additional, more streamlined syntax, could be added to Kalimat).

Combining graphics, OOP and Kalimat's sprite engine; we can remake the QBasic classic Gorilla example. The code for that example (and others) is bundled with Kalimat's distribution, or it can be seen here.

Finally, Kalimat has features to help specifically with teaching:
  • The 'Edit' menu has a 'Copy as HTML' feature to help post code in online tutorials and the like. The examples in this blog have been formatted that way (but some minor annoyances in spacing had to be manually corrected; I hope I can fix those annoyances soon).
  • The 'Program' menu has a feature called the monitor which provides an animated trace of a program, while showing the call stack in the 'Variables' tab at the bottom.
  • The 'Test' menu has the option of showing the SmallVM IL generated from source, which could be useful if someone wants to use Kalimat for teaching compilers. It's planned to later add features for visualizing basic blocks and control flow in methods, mainly to help me debug the VM; but could also be used in teaching compilers.
This concludes our brief tour of Kalimat as an educational language, now we move on to concurrency and other more advanced features.

Concurrent processes

Those five forms are the main ingedients for working with processes:

launch procInvokation ()
c = channel ()
send val to c 
receive var from c 
select :
 send 123 to c1 :
   -- statements
 or receive x from c2 :
   -- statements
done

Processes are scheduled internally by the VM and are independent from operating system processes or threads. The launch statement starts a process, the channel() function creates a new channel, send, receive and select statements enable communication between processes using channels as a data exchange/synchronization mechanism.

Both sending and receiving are synchronous (a sending operation blocks until another process is ready to receive from the channel and vice versa), and the select statement takes a set of alternative channel operations and applies the first operation that is ready to run, along with its associated sub-statements; if more than one operation were ready at the same time one operation is chosen randomly from among them.

(If you're familiar with Google Go, then all of this is old news to you).

Why were these features added? Again based on some hypothesis I have about teaching programming. The goals were:
  • To enable children to create games and animated scenes where multiple agents move and behave independently
  • To make GUI programming easier (all GUI widget events are represented by channels, you could model a sequence of GUI interactions with sequential statements: receive some event from channel1, then receive another from channel2...)
  • Since we seem to be entering an age of concurrent programming, it doesn't hurt to have a language where it's easier to teach the concepts of concurrency

I think it's interesting that channels support Kalimat's iteration protocol, so you can say

for someVar in myChannel:

..and have the system iteratively receive data from the channel until it's closed. This feature encompasses the capabilities of something like Python's generators (or yield return from C#) but I think it's more powerful than either of them; for example, from what I've seen it's somewhat cumbersome to use yield to recursively create an iterator for a tree in Python, however in Kalimat it's very natural:

for elem in enumerateTree (t):
    print elem 
loop 

proc enumerateTree (t):
    chan = channel ()
    launch traverseTree (t, chan)
    return chan 
end 

proc traverseTree (t, chan):
    traverse (left t)
    send data t to chan 
    traverse (right t)
end

What we did was a little bit of glue code, then a traditional tree traversal, sending whatever we visit to the channel.

Simple destructuring

We have syntax to test if some given data has a given structure and to extract data from that structure. In this example we'll be testing if a given value is an array, map, or object:

proc showName (x):
    if x ~ [?first, ?last]:
        print first, " ", last 
    else if x ~ {"firstName"=>?x, "lastName"=>?y}:
        print x, " ", y 
    else if x ~ Person having firstName = ?x, lastName = ?y :
        print x, " ", y 
    done 
end

Preceding a variable name by ? extracts the corresponding component into the variable, otherwise the component is tested for equality against the variable (using the same ?var twice destroys the first value, this feature is not really pattern matching/unification). Matching keys in a map cares only about the subset of keys being tested (i.e other key/value pairs could exist and the match would succeed), the same goes for testing fields in objects. Matching on arrays like this x ~ [a, b, c] tests against the exact array count, but x ~ [a, b, c, ...] can successfully match a larger array.

Tail call elimination

The statement delegate to someInvokation() can be used inside a procedure or a function in Kalimat to properly tail-call some routine. This feature was created in case an educator wanted to teach certain styles of programming, and to make the VM extensible (for example to make it possible to support continuations if so desired).

Parser generation

Kalimat has a built-in parser generation engine (this was made with the intention of using Kalimat to teach computer science and not just programming, and to enable the creation of natural language interfaces, text adventure games,...etc by kids).

For example writing this code

rules expression :
 
expression = term:a "+" expression:b => ["+", a, b]
or term:a "-" expression:b => ["-", a, b]
or term 

term = number:a => toNum (a)
or "(" expression:a ")" => a 

number = digit:a number : b => a + b 
or digit 

digit = from "0" to "9" 
 
end

...would provide a function called expression that takes a string and returns a parse tree for simple arithmetic expressions. The capability uses PEG notation with a memoizing packrat parser. Unfortunately the generated parsers can't currently handle left recursion (thus the above grammar actually has a subtle mistake because it treats subtraction as right-associative). I've read the various papers about handling left-recursion in PEG's, but I have yet to understand the techniques :)

A function-plotting sample application that demonstrates this feature is bundled with Kalimat, or could be seen here.

There are two interesting additions that could be made to the parsing engine:
1- Adding the ability to parse any enumerable stream of data, not just a string, and to pattern match against data. This would be like OMeta and its various derivatives.

2- Since channels are enumerable, this would make it possible to parse anything that could be sent to channels: imagine the ability to write grammars for GUI interaction models, sequences of packets from the network, routing web requests, and so on; basically enabling the creation of grammars for things for which we now create state machines.

If features #1 and #2 above were well-implemented I think it would give way to an exciting, highly declarative style of programming.

Implementation

Kalimat code runs on a custom-made VM called SmallVM. It's a simple stack-based machine that runs concurrent processes using an internal scheduler. It currently uses two threads: one for code execution and the other for GUI.

Because Qt does not allow GUI operations to run outside the event handling thread; the GUI thread also has an interpreter instance. A process that needs to do a UI operation migrates to the other interpreter, does its job, then migrate back to the main execution thread. There are plans to have multiple execution threads and to multiplex the execution of processes among them (like Erlang or Go).

An ambitious project would be to hook the VM scheduler to libuv and have the ability to schedule tasks Node.js style but without the callbacks, since the VM would suspend and resume processes automatically.

Performance wise, I could describe SmallVM as "somewhat satisfactory for its intended goals". On my machine, executing fib(30) takes about one second in Python but 3 seconds in Kalimat. On another dual core machine the time for Python is still one second, but for SmallVM it becomes about 1.4 seconds. I have yet to run more scientific benchmarks.

The VM has a simple FFI built using libffi; it supports calling C functions and having callbacks to Kalimat functions. The callback feature is currently still in development since it has a complicating issue: what happens if C code calls a Kalimat procedure, then that procedure waits on a channel until another process sends it something ?

To make the above case work invoking a callback has to run a secondary VM execution cycle, not just run the code of the callback procedure. Even running in multiple threads wouldn't completely solve the problem. The code to handle these cases is already in the VM, but not really tested and is expected to be buggy for now.

This applies to Kalimat in general: I think I might have some good ideas for designing a PL but I'm not so good at implementation. The language has its fair share of bugs and quirks, and now with the English based IDE the quirks have increased (expect to see the occasional untranslated error message or right-to-left dialog box for the time being). However I'm pushing on, and I hope the product keeps getting better over time.

Thanks for reading about my small programming language; I hope something from it got your interest!