الخميس، 30 مايو 2013

A Prolog for my Kalimat

I've tried twice to add program analysis to Kalimat, and each time I had trouble. Program analysis is collecting data about the program to use for IDE features like 'go to definition', autocomplete, 'find usages', and so on.

Basically this operation has 3 parts:
  • Regularly parse the code in the editor windows (for speed most IDEs reparse only the changed parts of the code).
  • Update a program database that holds information like what methods are defined in each class, where is some variable declared,...etc
  • When you need to do some action like 'go to definition', query that database
Let's ignore part #1 and focus on updating and querying the program database. The first time I implemented it the solution was very ad-hoc: I stored things in hashtables, queried them with loops, and the code was very tedious, repetitive, and error-prone.

This was clearly not good, so I decided to use a real database: SQLite is a lightweight database engine that can be easily embedded in other programs, and also has the option to store data in memory instead of on disk. Perfect. In the second iteration the code for 'go to definition' looked something like this:

QSqlQuery q;
q = progdb.q("SELECT defining_token_pos, function_definitions.module_filename "
   "FROM function_definitions JOIN definitions ON definitions.id=function_definitions.def_id AND "
   "definitions.module_filename = function_definitions.module_filename WHERE (definitions.module_filename IN (SELECT imported FROM "
   "module_imports WHERE importer=?) OR definitions.module_filename=?) AND definitions.name in (SELECT lexeme FROM tokens WHERE pos <=? "
   "AND pos + length >=?)", filename, cursorPos, cursorPos, filename);
if(q.next()) { 
   docContainer->OpenOrSwitch(q.value(1).toString());
   ((MyEdit *)currentEditor())->;jumpToPos(q.value(0).toInt()); return;
}

It might look complex but the idea is simple, what the query's trying to say is "Find the filename and position of the function definition that has the same name as the name under the cursor. When looking for a function definition, look in the current module or any module imported by the current module".

This works, but has a lot of drawbacks

1- The query isn't recursive. If the current file imports module A, the IDE will look there for definitions, but if we import 'A', and 'A' imports 'B', the IDE will not look there. We want all modules reachable from the current module to be found.

SQL isn't very friendly with recursion so I probably would've had to implement recursion myself with C++ code.

2- The bigger problem is that the SQL here is too complex. Sure it beats loops and hashtables but it's still big, and that's only for 'go to definition'! How about when I add public and private definitions? namespaces?

What I need is a reasoning system. Luckily there's one: Prolog.

Actually I don't need the full Prolog language; a subset of Prolog has been created to support complex queries (much more complex than SQL), this subset is called Datalog, and it's successfully used in program analysis in many research projects.

I set out to find a nice Datalog implementation for my aims, but didn't find something sufficient: there's an excellent library called BDDBDDB but it's in Java, and there's another one called simply 'Datalog' written in C and Lua that seemed not straightforward to integrate with Kalimat (nb: Kalimat is written in C++/Qt).

Then I looked at open source Prolog implementations like SWI or Ciao, and while very powerful and fast, they didn't seem easy to integrate with a standalone C++ app, especially an app like Kalimat that has to run on both Windows and Linux. I might be wrong but I felt it wasn't easy.

So what did I do? Why, created a new Prolog implementation of course! It's called SmallProlog (and nicknamed 小さい Prolog), specially designed for integration with C++/Qt.

Using Prolog from C++ has never been easier:

PrologEngine engine;
engine.load(prologCode);
engine.call("predicate", arg1, arg2, [](QMap<QString, QVariant> sol){ 
 // do stuff with each solution
});

The expression [ ](QMap<QString,QVariant> sol){ ... } is a lambda expression, a new feature of C++. You define here an anonymous function that gets called whenever the Prolog engine finds a solution. The 'solution' will be a map from variable names to their values.

Under the hood the engine stores facts in a SQLite database, but this time the queries are generated for us from a higher level and more expressive language.

So how does 'go to definition' look now?

usagesOfProc(DefModule,DefPos,UseModule,UsePos):-
   
visibleFrom(UseModule,DefModule),
    procInvokation(UseModule,Name,UsePos),
    proc(DefModule,Name,DefPos,_,_,_)
    .
  
visibleFrom(Mod1, Mod2):-
    visibleFrom2(Mod1,Mod2,[Mod1]).
visibleFrom2(M1,M1,_).
visibleFrom2(M1,M2,Visited):-
    imports(M1,Temp),
    not(member(Temp,Visited)),
    visibleFrom2(Temp, M2,[Temp|Visited]).

The second predicate, called visibleFrom, can take a Kalimat module and give us all the other modules reachable (recursively) from the first module, including itself.

It can also work in the opposite direction: given a module it can give us all modules that 'see' it. The power & elegance of Prolog at work.

The first predicate, called usagesOfProc, takes a module & a position of a function call, and gives us another module and position where the function is defined.

It can also work in the opposite direction: given a function definition, it can give us all files and positions where the function is called.

So when I implemented 'go to definition' I was also implementing 'find usages'; write a feature and get one for free! The power & elegance of Prolog at work.

Aside: I wish the makers of Prolog implementations marketed them as languages for more than industrial and vertical market type applications, and made them trivial to integrate with all types of other languages.

Also, to help myself with debugging, I made a Prolog console where I can query the Kalimat program database live, while the IDE is running:



It's an experiment, and it's still unfinished so I don't know if it'll work as I desire or not, but I'm very excited about it. Aren't you? :)

الأربعاء، 15 مايو 2013

g4c: Common problems and frequently asked questions

Can I use normal C++ libraries in my g4c application?

Yes, you can use code from C++ libraries, like files, rand( ), strings,... except any code related to the console (cin, cout, getch( ),.....

My program behaves erroneously if I call wait( ) with a large number, like wait(2000) or wait(5000)

There is a problem with the wait function; to wait for a long time you can wait for a short time many times; for example instead of calling wait(2000) you can call wait(50) forty times. Tip: you can create a function to do that for you.

س: ازاي اتغلب على مشكلة الحد الأقصى بتاع 30 sprite؟

ج1: استخدم ذكاءك؛ كل الألعاب القديمة (غزاة الفضاء، باكمان، ماريو...) احتاجت لعدد قليل من الsprites. لاحظ ان رقم 30 هو حد أقصى للأشكال الظاهرة على الشاشة في لحظة معينة
ج2: لو عايز ترسم صورة من غير ما تحركها، ممكن تستخدم copy_sprite_image بدلاً من put_sprite، ودي مالهاش حد أقصى (لكن ما تسمحش بتحريك الصورة اللي رسمتها أو التصادم)

How to clear the screen?

  • Use fill_rect to draw over normal graphics
  • To hide sprites, use hide_sprite (give any numbers as the x, y parameters; they're not important)


 س: (سؤال متقدم) اللعبة بتاعتي بتهنج لو شغلتها من mouse_proc

ج: شغلها من main، لأنه لو الmouse_proc قعدت تشتغل بلا توقف حتعطل الـmessage loop بتاع البرنامج والمستخدم مش حيقدر يعمل فيه أي حاجة ثانية س: بس أنا عايز اشغل اللعبة لما المستخدم يدوس بالماوس على مكان معين ج: لو عايز main تفضل منتظرة حدث معين اعمل متغير بـfalse، خلي الmain شغالة في loop مادام المتغير ده لسة خطأ، وخليه true في الmouse_proc لما تحب اللعبة تشتغل.

ملاحظة: لو ما مريتش بالموقف اللي السؤال ده بيتكلم عنه، تجاهل السؤال والإجابة.

How can I convert a string to a char* in order to use it with text_out?

Learn from this example:

string s = "Hello";
char *c = (char *) s.c_str();
text_out(c, 20, 20);

How can I convert a char * (that I got by using the input( ) function) to a string?

Learn from this example:

char *c = input("Enter your name:", 30, 30);
string s = c;

ملاحظة: لو عايز تقرأ أكثر من قيمة نصية باستخدام دالة input وتحافظ عليهم، لازم تعمل نسخة من كل حاجة تقرأها قبل ما تقرأ اللي بعدها، لأن دالة input دايماً بتقرأ في نفس المكان في الذاكرة، وبالتالي لو ما عملتش نسخ حتلاقي ان كل حاجة قرأتها بقى لها نفس القيمة.

لو قرأت القيمة وحولتها لرقم فكدة تبقى حافظت عليها، لأن الرقم سوف يخزن في مكان مختلف في الذاكرة
لو قرأت القيمة في صورة char * وحولتها لـstring، تبقى حافظت عليها لأن الـstring حيعمل لنفسه نسخة
لو قرأتها char * ومش عايز تحولها لـstring، استخدم strcpy كما في المثال الآتي:

char *c = input("Enter your name:", 30, 30);
char s2[100]; // make sure it's of sufficient size
strcpy(s2, c);
// now use s2

للأسف طريقة انك تعمل array of char بحجم ثابت كبير عشان تشيل فيه نص غير معروف حجمه هي طريقة سيئة، كان ممكن الـg4c تخبرك بعدد الchar التي قد تم قراءتها وتعمل انت dynamic allocation بالحجم ده. يمكن أحل المشكلة دي في إصدارة قادمة.