I've tried twice to add program analysis to Kalimat, and each time I had trouble. Program analysis is collecting data about the program to use for IDE features like 'go to definition', autocomplete, 'find usages', and so on.
Basically this operation has 3 parts:
This was clearly not good, so I decided to use a real database: SQLite is a lightweight database engine that can be easily embedded in other programs, and also has the option to store data in memory instead of on disk. Perfect. In the second iteration the code for 'go to definition' looked something like this:
It might look complex but the idea is simple, what the query's trying to say is "Find the filename and position of the function definition that has the same name as the name under the cursor. When looking for a function definition, look in the current module or any module imported by the current module".
This works, but has a lot of drawbacks
1- The query isn't recursive. If the current file imports module A, the IDE will look there for definitions, but if we import 'A', and 'A' imports 'B', the IDE will not look there. We want all modules reachable from the current module to be found.
SQL isn't very friendly with recursion so I probably would've had to implement recursion myself with C++ code.
2- The bigger problem is that the SQL here is too complex. Sure it beats loops and hashtables but it's still big, and that's only for 'go to definition'! How about when I add public and private definitions? namespaces?
What I need is a reasoning system. Luckily there's one: Prolog.
Actually I don't need the full Prolog language; a subset of Prolog has been created to support complex queries (much more complex than SQL), this subset is called Datalog, and it's successfully used in program analysis in many research projects.
I set out to find a nice Datalog implementation for my aims, but didn't find something sufficient: there's an excellent library called BDDBDDB but it's in Java, and there's another one called simply 'Datalog' written in C and Lua that seemed not straightforward to integrate with Kalimat (nb: Kalimat is written in C++/Qt).
Then I looked at open source Prolog implementations like SWI or Ciao, and while very powerful and fast, they didn't seem easy to integrate with a standalone C++ app, especially an app like Kalimat that has to run on both Windows and Linux. I might be wrong but I felt it wasn't easy.
So what did I do? Why, created a new Prolog implementation of course! It's called SmallProlog (and nicknamed 小さい Prolog), specially designed for integration with C++/Qt.
Using Prolog from C++ has never been easier:
The expression [ ](QMap<QString,QVariant> sol){ ... } is a lambda expression, a new feature of C++. You define here an anonymous function that gets called whenever the Prolog engine finds a solution. The 'solution' will be a map from variable names to their values.
Under the hood the engine stores facts in a SQLite database, but this time the queries are generated for us from a higher level and more expressive language.
So how does 'go to definition' look now?
The second predicate, called visibleFrom, can take a Kalimat module and give us all the other modules reachable (recursively) from the first module, including itself.
It can also work in the opposite direction: given a module it can give us all modules that 'see' it. The power & elegance of Prolog at work.
The first predicate, called usagesOfProc, takes a module & a position of a function call, and gives us another module and position where the function is defined.
It can also work in the opposite direction: given a function definition, it can give us all files and positions where the function is called.
So when I implemented 'go to definition' I was also implementing 'find usages'; write a feature and get one for free! The power & elegance of Prolog at work.
Aside: I wish the makers of Prolog implementations marketed them as languages for more than industrial and vertical market type applications, and made them trivial to integrate with all types of other languages.
Also, to help myself with debugging, I made a Prolog console where I can query the Kalimat program database live, while the IDE is running:
It's an experiment, and it's still unfinished so I don't know if it'll work as I desire or not, but I'm very excited about it. Aren't you? :)
Basically this operation has 3 parts:
- Regularly parse the code in the editor windows (for speed most IDEs reparse only the changed parts of the code).
- Update a program database that holds information like what methods are defined in each class, where is some variable declared,...etc
- When you need to do some action like 'go to definition', query that database
This was clearly not good, so I decided to use a real database: SQLite is a lightweight database engine that can be easily embedded in other programs, and also has the option to store data in memory instead of on disk. Perfect. In the second iteration the code for 'go to definition' looked something like this:
QSqlQuery
q;
q = progdb.q("SELECT defining_token_pos, function_definitions.module_filename "
"FROM function_definitions JOIN definitions ON definitions.id=function_definitions.def_id AND "
"definitions.module_filename = function_definitions.module_filename WHERE (definitions.module_filename IN (SELECT imported FROM "
"module_imports WHERE importer=?) OR definitions.module_filename=?) AND definitions.name in (SELECT lexeme FROM tokens WHERE pos <=? "
"AND pos + length >=?)", filename, cursorPos, cursorPos, filename);
if(q.next())
{
docContainer->OpenOrSwitch(q.value(1).toString());
((MyEdit *)currentEditor())->;jumpToPos(q.value(0).toInt());
return;
}
It might look complex but the idea is simple, what the query's trying to say is "Find the filename and position of the function definition that has the same name as the name under the cursor. When looking for a function definition, look in the current module or any module imported by the current module".
This works, but has a lot of drawbacks
1- The query isn't recursive. If the current file imports module A, the IDE will look there for definitions, but if we import 'A', and 'A' imports 'B', the IDE will not look there. We want all modules reachable from the current module to be found.
SQL isn't very friendly with recursion so I probably would've had to implement recursion myself with C++ code.
2- The bigger problem is that the SQL here is too complex. Sure it beats loops and hashtables but it's still big, and that's only for 'go to definition'! How about when I add public and private definitions? namespaces?
What I need is a reasoning system. Luckily there's one: Prolog.
Actually I don't need the full Prolog language; a subset of Prolog has been created to support complex queries (much more complex than SQL), this subset is called Datalog, and it's successfully used in program analysis in many research projects.
I set out to find a nice Datalog implementation for my aims, but didn't find something sufficient: there's an excellent library called BDDBDDB but it's in Java, and there's another one called simply 'Datalog' written in C and Lua that seemed not straightforward to integrate with Kalimat (nb: Kalimat is written in C++/Qt).
Then I looked at open source Prolog implementations like SWI or Ciao, and while very powerful and fast, they didn't seem easy to integrate with a standalone C++ app, especially an app like Kalimat that has to run on both Windows and Linux. I might be wrong but I felt it wasn't easy.
So what did I do? Why, created a new Prolog implementation of course! It's called SmallProlog (and nicknamed 小さい Prolog), specially designed for integration with C++/Qt.
Using Prolog from C++ has never been easier:
PrologEngine engine;
engine.load(prologCode);
engine.call("predicate", arg1, arg2, [](QMap<QString, QVariant> sol){
// do stuff with each solution
});
The expression [ ](QMap<QString,QVariant> sol){ ... } is a lambda expression, a new feature of C++. You define here an anonymous function that gets called whenever the Prolog engine finds a solution. The 'solution' will be a map from variable names to their values.
Under the hood the engine stores facts in a SQLite database, but this time the queries are generated for us from a higher level and more expressive language.
So how does 'go to definition' look now?
usagesOfProc(DefModule,DefPos,UseModule,UsePos):-
visibleFrom
(UseModule,DefModule),
procInvokation(UseModule,Name,UsePos),
proc(DefModule,Name,DefPos,_,_,_)
.
visibleFrom(Mod1, Mod2):-
visibleFrom2(Mod1,Mod2,[Mod1]).
visibleFrom2(M1,M1,_).
visibleFrom2(M1,M2,Visited):-
imports(M1,Temp),
not(member(Temp,Visited)),
visibleFrom2(Temp, M2,[Temp|Visited]).
The second predicate, called visibleFrom, can take a Kalimat module and give us all the other modules reachable (recursively) from the first module, including itself.
It can also work in the opposite direction: given a module it can give us all modules that 'see' it. The power & elegance of Prolog at work.
The first predicate, called usagesOfProc, takes a module & a position of a function call, and gives us another module and position where the function is defined.
It can also work in the opposite direction: given a function definition, it can give us all files and positions where the function is called.
So when I implemented 'go to definition' I was also implementing 'find usages'; write a feature and get one for free! The power & elegance of Prolog at work.
Aside: I wish the makers of Prolog implementations marketed them as languages for more than industrial and vertical market type applications, and made them trivial to integrate with all types of other languages.
Also, to help myself with debugging, I made a Prolog console where I can query the Kalimat program database live, while the IDE is running:
It's an experiment, and it's still unfinished so I don't know if it'll work as I desire or not, but I'm very excited about it. Aren't you? :)