Tuesday, December 27, 2005

It can't be so hard...

Do you remember my last post about my javascript compressor?

Óscar Frías, of Trabber fame, asked me for my opinion on the dojo javascript compressor, and my reply was along the lines of: "slightly better than mine, as it compresses the names of inner variables / functions". As this difference seemed trivial to overcome, I added this compression to my yet-unnamed compressor.

There are only two cases that I have not handled:

  1. If you have variables named _1, _2, ... things will break, quickly. This is a pretty trivial bug that I have to fix asap. I guess that dojo gets that right.
  2. If you have "with" statements, things may break.
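One way to close the first gap is to make the name generator skip any name that already occurs somewhere in the file. This is a minimal sketch, assuming the compressor has the whole source text at hand; `freshNames` and `collectIdentifiers` are hypothetical helpers, not dojo's code and not this compressor's:

```javascript
// Hypothetical helper: yields short replacement names (_1, _2, ...),
// skipping any name that already appears in the source file.
function* freshNames(usedNames) {
  let n = 1;
  while (true) {
    const candidate = "_" + n;
    n++;
    if (!usedNames.has(candidate)) yield candidate;
  }
}

// Hypothetical helper: collects every identifier-like token in the file.
// Crude on purpose: it also picks up words inside strings and comments,
// which only means a few candidate names get skipped unnecessarily.
function collectIdentifiers(source) {
  return new Set(source.match(/[A-Za-z_$][A-Za-z0-9_$]*/g) || []);
}

const source = "function f(x) { var _1 = x; var total = _1 + 2; return total; }";
const names = freshNames(collectIdentifiers(source));
const first = names.next().value;  // "_2", because "_1" is already taken
const second = names.next().value; // "_3"
```

Over-collecting names is the safe direction to err in: the worst case is a slightly longer replacement name.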


I wonder if dojo handles that second point. Let me elaborate with an example.


```javascript
function a()
{
    function getElementById(a) { return "hi"; }
    with (document) alert(getElementById("xxx"));
}

a();
```


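This code should show an alert dialog box with "null": inside the with statement, getElementById resolves to document.getElementById, and there is no element with id "xxx". If we compress it, renaming inner variables / functions, we get: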


```javascript
function a()
{
    function _1(a) { return "hi"; }
    with (document) alert(_1("xxx"));
}

a();
```


(Note that the compressor doesn't know that the getElementById inside the with statement was in fact document.getElementById, so it renames it to _1.)

This new code will show "hi" instead of "null".

Despite this corner case, this new compression is well worth having.
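A conservative way to sidestep the corner case is to refuse to rename the locals of any function whose text mentions `with` (or a direct `eval`, which can also reach local names at runtime). Since an enclosing function's text includes its nested functions, this also protects outer locals referenced from a nested `with`. This is a sketch of a hypothetical guard, `mayRenameLocals`, not how dojo handles it:

```javascript
// Hypothetical guard: only rename locals when the function's source text
// contains neither the word `with` nor the word `eval`.
// The whole-word test is deliberately crude: a match inside a string or a
// comment only means renaming is skipped for that function.
function mayRenameLocals(functionSource) {
  return !/\bwith\b|\beval\b/.test(functionSource);
}

mayRenameLocals("function f(x) { return x + 1; }"); // true
mayRenameLocals('function a() { with (document) alert(getElementById("xxx")); }'); // false
```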

But I didn't build a new compressor just to be a shadow of dojo's. I want to build the best compressor ever, and such a compressor should fix what I consider a growing problem in the javascript community.

Javascript frameworks are growing in number / quality / size. Dojo, prototype / scriptaculous, rico. You name it!

Some frameworks already weigh in at megabytes, and to keep the download manageable for users, they split the framework into several files. The developer is then supposed to pick the javascript files containing the code he will actually use. But it's not working.

If I want to use a simple smooth blink effect with scriptaculous, I have to bring in the whole effects.js file and the whole prototype.js file, even though I'm obviously only using a small part of them. Bigger pages very quickly end up loading all the scriptaculous files, even when they are "only" using an Ajax.Request here and a little effect there.

One approach to cutting the bloat is to do manual surgery on these files and build a new file with the minimum needed to run, for instance, some effects (as moo.fx does). But this is time-consuming, requires a careful human expert to cut the bloat and still end up with something useful, and it has to be done on a page-by-page basis. In the end, it boils down to copying and pasting parts of your framework into a separate file. That is too much work just to be able to use a framework without putting an unacceptable burden on the shoulders of our users.

The approach I have in mind is akin to the previous one, but automated. A program parses your pages and works out exactly the minimum javascript needed to do what they do. It then rewrites your pages / javascript to use only this minimum.

So, in short, my goal is to prune dead code from javascript files, as linkers do in other languages.

Unfortunately, due to the highly dynamic nature of javascript, this is impossible to do without building a full javascript engine. Remember that functions are also objects in javascript, so they can be freely copied into variables. When you write "a()", you have to evaluate "a" to know what is getting called.
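A small, hypothetical example of the problem: the function that ends up being called is only known after evaluating the code. `Effects`, `effectName` and `run` are made-up names, not part of any real framework:

```javascript
// Hypothetical mini "framework" with two effects; the page uses only one.
var Effects = {
  fade: function () { return "fading"; },
  blink: function () { return "blinking"; }
};

// The property name is computed at runtime, so the calling code never
// mentions "blink" as a whole word. A tool that only searches the text
// for names would conclude Effects.blink is unused and prune it.
var effectName = ["bl", "ink"].join("");

// Functions are values: `run` can hold whichever function was looked up.
var run = Effects[effectName];
run(); // "blinking"
```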

At this point, part of my brain started thinking "well, it's just a javascript engine, it can't be so hard..." and sometime a few days ago, it convinced the other part. So here I am, with a javascript engine that evaluates every javascript statement other than for loops, switches, try / catches and function declarations, and every javascript expression other than constructor calls, function calls, and array and object literals. You could call it the most over-engineered calculator ever. But I felt really great when I typed 1 + 1 and it replied: 2.
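The post doesn't describe the engine's internals, so as a hedged illustration, here is the kind of tree-walking evaluator that answers 1 + 1 with 2. The node shapes are an assumption, loosely modeled on the ESTree format, and `evaluate` is a hypothetical name:

```javascript
// Hypothetical AST node shapes (loosely ESTree-like):
//   { type: "Literal", value: 1 }
//   { type: "BinaryExpression", operator: "+", left: node, right: node }
function evaluate(node) {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "BinaryExpression": {
      const left = evaluate(node.left);
      const right = evaluate(node.right);
      // Reuses the host's operators for primitive values; a real engine
      // must implement the conversions (ToPrimitive, ToNumber) itself.
      switch (node.operator) {
        case "+": return left + right;
        case "-": return left - right;
        case "*": return left * right;
        case "/": return left / right;
        default: throw new Error("unsupported operator " + node.operator);
      }
    }
    default:
      throw new Error("unsupported node type " + node.type);
  }
}

// 1 + 1
const ast = {
  type: "BinaryExpression", operator: "+",
  left: { type: "Literal", value: 1 },
  right: { type: "Literal", value: 1 }
};
evaluate(ast); // 2
```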

The really big missing piece in this engine is the ECMA object system, which I failed to implement correctly on my first try. I still need a full day or two to finish it, and then I hope the remaining parts will just fall into place.
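At the heart of the ECMA object system is property lookup along the prototype chain (the [[Get]] operation in the ECMAScript specification). This is a simplified sketch under a hypothetical object representation (`makeObject`, `getProperty`), ignoring getters/setters, property attributes and host objects:

```javascript
// Hypothetical interpreter-side object: own properties in a Map plus an
// internal [[Prototype]] link (null at the end of the chain).
function makeObject(proto) {
  return { properties: new Map(), proto: proto };
}

// Simplified [[Get]]: check own properties, then walk the prototype chain.
function getProperty(obj, name) {
  for (let o = obj; o !== null; o = o.proto) {
    if (o.properties.has(name)) return o.properties.get(name);
  }
  return undefined;
}

const objectPrototype = makeObject(null);
objectPrototype.properties.set("toString", "native toString");
const point = makeObject(objectPrototype);
point.properties.set("x", 1);

getProperty(point, "x");        // 1 (own property)
getProperty(point, "toString"); // "native toString" (inherited)
getProperty(point, "y");        // undefined
```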

When I finish the javascript engine, I will have to figure out how to do dead code pruning with it. I'm thinking of using a garbage-collector-like algorithm, but doing it without falling into an exponential explosion of cases does not seem trivial. Well, I will cross that bridge when I get to it, I guess.
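The garbage-collector-like idea, in its simplest form, is a mark phase over a reference graph followed by a sweep. This sketch assumes the graph is already given (the hypothetical `references` map below); producing a sound graph for dynamic javascript is exactly the hard part discussed above, and the mark step itself just visits each name once:

```javascript
// Hypothetical reference graph: function name -> names it references.
const references = new Map([
  ["blink", ["setOpacity", "timer"]],
  ["fade", ["setOpacity", "timer"]],
  ["setOpacity", []],
  ["timer", []],
  ["request", ["getTransport"]],
  ["getTransport", []]
]);

// Mark phase: start from the names the page actually uses (the roots)
// and follow references, marking everything reachable.
function markReachable(roots, references) {
  const live = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const name = stack.pop();
    if (live.has(name)) continue; // already marked
    live.add(name);
    for (const ref of references.get(name) || []) stack.push(ref);
  }
  return live;
}

// Sweep phase: everything left unmarked is dead code.
function deadCode(roots, references) {
  const live = markReachable(roots, references);
  return [...references.keys()].filter(name => !live.has(name));
}

deadCode(["blink"], references); // ["fade", "request", "getTransport"]
```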

Oh, btw, MERRY CHRISTMAS!!!