Monday, November 28, 2005

JavaScript compressor

Let's see if you are a real hacker.

Your problem: a somewhat slow web page with lots of JavaScript code.

You can:

  1. ignore the problem
  2. enable mod_deflate on the server for your JavaScript files (be careful with old browsers!)
  3. use a JavaScript compressor to strip extra spaces, newlines, comments, etc.
  4. take an existing JavaScript parser and make it rewrite your code as above (no comments, extra spaces, etc.), while also safely renaming internal variables.
  5. download the ECMA standard (ECMA-262), build a full JavaScript parser from scratch, and make it rewrite your code as above. Extra points if you implement further size optimizations.

After a full weekend on this, I had an "almost" working parser. A few more evenings later I had a compressor, though it does not rename variables (yet).
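Renaming is the missing piece of the compressor above. One common way to choose the new names (not necessarily the one this compressor will end up using) is to enumerate the shortest valid identifiers and hand the shortest ones to the most frequently used locals. The Python sketch below assumes a parser has already collected every use of each local variable in one function scope (the uses list), and that the caller puts into avoid every global or outer-scope name the function references, so a new name never shadows one of them. Functions that call eval or use with cannot be renamed safely at all. The names short_names, rename_locals and RESERVED are hypothetical.

```python
import itertools
import string
from collections import Counter

FIRST = string.ascii_letters + "_$"   # characters allowed at the start of a name
REST = FIRST + string.digits          # characters allowed after the first one

# ES3-era keywords, future reserved words and literals that must never be
# generated; double-check this set against the ECMAScript edition you target.
RESERVED = set("""
break case catch continue default delete do else finally for function if in
instanceof new return switch this throw try typeof var void while with
abstract boolean byte char class const debugger double enum export extends
final float goto implements import int interface long native package private
protected public short static super synchronized throws transient volatile
null true false
""".split())


def short_names(avoid=frozenset()):
    """Yield a, b, ..., $, aa, ab, ... skipping reserved words and names in avoid."""
    for length in itertools.count(1):
        for first in FIRST:
            for rest in itertools.product(REST, repeat=length - 1):
                name = first + "".join(rest)
                if name not in RESERVED and name not in avoid:
                    yield name


def rename_locals(uses, avoid=frozenset()):
    """Map each local name to a short one; the most frequently used get the shortest."""
    names = short_names(avoid)
    # most_common() keeps first-seen order for ties, so the mapping is deterministic
    return {old: next(names) for old, _ in Counter(uses).most_common()}
```

With uses = ["counter", "i", "i", "i", "counter", "total"] this gives i → a, counter → b and total → c. Giving the shortest names to the most used variables is what minimizes the output size once every occurrence is rewritten.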

Now I'm trying to finish the parser. As always, the last 5% takes 90% of the time. My parser is compliant except for:

  • Virtual semicolons
  • Regular expressions

We all know that in JavaScript you have to separate statements with semicolons, but you can omit them in some cases. Among others, you can omit one if the current statement is separated from the next by at least one newline, and the two statements read as one would be a syntax error. So "a = 1" and "b = 2" on separate lines are two statements, but "a = b" followed by "(c)" on the next line is parsed as the single statement "a = b(c)". And somebody actually thought *that* would make JavaScript easier to understand (?!)

So far I have only modified my grammar so that it can insert virtual semicolons before '}' and at the end of the file. These are the two most useful places where a semicolon can be omitted unambiguously.
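In a hand-written recursive-descent parser, virtual semicolons usually live in a single helper that is called wherever the grammar requires a ';'. The Python sketch below uses a toy grammar (blocks and "name = number" statements) and hypothetical names Token, tokenize, Parser and consume_semicolon. The helper accepts a real ';', or a virtual one before '}', at end of file, or, unless newline_asi is False, after a line break; with newline_asi=False it behaves like the restricted grammar described above. In a real parser the expression parser first consumes anything that can continue the expression (such as the '(' in the "a = b(c)" case), so the helper only runs when the next token cannot continue the statement; the restricted productions, such as "return" followed by a newline, are not handled here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    kind: str                     # "name", "num", "punct" or "eof"
    value: str
    newline_before: bool = False  # a line break separates this token from the previous one


def tokenize(src: str) -> list[Token]:
    """Toy lexer: names, integers and single-character punctuators only."""
    tokens, newline = [], False
    for m in re.finditer(r"\n|[A-Za-z_$][\w$]*|\d+|\S", src):
        text = m.group()
        if text == "\n":
            newline = True
            continue
        if text[0].isdigit():
            kind = "num"
        elif text[0].isalpha() or text[0] in "_$":
            kind = "name"
        else:
            kind = "punct"
        tokens.append(Token(kind, text, newline))
        newline = False
    tokens.append(Token("eof", "", newline))
    return tokens


class Parser:
    def __init__(self, tokens: list[Token], newline_asi: bool = True):
        self.tokens = tokens
        self.pos = 0
        self.newline_asi = newline_asi  # False: virtual ';' only before '}' and at EOF

    def peek(self) -> Token:
        return self.tokens[self.pos]

    def advance(self) -> Token:
        tok = self.tokens[self.pos]
        if tok.kind != "eof":           # never move past EOF
            self.pos += 1
        return tok

    def expect(self, kind: str, value: Optional[str] = None) -> Token:
        tok = self.advance()
        if tok.kind != kind or (value is not None and tok.value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok.value!r}")
        return tok

    def consume_semicolon(self) -> None:
        tok = self.peek()
        if tok.kind == "punct" and tok.value == ";":
            self.advance()              # a real semicolon
        elif tok.value == "}" or tok.kind == "eof":
            pass                        # virtual semicolon; '}' is left for the block
        elif self.newline_asi and tok.newline_before:
            pass                        # virtual semicolon after a line break
        else:
            raise SyntaxError(f"missing ';' before {tok.value!r}")

    def parse_statement(self):
        if self.peek().value == "{":
            self.advance()
            body = []
            while self.peek().value != "}":
                body.append(self.parse_statement())
            self.advance()
            return ("block", body)
        name = self.expect("name").value
        self.expect("punct", "=")
        value = self.expect("num").value
        self.consume_semicolon()
        return ("assign", name, value)

    def parse_program(self):
        stmts = []
        while self.peek().kind != "eof":
            stmts.append(self.parse_statement())
        return stmts
```

Parser(tokenize("a = 1\nb = 2")).parse_program() yields two assignments, while "a = 1 b = 2" on one line raises SyntaxError, as it should.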

Regular expressions are also a bit hard to parse with an LALR(1) grammar, because the lexer cannot tell on its own whether a '/' starts a regular expression literal or is a division operator. I'm thinking of accepting a '/' or '/=' token where a primary expression is expected and then, in the action for these two tokens, switching the lexer into a mode where it scans the rest of the regular expression. (At least that is what Rhino does.)
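Once the parser has decided that a '/' (or '/=') starts a primary expression, the lexer only has to scan the rest of the literal. Here is a minimal Python sketch of that scanning step, assuming the caller passes the index of the opening '/'; scan_regex is a hypothetical name. The details that matter are that a backslash escapes the next character and that an unescaped '/' inside a character class such as [/] does not end the literal. For the '/=' token the '=' is simply the first character of the pattern, and the flags are the identifier characters right after the closing slash.

```python
def scan_regex(src: str, start: int) -> tuple[str, int]:
    """Scan a regex literal whose opening '/' is at src[start].

    Returns (literal_text, index_after_literal); raises SyntaxError if the
    literal is unterminated or contains a line break.
    """
    if src[start] != "/":
        raise ValueError("scan_regex must start at a '/'")
    i = start + 1
    in_class = False                 # inside [...] an unescaped '/' does not end the literal
    while i < len(src):
        c = src[i]
        if c in "\r\n\u2028\u2029":  # line terminators are not allowed in a regex literal
            break
        if c == "\\":
            if src[i + 1:i + 2] in ("\r", "\n", "\u2028", "\u2029"):
                break                # ...not even escaped ones
            i += 2                   # skip the escaped character, whatever it is
            continue
        if c == "[":
            in_class = True
        elif c == "]":
            in_class = False
        elif c == "/" and not in_class:
            i += 1
            while i < len(src) and (src[i].isalnum() or src[i] in "_$"):
                i += 1               # flags such as g, i, m
            return src[start:i], i
        i += 1
    raise SyntaxError("unterminated regular expression literal")
```

In "x = /a\/b[/]c/gi.test(s)" the call scan_regex(src, 4) returns the literal /a\/b[/]c/gi and stops at the '.', leaving ".test(s)" to the normal lexer.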

If that fails, I will rewrite it as an LL(1) parser. I will face the same problems, but this time the parser will be hand-written, so I should be able to put these hacks in wherever I need them.

The good news is that my code is fully parsed and written back correctly, except for the two regular expressions it contains. Next I will start working on more advanced compression features, not yet available anywhere else.

I will keep you posted!