Monday, December 18, 2006

VB.NET vs. C#: An example of the difficulty of micro-optimization

Is VB.NET slower than C#? Shawn Weisfeld wrote a little rant of frustration on the difference in speed between C# and VB.NET.  Personally, I don't use VB.NET, as I dislike its slightly more verbose syntax and the fact that it has compiler options that might break your code, which makes copy-pasting more complex.  I like to be able to read code and not have to worry too much about yet another setting ;-).  Of course, habit is a big one: I came from C++ and Java, so naturally C# seemed easier to learn, even if the difference is mostly just minor aesthetic details.

The micro-benchmark he used was essentially a number of integer calculations in a for loop.  C# is roughly 3 times as fast, even though both programs decompile to (almost) the same C# code.  The result of that calculation is printed, so the compiler can't just throw away the entire loop in a benchmark-defeating bout of optimization.  Here's his C# code:

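long num2 = 0, num3 = 0, num4 = 0, num5 = 0, num6 = 0;
// 0x5f5e100 == 100,000,000 iterations
for (long num7 = 0; num7 < 0x5f5e100; num7++) {
    num5 = (num2 + (num3 * num4)) + num5;
    num6 = ((((num4 * num3) * 12) + num2) + (num2 + (num3 * num4))) + num6;
}
Console.WriteLine(num5 + " " + num6); // print the results so the loop can't be optimized away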

All variables are initialized to zero, and are 64-bit signed integers.  His post contains the full compilable code.  I'd never tried VB.NET before, but heck: virtually the same program, and such a weird difference?  I copied his code, fired up C# and then VB.NET for the first time, and lo and behold: C# needs 3 seconds and VB.NET 9.


So my first thought is "Darn, my machine is slower than his" :-). Then: There are a number of odd things about his benchmark...



  • Variables are initialized to zero and remain zero: might the compiler be smarter than we think and have figured that out?

  • Using 64-bit integers is uncommon, and they are slower (even on 64-bit machines)

  • I'm running this from inside Visual Studio by pressing Play, which even in Release mode still attaches a debugger, and an attached debugger can suppress JIT optimizations and heavily affect certain types of code.

Running without Visual Studio brings no improvement.  Changing the initial conditions makes no difference.  Changing to 32-bit integers speeds things up by a factor of 2, but doesn't change VB's relative slowness.  Funnily enough, running in 32-bit mode outside of Visual Studio speeds things up by a further factor of 2, but that still doesn't solve our paradox, especially as C# now speeds up much more!  32-bit C# outside of Visual Studio takes a mere 150ms, as opposed to 3000ms using 64-bit integers inside Visual Studio.


Finally, the key: while playing around, VB ended up throwing an integer overflow exception where C# didn't!  VB.NET checks for integer overflow by default, and that is costly.  Disabling overflow checking (the "Remove integer overflow checks" compile option, or the vbc /removeintchecks switch) lets VB.NET run just as quickly as C#.
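To make the difference concrete, here is a small C# sketch (not Shawn's original code) that runs the same loop twice: once in an unchecked context, where overflow silently wraps around (C#'s default), and once inside a `checked` block, which makes the compiler emit the overflow-checking IL instructions (`add.ovf`, `mul.ovf`) that VB.NET emits by default. The class and method names are made up for illustration. On the C# side, the `/checked` compiler switch flips the default for a whole project, much as `/removeintchecks` does in the opposite direction for VB.NET. Timings will of course vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;

// Hypothetical demo class: contrasts unchecked and checked versions of the benchmark loop.
public static class OverflowDemo
{
    // The benchmark loop in an unchecked context: overflow silently wraps.
    public static long LoopUnchecked(long iterations, long n2, long n3, long n4)
    {
        unchecked
        {
            long n5 = 0, n6 = 0;
            for (long i = 0; i < iterations; i++)
            {
                n5 = (n2 + (n3 * n4)) + n5;
                n6 = ((((n4 * n3) * 12) + n2) + (n2 + (n3 * n4))) + n6;
            }
            return n5 + n6;
        }
    }

    // The same loop with every integer operation overflow-checked, as VB.NET
    // does by default; an overflow throws System.OverflowException.
    public static long LoopChecked(long iterations, long n2, long n3, long n4)
    {
        checked
        {
            long n5 = 0, n6 = 0;
            for (long i = 0; i < iterations; i++)
            {
                n5 = (n2 + (n3 * n4)) + n5;
                n6 = ((((n4 * n3) * 12) + n2) + (n2 + (n3 * n4))) + n6;
            }
            return n5 + n6;
        }
    }

    // Times both variants with all-zero inputs, as in the original benchmark.
    public static void Benchmark()
    {
        const long iterations = 100000000; // 0x5f5e100
        foreach (bool check in new[] { false, true })
        {
            Stopwatch sw = Stopwatch.StartNew();
            long result = check ? LoopChecked(iterations, 0, 0, 0)
                                : LoopUnchecked(iterations, 0, 0, 0);
            sw.Stop();
            Console.WriteLine("{0}: result {1} in {2} ms",
                check ? "checked" : "unchecked", result, sw.ElapsedMilliseconds);
        }
    }
}
```

Calling `OverflowDemo.Benchmark()` from a Release build outside the debugger prints both timings. With zero inputs neither loop ever overflows, so the checked version pays for checks that never fire, which is exactly the situation in the original benchmark; feed it something like `long.MaxValue` instead and only the checked loop throws.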


I think an improvement from 9000ms to 150ms ain't bad at all.  What do you think?

Wednesday, December 06, 2006

Why XHTML still serves a purpose

The W3C is a source of irritation to many web developers.  It seems to produce heinously complex specifications solving problems nobody has.  It's tackling issues such as the semantic web, when much "simpler" issues such as, say, the syntactic web are completely unresolved.

Evan Goer is cynical about XHTML's usefulness: in his view its "sole" advantage is the ability to embed other formats such as MathML and SVG, and even that can be mimicked with the help of some JavaScript, allowing SVG and MathML embeds in plain HTML 4.

XHTML does add a whole lot more than that, however.  How many developers nowadays actually write all of their HTML themselves?  Anyone?  Almost every site on the planet uses some form of HTML generation.  And it so happens that XHTML is much more suited to that task than HTML, for a number of reasons.

Somebody said that a prime advantage of XHTML is its integration into the XML food chain.  This is a huge advantage.  The project I'm currently working on generates all output via XSLT transformations, which handily guarantees a certain minimum of well-formedness.  When I do screen-scraping, the first step is always to run Tidy over the input; that lets me choose from any number of parsing techniques instead of regular expressions to extract the information I'm looking for.

The real killer advantage is more fundamental though, and it's one that HTML is not likely to ever achieve: security.  The threat of injection looms large, and HTML is a target that is difficult to secure.  Earlier today another 0-day XSS exploit on MySpace was discovered.  It relies on a quirk of Firefox's HTML parser.

Invalid HTML abounds on the web, and every user agent interprets it differently.  Producing only well-formed XML significantly reduces your attack surface.  MySpace would not have been vulnerable had it actually parsed the HTML and regenerated it.  And once it's parsed anyway, why not use something like XSLT to filter out more complex issues?  Allowing structured user input without filtering it is a security impossibility, but filtering it correctly without parsing it is a technical impossibility.
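A minimal sketch of "parse, filter, regenerate" in C#, assuming user input arrives as an XHTML-style fragment. It uses `System.Xml.Linq` to parse the fragment (rejecting anything that isn't well-formed), strips every element and attribute that isn't on a whitelist, and serializes what survives. The whitelist, the class name and the http(s)-only rule for `href` are illustrative assumptions, not MySpace's or anyone else's actual policy; a production filter needs more care (for instance, HTML named entities such as `&nbsp;` aren't defined in plain XML, so input using them is rejected here).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical whitelist filter for user-supplied XHTML fragments.
public static class XhtmlFilter
{
    // Assumed whitelist: tune it to the markup your site actually accepts.
    static readonly HashSet<string> AllowedElements = new HashSet<string>
        { "p", "br", "b", "i", "em", "strong", "a", "ul", "ol", "li", "blockquote" };
    static readonly HashSet<string> AllowedAttributes = new HashSet<string> { "href", "title" };

    // Parses the fragment as XML, filters it, and regenerates it with a real serializer.
    // Input that isn't well-formed throws XmlException instead of being guessed at.
    public static string Sanitize(string fragment)
    {
        XElement root = XElement.Parse("<root>" + fragment + "</root>");
        Filter(root);
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }

    static void Filter(XElement element)
    {
        foreach (XAttribute attr in element.Attributes().ToList())
        {
            bool keep = attr.Name.Namespace == XNamespace.None
                        && AllowedAttributes.Contains(attr.Name.LocalName);
            // Only plain http(s) links survive, which rules out javascript: URLs.
            if (keep && attr.Name.LocalName == "href")
                keep = attr.Value.StartsWith("http://", StringComparison.OrdinalIgnoreCase)
                    || attr.Value.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
            if (!keep) attr.Remove();
        }

        foreach (XNode node in element.Nodes().ToList())
        {
            switch (node)
            {
                case XElement child when child.Name.Namespace == XNamespace.None
                                         && AllowedElements.Contains(child.Name.LocalName):
                    Filter(child);   // allowed element: filter its contents recursively
                    break;
                case XElement _:
                    node.Remove();   // disallowed element: drop it along with its content
                    break;
                case XCData cdata:
                    // HTML parsers don't honor CDATA sections, so re-emit the content as escaped text.
                    cdata.ReplaceWith(new XText(cdata.Value));
                    break;
                case XText _:
                    break;           // plain text is re-escaped on output
                default:
                    node.Remove();   // comments, processing instructions
                    break;
            }
        }
    }
}
```

For example, `XhtmlFilter.Sanitize("<p onclick=\"steal()\">hi <script>alert(1)</script><b>there</b></p>")` keeps the paragraph and the bold text but drops both the `onclick` attribute and the `script` element. Because the output comes from a serializer rather than string surgery, text content is always re-escaped, so it can never come back out as a live tag.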

And if you use XSLT to produce this safe version of your output, you get a bonus: while working entirely in XHTML in your stylesheets, you can trivially choose to serialize the result as HTML for compatibility reasons (via xsl:output method="html").
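As a rough illustration of that last point, assuming .NET's `XslCompiledTransform`: the stylesheet below is a hypothetical identity transform standing in for whatever real stylesheet generates your pages, and the class name is made up. Only the `xsl:output` element decides whether the result is serialized as XML or as HTML; passing the transform's `OutputSettings` to `XmlWriter.Create` makes the writer honor that choice.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: runs well-formed input through an XSLT whose output method is HTML.
public static class XhtmlToHtml
{
    // Assumed identity stylesheet; the xsl:output element switches serialization to HTML.
    const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""html"" indent=""no""/>
  <xsl:template match=""@*|node()"">
    <xsl:copy><xsl:apply-templates select=""@*|node()""/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string wellFormedInput)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(styleReader);

        var output = new StringWriter();
        // OutputSettings carries the stylesheet's xsl:output choices (method, indent, ...).
        using (XmlReader input = XmlReader.Create(new StringReader(wellFormedInput)))
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
            xslt.Transform(input, writer);
        return output.ToString();
    }
}
```

With `method="html"`, an empty element such as `<br/>` in the well-formed input (for example `XhtmlToHtml.Transform("<p>line one<br/>line two</p>")`) should be written the HTML way, as `<br>`; switching the method to `xml` gives you XHTML back without touching any templates.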

Essentially, if you're making a dynamic web site which displays user-provided input, and you're not generating HTML structurally (with a real serializer), you're asking for unnecessary trouble.  XHTML won't solve everything, but it's not useless: actually, you'd be crazy not to use it wherever you can ;-).

Friday, December 01, 2006

Google Reader's Shared Items

I use Google Reader. It's not as fully-featured or as fast as some local clients, but it works fine and is available everywhere (which is, of course, a great boon). It does, however, have one feature which is probably solely the domain of online RSS readers: shared items. Many weblogs consist largely of "me-too" content: posts which contain little beyond a link to another post which caught the author's fancy. Unfortunately, that results in a sort of RSS spam. I'm certainly not interested in trivial references; just give me the real stuff!

So, here are my shared feeds. The page also contains an RSS feed, so you can consume its goodness intravenously. It was a specific post, something that's an absolute must-have for all web-developers, which finally convinced me to start sharing. ;-)