Fluorine did it, in the ASP.NET Temporary folder, using an InvalidCastException

February 9, 2009

Posted by markegilbert in Visual Studio/.NET.

For two of my more recent projects, the Flash developers involved requested that we use FluorineFX to expose data and functionality as Action Message Format (AMF) web services, allowing them to get data back from the service in strongly-typed ActionScript objects.  With the exception of how FluorineFX is implemented under the hood (as an HttpModule instead of an HttpHandler, which I think would have made more sense), it turned out to be surprisingly easy to use.

Things seemed to be going well (as they normally do) until we deployed to production (again, as they normally do).  In production we noticed that the Flash units would sometimes jam, and you’d have to refresh them several times in a row to get them to show any data coming from the web services.  Normally, in a situation like this, I’d fire up Fiddler to watch the traffic between the server and the client.  In this case, however, Fiddler couldn’t help much because it doesn’t support AMF natively.  AMF is a binary format, you see, and while there is some sporadic talk on the internets about a plug-in for Fiddler to decode AMF traffic, I have yet to find one.  What does work for this situation is another web proxy called Charles (which one of the Flash devs I was working with swears by).  With it, I could see the AMF traffic going back and forth, including one very interesting exception:

InvalidCastException: Unable to cast object of type ‘X’ to type ‘X’

Oh, ASP.NET, why the heck are you trying to cast from X to X, and what’s the problem when you do?  After consulting with my pal Google, I found this post by Henning Krause.  While this was, in fact, the error I was seeing, I wasn’t trying to use Assembly.LoadFile or Assembly.LoadFrom.  I wasn’t even trying to cast anything (X-to-X or otherwise).  In the immortal words of the Falcon’s captain: “It’s not my fault!”

After mulling it over for a bit, I theorized that while MY code wasn’t casting anything, perhaps Fluorine was.  After all, it had the task of translating an AMF stream to my objects and back again – perhaps IT was using LoadFile or LoadFrom behind the scenes.

I also thought about the strangeness of the issue itself: we never saw this in development, staging, or pre-production.  Why production?  And why did it seem to work sometimes and not others?  What was different?  Well, for one, the production server was not one server; it was actually a load-balanced pair of web servers.  OK, if one server was having a bad day, and the other was all rainbows and sunshine, perhaps my repeated requests were hitting one server some of the time and the other the rest.  That would explain the intermittent-y nature of the issue.  After pointing my HOSTS file at each server in turn and retesting the service, I found that one server would reliably work every time, and the other would reliably fail every time.

The first thing we tried was bouncing the App Pool for the site in question (we had fortuitously placed this particular site in its own App Pool, so this was the only site affected).  This didn’t do the trick; the server was still throwing InvalidCastExceptions.

I re-read Krause’s article and pulled in a colleague of mine to get another perspective.  One of the sentences in the post caught my attention: “The runtime does not actually run the assemblies from the path the IIS virtual directory points. Instead it copies all assemblies to the Temporary ASP.NET Files…”  Ok, let’s continue on this limb, and theorize that there is an out-of-date copy of my assembly (where type X was found) in the ASP.NET Temp folder.  Perhaps Fluorine is trying to get the class definition from the /bin copy and trying to instantiate the object from the Temporary ASP.NET copy (or vice versa), and perhaps it’s using LoadFile/LoadFrom to do it.

Well, we’ve made it this far out on the limb – we might as well take the final plunge.

If all of the above was correct, we’d expect to find an out-of-date copy of the assembly in the Temp folder on the bad server, and the correct version on the other one.  As it turns out, we found that the bad server had TWO copies of the assembly (the most recent plus an older one), while the other server had only the most recent.  We tried deleting the two copies of the assembly from the bad server’s Temp folder (as well as the accompanying assembly.info file in the directory), but the server complained that the files were in use.  We stopped the sites in IIS, tried the delete again (which worked), and restarted the sites.  When I hit the bad server again, we saw a single copy of the assembly get placed in the folder, and the service started working again.  Now that both nodes in the farm were working, I could hit the Flash units with impunity, and they would reliably return data.
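
For what it’s worth, the check we did by hand (looking for more than one copy of the same assembly under the Temp folder) could be scripted.  What follows is only a rough sketch in Python, not anything we actually ran: the function name find_duplicate_assemblies is made up for illustration, and it assumes that a stale copy shows up as a second .dll with the same file name somewhere under the folder you point it at.

```python
import os
from collections import defaultdict


def find_duplicate_assemblies(temp_root):
    """Return {dll name (lowercased): [directories]} for every .dll
    file name that appears more than once under temp_root.

    Hypothetical helper: it only compares file names, so it cannot tell
    which copy is the stale one. Check the timestamps by hand.
    """
    locations = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(temp_root):
        for name in filenames:
            if name.lower().endswith(".dll"):
                # Windows file names are case-insensitive, so normalize.
                locations[name.lower()].append(dirpath)
    # Keep only the names that were found in more than one directory.
    return {
        name: sorted(dirs)
        for name, dirs in locations.items()
        if len(dirs) > 1
    }
```

You would call it with the path to the site’s own subfolder under Temporary ASP.NET Files (the exact location depends on the framework version and the site, so I’m not hard-coding one here).  Pointing it at the whole Temp folder would likely flag assemblies that several sites legitimately share.  An empty result means no assembly name shows up twice.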

 

In retrospect this kinda makes sense, but only kinda.  At least now we have a procedure for correcting the condition when it occurs, but I still don’t understand how the server got into the state in the first place, which means I don’t understand how to prevent this from happening again with future deployments.
