Mark Gilbert's Blog

Science and technology, served light and fluffy.

Fluorine did it, in the ASP.NET Temporary folder, using an InvalidCastException

For two of my more recent projects, the Flash developers involved requested that we use FluorineFX to expose data and functionality as Action Message Format (AMF) web services, allowing them to get data back from the service in strongly-typed ActionScript objects.  With the exception of how FluorineFX is implemented under the hood (as an HttpModule instead of an HttpHandler, which I think would have made more sense), it turned out to be surprisingly easy to use.

Things seemed to be going well (as they normally do) until we deployed to production (again, as they normally do).  In production we noticed that the Flash units would sometimes jam, and you’d have to refresh them several times in a row to get them to show any data coming from the web services.  Now normally with a situation like this, I’d fire up Fiddler to watch the traffic between the server and the client.  In this case, however, Fiddler couldn’t help all that much because it doesn’t support AMF natively.  AMF is a binary format, you see, and while there is some sporadic talk on the internets about a plug-in for Fiddler to decode the AMF traffic, I have yet to find one.  What does work for this situation is another web proxy called Charles (which one of the Flash devs I was working with swears by).  With it, I could see the AMF traffic going back and forth, including one very interesting exception:

InvalidCastException: Unable to cast object of type ‘X’ to type ‘X’

Oh, ASP.NET, why the heck are you trying to cast from X to X, and what’s the problem when you do?  After consulting with my pal Google, I found this post by Henning Krause.  While this was, in fact, the error I was seeing, I wasn’t trying to use Assembly.LoadFile or Assembly.LoadFrom.  I wasn’t even trying to cast anything (X-to-X or otherwise).  In the immortal words of the Falcon’s Captain: “It’s not my fault!”

After mulling it over for a bit, I theorized that while MY code wasn’t casting anything, perhaps Fluorine was.  After all, it had the task of translating an AMF stream to my objects and back again – perhaps IT was using LoadFile or LoadFrom behind the scenes.
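
To make the “X to X” weirdness concrete, here is a minimal sketch of how one class can end up as two different types.  This is my own illustration, not anything taken from FluorineFX: the Widget class and the temp-file copy are made up for the demo, and it assumes a runtime where Assembly.LoadFile treats a copy at a different path as a separate assembly.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical stand-in for "type X"; not a FluorineFX type.
public class Widget { }

public static class SameNameDifferentType
{
    // Copies the assembly that defines Widget to a temp path, loads the
    // copy with LoadFile, and returns the copy's version of Widget.
    public static Type LoadWidgetFromCopy()
    {
        string original = typeof(Widget).Assembly.Location;
        string copy = Path.Combine(
            Path.GetTempPath(),
            Guid.NewGuid().ToString("N") + Path.GetExtension(original));
        File.Copy(original, copy);

        Assembly second = Assembly.LoadFile(copy);
        return second.GetType(typeof(Widget).FullName, true);
    }

    public static void Main()
    {
        Type fromCopy = LoadWidgetFromCopy();
        Console.WriteLine("Same name: " + (fromCopy.FullName == typeof(Widget).FullName));
        Console.WriteLine("Same type: " + (fromCopy == typeof(Widget)));

        // Create an instance of the copy's Widget and try to treat it
        // as the original Widget.
        object instance = Activator.CreateInstance(fromCopy);
        try
        {
            Widget w = (Widget)instance;
            Console.WriteLine("Cast succeeded: " + w);
        }
        catch (InvalidCastException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

The point of the sketch is that the runtime identifies a type by its assembly as well as its name, so two loads of “the same” DLL from different paths can produce types that share a name but won’t cast to each other.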

I also thought about the strangeness of the issue itself – we never saw this in development, staging, or pre-production.  Why production?  And why does it seem to work sometimes and not others?  What was different?  Well, for one, the production server was not one server – it was actually a load-balanced pair of web servers.  Ok, if one server was having a bad day, and the other was all rainbows and sunshine, perhaps my repeated requests were hitting one server some of the time and the other the rest.  That would explain the intermittent-y nature of the issue.  After pointing my HOSTS file to each server and retesting the service, I found that one server would reliably work every time, and the other would reliably fail every time.

The first thing we tried was bouncing the App Pool for the site in question (we had fortuitously placed this particular site in its own App Pool, so this was the only site affected).  This didn’t seem to do the trick; the server was still throwing InvalidCastExceptions.

I re-read Krause’s article and pulled in a colleague of mine to get another perspective.  One of the sentences in the post caught my attention: “The runtime does not actually run the assemblies from the path the IIS virtual directory points. Instead it copies all assemblies to the Temporary ASP.NET Files…”  Ok, let’s continue on this limb, and theorize that there is an out-of-date copy of my assembly (where type X was found) in the ASP.NET Temp folder.  Perhaps Fluorine is trying to get the class definition from the /bin copy and trying to instantiate the object from the Temporary ASP.NET copy (or vice versa), and perhaps it’s using LoadFile/LoadFrom to do it.
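
One way to test a theory like this from inside the app is to ask the AppDomain which files its assemblies actually came from.  The following is only a sketch under my own assumptions (the LoadedAssemblyReport class is hypothetical, and in a web app you would call FindDuplicates from some diagnostic page rather than from Main): it groups the loaded assemblies by simple name and reports any name that was loaded from more than one path.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical diagnostic helper; not part of ASP.NET or FluorineFX.
public static class LoadedAssemblyReport
{
    // Returns simple assembly names that were loaded from more than one
    // distinct file location, mapped to those locations.
    public static Dictionary<string, List<string>> FindDuplicates(IEnumerable<Assembly> assemblies)
    {
        return assemblies
            // Dynamic and in-memory assemblies have no file location to compare.
            .Where(a => !a.IsDynamic && !string.IsNullOrEmpty(a.Location))
            .GroupBy(a => a.GetName().Name)
            .Select(g => new
            {
                Name = g.Key,
                Locations = g.Select(a => a.Location)
                             .Distinct(StringComparer.OrdinalIgnoreCase)
                             .ToList()
            })
            .Where(x => x.Locations.Count > 1)
            .ToDictionary(x => x.Name, x => x.Locations);
    }

    public static void Main()
    {
        var duplicates = FindDuplicates(AppDomain.CurrentDomain.GetAssemblies());
        foreach (var pair in duplicates)
        {
            Console.WriteLine(pair.Key);
            foreach (string location in pair.Value)
                Console.WriteLine("    " + location);
        }
        Console.WriteLine(duplicates.Count + " assembly name(s) loaded from more than one path.");
    }
}
```

If the same assembly name shows up under both the site’s /bin folder and the Temporary ASP.NET Files folder, that would be consistent with the double-load theory, though it wouldn’t prove which library did the second load.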

Well, we’ve made it this far out on the limb – we might as well take the final plunge.

If all of the above was correct, we’d expect to find an out-of-date copy of the assembly in the Temp folder on the bad server, and the correct version on the other one.  As it turns out, we found that the bad server had TWO copies of the assembly (the most recent plus an older one), while the other server had only the most recent.  We tried deleting the two copies of the assembly from the bad server’s Temp folder (as well as the accompanying assembly.info file in the directory), but the server complained that the files were in use.  We stopped the sites in IIS, tried the delete again (which worked), and restarted the sites.  When I hit the bad server again, we saw a single copy of the assembly get placed in the folder, and the service started working again.  Now that both nodes in the farm were working, I could hit the Flash units with impunity, and they would reliably return data.
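
The “how many copies are in the Temp folder” check can be scripted.  Below is a rough sketch, with the TempFolderScan class and its arguments being my own invention: you hand it a root folder (typically something like %WINDIR%\Microsoft.NET\Framework\<version>\Temporary ASP.NET Files, but check your own server) and a DLL file name, and it lists every copy it finds, newest first.  It assumes the account running it can read every subfolder.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for inspecting a shadow-copy folder.
public static class TempFolderScan
{
    // Returns the full path of every file named fileName under root,
    // ordered from most recently written to oldest.
    public static List<string> FindCopies(string root, string fileName)
    {
        return Directory
            .GetFiles(root, fileName, SearchOption.AllDirectories)
            .OrderByDescending(path => File.GetLastWriteTimeUtc(path))
            .ToList();
    }

    public static int Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Usage: TempFolderScan <root folder> <assembly file name>");
            return 1;
        }

        List<string> copies = FindCopies(args[0], args[1]);
        foreach (string path in copies)
            Console.WriteLine(File.GetLastWriteTimeUtc(path).ToString("u") + "  " + path);

        Console.WriteLine(copies.Count + " copy/copies found.");
        return 0;
    }
}
```

More than one copy is not automatically a problem, but on our bad server it was the one visible difference from the good one, so it is a cheap thing to compare across the nodes in a farm.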

 

In retrospect this kinda makes sense, but only kinda.  At least now we have a procedure for correcting the condition when it occurs, but I still don’t understand how the server got into the state in the first place, which means I don’t understand how to prevent this from happening again with future deployments.

February 9, 2009 - Posted by | Visual Studio/.NET

3 Comments

  1. I believe I’ve fixed this in the source. If you’d like to try it, make the following changes and re-compile.

    --- TypeHelper.cs ---
    ** Starting near line 134:
    static public Type Locate(string typeName)
    {
        if (string.IsNullOrEmpty(typeName))
            return null;

        Type type = Type.GetType(typeName, false, true);
        if (type != null) return type;

        Assembly[] assemblies = GetAssemblies(); // AppDomain.CurrentDomain.GetAssemblies();
        for (int i = 0; i < assemblies.Length; i++)
        {
            Assembly assembly = assemblies[i];
            type = assembly.GetType(typeName, false);
            if (type != null) return type;
        }
        return null;
    }

    --- Messaging/DotNetFactoryInstance.cs ---
    ** Starting near line 84:

    public override Type GetInstanceClass()
    {
        if (_cachedType == null)
            _cachedType = ObjectFactory.Locate(this.Source);

        return _cachedType;
    }

    Comment by Kelly | December 15, 2009

  2. Kelly, thanks for the post. Just so I’m clear, what changed? Was it just the GetAssemblies() line, and adding the GetInstanceClass override?

    Comment by markegilbert | December 15, 2009

  3. The GetInstanceClass() method had a bug in the original .15 code base. It called ObjectFactory.LocateInLac(this.Source) when the Type was not cached. This always used a LoadFrom on the local assembly folder with no regard for Assemblies or Types already loaded into the AppDomain.

    My correction, and what I believe the original author’s intent was, was to make it call ObjectFactory.Locate(this.Source). This first attempts to find the type in already loaded Assemblies in the current AppDomain (i.e. the main ASP.NET web dll) before reverting to LocateInLac() internally if the type is not already in RAM.

    Like I said, I believe this was always the author’s intent and this was a simple mistake when typing the code.

    My change to Locate() in TypeHelper.cs is a very minor optimization but is not really responsible for any fix.

    Also, I’m running with this modification and have not seen any problem but I can’t guarantee any results, only that it seems to work for me.

    Comment by Kelly | December 15, 2009

