Mark Gilbert's Blog

Science and technology, served light and fluffy.

Target-Tracking with the Kinect, Part 3 – Target Tracking Improved, and Speech Recognition

In Part 1 of this series, I went through the prerequisites for getting the Kinect/Foam-Missile Launcher mashup running.  In Part 2, I walked through the core logic for turning the Kinect into a target-tracking system, but I ended it talking about some major performance issues.  In particular, commands to the launcher would block updates to the UI, which meant the video and depth feeds were very jerky. 

In this third and final part of the series, I’ll show you the multi-threading scheme that solved this problem.  I’ll also show you the speech recognition components that allowed the target to say the word "Fire" to actually get a missile to launch. 

What did you say?

We had tried to implement the speech recognition feature by following the "Audio Fundamentals" tutorial.  That code looked like it SHOULD work, but there were a couple of differences between the tutorial app and ours: the tutorial's was a C# console application, while ours was a VB WPF application.  As it turns out, those two differences made ALL the difference.

For the demo, Dan (the host) mentions the need for the MTAThread() attribute on the Main() routine in his console app.  Since our solution up to this point was VB, it looked like we would need this.  I tried adding that to every place that didn’t generate a compile error, but nothing worked – the application kept throwing this exception when it fired up:

Unable to cast COM object of type ‘System.__ComObject’ to interface type ‘Microsoft.Research.Kinect.Audio.IMediaObject’. This operation failed because the QueryInterface call on the COM component for the interface with IID ‘{D8AD0F58-5494-4102-97C5-EC798E59BCF4}’ failed due to the following error: No such interface supported (Exception from HRESULT: 0x80004002 (E_NOINTERFACE)).

Stack Trace:
       at System.StubHelpers.StubHelpers.GetCOMIPFromRCW(Object objSrc, IntPtr pCPCMD, Boolean& pfNeedsRelease)
       at Microsoft.Research.Kinect.Audio.IMediaObject.ProcessOutput(Int32 dwFlags, Int32 cOutputBufferCount, DMO_OUTPUT_DATA_BUFFER[] pOutputBuffers, Int32& pdwStatus)
       at Microsoft.Research.Kinect.Audio.KinectAudioStream.RunCapture(Object notused)
       at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart(Object obj)

I decided to try a different tack.  I wrote a C# console app and copied all of Dan's code into it (removing the Using statements and initializing the variables manually to avoid scoping issues).  That worked right out of the gate.  Since we were very short on time (we were two days from the demo at this point), I decided to port our application to C# and then incorporate the speech recognition pieces.

First, the "setup" logic was wrapped into a method called "ConfigureAudioRecognition" (I pretty much copied this right from the tutorial).  That method was invoked in the Main window’s Loaded event, on its own thread.  In addition to initializing the objects and defining the one-word grammar ("Fire"), this adds an event handler for the recognizer engine’s SpeechRecognized event:
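For reference, here is roughly what that setup method looks like, condensed from the tutorial.  Treat this as a sketch: the _Sre field name and the Kinect recognizer ID string are my assumptions, so verify them against your installed recognizers.

```csharp
private void ConfigureAudioRecognition()
{
    // Kinect speech recognizer ID (assumption - check SpeechRecognitionEngine.InstalledRecognizers())
    const string RecognizerId = "SR_MS_en-US_Kinect_10.0";

    var source = new KinectAudioSource();       // from Microsoft.Research.Kinect.Audio
    source.FeatureMode = true;
    source.AutomaticGainControl = false;        // important: AGC must be off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly;

    RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers()
                            .Where(r => r.Id == RecognizerId).FirstOrDefault();

    this._Sre = new SpeechRecognitionEngine(ri.Id);

    // The entire grammar is a single word
    var gb = new GrammarBuilder();
    gb.Culture = ri.Culture;
    gb.Append(new Choices("Fire"));
    this._Sre.LoadGrammar(new Grammar(gb));

    this._Sre.SpeechRecognized += sre_SpeechRecognized;

    // Feed the Kinect's audio stream into the recognizer and listen continuously
    Stream s = source.Start();
    this._Sre.SetInputToAudioStream(s,
        new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
    this._Sre.RecognizeAsync(RecognizeMode.Multiple);
}
```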

private void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (this._Launcher != null &&
        this._IsAutoTrackingEngaged &&
        e.Result.Confidence > 0.95) { this.FireCannon(); }
}

The command to launch a missile is only given if the Launcher object is defined, the app is in "auto-track" mode, and the recognition engine's confidence level is greater than 95%.  That last check is an amusing one.  Before I included it, I could read a sentence that happened to contain some word with the letter "f", like "if", and the missile would launch.  Inspecting the Confidence property, I found that those false matches only scored in the 20-30% range, while an actual "Fire" scored 96-98%.  The confidence check helps tremendously, but it's still not perfect.  Words like "fine" can fool it.  It's much better than having it fire with every "f", though.

Take a number

Doug, Joshua, and I had discussed some solutions to the UI update problem earlier in the week, and the most promising one looked like using a BackgroundWorker (BW) to send each command to the launcher asynchronously.  That was relatively easy to drop into the solution, but I almost immediately hit another problem.  Commands were being sent to the launcher much more frequently than my single BW could handle, and I started getting runtime exceptions to the effect of "process is busy, go away".  I found an IsBusy property on the worker that I could check to see if it had returned yet, but that meant I would have to wait for it to come back before I could send another command – basically the original blocking issue, one step removed.

I briefly toyed with the idea of spawning a new thread for every command, but because they were all asynchronous there was no way to guarantee they would complete in the order I generated them.  Left-left-fire-right looks a lot different than fire-right-left-left.  What I really needed was a way to stack up the requests and force them to execute one at a time, in order.  What I found was an unbelievably perfect solution from Matt Valerio, in his post titled "A Queued BackgroundWorker Using Generic Delegates".  As the title suggests, he wrote a class called "QueuedBackgroundWorker" that adds BackgroundWorker jobs to a queue, then pops them off and processes them in order.  This was EXACTLY what I needed.  It was also the most mind-blowing use of lambda expressions I've ever seen: you pass entire functions as the elements on the queue, and each one gets executed when its element is popped off.
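If you haven't read Matt's post, the core idea is easy to sketch on its own.  The following is NOT his implementation – just a minimal illustration of "queue the commands and drain them one at a time on a single background thread", with the class name my own invention:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Minimal illustration only (not Matt Valerio's class): commands go into a
// queue, and a single background thread drains them in FIFO order, so the
// UI thread never blocks and the launcher never sees two commands at once.
public class SerialCommandQueue : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();

    public SerialCommandQueue()
    {
        var worker = new Thread(() =>
        {
            // GetConsumingEnumerable blocks until an item arrives, then yields it
            foreach (var command in _queue.GetConsumingEnumerable())
                command();
        });
        worker.IsBackground = true;
        worker.Start();
    }

    public void Enqueue(Action command)
    {
        _queue.Add(command);
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
    }
}
```

With something like this, `queue.Enqueue(() => launcher.MoveLeft(5));` returns immediately, and left-left-fire-right stays left-left-fire-right because there is exactly one consumer.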

I added a small class called "CannonVector" that rolls up a direction (up, down, left, or right) and a number of steps.  Then I created two methods – FireCannon() and MoveCannon() – that wrap my calls to the launcher methods Chris Smith wrote (see Part 2 of this series):

private void FireCannon()
{
    // QueueWorkItem comes from Matt Valerio's QueuedBackgroundWorker; the
    // launcher method names here are assumptions based on the wrapper class
    QueuedBackgroundWorker.QueueWorkItem(
        this._Queue,
        new CannonVector
        {
            DirectionRequested = CannonDirection.Down,
            StepsRequested = 0
        },
        args =>
        {
            this._Launcher.Fire();
            return (CannonVector)args.Argument;
        },
        args => { });
}

private void MoveCannon(CannonDirection NewDirection, int Steps)
{
    QueuedBackgroundWorker.QueueWorkItem(
        this._Queue,
        new CannonVector
        {
            DirectionRequested = NewDirection,
            StepsRequested = Steps
        },
        args =>
        {
            CannonVector MyCannonVector;
            MyCannonVector = (CannonVector)args.Argument;

            // Launcher method names assumed from the wrapper class
            switch (MyCannonVector.DirectionRequested)
            {
                case CannonDirection.Left:
                    this._Launcher.MoveLeft(MyCannonVector.StepsRequested);
                    break;
                case CannonDirection.Right:
                    this._Launcher.MoveRight(MyCannonVector.StepsRequested);
                    break;
                case CannonDirection.Up:
                    this._Launcher.MoveUp(MyCannonVector.StepsRequested);
                    break;
                case CannonDirection.Down:
                    this._Launcher.MoveDown(MyCannonVector.StepsRequested);
                    break;
            }

            return new CannonVector
            {
                DirectionRequested = MyCannonVector.DirectionRequested,
                StepsRequested = MyCannonVector.StepsRequested
            };
        },
        args => { });
}

Cool, huh?

With this in place, everything was smooth again – launcher movement and UI updates, alike.

And there was much rejoicing.

So there you have it.  Full source code for this solution can be found in the "" archive here:  Happy hunting!

September 10, 2011 Posted by | Microsoft Kinect, Visual Studio/.NET | 2 Comments

Target-Tracking with the Kinect, Part 2 – Target Tracking

In Part 1 of this series I laid out the prerequisites.  Now we’ll get into how to turn the Kinect into a tracking system for the cannon.

Manual Targeting

As I mentioned in Part 1, one of the pieces to this puzzle was already written for us – a .NET layer around the launcher.  This layer was provided by Chris Smith in his Being an Evil Genius with F# and .NET post.  He links to the source code at the very end of the post, and the download includes several projects.  We ended up using the RocketLib\RocketLauncher_v0.5.csproj project.

So, now we had a class (exposed through “Me._Launcher”, an object of type RocketLib.RocketLauncher) that we could use to send movement and fire commands to the launcher.  The numbers passed to the “Move” commands are the number of times to move the launcher turret.  The unit of “time” or “step” (as we came to refer to it) seemed to translate into a little less than half a degree of rotation (either left/right or up/down).

Armed with this knowledge (see what I did there?), we were able to whip together a little WPF interface that had five buttons on it – Up, Down, Left, Right, and Fire – that controlled the launcher manually.  That became the “Manual” mode.  The “Auto-track” mode, where the Kinect would control the launcher, would come next.


Now we started going through the Kinect SDK Quickstart video tutorials, produced by Microsoft and hosted by Dan Fernandez.  To begin, we wanted to get to the raw position data (X, Y, and Z) from the camera.  We ended up compressing the first four tutorials (“Installing and Using the Kinect Sensor”, “Setting up the Development Environment”, “Skeletal Tracking”, and “Camera Fundamentals”) into a Friday to get ramped up as quickly as possible.

In “Skeletal Tracking Fundamentals”, Dan explains that the Kinect tracks skeletons, not entire bodies.  Each skeleton has 20 different joints, such as the palms, elbows, head, and shoulders.  We decided to select the “ShoulderCenter” joint as our target.

Next, we added labels for the X, Y, and Z positions of the ShoulderCenter joint to the app, and then started moving around the room in front of the Kinect, seeing how the values changed.  The values are given in meters, with X and Y being 0 when you’re directly in front of the depth camera.  These values are updated in the SkeletonFrameReady event.

Now, the fun could really begin.  We decided to focus on left/right movement of our target, so the Y value is not used in the app at all.

We also decided that since the launcher had a real physical limitation as to how fast it could move, we couldn’t give it too many commands at a time.  The Kinect sends data 30 times a second, so we decided to sample the data twice a second (every 15 frames).
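The sampling gate itself is tiny – something along these lines (a sketch; the field names are my assumptions):

```csharp
// The Kinect delivers roughly 30 skeleton frames per second; acting only
// on every 15th frame sends the launcher commands about twice a second.
private const int SampleEveryNFrames = 15;
private int _FrameCount = 0;

private bool ShouldProcessThisFrame()
{
    this._FrameCount += 1;
    return (this._FrameCount % SampleEveryNFrames) == 0;
}
```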

Our first attempt at this was very complicated and clunky, and didn’t work well unless you were at a magical distance from the Kinect (basically we threw enough magic numbers into the equation until it worked for that one distance).  We really ran into problems when we tried to extend that to work for any depth.

It was Doug who hit upon the idea of calculating the angle to turn the launcher as the arc tangent of X/Z, as opposed to what we had been doing (working directly in steps).  That did two things for us: first, the angle approach correctly took the depth information (the Z measurement) into account, and second, it meant we only had to store the last known position of the launcher (measured as a number of steps, either positive or negative, with 0 being straight ahead).  If we knew the last position, and we knew where we had to move to, we could swivel the launcher accordingly.
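To put some numbers on that (a worked example of my own, not a measurement from our tests): suppose the target's ShoulderCenter sits half a meter to the left of center and two meters away.

```
X = 0.5 m, Z = 2.0 m
angle = atan2(X, Z) = atan2(0.5, 2.0) ≈ 0.245 radians ≈ 14.0 degrees
steps = angle * 100 * 1.6 ≈ 0.245 * 160 ≈ 39 steps
```

Fourteen degrees spread over thirty-nine steps works out to roughly 0.36 degrees per step, which squares nicely with the "a little less than half a degree" per step we had estimated during manual testing.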

Private Sub nui_SkeletonFrameReady(ByVal sender As Object, ByVal e As SkeletonFrameReadyEventArgs)
    Dim allSkeletons As SkeletonFrame = e.SkeletonFrame
    Dim NewCannonX, DeltaX As Integer

    Me._FrameCount += 1

    'Get the first tracked skeleton
    Dim skeleton As SkeletonData = ( _
        From s In allSkeletons.Skeletons _
        Where s.TrackingState = SkeletonTrackingState.Tracked _
        Select s).FirstOrDefault()

    Dim ShoulderCenter = skeleton.Joints(JointID.ShoulderCenter)

    Dim scaledJoint = ShoulderCenter.ScaleTo(320, 240)
    Me.UpdateCrossHairs(scaledJoint.Position.X, scaledJoint.Position.Y, scaledJoint.Position.Z)

    Me.HorizontalPosition.Content = ShoulderCenter.Position.X
    Me.VerticalPosition.Content = ShoulderCenter.Position.Y
    Me.DepthPosition.Content = ShoulderCenter.Position.Z

    Dim NormalizedX As Integer = CType(ShoulderCenter.Position.X * 10, Integer)
    Dim AbsoluteX As Integer = Math.Abs(NormalizedX)

    If (Me._IsAutoTrackingEngaged) Then
        If (ShoulderCenter.Position.Z > 0) Then
            'The multipliers of 100 * 1.6 are needed to convert the degrees to move into steps for the cannon
            NewCannonX = Math.Atan2(ShoulderCenter.Position.X, ShoulderCenter.Position.Z) * 100 * 1.6
            DeltaX = Math.Abs(NewCannonX - Me._LastCannonX)
            If (NewCannonX < Me._LastCannonX) Then
                'Swivel the launcher by DeltaX steps here
            End If
            Me._LastCannonX = NewCannonX
            Me._NetCannonX = NewCannonX
        End If
    End If
End Sub

With this logic in place, the tracking became fairly good, regardless of the distance between the target and the Kinect.

Assumptions Uncovered

Since there really wasn’t any feedback the launcher could give us about its current position, this logic makes a couple of major assumptions about the world.  First, the Kinect and the launcher have to be pointed straight ahead to begin with, and second, the Kinect needs to remain pointing straight ahead.

We uncovered the first assumption when the launcher stopped responding to commands to move right.  We could move it to the left, but not to the right.  We fired up the application that comes with it, and discovered a “Reset” button that caused the launcher to swivel all the way to one side, then to a “center” point.  This center point was actually denoted by a raised arrow on the launcher’s base – something I had not seen up to this point.  After we reset it, it would move left and right just fine.  As it turns out, the launcher can’t move 360 degrees indefinitely – it has definite bounds.  The reset function moved it back to center to maximize the left/right motion.

After we discovered that, I would jump out to that app to reset the launcher, and then I had to shut it down again before I could use ours (two apps couldn’t send commands to the launcher – in fact we got runtime errors if we tried to run both apps at the same time).  After a while that got old, so we included a reset of our own.  Since we knew the launcher’s current position, we’d just move in the opposite direction that amount.  We added a Reset button to our own app, and also called the same method when the app was put back to Manual tracking and when it was shut down.
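In sketch form, our reset amounted to this (the routine isn't shown in the post, so the field and method names here are assumptions based on the rest of the code):

```csharp
// Hypothetical reset: undo the net movement we've tracked in _NetCannonX
// (assuming positive means net right of center, negative means net left).
private void ResetCannon()
{
    if (this._NetCannonX > 0)
    {
        this.MoveCannon(CannonDirection.Left, this._NetCannonX);
    }
    else if (this._NetCannonX < 0)
    {
        this.MoveCannon(CannonDirection.Right, Math.Abs(this._NetCannonX));
    }

    this._NetCannonX = 0;
    this._LastCannonX = 0;
}
```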

We uncovered the second assumption in a rather amusing way.  During one of our tests we noticed the cannon was constantly aiming off to Doug’s (our target at the time) right.  He could move left or right, but the launcher was always off.  He happened to look up and noticed that the Kinect had been bumped, so it wasn’t pointing directly ahead any more.  As a result, the camera was looking off to one side and all of its commands were off.  After that, we were much more careful about checking the Kinect’s alignment, and not bumping it.

Some fun to be had

Early on we had thought up a “fun” piece of icing on this electronic cake.  What if we took the video image from the camera, and superimposed crosshairs on it?  We could literally float an image with a transparent background over the image control on the form.  If we could get the scaling right, it could track on top of the user’s ShoulderCenter joint.

And we did.  This is turned on using the “Just for Mike” button at the bottom of the app.  During the agency meeting demo, I had walked through the basic tracking, using Mike (our President) as the target, and explained about the video and depth images.  Then – very dramatically – I “noticed” the screen and turned to Doug (who was running the computer) – “uh, Doug?  I think we’re missing something.”  At which point he hit the button to add the cross hairs to the video image. “There we go!  That’s better.”  Mike got a good laugh out of it, as did most of the rest of the audience.  Fun?  Check!

Beyond the fun, though, I thought it was cool that we could merge the video and depth information to such great effect.  Between having the launcher track you, and seeing the cross hairs on your chest – it’s downright eerie.

Performance Issues

So, by this point, we had launcher tracking, both video and depth images refreshing 30 times a second, and crosshairs.

And everything was running on the same thread.

Yeah.  We now had some performance issues to solve.

When the launcher moved at all, and especially when it fired (which took 2-3 seconds to power up and release), the images would completely freeze, waiting for the launcher to complete.  The easy solution?  Duh!  Just put the launcher and the image updates on their own threads.  Um, yeah.  That turned out to be easier said than done.  We’ll cover the multi-threading solution, as well as the speech recognition features in Part 3.  Those two topics turn out to be intertwined.

Update: Full source code for this solution can be found in the “” archive here:

September 10, 2011 Posted by | Microsoft Kinect, Visual Studio/.NET | 2 Comments