OK, here is a brief description.
As you may know, Kinect uses an infrared emitter and sensor to get a depth picture of what it sees. You can find more details on Kinect's depth sensing here:
http://physics.stackexchange.com/questions/3852/how-does-the-kinect-device-work

Here is what happens in iPi Mocap Studio. After calibration, the relative position of the two Kinects is known. Thus, given the actor model's pose, the system can compute its depth projection in the view of each Kinect. When tracking, for each frame, the system tries many variations of the actor model's pose to find the one whose depth projections in both Kinects best fit the captured depth data.
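
To make the tracking idea more concrete, here is a toy sketch in Python. It is not iPi Mocap Studio's actual code, just an illustration under strong assumptions: the "actor model" is four points, a pose is only a translation (tx, tz), the two cameras are orthographic with known placement, and the search is a plain brute-force scan over a grid of candidate poses. All names (MODEL, view_front, view_side, fit_pose and so on) are made up for this example.

```python
import itertools

# Toy "actor model": a few 3D points (x, y, z) in the model's own frame.
MODEL = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]

# Two hypothetical orthographic cameras whose placement is known
# (this stands in for the result of calibration).
def view_front(p):
    x, y, z = p
    return (round(x * 2), round(y * 2)), z        # looks along +z from z = 0

def view_side(p):
    x, y, z = p
    return (round(z * 2), round(y * 2)), 5.0 - x  # looks along -x from x = 5

CAMERAS = [view_front, view_side]

def pose_points(pose):
    """Place the model in the world; here a pose is just a shift (tx, tz)."""
    tx, tz = pose
    return [(x + tx, y, z + tz) for x, y, z in MODEL]

def depth_projection(points, view):
    """Render a sparse depth image: pixel -> depth of the nearest point."""
    depth = {}
    for p in points:
        pixel, d = view(p)
        if pixel not in depth or d < depth[pixel]:
            depth[pixel] = d
    return depth

def mismatch(predicted, captured, missing_penalty=1.0):
    """Squared depth error, plus a penalty for pixels seen in only one image."""
    cost = 0.0
    for pixel in set(predicted) | set(captured):
        if pixel in predicted and pixel in captured:
            cost += (predicted[pixel] - captured[pixel]) ** 2
        else:
            cost += missing_penalty
    return cost

def fit_pose(captured_views, candidates):
    """Try every candidate pose; keep the one that best fits both cameras."""
    def total_cost(pose):
        pts = pose_points(pose)
        return sum(mismatch(depth_projection(pts, view), cap)
                   for view, cap in zip(CAMERAS, captured_views))
    return min(candidates, key=total_cost)

# Simulate one captured frame from a known pose, then recover that pose.
true_pose = (1.0, 3.0)
captured = [depth_projection(pose_points(true_pose), v) for v in CAMERAS]
grid = [i * 0.5 for i in range(9)]                # 0.0, 0.5, ..., 4.0
candidates = list(itertools.product(grid, grid))
best = fit_pose(captured, candidates)
print(best)
```

Note how each camera alone is ambiguous in this toy setup (the front view barely constrains tz without depth, the side view barely constrains tx), while the combined cost over both views pins the pose down. A real tracker would have a full skeleton with many joint angles and would presumably search around the previous frame's pose rather than scan a fixed grid, but that part is my guess.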