Balance as Infrastructure: Environmental Contact Awareness for Loco-Manipulation Recovery in Humanoid Robots

In preparation, 2027

Status: In preparation (thesis chapter 2, IHMC / University of West Florida)

Motivation

Paper 1 treats recovery as the terminal goal: Fall → Stand up → Done. Paper 2 reframes balance as persistent infrastructure: Walk → Pushed → Recover → Keep walking. The key shift is that balance is always on, not activated only during emergencies.

Core Technical Contribution

Nobody has closed the loop from surface detection → stability region extension → recovery contact selection in a single learned policy. This work proposes to do exactly that by extending the asymmetric actor-critic from Paper 1 with an environmental contact opportunity map in the critic:

  • Nearby surface positions and normals (walls, tables, pillars)
  • Reachability flags for each candidate contact surface
  • Wrench-feasibility rewards that generalize beyond flat-ground capture point (handles slopes, vertical contacts)

At deployment, the privileged contact map is replaced with lightweight depth sensing or height maps — same actor architecture, real sensors.

The Selling Experiment

Robot falls toward a wall → extends arm to brace against it → recovers, because the critic learned that the wall extends the horizontal capture point margin. A baseline policy without the contact map ignores the wall and fails to recover.

Loco-Manipulation

The most practically important capability: balance-aware recovery while carrying an object. A grasp-maintenance reward causes the policy to stiffen its grip as balance is challenged, maintaining payload through perturbation and recovery. No mode switching, no object-drop.