How do humanoid robots mimic human fine motor skills, such as grasping an egg?

Lukas Neuschäfer-Hölzenbecher
PhD student in human-robot interaction

Haha, that's an excellent question! Grasping an egg can be considered a major "graduation exam" for humanoid robots. It's far more complex than simply reaching out and grabbing it. Behind it lies a whole set of very sophisticated "human-mimicking" technologies.

Let me break it down for you. It's actually quite similar to how humans grasp objects, and it comes down to three steps: See, Think, Do.

Step 1: "See" – I don't just see, I understand what I'm seeing.

First, the robot needs to "see" the egg. But its "seeing" isn't as simple as us taking a photo with a phone.

  • 3D Vision: Robots usually carry more than one camera, and in particular "depth cameras." Like human eyes, these can judge the distance, size, and shape of an object, which lets the robot build a 3D model of the egg in its "brain." It knows the egg is roughly an ellipsoid, how long and wide it is, and where it sits on the table.

So, in its first step, it knows: "Okay, there's an oval object about 5 cm tall, 15 cm in front of me."
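
To make the "seeing" part a bit more concrete, here is a minimal sketch of how depth pixels can be turned into 3D points with the standard pinhole camera model, and how a rough size and position can be read off those points. The `Intrinsics` class, the function names, and all the numbers are my own assumptions for illustration; a real robot would take the camera parameters from calibration and would first have to work out which pixels belong to the egg.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Hypothetical camera parameters; real values come from calibration."""
    fx: float  # focal length in pixels, x direction
    fy: float  # focal length in pixels, y direction
    cx: float  # principal point, x (pixels)
    cy: float  # principal point, y (pixels)


def deproject(u, v, depth_m, k):
    """Pinhole back-projection: pixel (u, v) plus depth in metres -> camera-frame (x, y, z)."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


def bounding_extent(points):
    """Axis-aligned size and centre of a list of 3D points (a crude stand-in for model fitting)."""
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    size = tuple(maxs[i] - mins[i] for i in range(3))
    centre = tuple((maxs[i] + mins[i]) / 2 for i in range(3))
    return size, centre


if __name__ == "__main__":
    k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)  # made-up values
    # Made-up pixels assumed to lie on the egg, each with a depth reading in metres.
    egg_pixels = [(300, 200, 0.15), (340, 200, 0.15), (320, 140, 0.16), (320, 340, 0.16)]
    cloud = [deproject(u, v, d, k) for u, v, d in egg_pixels]
    size, centre = bounding_extent(cloud)
    print("approx. size (m):", size)
    print("approx. centre (m):", centre)
```

A bounding box is about the simplest possible shape estimate; fitting an actual ellipsoid to the points would be the next step up.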

Step 2: "Think" – Formulating a perfect "egg-grasping" plan.

After seeing the egg, the robot's "brain" (its control computer and AI algorithms) starts calculating frantically. This is the most crucial step.

  • Finding Solutions in the "Knowledge Base": A mature robot has a vast "grasping posture library" in its "brain." Just as we humans know to cup a hand around a glass and to pinch a book between our fingers, robots are "taught" too: they've seen millions of images of different objects and grasping methods. When the robot identifies an "egg," it immediately retrieves the most suitable egg-grasping solution from its knowledge base, for example using three or four fingers to gently encircle the "waist" of the egg rather than clenching it the way it would a rock.

  • Simulated Practice: Sometimes, for unfamiliar objects, it can even perform thousands of "simulated grasps" in a virtual world to find the optimal angle and force before executing it in reality. This is like a surgeon practicing repeatedly on a model before an operation.
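
The "simulated practice" idea can be sketched as a simple random search: try many candidate grasps in a simulator, score each one, and keep the best. Everything below is a toy under my own assumptions. `simulate_grasp` is a hand-written scoring rule standing in for a real physics simulator, and the force thresholds are illustrative rather than measured.

```python
import random

CRUSH_LIMIT_N = 1.2  # illustrative upper limit, not a measured value
MIN_HOLD_N = 0.6     # hypothetical minimum force needed to keep the egg from slipping


def simulate_grasp(angle_deg, force_n):
    """Toy stand-in for a physics simulator. Returns a score in [0, 1]; higher is better."""
    if force_n > CRUSH_LIMIT_N:
        return 0.0  # the simulated egg cracks
    if force_n < MIN_HOLD_N:
        return 0.0  # the simulated egg is dropped
    # Prefer approaching at 90 degrees to the egg's long axis (around its "waist").
    angle_score = 1.0 - abs(angle_deg - 90.0) / 90.0
    # Prefer a force in the middle of the safe band, away from both failure modes.
    mid = (CRUSH_LIMIT_N + MIN_HOLD_N) / 2
    half_band = (CRUSH_LIMIT_N - MIN_HOLD_N) / 2
    force_score = 1.0 - abs(force_n - mid) / half_band
    return angle_score * force_score


def search_grasp(trials=2000, seed=0):
    """Random search over (angle, force); returns (best_score, angle_deg, force_n)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    best = (0.0, None, None)
    for _ in range(trials):
        angle = rng.uniform(0.0, 180.0)
        force = rng.uniform(0.0, 2.0)
        score = simulate_grasp(angle, force)
        if score > best[0]:
            best = (score, angle, force)
    return best


if __name__ == "__main__":
    score, angle, force = search_grasp()
    print(f"best simulated grasp: angle={angle:.1f} deg, force={force:.2f} N, score={score:.3f}")
```

Real systems use far smarter search and learning methods than blind sampling, but the basic pattern of trying things in simulation first and only then acting is the same.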

So, in this step, it has thought it through: "Alright, I'll use a 'three-finger embrace' grip. My fingers need to be at this angle, and approaching from this direction will be the most stable."
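
In its simplest possible form, the "grasping posture library" is a lookup table from a recognized object to a grasp plan. The sketch below is only that: the `GraspPlan` fields, the entries for the glass and the book, and the cautious fallback for unknown objects are all assumptions I made up for illustration. Real libraries are learned from data and are much richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraspPlan:
    name: str
    fingers: int
    target_force_n: float  # force to aim for
    max_force_n: float     # never exceed this


# Hypothetical library; the egg forces reuse the illustrative numbers from this answer.
GRASP_LIBRARY = {
    "egg": GraspPlan("three-finger embrace", 3, 1.0, 1.2),
    "glass": GraspPlan("cupped wrap", 5, 3.0, 6.0),
    "book": GraspPlan("pinch", 2, 4.0, 10.0),
}

# Assumed fallback for objects the robot does not recognize: grip lightly.
DEFAULT_PLAN = GraspPlan("cautious pinch", 2, 0.5, 1.0)


def plan_grasp(label):
    """Return the stored plan for a recognized object label, or the cautious default."""
    return GRASP_LIBRARY.get(label.lower(), DEFAULT_PLAN)


if __name__ == "__main__":
    for label in ("egg", "Glass", "pineapple"):
        plan = plan_grasp(label)
        print(f"{label}: {plan.name}, {plan.fingers} fingers, target {plan.target_force_n} N")
```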

Step 3: "Do" – Gentle execution, with constant sensing.

This is the final step, and it's where the "hand's" own skill is truly tested.

  • A "Dexterous Hand": Robot hands are no longer cold, hard claws. High-end humanoid robot hands have a dozen or more independent "joints" (technically called "degrees of freedom"), driven by very precise miniature motors and transmission mechanisms (some even mimic human tendons, using cables to pull the fingers), which lets the fingers perform very flexible movements.

  • The Most Important "Tactile Sense": This is the key to grasping an egg! The robot's fingertips are covered with various "tactile sensors." Put simply, they act as its "skin."

    • Pressure Sensors: The moment a finger touches the egg, the sensors immediately tell the "brain": "Contact! Contact!" Then, as the fingers slowly close, the sensors continuously report the pressure value: "Now it's 0.5 Newtons... 0.8 Newtons... 1.0 Newtons..." The brain has already set a maximum pressure for grasping an egg, say 1.2 Newtons (the numbers here are only illustrative), above which the egg would be crushed. So it stops squeezing before it reaches that limit and holds the force just right.
    • Slip Sensors: If the egg starts to slip slightly under gravity during the lift, the fingertip sensors can detect the tiny changes in friction and vibration that come with it. The hand then immediately increases its force a little to keep the egg from dropping.
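
The two sensor behaviors above boil down to a small feedback loop: close a little, read the pressure, stop at the target, and nudge the force up if slip is detected. Here is a minimal sketch under loud assumptions: `SimulatedFinger` is a toy spring model I invented so the example can run without hardware, and the step sizes and forces are illustrative only.

```python
class SimulatedFinger:
    """Toy finger: pressure rises linearly once the finger has closed past the contact point."""

    def __init__(self, contact_at_mm=10.0, stiffness_n_per_mm=1.0):
        self.closure_mm = 0.0
        self.contact_at_mm = contact_at_mm
        self.stiffness_n_per_mm = stiffness_n_per_mm

    def pressure(self):
        """Simulated fingertip pressure reading in Newtons."""
        return max(0.0, (self.closure_mm - self.contact_at_mm) * self.stiffness_n_per_mm)

    def move(self, delta_mm):
        self.closure_mm += delta_mm


def grasp(finger, target_n=1.0, max_n=1.2, step_mm=0.05, max_steps=10_000):
    """Close in small steps until the target pressure is reached; abort above the limit."""
    for _ in range(max_steps):
        p = finger.pressure()
        if p > max_n:
            raise RuntimeError("pressure limit exceeded, release the object")
        if p >= target_n:
            return p  # hold here
        # Move faster in free space, then creep once contact is sensed.
        finger.move(step_mm if p > 0.0 else 4 * step_mm)
    raise RuntimeError("target pressure never reached")


def react_to_slip(current_force_n, slip_detected, max_n=1.2, boost_n=0.05):
    """Raise the commanded force slightly when slip is sensed, but never past the limit."""
    if not slip_detected:
        return current_force_n
    return min(current_force_n + boost_n, max_n)


if __name__ == "__main__":
    held = grasp(SimulatedFinger())
    print(f"holding at {held:.2f} N")
    print(f"after slip: {react_to_slip(held, slip_detected=True):.2f} N")
```

The small step size once contact is made is the important design choice here: it keeps any overshoot past the target well below the crush limit.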

Summarizing the entire egg-grasping process:

Put together, the whole thing is like a perfectly coordinated performance:

  1. Eyes (depth camera) say: "Report to brain, an egg has been found, 3D coordinates and dimensions sent!"
  2. Brain (AI algorithm) says: "Received! This is a fragile item, activate 'gentle egg-grasping' protocol. Hand, prepare, target pressure 1.0 Newton."
  3. Hand (robotic hand) says: "Understood! Moving to optimal position, beginning to close fingers."
  4. Skin (tactile sensors) says: "Contact made! Pressure 0.5 N... 0.7 N... 0.9 N... 1.0 N! Pressure stable!"
  5. Brain says: "Excellent, maintain this force, lift it up."

As you can see, this is not a simple "grasping" action at all, but a complete closed loop of vision, planning, and tactile feedback. The robot acts, senses, and adjusts all at the same time, which is the true secret behind its ability to perform such delicate human-like movements. In essence, it's a tight integration of hardware (hands and sensors) and software (the AI brain).
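
If you prefer to read the whole dialogue as code, here is one way to sketch the loop. It just replays a list of tactile readings and logs which phase the robot is in at each tick. The `Phase` names, the `run_grasp` function, and the abort behavior are my own assumptions, and the pressure values are the illustrative ones from the dialogue above.

```python
from enum import Enum


class Phase(Enum):
    SEE = "see"
    THINK = "think"
    CLOSE = "close"
    LIFT = "lift"
    ABORT = "abort"


def run_grasp(label, pressure_readings_n, target_n=1.0, limit_n=1.2):
    """Replay one grasp attempt from a sequence of tactile readings; return a phase log."""
    log = [(Phase.SEE, f"found {label}")]
    log.append((Phase.THINK, f"target {target_n} N, limit {limit_n} N"))
    for p in pressure_readings_n:
        if p > limit_n:
            log.append((Phase.ABORT, f"{p} N exceeds limit, releasing"))
            return log
        if p >= target_n:
            log.append((Phase.LIFT, f"{p} N stable, lifting"))
            return log
        log.append((Phase.CLOSE, f"{p} N, closing further"))
    log.append((Phase.ABORT, "target pressure never reached"))
    return log


if __name__ == "__main__":
    for phase, message in run_grasp("egg", [0.5, 0.7, 0.9, 1.0]):
        print(f"{phase.value:>5}: {message}")
```

On a real robot these phases overlap and run many times per second, but the order of responsibilities is the same.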