
Reinforcement Learning for Robotic Manipulation with MuJoCo

Applied reinforcement learning techniques to train a Franka Emika Panda robotic arm to autonomously grasp and lift a cube using the Proximal Policy Optimization (PPO) algorithm in a MuJoCo simulation environment.

Collaboration with: Miro Rava, Luca Lucchina, Endrit Nazifi
[Figure: Franka Emika Panda robotic arm grasping a cube in MuJoCo simulation]

Project Documentation

Download the complete project report with technical details, reward function design, PPO algorithm implementation, and training results.

View PDF Report

Overview

This project trained a Franka Emika Panda robotic arm to autonomously grasp and lift a cube using the Proximal Policy Optimization (PPO) algorithm in a MuJoCo simulation environment. The core of the work was designing effective reward functions through iterative refinement, addressing unintended behaviors such as the robot balancing the cube on its vertices instead of properly lifting it. We implemented reward shaping with intermediate rewards to guide step-by-step progress, optimized hyperparameters for training stability, and developed additional observation metrics, including cube height and grasp count, for improved debugging and observability.

Approaches

Reward Function Design

Designed and iteratively refined reward functions using reward shaping: intermediate rewards for reducing the distance between the gripper and the cube and for successful grasping and lifting, plus penalties for dropping the cube or twisting the arm. This guided progress step by step rather than relying solely on a sparse task-completion reward.
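A shaped reward of this kind can be sketched as follows. This is a minimal illustration, not the project's actual reward function: the function name, term weights, and the `lift_target` threshold are assumptions for the sketch.

```python
import numpy as np

def shaped_reward(gripper_pos, cube_pos, cube_height, grasped, dropped,
                  lift_target=0.15):
    """Illustrative shaped reward; weights and thresholds are assumptions,
    not the tuned values from the project report."""
    reward = 0.0
    # Intermediate reward: closing the gripper-cube distance
    dist = np.linalg.norm(np.asarray(gripper_pos) - np.asarray(cube_pos))
    reward += 1.0 - np.tanh(5.0 * dist)
    if grasped:
        # Bonus for a successful grasp...
        reward += 1.0
        # ...and for lifting the cube toward the target height
        reward += 2.0 * min(cube_height / lift_target, 1.0)
    if dropped:
        # Penalty for dropping the cube
        reward -= 2.0
    return reward
```

The dense distance term gives the agent a gradient toward the cube long before any grasp succeeds, which is what makes shaped rewards learnable where sparse ones stall.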

PPO Algorithm Implementation

Implemented the Proximal Policy Optimization (PPO) algorithm, a widely used reinforcement learning method for continuous control. PPO stabilizes policy learning by clipping the probability ratio in its surrogate objective and running multiple epochs of policy updates per batch of rollouts.
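The clipped objective mentioned above is the heart of PPO. A minimal NumPy sketch of the loss (the project's implementation uses PyTorch; this simplified version omits the value and entropy terms):

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (minimal sketch).
    Returns the loss to minimize, i.e. the negated clipped objective."""
    ratio = np.exp(new_logp - old_logp)              # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: element-wise minimum, averaged over the batch
    return -np.mean(np.minimum(unclipped, clipped))
```

Clipping the ratio to `[1 - eps, 1 + eps]` removes the incentive to move the new policy far from the old one in a single update, which is what allows several epochs of updates on the same batch without destabilizing training.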

MuJoCo Simulation Environment

Leveraged the MuJoCo physics engine for high-fidelity modeling of rigid-body dynamics in continuous control scenarios. The environment modeled realistic physical properties, including friction, gravity, and collision detection, for accurate simulation of the Franka Emika Panda robotic arm.

Enhanced Observation and Metrics

Developed additional observation metrics, including cube height and grasp count, to enhance observability and enable more efficient debugging. These metrics helped identify and address unintended behaviors, such as vertex balancing, during training.
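Metrics like these can be accumulated per episode from the simulator state. The function below is a hypothetical sketch: the metric names, the contact-based grasp definition, and `table_height` are assumptions, not the project's exact definitions.

```python
def update_metrics(metrics, cube_pos, left_contact, right_contact,
                   table_height=0.0):
    """Track cube height and grasp count for logging (illustrative sketch)."""
    cube_height = cube_pos[2] - table_height
    # Count a grasp when both fingers contact the cube, once per grasp event
    grasped = left_contact and right_contact
    if grasped and not metrics.get("was_grasped", False):
        metrics["grasp_count"] = metrics.get("grasp_count", 0) + 1
    metrics["was_grasped"] = grasped
    # Peak height reached this episode, useful for spotting vertex balancing
    metrics["max_cube_height"] = max(metrics.get("max_cube_height", 0.0),
                                     cube_height)
    return metrics
```

Logging the peak cube height alongside the grasp count makes the vertex-balancing failure mode visible: the cube gains height while the grasp count stays at zero.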

Results

  • Successfully trained robotic arm to autonomously grasp and lift a cube
  • Implemented effective reward shaping to guide step-by-step task completion
  • Resolved unintended behaviors through iterative reward function refinement
  • Developed enhanced observation metrics for improved debugging and monitoring
  • Achieved stable training performance through hyperparameter optimization

Technical Details

  • Used PyTorch for deep reinforcement learning implementations
  • Implemented PPO algorithm for continuous control tasks
  • Leveraged MuJoCo physics engine for realistic simulation
  • Developed modular codebase structure with separate config, controller, environment, and RL components
  • Created custom reward functions with intermediate rewards and penalties
  • Implemented data logging and observation metrics for training analysis
  • Optimized hyperparameters including learning rate, discount factor, and batch size
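A hyperparameter configuration of the kind described above might look like the following. The values are illustrative assumptions for the sketch, not the tuned values from the report:

```python
# Illustrative PPO hyperparameters (all values are assumptions, not the
# project's tuned configuration)
config = {
    "learning_rate": 3e-4,       # Adam step size
    "gamma": 0.99,               # discount factor
    "clip_eps": 0.2,             # PPO clipping range
    "batch_size": 64,            # minibatch size per gradient step
    "epochs_per_update": 10,     # policy epochs per rollout batch
    "rollout_steps": 2048,       # environment steps per update
}
```

Keeping these in a standalone config module matches the modular codebase layout listed above and makes sweeps over learning rate, discount factor, and batch size straightforward.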

Technologies Used

Python · PyTorch · MuJoCo · PPO · Reinforcement Learning · Robotic Control