PushWorld is a novel grid-world environment designed to test planning and reasoning with physical tools and movable obstacles. While recent advances in artificial intelligence have achieved human-level performance in environments like StarCraft and Go, many physical reasoning tasks remain challenging for computers. The PushWorld benchmark is a collection of puzzles that emphasize this challenge. PushWorld is available as an OpenAI Gym environment and in PDDL format on GitHub. The environment is suitable for research in classical planning, reinforcement learning, combined task and motion planning, and cognitive science.
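As a rough illustration of the Gym-style interface, the sketch below runs a random policy on a single puzzle. The import path, environment ID, puzzle filename, and keyword argument are hypothetical placeholders, not the package's confirmed API; see the GitHub repository for the actual usage.

```python
# Minimal sketch of Gym-style interaction with a PushWorld puzzle.
# The `pushworld` import, the "PushWorld-v0" ID, and the `puzzle_path`
# argument are assumptions for illustration only.
import gym
import pushworld  # hypothetical import that registers the environment

env = gym.make("PushWorld-v0", puzzle_path="puzzles/level1/example.pwp")

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # random policy: up/down/left/right
    obs, reward, done, info = env.step(action)  # classic Gym 4-tuple API
env.close()
```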
Play PushWorld
Here you can play all puzzles in the PushWorld benchmark. All puzzles are solvable.
PushWorld vs. Related Environments
Solving PushWorld puzzles requires a combination of skills, such as planning collision-free paths, moving obstacles out of the way, and using objects as tools to push other objects. Several existing environments have dynamics similar to PushWorld, including Sokoban, sliding block puzzles, and grid-based path planning. However, none of these environments requires all of these skills at once.
Evaluation: Classical Planners
We compared a set of classical planning algorithms on the puzzles above. The plot below shows how many puzzles each planner solves within a given time per puzzle. Novelty+RGD solves the most puzzles (69.1%) within 30 minutes per puzzle, followed by FDSS (61.4%). Notably, Novelty+RGD solves as many puzzles in 45 seconds as FDSS solves in 30 minutes, a 40x speed improvement.

Classical Planning Results: puzzles solved vs. time per puzzle.
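For reference, a coverage curve like the one in the plot can be computed directly from per-puzzle solve times. The sketch below is not the benchmark's evaluation code; the timing data is a made-up placeholder and only the computation is illustrated.

```python
# Sketch: fraction of puzzles solved within each time budget, given
# per-puzzle solve times (None = unsolved within the 30-minute limit).
import numpy as np

def coverage(solve_times, budgets):
    """Return the fraction of puzzles solved within each budget (seconds)."""
    times = np.array([t if t is not None else np.inf for t in solve_times])
    return [(times <= b).mean() for b in budgets]

budgets = np.logspace(0, np.log10(1800), 50)  # 1 s .. 30 min, log-spaced
example_times = [12.0, None, 45.3, 1500.0]    # placeholder results
print(coverage(example_times, budgets))
```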
Evaluation: Model-Free Deep Reinforcement Learning
We selected two deep reinforcement learning (RL) algorithms to evaluate on PushWorld: Deep Q-Network (DQN), an off-policy, value-based algorithm, and Proximal Policy Optimization (PPO), an on-policy, policy-gradient algorithm. We chose these algorithms because they are widely used, easy to implement, and have shown competitive performance on diverse and challenging tasks like Atari and continuous robotic control.

We trained both PPO and DQN on all Level 1 puzzles and measured the percentage of Level 1 puzzles they could solve. DQN converged to solving less than 1% of the puzzles, and PPO converged to solving 6% of the puzzles. We believe this low performance is in part due to the low probability of solving most Level 1 puzzles with the initial policy, resulting in sparse positive rewards.

To address the sparse reward problem, we programmatically generated a collection of Level 0 puzzle sets that are no larger than 10x10 cells. The PushWorld paper provides details of these puzzle sets, and the table below shows the percentage of training and testing puzzles solved by PPO and DQN. Overall, PPO outperforms DQN but shows signs of overfitting.
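The "percentage of puzzles solved" metric can be estimated by rolling out the trained policy on each puzzle and counting successes. The sketch below shows one way to do this under assumed interfaces: `make_puzzle_env` and `policy.act` are hypothetical stand-ins for however the environments and the trained PPO/DQN agent are actually constructed.

```python
# Sketch: estimate the percentage of puzzles a trained policy solves.
# `make_puzzle_env` and `policy.act` are hypothetical placeholders.
def solve_rate(puzzle_paths, policy, make_puzzle_env, max_steps=200):
    solved = 0
    for path in puzzle_paths:
        env = make_puzzle_env(path)
        obs = env.reset()
        for _ in range(max_steps):
            obs, reward, done, info = env.step(policy.act(obs))
            if done:
                # Rewards are sparse: positive only when the puzzle is solved.
                solved += reward > 0
                break
        env.close()
    return 100.0 * solved / len(puzzle_paths)
```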
Next Steps
PushWorld presents a challenge for both classical planners and model-free RL algorithms. If you played the benchmark puzzles above, you likely solved well over 95% of them in far less than 30 minutes per puzzle, which shows that current artificial intelligence algorithms are not yet at human level in PushWorld. We hope this inspires you to develop new algorithms that move closer to human-level performance.