Reinforcement Learning for Autonomous Unmanned Aerial Vehicles - Undergraduate Thesis

Implementation of different reinforcement learning algorithms to solve the navigation problem of an unmanned aerial vehicle (UAV) using ROS / Gazebo and Python.

Description

navigation_env.py : The goal of this environment is to navigate a robot on a track without crashing into the walls. Initially, the robot is placed randomly into the track but at a safe distance from the walls. The state-space consists of 5 range measurements. The action-space consist of 3 actions (move_forward, rotate_left, rotate_right). Furthermore, both actions and states have additive white Gaussian noise. The robot is rewarded +5 for moving forward and -0.5 for rotating. If the robot crashes into the wall it is penalized with -200.

There are 3 available worlds/tracks:

Track1:

Track2:

Track3:

Abstract

Reinforcement learning is an area of machine learning concerned with how autonomous agents learn to behave in unknown environments through trial-and-error. The goal of a reinforcement learning agent is to learn a sequential decision policy that maximizes the notion of cumulative reward through continuous interaction with the unknown environment. A challenging problem in robotics is the autonomous navigation of an Unmanned Aerial Vehicle (UAV) in worlds with no available map. This ability is critical in many applications, such as search and rescue operations or the mapping of geographical areas. In this thesis, we present a map-less approach for the autonomous, safe navigation of a UAV in unknown environments using reinforcement learning. Specifically, we implemented two popular algorithms, SARSA(λ) and Least-Squares Policy Iteration (LSPI), and combined them with tile coding, a parametric, linear approximation architecture for value function in order to deal with the 5- or 3-dimensional continuous state space defined by the measurements of the UAV distance sensors. The final policy of each algorithm, learned over only 500 episodes, was tested in unknown environments more complex than the one used for training in order to evaluate the behavior of each policy. Results show that SARSA(λ) was able to learn a near-optimal policy that performed adequately even in unknown situations, leading the UAV along paths free-of-collisions with obstacles. LSPI's policy required less learning time and its performance was promising, but not as effective, as it occasionally leads to collisions in unknown situations. The whole project was implemented using the Robot Operating System (ROS) framework and the Gazebo robot simulation environment.

Supervisor: Associate Professor Michail G. Lagoudakis

https://doi.org/10.26233/heallink.tuc.87066

Getting Started

Prerequisites

This package has been tested in ROS Noetic (Ubuntu 20.04) - Python 3.7.

Required ROS packages:

It is recommended that you build the above packages from source (clone the corresponding git repositories)

Required Python libraries:

NumPy
Gym

Installation

Clone the repository into your catkin workspace:


        cd ~/catkin_ws/src 

        clone https://github.com/NickGeramanis/rl-uav.git

Build your catkin workspace:


        cd ~/catkin_ws 

        catkin_make

Do not forget to source the new setup.*sh file:


        cd ~/catkin_ws 

        source devel/setup.bash

Usage

In order to launch a new world you must start the train.launch file:

roslaunch rl_uav train.launch world:=track1 gui:=true

After the world has started, run the train_uav node ( train_uav.py ) to begin the training process and test different algorithms:

rosrun rl_uav train_uav.py

About

Status

Under maintenance.

License

Distributed under the GPL-3.0 License. See LICENSE for more information.

Authors

Nick Geramanis