How Do Construction Robots Learn to Do Things?
The construction industry is seeing growing investment in construction robots. For instance, Dusty Robotics, a Silicon Valley-based construction startup, announced a $16.5 million Series A in 2021, and Allied Market Research projected that the construction robotics market would reach $7.88 billion by 2027. The main reasons for investing in construction robotics are to increase productivity and efficiency on repeatable tasks, such as bricklaying and surveying construction sites. But how do robots learn to do human tasks? How can you train them to perform complex tasks on a construction site? In this article, we will review one method for training robots: reinforcement learning.
Before talking about robots, I would like to share a personal story about how I trained my dog using reinforcement learning. I rescued Goofy, a Havanese-Terrier mix, 2.5 years ago, when he was three years old. Goofy did not know a single command, so I used treats to teach him commands such as sit, stay, high-five, down, and come here. The principle was simple: whenever I gave him a command, he observed it and responded by taking an action. If he took the desired action, I would say “Yes” and give him a treat as a reward. The treats motivated him to repeat the desired actions or behaviors. This method of training is called positive reinforcement, which uses rewards (treats, praise, or toys) for desired actions or behaviors.
Like dogs, robots do not start out knowing how to take the actions that we desire. After all, they are machines and have no intelligence of their own; however, we can teach them to perform tasks through repeated trial and error. One of the most useful techniques for training robots to make a sequence of decisions to perform a task is reinforcement learning, one of the sub-branches of machine learning (ML) discussed in the previous article.
Robots should be trained to take the actions that we desire.
In reinforcement learning, an agent (Goofy in the example above) interacts with the environment (Goofy’s surroundings, including myself) and learns from reward (treats) or penalty signals. The agent explores, collects data from its own actions and experiences (observations), and maps them to the reward or penalty signals it receives. This is a trial-and-error approach. The reward and penalty signals act as the performance metric that the agent uses to learn the best actions or behaviors within the environment; without them, the agent cannot tell what is good or bad. Once the training is over, you need to validate the trained policy, which is simply a function that tells the agent what action to take in whatever state it is in. If you find a mistake or error, you need to retrain the agent.
One of the most useful ML techniques to train robots is reinforcement learning, which consists of an agent, an environment, observations, actions, rewards, and a policy (or function).
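To make these pieces concrete, here is a minimal Python sketch of the Goofy example. It is only an illustration under my own assumptions: the command list, the 0-or-1 treat reward, the epsilon-greedy exploration rule, and every function name are hypothetical choices, not part of any particular library or of a real robot controller.

```python
import random

COMMANDS = ["sit", "stay", "down"]  # observations: what the trainer says
ACTIONS = ["sit", "stay", "down"]   # actions the agent can respond with


def reward(command, action):
    """Environment feedback: a treat (1.0) for the desired action, otherwise nothing."""
    return 1.0 if action == command else 0.0


def train(episodes=2000, epsilon=0.1, step_size=0.1, seed=0):
    """Learn by trial and error how rewarding each action is for each command."""
    rng = random.Random(seed)
    # value[command][action] is the agent's running estimate of the reward.
    value = {c: {a: 0.0 for a in ACTIONS} for c in COMMANDS}
    for _ in range(episodes):
        command = rng.choice(COMMANDS)          # the agent observes a command
        if rng.random() < epsilon:              # explore: try a random action
            action = rng.choice(ACTIONS)
        else:                                   # exploit: best action so far
            action = max(ACTIONS, key=lambda a: value[command][a])
        r = reward(command, action)             # the environment responds
        # Nudge the estimate toward the reward that was actually received.
        value[command][action] += step_size * (r - value[command][action])
    return value


def policy(value, command):
    """The trained policy: a function from an observation to an action."""
    return max(ACTIONS, key=lambda a: value[command][a])


value = train()
learned = {c: policy(value, c) for c in COMMANDS}
print(learned)
```

With these settings, the learned policy should map each command to its matching action. Validating the policy then means checking that mapping against what you expect, and retraining if it is wrong.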
An Example Application of Reinforcement Learning in Construction Sites
Let’s look at an example of reinforcement learning on a construction site. Say you want to fly an autonomous drone over the site to record all activities. You can use reinforcement learning to train the drone (the agent) to find the optimal and safest path across the site by itself.
Imagine the entire construction site is represented as a 2D grid (N x M) that contains obstacles (e.g., cranes, buildings, or materials). The drone, as the agent, flies over the site to reach a target location while avoiding the obstacles. It begins at a starting cell and can perform four actions within the grid: moving up, down, right, or left. The agent receives a reward or penalty for each movement: a positive reward for moving closer to the target cell while staying in the safer zone, and a negative reward for moving farther from the target cell or entering a banned cell.
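A grid like this can be sketched in a few lines of Python. The layout, the symbols, and the reward values below are all assumptions I made up for illustration; a real site would need its own map and carefully tuned rewards. The sketch also treats every banned cell the same way and does not model a separate bonus for the safer zone.

```python
# Hypothetical 4 x 5 site. S = start, T = target, R = restricted cell,
# B = building, "." = free cell.
LAYOUT = [
    "S..R.",
    ".B...",
    ".B.R.",
    "....T",
]
ROWS, COLS = len(LAYOUT), len(LAYOUT[0])

# The four actions, expressed as (row change, column change).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find(symbol):
    """Return the (row, column) of the first cell holding the given symbol."""
    for r, row in enumerate(LAYOUT):
        for c, cell in enumerate(row):
            if cell == symbol:
                return (r, c)
    raise ValueError(f"{symbol!r} is not in the layout")


START, TARGET = find("S"), find("T")


def distance(cell):
    """Number of grid moves between a cell and the target (Manhattan distance)."""
    return abs(cell[0] - TARGET[0]) + abs(cell[1] - TARGET[1])


def step(state, action):
    """Apply one action and return (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < ROWS and 0 <= c < COLS):
        return state, -1.0, False       # off the grid: stay put, small penalty
    if LAYOUT[r][c] in "RB":
        return START, -10.0, True       # banned cell: big penalty, start over
    if (r, c) == TARGET:
        return (r, c), 10.0, True       # reached the target
    closer = distance((r, c)) < distance(state)
    return (r, c), (1.0 if closer else -1.0), False


print(step(START, "right"))  # ((0, 1), 1.0, False)
```

The `step` function is the whole environment: it takes the agent's current cell and chosen action, and hands back the new cell, the reward or penalty, and whether the attempt has ended.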
The rules of the environment state that the drone must avoid the red cells under all circumstances and cannot enter the black cells, which represent the building. If the drone mistakenly enters a red or black cell, it has to start over. During training, based on the rewards and penalties it receives, the agent learns to navigate the 2D grid and reach the target cell via the optimal and safest path (Option A and Option B, as shown in the picture). Once the training is over, you need to verify the results and evaluate the agent and the trained policy. If you find no issues, the drone should be able to navigate to the target destination within the real construction site.
In reinforcement learning, an agent interacts with its environment and receives positive or negative rewards to adjust its action or behavior.
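The training and verification steps described above can be sketched in Python as follows. This article does not commit to a specific algorithm, so I am assuming tabular Q-learning here, one common choice. The grid layout, reward values, and hyperparameters (`alpha`, `gamma`, `epsilon`) are likewise hypothetical and chosen only to keep the example small.

```python
import random

# Hypothetical site: S = start, T = target, R = red cell, B = building (black).
LAYOUT = ["S..R.",
          ".B...",
          ".B.R.",
          "....T"]
ROWS, COLS = len(LAYOUT), len(LAYOUT[0])
START, TARGET = (0, 0), (3, 4)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def distance(cell):
    return abs(cell[0] - TARGET[0]) + abs(cell[1] - TARGET[1])


def step(state, action):
    """Return (next_state, reward, done) for one move on the grid."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < ROWS and 0 <= c < COLS):
        return state, -1.0, False            # off the grid: stay put
    if LAYOUT[r][c] in "RB":
        return START, -10.0, True            # banned cell: start over
    if (r, c) == TARGET:
        return (r, c), 10.0, True
    return (r, c), (1.0 if distance((r, c)) < distance(state) else -1.0), False


def train(episodes=3000, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-learning: estimate the long-run value of each action in each cell."""
    rng = random.Random(seed)
    q = {(r, c): {a: 0.0 for a in ACTIONS}
         for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if rng.random() < epsilon:       # explore
                action = rng.choice(list(ACTIONS))
            else:                            # exploit the current estimates
                action = max(ACTIONS, key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[nxt].values())
            # Move the estimate toward reward plus discounted future value.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the trained policy from the start cell, with no exploration."""
    state, path = START, [START]
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


q = train()
path = greedy_path(q)
print(path)
```

Printing the path is a simple form of verification: you can check that it ends at the target cell and never touches a red or black cell before trusting the policy anywhere near a real site.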
To increase the accuracy of the trained policies, you can use deep reinforcement learning, which will be discussed in future articles, to train your construction robots. To keep this article manageable, various methods of reinforcement learning and agent designs are not explained. If you are interested in learning more about the technical aspects of reinforcement learning, I recommend reading Artificial Intelligence: A Modern Approach written by Stuart J. Russell and Peter Norvig.