Abstract:
Software agents are programs that perceive their environment and act to achieve their design goals.
In most cases, the chosen agent architecture determines the agent's behaviour in response to different problem states.
However, there are some problem domains in which it is desirable for the agent to learn a good action-selection
policy by interacting with its environment. This kind of learning is called Reinforcement Learning (RL) and is
useful in the process control area. Given a problem state, the agent selects an action to execute and receives an
immediate reward. It then updates its estimate for every action and, after a certain period of time, the agent
learns which action is the best one to execute. Most RL algorithms execute simple actions even when two or more could be
executed. This work uses RL algorithms to find an optimal policy in a gridworld problem and
proposes a mechanism to combine actions of different types.
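
The select-act-reward-update loop described above can be illustrated with a minimal sketch. It assumes tabular Q-learning with epsilon-greedy action selection, which the abstract does not name as the algorithm used. The 4x4 grid, the reward values, the hyperparameters and the function names (step, q_learning, greedy_path) are all hypothetical choices made for illustration, and the sketch does not cover the proposed mechanism for combining actions.

```python
import random

# Hypothetical 4x4 gridworld: start at top-left, goal at bottom-right.
SIZE = 4
START = (0, 0)
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Apply one simple action; moves off the grid leave the agent in place."""
    d_row, d_col = ACTIONS[action]
    row = min(max(state[0] + d_row, 0), SIZE - 1)
    col = min(max(state[1] + d_col, 0), SIZE - 1)
    next_state = (row, col)
    # Assumed reward scheme: +1 on reaching the goal, a small cost otherwise.
    reward = 1.0 if next_state == GOAL else -0.04
    return next_state, reward, next_state == GOAL


def q_learning(episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy selection (assumed algorithm)."""
    rng = random.Random(seed)
    actions = list(ACTIONS)
    q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE) for a in actions}
    for _ in range(episodes):
        state = START
        for _ in range(200):  # cap on episode length
            # Select an action: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            # Receive the immediate reward and observe the next state.
            next_state, reward, done = step(state, action)
            # Update the estimate for the action that was just executed.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned greedy policy from the start state."""
    state, path = START, [START]
    for _ in range(max_steps):
        if state == GOAL:
            break
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _, _ = step(state, action)
        path.append(state)
    return path
```

Calling greedy_path(q_learning()) returns the sequence of cells visited under the learned policy. In this small deterministic grid one would expect it to end at the goal cell, although the outcome depends on the assumed hyperparameters.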