LEARNING SYSTEM DESIGN FOR GAME APPLICATIONS

Authors

  • G.A. Yegoshyna, О.S. Popov Odessa National Academy of Telecommunications
  • S.M. Voronoy, О.S. Popov Odessa National Academy of Telecommunications
  • A.A. Ovdieichuk, О.S. Popov Odessa National Academy of Telecommunications

DOI:

https://doi.org/10.33243/2518-7139-2020-1-2-82-91

Abstract

This paper investigates the problem of designing a learning system for agents in intelligent game applications based on the Unity Game Engine and reinforcement learning algorithms. Modern trends in game application development are characterized by active use of the concept of an intelligent agent: a behavior model of an active element that applies various strategies for interacting with other active elements and with the environment. Recent years have brought significant advances in this area, such as DeepMind's Deep Q-learning architecture, AlphaGo's victory over the world Go champion, and OpenAI's Proximal Policy Optimization (PPO). Unity developers have added support for machine learning, and for deep reinforcement learning in particular, by creating an SDK (Software Development Kit) for game and simulation developers. With the Unity and ML-Agents toolkits, it is possible to create physically, visually, and cognitively rich environments, including environments for evaluating new algorithms and strategies. However, designing a learning system for agents in Unity ML-Agents is possible only through the Python API.

This paper discusses the design of a learning system for agents in the Flappy Bird game application based on the Unity Game Engine, using its own environment. The paper separately highlights the typical features of the Flappy Bird environment, which can be implemented as either fully observable or partially observable. A fully observable environment is recommended, since in this case all environment states are visible in the playfield. The problem of strategy formation is therefore treated as a Markov decision process in which the agent directly observes the current state of the environment. Temporal Difference learning, which estimates the reward at each step, is used as the learning method.

Two separate environments, a deterministic one and a stochastic one, have been implemented, which enables further research on and evaluation of strategy formation algorithms.
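The combination the abstract describes (a fully observable MDP, per-step rewards, and Temporal Difference learning) can be illustrated with a minimal tabular Q-learning sketch, Q-learning being a standard TD method. The toy one-dimensional environment below is a hypothetical stand-in for the deterministic Flappy Bird environment, not code from the paper or from ML-Agents; all names and parameters are illustrative assumptions.

```python
import random
from collections import defaultdict

random.seed(0)

class ToyFlappyEnv:
    """Hypothetical 1-D stand-in: the bird's height should track a gap
    at height 5. Fully observable: the state is the height itself."""
    def __init__(self, gap=5, size=10):
        self.gap, self.size = gap, size
    def reset(self):
        self.h = self.size // 2
        return self.h
    def step(self, action):  # 0 = fall, 1 = flap
        self.h = min(self.size, self.h + 1) if action else max(0, self.h - 1)
        reward = 1.0 if self.h == self.gap else -1.0  # reward at every step
        return self.h, reward

def td_q_learning(env, episodes=200, steps=50, alpha=0.5, gamma=0.9, eps=0.1):
    q = defaultdict(float)  # Q[(state, action)], zero-initialized
    for _ in range(episodes):
        s = env.reset()
        for _ in range(steps):
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.randint(0, 1)
            else:
                a = max((0, 1), key=lambda act: q[(s, act)])
            s2, r = env.step(a)
            # TD update: move Q(s, a) toward the bootstrapped one-step return
            target = r + gamma * max(q[(s2, 0)], q[(s2, 1)])
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q

q = td_q_learning(ToyFlappyEnv())
# below the gap, the learned greedy policy should flap (action 1)
greedy_below_gap = max((0, 1), key=lambda act: q[(3, act)])
print(greedy_below_gap)
```

A stochastic variant of the environment, as mentioned in the abstract, could be sketched the same way by perturbing the transition in `step` (e.g., occasionally ignoring the chosen action); the TD update itself is unchanged.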

Published

2021-05-29

Issue

Section

Радіотехніка і телекомунікації (Radio Engineering and Telecommunications)