robotic board game

class rbgame.game.game.RoboticBoardGame(colors_map, targets_map, required_mail, robot_colors, num_robots_per_player=1, with_battery=False, random_num_steps=False, max_step=500, render_mode=None, log_to_file=False)

Bases: Env, AECEnv

Main class representing the game. The game can be configured with different parameters.

Parameters:
  • colors_map (str) – Color map for board.

  • targets_map – Target map for board.

  • required_mail (int) – Number of mails a player must collect to win.

  • robot_colors (list[str]) – Colors of robots.

  • num_robots_per_player (int) – Number of robots per player.

  • with_battery (bool) – Whether robot batteries are taken into account.

  • random_num_steps – Whether a robot moves a random number of steps each turn.

  • max_step (int) – Maximum number of environment steps.

  • render_mode (Optional[str]) – The render mode. It can be None or 'human'.

  • log_to_file (bool) – Whether to log the game process to a file.
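The parameters above can be gathered in a small configuration object before the environment is created. The sketch below is a hypothetical helper, not part of rbgame: GameConfig mirrors the documented signature and defaults, rejects any render_mode other than the two documented values, and treats the integer counts as positive (an assumption). The type of targets_map is not documented, so str is assumed.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class GameConfig:
    """Hypothetical helper mirroring RoboticBoardGame's constructor parameters."""

    colors_map: str
    targets_map: str  # type not documented; str is an assumption
    required_mail: int
    robot_colors: list[str]
    num_robots_per_player: int = 1
    with_battery: bool = False
    random_num_steps: bool = False
    max_step: int = 500
    render_mode: Optional[str] = None
    log_to_file: bool = False

    def __post_init__(self) -> None:
        # Documented: render_mode is either None or 'human'.
        if self.render_mode not in (None, "human"):
            raise ValueError(f"unsupported render_mode: {self.render_mode!r}")
        # Assumption: these counts must be positive integers.
        for name in ("required_mail", "num_robots_per_player", "max_step"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be a positive integer")

    def to_kwargs(self) -> dict[str, Any]:
        """Keyword arguments suitable for RoboticBoardGame(**kwargs)."""
        return asdict(self)
```

With real map values in place of placeholders, the environment would then be built as RoboticBoardGame(**config.to_kwargs()).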

action_space(agent)
Parameters:

agent (str) – Agent whose action space is requested.

Return type:

Discrete

Returns:

Action space of the agent.

agent_iter(max_iter=9223372036854775808)

Yields the current agent (self.agent_selection).

Needs to be used in a loop where you step() each iteration.

Return type:

AECIterable
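A minimal sketch of that loop, assuming the usual PettingZoo AEC convention that an agent whose episode has ended is stepped with None (consistent with step accepting int | None). Because the real environment needs board maps, the block drives a hypothetical StubEnv that only imitates the call pattern (reset, agent_iter, last, step); pass a RoboticBoardGame instance to play_episode instead to use it for real. The random legal-action policy is likewise only illustrative.

```python
import random
from typing import Any, Callable, Optional


def play_episode(env: Any, policy: Callable[[dict], int], seed: Optional[int] = None) -> int:
    """Run one episode with the agent_iter/last/step loop; return the number of actions taken."""
    env.reset(seed=seed)
    num_actions = 0
    for _agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        if termination or truncation:
            action = None  # finished agents are stepped with None
        else:
            action = policy(observation)
            num_actions += 1
        env.step(action)
    return num_actions


def random_legal_policy(rng: random.Random) -> Callable[[dict], int]:
    """Build a policy that picks uniformly among actions whose action_mask entry is non-zero."""
    def policy(observation: dict) -> int:
        legal = [i for i, allowed in enumerate(observation["action_mask"]) if allowed]
        return rng.choice(legal)
    return policy


class StubEnv:
    """Hypothetical stand-in with the same call pattern; NOT the real RoboticBoardGame."""

    def __init__(self, max_step: int = 6) -> None:
        self.max_step = max_step

    def reset(self, seed=None, options=None):
        self.agents = ["player_0", "player_1"]
        self.agent_selection = self.agents[0]
        self.num_steps = 0
        return self.observe(self.agent_selection), {}

    def observe(self, agent):
        return {"observation": [0.0], "action_mask": [1, 0, 1, 1]}

    def agent_iter(self, max_iter=2**63):
        for _ in range(max_iter):
            if not self.agents:
                return
            yield self.agent_selection

    def last(self, observe=True):
        obs = self.observe(self.agent_selection) if observe else None
        truncated = self.num_steps >= self.max_step  # truncate once max_step is reached
        return obs, 0.0, False, truncated, {}

    def step(self, action):
        idx = self.agents.index(self.agent_selection)
        if action is None:
            self.agents.pop(idx)  # a finished agent leaves the game
            if self.agents:
                self.agent_selection = self.agents[idx % len(self.agents)]
        else:
            self.num_steps += 1
            self.agent_selection = self.agents[(idx + 1) % len(self.agents)]
```

For example, play_episode(StubEnv(), random_legal_policy(random.Random(0)), seed=0) returns 6: each of the six steps before truncation takes one action, and both agents are then stepped once with None.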

close()

Close the environment.

Return type:

None

get_wrapper_attr(name)

Gets the attribute name from the environment.

Return type:

Any

has_wrapper_attr(name)

Checks if the attribute name exists in the environment.

Return type:

bool

last(observe=True)

Returns observation, cumulative reward, terminated, truncated, info for the current agent (specified by self.agent_selection).

Return type:

tuple[Optional[TypeVar(ObsType)], float, bool, bool, dict[str, Any]]

property np_random: Generator

Returns the environment’s internal _np_random that if not set will initialise with a random seed.

Returns:

Instances of np.random.Generator

property np_random_seed: int

Returns the environment’s internal _np_random_seed that if not set will first initialise with a random int as seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:

int: the seed of the current np_random or -1, if the seed of the rng is unknown

observation_space(agent)
Parameters:

agent (str) – Agent whose observation space is requested.

Return type:

Dict

Returns:

Observation space of the agent.

observe(agent)
Parameters:

agent (str) – Agent whose observation is requested.

Return type:

dict[str, ndarray]

Returns:

Observation of this agent. It is a dict with two keys: 'observation' and 'action_mask'. The value of 'observation' is the concatenation of the observation vectors of all robots, with the observation of the robot controlled by this agent placed first. The value of 'action_mask' is a binary vector in which each element indicates whether the corresponding action is legal.
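One way to consume this dict is sketched below. It assumes, beyond what is stated above, that every robot contributes an observation vector of the same length, so the concatenated vector can be split evenly; split_robot_observations, legal_actions and num_robots are hypothetical names. The first chunk belongs to the robot controlled by this agent. The functions only iterate and slice sequences, so they also accept the ndarray values returned by observe.

```python
from typing import Sequence


def split_robot_observations(observation: Sequence[float], num_robots: int) -> list[list[float]]:
    """Split the concatenated vector into one chunk per robot (own robot first).

    Assumes every robot's observation vector has the same length.
    """
    if num_robots < 1 or len(observation) % num_robots != 0:
        raise ValueError("observation length is not a multiple of num_robots")
    size = len(observation) // num_robots
    return [list(observation[i * size:(i + 1) * size]) for i in range(num_robots)]


def legal_actions(action_mask: Sequence[int]) -> list[int]:
    """Indices of actions whose mask entry is non-zero, i.e. legal actions."""
    return [i for i, allowed in enumerate(action_mask) if allowed]
```

For example, with num_robots=3 the vector [1, 2, 3, 4, 5, 6] splits into [[1, 2], [3, 4], [5, 6]], and the mask [0, 1, 1, 0] yields the legal actions [1, 2].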

property previous_agent

The agent that acted before the current one.

render()

Display all animations on screen. Only works if the environment's render mode is 'human'.

Return type:

None

reset(seed=None, options=None)

Reset the environment.

Parameters:
  • seed (Optional[int]) – Seed for the random number generator. If it is not None, resetting with the same seed restores the same initial state every time.

  • options – Unused.

Return type:

tuple[dict[str, ndarray], dict[str, Any]]

Returns:

Observation of the current agent and an info dict.

run(agents)

Animate a game played between agents. The user can also control robots with the keyboard.

Parameters:

agents (list[BaseAgent]) – Agents that choose the actions. If it is None, actions are read from the keyboard.

Return type:

tuple[str | None, int]

Returns:

The winner (str or None) and the game time (int), in the order given by the return type.

set_wrapper_attr(name, value, *, force=True)

Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

Return type:

bool

state()

State returns a global view of the environment.

It is appropriate for centralized training decentralized execution methods like QMIX.

Return type:

ndarray

step(action)

Perform one environment step with the given action.

Parameters:

action (int | None) – Action from agent.

Return type:

tuple[dict[str, ndarray], float, bool, bool, dict[str, Any]]

Returns:

The next observation of the acting agent, the reward, the termination flag, the truncation flag and an info dict. The termination flag indicates whether the game has finished; the truncation flag indicates whether the environment has reached the maximum number of steps and finished.

sum_count_mail(color)
Parameters:

color (str) – Color of player.

Return type:

int

Returns:

Total number of mails collected by the player.
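Since required_mail is the number of mails needed to win, per-player totals from sum_count_mail can be compared against it. The helper below is a hypothetical sketch, not part of rbgame: it takes the totals as a plain dict keyed by player color (for example {color: env.sum_count_mail(color) for color in robot_colors}, assuming each entry of robot_colors identifies a player) and returns the colors that have reached the threshold.

```python
def players_reaching_target(mail_totals: dict[str, int], required_mail: int) -> list[str]:
    """Colors whose collected-mail total is at least required_mail, in input order."""
    return [color for color, total in mail_totals.items() if total >= required_mail]
```

The result follows the order of the input dict and says nothing about which player reached the target first.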

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:

Env: The base non-wrapped gymnasium.Env instance