robotic board game

class rbgame.game.game.RoboticBoardGame(colors_map, targets_map, required_mail, robot_colors, num_robots_per_player=1, with_battery=False, random_num_steps=False, max_step=500, render_mode=None, log_to_file=False)[source]

Bases: Env, AECEnv

Main class representing the game. The game can be configured with different parameters.

Parameters:

colors_map (str) – Color map for board.
targets_map – Target map for board.
required_mail (int) – Number of mails to win.
robot_colors (list[str]) – Colors of robots.
num_robots_per_player (int) – Number of robots per player.
with_battery (bool) – Whether the battery is considered.
random_num_steps – Whether a robot can move a random number of steps each turn.
max_step (int) – Maximum number of environment steps.
render_mode (Optional[str]) – The render mode. It can be None or 'human'.
log_to_file (bool) – Whether to log the game process to a file.

action_space(agent)[source]

Parameters:: agent (str) – Agent whose action space is requested.
Return type:: Discrete
Returns:: Action space of agent.

agent_iter(max_iter=9223372036854775808)

Yields the current agent (self.agent_selection).

Needs to be used in a loop where you step() each iteration.

Return type:: AECIterable

close()[source]

Close the environment.

Return type:: None

get_wrapper_attr(name)

Gets the attribute name from the environment.

Return type:: Any

has_wrapper_attr(name)

Checks if the attribute name exists in the environment.

Return type:: bool

last(observe=True)

Returns observation, cumulative reward, terminated, truncated, info for the current agent (specified by self.agent_selection).

Return type:: tuple[Optional[TypeVar(ObsType)], float, bool, bool, dict[str, Any]]
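The methods agent_iter(), last() and step() are typically combined into one turn loop. The sketch below shows that loop against a small stand-in class; ToyTurnEnv, run_episode and the policy callable are hypothetical names made up for illustration and are not part of rbgame. It assumes the usual AEC convention that an agent whose episode has ended passes None to step().

```python
class ToyTurnEnv:
    """Hypothetical stand-in with AEC-style methods; NOT the real game class."""

    def __init__(self, agents, max_step):
        self.agents = list(agents)
        self.max_step = max_step
        self.num_steps = 0
        self.agent_selection = self.agents[0]

    def agent_iter(self, max_iter=2**63):
        # Yield the current agent for as long as any agent is still in play.
        yielded = 0
        while self.agents and yielded < max_iter:
            yield self.agent_selection
            yielded += 1

    def last(self, observe=True):
        # Truncate once the step budget is used up; this toy never terminates.
        truncated = self.num_steps >= self.max_step
        observation = {"action_mask": [1, 1]} if observe else None
        return observation, 0.0, False, truncated, {}

    def step(self, action):
        current = self.agent_selection
        index = self.agents.index(current)
        if action is None:
            # A finished agent passes None and leaves the game.
            self.agents.remove(current)
            next_index = index
        else:
            self.num_steps += 1
            next_index = index + 1
        if self.agents:
            self.agent_selection = self.agents[next_index % len(self.agents)]


def run_episode(env, policy):
    """Play until every agent is done; return the number of real moves."""
    moves = 0
    for agent in env.agent_iter():
        observation, reward, terminated, truncated, info = env.last()
        if terminated or truncated:
            action = None  # assumed convention for a finished agent
        else:
            action = policy(agent, observation)
            moves += 1
        env.step(action)
    return moves


moves = run_episode(ToyTurnEnv(["red", "blue"], max_step=3), lambda agent, obs: 0)
print(moves)
```

With the real environment, the same run_episode function would presumably be called on a RoboticBoardGame instance after reset(), with a policy that reads the observation instead of returning a constant.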

property np_random: Generator

Returns the environment’s internal _np_random that if not set will initialise with a random seed.

Returns:: Instances of np.random.Generator

property np_random_seed: int

Returns the environment’s internal _np_random_seed that if not set will first initialise with a random int as seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:: int: the seed of the current np_random or -1, if the seed of the rng is unknown

observation_space(agent)[source]

Parameters:: agent (str) – Agent whose observation space is requested.
Return type:: Dict
Returns:: Observation space of agent.

observe(agent)[source]

Parameters:: agent (str) – Agent whose observation is requested.
Return type:: dict[str, ndarray]
Returns:: Observation of this agent. It is a dict with two keys: 'observation' and 'action_mask'. The value of the 'observation' key is the concatenation of the observation vectors of all robots, with the observation of the robot controlled by agent placed first. The value of the 'action_mask' key is a binary vector in which each element indicates whether the corresponding action is legal.
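One common use of the 'action_mask' entry is to restrict a random policy to legal actions. The following is a minimal sketch under that reading; sample_legal_action is a hypothetical helper, not part of rbgame, and the mask values are made up for illustration.

```python
import random


def sample_legal_action(action_mask, rng=random):
    """Return the index of a randomly chosen legal action."""
    # Keep only the indices whose mask entry is non-zero (legal).
    legal = [index for index, allowed in enumerate(action_mask) if allowed]
    if not legal:
        raise ValueError("action_mask marks no action as legal")
    return rng.choice(legal)


# Made-up mask: actions 1, 3 and 4 are legal.
mask = [0, 1, 0, 1, 1]
action = sample_legal_action(mask, random.Random(0))
```

The helper only iterates over the mask and tests each element for truthiness, so it should accept a one-dimensional ndarray as well as a plain list.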

property previous_agent: Previous agent.

render()[source]

Display all animations on screen. Only works if the environment render mode is 'human'.

Return type:: None

reset(seed=None, options=None)[source]

Reset the environment.

Parameters:

seed (Optional[int]) – Random module seed. If it isn’t None, the environment is reset to the same initial state every time.
options – Unused.

Return type:

tuple[dict[str, ndarray], dict[str, Any]]

Returns:

Observation of the current agent and some additional information.

run(agents)[source]

Animate game process between agents. User can control robots by keyboard.

Parameters:: agents (list[BaseAgent]) – Agents to act. If it’s None, action is provided from keyboard.
Return type:: tuple[str | None, int]
Returns:: The winner and the game time.

set_wrapper_attr(name, value, *, force=True)

Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

Return type:: bool

state()

State returns a global view of the environment.

It is appropriate for centralized training decentralized execution methods like QMIX.

Return type:: ndarray

step(action)[source]

Perform an environment step with the input action.

Parameters:: action (int | None) – Action from agent.
Return type:: tuple[dict[str, ndarray], float, bool, bool, dict[str, Any]]
Returns:: Next observation of the acting agent, the reward, the termination flag, the truncation flag, and additional information. The termination flag indicates whether the environment has finished; the truncation flag indicates whether the environment has finished because it reached the maximum number of steps.
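The five-element tuple returned by step() can be unpacked and the two flags checked separately. The sketch below illustrates one way to do so; describe_step is a hypothetical helper, and the sample tuple is made up rather than produced by the real environment.

```python
def describe_step(step_result):
    """Unpack a step() result and report the episode status and reward."""
    observation, reward, terminated, truncated, info = step_result
    if terminated:
        status = "terminated"  # the environment has finished
    elif truncated:
        status = "truncated"   # finished only because of the step limit
    else:
        status = "running"
    return status, reward


# Made-up result shaped like the documented return type.
sample = ({"observation": [], "action_mask": [1]}, 0.0, False, True, {})
status, reward = describe_step(sample)
```

Checking the termination flag first is a design choice of this sketch, so that a result carrying both flags is reported as terminated.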

sum_count_mail(color)[source]

Parameters:: color (str) – Color of player.
Return type:: int
Returns:: Total number of mails collected by one player.
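As an illustration, the per-player totals can be compared against required_mail to see which players have reached the target. This sketch assumes that a player wins on reaching at least required_mail mails; ScoreBoard and players_at_target are hypothetical names, and the colors and counts are made up.

```python
class ScoreBoard:
    """Hypothetical stand-in exposing only sum_count_mail(); NOT the real game."""

    def __init__(self, counts):
        self._counts = dict(counts)

    def sum_count_mail(self, color):
        # Total mails collected by the player with this color.
        return self._counts[color]


def players_at_target(game, colors, required_mail):
    """Return the colors whose mail total is at least required_mail."""
    return [color for color in colors if game.sum_count_mail(color) >= required_mail]


board = ScoreBoard({"red": 3, "blue": 5})
reached = players_at_target(board, ["red", "blue"], required_mail=5)
print(reached)  # prints ['blue']
```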

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:: Env: The base non-wrapped gymnasium.Env instance