rl_agent

class rbgame.agent.rl_agent.RLAgent(policy, memory=None, update_per_step=1.0, repeat_per_collect=1000)[source]

Bases: BaseAgent

Base Reinforcement Learning agent.

Parameters:

policy (BasePolicy) – Policy.
memory (Optional[VectorReplayBuffer]) – Replay Buffer.
update_per_step (float) – How many times agent samples from memory and learns per one step, using only in offpolicy algorithms.
repeat_per_collect (int) – How many times agents learns on sampled data, using only in onpolicy algorithms.

abstract get_action(obs)

Compute action from observation.

Parameters:: obs (dict[str, ndarray]) – Observation and action mask from game.
Return type:: int
Returns:: Action.

abstract infer_act(obs_b_o, mask_b, exploration_noise)[source]

Forward batch of observations through network.

Parameters:

obs_b_o (ndarray) – Batch of observations.
mask_b (ndarray) – Batch of action masks.
exploration_noise (bool) – Exploration or not.

Return type:

ndarray

Returns:

Batch of actions.

abstract policy_update_fn(batch_size, num_collected_steps)[source]

Update policy.

Parameters:

batch_size (int) – Batch size.
num_collected_steps (int) – Number collected steps.

Return type:

int

Returns:

Number gradient steps.

class rbgame.agent.rl_agent.OffPolicyAgent(policy, memory=None, update_per_step=1.0, repeat_per_collect=1000)[source]

Bases: RLAgent

get_action(obs)[source]

Compute action from observation.

Parameters:: obs (dict[str, ndarray]) – Observation and action mask from game.
Return type:: int
Returns:: Action.

infer_act(obs_b_o, mask_b, exploration_noise)[source]

Forward batch of observations through network.

Parameters:

obs_b_o (ndarray) – Batch of observations.
mask_b (ndarray) – Batch of action masks.
exploration_noise (bool) – Exploration or not.

Return type:

ndarray

Returns:

Batch of actions.

policy_update_fn(batch_size, num_collected_steps)[source]

Update policy. For offpolicy algorithms, agent samples batch_size of transitions from replay buffer to learn and repeats it several times.

Parameters:

batch_size (int) – Batch size.
num_collected_steps (int) – Number collected steps.

Return type:

int

Returns:

Number gradient steps.

class rbgame.agent.rl_agent.OnPolicyAgent(policy, memory=None, update_per_step=1.0, repeat_per_collect=1000)[source]

Bases: RLAgent

get_action(obs)[source]

Compute action from observation.

Parameters:: obs (dict[str, ndarray]) – Observation and action mask from game.
Return type:: int
Returns:: Action.

infer_act(obs_b_o, mask_b, exploration_noise)[source]

Forward batch of observations through network.

Parameters:

obs_b_o (ndarray) – Batch of observations.
mask_b (ndarray) – Batch of action masks. Unused.
exploration_noise (bool) – Exploration or not. Unused.

Return type:

ndarray

Returns:

Batch of actions.

policy_update_fn(batch_size, num_collected_steps)[source]

Perform one on-policy update by passing the entire buffer to the policy’s update method.

Parameters:

batch_size (int) – Batch size.
num_collected_steps (int) – Number collected steps. Unused.

Return type:

int

Returns:

Number gradient steps.