OmniSafe Critic
- Critic: An abstract class for critic.
Base Critic
Documentation
- class omnisafe.models.base.Critic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)
An abstract class for critic.
A critic approximates the value function, which maps observations to values. It is parameterized by a neural network that takes observations as input (a Q critic also takes actions) and outputs the estimated value.
Note
OmniSafe provides two types of critic: Q critic (input = observation + action, output = value) and V critic (input = observation, output = value). You can also implement your own critic by inheriting from this class.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list of int) – List of hidden layer sizes.
activation (Activation, optional) – Activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
num_critics (int, optional) – Number of critics. Defaults to 1.
use_obs_encoder (bool, optional) – Whether to use an observation encoder; only used in the Q critic. Defaults to False.
Initialize an instance of Critic.
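The Q/V distinction and the inheritance route described above can be sketched in plain Python. This is only an illustration under stated assumptions: `SketchCritic`, `SketchQCritic`, `SketchVCritic`, and `input_dim` are hypothetical names, not part of OmniSafe, and integer dimensions stand in for the `obs_space` and `act_space` objects.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class SketchCritic(ABC):
    """Hypothetical stand-in for an abstract critic (not OmniSafe's class)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_sizes: Sequence[int]) -> None:
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.hidden_sizes = list(hidden_sizes)

    @abstractmethod
    def input_dim(self) -> int:
        """Size of the vector fed to the value network."""


class SketchQCritic(SketchCritic):
    # Q critic: input = observation + action, output = value.
    def input_dim(self) -> int:
        return self.obs_dim + self.act_dim


class SketchVCritic(SketchCritic):
    # V critic: input = observation only, output = value.
    def input_dim(self) -> int:
        return self.obs_dim


q = SketchQCritic(obs_dim=4, act_dim=2, hidden_sizes=[64, 64])
v = SketchVCritic(obs_dim=4, act_dim=2, hidden_sizes=[64, 64])
print(q.input_dim(), v.input_dim())  # 6 4
```

Because `input_dim` is abstract, the base class cannot be instantiated directly; a custom critic has to subclass it and supply the missing method.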
- CriticBuilder: Implementation of CriticBuilder.
- QCritic: Implementation of Q Critic.
- VCritic: Implementation of VCritic.
Critic Builder
Documentation
- class omnisafe.models.critic.CriticBuilder(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)
Implementation of CriticBuilder.
Note
A CriticBuilder is a class for building a critic network. In OmniSafe, instead of building the critic network directly, we build it through the CriticBuilder, which integrates the supported critic types. The advantage is that every critic type receives its parameters in a uniform way. This makes existing critics easy to use and new critic types easy to add.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list of int) – List of hidden layer sizes.
activation (Activation, optional) – Activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
num_critics (int, optional) – Number of critics. Defaults to 1.
use_obs_encoder (bool, optional) – Whether to use an observation encoder; only used in the Q critic. Defaults to False.
Initialize an instance of CriticBuilder.
- build_critic(critic_type)
Build critic.
Currently, we support two types of critics: q and v. To add a new critic type, add it here.
- Parameters:
critic_type (str) – Critic type.
- Returns:
An instance of V-Critic or Q-Critic
- Raises:
NotImplementedError – If the critic type is not q or v.
- Return type:
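The dispatch that build_critic describes (q, v, otherwise NotImplementedError) can be sketched as follows. This is a minimal illustration, not OmniSafe's implementation: `SketchCriticBuilder`, `StubQCritic`, and `StubVCritic` are hypothetical names, and the stubs only store their arguments instead of building networks.

```python
from typing import List, Sequence


class StubQCritic:
    """Hypothetical placeholder; a real Q critic would build a network here."""

    def __init__(self, hidden_sizes: Sequence[int], num_critics: int,
                 use_obs_encoder: bool) -> None:
        self.hidden_sizes: List[int] = list(hidden_sizes)
        self.num_critics = num_critics
        self.use_obs_encoder = use_obs_encoder


class StubVCritic:
    """Hypothetical placeholder; it takes no use_obs_encoder flag."""

    def __init__(self, hidden_sizes: Sequence[int], num_critics: int) -> None:
        self.hidden_sizes: List[int] = list(hidden_sizes)
        self.num_critics = num_critics


class SketchCriticBuilder:
    """Stores the shared parameters once, then hands them to the chosen critic."""

    def __init__(self, hidden_sizes: Sequence[int], num_critics: int = 1,
                 use_obs_encoder: bool = False) -> None:
        self.hidden_sizes = list(hidden_sizes)
        self.num_critics = num_critics
        self.use_obs_encoder = use_obs_encoder

    def build_critic(self, critic_type: str):
        if critic_type == 'q':
            return StubQCritic(self.hidden_sizes, self.num_critics, self.use_obs_encoder)
        if critic_type == 'v':
            return StubVCritic(self.hidden_sizes, self.num_critics)
        # Any other string is rejected, mirroring the documented behaviour.
        raise NotImplementedError(
            f"critic_type {critic_type!r} is not supported; expected 'q' or 'v'."
        )


builder = SketchCriticBuilder(hidden_sizes=[64, 64], num_critics=2)
q_critic = builder.build_critic('q')
v_critic = builder.build_critic('v')
print(type(q_critic).__name__, type(v_critic).__name__)  # StubQCritic StubVCritic
```

In a design like this, supporting a new critic type means adding one more branch to `build_critic`, while callers keep passing the same constructor arguments.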
Q Critic
Documentation
- class omnisafe.models.critic.QCritic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)
Implementation of Q Critic.
A Q-function approximator that uses a multi-layer perceptron (MLP) to map observation-action pairs to Q-values. This class inherits from Critic. You can design your own Q-function approximator by inheriting from this class or from Critic.
The Q critic network has two modes:
Hint
use_obs_encoder = False: The input of the network is the concatenation of the observation and action.
use_obs_encoder = True: The input of the network is the concatenation of the output of the observation encoder and action.
For example, in DDPG, the action is not directly concatenated with the observation, but is concatenated with the output of the observation encoder.
Note
The Q critic network contains multiple critics, and the output of forward is a list of Q-values. To get the Q-value of a specific critic, index into the list.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list of int) – List of hidden layer sizes.
activation (Activation, optional) – Activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
num_critics (int, optional) – Number of critics. Defaults to 1.
use_obs_encoder (bool, optional) – Whether to use an observation encoder; only used in the Q critic. Defaults to False.
Initialize an instance of QCritic.
- forward(obs, act)
Forward function.
Because this is a multi-critic network, its output is a list of Q-values. To use it as a single-critic network, set the num_critics parameter to 1 when initializing the network and use index 0 to get the Q-value.
- Parameters:
obs (torch.Tensor) – Observation from environments.
act (torch.Tensor) – Action from the actor.
- Returns:
A list of Q-values for the observation-action pair, one per critic.
- Return type:
list[Tensor]
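The two input modes and the list-valued output of forward can be illustrated with a toy model in plain Python. Everything here is an assumption made for the sketch: `SketchMultiQCritic` is a hypothetical name, plain lists of floats stand in for tensors, each critic is a trivial linear map instead of an MLP, and the encoder is an arbitrary function.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


class SketchMultiQCritic:
    """Hypothetical multi-critic Q function; each head is a toy linear map."""

    def __init__(self, num_critics: int = 1,
                 obs_encoder: Optional[Callable[[Sequence[float]], Vector]] = None) -> None:
        self.obs_encoder = obs_encoder
        # Head k multiplies the summed input by (k + 1); a real critic would use an MLP.
        self.scales = [float(k + 1) for k in range(num_critics)]

    def forward(self, obs: Sequence[float], act: Sequence[float]) -> Vector:
        # With an encoder (use_obs_encoder = True), the action is concatenated
        # with the encoder output; otherwise with the raw observation.
        features = self.obs_encoder(obs) if self.obs_encoder is not None else list(obs)
        net_input = list(features) + list(act)
        return [scale * sum(net_input) for scale in self.scales]


obs, act = [1.0, 2.0], [3.0]

single = SketchMultiQCritic(num_critics=1)
q_value = single.forward(obs, act)[0]  # index 0 selects the only critic

double = SketchMultiQCritic(num_critics=2, obs_encoder=lambda o: [2.0 * x for x in o])
q_values = double.forward(obs, act)  # one entry per critic

print(q_value, q_values)  # 6.0 [9.0, 18.0]
```

Returning a list even when num_critics is 1 keeps the calling code identical for single-critic and multi-critic setups; only the index changes.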
V Critic
Documentation
- class omnisafe.models.critic.VCritic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1)
Implementation of VCritic.
A V-function approximator that uses a multi-layer perceptron (MLP) to map observations to V-values. This class inherits from Critic. You can design your own V-function approximator by inheriting from this class or from Critic.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list of int) – List of hidden layer sizes.
activation (Activation, optional) – Activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
num_critics (int, optional) – Number of critics. Defaults to 1.
Initialize an instance of VCritic.
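How hidden_sizes shapes an MLP that maps an observation to a single V-value can be sketched without any framework. The helper names `layer_shapes` and `v_forward` are hypothetical, and the sketch assumes a scalar output, ReLU on hidden layers, an identity output layer, constant weights, and zero biases; none of this is taken from OmniSafe's source.

```python
from typing import List, Sequence, Tuple


def layer_shapes(obs_dim: int, hidden_sizes: Sequence[int]) -> List[Tuple[int, int]]:
    """(in, out) size of each linear layer, ending in a single value output."""
    sizes = [obs_dim, *hidden_sizes, 1]
    return list(zip(sizes[:-1], sizes[1:]))


def v_forward(obs: Sequence[float], hidden_sizes: Sequence[int],
              weight: float = 0.5) -> float:
    """Toy forward pass: every weight equals `weight`, biases are zero."""
    x = list(obs)
    shapes = layer_shapes(len(x), hidden_sizes)
    for i, (_, n_out) in enumerate(shapes):
        pre = weight * sum(x)  # identical for every unit, since all weights match
        is_output = i == len(shapes) - 1
        # ReLU on hidden layers, identity on the output layer (assumption).
        x = [pre if is_output else max(0.0, pre)] * n_out
    return x[0]


print(layer_shapes(3, [4, 2]))             # [(3, 4), (4, 2), (2, 1)]
print(v_forward([1.0, 1.0, 0.0], [4, 2]))  # 2.0
```

Unlike the Q critic sketch, only the observation enters the network here, which matches the description of the V critic as a map from observations to values.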