Model-based Algorithms

The OmniSafe Navigation Benchmark for model-based algorithms evaluates the effectiveness of OmniSafe’s model-based algorithms across two different environments from the Safety-Gymnasium task suite. For each supported algorithm and environment, we offer the following:

Default hyperparameters used for the benchmark and scripts that enable result replication.
Graphs and raw data that can be utilized for research purposes.
Detailed logs obtained during training.

Supported algorithms are listed below:

[NeurIPS 2018] Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models (PETS)
[CoRL 2021] Learning Off-Policy with Online Planning (LOOP and SafeLOOP)
[AAAI 2022] Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning (CAP)
[ICML 2022 Workshop] Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method (RCE)
[NeurIPS 2018] Constrained Cross-Entropy Method for Safe Reinforcement Learning (CCE)

Safety-Gymnasium

We highly recommend using Safety-Gymnasium to run the following experiments. To install it on a Linux machine, run:

pip install safety_gymnasium

Run the Benchmark

You can set the main function of examples/benchmarks/run_experiment_grid.py as follows:

from omnisafe.common.experiment_grid import ExperimentGrid

# `train` is the per-run training entry point that run_experiment_grid.py
# passes to the grid; ExperimentGrid calls it once per configuration.

if __name__ == '__main__':
    eg = ExperimentGrid(exp_name='Model-Based-Benchmarks')

    # Set up the algorithms.
    model_based_base_policy = ['LOOP', 'PETS']
    model_based_safe_policy = ['SafeLOOP', 'CCEPETS', 'CAPPETS', 'RCEPETS']
    eg.add('algo', model_based_base_policy + model_based_safe_policy)

    # You can use wandb to monitor the experiment.
    eg.add('logger_cfgs:use_wandb', [False])
    # You can use TensorBoard to monitor the experiment.
    eg.add('logger_cfgs:use_tensorboard', [True])
    # Total training steps for each run.
    eg.add('train_cfgs:total_steps', [1000000])

    # Set up the environments.
    eg.add('env_id', [
        'SafetyPointGoal1-v0-modelbased',
        'SafetyCarGoal1-v0-modelbased',
    ])
    eg.add('seed', [0, 5, 10, 15, 20])

    # The total number of runs must be divisible by num_pool;
    # choose num_pool according to the resources of your machine.
    eg.run(train, num_pool=5)
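
The requirement that the number of runs be divisible by num_pool can be checked before launching. The sketch below is a hypothetical helper, not part of OmniSafe: it assumes the grid expands to the full Cartesian product of the value lists passed to eg.add, mirrors those lists in a plain dict, and counts the resulting runs.

```python
from itertools import product
from math import prod

# Hypothetical mirror of the eg.add(...) calls: each key maps to its list of values.
grid = {
    'algo': ['LOOP', 'PETS', 'SafeLOOP', 'CCEPETS', 'CAPPETS', 'RCEPETS'],
    'logger_cfgs:use_wandb': [False],
    'logger_cfgs:use_tensorboard': [True],
    'train_cfgs:total_steps': [1000000],
    'env_id': ['SafetyPointGoal1-v0-modelbased', 'SafetyCarGoal1-v0-modelbased'],
    'seed': [0, 5, 10, 15, 20],
}


def count_runs(grid: dict) -> int:
    """Number of runs, assuming one run per element of the Cartesian product."""
    return prod(len(values) for values in grid.values())


def check_pool(grid: dict, num_pool: int) -> bool:
    """True if the runs split evenly across num_pool worker processes."""
    return count_runs(grid) % num_pool == 0


if __name__ == '__main__':
    n = count_runs(grid)
    # Enumerate the variants explicitly to confirm the count matches the product.
    variants = list(product(*grid.values()))
    assert len(variants) == n
    print(n, check_pool(grid, num_pool=5))  # 60 True
```

With the grid as written, 6 algorithms × 2 environments × 5 seeds gives 60 runs, which num_pool=5 divides evenly; num_pool=7, for example, would not.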

Once the grid is configured, run the benchmark with:

cd examples/benchmarks
python run_experiment_grid.py

To analyze the results, set path to the directory where the experiment grid saved them, for example:

path = 'omnisafe/examples/benchmarks/exp-x/Model-Based-Benchmarks'

Then plot the results by running:

cd examples
python analyze_experiment_results.py

For detailed usage of the OmniSafe statistics tool, please refer to this tutorial.

OmniSafe Benchmark

To demonstrate the reliability of its implementations, OmniSafe reports their performance on Safety-Gymnasium environments. All results were obtained with cost_limit=1.00. They are presented in Table 1 and Figure 1.
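
For reference, safe RL methods like the ones benchmarked here target a constrained objective in which cost_limit plays the role of the threshold d. The form below is the standard constrained MDP (CMDP) formulation, written with undiscounted episode sums on the assumption that the Reward and Cost columns report per-episode totals; individual algorithms may use discounted or finite-horizon variants internally.

```latex
\max_{\pi}\; J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{T-1} r_t\Big]
\quad \text{s.t.} \quad
J_C(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{T-1} c_t\Big] \le d,
\qquad d = \texttt{cost\_limit} = 1.00
```

Here r_t and c_t are the per-step reward and cost, T is the episode length, and τ is a trajectory generated by policy π.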

Performance Table

                     PETS                         LOOP                         SafeLOOP
Environment          Reward        Cost           Reward        Cost           Reward        Cost
SafetyCarGoal1-v0    33.07 ± 1.33  61.20 ± 7.23   25.41 ± 1.23  62.64 ± 8.34   22.09 ± 0.30  0.16 ± 0.15
SafetyPointGoal1-v0  27.66 ± 0.07  49.16 ± 2.69   25.08 ± 1.47  55.23 ± 2.64   22.94 ± 0.72  0.04 ± 0.07

                     CCEPETS                      RCEPETS                      CAPPETS
Environment          Reward        Cost           Reward        Cost           Reward        Cost
SafetyCarGoal1-v0    27.60 ± 1.21  1.03 ± 0.29    29.08 ± 1.63  1.02 ± 0.88    23.33 ± 6.34  0.48 ± 0.17
SafetyPointGoal1-v0  24.98 ± 0.05  1.87 ± 1.27    25.39 ± 0.28  2.46 ± 0.58    9.45 ± 8.62   0.64 ± 0.77

Table 1: Reward and cost of OmniSafe model-based algorithms on Safety-Gymnasium environments. All algorithms were evaluated after 1e6 training steps.
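
Reading Table 1 against cost_limit=1.00 amounts to comparing each mean Cost entry with the threshold. The sketch below is a reading aid, not part of OmniSafe's evaluation code: it hard-codes the mean costs from the table, ignores the ± spreads, and lists which algorithm/environment pairs stay within the limit.

```python
COST_LIMIT = 1.00

# Mean episode costs copied from Table 1 (the ± spreads are ignored here).
MEAN_COST = {
    'PETS': {'SafetyCarGoal1-v0': 61.20, 'SafetyPointGoal1-v0': 49.16},
    'LOOP': {'SafetyCarGoal1-v0': 62.64, 'SafetyPointGoal1-v0': 55.23},
    'SafeLOOP': {'SafetyCarGoal1-v0': 0.16, 'SafetyPointGoal1-v0': 0.04},
    'CCEPETS': {'SafetyCarGoal1-v0': 1.03, 'SafetyPointGoal1-v0': 1.87},
    'RCEPETS': {'SafetyCarGoal1-v0': 1.02, 'SafetyPointGoal1-v0': 2.46},
    'CAPPETS': {'SafetyCarGoal1-v0': 0.48, 'SafetyPointGoal1-v0': 0.64},
}


def within_limit(costs: dict, limit: float = COST_LIMIT) -> list:
    """Return sorted (algo, env) pairs whose mean cost does not exceed the limit."""
    return sorted(
        (algo, env)
        for algo, per_env in costs.items()
        for env, cost in per_env.items()
        if cost <= limit
    )


if __name__ == '__main__':
    for algo, env in within_limit(MEAN_COST):
        print(f'{algo:9s} {env}')
```

By mean cost, only SafeLOOP and CAPPETS stay at or below 1.00 on both tasks; CCEPETS and RCEPETS land just above the limit on SafetyCarGoal1-v0 (1.03 and 1.02) and further above it on SafetyPointGoal1-v0.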

Performance Curves

SafetyCarGoal1-v0

SafetyPointGoal1-v0

Figure 1: Training curves on Safety-Gymnasium environments for the unconstrained and safe algorithms listed in Table 1.