Features
OmniSafe is more than a SafeRL library: it also serves as a standardized, user-friendly SafeRL infrastructure. We compared the features of OmniSafe with those of popular open-source RL libraries; Table 1 summarizes the results.
Note: All results in Table 1 are accurate as of 2024. If you find any discrepancies between these data and the current state of a library, please rely on the latest results.
Table 1: Comparison of OmniSafe to a representative subset of RL or SafeRL libraries.
| Features | OmniSafe | TianShou | Stable-Baselines3 | SafePO | RL-Safety-Algorithms | Safety-starter-agents |
|---|---|---|---|---|---|---|
| Algorithm Tutorial | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| API Documentation | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Command Line Interface | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Custom Environment | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Docker Support | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| GPU Support | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Ipython / Notebook | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| PEP8 Code Style | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Statistics Tools | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Test Coverage | 97% | 91% | 96% | 91% | - | - |
| Type Hints | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Vectorized Environments | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Video Examples | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
Compared to the classic open-source RL libraries TianShou and Stable-Baselines3, OmniSafe adheres to the same engineering standards and supports user-friendly features. Compared to the SafeRL libraries SafePO, RL-Safety-Algorithms, and Safety-starter-agents, OmniSafe offers greater ease of use and robustness, making it a foundational infrastructure to accelerate SafeRL research. The complete codebase of OmniSafe adheres to the PEP8 style, and each commit is checked with tools such as isort, pylint, black, and ruff. Before being merged into the main branch, code changes require approval from at least two reviewers. These practices enhance the reliability of OmniSafe and support effective ongoing development.
OmniSafe includes a tutorial on Colab that provides a step-by-step guide to the training process, as illustrated in Figure 2. For those who are new to SafeRL, the tutorial allows for interactive learning of the training procedure. By clicking on Colab Tutorial, users can open it and follow the instructions to better understand how to use OmniSafe. Experienced researchers can use OmniSafe’s informative command-line interface, shown in Figure 1 and Figure 3, to learn the platform quickly and speed up their research.
For running experiments, OmniSafe provides a set of tools for analyzing results, including WandB, TensorBoard, and Statistics Tools. OmniSafe has also published its experimental benchmark as a WandB report [1], as shown in Figure 4. This report provides more detailed training curves and evaluation demonstrations of classic algorithms, serving as a reference for researchers.
[1]: https://api.wandb.ai/links/pku_rl/mv1eeetb | https://api.wandb.ai/links/pku_rl/scvni0oj
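The logging tools mentioned above are typically selected through configuration overrides. The following is a minimal sketch, not a definitive recipe: it assumes the `omnisafe.Agent(algo, env_id, custom_cfgs=...)` entry point and the override keys `train_cfgs.total_steps`, `logger_cfgs.use_wandb`, and `logger_cfgs.use_tensorboard`, none of which are stated on this page. The helper names `make_custom_cfgs` and `train` are hypothetical. Please verify the key names against the default configs of your installed version.

```python
def make_custom_cfgs(total_steps, use_wandb=False, use_tensorboard=True):
    """Build a nested override dict.

    The key names are assumptions about OmniSafe's config layout and
    should be checked against the installed version.
    """
    return {
        "train_cfgs": {"total_steps": total_steps},
        "logger_cfgs": {
            "use_wandb": use_wandb,
            "use_tensorboard": use_tensorboard,
        },
    }


def train(algo, env_id, custom_cfgs):
    """Train one agent and return it.

    OmniSafe is imported lazily so that this module can be loaded
    (and the config helper used) without OmniSafe installed.
    """
    import omnisafe  # assumed to be installed, e.g. via `pip install omnisafe`

    agent = omnisafe.Agent(algo, env_id, custom_cfgs=custom_cfgs)
    agent.learn()
    return agent
```

Under these assumptions, calling `train("PPO", "SafetyPointGoal1-v0", make_custom_cfgs(10_000_000))` and then the same with `"PPOLag"` would start the two runs compared in Figure 4 and Figure 5, with TensorBoard logging enabled and WandB disabled. Such runs are long, so a much smaller `total_steps` is advisable for a first try.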
Figure 1: An illustration of the OmniSafe command line interface. Users can view the commands supported by OmniSafe and a brief usage guide by simply typing omnisafe --help in the command line. If a user wants to further understand how to use a specific command, they can obtain additional prompts by using the command omnisafe COMMAND --help, as shown in Figure 3.
Figure 2: An example of the Colab tutorial provided by OmniSafe for using the Experiment Grid. The tutorial includes detailed usage descriptions and lets users run it and see the results.
(a) Example of omnisafe analyze-grid --help in command line.
(b) Example of omnisafe benchmark --help in command line.
(c) Example of omnisafe eval --help in command line.
(d) Example of omnisafe train-config --help in command line.
Figure 3: Further details on using the omnisafe --help command. Users can enter omnisafe COMMAND --help to get help, where COMMAND is any of the items listed under Commands in Figure 1. This lets users quickly learn the common operations OmniSafe provides via the command line and customize them to meet their specific requirements.
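The help commands shown in Figure 1 and Figure 3 can also be collected from a script. The sketch below only uses the invocations named in the captions (omnisafe --help and omnisafe COMMAND --help) and the four subcommands from Figure 3. The helper names `help_command` and `show_help` are hypothetical, and the CLI is only invoked if an `omnisafe` executable is found on the PATH.

```python
import shutil
import subprocess

# Subcommands shown in Figure 3; the CLI may provide others.
SUBCOMMANDS = ["analyze-grid", "benchmark", "eval", "train-config"]


def help_command(subcommand=None):
    """Return the argument list for the top-level or per-command help."""
    if subcommand is None:
        return ["omnisafe", "--help"]
    return ["omnisafe", subcommand, "--help"]


def show_help(subcommand=None):
    """Run the help command and return its stdout.

    Returns None when no `omnisafe` executable is found on the PATH.
    """
    if shutil.which("omnisafe") is None:
        return None
    result = subprocess.run(
        help_command(subcommand),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

For example, looping over `SUBCOMMANDS` and calling `show_help(name)` would gather the same text that the four panels of Figure 3 display, provided OmniSafe is installed.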
(a) SafetyPointGoal1-v0
(b) SafetyPointButton1-v0
(c) SafetyCarGoal1-v0
(d) SafetyCarButton1-v0
Figure 4: An example of the videos in OmniSafe’s WandB reports. This example provides videos of PPO and PPOLag in the SafetyPointGoal1-v0, SafetyPointButton1-v0, SafetyCarGoal1-v0, and SafetyCarButton1-v0 environments. In each sub-figure, PPO is on the left and PPOLag is on the right. These videos make the difference between safe and unsafe behavior directly visible. This is exactly what OmniSafe pursues: not just safety as reflected in the training curve, but safety in the agent's actual behavior.
Figure 5: An example of a training curve from OmniSafe’s WandB reports for SafetyPointGoal1-v0. The left panel shows the episode reward and the right panel shows the episode cost, both over 1e7 steps.