HPC meets interactive Data Science and Machine Learning
Carme (/ˈkɑːrmiː/ KAR-mee; Greek: Κάρμη) is a moon of Jupiter that also gives its name to a group of Jupiter moons (the Carme group).
or in our case…
an open source framework to manage resources for multiple users running interactive jobs on a cluster of (GPU) compute nodes.
Core Idea
We combine established open source ML and DS tools with HPC backends. To do so, we use:
- Singularity containers
- Anaconda environments
- web-based GUI frontends, e.g. Theia-IDE and JupyterLab
  (completely web based: OS independent, no installation needed on the user side)
- HPC job management and schedulers (SLURM)
- HPC data I/O technologies like Fraunhofer’s BeeGFS
- HPC maintenance and monitoring tools
Job submission scheme
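In this scheme, a user request from the web frontend is turned into a SLURM batch job that starts the requested GUI inside a Singularity container on the allocated node(s). As a rough, hypothetical illustration (not Carme's actual code), the Python sketch below submits such a job via sbatch; the partition name, container image path and port are placeholder assumptions.

```python
# Minimal sketch (not Carme's implementation): submit an interactive JupyterLab
# job through SLURM that runs inside a Singularity container.
# Assumptions: sbatch and singularity are on PATH; partition name, image path
# and port are placeholders.
import subprocess
import textwrap

BATCH_SCRIPT = textwrap.dedent("""\
    #!/bin/bash
    # Placeholder SLURM resource requests
    #SBATCH --job-name=carme-demo
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:1
    #SBATCH --time=04:00:00

    # Run JupyterLab inside a user-space Singularity container;
    # --nv makes the host GPU driver visible inside the container.
    singularity exec --nv /containers/ml-base.sif \\
        jupyter lab --no-browser --port=8888
    """)

def submit(script: str) -> str:
    """Pass the batch script to sbatch via stdin and return SLURM's reply."""
    result = subprocess.run(
        ["sbatch"], input=script, text=True,
        capture_output=True, check=True,
    )
    return result.stdout.strip()  # e.g. "Submitted batch job 12345"

if __name__ == "__main__":
    print(submit(BATCH_SCRIPT))
```

In Carme itself, job submission and access to the running GUI happen through the web interface, so users never have to write such scripts by hand.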
Key Features
- Open source
  - we use only open source components that allow commercial usage
  - Carme is open source, allowing commercial usage
- Seamless integration with available HPC tools
  - Job scheduling via SLURM
  - Native LDAP support for user authentication
  - Integration of existing distributed file systems like BeeGFS
- Access via web-interface
  - OS independent (only a web browser needed)
  - Full user information (running jobs, cluster usage, news / messages)
  - Start/stop jobs within the web-interface
- Interactive jobs
  - Flexible access to GPUs
  - Access via web-driven GUIs like Theia-IDE or JupyterLab
- Job specific monitoring information in the web-interface
(GPU/CPU utilization, memory usage, access to TensorBoard)
- Distributed multi-node and/or multi-GPU jobs
  - Easy and intuitive job scheduling
  - Directly use GPI, GPI-Space, MPI, HP-DLF and Horovod within the jobs (see the sketch after this list)
- Full control over accounting and resource management
  - Job scheduling according to user-specific roles
  - Compute resources are user exclusive
- User-maintained, containerized environments
  - Singularity containers (run as a normal user; GPU, Ethernet and InfiniBand support)
  - Anaconda environments (easy updates, project / user specific environments)
  - Built-in matching between GPU driver and ML/DL tools
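To illustrate the multi-node/multi-GPU point above, the sketch below shows a generic Horovod (PyTorch) training loop of the kind such a job can run. This is the standard Horovod pattern, not Carme-specific code; the model, data and hyperparameters are placeholders.

```python
# Generic Horovod (PyTorch) training sketch: one process per GPU,
# gradients averaged across all workers. Placeholder model and data.
import torch
import horovod.torch as hvd

hvd.init()                                    # initialize Horovod
torch.cuda.set_device(hvd.local_rank())       # pin each process to its GPU

model = torch.nn.Linear(32, 2).cuda()         # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradients are averaged across workers,
# and make every worker start from the same parameters.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

for step in range(100):
    x = torch.randn(64, 32).cuda()            # placeholder batch
    y = torch.randint(0, 2, (64,)).cuda()
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    if hvd.rank() == 0 and step % 10 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```

Such a script is typically launched with horovodrun or the cluster's MPI launcher, one process per allocated GPU.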
Roadmap
- since 04/2018: the Carme prototype has been up and running on our cluster
- 03/2019: r0.3.0 (first public release)
- 07/2019: r0.4.0
- 11/2019: r0.5.0
- 12/2019: r0.6.0
- 07/2020: r0.7.0
- 11/2020: r0.8.0 (latest)
- 02/2021: r0.9.0 (development)
Documentation
Visit our documentation at doc.open-carme.org.
Who is behind Carme?
Carme is developed at the machine learning group of the Competence Center for High Performance Computing at Fraunhofer ITWM.
NOTE: We are open to contributions!
Contact
→ info@open-carme.org
Sponsors
The development of Carme is financed by research grants from