High-level Infrastructure Architecture
The Ethernet network is used for cluster management:
1. Console access (iDRAC/BMC)
2. Compute node provisioning using xCAT
3. Grid internal communications (scheduling, remote login, naming services, etc.)
InfiniBand
InfiniBand is used for storage access and MPI traffic.
Access from HaifaU LAN
All grid components, including compute nodes, are accessible from the University of Haifa LAN.
Storage
Lustre is a high-performance parallel distributed file system designed for use in HPC and supercomputing environments. It offers scalable and high-speed storage solutions, enabling efficient data access and storage for large-scale computational workloads.
Servers:
Lustre Storage Servers - Redundant servers support storage fail-over, while metadata and data are stored on separate servers, allowing each file system to be optimized for different workloads. Lustre can deliver fast IO to applications across high-speed network fabrics, such as Ethernet, InfiniBand (IB), Omni-Path (OPA), and others.
Storage server - The storage system of the cluster is an HPE Cray E1000 ClusterStor server. Note that this storage is meant for ongoing analyses; it is not an archive system. It is a distributed file system built for high performance: fast reading and writing of many files, both large and small. It is composed of many disks but functions as one storage volume, with a total capacity of 919 TB. This volume is made up of a hybrid set of disks, including both HDD and SSD, which ensures high performance for different use cases.
Please note that your files on the Hive2 storage system are NOT BACKED UP by default. It is your responsibility to back up your vital files and irreplaceable data. While the E1000 server is a highly resilient solution, any system carries some risk of failure and data loss. Therefore, please make sure to back up important files.
Backup server - The Hive2 backup system can automatically back up files, but only for users who buy space on the backup server. The backup system replicates users' data with rsync. Rsnapshot is used to create daily snapshots, keeping up to 14 snapshots.
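Since backing up is left to each user, one option is to copy important directories to storage you control outside the cluster. The following is a minimal sketch, not an official Hive2 procedure: it only builds an rsync command line so it can be reviewed before being run. The source directory `~/results` and the destination `user@backup.example.org:hive2-results/` are hypothetical placeholders, and the function name is made up for this example.

```python
import shlex
from pathlib import Path


def build_rsync_command(source_dir, destination, dry_run=True):
    """Return the rsync argument list for copying source_dir to destination.

    Both arguments are placeholders: the destination is whatever storage
    you control outside the cluster (hypothetical in this example).
    """
    # A trailing slash tells rsync to copy the directory's contents
    # rather than the directory itself.
    source = str(Path(source_dir).expanduser()) + "/"
    command = ["rsync", "--archive", "--compress", "--partial"]
    if dry_run:
        # --dry-run lists what would be transferred without copying anything.
        command.append("--dry-run")
    command += [source, destination]
    return command


if __name__ == "__main__":
    # Hypothetical source and destination; replace with your own.
    cmd = build_rsync_command("~/results", "user@backup.example.org:hive2-results/")
    # Print the command for review; to execute it, pass the list to
    # subprocess.run(cmd, check=True) with dry_run=False.
    print(shlex.join(cmd))
```

Starting with `dry_run=True` makes it possible to check what rsync would transfer before any data is copied.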
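To illustrate what "keeping up to 14 snapshots" means in practice, here is a small simulation of a daily rotation with a fixed retention limit. It is a simplified model for illustration only, not rsnapshot's actual implementation or the Hive2 configuration; the function name and the `day-NN` labels are invented for this example.

```python
def rotate_snapshots(snapshots, new_snapshot, keep=14):
    """Return the snapshot list after adding new_snapshot, newest first.

    snapshots: existing snapshot labels, newest first.
    keep: maximum number of snapshots retained (14, as stated above).
    """
    rotated = [new_snapshot] + list(snapshots)
    # Anything beyond the retention limit is the oldest data and is dropped.
    return rotated[:keep]


if __name__ == "__main__":
    history = []
    for day in range(1, 21):  # simulate 20 daily backup runs
        history = rotate_snapshots(history, f"day-{day:02d}")
    print(len(history))              # number of snapshots retained
    print(history[0], history[-1])   # newest and oldest retained snapshot
```

In this model, after 20 daily runs only the 14 most recent snapshots remain, so the oldest state that can still be restored is 13 days older than the newest one.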
Name | Quantity | Model | No. of CPUs | RAM | Notes |
Compute (old bees) | 38 | HP XL170r | 24 | 128 GB | bee033-071 |
Compute (bees) | 73 | HP ProLiant XL220n Gen10 Plus | 64 | 250 GB | bee073-145 |
Fat node (old queens) | 1 | HP DL560 | 56 | 760 GB | queen02-03 |
Fat node (queens) | 2 | HP DL560-G10 | 80 | 1.5 TB | queen4 (1.5 TB), queen5 (360 GB) |
GPU (vespas) |
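As a rough aid for reading the table, the sketch below totals the CPU counts across the listed node groups. It rests on two assumptions that the table does not state explicitly: that the CPU column is a per-node figure, and that the quantities are current. The GPU nodes are left out because their details are not given here.

```python
# (node type, node count, CPUs per node) copied from the table above.
# Assumption: the CPU column is a per-node figure.
NODE_GROUPS = [
    ("Compute (old bees)", 38, 24),
    ("Compute (bees)", 73, 64),
    ("Fat node (old queens)", 1, 56),
    ("Fat node (queens)", 2, 80),
]


def total_cpus(groups):
    """Sum node count times CPUs per node over all node groups."""
    return sum(count * cpus for _name, count, cpus in groups)


if __name__ == "__main__":
    for name, count, cpus in NODE_GROUPS:
        print(f"{name}: {count} x {cpus} = {count * cpus}")
    print("Total (GPU nodes excluded):", total_cpus(NODE_GROUPS))
```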
Operating Systems
The xCAT server runs RHEL 8.5, chosen to stay within the operating systems that xCAT supports. Compute nodes and other xCAT-managed hosts run RHEL 9.1. The operating system can be upgraded easily as needed using xCAT.