3. Environment¶
3.1. Storage¶
In TSUBAME4.0, a home directory and two types of group disks (fast storage area and large storage area) are available.
The home directory and fast storage area are built on SSD shared storage, and the large storage area is built on HDD shared storage.
Storage (TSUBAME4.0) | Mount point | Capacity | Filesystem |
---|---|---|---|
High-speed storage area, Home directory (SSD) | /gs/fs, /home | 372TB | Lustre |
Large-scale (Big) storage area, Shared application deployment (HDD) | /gs/bs, /apps | 44.2PB | Lustre |
Local scratch area (SSD) | /local | 1.92TB/node | xfs |
The local scratch area is located on the NVMe SSD of each compute node and can be used for temporary files and similar data during a computation.
Info
The capacity of the available local scratch area is determined by the resources acquired.
The shared scratch area (BeeOND) that was available in TSUBAME3 has been discontinued. For details, see Appendix 4. Storage.
Resource type | Local scratch area (GB) |
---|---|
node_f | 1920 |
node_h | 960 |
node_q | 480 |
node_o | 240 |
gpu_1 | 240 |
gpu_h | 120 |
cpu_160 | 96 |
cpu_80 | 48 |
cpu_40 | 24 |
cpu_16 | 9.6 |
cpu_8 | 4.8 |
cpu_4 | 2.4 |
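A common way to use node-local scratch is to stage input into it, compute there, and copy the results back to shared storage, since files on node-local storage are typically not kept after a job ends. The following is a minimal sketch of that pattern, not an official recipe: the `stage_and_run` helper, the `SCRATCH_DIR` variable, and the file names are hypothetical, and the actual per-job path under /local should be confirmed in the TSUBAME4.0 documentation.

```shell
# Hypothetical helper: stage a file into scratch, process it there, copy the result back.
# SCRATCH_DIR is an assumption (for example, a job-specific directory under /local);
# it falls back to TMPDIR or /tmp so that the sketch runs anywhere.
stage_and_run() {
    input=$1
    output=$2

    # Create a private working directory inside the scratch area
    scratch=$(mktemp -d "${SCRATCH_DIR:-${TMPDIR:-/tmp}}/job.XXXXXX") || return 1

    # Stage in
    cp "$input" "$scratch/input.dat" || return 1

    # Placeholder for the real computation: sort the input lines
    sort "$scratch/input.dat" > "$scratch/output.dat" || return 1

    # Stage out to shared storage, then clean up the scratch directory
    cp "$scratch/output.dat" "$output" || return 1
    rm -rf "$scratch"
}
```

Because the scratch capacity depends on the resource type (for example, 2.4 GB for cpu_4), the staged data should be sized to fit the type requested.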
3.2. Compute nodes¶
Each TSUBAME4.0 compute node has two 4th generation AMD EPYC 9654 processors (Zen 4 architecture), giving more than 6 times as many cores per node as TSUBAME3.0 (192 vs. 28).
The compute node has 4 NVIDIA H100 Tensor Core GPUs.
Item | TSUBAME3.0 | TSUBAME4.0 |
---|---|---|
Computing Unit | Compute node HPE SGI ICE-XA 540 nodes | Compute node HPE Cray XD665 240 nodes |
Components (per node) | | |
CPU | Intel Xeon E5-2680 v4 2.4GHz x 2 Socket | AMD EPYC 9654 2.4GHz x 2 Socket |
Cores/Threads | 14cores / 28threads x 2CPU | 96cores / 192threads x 2CPU |
Memory | 256GiB | 768GiB (DDR5-4800) |
GPU | NVIDIA TESLA P100 for NVlink-Optimized Servers x 4 | NVIDIA H100 SXM5 94GB HBM2e x 4 |
SSD | 2TB | 1.92TB NVMe U.2 SSD |
Interconnect | Intel Omni-Path HFI 100Gbps x 4 | InfiniBand NDR200 200Gbps x 4 |
Info
TSUBAME4.0 compute nodes are named from r1n1 to r23n11, where the number after r ranges from 1 to 23 and the number after n ranges from 1 to 10 or 11.
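The naming scheme above can be checked mechanically, for example when filtering a host list. The sketch below is illustrative only: `is_compute_node_name` is a hypothetical helper, and because the text does not say which r values have an 11th node, it accepts n up to 11 for every r.

```shell
# Hypothetical helper: succeed if $1 looks like a compute node name (r1n1 .. r23n11).
# Assumption: n = 11 is accepted for every r, since the exact layout is not stated.
is_compute_node_name() {
    # Quick shape check: must start with r<digit> and contain n<digit>
    case $1 in
        r[0-9]*n[0-9]*) ;;
        *) return 1 ;;
    esac

    rest=${1#r}      # strip the leading "r"
    r=${rest%%n*}    # digits before the first "n"
    n=${rest#*n}     # everything after the first "n"

    # Reject empty parts, non-digits, and leading zeros
    case $r in ''|*[!0-9]*|0*) return 1 ;; esac
    case $n in ''|*[!0-9]*|0*) return 1 ;; esac

    # Range check taken from the text: r is 1 to 23, n is 1 to 11
    [ "$r" -ge 1 ] && [ "$r" -le 23 ] && [ "$n" -ge 1 ] && [ "$n" -le 11 ]
}
```

For example, `is_compute_node_name r5n3 && echo "compute node"` prints the message, while a name such as r24n1 is rejected.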
3.3. Job Scheduler¶
TSUBAME4.0 uses the Altair Grid Engine (AGE), the successor to the UNIVA Grid Engine (UGE) of TSUBAME3.0.
The resource types in TSUBAME4.0 are as follows.
Compared with TSUBAME3.0, there are more resource types, and each resource type provides more cores.
Resource type | Physical CPU cores | Memory (GB) | GPUs | Local scratch area (GB) |
---|---|---|---|---|
node_f | 192 | 768 | 4 | 1920 |
node_h | 96 | 384 | 2 | 960 |
node_q | 48 | 192 | 1 | 480 |
node_o | 24 | 96 | 1/2 | 240 |
gpu_1 | 8 | 96 | 1 | 240 |
gpu_h | 4 | 48 | 1/2 | 120 |
cpu_160 | 160 | 368 | 0 | 96 |
cpu_80 | 80 | 184 | 0 | 48 |
cpu_40 | 40 | 92 | 0 | 24 |
cpu_16 | 16 | 36.8 | 0 | 9.6 |
cpu_8 | 8 | 18.4 | 0 | 4.8 |
cpu_4 | 4 | 9.2 | 0 | 2.4 |
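When a job needs only CPU cores, the table can be read as a lookup: choose the smallest cpu_* type whose core count covers the request. The following sketch encodes the CPU-only rows of the table; `smallest_cpu_type` is a hypothetical helper for illustration and is not part of the scheduler.

```shell
# Hypothetical helper: print the smallest CPU-only resource type with at least $1 cores.
# Core counts are taken from the cpu_* rows of the resource type table.
smallest_cpu_type() {
    want=$1
    for cores in 4 8 16 40 80 160; do
        if [ "$want" -le "$cores" ]; then
            echo "cpu_$cores"
            return 0
        fi
    done
    # More than 160 cores: no cpu_* type fits
    return 1
}
```

For example, `smallest_cpu_type 10` prints `cpu_16`. Memory should be checked as well: per the table, every cpu_* type provides 2.3 GB per core, so a memory-heavy job may need a larger type than its core count alone suggests.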
3.3.1. Subscription Job¶
TSUBAME4.0 introduces a "subscription" service that allows quasi-exclusive use of compute nodes on a monthly basis.
Only intramural users and joint use (academic) users can use this service.
To submit a job under the subscription system, add the -q prior option to qsub. All other options are the same as in the pay-as-you-go system.
$ qsub -q prior -g [TSUBAME group] SCRIPTFILE
Option | Description |
---|---|
-g | Specifies the TSUBAME group name. Pass it as a qsub command-line option, not inside the job script. |
-q prior | Submits the job as a subscription job. The job waits at most one hour before execution starts. |
For more details about compute node subscription, check here.
Warning
Even when a job is submitted for a subscription group, it is processed as a pay-as-you-go job unless -q prior is specified.
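One way to avoid forgetting -q prior is to build the submission command in a single place. The sketch below only prints the command line, using the two options described above; `subscription_qsub_cmd` is a hypothetical helper, and the group and script names in the usage note are placeholders.

```shell
# Hypothetical helper: print the qsub command line for a subscription job.
# Usage: subscription_qsub_cmd GROUP SCRIPTFILE
subscription_qsub_cmd() {
    if [ $# -ne 2 ]; then
        echo "usage: subscription_qsub_cmd GROUP SCRIPTFILE" >&2
        return 2
    fi
    # -q prior marks the job as a subscription job; -g names the TSUBAME group
    printf 'qsub -q prior -g %s %s\n' "$1" "$2"
}
```

For example, `subscription_qsub_cmd mygroup job.sh` prints `qsub -q prior -g mygroup job.sh`, which can be reviewed before it is run on a login node.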
3.4. Software¶
3.4.1. Commercial application¶
The differences between commercial applications available in TSUBAME4.0 and TSUBAME3.0 can be found here.
Some commercial applications require an application fee. For more details, please refer to Fare Overview Commercial Applications (Partially charged in TSUBAME4.0).
3.4.2. Free software¶
The difference between the free software available for TSUBAME4.0 and TSUBAME3.0 can be found here.
3.4.3. Applications used in TSUBAME 3.0¶
TSUBAME4.0 and TSUBAME3.0 have different compilers, MPI implementations, and libraries, so programs built on TSUBAME3.0 cannot be run as they are. They must be recompiled on TSUBAME4.0.