
4. Software Environment

4.1. Switch User Environment

In this system, you can switch between compiler and application environments using the Environment Modules system.
For details on the Environment Modules system, refer to the man pages or the Environment Modules documentation.

4.1.1. List available modules

You can check the available modules with "module avail" or the abbreviated "module av".

$ module avail
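
You can also pass a name to narrow the listing, for example to show only the Intel modules:

$ module avail intel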

For the available versions, please refer to the following pages:
System software
Supported applications

4.1.2. Module information

If you want to know information about a module, use the module whatis MODULE command.

$ module whatis intel/2024.0.2
-------------------------------- /apps/t4/rhel9/modules/modulefiles/compiler --------------------------------
      intel/2024.0.2: Intel oneAPI compiler 2024.0 and MKL

4.1.3. Load modules

Modules are loaded with the module load MODULE command.

$ module load intel/2024.0.2

In a job script, you need to load the same modules that were used when compiling/building the program.
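
For example, a minimal job script sketch that loads the same module used at build time (the resource type, walltime, and program name are placeholders; sourcing /etc/profile.d/modules.sh is a common convention that may not be required on this system):

#!/bin/sh
#$ -cwd
#$ -l node_f=1
#$ -l h_rt=0:10:00
# Load the same module that was used to build the program
. /etc/profile.d/modules.sh
module load intel/2024.0.2
./a.out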

4.1.4. List loaded modules

module list displays currently loaded modules.

$ module list
Currently Loaded Modulefiles:
 1) intel/2024.0.2   2) cuda/12.3.2

4.1.5. Unload modules

If you want to unload modules in use, type module unload MODULE.

$ module list
Currently Loaded Modulefiles:
 1) intel/2024.0.2   2) cuda/12.3.2
$ module unload cuda
$ module list
Currently Loaded Modulefiles:
 1) intel/2024.0.2

4.1.6. Refresh module environment

module purge refreshes the module environment: all currently loaded modules are unloaded.

$ module list
Currently Loaded Modulefiles:
 1) intel/2024.0.2   2) cuda/12.3.2
$ module purge
$ module list
No Modulefiles Currently Loaded.

4.2. Intel Compiler

In this system, you can use the Intel compiler (oneAPI), the AMD compiler (AOCC), the NVIDIA compiler (NVIDIA HPC SDK), and the GNU compiler.

The Intel compiler commands are listed below:

Command Language Syntax Former command
ifx Fortran 77/90/95 $ ifx [OPTION] source_file ifort
icx C $ icx [OPTION] source_file icc
icpx C++ $ icpx [OPTION] source_file icpc

When you use the Intel compiler, load intel with the module command. The --help option shows the various compiler options.
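
For example, a compile sketch (sample.f90 and sample.c are placeholder source files):

$ module load intel/2024.0.2
$ ifx -O2 sample.f90 -o sample_f     # Fortran
$ icx -O2 sample.c -o sample_c       # C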

Info

The icc and icpc commands are no longer supported in Intel oneAPI 2024; use icx and icpx instead. ifort is also expected to be discontinued in the near future, so we recommend using ifx instead.

Info

Since Intel oneAPI 2024, the default C++ standard has changed from C++14 to C++17, so syntax errors may occur with code that assumed the older default. For more information, see here.

4.2.1. Compiler options

The compiler options are listed below.

Option Description
-O Same as -O2.
-O0 Disables all optimizations. Used for debugging, etc.
-O1 Affects code size and locality. Disables specific optimizations.
-O2 Default optimization level. Same as -O. Enables optimizations for speed, including global code scheduling, software pipelining, and predication.
-O3 Aggressive optimizations for maximum speed (but does not guarantee higher performance). Includes optimizations such as data prefetching, scalar replacement, and loop transformations.
-xCORE-AVX512 Generates Intel Advanced Vector Extensions 512 (Intel AVX512), Intel AVX2, AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, SSE instructions for Intel processors. Optimizes for Intel processors with Intel AVX512 instruction set support.
-xCORE-AVX2 Generates Intel Advanced Vector Extensions 2 (Intel AVX2), Intel AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel processors. Optimized for Intel processors supporting the Intel AVX2 instruction set.
-xSSE4.2 Generates Intel SSE4 high-efficiency and high-speed string processing instructions, Intel SSE4 vectorized compiler and media accelerator instructions, and Intel SSSE3, SSE3, SSE2, and SSE instructions for Intel processors. Optimizes for Intel processors supporting the Intel SSE4.2 instruction set.
-xSSSE3 Generates Intel SSSE3, SSE3, SSE2, and SSE instructions for Intel processors, optimized for Intel processors supporting the Intel SSSE3 instruction set.
-qopt-report=n Generates an optimization report. By default, the report is output to a file with an .optrpt extension, where n is the level of detail from 0 (no report) to 5 (most detailed). The default is 2.
-fp-model precise Controls the semantics of floating-point operations. Disables optimizations that affect the precision of floating-point data and rounds intermediate results to the precision defined by the source.
-g Instructs the compiler to generate symbolic debugging information in the object file, which increases the size of the object file.
-traceback This option instructs the compiler to generate supplemental information in the object file so that when a fatal error occurs at run-time, the source file traceback information can be displayed.
When a fatal error occurs, correlation information for the source file, routine name, and line number is displayed along with the hex address of the call stack (program counter trace).
The map file and the hex address of the stack displayed when an error occurs can be used to determine the cause of the error.
This option increases the size of the executable program.
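
For example, the following sketch requests an optimization report at the default detail level; per the table above, the report is written to a file with an .optrpt extension (sample.c is a placeholder):

$ icx -O2 -qopt-report=2 -c sample.c
$ cat sample.optrpt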

4.2.2. Recommended optimization options

The recommended compile optimization options are listed below. The AMD EPYC 9654 in this system supports the Intel AVX512 instruction set, so the -xCORE-AVX512 option can be specified. When -xCORE-AVX512 is specified, the compiler analyzes the source code and generates the optimal AVX512, AVX2, AVX, and SSE instructions. The recommended options are aggressive but generally safe; note that optimization may change the order of calculations and can introduce small differences in the results.

Info

AVX512, adopted in Intel Xeon processors (Skylake and later), is also supported by AMD in the 4th-generation EPYC (Zen4 architecture) used in this system.

Option Description
-O3 Enables the -O2 optimizations plus more aggressive loop transformations such as fusion, block unroll-and-jam, and collapsing of IF statements.
-xCORE-AVX512 Generates Intel Advanced Vector Extensions 512 (Intel AVX512), Intel AVX2, AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, SSE instructions for Intel processors. Optimizes for Intel processors with Intel AVX512 instruction set support.

If the above options degrade the performance of your program, reduce the optimization level to -O2 or change the vectorization options. If the results do not match, also try the floating-point options.
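
A compile sketch with the recommended options (sample.f90 is a placeholder):

$ ifx -O3 -xCORE-AVX512 sample.f90 -o sample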

4.2.3. Intel 64 architecture memory model

Create an executable binary using one of the following memory models:

Memory model Description
small (-mcmodel=small) Code and data are limited to the first 2GB of address space so that all access to code and data is instruction pointer (IP) relative addressing.
If the -mcmodel option is not specified, this is the default.
medium (-mcmodel=medium) Code is limited to the first 2GB of address space, but data is not. Code can be accessed via IP relative addressing, but data must be accessed via absolute addressing.
large (-mcmodel=large) Neither code nor data is restricted. Both code and data access uses absolute addressing.

IP relative addressing requires only 32 bits, while absolute addressing requires 64 bits. This affects code size and performance. (IP relative addressing is slightly faster to access.)

When the total size of the common blocks, global data, and static data in a program exceeds 2 GB, the following error message is output at link time:

<some lib.a library>(some .o): In Function <function>:
  : relocation truncated to fit: R_X86_64_PC32 <some symbol>
…………………
  : relocation truncated to fit: R_X86_64_PC32 <some symbol>

In this case, compile/link with -mcmodel=medium and -shared-intel.

If you specify the medium or large memory model, you must also specify the -shared-intel compiler option to ensure that the appropriate dynamic version of Intel's runtime libraries is used.
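
For example, a build sketch for the medium memory model (big_array.f90 is a placeholder for a program whose static data exceeds 2 GB):

$ ifx -mcmodel=medium -shared-intel big_array.f90 -o big_array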

4.3. NVIDIA HPC SDK

The commands of the NVIDIA HPC SDK (formerly the PGI compiler) are as follows.

Command Language Syntax Former command
nvfortran Fortran 77/90/95/2003/2008/2018 $ nvfortran [OPTION] source_file pgfortran
nvc C $ nvc [OPTION] source_file pgcc
nvc++ C++ $ nvc++ [OPTION] source_file pgc++

For more details about each command, see its man page (e.g., $ man nvc).

To use it, load nvhpc with the module command. If you specify the --help option, a list of compiler options will be displayed.
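
For example (sample.f90 is a placeholder; -acc and -gpu=cc90 assume OpenACC directives in the source, with cc90 targeting the H100):

$ module load nvhpc
$ nvfortran -O3 sample.f90 -o sample               # CPU-only build
$ nvfortran -acc -gpu=cc90 sample.f90 -o sample    # OpenACC offload to the GPU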

4.4. AOCC

The commands of the AMD Optimizing C/C++ and Fortran Compilers (AOCC) are as follows.

Command Language Syntax
flang (clang) Fortran 95/2003/2008 $ flang [OPTION] source_file
clang C $ clang [OPTION] source_file
clang++ (clang) C++ $ clang++ [OPTION] source_file

To use it, load aocc with the module command. If you specify the --help option, a list of compiler options will be displayed.
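
For example (sample.c is a placeholder; -march=znver4 targets the Zen4 EPYC CPUs in this system, assuming the installed AOCC supports it):

$ module load aocc
$ clang -O3 -march=znver4 sample.c -o sample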

4.5. GPU Environment

This system provides a GPU (NVIDIA H100 SXM5) environment in conjunction with the CPUs.

4.5.1. Execution and debug for interactive job

The login nodes (login, login1, login2) do not have GPUs, so they can only be used to compile and link GPU code. Execution of high-load programs on the login nodes is also restricted.

You can run and debug GPU code interactively on the compute nodes through the batch system. Please refer to Interactive job for more details.

4.5.2. Supported applications for GPU

The GPU-supported applications are listed below (as of 2024.4.1).

  • ABAQUS 2024 --- Refer to ABAQUS Usage guide.
  • ANSYS 2024R1 --- Refer to ANSYS Usage guide.
  • AMBER 22up05 --- Refer to AMBER Usage guide.
  • Mathematica 14 --- Refer to Mathematica Usage guide.
  • MATLAB 2024 --- Refer to MATLAB Usage guide.
  • Linaro Forge (formerly Arm Forge) --- Refer to [Parallel Programming] in Lectures
  • NVIDIA HPC SDK --- Refer to NVIDIA HPC Usage guide.

Other applications will be made available sequentially.

4.5.3. MPI Environment with CUDA

MPI Environment with CUDA is available.

OpenMPI + gcc environment

# load OpenMPI, GCC
$ module load openmpi/5.0.2-gcc
Loading openmpi/5.0.2-gcc
  Loading requirement: cuda/12.3.2

Info

The required related modules are loaded automatically.

OpenMPI + NVIDIA HPC SDK environment

# load OpenMPI, NVIDIA HPC SDK
$ module load openmpi/5.0.2-nvhpc
Loading openmpi/5.0.2-nvhpc

OpenMPI + Intel environment

# load OpenMPI, Intel
$ module load openmpi/5.0.2-intel
Loading openmpi/5.0.2-intel
  Loading requirement: intel/2024.0.2 cuda/12.3.2

Info

The required related modules are loaded automatically.
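
After loading one of the environments above, a typical build-and-run sketch looks like the following (hello.c is a placeholder; in practice, run mpirun inside a batch job):

$ mpicc hello.c -o hello
$ mpirun -np 4 ./hello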

4.5.4. Multi-Instance GPU (MIG)

You can use half a GPU on node_o and gpu_h.
Multi-Instance GPU (MIG) partitions the NVIDIA H100 SXM5 94GB HBM2e on TSUBAME4 into two separate GPU instances.
See details about MIG here.
MIG is not used on the other resource types.
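
To check whether your job was assigned a MIG instance, you can list the GPU devices visible to the job:

$ nvidia-smi -L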

4.5.5. Multi-Process Service (MPS)

Multi-Process Service (MPS) allows multiple CUDA processes to share a single GPU.
The processes run in parallel on the GPU, which improves utilization when a single process cannot saturate the GPU's compute resources.
MPS also enables concurrent execution, or overlapping, of kernel operations and memory copies from different processes to further enhance utilization.
See details about MPS here.

Info

TSUBAME4 provides a T4-specific nvidia-cuda-mps-control command to avoid system trouble.
Be sure to use this command. It becomes available by loading the following module:
module load cuda
Do not change the CUDA_MPS_PIPE_DIRECTORY variable, or it may damage other jobs.

Please be aware that if you do not follow these rules and cause damage to other users, we may take measures such as deleting your job without notice or suspending your TSUBAME account.
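
A minimal usage sketch inside a job (my_app is a placeholder; the control commands follow standard MPS usage, using the T4-provided command loaded above):

$ module load cuda
$ nvidia-cuda-mps-control -d              # start the MPS control daemon
$ ./my_app & ./my_app &                   # CUDA processes share the GPU via MPS
$ wait
$ echo quit | nvidia-cuda-mps-control     # stop the daemon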

4.5.6. GPU COMPUTE MODE

You can change the GPU compute mode only when using the resource type node_f in a batch job.
To change the GPU compute mode, specify node_f in the job script and additionally specify #$ -v GPU_COMPUTE_MODE=<MODE>.
The following three modes are available.

Mode Description
0 DEFAULT mode
Multiple contexts are allowed per device.
1 EXCLUSIVE_PROCESS mode
Only one context is allowed per device, usable from multiple threads at a time.
2 PROHIBITED mode
No contexts are allowed per device (no compute apps).

Info

If GPU_COMPUTE_MODE is not specified when node_f is specified, DEFAULT mode (0) is set.
Also, GPU_COMPUTE_MODE is fixed at DEFAULT mode (0) when anything other than node_f is specified.

Here is a sample job script.

#!/bin/sh
#$ -cwd
#$ -l node_f=1
#$ -l h_rt=1:00:00
#$ -N gpumode
#$ -v GPU_COMPUTE_MODE=1
/usr/bin/nvidia-smi

When using an interactive job, it can be specified as follows.

$ qrsh -g [TSUBAME group] -l node_f=1 -l h_rt=0:10:00 -pty yes -v TERM -v GPU_COMPUTE_MODE=1 /bin/bash

4.6. Use containers

In TSUBAME4.0, Apptainer (Singularity) is available as a container environment for HPC.

An example of how to use Apptainer is shown below.

4.6.1. Build image

An example of the Apptainer image creation process is shown below. This example uses the latest Ubuntu Docker image.

[Options]

  • -nv : Use the GPU
  • -B : Mount a file system
  • -s : Build the image in a sandbox (directory) format

$ mkdir $HOME/apptainer
$ cd $HOME/apptainer
$ apptainer build -s ubuntu/ docker://ubuntu:latest
INFO:    Starting build...
Getting image source signatures
Copying blob 49b384cc7b4a done
Copying config bf3dc08bfe done
Writing manifest to image destination
Storing signatures
2024/05/29 13:07:49  info unpack layer: sha256:49b384cc7b4aa0dfd16ff7817ad0ea04f1d0a8072e62114efcd99119f8ceb9ed
2024/05/29 13:07:50  warn xattr{etc/gshadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
2024/05/29 13:07:50  warn xattr{$HOME/apptainer/build-temp-2088960457/rootfs/etc/gshadow} destination filesystem does not support xattrs, further warnings will be suppressed
INFO:    Creating sandbox directory...
INFO:    Build complete: ubuntu/

$ cd ubuntu/
$ mkdir gs apps      # Create directories to mount /gs and /apps
$ cd ../

4.6.2. Shell start

The following is an example of how to start a shell with Apptainer. Here, the image created by Build image is used.

[Options]

  • -nv : Use the GPU
  • -B : Mount a file system
  • -f (--fakeroot) : Exercise root privileges in the container
  • -w (--writable) : Allow writing into the container

Info

The -w (--writable) option must be specified.
If not specified, we have confirmed that the shell will not start properly.

$ cd $HOME/apptainer
$ apptainer shell -B /gs -B /apps -B /home --nv -f -w ubuntu/
INFO:    User not listed in /etc/subuid, trying root-mapped namespace
INFO:    Using fakeroot command combined with root-mapped namespace
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /bin/nvidia-smi [files]: /usr/bin/nvidia-smi doesn't exist in container
WARNING: Skipping mount /bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
Apptainer> id
uid=0(root) gid=0(root) groups=0(root)
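
Because the shell has (fake) root privileges in a writable sandbox, you can modify the container, for example by installing packages inside the Ubuntu image (a sketch; the package name is illustrative):

Apptainer> apt-get update && apt-get install -y build-essential
Apptainer> exit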

4.6.2.1. Precautions when using fakeroot

When using the fakeroot function (--fakeroot) in Apptainer, the libc versions of the host and the container must match, because Apptainer bind-maps the host's fakeroot into the container.
If the host's libc library is newer than the corresponding library in the container, the fakeroot command may report that a GLIBC version is missing. (Reference: Fakeroot feature)
Example:

/.singularity.d/libs/faked: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /.singularity.d/libs/faked)
/.singularity.d/libs/faked: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /.singularity.d/libs/faked)
fakeroot: error while starting the `faked' daemon.
/.singularity.d/libs/fakeroot: 1: kill: Usage: kill [-s sigspec | -signum |-sigspec] [pid | job]... or
kill -l [exitstatus]

If the above error occurs, recreate the container so that the libc library versions of the host and container match.
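
To compare the glibc versions before rebuilding, you can check them on the host and in the container (a sketch using the sandbox image built above):

$ ldd --version | head -n1                          # host glibc
$ apptainer exec ubuntu/ ldd --version | head -n1   # container glibc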