General¶
What is TSUBAME4.0?¶
TSUBAME 4.0 is a supercomputer operated and managed by Center for Information Infrastructure (CII) of the Institute of Science Tokyo. TSUBAME 4.0 has a theoretical calculation performance of 952 PFlops (half precision) and is expected to be the largest supercomputer in Japan handling a wide range of workloads including big data and AI in addition to conventional High Performance Computing.
For what purpose can TSUBAME be used?¶
Use of TSUBAME is limited to education, research, clerical work, and social contribution purposes. It cannot be used for applications that directly lead to private financial interests, for example, mining virtual currency using blockchain technology.
Conditions for using TSUBAME4.0¶
An account is required to use TSUBAME4.0.
Account types vary depending on affiliation and system. See How to apply for a TSUBAME account.
How to get started with TSUBAME4.0¶
This section shows the flow up to setting up an environment for running programs.
There are 6 steps to start using TSUBAME4.0.
When steps 1 and 2 are done, login is enabled. To submit jobs, you also need to complete steps 3 - 5. If you need additional storage beyond your 25GiB home directory, do step 6.
- Getting an account
- SSH key pair generation and the public key registration
- Creation of a group (group administrator only)
- Addition of users to the group (group administrator and its members)
- TSUBAME point purchase (group administrator only)
- Setup of group disk (group administrator only)
How to write acknowledgments in a paper using TSUBAME?¶
Please refer to the following page for an example of how to write an acknowledgement. Please note that this is just an example, and you may adjust the description to match the description of other supercomputers or research funds.
Please mention TSUBAME usage in acknowledgement of publications
In addition, please submit reports on your use of TSUBAME, such as bibliographic information, through TSUBAME Portal to help us understand how TSUBAME is being used. Please refer to the following User's Guide for how to submit usage reports.
TSUBAME portal User's Guide Management of TSUBAME usage report
Differences between login node and compute node¶
The difference between login node and compute node is as follows.
Hardware
Item | Login node | Compute node |
---|---|---|
# of nodes | 2 | 240 |
CPU | AMD EPYC 7443 24-Core/2.85GHz x 2 | AMD EPYC 9654 96-Core/2.4GHz x 2 |
Memory | 256GiB | 768GiB |
GPU | None | NVIDIA H100 SXM5 × 4 |
Local storage | None | 1.92TB |
The login nodes are shared servers and are not intended for computation. Please avoid high-load processing such as program execution on the login nodes; run it on compute nodes through the job scheduler.
Please refer to TSUBAME4.0 User's Guide for details.
How many files are acceptable per directory?¶
As the number of files per directory increases, the processing time for metadata operations (file creation, deletion, and opening) on the files under the directory increases, or the file system may generate errors, making it impossible to create files.
Even when using a group disk, we recommend arranging files hierarchically with a target of less than 10,000 per directory.
In past cases of inquiries, we observed access delays caused by metadata operations under the condition of around 70,000 files per directory.
Example:
- NG: 00000.dat ~ 99999.dat
- If 100,000 files are placed flat in one directory, the load during file access will increase, causing performance degradation and failure.
- OK: 000/00000.dat ~ 000/00999.dat, 001/01000.dat ~ 001/01999.dat, …
- The hierarchical arrangement minimizes the cost of metadata operations by limiting the number of files per directory to about 1000.
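The layout in the OK example can be produced with a short shell loop. The following is a minimal sketch, assuming five-digit file names such as 00000.dat; the directory names `flat` and `tree` and the sample files are hypothetical, and the limit of 1000 files per directory follows the example above.

```shell
# Sketch: move files named NNNNN.dat from a flat directory into
# subdirectories of up to 1000 files each (000/, 001/, ...).
# "flat" and "tree" are hypothetical names inside a temporary work area.
work=$(mktemp -d)
flat="$work/flat"
tree="$work/tree"
mkdir -p "$flat" "$tree"

# Create a few sample files for demonstration only.
for n in 00000 00999 01000 01999 02500; do
    touch "$flat/$n.dat"
done

for f in "$flat"/*.dat; do
    name=$(basename "$f" .dat)   # e.g. 01999
    sub="0${name%???}"           # drop the last three digits: 01999 -> 001
    mkdir -p "$tree/$sub"
    mv "$f" "$tree/$sub/"
done

ls -R "$tree"
```

Dropping the last three digits of the name groups the files by thousands, so no directory holds more than 1000 files.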
I'm a beginner, I don't know what to do.¶
What to do depends on what you are a beginner at.
Beginners of UNIX/LINUX
To use TSUBAME 4.0, users are expected to be proficient in UNIX/Linux. The handbooks are written on this assumption.
If you do not understand the content of the handbooks, please read a UNIX/Linux beginner's book at the library and learn how to use UNIX shells and commands.
There are various publications on the operation of "terminal emulator" software. Please also check those that match the software you use.
First, please understand the operation of UNIX/Linux, then read our guidebooks, then check section 3.
Beginners of supercomputer
If you have used UNIX/Linux but never used a job scheduler, please read "Job Scheduler" of TSUBAME 4.0 User's Guide.
In addition, workshops on TSUBAME 4.0 are held regularly; please check the Lectures page.
Please also refer to "Introduction to TSUBAME (Linux basics)" and "TSUBAME4.0利用者ガイダンス" posted on Lectures.
Beginners of TSUBAME 4.0
If you have been using TSUBAME 3, please refer to "Migration from TSUBAME 3".
Please also refer to "TSUBAME4.0利用者ガイダンス" posted on Lectures.
Beginners of compile
If you have used UNIX/Linux but have never compiled from the command line, please read the pages on Intel compiler, NVIDIA HPC SDK, or AOCC.
In addition, if you use the Intel compiler, refer to "Parallel Programming" in Lectures.
Beginners of parallel programming(OpenMP,MPI)
Please refer to "Parallel Programming" posted on Lectures.
Beginners of GPU programming
Workshops on TSUBAME 4.0 are held regularly; please check the Lectures page.
Please also refer to "GPU Programming" and "GPU Programming hands-on" posted on Lectures.
Beginners of ISV application software
Please check the application software guide for each. In addition, workshops on TSUBAME 4.0 are held regularly; please check the Lectures page.
About file transfer¶
File transfer by rsync, scp, and sftp is available on TSUBAME4.0. As with login, you need to authenticate with the SSH private key that pairs with the SSH public key registered in the TSUBAME4.0 portal. Also, please check the settings of the application you are using carefully, as some applications may time out.
To install a file transfer application¶
If you are using MobaXterm or RLogin, it is easier to use the file transfer function built into that software.
If you are using other software such as PuTTY for connection, you need to install a file transfer application such as FileZilla or WinSCP that supports sftp and rsync protocols. In this case, as with login, you need to authenticate with the SSH private key that pairs with the SSH public key registered in the TSUBAME4.0 portal. For FileZilla and WinSCP, you can use the .ppk format key files that you usually use for PuTTY. For details on how to use each application, please refer to its manual.
If the optional feature "OpenSSH Client" in Windows 10/11 is enabled, you can use the scp and sftp commands from Command Prompt or PowerShell.
If you are using Linux/Mac/Cygwin (Windows) (rsync, scp, sftp commands)¶
In these environments, the rsync, scp, and sftp commands are available. Each of the three is described below.
rsync:
To transfer from the local to the remote host, execute the following command. If your key pair is stored under the standard path and file name, the -i option is not required.
$ rsync -av --progress -e "ssh -i <Private_Key_File> -l <Login_Name>" <Local_Directory> <Remote_Host>:<Remote_Directory>
$ rsync -av --progress -e "ssh -i ~/.ssh/ecdsa -l GSICUSER00" ./ login.t4.gsic.titech.ac.jp:/gs/bs/GSIC
For details such as how to specify the transfer source and transfer destination, please execute the following command.
$ man rsync
scp:
To transfer from the remote host to the local machine, execute the following command. To transfer in the other direction, swap the source and destination arguments. If your key pair is stored under the standard path and file name, the -i option is not required.
$ scp -i <Private_Key_File> -r <Login_Name>@<Remote_Host>:<Remote_Directory> <Local_Directory>
Please replace each < > with the value for your situation. For example, the command for the user with login name "GSICUSER00" to copy /gs/bs/GSIC of TSUBAME 4.0 to the current directory using the private key ~/.ssh/ecdsa is as follows.
$ scp -i ~/.ssh/ecdsa -r GSICUSER00@login.t4.gsic.titech.ac.jp:/gs/bs/GSIC .
For details such as how to specify the transfer source and transfer destination, please execute the following command.
$ man scp
sftp:
To transfer interactively, execute the following command.
If your key pair is stored under the standard path and file name, the -i option is not required.
$ sftp -i <Private_Key_File> <Login_Name>@<Remote_Host>
$ sftp -i ~/.ssh/ecdsa GSICUSER00@login.t4.gsic.titech.ac.jp
For details such as how to specify the transfer source and transfer destination, please execute the following command.
$ man sftp
To use CIFS access¶
In addition, CIFS access is available only from on-campus terminals.
The CIFS address is
\\gshs.t4.gsic.titech.ac.jp.
Refer to "CIFS access from inside campus" in TSUBAME4.0 User's Guide.
If the connection fails, please also see Can not establish CIFS connection to the group disk, Unable to open TSUBAME group disk on Windows.
I want to copy a large amount of data from/to TSUBAME¶
Please consider the following topics to improve the performance of data transfer between TSUBAME and external computers.
Pack the files to appropriate size¶
Large numbers of small files reduce transfer speed. Pack such files with the tar command into archives of about 1GB each.
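As an illustration of the packing step, here is a minimal sketch that bundles one directory of small files into a single archive. The directory name `results` and the sample files are hypothetical; in practice you would choose the directories so that each archive ends up at roughly the size suggested above.

```shell
# Sketch: pack a directory of many small files into one archive before transfer.
work=$(mktemp -d)
src="$work/results"            # hypothetical directory holding small files
mkdir -p "$src"
for i in 1 2 3; do
    echo "sample $i" > "$src/part$i.txt"
done

# Create the archive; -C makes the stored paths relative to $work.
tar -cf "$work/results.tar" -C "$work" results

# List the archive contents to confirm them before transferring the file.
tar -tf "$work/results.tar"
```

After the transfer, `tar -xf results.tar` on the receiving side restores the original files.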
Change transfer protocols¶
If you do not get enough speed with scp / sftp, consider using the rsync or CIFS (Science Tokyo users only) protocols.
For more details on the CIFS connection, please refer to the "CIFS access from inside campus" section of the TSUBAME4.0 User's Guide.
If the connection fails, please also see Can not establish CIFS connection to the group disk, Unable to open TSUBAME group disk on Windows.
Remove the bottleneck on the network route¶
- If you have old LAN cables (CAT-3 or CAT-5 (not CAT-5e)), switching hubs, or routers whose link speed is lower than 1000 Mbps, replace them with newer ones.
- When using a router (WiFi router, NAT router, broadband router, etc.), connect your computer to the external network (in Science Tokyo, IP address starting with 131.112 or 172.16-31) directly.
For details of the network at Science Tokyo, please contact the network administrator of the laboratory. If you are not sure, please contact the branch manager for each building or organization.
(Science Tokyo Users Only) Use the iMac terminal of Education Computer Systems¶
If it is difficult to change the network configuration, you can bring your HDD to the CII and connect it to the iMac terminal of the Education Computer Systems in the exercise room to transfer the data. Please check the opening hours.
Terminal room location and Opening Hours (in Japanese)
How to synchronize data between TSUBAME and PC¶
The advantage of the rsync command is that it transfers only the difference. If the transfer is interrupted for any reason, you can start again, or if you run it again after a certain period of time, you can transfer only those files that have changed their content. Data deleted from the source can also be deleted at the destination for complete synchronization.
An example command is shown below. It's a good idea to check the log or run it multiple times, in case the command fails along the way.
Synchronize data from your local PC to TSUBAME.
rsync -auv (source directory) (your login name)@login.t4.gsic.titech.ac.jp:(full path of the destination directory)
Synchronize data from TSUBAME to your local PC.
rsync -auv (your login name)@login.t4.gsic.titech.ac.jp:(full path of the source directory) (destination directory)
How to terminate the programs executed accidentally¶
If you accidentally started a program on a login node, where program execution is prohibited, terminate it according to the following procedure.
See "How to terminate the job submitted to the batch job scheduler" for the deletion of the jobs submitted to the batch job scheduler.
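On a typical Linux system, a stray process can be located with `ps` and stopped with `kill`. The following is a minimal sketch under that assumption, not the official TSUBAME procedure; a background `sleep` stands in for the program that was started by mistake.

```shell
# A background "sleep" stands in for the program started by mistake.
sleep 600 &
pid=$!

# List your own processes to find the PID of the program.
ps -u "$(id -u)" -o pid,etime,args

# Ask the process to terminate (SIGTERM), then wait for it to exit.
kill "$pid"
wait "$pid" 2>/dev/null

# If a process ignores SIGTERM, "kill -KILL <PID>" forces termination.
```

In real use you would read the PID from the `ps` output rather than from `$!`.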
To allow other members to read and write on a group disk¶
Warning
This article is about group disks (/gs/bs, /gs/fs). Do not run the following samples in your home directory.
Users are not allowed to change the owner of their files. Therefore, please change the group permissions so that group members can read and write them. The key points are:
- Change permissions for all files and directories below the directory, not just the top-level directory.
- Add write (w) as well as read (r) permission to files. Without write (w) permission, the files cannot be erased later.
- Directories need not only read (r) but also write (w) and execute (x) permissions. A directory cannot be accessed without execute (x) permission.
Some example commands are shown below. Depending on the original permissions of the file, some errors may occur, in which case, try re-running the command until the output no longer changes.
Find your own directories under /gs/bs/tgX-XXXXXX/ and make them readable and writable by group members.
find /gs/bs/tgX-XXXXXX/ -type d -user $USER ! -perm -2770 -print0 | xargs -r0 chmod -v ug+rwx,g+s
Find your own files under /gs/bs/tgX-XXXXXX/ and make them readable and writable by group members.
find /gs/bs/tgX-XXXXXX/ -type f -user $USER ! -perm -660 -print0 | xargs -r0 chmod -v ug+rw
Find your own files under /gs/bs/tgX-XXXXXX/ and match the ownership group to the TSUBAME group.
find /gs/bs/tgX-XXXXXX/ -user $USER ! -group (TSUBAME group name) -print0 | xargs -r0 chgrp -v (TSUBAME group name)
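To confirm the result, the same `find` conditions can be reused to list anything that still lacks the group permissions. The sketch below runs against a temporary directory standing in for /gs/bs/tgX-XXXXXX/ (the directory and file names are hypothetical), and uses `id -u` in place of `$USER` only so that it also runs where `USER` is not set.

```shell
# Sketch: apply the permission changes, then list what is still not shared.
top=$(mktemp -d)                      # stands in for /gs/bs/tgX-XXXXXX/
mkdir -p "$top/data"
touch "$top/data/a.dat"
chmod 700 "$top/data"                 # start from owner-only permissions
chmod 600 "$top/data/a.dat"

# Same conditions as the commands above.
find "$top" -type d -user "$(id -u)" ! -perm -2770 -print0 | xargs -r0 chmod ug+rwx,g+s
find "$top" -type f -user "$(id -u)" ! -perm -660 -print0 | xargs -r0 chmod ug+rw

# Anything listed here still lacks the required group permissions.
remaining=$( { find "$top" -type d ! -perm -2770; find "$top" -type f ! -perm -660; } )
echo "remaining: ${remaining:-none}"
```

An empty list means every directory is group-accessible with the setgid bit set and every file is group-readable and group-writable.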
The basic configuration of Module file¶
The basic configuration of Module file is described below.
- Module file listed as [Application name]/[Version].
- If you do not specify a version, the preset default version is loaded, even when multiple versions exist.
$ module avail cuda
---------------------
cuda/12.0.0 cuda/12.1.0 cuda/12.3.2
$ module load cuda
$ module list
Currently Loaded Modulefiles:
1) cuda/12.3.2
- For applications with dependencies such as MPI, they can be used by loading them in advance.
$ module load openmpi/5.0.2-intel
Loading openmpi/5.0.2-intel
Loading requirement: intel/2024.0.2 cuda/12.3.2
CPU / GPU allocation at resource designation in Altair Grid Engine(AGE)¶
AGE assigns virtual CPUID / GPUID according to the specified number of resources except node_f.
- In case of CPU
Taking as examples cpu_8, a resource type that reserves only one CPU, and cpu_4, a resource type that reserves 4 CPUs: when s_core=7 is specified, seven nodes are allocated with 1 core on each node; when q_core=7 is specified, seven nodes are allocated with 4 cores on each node.
- In case of GPU
In the case of gpu_1, a resource type that reserves only one GPU, when s_gpu=4 is specified, 4 nodes are reserved and the GPU of each node is virtually assigned as GPU 0. Securing 4 GPUs does not mean that they are numbered GPU 0, 1, 2, 3.
With node_h, a resource type that reserves 2 GPUs, both GPUs are allocated within one node and are assigned as GPU 0 and 1.
How to create an SSH key pair in Linux/Mac/Windows(Cygwin/OpenSSH)¶
Warning
If your SSH private key is leaked, your account may be misused by a third party. Please protect your private key by setting a passphrase.
Info
The TSUBAME4.0 login nodes do not accept connections with an RSA key (SHA-1). An ecdsa or ed25519 key is recommended.
To create an SSH key pair on Linux / Mac / Windows (Cygwin or OpenSSH), proceed as follows.
See man ssh-keygen for the differences between key types.
Which key types are supported depends on the OpenSSH version.
ecdsa key type:¶
$ ssh-keygen -t ecdsa
ed25519 key type:¶
$ ssh-keygen -t ed25519
When you execute one of the above commands, you will be asked for the save location as follows.
Unless there is a special reason to avoid it, such as the same file name already being used for another purpose, just press the Enter key to use the default value.
(If you already use an SSH key pair for other sites, you can reuse the same file for TSUBAME.)
Generating public/private <keytype> key pair.
Enter file in which to save the key ($HOME/.ssh/id_keytype): (No need to type a file name) [Enter]
Then you will be prompted for a passphrase, so enter it.
Enter passphrase (empty for no passphrase): (Set passphrase; What you type will not appear in screen) [Enter]
Re-enter your passphrase for confirmation.
Enter same passphrase again: (Enter the same passphrase again for confirmation; What you type will not appear in screen) [Enter]
A key pair is created and saved to two files. The upper line shows the location of private key, and the lower line shows that of public key. Register the public key via TSUBAME portal.
Your identification has been saved in $HOME/.ssh/id_keytype
Your public key has been saved in $HOME/.ssh/id_keytype.pub
The key fingerprint is:
SHA256:(random string) username@hostname
The key's randomart image is:
(Some text specific to the generated key pair will be shown)
Check the file with the following command.
$ ls -al ~/.ssh/
drwx------ 2 <user> <group> 512 Oct 6 10:50 .
drwx------ 31 <user> <group> 4096 Oct 6 10:41 ..
-rw------- 1 <user> <group> 411 Oct 6 10:50 <private_key>
-rw-r--r-- 1 <user> <group> 97 Oct 6 10:50 <public_key>
If the permissions are incorrect, correct them with the following command.
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/<private_key>
How to create an SSH key pair using PuTTY or MobaXterm¶
Warning
If your SSH private key is leaked, your account may be misused by a third party. Please protect your private key by setting a passphrase.
Info
The TSUBAME4.0 login nodes do not accept connections with an RSA key (SHA-1). An ecdsa or ed25519 key is recommended.
This article describes how to create an SSH key pair for TSUBAME4.0 using PuTTYgen, which is installed together with PuTTY. MobaKeyGen from MobaXterm has the same functionality and UI.
You will get a dialog similar to this by executing PuTTYgen:
- Select the key type in "Type of key to generate".
ECDSA or EdDSA format is recommended. With other formats, you may not be able to connect to TSUBAME4.0.
- Press "Generate" to create an SSH key pair.
You can adjust the key pair configuration using the "Parameters" box, but you don't have to do so in most cases.
- Enter your passphrase in the “Key passphrase” and “Confirm passphrase” fields to prevent others from using TSUBAME4.0 without permission.
The specified passphrase will be used when you log in.
- Press "Save private key" to save the private key file of the generated key pair for future logins.
After the public key is registered with TSUBAME4.0, anyone who can read this private key file can log in to TSUBAME4.0 with your account.
Please keep this file safe: DO NOT carry it on a USB stick or send it via e-mail. Private keys must not be shared with others.
- Copy the string (public key) shown in “Public key for pasting...”.
The public key is registered in the TSUBAME4.0 portal. Log in to the TSUBAME4.0 portal and add the string you copied according to the procedure of SSH public key registration.
Info
In the image, the public key looks like three lines, but it is actually one line. Do not insert line breaks.
The private and public keys are a pair. Note that each time you click the Generate button, the pair is re-generated.
Warning
Never choose "SSH-1 (RSA)" in Parameters.
Please refer to I want to know how to login to TSUBAME4.0 using PuTTY for the login procedure to TSUBAME4.0 using PuTTY.
Please refer to the manual for information on how to use PuTTY.
I want to know how to login to TSUBAME4.0 using PuTTY¶
- Start PuTTY. "PuTTY Configuration" will start.
- Select "Session" in the left pane and enter the host name of the login node in the red frame. Please enter login.t4.gsic.titech.ac.jp.
- Select “Connection” - “SSH” - “Auth” - “Credentials” in the left pane and specify the “private key” in the red frame.
- If you want to use X forwarding using interactive nodes, select “Connection” - “SSH” - “X11” in the left pane and check the red box.
- Select “Session” in the left pane.
Give an arbitrary name to the (1) section and press “Save” (2) to save the settings made in 2. through 4.
To login to the login node, select the saved settings (3) and press “Open” (4).
- “Security Alert” will be displayed only when connecting for the first time. Check the contents and if there are no problems, press “Accept”.
- The login screen will appear.
In “login as,” enter your TSUBAME login name (beginning with “u”).
In “Passphrase for key,” enter the passphrase you specified when creating the key pair.
About common errors in Linux¶
Here we have a FAQ on Linux common errors.
For details on how to use the described command, please check with the man command etc.
No such file or directory¶
The required file or directory does not exist.
It occurs when a nonexistent file or directory name is specified, for example because of a typo or an incorrect path.
Also, depending on the application, it may occur when the line ending is CR + LF, as on Windows.
Measures
Please review the file and directory name carefully. Also, please check FAQ "The job status is "Eqw" and it is not executed." about the newline character.
There are related errors as follows.
error while loading shared libraries: ****.so: cannot open shared object file: No such file or directory
Measures
Please check with the ldd command. Possible remedies include setting the environment variable LD_LIBRARY_PATH or explicitly specifying the library at compile time.
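The check and the LD_LIBRARY_PATH remedy can be sketched as follows. This is only an illustration: `/bin/ls` stands in for your own program, and `$HOME/mylib/lib` is a hypothetical directory that would hold the missing library.

```shell
# Sketch: look for unresolved shared libraries, then extend the search path.
prog=/bin/ls                       # stand-in; replace with your own program

# Lines containing "not found" name libraries the loader cannot locate.
ldd "$prog" | grep "not found" || echo "no missing libraries reported for $prog"

# Prepend the directory that holds the missing library (hypothetical path),
# keeping any value that is already set.
libdir="$HOME/mylib/lib"
export LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```

Running `ldd` again after setting the variable shows whether the library is now resolved.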
command not found¶
The command you entered does not exist.
Depending on the type of program you wish to run, perform the following checks.
- You may not have purchased the application. Please refer to the FAQ below.
- You may not have executed the module command. Please load the necessary modules.
- Program you've installed yourself
The environment variable PATH may not be set correctly.
Use the "echo $PATH" command to check whether the directory that contains the command you need is included in PATH, and add it if it is not.
Example of adding the "hoge" directory directly under the home directory ($HOME) to the existing environment variable "PATH"
$ export PATH=~/hoge:$PATH
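Whether the change took effect can be checked with `command -v`, which prints the path the shell resolves for a command. The sketch below is self-contained: a temporary directory stands in for a directory such as ~/hoge, and `mytool` is a hypothetical command created only for the demonstration.

```shell
# Sketch: confirm that a self-installed command is found after updating PATH.
bindir=$(mktemp -d)                # stands in for a directory such as ~/hoge
printf '#!/bin/sh\necho hello from mytool\n' > "$bindir/mytool"   # hypothetical command
chmod +x "$bindir/mytool"

export PATH="$bindir:$PATH"

# command -v prints the full path if the shell can now find the command.
command -v mytool
mytool
```

If `command -v` prints nothing, the directory is still missing from PATH or the file is not executable.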
Permission denied¶
You are not authorized to perform the operation you attempted to perform.
In Linux, user and group permissions are set per file and per directory.
Check the permissions of the file or directory you want to read, write, or execute with the following command.
(Example: checking the file hoge)
$ ls -l hoge
Measures
If you are trying to create files in system directories such as /etc or /lib, please create them in your user directory instead.
If it occurs in a user directory such as a group disk, check the permissions and correct them.
Disk quota exceeded¶
Please check the FAQ How to solve "Disk quota exceeded" error.
Out Of Memory¶
This error occurs when memory runs out.
Measures
Change the resource type to one with more memory capacity.
Reduce the memory usage per node by distributing the work across nodes with MPI etc.
Related FAQ "Check the detail of an error message printed the log file"
Related URL¶
- The error when executing the qrsh command
- Check the detail of an error message printed the log file
- "Warning: Permanently added ECDSA host key for IP address 'XXX.XXX.XXX.XXX' to the list of known hosts." in the error log
- Errors and remedies of qsub command execution
- The range of support by T4 Helpdesk about the program error such as segmentation fault
- Error handling for each ISV application
How to solve "Disk quota exceeded" error¶
This message indicates that there is no space left in either a home directory or a group disk.
When you face it, you should delete unused files or purchase an additional group disk to keep enough free disk space.
The following command can be used to check disk usage for all directories, including hidden directories.
cd $HOME
du -h --max-depth=1 | sort -hr
Please note that temporary files are generated at the home directory in some cases, and an application sometimes needs over 25 GB of a disk space for creating temporary files. (25 GB is the capacity of a home directory)
I want to change the directory where cache files, user files, etc. used by the application are stored.
To avoid running out of disk space, we recommend using the local scratch area or the shared scratch area, not the home directory, as the location for temporary files.
Related FAQ: How to check TSUBAME points, group disk usage, home directory usage
Session suddenly disconnected while working on TSUBAME4.0¶
For TSUBAME4.0, session timeout is set for security measures.
Sessions without any input for a certain period of time are disconnected.
Even if a GUI application is running and being operated, the session is disconnected if there is no input to the terminal.
If you want to avoid this, please set keep alive on the terminal side.
Please check the user guide of the terminal for keep alive setting.
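For the OpenSSH client, keep-alive is controlled by the `ServerAliveInterval` option. The following is a minimal sketch: it writes a sample client configuration to a temporary file so that it can be inspected safely. On a real PC the same lines would go into ~/.ssh/config. The 60-second interval is an assumed value, not a TSUBAME recommendation.

```shell
# Sketch: a sample OpenSSH client configuration with keep-alive enabled.
# Written to a temporary file here; the real location is ~/.ssh/config.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# Send a keep-alive message every 60 seconds (assumed interval)
Host login.t4.gsic.titech.ac.jp
    ServerAliveInterval 60
EOF
cat "$cfg"
```

The same option can be given for a single connection with `ssh -o ServerAliveInterval=60`. Other terminal software has its own equivalent setting.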
Available SSH client on Windows¶
The following SSH clients on Windows are available to connect to TSUBAME.
OpenSSH client (Windows 10/11 functionality)¶
OpenSSH client can be installed via [Apps]-[Manage optional features] section in Settings app.
ssh, ssh-keygen, and other commands (same as on Linux) are available after the installation.
PuTTY¶
PuTTY is a free SSH Client software. Please refer to this article to generate the SSH key.
MobaXterm¶
This software includes an SSH client and an X11 server.
Most X11 applications on TSUBAME seem to work fine.
Please refer to this article to generate the SSH key.
Windows Subsystem for Linux (WSL)¶
A Linux environment can be set up on Windows by downloading a Linux distribution (such as Ubuntu or openSUSE) from the Windows 10/11 store.
ssh, ssh-keygen commands are available from that.
Cygwin¶
Cygwin provides a pseudo-Linux environment on Windows.
ssh, ssh-keygen commands are available from that. We strongly recommend using other software.
The combination of compiler and mpi module¶
OpenMPI can be used in combination with the GNU, Intel oneAPI, and NVIDIA HPC SDK compilers.
gcc is provided by the OS. Please check the available version with the following command.
$ gcc --version
$ module avail
Info
Please note that if you use any other OpenMPI than the following OpenMPI provided with TSUBAME4.0, the operation is not guaranteed and not supported.
1. Intel OpenMPI
$ module load openmpi/5.0.2-intel
Loading openmpi/5.0.2-intel
Loading requirement: intel/2024.0.2 cuda/12.3.2
$ mpicc -v
Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /apps/t4/rhel9/isv/intel/compiler/2024.0/bin/compiler
Configuration file: /apps/t4/rhel9/isv/intel/compiler/2024.0/bin/compiler/../icx.cfg
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/11
Selected GCC installation: /usr/lib/gcc/x86_64-redhat-linux/11
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
Found CUDA installation: /apps/t4/rhel9/cuda/12.3.2, version
2. GNU OpenMPI
$ module load openmpi/5.0.2-gcc
Loading openmpi/5.0.2-gcc
Loading requirement: cuda/12.3.2
$ mpicc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-host-pie --enable-host-bind-now --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-plugin --enable-initfini-array --without-isl --enable-multilib --with-linker-hash-style=gnu --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_64=x86-64-v2 --with-arch_32=x86-64 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.1 20230605 (Red Hat 11.4.1-2) (GCC)
3. NVIDIA HPC SDK OpenMPI
$ module load openmpi/5.0.2-nvhpc
Loading openmpi/5.0.2-nvhpc
Loading requirement: nvhpc/24.1
$ mpicc -v
Export NVCOMPILER=/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/24.1
Export PGI=/apps/t4/rhel9/isv/nvidia/hpc_sdk
nvc-Warning-No files to process
When trying to save a file with Emacs during an interactive job, the screen froze¶
This is caused by flow control triggered by specific input characters, which is enabled in the default terminal settings.
Flow control is a function that temporarily suspends a data transfer to prevent overflow on the receiving side, for example when the transmission speed exceeds the speed at which the receiver can accept data. In general, the control character Ctrl+S suspends the transfer and Ctrl+Q resumes it.
Saving a file in Emacs involves entering Ctrl+S, which is also a flow control character, so output is suspended and the screen appears frozen. To recover, enter Ctrl+Q.
To disable flow control, execute the following command before starting the interactive job.
$ stty -ixon
If you want to always disable flow control, add the above command to .bashrc in your home directory.
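Because .bashrc can also be read by sessions that have no terminal attached, where `stty` would report an error, it may be safer to guard the command. The following is a minimal sketch of such a guard; the function name `disable_flow_control` is hypothetical.

```shell
# Sketch for ~/.bashrc: disable flow control only when standard input
# is a terminal, so that sessions without a terminal are left untouched.
disable_flow_control() {
    if [ -t 0 ]; then
        stty -ixon
    fi
}
disable_flow_control
```

With the guard in place, the function does nothing and prints nothing when standard input is not a terminal.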
Cannot login to TSUBAME4.0 (ssh, Permission denied (publickey,hostbased) etc.)¶
Please go through the following checklist before contacting us for help.
1. Is your account name correct?
Please confirm you are using TSUBAME4.0 account.
- TSUBAME4.0 account is different from TSUBAME3 account. We are receiving an increasing number of inquiries using TSUBAME3 accounts.
- Your TSUBAME4.0 account is different from your student number or faculty and staff number.
- Do you have a TSUBAME4.0 account? For information on obtaining an account, please refer to How to obtain an account.
2. Did you register a public key in the correct format?
Please confirm that you have registered a public key in OpenSSH format in the TSUBAME portal.
You cannot log in to TSUBAME4.0 if you registered a public key in PuTTY format.
Please refer to the following links for how to create a key pair.
Please refer to the following for how to register the public key.
Info
The TSUBAME4.0 login nodes do not accept connections with an RSA key (SHA-1). An ecdsa or ed25519 key is recommended.
3. Is the command you entered correct? (for Linux / Mac / Windows (Cygwin))
Please make sure your login name and the path to your private key file are specified correctly on the command line.
$ ssh <TSUBAME4 account>@login.t4.gsic.titech.ac.jp -i <private key>
Example) When your login name is gsic_user and the location of the private key is ~/.ssh/t4-key,
$ ssh gsic_user@login.t4.gsic.titech.ac.jp -i ~/.ssh/t4-key
Tips
If your private key file is stored in one of the following locations under your home directory (that is, you have not changed the location from the default value), you can omit the -i option.
.ssh/id_rsa, .ssh/id_dsa, .ssh/id_ecdsa, .ssh/id_ed25519
Info
The TSUBAME4.0 login nodes do not accept connections with an RSA key (SHA-1). An ecdsa or ed25519 key is recommended.
Please refer to the following man command for options of ssh command.
$ man ssh
4. Does the symptom reproduce in another terminal environment?
There are various types of terminal software for Windows. Please check whether the problem also occurs with other terminal software.
If it does not, the problem may be specific to the software. Please understand that we cannot respond to inquiries in that case.
5. Do the symptoms reproduce on a different network?
There may be a problem with the network from which you are accessing the site and you may not be able to connect.
If there are multiple access routes, change the access source and see if the problem reproduces itself.
- On campus/off campus
- Lab/other labs
- Home/public WiFi
If the connection status changes when the network is changed, there may be a problem with the router, firewall, or other settings. Please check the settings with your administrator.
Also, in the case of Windows, security software may be blocking communication. Please temporarily turn off your security software and see if you can connect. Please refer to the manual of your security software for the workaround.
If none of the above resolves the problem, please contact us with the following information.
- Operating system (Windows10, Debian12, macOS 14.4.1 and so on)
- Terminal software and its version (Cygwin, PuTTY, Rlogin and so on)
For details on how to check the version, please refer to the terminal software manual.
For Linux / Mac OS, please send the SSH version. You can check it with the following command.
$ ssh -V
- The operation you tried. If you get an error, please send the details.
For Linux / Mac OS, please send the output of ssh command with -v option (debug mode) including the command line itself.
Example) When the account name is gsic_user and the location of the private key is ~/.ssh/t4-key,
$ ssh gsic_user@login.t4.gsic.titech.ac.jp -i ~/.ssh/t4-key -v
Can not establish CIFS connection to the group disk, Unable to open TSUBAME group disk on Windows¶
Access to the TSUBAME group disk using CIFS is only available within the university. It cannot be accessed from off-campus.
Even within the campus network, CIFS may be blocked by routers or other devices along the route (for example, in a laboratory), in which case it cannot be used.
Please check the default settings of routers in general, as communication on TCP/UDP port 445 is often blocked by default.
Info
Communication on TCP/UDP port 445 between branch lines in the campus network is blocked. However, because an exception is made for the TSUBAME4 network, this communication is not blocked from the branch line network (building switch) onward.
If the port is not blocked, the CIFS server itself may be unreachable. Please check network communication to the CIFS server.
Please check with ping from a command prompt or similar.
C:\> ping gshs.t4.gsic.titech.ac.jp
To access the group disk from Windows, you need to set a TSUBAME password. Please configure it from the TSUBAME portal. For the setting method, please see here. If the message "Password is incorrect" or "Password has expired" is displayed, please reset your TSUBAME password.
FAQ about group disk¶
About group disk¶
The Group disk is the high-speed storage area (SSD) and large-capacity storage area (HDD) described in the "Usage Guide". This is a shared storage that allows each group to use the capacity set on the TSUBAME portal.
Usage period: In one-month increments until the end of the fiscal year (end of March) including the month of purchase.
Point and inodes per purchase unit
Type | Purchase unit | Point | inode |
---|---|---|---|
large-capacity storage area (HDD) | 1TB | 0.5 | 2,000,000 |
high-speed storage area (SSD) | 100GB | 0.2 | 200,000 |
How to set: you can set it from the TSUBAME Portal.
Reference: TSUBAME Portal User's Guide "10. Management of Group Disk"
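The table can be turned into a quick estimate. The following sketch assumes that points and inodes scale linearly with the number of purchase units, which the table suggests but does not state. `group_disk_quote` is a hypothetical helper name, and the actual figures should be confirmed on the TSUBAME Portal.

```shell
# Hypothetical helper: estimate points and inodes from the table above.
# $1 = storage type (hdd: 1TB units, ssd: 100GB units), $2 = number of purchase units
group_disk_quote() {
    case "$1" in
        hdd) awk -v n="$2" 'BEGIN { printf "%.1f points, %d inodes\n", n * 0.5, n * 2000000 }' ;;
        ssd) awk -v n="$2" 'BEGIN { printf "%.1f points, %d inodes\n", n * 0.2, n * 200000 }' ;;
        *)   echo "usage: group_disk_quote hdd|ssd UNITS" >&2; return 1 ;;
    esac
}

group_disk_quote hdd 50   # 50 units of 1TB
group_disk_quote ssd 3    # 3 units of 100GB
```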
What is the group disk grace period ?¶
Group disks are reset once at the end of the fiscal year, and all group disks enter a grace state in which data can only be read or deleted. This period is called the grace period, and it usually lasts until around the middle of April.
Reference:TSUBAME Portal User's Guide
If data from the previous fiscal year remains when you purchase capacity after the grace period, the minimum capacity you can purchase depends on the remaining usage.
For example, suppose you purchased 50TB in the previous year and used 45TB.
1) If you delete 45TB during the grace period, so that the used capacity is 0:
You can purchase from 1TB, which is the minimum capacity.
2) If you delete 25TB during the grace period, so that the used capacity is 20TB:
You can purchase from over 20TB.
3) If you delete nothing during the grace period (the used capacity remains 45TB):
You can purchase from over 45TB.
If you do not need the previous year's data, please delete it during the grace period.
Related FAQ¶
- Checking the usage of group disks with command
- Can not establish CIFS connection to the group disk
- "Disk quota exceeded" error is output
About the IP address of the gateway server for compute nodes(connection to license servers outside TSUBAME, etc.)¶
The IP address range of the compute node gateway server is as follows.
131.112.133.241, 131.112.133.242
When computing on TSUBAME with a campus or university license server, please configure the server to permit communication from the above addresses.
Please keep in mind that the above addresses may be changed without notice for operational reasons.
If your software requires communication with a license server outside of TSUBAME (e.g., in a laboratory), please confirm that you can communicate with the license server from a network outside of both TSUBAME and the license server before contacting us with the following information.
- Global IP address of the license server
- Port number of the license server (or all ports if there are more than one)
- IP address of the host where the communication test was performed
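One way to run the communication test from a Linux host is to attempt a TCP connection to the license server port. This is a sketch only, assuming that bash (for its `/dev/tcp` redirection) and the coreutils `timeout` command are available on the test host. `check_port` is a hypothetical helper, it tests TCP only, and the host name and port in the usage note are placeholders.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# $1 = host, $2 = TCP port, $3 = timeout in seconds (default 5)
check_port() {
    # bash opens a TCP connection when a redirection targets /dev/tcp/HOST/PORT.
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

For example, `check_port license.example.org 27000 && echo reachable || echo unreachable` reports whether the placeholder server accepts connections on the placeholder port.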
An error such as "fork: Resource temporarily unavailable" is displayed on the login node.¶
The login node has a limit of 50 processes per user. If you create more processes than this limit allows, you will get an error like this. For more information, please refer to Please refrain from occupying the CPU in the login nodes.
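To see roughly how many processes you currently own, you can count them with ps. This is a minimal sketch, assuming a ps that supports the POSIX `-u` and `-o` options. `count_my_procs` is a hypothetical helper, and because how the system counts toward the limit is not described here, the number should be treated as an estimate.

```shell
# Hypothetical helper: count the processes owned by the current user.
count_my_procs() {
    # Print only the PID column (no header) for the current user, then count the lines.
    # The count includes the short-lived processes started by this pipeline itself.
    ps -u "$(id -un)" -o pid= | wc -l | tr -d ' '
}

n=$(count_my_procs)
echo "processes owned by $(id -un): $n"
```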
Can Docker be used with TSUBAME4.0?¶
Can Singularity be used with TSUBAME4.0?¶
Can container be used with TSUBAME4.0?¶
Docker cannot be used with TSUBAME4.0. Apptainer (Singularity) is available.
For more information, see Use containers.
Can I use Jupyter Lab with TSUBAME4.0?¶
Jupyter Lab is available in TSUBAME4.0. See Open OnDemand User's Guide.
The group disk suddenly became unusable.¶
Because group disk capacity is allocated on a monthly basis, the amount in use may exceed the allocated size when a new month begins.
If this situation continues, all access to the affected group disk will be prohibited at a specific time.
- If you wish to check the usage status of the group disk, please refer to Confirmation of group disk usage.
- For information on what to do when the amount of group disk usage is exceeded, please refer to What to do when the group disk usage exceeds the reserved size.
Please also refer to FAQ about group disk.
I would like to know how to utilize the GPU in TSUBAME4.0.¶
The following documents, which are available on the Lectures page, may be helpful.
- How to make the most of TSUBAME4's GPU
I want to use a debugger/profiler that supports multi-threading/multi-processing¶
Linaro Forge (formerly Arm Forge) is available.
Please refer to "Parallel Programming" on the Lectures page for information on how to use it.
OpenOnDemand or Jupyter fails to start¶
If you experience any of the following problems when using OpenOnDemand (TSUBAME Desktop) or Jupyter, your configuration file may be corrupted.
- Application does not start
- Application terminates abnormally after startup
Delete the following directories
Application | Directory path |
---|---|
OpenOnDemand(TSUBAME Desktop) | ~/ondemand |
Jupyter | ~/.jupyter |
Info
These directories are automatically created at the time of use. Deletion of these directories usually has no effect.
Problems that occur after you edit files in these directories are not covered by our inquiry support.
The procedure for deleting a directory on OpenOnDemand is as follows.
1. Log in to OpenOnDemand. For instructions, see Login to Open OnDemand.
2. Click Files - Home Directory.
3. Check the Show Dotfiles checkbox.
4. Select the directory to be deleted.
4-1. For Jupyter
Click ⋮ next to the .jupyter directory (a directory beginning with .) and click Delete.
4-2. For OpenOnDemand (TSUBAME Desktop)
Click ⋮ next to the ondemand directory and click Delete.
5. A confirmation dialog box will appear. Confirm that it is the directory to be deleted and click OK.
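If you can still log in over SSH, the same cleanup can be done from a terminal. The sketch below renames the directory instead of deleting it so that the old settings can be inspected later. This is an assumption rather than the documented procedure, based on the note that the directories are recreated automatically at the time of use. `set_aside` is a hypothetical helper name.

```shell
# Hypothetical helper: rename a configuration directory instead of deleting it.
# $1 = directory to set aside, e.g. "$HOME/.jupyter" or "$HOME/ondemand"
set_aside() {
    local dir="$1" backup
    if [ ! -d "$dir" ]; then
        echo "$dir does not exist; nothing to do" >&2
        return 0
    fi
    # Append a timestamp so that repeated runs do not overwrite earlier backups.
    backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" && echo "$backup"
}
```

For example, `set_aside "$HOME/.jupyter"` prints the name of the backup directory, which can be removed once the application starts normally again.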
GLIBC not found error when using Apptainer¶
When using the fakeroot function (--fakeroot) in Apptainer, the libc version of the host and that of the container must match.
If they do not match, you may get the following error.
/.singularity.d/libs/faked: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /.singularity.d/libs/faked)
/.singularity.d/libs/faked: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /.singularity.d/libs/faked)
fakeroot: error while starting the `faked' daemon.
/.singularity.d/libs/fakeroot: 1: kill: Usage: kill [-s sigspec | -signum |-sigspec] [pid | job]... or
kill -l [exitstatus]
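To compare the two libc versions, one option is to read the first line of `ldd --version` on the host and inside the container. This is a sketch under the assumption that both sides use glibc, whose `ldd --version` ends its first line with the version number. `glibc_version_from_ldd` is a hypothetical helper and `image.sif` in the usage note is a placeholder image name.

```shell
# Hypothetical helper: extract the version number (e.g. "2.34") from the first
# line of `ldd --version` output read on standard input.
glibc_version_from_ldd() {
    sed -n '1s/.* \([0-9][0-9]*\.[0-9][0-9]*\)$/\1/p'
}
```

On the host, run `ldd --version | glibc_version_from_ldd`. For the container, run `apptainer exec image.sif ldd --version | glibc_version_from_ldd`. If the two numbers differ, the error above is a likely outcome when --fakeroot is used.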
I want to use VS Code.¶
Access to login nodes using VS Code is prohibited because it places a heavy load on the login nodes.
Please refrain from occupying the CPU in the login nodes.
In TSUBAME4.0, you can instead connect to compute nodes via Open OnDemand using code server.
Please refer to Open OnDemand User's Guide for details on how to use it.
Notes on using MPS (Multi-Process Service) function in TSUBAME4.0¶
TSUBAME4 provides its own (T4 original) nvidia-cuda-mps-control to avoid system trouble.
Be sure to use this command. It becomes available after executing the following module load.
If you use MPS, run:
module load cuda
Please be aware that if you do not follow these rules and cause damage to other users, we may take measures such as deleting your job without notice or suspending your TSUBAME account.
I want to change the directory where cache files, user files, etc. used by the application are stored¶
If you want to change the directory where cache files, user files, etc. used by the application are stored, please consider the following procedure.
Info
If the used capacity of the group disk exceeds the set capacity, access will be disabled.
Because the capacity is set on a monthly basis, please be careful not to exceed the set capacity of the group disk.
In addition, the procedures for changing these settings and any problems that result from changing them are not covered by our inquiry support.
- Use the settings provided by the application
Some applications allow the storage location of cache files, user files, etc. to be changed.
The method of changing these settings (environment variables, etc.) differs depending on the application. Please refer to the man page or other documentation.
- Replace the relevant directory with a symbolic link
If the application does not provide such a setting, you can move the directory to the group disk and replace it with a symbolic link.
Note that this procedure does not guarantee normal execution by all applications.
Here is an example of moving the ${HOME}/.cache directory to /gs/bs/tga-xxxxx/tsubametarou.
cd ${HOME}
mv -i .cache /gs/bs/tga-xxxxx/tsubametarou # Move the current .cache directory to the group disk
ln -nfs /gs/bs/tga-xxxxx/tsubametarou/.cache ${HOME}/.cache # Create a symbolic link at ${HOME}/.cache
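The same move-and-link steps can be wrapped with a few safety checks. The following is a sketch, not an official procedure: `relocate_dir` is a hypothetical helper that refuses to overwrite an existing destination and does nothing if the source is already a symbolic link.

```shell
# Hypothetical helper: move a directory and leave a symbolic link in its place.
# $1 = directory to move (e.g. "$HOME/.cache")
# $2 = existing destination parent directory on the group disk
relocate_dir() {
    local src="$1" dest_parent="$2" name
    name=$(basename "$src")
    if [ -L "$src" ]; then
        echo "$src is already a symbolic link; nothing to do" >&2
        return 0
    fi
    if [ ! -d "$src" ] || [ ! -d "$dest_parent" ]; then
        echo "source or destination directory is missing" >&2
        return 1
    fi
    if [ -e "$dest_parent/$name" ]; then
        echo "$dest_parent/$name already exists; refusing to overwrite" >&2
        return 1
    fi
    # Move first; create the link only if the move succeeded.
    mv "$src" "$dest_parent/" && ln -s "$dest_parent/$name" "$src"
}
```

With the placeholder path from the example above, the call would be `relocate_dir "$HOME/.cache" /gs/bs/tga-xxxxx/tsubametarou`.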
I want to use an external storage or cloud storage service¶
Mounting external disks using user privileges is not permitted. Cloud storage services are also not supported. Please use a group disk.
If you have an HPCI account, you can mount only HPCI Shared Storage with user privileges. For details, please refer to the HPCI User Manual or contact the HPCI Help Desk.