Queue system administration
To do list
The following is a list of items requiring attention, in order of priority:
- Reinstall OS on Pegasus.
- Measure usage on each machine using PBS/Torque accounting tools.
- Clean up our junk in NE47-181 and E40-008: throw away boxes and remove the old Cyrus2 cluster.
- Find a better way to manage user data after users leave the group.
Backup services
Backup service for our four clusters is provided by MIT TSM. This comes at a cost of $65 per month per system.
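For budgeting, four clusters at $65 each come to 4 × 65 = $260 per month, or $3,120 per year. The helper below redoes this arithmetic for any number of systems and months. Its name is hypothetical, and it assumes the flat per-system rate quoted above still applies.

```shell
#!/bin/sh
# Rough TSM cost estimate. Assumes a flat rate of $65 per system per month.
RATE_PER_SYSTEM=65   # USD per system per month

# Usage: tsm_cost SYSTEMS [MONTHS]   (MONTHS defaults to 1)
tsm_cost() {
  systems=$1 months=${2:-1}
  echo $((systems * RATE_PER_SYSTEM * months))
}
```

For example, `tsm_cost 4` prints 260 and `tsm_cost 4 12` prints 3120.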
...
First, establish a connection to the TSM server by starting the dsmc client as root:
```
$ sudo dsmc
```
Next, at the dsmc prompt, list the versions of the file that are stored on the TSM server:
```
query backup /path/filename
```
You can restore the file to its original location or to a new location. To restore a folder together with its subdirectories, add the option -subdir=yes.

```
restore /path/filename
```

or

```
restore "/path/filename" /newpath/newfilename
```
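The query and restore steps can also be scripted. dsmc accepts a command on its own command line, so the interactive prompt is not required. The sketch below wraps both steps. The function names and the DRY_RUN switch are hypothetical. It assumes the script runs as root with dsmc on the PATH, and it uses the real dsmc options -inactive (list older versions as well as the current one) and -subdir=yes (descend into subdirectories).

```shell
#!/bin/sh
# Sketch: non-interactive wrappers around "dsmc query backup" and "dsmc restore".
# Assumptions: run as root, dsmc on PATH. DRY_RUN=1 prints the command
# instead of executing it. Function names are hypothetical.

run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List every stored version of a file (-inactive includes older versions).
tsm_versions() {
  run_or_echo dsmc query backup -inactive "$1"
}

# Restore SRC, optionally to DEST. A trailing slash on SRC marks a
# directory, in which case -subdir=yes also restores its subdirectories.
tsm_restore() {
  src=$1
  dest=${2:-}
  case $src in
    */) set -- -subdir=yes "$src" ;;
    *)  set -- "$src" ;;
  esac
  if [ -n "$dest" ]; then
    set -- "$@" "$dest"
  fi
  run_or_echo dsmc restore "$@"
}
```

For example, with `DRY_RUN=1` set, `tsm_restore /path/dir/ /newpath/` prints `dsmc restore -subdir=yes /path/dir/ /newpath/`. Unset DRY_RUN to actually run the command.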
For case 2 (retrieving the data on a machine other than the original cluster), the procedure is a little more involved, and there are a few important things to take into account. The procedure is as follows:
1). The backup can only be restored on a Linux machine. This is because the downloading machine has to masquerade as the original cluster in order to retrieve data, and all our clusters run some form of Linux.
2). The disk to which you write the retrieved data must use the same filesystem as the cluster. For example, if the files on the cluster were written to a drive with an xfs filesystem, the disk on the third machine must also be formatted as xfs.
3). TSM software for Linux can be downloaded from the IS&T website. The installation procedure for a new Ubuntu version is described here.
4). The older version of the software is written for RHEL5 and is available as an RPM. Install the "ksh" and "alien" packages. ksh is needed because several of the scripts included with TSM use ksh. More important is alien, which lets admins install RPM packages on Ubuntu or other Debian-based distributions:
```
$ sudo apt-get install ksh alien
```
5). Use alien to install the appropriate RPMs:
```
$ sudo alien -i --scripts TIVsm-API.i386.rpm TIVsm-BA.i386.rpm
```
6). TSM also requires several other libraries, such as libstdc++.so.5. Install them with apt-get or obtain them from another source.
7). Set the node name, backup server and error log location (the NODENAME, TCPSERVERADDRESS and ERRORLOGNAME options) in dsm.sys. This file is located in the /opt/tivoli/tsm/client/ba/bin/ folder. The settings for each cluster are given on this page.
8). Follow the instructions for case 1 above to restore files to a new location. See the TSM documentation for commands such as restart restore and cancel restore.
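Several of the case-2 requirements above can be checked before starting a restore. The function below is a minimal sketch, and its name and interface are hypothetical. It assumes GNU coreutils `stat` and takes two arguments: the cluster's filesystem type and the directory that will receive the data. It prints anything that is missing and returns non-zero if there is a problem. GNU `stat -f -c %T` names some filesystems differently from `mount` (ext4, for instance, is reported as ext2/ext3), so compare against what `stat` prints on the cluster itself. Run the function as root so that `ldconfig`, which usually lives in /sbin, is on the PATH.

```shell
#!/bin/sh
# Sketch of a pre-flight check for restoring on a non-cluster machine (case 2).
# Usage: check_restore_host FSTYPE TARGET_DIR   (hypothetical helper)
check_restore_host() {
  want_fs=$1
  target=$2
  status=0
  # The restore machine must run Linux.
  if [ "$(uname -s)" != Linux ]; then
    echo "not a Linux machine"; status=1
  fi
  # The target filesystem must match the cluster's (GNU stat naming).
  have_fs=$(stat -f -c %T "$target" 2>/dev/null)
  if [ "$have_fs" != "$want_fs" ]; then
    echo "filesystem is '$have_fs', expected '$want_fs'"; status=1
  fi
  # ksh (used by TSM scripts) and alien (installs the RPMs) must be present.
  for tool in ksh alien; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"; status=1
    fi
  done
  # libstdc++.so.5 must be known to the dynamic linker.
  if ! ldconfig -p 2>/dev/null | grep -q 'libstdc++\.so\.5 '; then
    echo "missing: libstdc++.so.5"; status=1
  fi
  return $status
}
```

For example, `check_restore_host xfs /mnt/restore` reports any mismatch for a hypothetical xfs target mounted at /mnt/restore.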
Installing TSM backup software
The TSM 5.4 software has been installed following the instructions on the TSM page. TSM also needs an older library, libstdc++.so.5. On Darius1 this library was provided by the compat-libstdc++-33 packages, which were downloaded from here and here. They were then installed with the "yum localinstall" command, together with the TSM RPMs:
```
sudo yum localinstall compat-libstdc++-33-3.2.3-61.x86_64.rpm
sudo yum localinstall compat-libstdc++-33-3.2.3-61.i386.rpm
sudo yum localinstall TIVsm-API.i386.rpm
sudo yum localinstall TIVsm-BA.i386.rpm
```

The installed libstdc++.so.5 files on Darius1:

```
lrwxrwxrwx 1 root root     18 Jun  3  2010 /usr/lib64/libstdc++.so.5 -> libstdc++.so.5.0.7
-rwxr-xr-x 1 root root 825400 Jan  8  2007 /usr/lib64/libstdc++.so.5.0.7
lrwxrwxrwx 1 root root     18 May 21  2010 /usr/lib/libstdc++.so.5 -> libstdc++.so.5.0.7
-rwxr-xr-x 1 root root 733456 Aug 21  2006 /usr/lib/libstdc++.so.5.0.7
```

Next, edit the dsm.opt and dsm.sys files as described in the instructions. These files set the default locations of the backup logs:

```
/opt/tivoli/tsm/client/ba/bin/dsmsched.log
/opt/tivoli/tsm/client/ba/bin/dsmerror.log
```

Finally, run dsmc once as root to enter the initial password. A line can then be added to /etc/inittab to start the dsmc scheduler automatically. To start the scheduler right after installation, root can run dsmc with the "sched" argument:

```
# nohup /usr/bin/dsmc sched > /dev/null 2>&1 &
```
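The /etc/inittab approach mentioned above could look like the following sketch. It assumes a SysV-style init that still reads /etc/inittab, and it assumes dsmc is installed at /usr/bin/dsmc. The `tsm` entry id, the `once` action and the function name are illustrative choices; they are not taken from the IS&T instructions. The function takes the file path as an argument so it can be tried on a scratch copy first, and it only appends the entry if the entry is not already present.

```shell
#!/bin/sh
# Sketch: idempotently add a TSM scheduler entry to an inittab file.
# Assumptions: SysV init reading /etc/inittab; dsmc at /usr/bin/dsmc.
# Usage: add_tsm_inittab [FILE]   (defaults to /etc/inittab; hypothetical helper)
add_tsm_inittab() {
  inittab=${1:-/etc/inittab}
  entry='tsm::once:/usr/bin/dsmc sched > /dev/null 2>&1 # TSM scheduler'
  # Append only if this exact line is not already present.
  if ! grep -qxF "$entry" "$inittab" 2>/dev/null; then
    printf '%s\n' "$entry" >> "$inittab"
  fi
}
```

After editing the real /etc/inittab, `telinit q` asks init to re-read the file.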
TSM registration information
The four clusters backed up with TSM have the following registration information. The TSM system assigns an initial password (newpass). According to the registration e-mail, this password is automatically replaced after the first connection to the TSM servers by a new, encrypted password, which is stored on the machine.
Darius2
Server: oc11-bk-ent-1.mit.edu
Nodename: DARIUS2.CSBI
Schedule: BUS-0700
Darius1
Server: backup-i.mit.edu
Nodename: DARIUS1.CSBI
Schedule: BUS-2400
...
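Using the registration values above, a dsm.sys stanza for one cluster might be generated as shown below. COMMMETHOD, TCPSERVERADDRESS, NODENAME, PASSWORDACCESS, ERRORLOGNAME and SCHEDLOGNAME are standard TSM client options. The stanza label, the choice of PASSWORDACCESS GENERATE and the function name are our own assumptions, so check them against the IS&T instructions. The label is derived here from the server host name, and it must match the SERVERNAME line in dsm.opt. PASSWORDACCESS GENERATE is the setting that lets the client keep the encrypted password on the machine. The schedule (e.g. BUS-0700) is not set here, because schedules are assigned on the server side.

```shell
#!/bin/sh
# Sketch: print a minimal dsm.sys server stanza for one cluster.
# Usage: make_dsm_stanza NODENAME SERVER   (hypothetical helper)
make_dsm_stanza() {
  node=$1
  server=$2
  logdir=/opt/tivoli/tsm/client/ba/bin
  # Stanza label: short host name of the server (an arbitrary choice).
  cat <<EOF
SERVERNAME         ${server%%.*}
   COMMMETHOD         TCPIP
   TCPSERVERADDRESS   $server
   NODENAME           $node
   PASSWORDACCESS     GENERATE
   ERRORLOGNAME       $logdir/dsmerror.log
   SCHEDLOGNAME       $logdir/dsmsched.log
EOF
}
```

For example, `make_dsm_stanza DARIUS1.CSBI backup-i.mit.edu` prints a stanza labelled `backup-i` for Darius1. You can review the output before appending it to dsm.sys.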