CSC’s unix environment

Page 1: CSC’s unix environment

CSC’s unix environment

Page 2: CSC’s unix environment

corona.csc.fi and sepeli.csc.fi

Page 3: CSC’s unix environment

ssh connection to Corona

• text-based connection
• does not need much bandwidth
• no graphics

Page 4: CSC’s unix environment

X-connection to Corona

• possibility to use graphical interfaces
• requires a locally installed X-emulator
• needs more bandwidth

Page 5: CSC’s unix environment

Can I log into Corona if I don’t know unix well?

• you can only delete your own files
• you need to be an expert to cause big damage

• Security is important! Keep your password fresh and safe.

Page 6: CSC’s unix environment

Directories:

$HOME (/mnt/mds/univX/group/username) - permanent, size limit 200 MB

$METAWRK (/mnt/mds/metawrk/username) - storage time 1 month, no size limit

$WRKDIR (/wrk/username) - storage time 1 week, no size limit

$TMPDIR (/tmp/username) - storage time 1 day, no size limit

$ARCHIVE (/mnt/fs/archive/univX/group/username) - permanent, no size limit, only for storage

Project directory - a special large area of permanent disk space for the common use of the group (requires an application)

Page 7: CSC’s unix environment

[Figure: Directories of Kalle Käyttäjä (kkayttaj). A directory tree starting from / that shows where $HOME, $ARCHIVE, $METAWRK, $WRKDIR and $TMPDIR are located, with example contents such as own_programs, report.txt, Fasta_results, data.dat, proj04.tar, Gradu.tar, test.rubbish and run.tmp.]

Page 8: CSC’s unix environment

Unix commands

ls
ls -l
ls -l myDirectory

Page 9: CSC’s unix environment

Commands for directories:

cd change directory

ls list the contents of a directory

pwd print (=show) working directory

mkdir make directory

rmdir remove directory
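
The directory commands above can be tried safely in a scratch directory. The following is a minimal sketch: the directory name results is made up for illustration, and mktemp -d is used only so that nothing in your real directories is touched.

```shell
# Work inside a throwaway directory so nothing important is touched.
scratch=$(mktemp -d)
cd "$scratch"

mkdir results            # make a new directory
cd results               # move into it
where=$(pwd)             # pwd prints the current working directory
echo "$where"

cd ..                    # go back up one level
listing=$(ls)            # ls lists the contents: only "results" so far
echo "$listing"

rmdir results            # rmdir removes a directory (it must be empty)
after=$(ls)              # nothing is left to list
```

Note that rmdir refuses to remove a directory that still contains files, which makes it a safe way to clean up.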

Page 10: CSC’s unix environment

Commands for files:

cat print file to screen

cp copy

less view text file

rm remove

mv move/rename a file

head show beginning of a file

tail show end of a file

grep find lines containing given text
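
As a rough illustration of the file commands, the sketch below runs most of them on a small made-up file. The file names (names.txt, copy.txt, proteins.txt) and their contents are assumptions chosen for the example; less is left out because it is interactive.

```shell
# Work inside a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"

# Create a small example file (hypothetical content).
printf 'casein\nlysozyme\nalbumin\nkeratin\n' > names.txt

cat names.txt                    # print the whole file to the screen
cp names.txt copy.txt            # copy the file
mv copy.txt proteins.txt         # rename the copy
first=$(head -n 1 proteins.txt)  # beginning of the file (first line)
last=$(tail -n 1 proteins.txt)   # end of the file (last line)
hits=$(grep 'in' proteins.txt)   # lines containing the text "in"
echo "$hits"
rm proteins.txt                  # remove the renamed copy
```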

Page 11: CSC’s unix environment

Examples of using files and directories

$HOME
  thesis.txt
  directory1
    casein.fasta
    structures
  directory2
    bunnies.txt
    casein.phy

cd Go back to the home directory from anywhere.

cd .. Move one level up in the directory hierarchy. (cd .. in the ”structures” directory moves you to the directory ”directory1”.)

cp thesis.txt directory1/structures Copies the file ”thesis.txt” to the subdirectory “structures” (run in $HOME).

cp casein.phy ../directory1/ Copies the file “casein.phy” to the directory “directory1” (run in “directory2”).
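
To try these examples without touching real data, the layout can be recreated in a scratch directory. This is only a sketch: the scratch directory stands in for $HOME, and the files are created empty with touch.

```shell
# Recreate the example layout; "$top" stands in for $HOME.
top=$(mktemp -d)
cd "$top"
mkdir -p directory1/structures directory2
touch thesis.txt directory1/casein.fasta directory2/bunnies.txt directory2/casein.phy

# From the top level: copy thesis.txt into the "structures" subdirectory.
cp thesis.txt directory1/structures

# From directory2: copy casein.phy into the neighbouring directory1.
cd directory2
cp casein.phy ../directory1/

# cd .. run inside "structures" moves up to directory1.
cd ../directory1/structures
cd ..
here=$(basename "$(pwd)")
echo "$here"
```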

Page 12: CSC’s unix environment

Use command “less” to view text files

less filename

return (next line)
space (next screen)
b (previous screen)
h (show help for less)
q (quit)
/string (find string in the file)

ls -la | less (pipe ls output to less)

Page 13: CSC’s unix environment

Nano (or pico) text editor

nano filename

ctrl-c (line number)
ctrl-g (help menu)
ctrl-k (cut a line)
ctrl-o (save)
ctrl-r (read a file)
ctrl-v (next page)
ctrl-w (find a word)
ctrl-x (exit)
ctrl-y (previous page)

Page 14: CSC’s unix environment

Use eog and ggv for displaying images

Eog can display e.g. jpg, tiff, gif and png files.

eog filename.png

Ggv can display ps and pdf files.

ggv filename.ps

ps2pdf converts a PostScript file into a pdf file:

ps2pdf filename.ps

Note: eog and ggv require an X connection.
You can use Scientist’s Interface too (Settings: show).

Page 15: CSC’s unix environment

General features:
arrow keys browse previous commands
tabulator auto-completes commands or file names
manual pages man command
control-c stops the currently running program (or process)

Special characters:

* (asterisk), wild card, means any text

ls *.fasta

| (pipe) directs the output of one command to the input of another command

ls *.fasta | less

> writes output to a new file

ls > files_of_the_directory.txt

~ (tilde) means your home directory, as does $HOME

cp test.txt ~/file.txt

cp test.txt $HOME
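
The special characters can be combined freely. A small sketch in a scratch directory, where the file names a.fasta, b.fasta and notes.txt are made up for the example and wc -l (count lines) is used in place of less so that the example does not wait for keyboard input:

```shell
# Work inside a throwaway directory with three empty example files.
scratch=$(mktemp -d)
cd "$scratch"
touch a.fasta b.fasta notes.txt

# * matches any text, so this lists only the .fasta files.
fasta_files=$(ls *.fasta)
echo "$fasta_files"

# | sends the output of ls to another command; wc -l counts the lines.
count=$(ls *.fasta | wc -l)
echo "$count"

# > writes the output to a new file instead of the screen.
ls > files_of_the_directory.txt

# ~ and $HOME normally name the same directory.
echo ~
echo "$HOME"
```

Because the shell creates files_of_the_directory.txt before ls runs, the listing written into the file also contains the name of the file itself.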

Page 16: CSC’s unix environment

Batch queue jobs at CSC

Batch queues in Corona and Sepeli
• maximum time limit for interactive jobs is 2 h (CPU h)
• longer jobs must be submitted through the batch queue system
• even rather small jobs can overload the front node of Sepeli

Queue systems aim to optimize the usage of the computing resources
- the customer defines how much computing time, memory and how many processors the job needs
- the queue system starts the job when suitable resources are available
- during the execution the job can effectively utilize the reserved resources

Page 17: CSC’s unix environment

N1 grid engine in Corona and Sepeli

• Both Corona and Sepeli use N1 grid engine queue system

• The maximum time and memory limits are different in Sepeli and Corona

          Max. time          Max. mem       Max. proc
Corona    168 h (7 days)     192 GB         32
Sepeli    240 h (10 days)    4 GB/subjob    128

Page 18: CSC’s unix environment

N1 grid engine in Corona and Sepeli

At minimum, a batch job script must include a computing time estimate and all the commands needed to run the program:

#!/bin/tcsh
#$ -l h_rt=24:00:00
raxml -n test1 -s ratite.phy -m HKY85

The script file is submitted with the command:

qsub batch_job.file

The job can be followed with the commands:

qstat
qstat -u username
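
One way to create such a script file without opening an editor is a here-document. The following is a sketch only: the file name batch_job.file and the raxml command come from the slide, the scratch directory is an assumption for the example, and the qsub and qstat calls are left as comments because they work only on a machine where the queue system is installed.

```shell
# Write the minimal batch job script into batch_job.file.
# Quoting 'END' keeps the shell from expanding anything inside the script.
workdir=$(mktemp -d)
cd "$workdir"
cat > batch_job.file <<'END'
#!/bin/tcsh
#$ -l h_rt=24:00:00
raxml -n test1 -s ratite.phy -m HKY85
END

# Quick check: show the queue definitions (the lines starting with #$).
definitions=$(grep '^#\$' batch_job.file)
echo "$definitions"

# On Corona or Sepeli the job would then be submitted and followed with:
#   qsub batch_job.file
#   qstat -u username
```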

Page 19: CSC’s unix environment

N1 grid engine in Corona and Sepeli

Structure of a batch queue file

#!/bin/tcsh “shebang” tells what command shell to use

The lines containing the batch queue definitions start with #$

Most common definitions:

#$ -l h_rt=h:min:sec reserved time
#$ -l v_mem=max_mem(M,G) maximum memory size
#$ -pe cre n_proc number of processors
#$ -o run.log output file
#$ -e error.log error file
#$ -cwd run job in the directory where it was submitted (works only in Corona)

Page 20: CSC’s unix environment

N1 grid engine in Corona and Sepeli

Note that batch jobs start from the home directory with the same settings that the user has just after login.

In the batch job file you must take care of:
• Moving to the right directory (cd $METAWRK/ or -cwd)
• Setting up the program environment (use emboss etc.)
• Giving all the parameters that the commands need

#!/bin/tcsh
#$ -l h_rt=24:00:00
#$ -o ratite_run.log
#$ -e ratite_run.log
cd $METAWRK/birds/
raxml -n test1 -s ratite.phy -m HKY85

Page 21: CSC’s unix environment

N1 grid engine in Corona and Sepeli

For “interactive” programs you can use the <<EOF structure:

#!/bin/tcsh
#$ -l h_rt=24:00:00
#$ -o mrbayes_run.log
#$ -e mrbayes_run.log
cd $METAWRK/birds/
mrbayes64 <<EOF
log start filename=data.log
execute rat1.nxs
mcmc
no
sump
sumt
quit
EOF
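
The <<EOF structure is a general shell feature (a here-document): the lines up to the closing EOF are fed to the program as if they had been typed at its prompt. A minimal sketch of the mechanism, with the standard sh command standing in for an interactive program such as MrBayes (the stand-in and the log file are assumptions made so that the example runs anywhere):

```shell
# "sh" stands in for an interactive program: it reads its commands from
# standard input, just as it would read them from the keyboard.
log=$(mktemp)
sh > "$log" <<EOF
echo first command
echo second command
exit
echo never reached
EOF

# The log holds the output of the two commands given before "exit".
cat "$log"
```

The exit line plays the same role as quit in the MrBayes example: it ends the program, so anything after it is never read.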

Page 22: CSC’s unix environment

N1 grid engine in Corona and Sepeli

Note that in Sepeli, batch jobs can only use files located in the $WRKDIR directory. ($WRKDIR is the “home directory” on the computing nodes.)

For short or interactive jobs you can use interactive batch jobs:

qrsh -l h_rt=4:00:00

qrsh opens an interactive session on one computing node. The maximum length of the session is defined by -l h_rt.

Page 23: CSC’s unix environment

More information about CSC Unix environment

Unix operating system:
http://www.csc.fi/metacomputer/neuvonta.html.en
http://www.csc.fi/oppaat/metakone/

Text editors:
http://www.csc.fi/cschelp/kaytto/editorit.html.en

Page 24: CSC’s unix environment

Unix EMBOSS

Page 25: CSC’s unix environment

Advantages of unix EMBOSS

- more programs (e.g. Vienna, hmmer, meme)

- possibility to use list files

- big analysis tasks

- you can analyze the same data with other unix programs (Clustal, Phylip, BLAST, FASTA, etc.)

Page 26: CSC’s unix environment

EMBOSS in Corona

• use emboss - initializes EMBOSS
• showdb - displays the databases linked to EMBOSS
• wossname term - finds programs related to a given term
• wossname - lists descriptions of all EMBOSS programs

Page 27: CSC’s unix environment

EMBOSS in Corona

• you can start a program by typing its name
• you can give parameters interactively

corona ~> seqret
Reads and writes (returns) sequences
Input sequence(s): swiss:P12067
Output sequence [lyc1_pig.fasta]:

• or you can give parameters on the command line (you can often give more parameters on the command line than interactively)

corona ~> seqret swiss:P12067
Reads and writes (returns) sequences
Output sequence [lyc1_pig.fasta]:

Page 28: CSC’s unix environment

EMBOSS file formats

• EMBOSS uses USA (Uniform Sequence Address) description for sequence files.

format::database:name (e.g. fasta::swiss:CAS1_human)

• EMBOSS reads and writes several sequence formats including fasta, gcg, staden, swiss, text and clustal. The default format is fasta. One file can include several sequences.

• EMBOSS can use list files, which contain sequence names in USA format. A list file must be marked with the @ character when it is given to a program (seqret @list.txt)

• short sequences can be given on the command line using asis::sequence

seqret asis::TGCAGCTGCTGCAGCTGCTGC
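
A list file is an ordinary text file with one USA per line, so it can be written with any editor or directly from the shell. The sketch below builds a hypothetical list.txt from the two database entries mentioned on these slides; whether they can actually be retrieved depends on the databases installed (see showdb). The seqret call is shown only as a comment because it needs an initialized EMBOSS, and the output file name all.fasta is an assumption.

```shell
# Work inside a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# One USA per line; both entries are taken from the slides.
cat > list.txt <<'END'
swiss:P12067
swiss:CAS1_human
END

# Count the entries (tr removes the padding some wc versions add).
entries=$(wc -l < list.txt | tr -d ' ')
echo "list.txt holds $entries entries"

# On Corona, after "use emboss", the whole list could be fetched with:
#   seqret @list.txt -outseq all.fasta -auto
```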

Page 29: CSC’s unix environment

EMBOSS results

• results are stored in a new file (either a text file or an image)

• text files can be viewed with the less and pico programs

• images can be viewed through X-term connections or stored as a postscript file

• Use Scientist’s interface to transport data between your machine and Corona

Page 30: CSC’s unix environment

EMBOSS command options

-help short command help

-opt ask more parameters interactively

-auto use default parameters

corona ~> seqret -help
Mandatory qualifiers:
[-sequence] seqall Sequence database USA
[-outseq] seqoutall Output sequence(s) USA

Optional qualifiers: (none)

Advanced qualifiers:
-firstonly bool Read one sequence and stop

General qualifiers:
-help bool report command line options. More information on associated and general qualifiers can be found with -help -verbose

Page 31: CSC’s unix environment

EMBOSS general options

Many EMBOSS programs use general options that are not included in the help information. For example:

-sbegin starting point in the sequence

-send ending point in the sequence

-sreverse use reverse sequence

-sask ask -sbegin, -send and -sreverse parameters interactively

-osname name of the output file

-ossingle write sequences into separate files
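
The positions given with -sbegin and -send are counted from 1 and include both ends. The sketch below uses the standard cut command as a stand-in to show which part of the short example sequence such a selection covers; the positions 4 and 9 and the output file name region.fasta are assumptions for illustration, and the EMBOSS call itself is shown only as a comment because it needs an initialized EMBOSS.

```shell
# The short example sequence used with asis:: earlier.
seq=TGCAGCTGCTGCAGCTGCTGC

# Stand-in for the selection: characters 4 to 9, counted from 1,
# both ends included.
region=$(printf '%s\n' "$seq" | cut -c4-9)
echo "$region"

# The corresponding EMBOSS call on Corona, after "use emboss":
#   seqret asis::TGCAGCTGCTGCAGCTGCTGC -sbegin 4 -send 9 -outseq region.fasta -auto
```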

Page 32: CSC’s unix environment

Image output

• the EMBOSS program asks for the image format:

Graphics device [x11]:

• x11 = show on the screen (requires X-term connection)
• ps = write the image into a PostScript file
• data = write a data file instead of an image

Page 33: CSC’s unix environment

How to find the right EMBOSS program?

• manuals grouped by program function http://www.csc.fi/molbio/progs/emboss/Apps/groups.html

• wossname program: Text search of EMBOSS manuals

• EMBOSS - GCG “dictionary”: http://www.csc.fi/molbio/progs/emboss/comparison.html