5. Using the command line

5.1. Goals

  • Explain how the shell relates to the operating system and to users’ programs.
  • Explain when and why command-line interfaces should be used instead of graphical interfaces.

5.2. Background

A shell is a program like any other. What’s special about it is that its job is to run other programs rather than to do calculations itself. The most popular Unix shell is Bash, the Bourne Again SHell (so-called because it’s derived from a shell written by Stephen Bourne). Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows.

5.3. What does it look like?

The shell prompt allows you to interact with your computer and the files and folders present in it.

campus-009-192:~ eoziolor$

You can enter commands in the shell to execute actions.

campus-009-192:~ eoziolor$ ls
2018-setacna-rnaseq     Library             awscli-bundle           popgen_class
Applications            Movies              fgfh_post           power
Desktop             Music               jobs                training
Documents           Pictures            learn_python
Downloads           Public              phgenome_post
GIGAIII_bioinformatics_workshop angus_private_key

You can modify the actions of commands (like ls) by using flags, also known as options:

campus-009-192:~ eoziolor$ ls -lh /
total 13
drwxrwxr-x+ 67 root      admin   2.1K Oct 16 14:59 Applications
drwxr-xr-x+ 64 root      wheel   2.0K Oct  1 16:21 Library
drwxr-xr-x   2 root      wheel    64B Oct  1 16:17 Network
drwxr-xr-x@  5 root      wheel   160B Sep 20 21:05 System
drwxr-xr-x   6 root      admin   192B Oct  1 16:17 Users
drwxr-xr-x+  3 root      wheel    96B Oct 18 14:12 Volumes
drwxr-xr-x  22 eoziolor  staff   704B Aug 31 11:39 anaconda3
drwxr-xr-x@ 37 root      wheel   1.2K Sep 20 21:17 bin
drwxrwxr-t   2 root      admin    64B Oct  1 16:17 cores
dr-xr-xr-x   3 root      wheel   4.2K Oct 15 17:58 dev
lrwxr-xr-x@  1 root      wheel    11B Oct  1 16:16 etc -> private/etc
dr-xr-xr-x   2 root      wheel     1B Oct 18 12:52 home
-rw-r--r--   1 root      wheel   313B Aug 17 17:55 installer.failurerequests
dr-xr-xr-x   2 root      wheel     1B Oct 18 12:52 net
drwxr-xr-x   6 root      wheel   192B Oct  1 16:17 private
drwxr-xr-x@ 63 root      wheel   2.0K Oct  1 16:16 sbin
lrwxr-xr-x@  1 root      wheel    11B Oct  1 16:16 tmp -> private/tmp
drwxr-xr-x@  9 root      wheel   288B Sep 20 21:01 usr
lrwxr-xr-x@  1 root      wheel    11B Oct  1 16:16 var -> private/var
drwxr-xr-x   3 root      wheel    96B Oct  1 16:19 vm

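In this case I am using the flags -lh to make ls (list) print the contents of the directory / in (l)ong format, one entry per line with permissions, owner, size and date, and with the file sizes shown in (h)uman-readable units such as B and K.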
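
Flags can be written separately or combined behind a single dash. The sketch below illustrates this; the scratch directory made with mktemp -d and the file name notes.txt are assumptions for the demo, not part of the lesson.

```shell
# Work in a scratch directory so nothing in your home folder is touched.
workdir=$(mktemp -d)
printf 'some text\n' > "$workdir/notes.txt"   # hypothetical example file

# -l and -h can be written separately or combined; the output is the same.
separate=$(ls -l -h "$workdir")
combined=$(ls -lh "$workdir")

echo "$combined"
rm -r "$workdir"
```

Both variables should hold the same long-format listing of notes.txt.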

To find out how to use a command, type man in front of its name. That will give you the (man)ual for that command. For example, man ls shows:

LS(1)                     BSD General Commands Manual                    LS(1)

NAME
     ls -- list directory contents

SYNOPSIS
     ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]

DESCRIPTION
     For each operand that names a file of a type other than directory, ls displays its name as well as any requested, associated information.
     For each operand that names a file of type directory, ls displays the names of files contained within that directory, as well as any
     requested, associated information.

     If no operands are given, the contents of the current directory are displayed.  If more than one operand is given, non-directory operands
     are displayed first; directory and non-directory operands are sorted separately and in lexicographical order.

     The following options are available:

     -@      Display extended attribute keys and sizes in long (-l) output.

     -1      (The numeric digit ``one''.)  Force output to be one entry per line.  This is the default when output is not to a terminal.

     -A      List all entries except for . and ...  Always set for the super-user.

     -a      Include directory entries whose names begin with a dot (.).

     -B      Force printing of non-printable characters (as defined by ctype(3) and current locale settings) in file names as \xxx, where xxx
             is the numeric value of the character in octal.

     -b      As -B, but use C escape codes whenever possible.

     -C      Force multi-column output; this is the default when output is to a terminal.

     -c      Use time when file status was last changed for sorting (-t) or long printing (-l).

     -d      Directories are listed as plain files (not searched recursively).

     -e      Print the Access Control List (ACL) associated with the file, if present, in long (-l) output.

     -F      Display a slash (`/') immediately after each pathname that is a directory, an asterisk (`*') after each that is executable, an at
             sign (`@') after each symbolic link, an equals sign (`=') after each socket, a percent sign (`%') after each whiteout, and a ver-
             tical bar (`|') after each that is a FIFO.

     -f      Output is not sorted.  This option turns on the -a option.

     -G      Enable colorized output.  This option is equivalent to defining CLICOLOR in the environment.  (See below.)

That way you can experiment with commands and try out their various options.

P.S.: To quit the manual page, just type “q”.

5.4. Usage

Command prompts are simple, but they are a language of their own. That means that unless you are precise and accurate about what you want to do, it will not get done, or it will get done poorly!

Let’s say that you forget a space between your command and the modifier you are using.

campus-009-192:~ eoziolor$ ls-F
-bash: ls-F: command not found

Bash has no idea what you mean and will not try to guess. This is why you should always double, triple, quadruple check the scripts that you write. One letter off can throw off a whole pipeline, and sometimes you might not even know that it’s happening.
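
One way to notice this kind of failure inside a script is to look at the exit status that every command leaves behind. The following is a minimal sketch, assuming Bash; the variable name status is chosen only for illustration.

```shell
# ls-F (no space) is not a real command, so the shell cannot run it.
status=0
ls-F 2>/dev/null || status=$?

# The shell reports 127 when a command is not found; 0 means success.
echo "exit status of ls-F: $status"

# With the space restored, ls runs and succeeds.
ls -F / >/dev/null && echo "ls -F worked"
```

A script that checks the status after important steps can stop early instead of carrying on with missing results.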

5.4.1. Advantages

The biggest advantage of using the command line is that you know exactly what you are doing. This is not a black box in which a program is going to magically analyze your data and spit out results. If you’re here, you usually know what your data looks like and what format it needs to take to be accepted by the various Unix command-line programs.

Another big advantage of Unix is that it uses pipes (|). A pipe is the way for Unix to pass the output from one command directly into another, so that you don’t have to save any intermediate files. We will chat more about this later, but overall that simplifies analyses immensely.

5.6. Creating files and folders

5.6.1. New folders

You can use the command mkdir (make directory) to make a folder in the directory you are in.

campus-009-192:~ eoziolor$ mkdir blabla
campus-009-192:~ eoziolor$ ls
2018-setacna-rnaseq     Library             awscli-bundle           phpopg
Applications            Movies              blabla              popgen_class
Desktop             Music               fgfh_post           power
Documents           Pictures            jobs                training
Downloads           Public              learn_python
GIGAIII_bioinformatics_workshop angus_private_key       phgenome_post

If you would like to delete a folder, you can use the command rmdir (remove directory), but that only works if there is nothing in the folder.

To delete a folder together with its contents, type rm -rf followed by the name of the directory. BE VERY CAREFUL with this! You cannot undo it! The folder does not go into the trash, it just disappears FOREVER!
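
The difference between rmdir and rm -r can be tried out safely in a scratch directory. This is a sketch under that assumption; the file name file.txt is hypothetical, and touch (covered below) is used only to put something inside the folder.

```shell
workdir=$(mktemp -d)            # scratch directory for this demo
mkdir "$workdir/blabla"
touch "$workdir/blabla/file.txt"

# rmdir refuses to remove a directory that still has something in it.
if rmdir "$workdir/blabla" 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi
echo "rmdir on a non-empty folder: $rmdir_result"

# rm -r removes the folder and everything inside it, with no undo.
rm -r "$workdir/blabla"
[ -d "$workdir/blabla" ] || echo "blabla is gone"
rmdir "$workdir"                # now empty, so rmdir succeeds
```

Leaving out -f, as here, means rm will still complain about problems such as a missing path, which is a little safer while you are learning.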

5.6.2. Things to note about folder names

Here are some rules of thumb for creating new folders:

  • don’t have spaces in the name - bash treats a space as the separator between two arguments
  • name them something simple that you can remember
  • avoid capital letters - bash is case-sensitive
  • create structure in your directories
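
The trouble with spaces can be seen in a small experiment. This sketch assumes a scratch directory and the made-up folder names my and data.

```shell
workdir=$(mktemp -d)     # scratch directory for this demo
(
    cd "$workdir"
    # Unquoted, the space splits the name: this makes TWO folders, "my" and "data".
    mkdir my data
    # Quoted, it is one folder called "my data".
    mkdir "my data"
)

# Count what was created: my, data, and "my data".
folder_count=$(ls "$workdir" | wc -l | tr -d ' ')
echo "folders created: $folder_count"
rm -r "$workdir"
```

A name like my_data avoids the quoting altogether.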

5.6.3. New files and editing

In a similar manner, you can create a new, empty file with the command touch:

campus-030-034:~ eoziolor$ touch document.txt
campus-030-034:~ eoziolor$ ls
2018-setacna-rnaseq     Library             awscli-bundle           phgenome_post
Applications            Movies              blabla              phpopg
Desktop             Music               document.txt            popgen_class
Documents           Pictures            fgfh_post           power
Downloads           Public              jobs                setac_private_key
GIGAIII_bioinformatics_workshop angus_private_key       learn_python            training
campus-030-034:~ eoziolor$ 
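
A file made by touch has no content yet. A quick way to confirm that, sketched here in a scratch directory (an assumption for the demo), is the shell test -s, which is true only for files that are not empty.

```shell
workdir=$(mktemp -d)
touch "$workdir/document.txt"

# -s is true only for files with content, so a fresh file reports as empty.
if [ -s "$workdir/document.txt" ]; then
    state="has content"
else
    state="empty"
fi
echo "document.txt is $state"
rm -r "$workdir"
```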

5.6.3.1. Choose a text editor

Do not open vim unless you’re ready

Let’s start by opening the document we created with the text editor nano:

nano document.txt

Now you can type anything that you want within that document.

hello
everyone
what
is
going
on

I will close the document by pressing Ctrl+X and following the prompt to save the changes.

Now we have a couple of options for looking at this document. We can use the command less, often employed for bigger documents.

Advantages of less:

  • only opens a chunk of the document to fill a page
  • you can scroll up and down the document

Disadvantages of less:

  • can’t do much more than that in terms of interacting with the text

The other option we have is cat. Advantages of cat:

  • can pass the text on to many other programs
  • can process gzip-compressed text with its sister program zcat
  • just cool AF

Disadvantages of cat:

  • your terminal will go crazy if you try to open a large document - press Ctrl+C to stop it if it gets out of hand
  • you don’t get GUI-like interaction with the text in your document

Let’s start by using cat here:

cat document.txt 

hello
everyone
what
is
going
on

5.7. Piping

Piping is one of the most useful things in bash scripting. Unix commands are built to each do one small job on their input, so that they can be chained together into a pipeline.

The pipe means that I will do something to a file and pass the output of that to another program, which then does its own job on it. The pipe symbol is | and it works as follows:

cat document.txt | head -n 3
hello
everyone
what

In this case I am printing our file and piping the output into the head command, which prints only the first 3 lines.
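
Pipes can be chained more than once. As a sketch, the example below recreates document.txt in a scratch directory (an assumption, so that it runs anywhere) and adds a third step, wc -l, which counts the lines it receives.

```shell
workdir=$(mktemp -d)
printf 'hello\neveryone\nwhat\nis\ngoing\non\n' > "$workdir/document.txt"

# Each | hands the output of the command on its left to the one on its right.
line_count=$(cat "$workdir/document.txt" | head -n 3 | wc -l | tr -d ' ')
echo "lines after head: $line_count"

rm -r "$workdir"
```

No intermediate file is written at any point; the three lines go straight from head into wc.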

5.8. Grep, tr, wc, mv and many others

I can use an abundance of commands to now manipulate this text. Let’s look at a slightly more complicated document and see what we can do.

Let’s quickly download a small fasta file containing the Fundulus heteroclitus transcriptome and play with it:

curl -O ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/826/765/GCF_000826765.1_Fundulus_heteroclitus-3.0.2/GCF_000826765.1_Fundulus_heteroclitus-3.0.2_rna.fna.gz

If you want to rename a file you can use mv (move)

mv GCF_000826765.1_Fundulus_heteroclitus-3.0.2_rna.fna.gz fhet.tr.fna.gz
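
The same renaming step can be tried on a throwaway file first. The file names below are hypothetical and the scratch directory is an assumption for the demo.

```shell
workdir=$(mktemp -d)
touch "$workdir/long_original_name.txt"     # hypothetical file name

# mv old new: the file keeps its contents and only the name changes.
mv "$workdir/long_original_name.txt" "$workdir/short.txt"
listing=$(ls "$workdir")
echo "$listing"
rm -r "$workdir"
```

Note that mv silently replaces an existing file that already has the new name; mv -i asks before overwriting.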

Now let’s look at the top of the file

zcat < fhet.tr.fna.gz | head
>NM_001309911.1 Fundulus heteroclitus low choriolytic enzyme-like (LOC105918466), mRNA
GGAAGGAAAAAATGGATCTCCAAGCACGAGCCTTGCTTCTGCTCCTGCTGCTTTCAGCCGTCTGTAATGCTTACCCCACA
GATAATTACAAAGCAGATGACGAAAACTCAGAGAAGGAGGACATCACAACCACTATCCTCAGAATGAACAATGGATCTGC
CGATATGCTGTTTGAAGGAGACGTTTTTGTTCCAAGATCCCGGACTGCCAAGAAGTGCCTTGATCCACGTTACAGCTGTT
TCTGGCCAAAGTCTTCAAATGGGAATGTGGAAATCCCTTTTGTTTTAAGTGACGAATATGATCACAACGAGAAGAATCAG
ATTCTCAAAGCCATGAAGGGCTTTGAGGGTAGAACCTGCATCCGCTTTGTTCGTCATAGAGGAGAGAGGGCGTACCTGAG
CATTGAGTCCAAATTTGGCTGTTTCTCTTTGATGGGTCGTTCTGGAGAAAGGCAGCTTGTGTCTCTGCAGAGACCCGGTT
GTTTAAATAATGGCATCATCCAGCATGAGCTGCTCCACGCTATGGGTTTCTACCACGAACACACTCGCAGCGACCGTGAC
AAATATGTCAAAATCAACTGGGATAACATACAAGAATATTATTATAAAAACTTCAAAAAAATGGACACAGACAATCTCAC
CCCATATGACTACTCCTCTGTGATGCAATATGGAAAAACTGCCTTTGGAAAGAACAGGGCAGAATCCATCACTCCTATCC
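
The form zcat < file is used here because, as far as I know, zcat on some systems (macOS among them) looks for a .Z suffix when it is given a file name directly. Feeding the file in with < avoids that, and gzip -dc is an equivalent spelling. The sketch below checks this on a tiny made-up file, tiny.fna.gz.

```shell
workdir=$(mktemp -d)
# Hypothetical one-record fasta file, compressed with gzip.
printf '>seq1 example\nACGT\n' | gzip -c > "$workdir/tiny.fna.gz"

# Two ways of decompressing to the screen without touching the file on disk.
via_zcat=$(zcat < "$workdir/tiny.fna.gz")
via_gzip=$(gzip -dc "$workdir/tiny.fna.gz")

echo "$via_zcat"
rm -r "$workdir"
```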

5.8.1. Grep

Here’s your normal fasta output. Now let’s try to get some quick stats out of this.

zcat < fhet.tr.fna.gz | grep -c "^>"
41170

What I told the terminal is to run grep, which searches the document for lines that match a certain pattern, and to count (-c) how many lines match.

In this case I have counted the number of lines that begin with > (the ^ anchors the pattern to the start of the line), which is the header line of each record in a fasta document. What that tells me is that there are 41170 transcripts represented in this file.
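
The counting can be checked by hand on a file small enough to read. This sketch assumes a made-up two-record file, tiny.fna, in which one sequence is wrapped over two lines.

```shell
workdir=$(mktemp -d)
# Hypothetical two-record fasta file; a sequence may span several lines.
printf '>seq1\nACGT\nGGCC\n>seq2\nTTAA\n' > "$workdir/tiny.fna"

# Quote the pattern: an unquoted > would be read by the shell as redirection.
records=$(grep -c "^>" "$workdir/tiny.fna")
total_lines=$(wc -l < "$workdir/tiny.fna" | tr -d ' ')

echo "records: $records, lines: $total_lines"
rm -r "$workdir"
```

The file has five lines but only two of them are headers, so grep -c reports 2.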

5.8.2. Tr

Notice that the > line starts with NM_. What if we want to get rid of the underscores and replace them with dots? We can do this with tr, which swaps one character for another everywhere in its input.

zcat < fhet.tr.fna.gz | tr "_" "." | head
>NM.001309911.1 Fundulus heteroclitus low choriolytic enzyme-like (LOC105918466), mRNA
GGAAGGAAAAAATGGATCTCCAAGCACGAGCCTTGCTTCTGCTCCTGCTGCTTTCAGCCGTCTGTAATGCTTACCCCACA
GATAATTACAAAGCAGATGACGAAAACTCAGAGAAGGAGGACATCACAACCACTATCCTCAGAATGAACAATGGATCTGC
CGATATGCTGTTTGAAGGAGACGTTTTTGTTCCAAGATCCCGGACTGCCAAGAAGTGCCTTGATCCACGTTACAGCTGTT
TCTGGCCAAAGTCTTCAAATGGGAATGTGGAAATCCCTTTTGTTTTAAGTGACGAATATGATCACAACGAGAAGAATCAG
ATTCTCAAAGCCATGAAGGGCTTTGAGGGTAGAACCTGCATCCGCTTTGTTCGTCATAGAGGAGAGAGGGCGTACCTGAG
CATTGAGTCCAAATTTGGCTGTTTCTCTTTGATGGGTCGTTCTGGAGAAAGGCAGCTTGTGTCTCTGCAGAGACCCGGTT
GTTTAAATAATGGCATCATCCAGCATGAGCTGCTCCACGCTATGGGTTTCTACCACGAACACACTCGCAGCGACCGTGAC
AAATATGTCAAAATCAACTGGGATAACATACAAGAATATTATTATAAAAACTTCAAAAAAATGGACACAGACAATCTCAC
CCCATATGACTACTCCTCTGTGATGCAATATGGAAAAACTGCCTTTGGAAAGAACAGGGCAGAATCCATCACTCCTATCC

This becomes very useful, for example, if you have a comma-separated value document and want to turn it into a tab-separated value document. In that case you can pipe the document through the command:

tr "," "\t"

and the output is a tab-separated value document.
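
Because tr only reads from standard input, it needs to be given the file with < or through a pipe. A minimal sketch, assuming a made-up two-line file table.csv in a scratch directory:

```shell
workdir=$(mktemp -d)
printf 'gene,length\nabc,120\n' > "$workdir/table.csv"   # hypothetical input

# tr reads standard input, so feed it the file with < and save the result with >.
tr "," "\t" < "$workdir/table.csv" > "$workdir/table.tsv"
tsv=$(cat "$workdir/table.tsv")

echo "$tsv"
rm -r "$workdir"
```

Keep in mind that tr replaces every comma it sees, including any that sit inside quoted fields.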

5.8.3. wc

Word count (wc) is a useful tool and we can use it in a variety of ways.

zcat < fhet.tr.fna.gz | grep -v "^>" | wc -m
 131885065

What I’ve done above is use grep -v to remove any line that begins with > (the pattern ^> means “> at the start of a line”), leaving only the lines that hold sequence and not the names of each entry. Then I’ve told wc to give me a count of all the characters (-m) present in those lines. Note that wc -m also counts the newline character at the end of every line, so the result, roughly 132 million characters, is a slight overestimate of the size of the transcriptome.
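
The effect of the newline characters is easy to see on a tiny example. The sketch below uses a made-up record with six bases on two lines and adds tr -d "\n" to delete the newlines before counting.

```shell
# Hypothetical record: a header plus 6 bases split over two lines.
with_newlines=$(printf '>seq1\nACGT\nAC\n' | grep -v "^>" | wc -m | tr -d ' ')

# Deleting the newlines first leaves only the bases to be counted.
bases_only=$(printf '>seq1\nACGT\nAC\n' | grep -v "^>" | tr -d "\n" | wc -m | tr -d ' ')

echo "with newlines: $with_newlines, bases only: $bases_only"
```

The first count is 8 (six bases plus two newlines) and the second is 6.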

I suggest using man or --help to explore these commands further!