Working at the command line with Bash
Session Overview
The purpose of this session is to provide familiarity and comfort with the Unix shell for the purposes of working with the course material. It is not meant to be a comprehensive lesson. For more in-depth instruction on using Bash, the default Unix shell, please see the Software Carpentry lesson, “The Unix Shell”.
You may see advanced commands in the workshop that are not covered here because of time. If you are curious about what they do, ask one of the instructors or helpers, use the man command (covered in Section 1), or try an online resource such as Stack Overflow (or Google!).
Details of the individual session components are included below:
1. Getting started with the shell
2. Creating and editing text files
3. Running commands and managing output
4. Variables
5. Wildcards
6. Loops and scripts
1. Getting started with the shell
The job of a shell program is to provide a text-based environment for viewing files and directories, running programs and pipelines, and monitoring program status and output. The shell that we will be using in this workshop is called Bash.
After logging in to your instance for this workshop, you should see a Bash prompt, where you can input commands:
In the example above, you’ll see that two commands have been entered. The first line is empty and just shows the prompt, which is indicated by the text ending in $. The command ls was input in the second line, and the output of that command is shown directly below. The next line shows ls being used with an option, -p. Note how the output has been modified.
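To try the same comparison safely, here is a minimal sketch that builds a scratch directory first. The directory names are borrowed from the output discussed in this section; the file notes.txt is hypothetical and is there only for contrast.

```shell
# Work in a throwaway directory so nothing important is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"
mkdir anaconda2 local MCA R   # directory names borrowed from this section
touch notes.txt               # hypothetical file, added only for contrast

ls      # names only: files and directories look the same
ls -p   # directories are now marked with a trailing slash
```

The order in which ls prints the names can differ between systems, so compare the two listings item by item rather than position by position.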
So what does ls do?
ls is an abbreviation for “list”, and its purpose is to list the contents of a directory. In most operating systems, files are organized by a hierarchical directory structure. Directories are often called folders and are represented by a file folder in graphical operating system interfaces. Directories can contain both files and subdirectories, the latter of which may also contain additional files and subdirectories, and so on. Consider the following example in macOS:
Here, we are looking at the contents of a directory/folder called “microbiome-workshop-2018” (abbreviated as “microbio…hom-2018” by the browser). This directory contains several files (e.g., “donate.md”) and several subdirectories (e.g., “images”); it is itself contained within a parent directory called “Dev”, which is contained in its own parent directory called “djme”, and so on.
When we ran ls above, we showed the contents of a directory. But that was not terribly informative, because the output did not tell us whether the items listed were files or subdirectories. To get that information, we had to modify the ls command. By typing -p after ls, we changed the way that ls displays output. In this case, every item was printed with a trailing forward slash, /, which is the character that separates directory names in a path. The -p option appends the / only to directories, so we learned that anaconda2/, local/, MCA/, and R/ are all subdirectories. But subdirectories of what?
Whenever you are working at the prompt in the shell, you are always working “inside” of a directory. But the directory that you are in is not always obvious, especially when you first start up the shell. To determine your current working directory, you can use the command pwd, which is short for “print working directory”.
In this example, pwd returned a path: /home/ubuntu. This path tells us that we are working in a directory called ubuntu, which itself is contained within a parent directory called home. Note that home is the first listed directory in the path. This means that the parent directory of home is the root directory, the very base of the directory structure, which is written as a single /.
We can change our working directory with the command cd, which stands for “change directory”:
You can also change into the parent directory using cd .. as follows:
Finally, you can change directly to any directory by providing its full path:
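The three ways of moving around can be sketched together. The nested directories projects and data are hypothetical, and the scratch path stored in base will differ on every run.

```shell
base=$(mktemp -d)                # scratch area; the path differs on every run
mkdir -p "$base/projects/data"   # hypothetical nested directories

cd "$base/projects"   # change into a directory by giving its path
pwd
cd data               # a bare name is relative to where you are now
pwd
cd ..                 # move up one level, back to projects
pwd
cd "$base"            # a full path works from anywhere
pwd
```

Running pwd after each cd is a cheap way to confirm where you ended up.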
Challenge 1.1: Determine what cd ../.. did in the above example.
Challenge 1.2: Note how the text to the left of $ has been changing. What do you think ~ means?
In the ls examples, we have been using an option, -p, to indicate which items in the directory are subdirectories. There are actually many different options we can use to modify the behavior of ls. For example, we can list directory contents in the “long” format using -l.
Note that in the long format, there is one file listed per line, and each file has some associated information listed in columns. The first four columns won’t be covered here. The fifth column gives the file size in bytes. The sixth column gives the date the file was last modified. The last column lists the file name.
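As a small illustration of the long format, the sketch below creates a hypothetical file, poem.txt, whose size is easy to verify by hand: 13 characters plus a newline, so 14 bytes.

```shell
cd "$(mktemp -d)"
printf 'Roses are red\n' > poem.txt   # hypothetical file: 13 characters + newline = 14 bytes

# Long format: one line per file.
# The fifth column is the size in bytes and the last column is the file name.
ls -l poem.txt
```

The owner, group, and date shown in the other columns will depend on your own account and clock.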
Most commands have options, and they almost always start with - or --. To look at available options for a command, and to find other useful information, use man:
This will take you to the manual page for the command. To exit, type q.
To create a new directory, use mkdir, which stands for “make directory”:
cp and mv can be used to copy and move files, respectively. To test this, first create a ‘blank’ file using touch:
cp and mv work similarly in that they take two strings of text, called arguments: the source file and the destination. For example:
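A minimal sketch of these commands working together, run in a scratch directory. The names test1, test2, file1, and file2 are assumptions chosen for illustration.

```shell
cd "$(mktemp -d)"
mkdir test1 test2    # hypothetical directory names
touch test1/file1    # create an empty file

cp test1/file1 test1/file2   # copy: file1 stays put and file2 is a duplicate
mv test1/file2 test2/        # move: file2 leaves test1 and lands in test2

ls test1   # file1
ls test2   # file2
```

In both commands the first argument is the source and the second is the destination.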
Challenge 1.3: Create a new folder ~/test3 and move both files into it.
Challenge 1.4: Try renaming file2 to something else. Hint: think about what mv does!
Files can be removed with rm:
Directories can be removed with rmdir:
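Both removals can be sketched in a scratch directory. The names old_results and draft.txt are hypothetical. Keep in mind that rm deletes immediately; there is no trash folder to recover from.

```shell
cd "$(mktemp -d)"
mkdir old_results             # hypothetical directory
touch old_results/draft.txt   # hypothetical file

rm old_results/draft.txt   # delete the file (this cannot be undone)
rmdir old_results          # delete the directory, which is now empty
ls                         # prints nothing: both are gone
```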
Challenge 1.5: What happens when you try to remove a directory with rm?
Challenge 1.6: What happens when you try to rmdir a directory that contains a file?
2. Creating and editing text files
The Bash shell gives us access to several useful Unix utilities for working with text and text files. We’ll start with a very simple command called echo, which simply repeats back the text that it is given as an argument. For example:
Here, the echo command has taken the input text and directed it to our screen as Standard Output, or stdout. We can redirect stdout to a file using the > character. For example:
Challenge 2.1: Create a text file called text_file1.txt that contains the line “Roses are red”.
Challenge 2.2: Try viewing the content of your file with the cat command: cat text_file1.txt.
Now what if you want to edit the file you just created? For this, we will use a basic text editor called Nano. For details on how to use Nano, see the online documentation. One nice thing about Nano is that it gives you some command shortcuts right on the screen when you have it open. Note that the caret symbol (^) indicates that you should hold the control key while pressing the associated letter key. For example, ^X means that you should hold control and press the X key.
Challenge 2.3: Add a new line to your document: “Violets are blue”.
Challenge 2.4: Try saving the document, closing it, and re-opening it.
Challenge 2.5: Create a second file called text_file2.txt that contains the rest of our poem:
There are trillions of bacteria
Living on you!
You used the cat command above to view the contents of text_file1.txt. cat is short for concatenate, because it can operate on multiple files to concatenate the contents. For example:
By redirecting the stdout from cat to a file, you can create a new text file that is a concatenation of the input text files:
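Both uses of cat can be sketched as follows. The two input files are recreated with printf so that the example stands alone, and the output name poem.txt is an assumption.

```shell
cd "$(mktemp -d)"
# Recreate the two poem files from the challenges above.
printf 'Roses are red\nViolets are blue\n' > text_file1.txt
printf 'There are trillions of bacteria\nLiving on you!\n' > text_file2.txt

cat text_file1.txt text_file2.txt              # prints all four lines, in order
cat text_file1.txt text_file2.txt > poem.txt   # hypothetical output file name
cat poem.txt                                   # the combined poem
```

The order of the arguments determines the order of the text in the combined output.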
3. Running commands and managing output
As mentioned above, Bash gives you access to dozens of small programs that are very useful for dealing with text files. Because these tools are at their most powerful when working with large text files, let’s grab one using wget (“World Wide Web get”).
Note that the -h option was used with ls in the above example. This option makes the file size information “human readable”. You can see that the file that was downloaded, 100-0.txt, is about 5.6 megabytes. That’s a big text file! Using cat on this file is not very useful to inspect its contents (give it a try and you’ll see).
One nice option for browsing very large text files is less (usage example: $ less 100-0.txt; to exit, type the letter q). This displays one screen’s worth of the file contents and allows you to scroll through. However, this is still inefficient, depending on what you want to do with the text.
If you just want to check out the first few lines of text, you can try head:
You can specify the number of lines you want to inspect by supplying an integer as an option:
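To see both forms without downloading anything, this sketch uses a hypothetical 100-line file, numbers.txt, generated with seq (a utility not covered in this lesson that simply prints a range of numbers).

```shell
cd "$(mktemp -d)"
seq 1 100 > numbers.txt   # hypothetical file: the numbers 1 to 100, one per line

head numbers.txt        # the first 10 lines, which is the default
head -n 3 numbers.txt   # only the first 3 lines
```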
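Another very useful command for inspecting a text file is wc, which stands for “word count”. By default, it lists the number of lines, words, and bytes in your text file (in plain ASCII text, one byte is one character):
Note that options can be used to return only the lines (-l), words (-w), or bytes (-c) in the file. If your text contains multi-byte characters, -m counts characters instead of bytes.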
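A small file makes the counts easy to check by hand. This sketch assumes a hypothetical two-line file, poem.txt, containing 6 words and 31 bytes.

```shell
cd "$(mktemp -d)"
printf 'Roses are red\nViolets are blue\n' > poem.txt   # hypothetical file: 2 lines, 6 words, 31 bytes

wc poem.txt      # all three counts, followed by the file name
wc -l poem.txt   # lines only
wc -w poem.txt   # words only
wc -c poem.txt   # bytes only
```

The amount of padding in front of each number varies between systems, but the counts themselves should match.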
grep is a particularly powerful command, because it allows you to filter lines of text using input strings. For example, if I want to return only lines from 100-0.txt that contain the word “needle”, I can do this:
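The same kind of search can be tried on a tiny stand-in file. The file sample.txt and its three lines are assumptions made up for this sketch.

```shell
cd "$(mktemp -d)"
# Hypothetical three-line sample file.
printf 'a needle in a haystack\nno match here\nNeedle with a capital N\n' > sample.txt

grep needle sample.txt   # prints only the first line
```

The third line is not returned because grep is case-sensitive by default, so “Needle” does not match “needle”.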
Challenge 3.1: Create a new file called Romeo.txt that contains only lines from 100-0.txt with the word “Romeo”.
Challenge 3.2: How many lines, words, and characters are in Romeo.txt?
grep also has an option, -v, that will return only lines that don’t contain the input string. For example, if I want to only return lines from Romeo.txt that do not contain the letter “t”, I could do this:
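A minimal sketch of the inverted search, using a hypothetical three-line file, lines.txt, in place of Romeo.txt:

```shell
cd "$(mktemp -d)"
# Hypothetical sample; only the middle line contains a lowercase "t".
printf 'Romeo, Romeo\nwherefore art thou\nO Romeo\n' > lines.txt

grep -v t lines.txt   # prints the first and third lines, dropping the middle one
```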
Thus far, we have been using only one command at a time. However, the true power of Unix-based operating systems comes from the ability to string multiple commands together in succession. This is called a pipeline and is achieved by redirecting the output of one command to the input of another command through the use of the pipe character, |.
Let’s take a look at an example using grep.
In this example, we ran the grep Romeo 100-0.txt command, which searches the file 100-0.txt for the text “Romeo”. Normally, this would send the output to your screen through stdout. However, the pipe | redirects that output to a second grep, which is searching for “love”. Notice that no file is specified for the second command. This is because it is instead operating on standard input, or stdin. So what the | actually does is take the stdout of one command and send it to another command as stdin.
Another way to accomplish the above example is by starting with cat:
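Both forms of the pipeline can be sketched on a small stand-in file. The file play.txt and its contents are assumptions made up for illustration.

```shell
cd "$(mktemp -d)"
# Hypothetical three-line sample file.
printf 'Romeo speaks of love\nRomeo draws his sword\nJuliet speaks of love\n' > play.txt

grep Romeo play.txt | grep love         # the second grep reads the first grep's output
cat play.txt | grep Romeo | grep love   # same result, with cat feeding the pipeline
```

Each pipeline prints only “Romeo speaks of love”, the one line that contains both words.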
Now we can start to combine the commands we have learned to accomplish some pretty interesting tasks. For example, you can combine grep with wc to count instances of a word. To understand how to do this, you first need to know that the -o option of grep will return only the matching text, one match per line:
So if you want to count the number of instances of the word “orange” in the complete works of Shakespeare (which is what the 100-0.txt file actually is, in case you have not yet noticed), you can do the following:
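The difference between counting matching lines and counting matches shows up clearly on a small example. The file fruit.txt is hypothetical; its first line contains the word twice.

```shell
cd "$(mktemp -d)"
# Hypothetical sample: "orange" appears twice on line 1 and once on line 3.
printf 'an orange and another orange\nno fruit here\none more orange\n' > fruit.txt

grep orange fruit.txt | wc -l      # 2: the number of lines that contain "orange"
grep -o orange fruit.txt | wc -l   # 3: every occurrence, because -o prints one match per line
```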
Challenge 3.3: How many lines of 100-0.txt contain “trouble”?
Challenge 3.4: By combining commands with pipes, come up with a way to count the number of .txt files in a directory.
4. Variables
Text can be assigned to variables and used later in Bash. For example:
In this example, we assigned the text "Hello" to the variable greeting. Notice that to use the variable later with echo, we had to use the $ character. You can use variables in combination with other input too:
However, whitespace matters when referencing variables:
Notice that in the first example, everything worked as expected, but not in the second. This is because $greetingHow was seen as one variable, which was empty, so echo only printed “are you?”. In these cases, you can use curly braces, {}, to clarify which text belongs to the variable name:
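The three cases can be tried side by side. This is a minimal sketch; the exact sentences are assumptions modeled on the names used above, and the text is quoted so that the ? is not treated as a wildcard.

```shell
greeting="Hello"   # no spaces are allowed around the = sign

echo "$greeting How are you?"    # Hello How are you?
echo "$greetingHow are you?"     # prints " are you?" because $greetingHow is not set
echo "${greeting}How are you?"   # HelloHow are you?
```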
Because commands in Bash are text, you can store commands in variables too:
In this example, the text “pwd” was stored in the variable $MyDirectory, so when $MyDirectory was used on the next line, Bash automatically substituted “pwd” at the command line, which led to the execution of the pwd command. When we changed to the parent directory with cd .., the output of pwd changed.
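The behavior described above can be reproduced in a scratch directory. The subdirectory name inner is hypothetical.

```shell
cd "$(mktemp -d)"
mkdir inner   # hypothetical subdirectory
cd inner

MyDirectory=pwd   # stores the three letters p-w-d, not a path
$MyDirectory      # Bash substitutes "pwd" and runs it
cd ..
$MyDirectory      # pwd runs again, so the output reflects the new location
```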
Sometimes, you might want to assign the output of a command to a variable, not the actual command itself. In this case, you can use backticks, `:
In this example, the backticks resulted in pwd being run, and the output, which was the text “/home/ubuntu/text_files”, was saved in the variable $homebase. When $homebase was used by itself on the next line, Bash gave an error, because Bash does not take paths as commands by themselves. By using echo, we verified which text was being stored in $homebase. This text did not change when we changed directories. Thus when $homebase was used with cd, we changed to the directory that we were working in when we assigned the output of pwd to $homebase in the first place.
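The same sequence can be sketched in a scratch directory, so the stored path will differ from /home/ubuntu/text_files. The step that triggers the error is left out so that the sketch runs cleanly.

```shell
scratch=`mktemp -d`   # scratch area; the path differs on every run
cd "$scratch"
mkdir text_files
cd text_files

homebase=`pwd`    # pwd runs right now and its output is stored
echo $homebase    # a path ending in /text_files
cd ..
echo $homebase    # unchanged: the variable holds text, not a live command
cd "$homebase"    # return to the stored location
pwd
```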
5. Wildcards
Wildcards are special characters in Bash that can stand in for other characters. Two important wildcards are * and ?. The * can stand in for any number of characters, whereas the ? can stand in for any single character. For example:
In the example above, by using the ? character, text_file?.txt was expanded to text_file1.txt text_file2.txt, so cat returned the text of both of them.
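One way to see what a wildcard expands to is to hand the pattern to echo before using it with another command. In this sketch the file notes.txt and the file contents are assumptions.

```shell
cd "$(mktemp -d)"
echo "Roses are red" > text_file1.txt
echo "Living on you!" > text_file2.txt
echo "not part of the poem" > notes.txt   # hypothetical extra file

echo text_file?.txt   # text_file1.txt text_file2.txt
echo *.txt            # notes.txt text_file1.txt text_file2.txt
cat text_file?.txt    # prints the contents of the two matching files
```

Bash performs the expansion before the command runs, so cat never sees the ? at all.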
In the above example, rm text* expanded to rm text_file1.txt text_file2.txt, so both of those files were removed. rm *.txt removed all of the files ending in .txt, which in this case was all of them.
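Because rm cannot be undone, it can help to preview a wildcard with echo first. A minimal sketch, with hypothetical file names:

```shell
cd "$(mktemp -d)"
touch text_file1.txt text_file2.txt poem.txt notes.txt   # hypothetical files

echo rm text*   # preview only: prints "rm text_file1.txt text_file2.txt"
rm text*        # removes the two files that start with "text"
rm *.txt        # removes the remaining .txt files
ls              # prints nothing: the directory is empty
```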
To finish this section, let’s now delete this empty directory:
6. Loops and scripts
We’ve explored how Bash commands can be combined through the use of pipes, but we can also script a series of commands together to perform a task. Let’s create a simple Bash script in nano:
There is a lot going on in that script, so let’s walk through it. This script contains a single FOR loop. This is a special construct for performing an iterative task. This particular FOR loop does the following:
1) It assigns a value to the variable $NUM from a list of values created by {1..10}. Bash knows that {1..10} means “create a list from 1 to 10”. The first time through the loop, $NUM takes on the first value in the list, so the number 1. The next time through the loop, $NUM takes on the value 2, and so on.
2) Using the current value for $NUM, the code between do and done is executed.
3) If there are any values left in the {1..10} list, the do/done block will evaluate again with the next value. If the list has been completely run through, the loop finishes.
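A script matching the three steps above might look like the following sketch. The loop body, echo $NUM, is an assumption; the walkthrough only requires that some code sits between do and done. The file is written with a here-document (not covered in this lesson) purely so that the sketch stands alone; in the lesson you would type the same five lines into nano.

```shell
cd "$(mktemp -d)"
# Write the script to a file named count_ten.sh.
cat > count_ten.sh << 'EOF'
#!/bin/bash
for NUM in {1..10}
do
    echo $NUM
done
EOF

cat count_ten.sh   # show what was written
```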
We can see this in action, but to do so, we first need to indicate that count_ten.sh is an executable script. To do that, we use the chmod command. Don’t worry too much about the details of what is going on with the chmod command itself; just note the change that it causes:
Notice that the first column changed due to the chmod 775 command. Specifically, an x appeared in the 4th, 7th, and 10th positions. This means that the script is now eXecutable by the file owner, anyone in the file owner’s group, and any user in any group, respectively. Making a file executable will also often make the file name print in a different color when you inspect a directory with ls.
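The effect of chmod 775 can be observed on any script. This sketch uses a hypothetical stand-in, hello.sh, rather than count_ten.sh.

```shell
cd "$(mktemp -d)"
printf '#!/bin/bash\necho "script ran"\n' > hello.sh   # hypothetical stand-in script

ls -l hello.sh       # before: no x in the first column (typically -rw-r--r--)
chmod 775 hello.sh
ls -l hello.sh       # after: the first column begins with -rwxrwxr-x
./hello.sh           # an executable script can be run by giving its path
```

The permissions shown before chmod depend on your system's defaults, but the result after chmod 775 should be the same everywhere.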
Let’s take a look at one final example script:
Final Challenge: Delete folder1-5 to clean up after the lesson!