This is the second lesson of our course in "Engineering Computations." For the first lesson, Interacting with Python, you could use IPython, the interactive Python shell, or Jupyter.
From now on, we will be using Jupyter notebooks. This very lesson is written in a Jupyter notebook. You will love it.
Jupyter is an open-source project that develops tools for interactive and exploratory computing. These tools allow you to write, run, and share code in different languages: Python, Julia, and R, among others. You work right on your browser, which becomes the user interface through which Jupyter gives you the ability to manage, view and edit files, and a document format: the notebook.
A Jupyter notebook can contain: input and output of code, formatted text, images, videos, pretty math equations, and much more. The computer code is executable, which means that you can run the bits of code, right in the document, and get the output of that code displayed for you. This interactive way of computing, mixed with the multi-media narrative, allows you to tell a story (even to yourself) with extra powers!
Project Jupyter includes several pieces that you will want to know about. Let's do a quick tour.
Jupyter Notebook is a web application for creating and working with interactive documents mixing code, output (including visualizations), and explanatory text. It has been widely adopted by the scientific and data science communities, and integrates with various other tools and platforms.
The latest version of the Notebook application (as of 2023) is Notebook 7, with many new features over the previous version (v6, a.k.a. "classic"), including a debugger, a table of contents, real-time collaboration, accessibility improvements, and more.
JupyterLab is a full-featured development environment for Jupyter notebooks and associated files (like data for analysis). In one browser tab, it shows several tiled areas for working with documents, browsing files, launching apps in sub-tabs, and selecting commands from various menus. It is modular and customizable, allowing you to arrange multiple notebooks, terminals, editors, consoles, and other components in a single workspace. You can also drag and drop cells between notebooks, or copy and paste output from one notebook to another.
JupyterLab works with the Notebook 7 format by default. You can switch between Notebook 7 and Notebook 6 modes using the menu or toolbar options in JupyterLab.
For this course, we recommend working in the more focused and simple Notebook interface. More advanced users will likely prefer the Lab interface.
JupyterHub is a web application for institutions to host and manage multiple Jupyter notebook servers for different users. It is useful for providing a pre-configured data science environment to a group of students, researchers, or data scientists, without requiring them to install or maintain any software on their own machines. You access JupyterHub from the browser with your institutional login.
At the George Washington University, we provide a JupyterHub for teaching purposes at: go.gwu.edu/jupyter.
Jupyter Desktop is an application that lets you run JupyterLab without using a web browser or a command line. It is a convenient and easy way to get started with Jupyter notebooks on your own computer, without having to install or configure anything. You can download Jupyter Desktop from GitHub and then launch it by clicking its icon in the customary way. Like all the Jupyter tools, it's free!
Nbviewer is a free web service that allows you to share static versions of hosted notebook files, as if they were a web page. If a notebook file is publicly available on the web, you can view it by entering its URL in the nbviewer web page, and hitting the Go! button. The notebook will be rendered as a static page: visitors can read everything, but they cannot interact with the code.
Some things about working with Jupyter could be counter-intuitive to you at first. For example, you may have to interact with the command line, and documents have two types of content (code and markdown) that behave a bit differently. The fact that your browser is an interface to a compute engine (called "kernel") leads to some extra housekeeping (like shutting down the kernel). But you'll get used to it pretty quickly!
The standard way to start JupyterLab is to type the following in the command-line interface:
jupyter lab
Hit enter and voilà! After a little setup time, your default browser will open the JupyterLab app in a new tab. It should look like the screenshot below.
Don't close the terminal window where you launched JupyterLab (while you're still working on Jupyter). If you need to do other tasks on the command line, open a new terminal window.
To start a new Jupyter notebook in JupyterLab, click on the tile that says Python 3 under the "Notebook" heading. You can see it in the middle of the screenshot above.
A new tab will appear within JupyterLab and you will see an empty notebook, with a single input line, waiting for you to enter some code. See the next screenshot.
The notebook opens with a single empty code cell. Try to write some Python code there and execute it by hitting [shift] + [enter].
You can switch to the Notebook interface by clicking the "Open in" menu option, as indicated with a red arrow and oval on the figure.
The Jupyter notebook uses cells: blocks that divide chunks of text and code. Any text content is entered in a Markdown cell: it contains text that you can format using simple markers to get headings, bold, italic, bullet points, hyperlinks, and more.
Markdown is easy to learn: check out the syntax on the "Daring Fireball" webpage (by John Gruber). A few tips:
# Title
## Heading
*italic*
or _italic_
**bolded**
[hyperlinked text](url)
Computable content is entered in code cells. We will be using the IPython kernel ("kernel" is the name used for the computing engine), but you should know that Jupyter can be used with many different computing languages. It's amazing.
A code cell will show you an input mark, like this:
In [ ]:
Once you add some code and execute it, Jupyter will add a number ID to the input cell, and produce an output marked like this:
Out [1]:
Markdown was co-created by the legendary but tragic Aaron Swartz. The biographical documentary about him is called "The Internet's Own Boy," and you can view it on YouTube or Netflix. Recommended!
Look at the icons on the top of the Jupyter notebook (see the screenshots above). The first icon on the left (an old-fashioned floppy disk) is for saving your notebook. You can add a new cell with the big + button. Then you have the cut, copy, and paste buttons. Then you have a button to "run" a code cell (execute the code), the square icon means "stop" and the swirly arrow is to "restart" your notebook's kernel (if the computation is stuck, for example). Next to that, you have the cell-type selector: Code, Markdown, Raw.
You can test-drive a code cell by writing some arithmetic operations. Like we saw in our first lesson, the Python operators are:
+ - * / ** % //
There's addition, subtraction, multiplication and division. The last three operators are exponent (raise to the power of), modulo (divide and return remainder) and floor division.
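To get a feel for the last three operators, here is a small sketch you could type into a code cell. The variable names and values are our own choice for illustration, not part of the lesson.

```python
# Illustrative values (our own choice, not from the lesson)
a = 17
b = 5

power = a ** 2       # exponent: 17 raised to the power of 2
remainder = a % b    # modulo: remainder of dividing 17 by 5
quotient = a // b    # floor division: whole number of times 5 fits in 17

print(power, remainder, quotient)

# Floor division and modulo fit together: quotient * b + remainder gives back a
print(quotient * b + remainder == a)
```

The last line is a handy way to remember how `//` and `%` relate to each other.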
Typing [shift] + [enter] will execute the cell and give you the output in a new line, labeled Out[1] (the numbering increases each time you execute a cell).
Add a cell with the plus button, enter some operations, and press [shift] + [enter] to execute.
Everything we did using IPython we can do in code cells within a Jupyter notebook. Try out some of the things we learned in lesson 1:
print("Hello World!")
Hello World!
x = 2**8
x < 64
False
Once you click on a notebook cell to select it, you may interact with it in two ways, which are called modes. Later on, when you are reviewing this material again, read more about this in Reference 1.
Edit mode:
We enter edit mode by pressing Enter or double-clicking on the cell.
We know we are in this mode when we see a cell border and a prompt in the cell area.
When we are in edit mode, we can type into the cell, like a normal text editor.
Command mode:
We enter command mode by pressing Esc or clicking outside the cell area.
We know we are in this mode when we see a grey cell background.
In this mode, certain keys are mapped to shortcuts to help with common actions.
You can find a list of the shortcuts by selecting Help->Show Keyboard Shortcuts from the Jupyter menu bar. You may want to leave all this for later, and come back to it, but it becomes more helpful the more you use Jupyter.
Closing the tab where you've been working on a notebook does not immediately "shut down" the compute kernel. So you sometimes need to do a little housekeeping.
After closing a notebook, you will see in the file list that your notebook file has a green bullet next to it. You can right-click the file name, and Shut Down kernel. You don't need to do this all the time, but if you have a lot of notebooks running, they will use resources in your machine.
Similarly, Jupyter is still running even after you close the tab that has it open. To exit the Jupyter app, go to the terminal that you used to open Jupyter and type [Ctrl] + [c].
Let's keep playing around with strings, but now coding in a Jupyter notebook (instead of IPython). We recommend that you open a clean new notebook to follow along with the examples in this lesson, typing the commands that you see. (If you copy and paste, you will save time, but you will learn little. Type it all out!)
str_1 = 'hello'
str_2 = 'world'
Remember that we can concatenate strings ("add"), for example:
new_string = str_1 + str_2
print(new_string)
helloworld
What if we want to add a space that separates hello from world? We directly add the string ' ' in the middle of the two variables. A space is a character!
my_string = str_1 + ' ' + str_2
print(my_string)
hello world
Create a new string variable that adds three exclamation marks to the end of my_string.
We can access each separate character in a string (or a continuous segment of it) using indices: integers denoting the position of the character in the string. Indices go in square brackets, touching the string variable name on the right. For example, to access the 1st element of my_string, we would enter my_string[0]. Yes! In Python we start counting from 0.
my_string[0]
'h'
#If we want the 3rd element we do:
my_string[2]
'l'
You might have noticed that in the cell above we have a line before the code that starts with the # sign. That line seems to be ignored by Python: do you know why?
It is a comment: whenever you want to comment your Python code, you put a # in front of the comment. For example:
my_string[1] #this is how we access the second element of a string
'e'
How do we know the index of the last element in the string?
Python has a built-in function called len() that gives the length of an object. Let's try it:
len(my_string)
11
Great! Now we know that my_string is eleven characters long. What happens if we enter this number as an index?
my_string[11]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-19e2c11e7861> in <module>()
----> 1 my_string[11]

IndexError: string index out of range
Oops. We have an error: why? We know that the length of my_string is eleven. But the integer 11 doesn't work as an index. If you expected to get the last element, it's because you forgot that Python starts counting at zero. Don't worry: it takes some getting used to.
The error message says that the index is out of range: this is because the index of the last element will always be len(string) - 1. In our case, that number is 10. Let's try it out.
my_string[10]
'd'
Python also offers a clever way to grab the last element so we don't need to calculate the length and subtract one: it is using a negative 1 for the index. Like this:
my_string[-1]
'd'
What if we use a -2 as index?
my_string[-2]
'l'
That is the last l in the string hello world. Python is so clever, it can count backwards!
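To connect negative indices with what we learned about len(), here is a minimal sketch. It assumes the same my_string as above; the names n and first are our own.

```python
my_string = 'hello world'

# A negative index -k refers to the same character as len(my_string) - k
n = len(my_string)
print(my_string[-1] == my_string[n - 1])
print(my_string[-2] == my_string[n - 2])

# The most negative valid index is -n, which points at the first character
first = my_string[-n]
print(first)
```

Going one step further back (index -n - 1) would give the same "out of range" error we saw before.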
Sometimes, we want to grab more than one single element: we may want a section of the string. We do it using slicing notation in the square brackets. For example, we can use [start:end], where start is the index to begin the slice, and end is the (non-inclusive) index to finish the slice. For example, to grab the word hello from our string, we do:
my_string[0:5]
'hello'
You can skip the start index, if you want to slice from the beginning of the string, and you can skip the end of a slice, indicating you want to go all the way to the end of the string. For example, if we want to grab the word 'world' from my_string, we could do the following:
my_string[6:]
'world'
A helpful way to visualize slices is to imagine that the indices point to the spaces between characters in the string. That way, when you write my_string[i], you would be referring to the "character to the right of i" (Reference 2).
Check out the diagram below. We start counting at zero; the letter 'g' is to the right of index 2. So if we want to grab the sub-string 'gin' from 'engineer', we need [start:end]=[2:5].
Try it yourself!
# Define your string
eng_string = 'engineer'
# Grab the 'gin' slice
eng_string[2:5]
'gin'
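The "indices point between characters" picture has two useful consequences, sketched below with the same eng_string. The names i, left, right and piece are our own, for illustration only.

```python
eng_string = 'engineer'

# Cutting at any position i and gluing the pieces back gives the original,
# because the end index is excluded from the first slice
i = 5
left = eng_string[:i]
right = eng_string[i:]
print(left, right)
print(left + right == eng_string)

# For indices within range, the length of [start:end] is end - start
piece = eng_string[2:5]
print(piece, len(piece))
```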
1. Define a string 'banana' and print out the first and last 'a'.
2. Using the same string, grab the two possible slices 'ana' and print them out.
The following lines contain the solutions; to reveal the answer, select the lines with the mouse:
Solution Exercise 1:
b = 'banana'
print(b[1])
print(b[-1])
Solution Exercise 2:
print(b[1:4])
print(b[3:])
Python has many useful built-in functions for strings. You'll learn a few of them in this section. A technical detail: in Python, some functions are associated with a particular class of objects (e.g., strings). The word method is used in this case, and we have a new way to call them: the dot operator. It is a bit counter-intuitive in that the name of the method comes after the dot, while the name of the particular object it acts on comes first. Like this: mystring.method().
If you are curious about the many available methods for strings, go to the section "Built-in String Methods" in this tutorial.
Let's use a quote by Albert Einstein as a string and apply some useful string methods.
AE_quote = "Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid."
The count() method gives the number of occurrences of a substring in a range. The arguments for the range are optional.
Syntax:
str.count(substring, start, end)
Here, start and end are integers that indicate the indices where to start and end the count. For example, if we want to know how many letters 'e' we have in the whole string, we can do:
AE_quote.count('e')
10
If we want to know how many of those 'e' characters are in the range [0:20], we do:
AE_quote.count('e', 0, 20)
2
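One way to convince yourself of what the range arguments do is to compare with slicing, as in this small sketch. It assumes the same AE_quote; the names in_range, in_slice and upper_e are our own.

```python
AE_quote = "Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid."

# Counting within a range of indices...
in_range = AE_quote.count('e', 0, 20)

# ...gives the same result as counting in the corresponding slice
in_slice = AE_quote[0:20].count('e')
print(in_range, in_slice)

# count() distinguishes upper case from lower case:
# the 'E' of 'Everybody' is not included in the counts above
upper_e = AE_quote.count('E')
print(upper_e)
```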
We can look for more complex strings, for example:
AE_quote.count('Everybody')
1
find() & index()
The find() method tells us if a string 'substr' occurs in the string we are applying the method on. The arguments for the range are optional.
Syntax:
str.find(substr, start, end)
Where start and end are indices indicating where to start and end the slice to apply the find() method on.
If the string 'substr' is in the original string, the find() method will return the index where the substring first starts; otherwise it will return -1.
For example, let's find the word "fish" in the Albert Einstein quote.
AE_quote.find('fish')
42
If we know the length of our sub-string, we can now apply slice notation to grab the word "fish".
len('fish')
4
AE_quote[42: 42 + len('fish')]
'fish'
Let's see what happens when we try to look for a string that is not in the quote.
AE_quote.find('albert')
-1
It returns -1… but careful, that doesn't mean that the position is at the end of the original string! If we read the documentation, we confirm that a returned value of -1 indicates that the sub-string we are looking for is not in the string we are searching in.
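Why does this matter? Because -1 is also a valid index, using the result of find() directly as an index can silently give a wrong answer. The sketch below shows the pitfall and a simple check; the names position, last_char and found are our own.

```python
AE_quote = "Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid."

position = AE_quote.find('albert')

# Pitfall: index -1 is the last character of the string, not a match
last_char = AE_quote[position]

# Compare with -1 to know whether the sub-string was really found
found = position != -1

print(position, last_char, found)
```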
A similar method is index(): it works like the find() method, but raises an error if the string we are searching for is not found.
Syntax:
str.index(substr, start, end)
AE_quote.index('fish')
42
AE_quote.index('albert')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-19ab4543c577> in <module>()
----> 1 AE_quote.index('albert')

ValueError: substring not found
In the example above, we used the len() function to calculate the length of the string 'fish', and we used the result to calculate the ending index. However, if the string is too long, having a line that calculates the length might be inconvenient or may make your code look messy. To avoid this, we can use the find() or index() methods to calculate the end position. In the 'fish' example, we could look for the index of the word 'by' (the word that follows 'fish') and subtract 1 from that index to get the index that corresponds to the space right after 'fish'. There are many ways to slice strings, only limited by your imagination!
Remember that the ending index is not inclusive, which is why we want the index of the space that follows the string 'fish'.
idx_start = AE_quote.index('fish')
idx_end = AE_quote.index('by') - 1 # -1 to get the index of the space after 'fish'
AE_quote[idx_start:idx_end]
'fish'
1. Use the count() method to count how many letters 'a' are in AE_quote?
2. […] 'a' are in AE_quote?
3. Use the index() method to find the position of the words 'genius', 'judge' and 'tree' in AE_quote.
4. […] AE_quote.
strip()
A few more string methods are useful when you are working with texts and you need to clean, separate or categorize parts of the text.
Let's work with a different string, a quote by Eleanor Roosevelt:
ER_quote = " Great minds discuss ideas; average minds discuss events; small minds discuss people. "
Notice that the string we defined above contains extra white spaces at the beginning and at the end. In this case, we did it on purpose, but bothersome extra spaces are often present when reading text from a file (perhaps due to paragraph indentation).
Strings have a method that allows us to get rid of those extra white spaces.
The strip() method returns a copy of the string in which all characters given as argument are stripped from the beginning and the end of the string.
Syntax:
str.strip([chars])
By default (with no argument), strip() removes whitespace characters, such as spaces, tabs and newlines. For example, if we want to remove the white spaces in the ER_quote, and save the result back in ER_quote, we can do:
ER_quote = ER_quote.strip()
ER_quote
'Great minds discuss ideas; average minds discuss events; small minds discuss people.'
Let's suppose you want to strip the period at the end; you could do the following:
ER_quote = ER_quote.strip('.')
But if we don't want to keep the changes in our string variable, we don't overwrite the variable as we did above. Let's just see how it looks:
ER_quote.strip('.')
'Great minds discuss ideas; average minds discuss events; small minds discuss people'
Check the string variable to confirm that it didn't change (it still has the period at the end):
ER_quote
'Great minds discuss ideas; average minds discuss events; small minds discuss people.'
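Two details about strip() are easy to miss, so here is a hedged sketch with a made-up string (raw is our own example, not from the lesson). It also uses lstrip() and rstrip(), two related string methods that this lesson does not cover.

```python
raw = '  ..Great minds..  '   # made-up string for illustration

print(raw.strip())    # whitespace removed from both ends
print(raw.lstrip())   # only from the left end
print(raw.rstrip())   # only from the right end

# The argument is a set of characters to remove, not a single sub-string:
# here both spaces and periods are stripped, in whatever order they appear
cleaned = raw.strip(' .')
print(cleaned)
```

Notice that the space in the middle of 'Great minds' is untouched: stripping only acts on the two ends.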
startswith()
Another useful method is startswith(), to find out if a string starts with a certain character or sequence of characters.
Later on in this lesson we'll see a more interesting example; but for now, let's just "check" if our string starts with the word 'great'.
ER_quote.startswith('great')
False
The output is False because the word is not capitalized! Upper-case and lower-case letters are distinct characters.
ER_quote.startswith('Great')
True
It's important to mention that we don't need to match a whole word, up to the next white space: any sequence of characters at the start of the string will do.
ER_quote.startswith('Gre')
True
split()
The last string method we'll mention is split(): it returns a list of all the words in a string. We can also define a separator and split our string according to that separator, and optionally we can limit the number of splits to num.
Syntax:
str.split(separator, num)
print(AE_quote.split())
['Everybody', 'is', 'a', 'genius.', 'But', 'if', 'you', 'judge', 'a', 'fish', 'by', 'its', 'ability', 'to', 'climb', 'a', 'tree,', 'it', 'will', 'live', 'its', 'whole', 'life', 'believing', 'that', 'it', 'is', 'stupid.']
print(ER_quote.split())
['Great', 'minds', 'discuss', 'ideas;', 'average', 'minds', 'discuss', 'events;', 'small', 'minds', 'discuss', 'people.']
Let's split the ER_quote by a different character, a semicolon:
print(ER_quote.split(';'))
['Great minds discuss ideas', ' average minds discuss events', ' small minds discuss people.']
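The optional second argument was not used in the examples above, so here is a minimal sketch of it, together with a way to clean up the leading spaces left by the split. The names pieces and second are our own.

```python
ER_quote = 'Great minds discuss ideas; average minds discuss events; small minds discuss people.'

# Limit the number of splits to 1: we get at most two pieces
pieces = ER_quote.split(';', 1)
print(pieces)
print(len(pieces))

# Pieces that follow a semicolon keep their leading space; strip() removes it
second = ER_quote.split(';')[1].strip()
print(second)
```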
Do you notice something new in the output of the print() calls above?
What are those [ ]?
The square brackets above indicate a Python list. A list is a built-in data type consisting of a sequence of values, e.g., numbers, or strings. Lists work in many ways similarly to strings: their elements are numbered from zero, the number of elements is given by the function len(), they can be manipulated with slicing notation, and so on.
The easiest way to create a list is to enclose a comma-separated sequence of values in square brackets:
# A list of integers
[1, 4, 7, 9]
[1, 4, 7, 9]
# A list of strings
['apple', 'banana', 'orange']
['apple', 'banana', 'orange']
# A list with different element types
[2, 'apple', 4.5, [5, 10]]
[2, 'apple', 4.5, [5, 10]]
In the last list example, the last element of the list is actually another list. Yes! we can totally do that.
We can also assign lists to variable names, for example:
integers = [1, 2, 3, 4, 5]
fruits = ['apple', 'banana', 'orange']
print(integers)
[1, 2, 3, 4, 5]
print(fruits)
['apple', 'banana', 'orange']
new_list = [integers, fruits]
print(new_list)
[[1, 2, 3, 4, 5], ['apple', 'banana', 'orange']]
Notice that this new_list has only 2 elements. We can check that with the len() function:
len(new_list)
2
Each element of new_list is, of course, another list.
As with strings, we access list elements with indices and slicing notation. The first element of new_list is the list of integers from 1 to 5, while the second element is the list of three fruit names.
new_list[0]
[1, 2, 3, 4, 5]
new_list[1]
['apple', 'banana', 'orange']
# Accessing the first two elements of the list fruits
fruits[0:2]
['apple', 'banana']
From the integers list, grab the slice [2, 3, 4] and then [4, 5].
We can add elements to a list using the append() method: it appends the object we pass into the existing list. For example, to add the element 6 to our integers list, we can do:
integers.append(6)
Let's check that the integers list now has a 6 at the end:
print(integers)
[1, 2, 3, 4, 5, 6]
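Two things about append() often surprise beginners; the sketch below illustrates both. The variable result is our own, used only to show what append() gives back.

```python
integers = [1, 2, 3, 4, 5]

# append() changes the list in place and returns None,
# so there is no need to assign its result to a variable
result = integers.append(6)
print(result)
print(integers)

# Appending a list adds it as ONE new element (a nested list)
integers.append([7, 8])
print(integers)
print(len(integers))
```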
Checking for list membership in Python looks pretty close to plain English!
Syntax
To check if an element is in a list:
element in list
To check if an element is not in a list:
element not in list
'strawberry' in fruits
False
'strawberry' not in fruits
True
1. Add one or more new elements to the fruits list.
2. Check if 'mango' is in your new fruits list.
3. Given the list alist = [1, 2, 3, '4', [5, 'six'], [7]], run the following in separate cells and discuss the output with your classmates:
4 in alist
5 in alist
7 in alist
[7] in alist
We can not only add elements to a list, we can also modify a specific element. Let's re-use the list from the exercise above, and replace some elements.
alist = [1, 2, 3, '4', [5, 'six'], [7]]
We can find the position of a certain element with the index() method, just like with strings. For example, if we want to know where the element '4' is, we can do:
alist.index('4')
3
alist[3]
'4'
Let's replace it with the integer value 4:
alist[3] = 4
alist
[1, 2, 3, 4, [5, 'six'], [7]]
4 in alist
True
Replace the last element of alist with something different.
Being able to modify elements in a list is a "property" of Python lists; other Python objects we'll see later in the course also behave like this, but not all Python objects do. For example, you cannot modify elements in a string. If we try, Python will complain.
Fine! Let's try it:
string = 'This is a string.'
Suppose we want to replace the period ('.') by an exclamation mark ('!'). Can we just modify this string element?
string[-1]
'.'
string[-1] = '!'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-64-dbf68e37fb66> in <module>()
----> 1 string[-1] = '!'

TypeError: 'str' object does not support item assignment
Told you! Python is confirming that we cannot change the elements of a string by item assignment.
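So how do we get the string we wanted? One possible approach, using only slicing and concatenation from earlier in this lesson, is to build a new string. The name new_string below is our own.

```python
string = 'This is a string.'

# Slice off the last character and concatenate the replacement
new_string = string[:-1] + '!'
print(new_string)

# The original string is unchanged
print(string)
```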
You have learned many things about strings and lists in this lesson, and you are probably eager to see how to apply it all to a realistic situation. We created a full example in a separate notebook to show you the power of Python with text data.
But before jumping in, we should introduce you to the powerful ideas of iteration and conditionals in Python.
for statements
The idea of iteration (in plain English) is to repeat a process several times. If you have any programming experience with another language (like C or Java, say), you may have an idea of how to create iteration with for statements. But these are a little different in Python, as you can read in the documentation.
A Python for statement iterates over the items of a sequence, naturally. Say you have a list called fruits containing a sequence of strings with fruit names; you can write a statement like
for fruit in fruits:
to do something with each item in the list.
Here, for the first time, we will encounter a distinctive feature of the Python language: grouping by indentation. To delimit what Python should do with each fruit in the list of fruits, we place the next statement(s) indented from the left.
How much to indent? This is a style question, and everyone has a preference: two spaces, four spaces, one tab… they are all valid: but pick one and be consistent!
Let's use four spaces:
fruits = ['apple', 'banana', 'orange', 'cherry', 'mandarin']
for fruit in fruits:
    print("Eat your", fruit)
Eat your apple
Eat your banana
Eat your orange
Eat your cherry
Eat your mandarin
The for statement ends with a colon, :
The variable fruit is implicitly defined in the for statement.
fruit takes the (string) value of each element of the list fruits, in order.
The indented print() statement is executed for each value of fruit.
When there are no more elements left in fruits, it stops.
What is the value of the variable fruit after executing the for statement above? Discuss with your neighbor. (Confirm your guess in a code cell.)
A very useful function to use with for statements is enumerate(): it adds a counter that you can use as an index while your iteration runs. To use it, you implicitly define two variables in the for statement: the counter, and the value of the sequence being iterated on.
Study the following block of code:
names = ['sam', 'zoe', 'naty', 'gil', 'tom']
for i, name in enumerate(names):
    names[i] = name.capitalize()
print(names)
['Sam', 'Zoe', 'Naty', 'Gil', 'Tom']
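If you are curious about what enumerate() hands to the for statement, this small sketch makes it visible. The shortened names list and the variables pairs and position are our own; the (counter, value) pairs it prints are tuples, a data type this lesson does not cover.

```python
names = ['sam', 'zoe', 'naty']

# enumerate() produces (counter, value) pairs; list() lets us look at them
pairs = list(enumerate(names))
print(pairs)

# The counter starts at 0 by default; the optional start argument changes that
for position, name in enumerate(names, start=1):
    print(position, name)
```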
What is the value of the variable name after executing the for statement above? Discuss with your neighbor. (Confirm your guess in a code cell.)
Say we have a list of lists (a.k.a., a nested list), as follows:
fullnames = [['sam', 'jones'], ['zoe', 'smith'], ['joe', 'cheek'], ['tom', 'perez']]
Write some code that creates two simple lists: one with the first names, another with the last names from the nested list above, but capitalized.
To start, you need to create two empty lists using the square brackets with nothing inside. We've done that for you below. Hint: Use the append() list method!
fullnames = [['sam', 'jones'], ['zoe', 'smith'], ['joe', 'cheek'], ['tom', 'perez']]
firstnames = []
lastnames = []
# Write your code here
if statements
Sometimes we need the ability to check for conditions, and change the behavior of our program depending on the condition. We accomplish it with an if statement, which can take one of three forms.
(1) If statement on its own:
a = 8
b = 3
if a > b:
    print('a is bigger than b')
a is bigger than b
(2) If-else statement:
# We pick a number, but you can change it
x = 1547
if x % 17 == 0:
    print('Your number is a multiple of 17.')
else:
    print('Your number is not a multiple of 17.')
Your number is a multiple of 17.
Note: The % represents a modulo operation: it gives the remainder from division of the first argument by the second.
Tip: You can uncomment this following cell, and learn a good trick to ask the user to insert a number. You can use this instead of assigning a specific value to x above.
#x = float(input('Insert your number: '))
(3) If-elif-else statement:
a = 3
b = 5
if a > b:
    print('a is bigger than b')
elif a < b:
    print('a is smaller than b')
else:
    print('a is equal to b')
a is smaller than b
Note: We can have as many elif lines as we want.
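Here is a sketch that combines a for statement with several elif lines. The temperature values and labels are made up for illustration. The point to notice is that the conditions are checked from top to bottom, and only the first one that is true gets executed.

```python
temperatures = [-5, 0, 12, 31]   # made-up values, in degrees Celsius
labels = []

for t in temperatures:
    # Conditions are checked in order; only the first true branch runs
    if t < 0:
        labels.append('freezing')
    elif t < 15:
        labels.append('cool')
    elif t < 30:
        labels.append('warm')
    else:
        labels.append('hot')

print(labels)
```

Because of this, the order of the conditions matters: 12 is smaller than both 15 and 30, but it is labeled by the first branch that matches.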
Using if, elif and else statements, write code where you pick a 4-digit number. If it is divisible by 2 and 3, you print: 'Your number is not only divisible by 2 and 3 but also by 6'. If it is divisible by 2, you print: 'Your number is divisible by 2'. If it is divisible by 3, you print: 'Your number is divisible by 3'. In any other case, you print: 'Your number is not divisible by 2, 3 or 6'.
for statements.
if statements.