Introduction to Python Programming

Interactive Jupyter Notebook

This notebook provides an introduction to Python programming fundamentals, including an overview of basic programming concepts, common data structures, and simple visualization. This notebook was created by Becky Vandewalle based on prior work by Dandong Yin.

Introduction

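Python is a widely used scripting language that is popular for its accessibility. The documentation links below refer to Python 2.7, but the cell outputs shown in this notebook come from Python 3 (for example, 10 / 3 returns a float and print is called as a function). Most concepts are the same in both versions.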

General documentation: https://docs.python.org/2.7/
Python tutorial: https://docs.python.org/2.7/tutorial/index.html

Setup

Run this cell for the rest of the notebook to work!

In [43]:
# import required libraries

%matplotlib inline
import os
import json
import rasterio
import time

#execfile(os.path.join('/share/pyintro_resources/','highlight_feats.py'))

filename = os.path.join('pyintro_resources/','highlight_feats.py')
exec(compile(open(filename, "rb").read(), filename, 'exec'))

Python Fundamentals

This section will provide a brief overview of basic concepts and operations in Python.

Python as a Calculator

A simple yet powerful feature of Python is that you can use it as a calculator, which makes for a good first introduction to the language. Type a sequence of numbers and operators into a cell to get a result. Parentheses can be used to control the order of operations.

See a list of useful operators here.

Try running these cells to see how it works:

In [44]:
3 + 4
Out[44]:
7
In [45]:
2 * 4
Out[45]:
8
In [46]:
2 ** 4
Out[46]:
16
In [47]:
10 / 3
Out[47]:
3.3333333333333335
In [48]:
10.0 / 3
Out[48]:
3.3333333333333335
In [49]:
10 % 3
Out[49]:
1
In [50]:
250 / (5 + 5) * (7 - 3)
Out[50]:
100.0
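
The cells above use true division, which returns a float. As a minimal sketch (assuming Python 3, which matches the outputs shown above), the related floor division operator and the built-in divmod function look like this:

```python
# True division always returns a float in Python 3
true_div = 10 / 3

# Floor division rounds down to the nearest whole number
floor_div = 10 // 3

# divmod returns the floor quotient and the remainder together
quotient, remainder = divmod(10, 3)

print(true_div, floor_div, quotient, remainder)
```

Floor division and the modulo operator (%) fit together: quotient * 3 + remainder gives back 10.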

Pressing Return within a cell will create a new line in the cell code. When you run a cell, it will show only the last value calculated unless you use Python's print function to print earlier values.

In [51]:
2 + 3
4 + 6
Out[51]:
10

In the cell above, the value of 2 + 3 isn't shown because 4 + 6 is calculated and returned after.

In [52]:
print(2 + 3)
4 + 6
5
Out[52]:
10

Now both values are shown because print was explicitly called for 2 + 3.

Note that some operators are not automatically available in the core Python library. For example, did you see a square root operator above? You can access additional functions using additional Python libraries (more on libraries later).

In [53]:
import math
math.sqrt(16)
Out[53]:
4.0

The square root function is available in the math library.

Comments

Using a pound sign (#) in a Python line will create a comment. Comments don't have to be at the start of a line, but any part of the line after the pound sign is treated as a comment (unless the pound sign is part of a string). This means you can't place a comment in front of a character that Python still expects on that line, like a closing parenthesis or the closing bracket of a list.

In [54]:
# this is a comment

4 + 2 # this is another comment
Out[54]:
6
In [55]:
# this cell will fail to run!

mylist = [1, 2, # comment]
  File "<ipython-input-55-f1c12498ee7a>", line 3
    mylist = [1, 2, # comment]
                              ^
SyntaxError: unexpected EOF while parsing
In [ ]:
# but this works (more on lists below)

mylist = [1, 2, # comment
         ]
In [ ]:
# this pound sign does not start a comment
# it is within a string!

mystring = 'hi # comment?'
mystring

Creating variables

To create a simple variable, type the variable name, the equals (`=`) sign, and the variable value. For example:

In [ ]:
a = 1
b = 4

Here a is a variable, and its value is set to 1. You can print the variable name to show its value, or type the variable name in the last line of the cell to see the value.

In [ ]:
print(a)
b

Variable names must begin with a letter (a-z, A-Z) or an underscore (_); the remaining characters can also include digits (0-9). Variable names are case sensitive!

In [ ]:
One = 1
one = 1.0
print(One) # these are different
print(one)
In [ ]:
# this will fail - not a valid variable name

*hi = 'hi'
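
One way to test a name before using it is sketched below. It relies on the built-in str.isidentifier method and the standard keyword module; the candidate names are illustrative only and are not part of the notebook's data.

```python
import keyword

# illustrative candidate names (made up for this sketch)
candidates = ['one', 'One', '_hidden', 'cloud9', '9cloud', '*hi', 'for']

for name in candidates:
    # isidentifier() checks the character rules;
    # keywords such as 'for' are reserved even though they are well-formed
    valid = name.isidentifier() and not keyword.iskeyword(name)
    print(name, valid)

# keep only the names that could be used as variables
valid_names = [n for n in candidates
               if n.isidentifier() and not keyword.iskeyword(n)]
```

Names that start with a digit or contain symbols such as * are rejected, and reserved words like for cannot be used as variable names.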

Whitespace

Blank lines between code lines are ignored, so you can use blank lines to group things for ease of reading.

In [ ]:
a = 1
b = 4

c = 'cat'

White space within a line is often ignored, so you can condense, align, or spread out code for clarity.

In [ ]:
# these are all equivalent

a=3#comment
a = 3 # comment
a    =    3       #      comment

However, space is needed after or between keywords and commands so Python can parse them properly.

In [ ]:
# prints a
# (a weak example in Python 3: print is a function, so the parentheses already separate the two names)

print(a)
In [ ]:
# fails - no command or variable called 'printa' exists

printa

White space at the front of a line is very important for Python to work correctly. When typing code in a cell, make sure each regular line starts at the very beginning of the line with no leading space.

In [ ]:
# this works

a = 2
b = 3
In [ ]:
# this will fail

a = 2
 b = 3

Indentation, typically made with the Tab key, marks a group of lines (called a code block) that belong to the last unindented line. Every line in the same block must be indented by exactly the same amount, and each deeper level adds one more step of indentation. You can indent code blocks with typed spaces instead of Tab, but you need to be consistent throughout a script or within a cell.

In [ ]:
# example of indented block

a = 2
if a:
    print('a exists')
In [ ]:
# you can have multiple indentation levels

a = 2
if a:
    print('a exists')
b = 3
if b:
    print('b exists')
if a:
    if b:
        print('a and b exist')
In [ ]:
# indent with spaces
# this works, but Jupyter notebook will highlight keywords in red because it expects Tab
a = 2
if a:
  print('a exists')
b = 3
if b:
  print ('b exists')
In [ ]:
# this works but is --NOT-- recommended - make sure your indents match!

a = 2
if a:
  print ('a exists')
b = 3
if b:
    print ('b exists')
In [ ]:
# this doesn't work
# indentation is not consistent within a code block

a = 2
if a:
  print ('a exists')
    print ('not sure if b exists')

Basic Object Types

Python has a variety of basic variable types, and we have already seen a few! See some further examples below. The type function can indicate an object or variable's type.

Basic numeric types:

In [56]:
1    # integer
1.0  # float
Out[56]:
1.0
In [57]:
print(type(1))
print(type(1.0))
<class 'int'>
<class 'float'>
In [58]:
# convert between types

print(float(1))
print(int(1.23)) # truncates
print(int(1.83)) # does not round
1.0
1
1
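
Since int() truncates rather than rounds, it can help to compare it with the other ways of getting a whole number. This is a small sketch using the built-in round function and the math module; the value -1.83 is just an illustration chosen to show how negative numbers behave.

```python
import math

value = -1.83

truncated = int(value)        # drops the fractional part, moving toward zero
rounded = round(value)        # nearest whole number
floored = math.floor(value)   # largest whole number <= value
ceiled = math.ceil(value)     # smallest whole number >= value

print(truncated, rounded, floored, ceiled)
```

For positive values int() and math.floor() agree, but for negative values they differ, as this example shows.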

None type:

None is a special designation to indicate that the variable exists but has not been assigned to a particular value.

In [59]:
a = None
print(a)
None

Boolean types:

A special type is a Boolean variable. This designates a variable as True or False (note the case!).

In [60]:
a = True
b = False
In [61]:
# this fails because a variable 'true' hasn't been defined

a = true
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-61-0f1cd998411c> in <module>
      1 # this fails because a variable 'true' hasn't been defined
      2 
----> 3 a = true

NameError: name 'true' is not defined

You can check if a variable is True or False like this:

In [ ]:
a is True
In [ ]:
print(a is False)
print(b is True)
print(b is False)

There are special cases where other types of variables evaluate to True or False. Most values evaluate to True, but None, numeric zero (0, 0.0, or equivalent), and empty strings or containers evaluate to False. Note that evaluating to True or False is not the same as being assigned True or False.

In [ ]:
# will evaluate code block if a evaluates to true

a = 3
if a:
    print('a')
In [ ]:
# here, b evaluates to false; nothing prints

b = 0
if b:
    print ('b')
In [ ]:
# a evaluates to True but is not the value True itself

a = 3
a is True
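
To see how a value will be evaluated in an if statement, you can pass it to the built-in bool function. The sample values below are illustrative; note that a non-empty string or list evaluates to True even if its contents look "empty", like '0' or [0].

```python
# illustrative sample values
samples = [None, 0, 0.0, '', [], (), {}, 3, 'hi', [0], '0']

for s in samples:
    # repr() shows the value as you would type it
    print(repr(s), bool(s))

# split the samples by how they evaluate
falsy = [s for s in samples if not s]
truthy = [s for s in samples if s]
```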

Strings:

A string is a sequence of characters, such as letters, digits, spaces, and punctuation:

In [ ]:
'hello'
'cloud9'
In [ ]:
type('hello')

Accents and Unicode characters are supported, but may cause issues with string functions if not handled carefully.

In [ ]:
cafe = 'café'
cafe
In [ ]:
print (cafe)

A 'u' in front of a string designates it as Unicode. In Python 3 every string is already Unicode, so the prefix is optional. You can copy Unicode characters to set a variable like this:

In [ ]:
hello = u'你好'
hello
In [ ]:
print (hello)

Or you can define unicode characters using an escape sequence. Here '\u4f60' refers to 你.

In [ ]:
hello2 = u'\u4f60\u597d'
hello2
In [ ]:
print (hello2)

Escaping characters:

Escape sequences represent special characters that are part of a string but can't be typed into it directly. For example, you cannot include a single quote (') in a string delimited by single quotes unless it is escaped. To escape a special character, prefix it with a backslash (\).

See a list of escape characters here

In [ ]:
# a new line \n is a common escape character

new_line_str = 'hi\nhi2'
new_line_str
In [ ]:
# it prints with the new line

print (new_line_str)
In [ ]:
print ('don\'t', 'path\\to\\file')
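
Escaping is not the only option. The sketch below shows two common alternatives: wrapping the string in double quotes so that a single quote needs no escape, and using a raw string (the r prefix) so that backslashes are kept literally. The variable names are illustrative.

```python
quote_escaped = 'don\'t'
quote_double = "don't"            # double quotes avoid the escape

path_escaped = 'path\\to\\file'
path_raw = r'path\to\file'        # raw string: backslashes are kept as typed

# both spellings produce the same strings
print(quote_escaped == quote_double)
print(path_escaped == path_raw)

# an escape sequence such as \n counts as a single character
newline_length = len('hi\nhi2')
print(newline_length)
```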

'Smart Quotes':

Be careful when copying text that uses 'smart quotes': quotation marks and apostrophes that are curved. Python doesn't recognize these characters as string delimiters!

Use this!    Not this!
"            “ ”
'            ‘ ’
In [ ]:
# this cell will fail: the string is wrapped in curly 'smart quotes'

copy_text = ‘Hello there’

Lists and Tuples:

Other types of sequences include lists and tuples. Elements in a list can be changed, but elements in a tuple cannot; to get different contents you have to create a new tuple.

In [ ]:
# lists are created using square brackets

mylist = [1, 2, 3]
mylist
In [ ]:
# you can add a value to a list after making it

mylist.append(4)
mylist
In [ ]:
# tuples are created using parentheses

mytuple = (1, 2, 3)
mytuple
In [ ]:
# you can't add a value to a tuple

mytuple.append(4)
In [ ]:
# this works because newtuple is a new tuple, but may not work as you would expect!

newtuple = (mytuple, 4)
newtuple
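
The cell above nests the old tuple inside the new one. If the goal is a flat tuple with one more value, a possible approach is to concatenate with a one-element tuple, as sketched here (the trailing comma in (4,) is what makes it a tuple):

```python
mytuple = (1, 2, 3)

nested = (mytuple, 4)        # the old tuple becomes the first element
extended = mytuple + (4,)    # concatenation builds a new, flat tuple

print(nested)
print(extended)
print(len(nested), len(extended))
```

The original mytuple is unchanged in both cases; each expression creates a new tuple.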

You can select a specific value of a list or tuple using square brackets around the item's index. These need to be directly at the end of the variable name. Python index values start from 0.

In [ ]:
# select by index

print (mylist[2])
print (mytuple[0])

A colon (:) on its own is a slice that selects all items.

In [ ]:
# select all

print (mylist[:])

Using a negative index value will count from the back. It starts with -1.

In [ ]:
print (mytuple[-1])
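
Indices and the colon can be combined to select part of a sequence. This is a brief sketch of slicing with the form [start:stop:step], where stop is not included; the list is redefined here so the example stands alone.

```python
mylist = [1, 2, 3, 4]

first_two = mylist[:2]        # from the start up to (not including) index 2
middle = mylist[1:3]          # index 1 up to (not including) index 3
last_two = mylist[-2:]        # the last two items
every_other = mylist[::2]     # every second item
reversed_copy = mylist[::-1]  # a reversed copy

print(first_two, middle, last_two, every_other, reversed_copy)
```

Slices return a new list (or tuple), so the original sequence is not modified.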

It is possible to stack indices to select an item in a multi-level list.

In [ ]:
# multi-level index

nested_list = [[1, 2], [3, 4]]
nested_list[0][1] # select first list, then second item

You can change and delete values from a list using the index.

In [ ]:
# change last list item

nested_list[-1] = [4, 5]
nested_list
In [ ]:
# delete list value

del nested_list[0][0]
nested_list

Dictionaries:

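Dictionaries are collections of key-value pairs. Items are looked up by key rather than by position (since Python 3.7, dictionaries do keep the order in which keys were inserted).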

In [ ]:
# dictionaries are created using curly braces

pet_list = {'alice':'cat', 'becky':'cat', 'chaoli': 'parrot', 'dan':'dog'}
pet_list
In [ ]:
print (pet_list)

Dictionaries have keys and values. This is similar to a physical dictionary - you look up a word to find its definition.

In [ ]:
# list all keys

pet_list.keys()
In [ ]:
# list all values

pet_list.values()

You can find the value that goes with a specific key by using the key as the index.

In [ ]:
pet_list['dan']

Like lists, you can change dictionary keys and values after the fact.

In [ ]:
# add a key/value pair

pet_list['ewan'] = 'bunny'
pet_list

It's good to check if a key/value pair exists before deleting a value.

In [ ]:
# delete a key/value pair

if 'alice' in pet_list:
    del pet_list['alice']
pet_list
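
As an alternative to checking for a key first, the dictionary methods get and pop accept a default value that is returned when the key is missing. The sketch below uses a small stand-in dictionary (pets) so it does not depend on the state of the cells above.

```python
# stand-in dictionary for this sketch
pets = {'becky': 'cat', 'chaoli': 'parrot', 'dan': 'dog'}

missing = pets.get('alice')               # no such key: returns None
fallback = pets.get('alice', 'no pet')    # returns the supplied default

removed = pets.pop('dan', None)           # removes the pair, returns its value
removed_again = pets.pop('dan', None)     # key already gone: returns None

print(missing, fallback, removed, removed_again)
print(pets)
```

Without a default, pets['alice'] and pets.pop('alice') would both raise a KeyError.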

Dictionaries can be nested.

In [ ]:
pet_list_ext = {'alice': {'type':'cat', 'age':3}, 
            'becky': {'type':'cat', 'age':9}, 
            'chaoli': {'type':'parrot', 'age':23}, 
            'dan': {'type':'dog', 'age':7.5}}
pet_list_ext

Chain two key lookups to retrieve a value from a nested dictionary.

In [ ]:
pet_list_ext['chaoli']['type']

Boolean Operators and Comparisons

Boolean operators combine Boolean variables, or other values that are evaluated as True or False. The operators are `and`, `or`, and `not`.

Try to guess what will be returned for each combination below!

In [ ]:
True and True
In [ ]:
True and False
In [ ]:
False and False
In [ ]:
True or True
In [ ]:
True or False
In [ ]:
not True
In [ ]:
not False
In [ ]:
if (1 and 'hi'): # through evaluation
    print('OK')
In [ ]:
if (0 and 'hi'): # through evaluation
    print('OK')
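
When `and` and `or` are used with values that are not Booleans, they return one of the operands rather than True or False. A short sketch of this behavior (the variable names are illustrative):

```python
# 'and' returns the first operand that evaluates to False, else the last one
and_both_true = 1 and 'hi'
and_first_false = 0 and 'hi'

# 'or' returns the first operand that evaluates to True, else the last one
or_first_empty = '' or 'default'
or_first_set = 'name' or 'default'

print(repr(and_both_true), repr(and_first_false))
print(repr(or_first_empty), repr(or_first_set))
```

This is why `if (1 and 'hi'):` runs its block: the expression evaluates to 'hi', a non-empty string.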

Comparisons are used to evaluate relative values (e.g., is x greater than y), equivalence, or identity. A few examples are shown below.

In [ ]:
1 > 2
In [ ]:
1 < 2
In [ ]:
1 >= 2
In [ ]:
1 <= 2

NOTE! Testing for equivalence needs two equal signs, not one!

In [ ]:
# are these equal?

1 == 1 
In [ ]:
# this fails

1 = 1
In [62]:
1 != 2 # is not equal to
Out[62]:
True

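`is` and `is not` can also be used for comparisons. They test identity (whether two names refer to the same object) rather than equal value, so use `==` to compare numbers and strings. Python 3.8 and later emit a SyntaxWarning when `is` is used with a literal, as in `1 is 2`.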

In [63]:
1 is 2
Out[63]:
False
In [64]:
1 is not 2
Out[64]:
True
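
The difference between `==` and `is` is easiest to see with two lists that hold the same values. In this sketch the names are illustrative; `==` compares contents while `is` checks whether both names point to the very same object.

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]   # a separate list with equal contents
alias = list_a       # a second name for the same list

same_value = list_a == list_b    # contents are equal
same_object = list_a is list_b   # but they are two different objects
alias_same = alias is list_a     # alias refers to the same object

# 'is' is the usual way to test for None
nothing = None
is_none = nothing is None

print(same_value, same_object, alias_same, is_none)
```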

You can use `in` and `not in` to see if a value is part of a sequence.

In [65]:
1 in (1, 2, 3)
Out[65]:
True
In [66]:
1 not in (1, 2, 3)
Out[66]:
False

Importing Libraries

There are a few different ways to import a Python library or specific function from a library.

If you import an entire library, you need to preface a function in that library with the library name. For commonly used libraries or ones with long names, it is common to give the library a nickname (alias) when importing. If you import a specific function from a library, you can use that function without prefixing it with the library name.

In [67]:
import time                        # import entire library
import numpy as np                 # call numpy using np
from math import sqrt              # just import square root function from math library
from math import factorial as fac  # just import factorial function from math library, call it fac

Be careful with your nicknames because a nickname can conflict with (and hide) an existing function or variable of the same name.

In [68]:
# prints current time (seconds since January 1, 1970)

print(time.time())
1576880271.1121275
In [69]:
# call numpy function using nickname np for numpy

np.array([2,3,4])
Out[69]:
array([2, 3, 4])
In [70]:
# can call sqrt function without having 'math.' in front

sqrt(16) 
Out[70]:
4.0
In [71]:
# can call factorial function by nickname without having 'math.' in front

fac(5)
Out[71]:
120

Control Flow

Most of the time Python programs run line by line, executing each statement in order from top to bottom. However, there are cases when certain lines should be skipped if some condition occurs, or a certain section of code should be run many times. Control flow tools are used to change the order or number of times lines or code sections are run.

In [72]:
# if a exists, print

a = 3
if a:
    print ('a =', a)
a = 3
In [73]:
# print elements in list

mylist = [1, 2, 3]
for i in mylist:
    print (i, end=" ")
1 2 3 

The range function produces the sequence of whole numbers from 0 up to, but not including, the specified number. In Python 3 it returns a range object rather than a list.

In [74]:
range(5)
Out[74]:
range(0, 5)
In [75]:
# print numbers in a certain range

for i in range(5):
    print (i, end=" ")
0 1 2 3 4 
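
range also accepts a start value and a step. The sketch below wraps each range in list() only so the numbers can be displayed; in a for loop the range object can be used directly.

```python
first_five = list(range(5))          # stop only: starts at 0
two_to_five = list(range(2, 6))      # start and stop (stop is excluded)
evens = list(range(0, 10, 2))        # start, stop, and step
countdown = list(range(5, 0, -1))    # a negative step counts down

print(first_five)
print(two_to_five)
print(evens)
print(countdown)
```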

Certain keywords can affect how the loop functions:

In [76]:
# stop if 7 is reached

for i in range(10):
    if i == 7:
        break
    print (i, end=" ")
0 1 2 3 4 5 6 
In [77]:
# prints '- no break' if loop completed without break

for i in range(10):
    if i == 12:
        break
    print (i, end=" ")
else:
    print ('- no break')
0 1 2 3 4 5 6 7 8 9 - no break
In [78]:
# skips even numbers, but continues through loop after

for i in range(10):
    if i % 2 == 0:
        continue
    print (i, end=" ")
1 3 5 7 9 

Sometimes it is useful to have a placeholder in a loop. Here the loop runs ten times, but because its body is just the pass keyword, it does nothing.

In [79]:
# do nothing

for i in range(10):
    pass

While loops are useful when a block should repeat an unknown number of times until a certain condition is met. If the condition never becomes False (for example, because nothing inside the loop changes it), the loop will keep looping!

In [80]:
# while loop

a = 0
while a < 10:
    print (a, end=" ")
    a += 1
0 1 2 3 4 5 6 7 8 9 

The try, except, and finally keywords are used to catch things that have failed. finally will always run, but except will only run if the specified error occurred.

In [81]:
try:
    1 / 0
except ZeroDivisionError:
    print("that didn't work")
finally:
    print('end')
that didn't work
end
In [82]:
try:
    1 / 1
except ZeroDivisionError:
    print ("that didn't work")
finally:
    print ('end')
end

You can also have a general except clause to catch any type of error.

In [83]:
try:
    1 / 0
except:
    print ("that didn't work")
finally:
    print ('end')
that didn't work
end
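
A general except clause hides which error occurred. A possible middle ground, sketched here, is to name the error you expect, capture it with `as` so its message can be printed, and use the optional else clause for code that should run only when no error was raised. The function safe_divide is a hypothetical helper written for this example.

```python
def safe_divide(numerator, denominator):
    """Hypothetical helper: return the quotient, or None if dividing by zero."""
    try:
        result = numerator / denominator
    except ZeroDivisionError as err:
        # err holds the error object, including its message
        print('could not divide:', err)
        return None
    else:
        # runs only if the try block raised no error
        return result
    finally:
        # runs in every case, even when a return statement was reached
        print('done with', numerator, '/', denominator)

good = safe_divide(6, 3)
bad = safe_divide(1, 0)
print(good, bad)
```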

List Comprehension

A list comprehension is a compact way to build a list from a loop. The following two cells create the same resulting list.

In [84]:
mylist = []
for i in range(5):
    mylist.append(i * 2)
mylist
Out[84]:
[0, 2, 4, 6, 8]
In [85]:
mylist = [i*2 for i in range(5)]
mylist
Out[85]:
[0, 2, 4, 6, 8]
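
A comprehension can also filter items with an if clause, and the same pattern works for dictionaries when written with curly braces. A minimal sketch with illustrative values:

```python
# keep only odd numbers, then square them
odd_squares = [i ** 2 for i in range(10) if i % 2 == 1]

# dictionary comprehension: map each word to its length
lengths = {word: len(word) for word in ['cat', 'parrot', 'dog']}

print(odd_squares)
print(lengths)
```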

Custom Functions

It is useful to create custom functions when you want to reuse sections of code many times.

The def keyword is used to start a function definition. Arguments that the function expects to receive are listed between parentheses before the :.

In [86]:
# define a function with no arguments

def myfunct():
    print ('hello')
In [87]:
# define a function with one argument

def myfunct2(name):
    print ('hello,', name)
In [88]:
# call the functions

myfunct()
myfunct2('Iekika')
hello
hello, Iekika

If you forget the parentheses in the function call Python will tell you about the function rather than calling it.

In [89]:
myfunct
Out[89]:
<function __main__.myfunct()>
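
The functions above print a message but do not hand anything back to the caller. As a sketch of two further features, the hypothetical function greet below uses the return keyword to send back a value and gives its second argument a default, so that argument can be left out of the call.

```python
def greet(name, greeting='hello'):
    """Hypothetical example: build a greeting and return it instead of printing."""
    return greeting + ', ' + name

message = greet('Iekika')                      # uses the default greeting
custom = greet('Iekika', greeting='welcome')   # overrides the default

print(message)
print(custom)
```

Returning a value lets you store the result in a variable and reuse it, which print alone does not allow.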

File Operations

You can open, read, and write files using Python.

In [90]:
# open a file

myfile = open('test_file.txt')
myfile
Out[90]:
<_io.TextIOWrapper name='test_file.txt' mode='r' encoding='UTF-8'>
In [91]:
# read file lines

lines = myfile.readlines()
lines
Out[91]:
['Hello! - line 1\n',
 'second line\n',
 '\n',
 '4th line\n',
 'this is a test file\n']
In [92]:
# print each line

for line in lines:
    print(line)
Hello! - line 1

second line



4th line

this is a test file

In [93]:
# print a specific line

print(lines[3])
4th line

It is important to close the file when you are finished accessing it!

In [94]:
# close file

myfile.close()

A trick is to use the with statement to read a file instead. The file will be closed automatically.

In [95]:
# open with 'with' statement

with open('test_file.txt') as newfile:
    newlines = newfile.read()
    
newlines
Out[95]:
'Hello! - line 1\nsecond line\n\n4th line\nthis is a test file\n'
In [96]:
# get current time

nowtime = time.time()
nowtime
Out[96]:
1576880271.3161533
In [97]:
# write to a file

with open('write_me.txt', 'w') as wfile:

    wfile.write('Hi there! ' + str(nowtime))
In [98]:
# read written file

with open('write_me.txt') as rfile:
    rlines = rfile.read()
    
rlines
Out[98]:
'Hi there! 1576880271.3161533'
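
Opening a file with 'w' replaces its contents. The sketch below shows the append mode ('a'), which adds to the end instead, and reads the lines back without their trailing newline characters. It works in a temporary directory, and the file name demo.txt is made up for this example, so no existing file is touched.

```python
import os
import tempfile

# work in a temporary directory that is deleted afterwards
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.txt')   # hypothetical file name

    with open(path, 'w') as wfile:            # 'w' creates or overwrites
        wfile.write('first line\n')

    with open(path, 'a') as afile:            # 'a' appends to the end
        afile.write('second line\n')

    with open(path) as rfile:
        # iterating over a file yields one line at a time
        clean_lines = [line.rstrip('\n') for line in rfile]

print(clean_lines)
```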

Geospatial Data Processing

This last section will briefly cover raster and vector data and show a few introductory ways to work with these data types.

Raster data

The idea of raster data is extended from digital photography, where a matrix is used to represent a continuous part of the world. A GeoTIFF extends the TIFF image format by including geospatial context of the corresponding image.

Generic image readers/libraries ignore the geospatial info and retrieve only the image content. Geospatially-aware software/libraries are needed to extract complete information from this image format.

RasterIO

RasterIO is a light-weight raster processing library that provides enough utility and flexibility for a good range of common needs. Refer to this example as a start.

In [99]:
# load raster data

chicago_tif = rasterio.open(os.path.join('pyintro_resources/data','Chicago.tif'))
In [100]:
# see type

type(chicago_tif)
Out[100]:
rasterio.io.DatasetReader
In [101]:
# find the shape of the array (rows vs columns)

chicago_tif.shape
Out[101]:
(929, 699)
In [102]:
# assign the first image band to a variable

band1 = chicago_tif.read(1)

Vector data

Vector data describe the world with explicit coordinates and attributes. GeoJSON is a straightforward format derived from JSON. It packs vector data in a way that is easy for both humans and machines to read and write.

In [103]:
# load chicago vector data

with open(os.path.join('pyintro_resources/data','Chicago_Community.geojson')) as geofile:
    chicago = json.load(geofile)
In [104]:
# a JSON object is represented in Python as a dictionary

type(chicago)
Out[104]:
dict
In [105]:
# we can see the dictionary keys

chicago.keys()
Out[105]:
dict_keys(['type', 'features'])
In [106]:
# the value of 'type' is 'FeatureCollection': a collection of vector features

chicago['type']
Out[106]:
'FeatureCollection'
In [107]:
# 'features' contains a list of feature values

type(chicago['features'])
Out[107]:
list
In [108]:
# what are the keys for the first feature in the list

chicago['features'][0].keys()
Out[108]:
dict_keys(['type', 'properties', 'geometry'])
In [109]:
# what are the properties for the first feature in the list

chicago['features'][0]['properties']
Out[109]:
{'community': 'DOUGLAS',
 'area': '0',
 'shape_area': '46004621.1581',
 'perimeter': '0',
 'area_num_1': '35',
 'area_numbe': '35',
 'comarea_id': '0',
 'comarea': '0',
 'shape_len': '31027.0545098'}
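
Because the loaded GeoJSON is just nested dictionaries and lists, the features can be processed with the tools covered earlier. The sketch below does not read the Chicago file; it parses a tiny, made-up FeatureCollection so that it is self-contained. Only the 'community' and 'area_numbe' property names and the DOUGLAS values come from the output above; the second feature and the point geometries are invented for illustration.

```python
import json

# a tiny, made-up FeatureCollection (not the real Chicago data)
sample_text = '''
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "properties": {"community": "DOUGLAS", "area_numbe": "35"},
    "geometry": {"type": "Point", "coordinates": [-87.6, 41.8]}},
   {"type": "Feature",
    "properties": {"community": "EXAMPLE", "area_numbe": "99"},
    "geometry": {"type": "Point", "coordinates": [-87.7, 41.9]}}
 ]}
'''

sample = json.loads(sample_text)   # loads() parses a string instead of a file

# collect the community name of every feature
names = [feat['properties']['community'] for feat in sample['features']]

# property values are stored as strings, so convert the numbers explicitly
area_numbers = {feat['properties']['community']: int(feat['properties']['area_numbe'])
                for feat in sample['features']}

print(names)
print(area_numbers)
```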

Basic Image Visualization

Matplotlib is a powerful plotting library. It is commonly used to display vector data, but it can also display raster data. Use the %matplotlib inline command to display plots as cell output.

In [110]:
%matplotlib inline
import matplotlib.pyplot as plt
In [111]:
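# plot the band with Matplotlib
# imshow wants extent=(left, right, bottom, top); rasterio bounds are (left, bottom, right, top)

bounds = chicago_tif.bounds
fig = plt.figure(figsize=(12,10))
plt.imshow(band1, cmap='gray', extent=(bounds.left, bounds.right, bounds.bottom, bounds.top))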
Out[111]:
<matplotlib.image.AxesImage at 0x7f4c5d5ab588>