This notebook provides an introduction to Python programming fundamentals, including an overview of basic programming concepts, common data structures, and simple visualization. It was created by Becky Vandewalle, based on prior work by Dandong Yin.
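Python is a commonly used scripting language that is popular due to its accessibility. This notebook was originally written for Python 2.7, but the code cells below use Python 3 syntax (for example, `print()` as a function); most concepts are the same in both versions.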
General documentation: https://docs.python.org/2.7/
Python tutorial: https://docs.python.org/2.7/tutorial/index.html
# import required libraries
%matplotlib inline
import os
import json
import rasterio
import time
#execfile(os.path.join('/share/pyintro_resources/','highlight_feats.py'))
filename = os.path.join('pyintro_resources/','highlight_feats.py')
exec(compile(open(filename, "rb").read(), filename, 'exec'))
This section will provide a brief overview of basic concepts and operations in Python.
A simple yet powerful feature of Python is that you can use it to perform basic numeric operations. This is a useful first introduction to the language. You can type a sequence of numbers and operators into a cell to get a result, like a calculator. Parentheses can be used to order operations.
See a list of useful operators here.
Try running these cells to see how it works:
3 + 4
2 * 4
2 ** 4
10 / 3
10.0 / 3
10 % 3
250 / (5 + 5) * (7 - 3)
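The result of `10 / 3` depends on the Python version: Python 2 performs integer division when both operands are integers, while Python 3 always returns a float. The following is a minimal sketch, assuming Python 3, of the operators that make the intent explicit (the variable names are only illustrative):

```python
# Assumes Python 3 semantics for the / operator.
true_quotient = 10 / 3      # true division, always a float in Python 3
floor_quotient = 10 // 3    # floor division, discards the fractional part
remainder = 10 % 3          # modulo, the remainder after floor division
both = divmod(10, 3)        # (floor quotient, remainder) in one call

print(true_quotient, floor_quotient, remainder, both)
```

Using `//` whenever a whole-number result is wanted keeps the code behaving the same way in both versions.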
Pressing Return within a cell will create a new line in the cell code. When you run a cell, it will print the last value calculated unless you use Python's `print` function to print earlier values.
2 + 3
4 + 6
In the cell above, the value of `2 + 3` isn't shown because `4 + 6` is calculated and returned after.
print(2 + 3)
4 + 6
Now both values are shown because `print` was explicitly called for `2 + 3`.
Note that some operators are not automatically available in the core Python library. For example, did you see a square root operator above? You can access additional functions using additional Python libraries (more on libraries later).
import math
math.sqrt(16)
The square root function is available in the `math` library.
Using a pound sign (`#`) in a Python line will create a comment. Comments don't have to be at the start of a line, but note that any part of the line after the pound sign will be considered a comment (unless it is part of a string), so you can't place a comment in front of a character that Python still expects on that line (like a closing parenthesis or the closing bracket of a list).
# this is a comment
4 + 2 # this is another comment
# this cell will fail to run!
mylist = [1, 2, # comment]
# but this works (more on lists below)
mylist = [1, 2, # comment
]
# this pound sign does not start a comment
# it is within a string!
mystring = 'hi # comment?'
mystring
To create a simple variable, type the variable name, the equals (`=`) sign, and the variable value. For example:
a = 1
b = 4
Here `a` is a variable, and its value is set to `1`. You can print the variable name to show its value, or type the variable name in the last line of the cell to see the value.
print(a)
b
Variable names must begin with an alphabetic character (`a`-`z`, `A`-`Z`) or an underscore (`_`). Digits (`0`-`9`) are allowed after the first character, but not at the start. Variable names are case sensitive!
One = 1
one = 1.0
print(One) # these are different
print(one)
# this will fail - not a valid variable name
*hi = 'hi'
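One way to check a candidate name without triggering an error is the string method `isidentifier()`, available in Python 3. The sketch below uses a made-up list of candidate names; note that `isidentifier()` also returns `True` for reserved keywords such as `if`, which still cannot be used as variable names.

```python
# str.isidentifier() (Python 3) reports whether a string follows the naming rules.
# The candidate names below are illustrative only.
candidates = ['one', 'One', '_hidden', 'value2', '2value', '*hi']
validity = {name: name.isidentifier() for name in candidates}

for name, is_valid in validity.items():
    print(name, is_valid)
```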
Blank lines between code lines are ignored, so you can use blank lines to group things for ease of reading.
a = 1
b = 4
c = 'cat'
White space within a line is often ignored, so you can condense, align, or spread out code for clarity.
# these are all equivalent
a=3#comment
a = 3 # comment
a    =    3    # comment
However, space is needed after or between key words and commands so Python can parse them properly.
# Not a good Example for Python 3
# prints a
print(a)
# fails - no command or variable called 'printa' exists
printa
White space at the front of a line is very important for Python to work correctly. When typing code in a cell, make sure each regular line starts at the very beginning of the line with no leading space.
# this works
a = 2
b = 3
# this will fail - the second line has unexpected leading space
a = 2
    b = 3
Indentation, typically entered using Tab, marks a group of lines (called a code block) that belongs to the last unindented line. Every line in a block must be indented by the same amount of white space, and each deeper level adds one more increment (again, usually one Tab). Although you can indent code blocks with spaces instead of Tabs, it is often easier to use Tab, and you need to be consistent throughout a script or within a cell.
# example of indented block
a = 2
if a:
    print('a exists')
# you can have multiple indentation levels
a = 2
if a:
    print('a exists')
b = 3
if b:
    print('b exists')
if a:
    if b:
        print('a and b exist')
# indent with spaces
# this works, but Jupyter notebook will highlight keywords in red because it expects Tab
a = 2
if a:
  print('a exists')
b = 3
if b:
  print('b exists')
# this works but is --NOT-- recommended - make sure your indents match!
a = 2
if a:
    print('a exists')
b = 3
if b:
  print('b exists')
# this doesn't work
# indentation is not consistent within a code block
a = 2
if a:
    print('a exists')
      print('not sure if b exists')
Python has a variety of basic variable types, and we have already seen a few! See some further examples below. The type function can indicate an object or variable's type.
Basic numeric types:
1 # integer
1.0 # float
print(type(1))
print(type(1.0))
# convert between types
print(float(1))
print(int(1.23)) # truncates
print(int(1.83)) # does not round
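Because `int` truncates rather than rounds, negative numbers move toward zero as well. A short sketch, assuming Python 3 (where `round` with one argument returns an integer), compares the two conversions on a few sample values:

```python
# Sample values chosen for illustration.
values = [1.23, 1.83, -1.83]

truncated = [int(v) for v in values]   # drops the fractional part (toward zero)
rounded = [round(v) for v in values]   # nearest integer

print(truncated)
print(rounded)
```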
None type:
None is a special designation to indicate that the variable exists but has not been assigned to a particular value.
a = None
print(a)
Boolean types:
A special type is a Boolean variable. This designates a variable as `True` or `False` (note the case!).
a = True
b = False
# this fails because a variable 'true' hasn't been defined
a = true
You can check if a variable is `True` or `False` like this:
a is True
print(a is False)
print(b is True)
print(b is False)
There are special cases where other types of variables evaluate to `True` or `False`. While most variable values evaluate to `True`, variables set to `None`, equal to `0`, `0.0` or equivalent, or empty are `False`. Note that evaluating to `True` or `False` is not the same as being assigned to `True` or `False`.
# will evaluate code block if a evaluates to true
a = 3
if a:
    print('a')
# here, b evaluates to false; nothing prints
b = 0
if b:
    print('b')
# a evaluates to true but is not the object True
a = 3
a is True
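The built-in `bool` function shows directly how a value is evaluated. The sketch below runs it over a handful of sample values (chosen here for illustration) and collects the ones that evaluate to false:

```python
# Sample values: the first seven are "empty" or zero-like, the rest are not.
samples = [None, 0, 0.0, '', [], (), {}, 3, 'hi', [0]]

for value in samples:
    print(repr(value), bool(value))

falsy = [value for value in samples if not value]
truthy = [value for value in samples if value]

a = 3
evaluates_true = bool(a)      # True: a non-zero number evaluates to true
is_true_object = a is True    # False: 3 is not the Boolean object True
```

Note that `[0]` evaluates to true: the list is not empty, even though the item it contains is zero.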
'hello'
'cloud9'
type('hello')
Accents and Unicode characters are supported, but may cause issues with string functions if not handled carefully.
cafe = 'café'
cafe
print (cafe)
A '`u`' in front of a string designates it as unicode. You can copy unicode characters to use to set a variable like this:
hello = u'你好'
hello
print (hello)
Or you can define unicode characters using an escape sequence. Here '\u4f60' refers to 你.
hello2 = u'\u4f60\u597d'
hello2
print (hello2)
Escaping characters:
Escape characters reference special characters that are part of a string but can't be directly typed into a string. For example, you cannot include a single quote (`'`) in a single-quoted string unless it is escaped. To escape a special character, prefix it with the backslash (`\`).
See a list of escape characters here
# a new line \n is a common escape character
new_line_str = 'hi\nhi2'
new_line_str
# it prints with the new line
print (new_line_str)
print ('don\'t', 'path\\to\\file')
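Escaping is not the only option. As a sketch of two common alternatives: a string wrapped in double quotes can contain a single quote unescaped, and a raw string (prefix `r`) keeps backslashes literally, which is handy for paths. The variable names are illustrative.

```python
escaped = 'don\'t'                # escaped single quote
mixed_quotes = "don't"            # double quotes, so no escape is needed

escaped_path = 'path\\to\\file'   # each backslash escaped
raw_path = r'path\to\file'        # raw string: backslashes are kept as typed

print(mixed_quotes, raw_path)
```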
'Smart Quotes':
Be careful when copying text that uses 'smart quotes': these are quotation marks and apostrophes that are curved. Python doesn't recognize these characters!
| Use this! | Not this! | Use this! | Not this! |
| --- | --- | --- | --- |
| " | “ | ' | ‘ |
| " | ” | ' | ’ |
# this cell will fail
copy_text = “Hello there”
# lists are created using square brackets
mylist = [1, 2, 3]
mylist
# you can add a value to a list after making it
mylist.append(4)
mylist
# tuples are created using parentheses
mytuple = (1, 2, 3)
mytuple
# you can't add a value to a tuple
mytuple.append(4)
# this works because newtuple is a new tuple, but may not work as you would expect!
newtuple = (mytuple, 4)
newtuple
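If the goal is a flat tuple with one more item, rather than a tuple nested inside another, one option is to concatenate with a one-item tuple. A minimal sketch (variable names are illustrative):

```python
mytuple = (1, 2, 3)

nested = (mytuple, 4)        # a tuple inside a tuple
extended = mytuple + (4,)    # a new flat tuple; note the trailing comma

print(nested)
print(extended)
```

The original `mytuple` is unchanged in both cases, since tuples cannot be modified in place.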
You can select a specific value of a list or tuple using square brackets around the item's index. These need to be directly at the end of the variable name. Python index values start from `0`.
# select by index
print (mylist[2])
print (mytuple[0])
A colon (`:`) on its own is a special index that will select all items.
# select all
print (mylist[:])
Using a negative index value will count from the back. It starts with `-1`.
print (mytuple[-1])
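The colon can also be combined with start and stop positions to select part of a sequence (a slice). The start index is included and the stop index is excluded. A short sketch with an illustrative list:

```python
numbers = [1, 2, 3, 4]

first_two = numbers[:2]     # items at index 0 and 1
from_second = numbers[1:]   # everything from index 1 onward
middle = numbers[1:3]       # index 1 up to, but not including, index 3
last_two = numbers[-2:]     # negative indices work in slices too
copy_all = numbers[:]       # a new list with all items

print(first_two, from_second, middle, last_two, copy_all)
```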
It is possible to stack indices to select an item in a multi-level list.
# multi-level index
nested_list = [[1, 2], [3, 4]]
nested_list[0][1] # select first list, then second item
You can change and delete values from a list using the index.
# change last list item
nested_list[-1] = [4, 5]
nested_list
# delete list value
del nested_list[0][0]
nested_list
Dictionaries:
Dictionaries are a collection of key-value pairs. They were historically unordered; since Python 3.7 they preserve the order in which keys were inserted.
# dictionaries are created using curly braces
pet_list = {'alice':'cat', 'becky':'cat', 'chaoli': 'parrot', 'dan':'dog'}
pet_list
print (pet_list)
Dictionaries have keys and values. This is similar to a physical dictionary - you look up a word to find its definition.
# list all keys
pet_list.keys()
# list all values
pet_list.values()
You can find which specific value goes with which key by using the key as the index.
pet_list['dan']
Like lists, you can change dictionary keys and values after the fact.
# add a key/value pair
pet_list['ewan'] = 'bunny'
pet_list
It's good to check if a key/value pair exists before deleting a value.
# delete a key/value pair
if 'alice' in pet_list:
    del pet_list['alice']
pet_list
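Dictionaries also provide methods that handle a missing key without an explicit check. The sketch below uses a small made-up dictionary: `pop` with a default removes a key if it is present, and `get` with a default looks up a key without raising an error.

```python
# Hypothetical sample data for illustration.
pets = {'alice': 'cat', 'becky': 'cat', 'dan': 'dog'}

removed = pets.pop('alice', None)          # removes the pair, returns its value
missing = pets.pop('zoe', None)            # key absent: returns the default
lookup = pets.get('zoe', 'no pet listed')  # key absent: returns the default

print(removed, missing, lookup)
print(pets)
```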
Dictionaries can be nested.
pet_list_ext = {'alice': {'type':'cat', 'age':3},
                'becky': {'type':'cat', 'age':9},
                'chaoli': {'type':'parrot', 'age':23},
                'dan': {'type':'dog', 'age':7.5}}
pet_list_ext
Chain two keys, one after the other, to retrieve values in nested dictionaries.
pet_list_ext['chaoli']['type']
Boolean operators combine Boolean values, or other values according to how they evaluate to true or false. The operators are `and`, `or`, and `not`.
Try to guess what will be returned for each combination below!
True and True
True and False
False and False
True or True
True or False
not True
not False
if (1 and 'hi'): # through evaluation
    print('OK')
if (0 and 'hi'): # through evaluation
    print('OK')
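When `and` and `or` are applied to values that are not Booleans, the result is one of the operands rather than `True` or `False`. The following sketch (with illustrative variable names) shows which operand comes back:

```python
# 'and' returns the first operand that evaluates to false, otherwise the last one.
and_both_true = 1 and 'hi'
and_first_false = 0 and 'hi'

# 'or' returns the first operand that evaluates to true, otherwise the last one.
or_first_false = 0 or 'hi'
or_both_false = '' or None

print(repr(and_both_true), repr(and_first_false))
print(repr(or_first_false), repr(or_both_false))
```

This is why the first `if` above prints and the second does not: the expressions produce `'hi'` and `0`, which then evaluate to true and false.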
Comparisons are used to evaluate relative values (e.g., is x greater than y), equivalence, or identity. A few examples are shown below.
1 > 2
1 < 2
1 >= 2
1 <= 2
NOTE! Testing for equivalence needs two equal signs, not one!
# are these equal?
1 == 1
# this fails
1 = 1
1 != 2 # is not equal to
`is` and `is not` can also be used for comparisons. They test identity (whether both sides are the same object) rather than equal value.
1 is 2
1 is not 2
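The difference between `==` and `is` is easiest to see with two separate lists that hold the same values. A minimal sketch with illustrative names:

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]   # a separate list with the same contents
alias = list_a       # a second name for the same list object

equal_values = list_a == list_b    # compares contents
same_object = list_a is list_b     # compares identity
alias_same = alias is list_a       # both names refer to one object

print(equal_values, same_object, alias_same)
```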
You can use `in` and `not in` to see if a value is part of a sequence.
1 in (1, 2, 3)
1 not in (1, 2, 3)
There are a few different ways to import a Python library or specific function from a library.
If you import an entire library, you need to preface a function in that library with the library name. For some commonly used libraries or ones with long names, it is common to give it a nickname when importing. If you import a specific function from a library, you can use that function without prefixing it with the library name.
import time # import entire library
import numpy as np # call numpy using np
from math import sqrt # just import square root function from math library
from math import factorial as fac # just import factorial function from math library, call it fac
Be careful with your nicknames because you could potentially conflict with an existing function.
# prints current time (seconds since January 1, 1970)
print(time.time())
# call numpy function using nickname np for numpy
np.array([2,3,4])
# can call sqrt function without having 'math.' in front
sqrt(16)
# can call factorial function by nickname without having 'math.' in front
fac(5)
Most of the time Python programs run line by line, executing each statement in order from top to bottom. However, there are cases when certain lines should be skipped if some condition occurs, or a certain section of code should be run many times. Control flow tools are used to change the order or number of times lines or code sections are run.
# if a exists, print
a = 3
if a:
    print('a =', a)
# print elements in list
mylist = [1, 2, 3]
for i in mylist:
    print(i, end=" ")
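The `range` function produces the numbers from `0` up to, but not including, the specified number. In Python 3 it returns a range object rather than a list, so wrap it in `list` to see the values.
list(range(5))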
# print numbers in a certain range
for i in range(5):
    print(i, end=" ")
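`range` also accepts a start value and a step. The sketch below, assuming Python 3, converts each range to a list so the generated numbers are visible:

```python
zero_to_four = list(range(5))        # stop only: starts at 0, stops before 5
two_to_six = list(range(2, 7))       # start and stop
evens = list(range(0, 10, 2))        # start, stop, and step
countdown = list(range(5, 0, -1))    # a negative step counts down

print(zero_to_four, two_to_six, evens, countdown)
```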
Certain keywords can affect how the loop functions:
# stop if 7 is reached
for i in range(10):
    if i == 7:
        break
    print(i, end=" ")
# prints '- no break' if loop completed without break
for i in range(10):
    if i == 12:
        break
    print(i, end=" ")
else:
    print('- no break')
# skips even numbers, but continues through loop after
for i in range(10):
    if i % 2 == 0:
        continue
    print(i, end=" ")
Sometimes it is useful to have a placeholder in a loop. Here the loop runs, but because of the `pass` keyword it does nothing.
# do nothing
for i in range(10):
    pass
While loops are useful to continue for an unspecified number of iterations until a certain condition is met. If the condition never becomes false because nothing changes inside the loop, it will keep looping!
# while loop
a = 0
while a < 10:
    print(a, end=" ")
    a += 1
The `try`, `except`, and `finally` keywords are used to catch things that have failed. `finally` will always run, but `except` will only run if the specified error occurred.
try:
    1 / 0
except ZeroDivisionError:
    print("that didn't work")
finally:
    print('end')
try:
    1 / 1
except ZeroDivisionError:
    print("that didn't work")
finally:
    print('end')
You can also have a general `except` clause to catch any type of error.
try:
    1 / 0
except:
    print("that didn't work")
finally:
    print('end')
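A general `except` hides what went wrong, so it is often more helpful to name the error and keep a reference to it with `as`. The sketch below wraps the division in a hypothetical helper function, `safe_divide`, and also uses the optional `else` clause, which runs only when no error occurred.

```python
def safe_divide(numerator, denominator):
    """Hypothetical helper: return the quotient, or None if division fails."""
    try:
        result = numerator / denominator
    except ZeroDivisionError as err:
        # err holds the error object, so its message can be reported
        print('could not divide:', err)
        return None
    else:
        # runs only if the try block finished without an error
        return result
    finally:
        # runs in every case, even when a return statement was reached
        print('end')

failed = safe_divide(1, 0)
worked = safe_divide(6, 3)
print(failed, worked)
```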
List Comprehension is a quick way to run through a loop. The following two cells create the same resulting list.
mylist = []
for i in range(5):
    mylist.append(i * 2)
mylist
mylist = [i*2 for i in range(5)]
mylist
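A comprehension can also filter items by adding an `if` at the end, and the same pattern works with curly braces to build a dictionary. A short sketch with illustrative names:

```python
# keep only the odd numbers, then double them
doubled_odds = [i * 2 for i in range(10) if i % 2 == 1]

# dictionary comprehension: map each number to its square
squares = {i: i ** 2 for i in range(4)}

print(doubled_odds)
print(squares)
```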
It is useful to create custom functions when you want to reuse sections of code many times.
The `def` keyword is used to start a function definition. Arguments that the function expects to receive are listed between parentheses before the `:`.
# define a function with no arguments
def myfunct():
    print('hello')
# define a function with one argument
def myfunct2(name):
    print('hello,', name)
# call the functions
myfunct()
myfunct2('Iekika')
If you forget the parentheses in the function call Python will tell you about the function rather than calling it.
myfunct
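The functions above print a message but do not hand a value back to the caller. As a sketch of two further features, the hypothetical function `greet` below uses `return` to send back a result and gives its second argument a default value, so that argument can be left out of the call.

```python
def greet(name, greeting='hello'):
    """Hypothetical example: build a greeting and return it as a string."""
    return greeting + ', ' + name

default_message = greet('Iekika')                     # uses the default greeting
custom_message = greet('Iekika', greeting='welcome')  # overrides the default

print(default_message)
print(custom_message)
```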
You can open, read, and write files using Python.
# open a file
myfile = open('test_file.txt')
myfile
# read file lines
lines = myfile.readlines()
lines
# print each line
for line in lines:
    print(line)
# print a specific line
print(lines[3])
It is important to close the file when you are finished accessing it!
# close file
myfile.close()
A trick is to use the with statement to read a file instead. The file will be closed automatically.
# open with 'with' statement
with open('test_file.txt') as newfile:
    newlines = newfile.read()
newlines
# get current time
nowtime = time.time()
nowtime
# write to a file
with open('write_me.txt', 'w') as wfile:
    wfile.write('Hi there! ' + str(nowtime))
# read written file
with open('write_me.txt') as rfile:
    rlines = rfile.read()
rlines
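Lines read from a file keep their trailing newline character, which is why printing them one by one leaves blank lines in between. The sketch below writes a small file to a temporary folder (the file name and contents are made up for illustration), then loops over the file object directly and strips the newline from each line.

```python
import os
import tempfile

# write three lines to a file in a temporary folder (hypothetical example file)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example.txt')
with open(path, 'w') as wfile:
    wfile.write('first\nsecond\nthird\n')

# a file object can be looped over line by line
with open(path) as rfile:
    clean_lines = [line.rstrip('\n') for line in rfile]

print(clean_lines)
```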
This last section will briefly cover raster and vector data and show a few introductory ways to work with these data types.
Raster data
The idea of raster data is extended from digital photography, where a matrix is used to represent a continuous part of the world. A GeoTIFF extends the TIFF image format by including geospatial context of the corresponding image.
Generic image readers/libraries ignore the geospatial info and retrieve only the image content. Geospatially-aware software/libraries are needed to extract complete information from this image format.
RasterIO
RasterIO is a light-weight raster processing library that provides enough utility and flexibility for a good range of common needs. Refer to this example as a start.
# load raster data
chicago_tif = rasterio.open(os.path.join('pyintro_resources/data','Chicago.tif'))
# see type
type(chicago_tif)
# find the shape of the array (rows vs columns)
chicago_tif.shape
# assign the first image band to a variable
band1 = chicago_tif.read(1)
Vector data
Vector data describe the world with explicit coordinates and attributes. GeoJSON is a straightforward format derived from JSON. It packs vector data in a way that is easy for both humans and machines to read and write.
# load chicago vector data
with open(os.path.join('pyintro_resources/data','Chicago_Community.geojson')) as geojson_file:
    chicago = json.load(geojson_file)
# JSON is represented in Python as a dictionary
type(chicago)
# we can see the dictionary keys
chicago.keys()
# the value of 'type' is 'FeatureCollection': a collection of vector features
chicago['type']
# 'features' contains a list of feature values
type(chicago['features'])
# what are the keys for the first feature in the list
chicago['features'][0].keys()
# what are the properties for the first feature in the list
chicago['features'][0]['properties']
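Because the loaded GeoJSON is a dictionary holding a list of features, the list and dictionary tools from earlier sections apply directly. The sketch below does not use the Chicago file; it parses a tiny made-up FeatureCollection (the names and coordinates are hypothetical) so it can run on its own, then collects one property from every feature.

```python
import json

# Hypothetical two-feature collection; names and coordinates are made up.
sample_text = '''
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "properties": {"name": "Area A"},
    "geometry": {"type": "Point", "coordinates": [-87.6, 41.8]}},
   {"type": "Feature",
    "properties": {"name": "Area B"},
    "geometry": {"type": "Point", "coordinates": [-87.7, 41.9]}}
 ]}
'''

sample = json.loads(sample_text)   # parse JSON text into a dictionary

# loop over the feature list and pull one property from each feature
names = [feature['properties']['name'] for feature in sample['features']]
geometry_types = {feature['geometry']['type'] for feature in sample['features']}

print(len(sample['features']), names, geometry_types)
```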
Matplotlib is a powerful library commonly used to display vector data, but it can also handle raster data. Use the `%matplotlib inline` command to help display plots as cell output.
%matplotlib inline
import matplotlib.pyplot as plt
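# plot the band with Matplotlib
# imshow expects extent as (left, right, bottom, top); rasterio bounds are (left, bottom, right, top)
bounds = chicago_tif.bounds
fig = plt.figure(figsize=(12,10))
plt.imshow(band1, cmap='gray', extent=(bounds.left, bounds.right, bounds.bottom, bounds.top))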
Matplotlib is powerful for generating graphs. Here is a simple example graph:
plt.plot([1,2,3,4])
plt.title('My Plot')
plt.ylabel('some numbers')
plt.show()
Python libraries optimized for visualizing geospatial vector data will be covered in a later notebook!
Enjoy getting to know Python through Jupyter Notebooks!