Python: Introduction

University Libraries at the University of North Carolina at Chapel Hill

Note: If you would like to use Python for the duration of this workshop without downloading anything (or have problems downloading Anaconda), we recommend using repl.it.

If you would like to save your work and continue to use Python on your computer, we recommend downloading Anaconda.

  • Please install Anaconda, or Miniconda with Spyder, following the instructions in Setup before the workshop if possible.
  • If you haven't already installed Anaconda, please begin downloading the Anaconda distribution of Python as soon as possible. It's a large download and can be slow to download over wifi. If needed, we also have a limited number of thumb drives with the Anaconda installers. Please flag one of the staff if you need to copy the installer.

Goals:

  • Get the Anaconda Distribution of Python 3.7 downloaded and installed on your computer
  • Learn to work with basic Python data types and objects
  • Introduce Loops and Conditionals

Why Python?

What is a programming language?

Programming languages allow one to write instructions in human-readable form that can then be read and understood by a computer. "High level" languages like Python translate sets of commands or instructions (programs) created by people (programmers) to "low level" languages more readily understood by the computer. Once these instructions have been translated the computer can then follow the steps within to perform a specific task. (For more information on high- and low-level languages, please see this definition.)

Python is a general-purpose programming language that has become popular for data and text analysis. Python can be easier to learn than other languages because it emphasizes (human) readability and flexibility. Python is the second most used language on GitHub; this means you'll find packages (sets of functionality developed by other users) to use Python for a wide variety of problems and tasks.

If you haven't worked with a programming language before, learning Python will introduce you to methods used in many programming languages, making it easier to learn other languages like Java and R later on.

Use Cases

Scripting

Writing code to automate repetitive tasks. For example, you might need to extract text from thousands of pdf files and sort them into directories based on whether the texts mention particular phrases. To do this, you would need to create instructions for how to find the pdfs (and skip any non-pdfs!), open them, extract text, search for the key terms, then move each file to the proper final location. In this series, we'll learn about some of the fundamental building blocks for translating such processes into instructions the computer can understand. The other uses outlined below often involve some element of automation!

Natural Language Processing

The NLTK package provides tools for dealing with unstructured text such as parsing words and sentences, or tagging parts of speech. tesseract applies optical character recognition (OCR) to transform images into machine readable text. Other packages provide access to algorithms like topic modeling on text corpora. Scripting to automate these steps is necessary to apply these algorithms to the vast corpora common in NLP (for example, Early English Books Online provides access to 130,000 works produced in English from 1473-1700).

Data Science

Python has a well-developed ecosystem of specialized tools and functions for everything from fitting deep learning models with tensorflow to creating visualizations with seaborn. Most of these tasks aren't directly covered in Python's standard functionality (our focus today). Later on, we'll explore some of the foundational packages used for Data Science in Python: the SciPy ecosystem, especially pandas.

Others

There are also packages available in Python for web development (Django), image processing (pillow), web scraping (BeautifulSoup), APIs, databases, games (pygame), psychological experiments (PsychoPy), astronomy (astropy), and many many other uses.

Jupyter Notebooks are a popular tool to share Python code in a "literate" format, mixing regular English with code and outputs, including formatted tables, visualizations, etc., for easy comprehension by non-Python users. We'll explore Jupyter Notebooks later on in this series of workshops.

However, there are few things you can do in Python that can't also be done in other languages! If you already know one or more programming languages, you'll have to decide where Python best fits in your own workflows.

Python vs. R

R is another popular language often compared to Python in the realm of data science. Each has relative strengths and weaknesses, but in most cases Python and R can ultimately accomplish the same goals. We also teach a series of workshops introducing the R language, R Open Labs. However, we usually recommend that you focus on one language at a time to avoid confusion!

Python 2 vs. Python 3

Both Python 2 and Python 3 are widely used in research. Unfortunately, while both Python 2 and 3 are very similar, there are a few key syntax differences: Python 3 cannot always run Python 2 code (or vice versa).

Python 3 was released in 2008; since then, nearly all important tools have been re-written or updated for Python 3. Python 2 is not maintained as of January 1, 2020. This workshop will use Python 3.

Warning!

If you're already comfortable with basic programming concepts, you'll probably find this workshop very straightforward. The later workshops in the series may be more helpful if you're already familiar with the concepts below and just need to learn new syntax. Experienced attendees are more than welcome to stay, review, and help others!

Getting Started

IDEs

An Integrated Development Environment (IDE) is software that combines many tools to help you use a programming language, such as a code editor, compiler, and debugger in a convenient interface. There are many different IDEs to choose from. IDEs are not necessary, but are often good for beginners and useful for experienced users.

As you gain experience, you can choose whether an IDE is right for your uses and which one works best for you. For the purposes of this workshop, we will use Spyder, which comes packaged with Anaconda.

Open Spyder:

  • Windows: Start>Anaconda3 64-bit>Spyder
  • Mac: Applications>Anaconda Navigator>Spyder

[Screenshot: the Spyder IDE window]

Spyder's default interface provides three panes:

  • The Editor pane (left) is a scripting window for writing Python code for reuse or sharing later.
    • Scripts should be self contained to ensure easy reuse later on. You should always be able to restart and run your script from scratch to make sure you haven't left anything important out.
  • The Console pane (bottom right) contains a console for executing code. We'll use this to test our code interactively.
  • The Explorer pane (top right) contains other helpful tools listing defined variables, files in the working directory, and other help.

Note: Code prepared in a simple text editor (not a formatted editor like Microsoft Word) can be executed (run) using your computer's command line or terminal.

Entering code

We'll begin by using Python as a simple calculator. The objective here is to introduce you to how the windows in Spyder work together and to some basic Python syntax.

In this workshop, Python code will be presented in numbered grey cells as below. Any output generated will also be displayed below the grey cell.

In [1]:
2+2
Out[1]:
4

To execute this in Spyder, copy or type the code yourself into the IPython console pane. Press Enter to execute.

[Screenshot: 2+2 entered in the Spyder IPython console]

You can also enter code into the Editor pane. This is particularly useful when writing more complicated or reusable code. The code you write in the Editor pane can be saved as a .py file to revisit or run later.

To use the Editor pane to save and execute code, type the code in the Editor pane, highlight the line(s) you want to execute and click:

  • Run > Run Selection or Current Line
  • Shortcut: F9 (or FN+F9 on many laptops)

The code will then execute in the Console pane. Note that if you don't have a line selected, this shortcut will run the "current line", i.e. the line where the cursor is located.

Standard arithmetic operations are available in Python.

In [2]:
3*3
Out[2]:
9

Note: We can annotate our code with comments. Python uses # to denote comments. Anything typed after a # will be ignored on execution of the code.

In [3]:
#1+2
5/2 #division
Out[3]:
2.5
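
As a quick reference, the sketch below collects the arithmetic operators used most often in Python. The variable names are only for illustration and do not appear elsewhere in this workshop.

```python
# Common arithmetic operators (variable names are illustrative only)
total = 7 + 2         # addition -> 9
difference = 7 - 2    # subtraction -> 5
product = 7 * 2       # multiplication -> 14
quotient = 7 / 2      # division always returns a decimal value -> 3.5
whole = 7 // 2        # floor division drops the fractional part -> 3
remainder = 7 % 2     # modulo: the remainder after division -> 1

print(total, difference, product, quotient, whole, remainder)
```

Parentheses work as they do in ordinary arithmetic, so (7 + 2) * 3 is evaluated as 9 * 3.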

Exercises

  1. What is 126544 squared? (Exponentiation is denoted **, i.e. $ 2^3 $ is 2**3 in Python).

  2. What is 5 divided by 0?

Data Types and Variables

Ultimately, we need Python to store various values and objects for future re-use. Python has many default data types available. We will focus on a few common examples.

Strings and Numbers

We assign a value to a variable using =. We do not need to declare a type or give any other information.

In [4]:
number = 42

text = "Hello, World"

String objects, like text above, contain textual values. These are identified to Python by quotes; you can use either ' or " as long as you use the same type to begin and end your string.

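Python uses several different numeric data types for storing different values. Examples include integers (int), floating point numbers (float, for decimals), and complex numbers. (Python 2 also had a separate long integer type; in Python 3, int handles integers of any size.) Numbers can also be stored in string values using quotes.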

For example:

In [5]:
notnumber = "42"

Once we have defined an object, we can use it again, most simply by printing it.

In [6]:
print(text)
Hello, World
Note: The print command is one of the most basic differences between Python 2 and Python 3. In Python 2, print does not require parentheses: `print text` In Python 3, you must include parentheses: `print(text)`

We can also modify the contents of objects in various ways such as redefining them or changing their type. In some cases this is crucial to how Python can work with them. For example:

In [7]:
print(number+58)
#print(number+notnumber)
100

So we can add a value, 58, to our number object, but we can't add our notnumber object. Let's double check what notnumber contains:

In [8]:
print(notnumber)
42
In [9]:
print(number)
42

Even though these appear the same to our eye, Python uses them very differently. Remember how we defined notnumber? Let's check what data type Python is using with type.

In [10]:
type(notnumber)
Out[10]:
str

Fortunately Python provides a set of functions to convert objects between different data types. A function packages a set of prewritten commands to accomplish a particular task. Most functions take one or more objects as inputs or 'arguments' and produce some new object as output.

The int function takes an object as an argument and converts it to an int (integer) numeric object. The usage is as follows:

In [11]:
newnumber = int(notnumber)
print(newnumber)
type(newnumber)
42
Out[11]:
int

Now we can try adding objects again.

In [12]:
print(number+newnumber)
84

int objects can only hold integer values. If you have decimal values, use the float (floating decimal) type instead.

In [13]:
myfloat = float(newnumber)+0.5
print(myfloat)
42.5
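
The conversion functions can be compared side by side. The following is a small sketch with made-up variable names; note in particular that int() applied to a decimal value drops the fractional part rather than rounding.

```python
# Converting between types with int(), float(), and str()
as_int = int("42")        # string -> integer
as_float = float("3.5")   # string -> float
as_text = str(42)         # integer -> string
truncated = int(3.9)      # float -> integer: the decimal part is dropped, not rounded

print(as_int + 1)         # 43
print(as_float * 2)       # 7.0
print(as_text + "!")      # 42!
print(truncated)          # 3
```

Using + on two strings joins them together, which is why as_text + "!" works while adding a number to a string does not.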

Getting help with functions

You can access documentation for functions in Python with help, for example help(sum). Base Python functions and those provided by packages also usually have online documentation that may be easier to read.

Exercises

  1. Define two variables, j and k, with values 37 and 52 respectively.
  2. What is the sum of j and k? The product? Write code for each of these in the editor window, and run with the keyboard shortcut (refer back to Section 1. Entering Code).
  3. Now re-assign j and k to have the values 8 and 3 respectively. Re-use your code from the editor to determine their sum and product.

Lists

Python's lists store multiple objects in a sequence. All of the data types we have seen so far (and indeed most data types in Python) can be placed in a list. For example, we can save numbers and character strings together:

In [14]:
my_list = [1, 2, 3, "four"]
print(my_list)
[1, 2, 3, 'four']

We can also define lists using previously defined objects (including other lists!):

In [15]:
obj0 = 12
obj1 = "cat"
obj2 = ["a", "b", "c"]
my_list1 = [obj0, obj1, obj2]
print(my_list1)
[12, 'cat', ['a', 'b', 'c']]

Once we've defined a list, we can add more elements to it with the .append method.

In [16]:
my_list1.append("dog")
print(my_list1)
[12, 'cat', ['a', 'b', 'c'], 'dog']
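
One detail of .append that is easy to miss: it always adds exactly one element, so appending a list produces a nested list. A minimal sketch, using a hypothetical pets list:

```python
pets = ["cat", "dog"]          # hypothetical example list

pets.append("parrot")          # adds one element to the end of the list
print(pets)                    # ['cat', 'dog', 'parrot']

pets.append(["fish", "frog"])  # a list is appended as a single (nested) element
print(pets)                    # ['cat', 'dog', 'parrot', ['fish', 'frog']]
```

Note that .append changes the existing list in place, so there is no need to assign the result to a new variable.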

Exercises

  1. Create a list of:

    • your favorite color (str)
    • your two favorite holidays (list)

      For example: ["red", ["Halloween", "New Years"]]

  2. Then append the number of pets you have (int) as a new list item.

Indexing

Python retains the order of elements in our list and uses that order to retrieve objects in lists. These numbered positions are called indices.

We use [ and ] to provide indices in Python.

Most importantly, Python starts counting at zero: The first element in your list is denoted [0], the second [1], the third [2] and so on. This can take some getting used to!

In [17]:
my_list2 = ["cat", "dog", "parrot"]
print(my_list2[2])
parrot

We can use multiple indices for lists within lists, one after the other:

In [18]:
#recall
my_list1 = [12, 'cat', ['a', 'b', 'c']]
print(my_list1[2][1]) #i.e. the second element of the list held in the third element of my_list1
b

We can extract multiple adjacent items from a list using a colon.

[n:m] retrieves the values from index n to index m-1.

In [19]:
my_list1[0:2]
Out[19]:
[12, 'cat']
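
To make the [n:m] rule concrete, here is a short sketch with a hypothetical list of letters. It also shows the difference between a slice, which returns a list, and a single index, which returns the element itself.

```python
letters = ["a", "b", "c", "d", "e"]   # hypothetical example list

print(letters[1:4])   # indices 1, 2, and 3 -> ['b', 'c', 'd']
print(letters[0:1])   # a slice returns a list, even with one element -> ['a']
print(letters[0])     # a single index returns the element itself -> a
```

A handy check: the slice [n:m] contains m - n elements, as long as both indices fall inside the list.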

The len function provides the length of an object in Python.

In [20]:
print(len(my_list1))
3

If len(my_new_list) returns 10, that means there are ten elements in the list. Remember that Python starts counting at 0, so the indices are 0 through 9.

We can use the range function with len to generate a list of indices. This can be a useful object to work with later on.

In [21]:
my_indices1 = list(range(len(my_list1)))
print(my_indices1)
[0, 1, 2]

Note: When we're experimenting with indices, the Spyder console provides a useful shortcut. Click the active line on the console and use the up arrow to move through the code you've previously executed in the console.

For example:
We might want to bring back the line of code above, my_list1[0:2], and modify it to my_list1[0:1] to see how that affects the output.

This shortcut can save time typing or copying code you want to experiment with. Remember that whatever code you settle on probably belongs in your script if later code will depend on it!

Indexing beyond lists

Indexes can also be used with any sequential data type, including strings.

For example:

In [22]:
my_str = "The quick brown fox jumps over the lazy dog."
print(my_str[4])
print(my_str[4:9]) #4:9 indicates characters 4-8
q
quick

Using : ranges with one end blank will automatically go to the end of the object.

We can also work from right to left using negative numbers.

In [23]:
print(my_str[-4:])
print(my_str[:4])
dog.
The 
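
The sketch below gathers the negative-index and blank-end forms in one place, using a short made-up string so the positions are easy to count.

```python
word = "python"    # hypothetical example string

print(word[-1])    # last character -> n
print(word[-2])    # second-to-last character -> o
print(word[:2])    # from the start up to (not including) index 2 -> py
print(word[2:])    # from index 2 to the end -> thon
print(word[-3:])   # the last three characters -> hon
```

The same forms work on lists, since lists are also sequential.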

We can still use multiple indices across sequential data types. For instance, a list of strings:

In [24]:
["home", "away"][0][0:3]
Out[24]:
'hom'

It can be helpful to unpack nested indices, for example, let's look at what the first index [0] alone gives us:

In [25]:
["home", "away"][0]
Out[25]:
'home'

Unfortunately, not all data types are sequential - indices will not work on numeric values, unless we convert them to strings with str. Python considers numbers to be a single "value" whereas the strings and lists both have natural component parts.

Exercises

  1. Try to use indexing to get the tenth digit of my_pi as defined below. Does it work as defined? Do we need to change the variable somehow?
    my_pi = 3.141592653589793
  2. Below is a list of lists containing the NATO phonetic codes for each letter of the alphabet. Each list within nato contains a letter of the alphabet and its corresponding code.
    nato = [["A", "Alfa"],
           ["B", "Bravo"],
           ["C", "Charlie"],
           ["D", "Delta"],
           ["E", "Echo"],
           ["F", "Foxtrot"],
           ["G", "Golf"],
           ["H", "Hotel"],
           ["I", "India"],
           ["J", "Juliett"],
           ["K", "Kilo"],
           ["L", "Lima"],
           ["M", "Mike"],
           ["N", "November"],
           ["O", "Oscar"],
           ["P", "Papa"],
           ["Q", "Quebec"],
           ["R", "Romeo"],
           ["S", "Sierra"],
           ["T", "Tango"],
           ["U", "Uniform"],
           ["V", "Victor"],
           ["W", "Whiskey"],
           ["X", "X-ray"],
           ["Y", "Yankee"],
           ["Z", "Zulu"]]
  • What is the fifteenth letter of the alphabet?
  • What is the code for the twenty-third letter of the alphabet?
  • What is the fourth letter of the code for the eighth letter of the alphabet?

Review: Data Types

Type    Example                                   Description
int     1, 2, 3                                   Integers (whole numbers)
float   1.5, 2.72, 3.14                           Floating point numbers (decimals)
str     "cat", "dog", "car"                       String, character, or text values
list    [1, 2, 3], ["cat", "dog"], [1, [2, 3]]    One or more objects stored by order

We will cover Python's dictionary (dict) and boolean (bool) types later on.

Read more about Python's built-in data types here.

Flow Control

Conditions and Booleans

Conditionals allow for more flexible instructions, letting our code react differently as our inputs change.

Conditions often arise from comparisons:

    <          strictly less than
    <=      less than or equal
    >        strictly greater than
    >=      greater than or equal
    ==      equal
    !=      not equal
    is      object identity
    is not  negated object identity
    in      sequence membership
    not in  sequence non-membership

Note: = is used for assignment, whereas == checks if two objects are equal.

Each condition considered evaluates to a Boolean value - True or False. Booleans have their own data type: bool.

In [26]:
num=5
num<3
Out[26]:
False
In [27]:
letter="a"
letter in ["a","b","c"]
Out[27]:
True
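
A few more comparisons are sketched below, including the difference between == (equal contents) and is (the very same object). All variable names here are made up for illustration.

```python
x = 10
y = 3

print(x == y)               # False
print(x != y)               # True
print(x >= 10)              # True
print("a" in "cat")         # True: membership also works on strings
print(4 not in [1, 2, 3])   # True
print(type(x > y))          # <class 'bool'>

list_a = [1, 2]
list_b = [1, 2]
print(list_a == list_b)     # True: the two lists have the same contents
print(list_a is list_b)     # False: they are two separate objects
```

In everyday code == is almost always the comparison you want; is matters mainly when you need to know whether two names refer to one and the same object.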

Conditional Statements

A conditional statement allows your code to branch and behave differently based on these conditions.

A simple conditional statement takes the form:

if <condition>:
    <do something only if condition is true>

Your instructions can be as long as necessary, provided they remain indented. Indentation is very important in Python as it groups lines of code without using explicit characters like { and } as in many other languages.

You can indent with spaces or tabs, but you must be consistent.

We can supply alternate steps if the condition is false with else, or even consider multiple conditions with elif (i.e. else if).

if <condition1>:
    <do something if condition1 is true>
elif <condition2>:
    <do a different thing if condition1 is false and condition2 is true>
else:
    <do a third thing if neither condition is true>
In [28]:
num = 5
if num > 4:
    print("This number is greater than four")
This number is greater than four
In [29]:
num = 3
if num > 4:
    print("This number is greater than four")

Adding else lets us give instructions if our condition is False.

In [30]:
num = 3
if num > 4:
    print("This number is greater than four")
else:
    print("This number is less than or equal to four")
This number is less than or equal to four

Finally, the elif command lets us split the possible values of num into more groups. You can have as many elif statements as you need to split up possible conditions.

In [31]:
num = 8
if num < 3:
    print("This number is less than three")
elif num < 10:
    print("This number is greater than or equal to three and less than ten")
else:
    print("This number is greater than or equal to ten")
This number is greater than or equal to three and less than ten

Exercises

  1. Write an if statement that prints a message if a person is old enough to get a driver’s license (teenagers can get their driver’s licenses at age 16). Next, add in an else statement that gives a different message.

    #Template:
    age = <your choice of age>
    if <condition using age>:
         print("You are old enough to get a driver's license.")
  2. Using indexing, write a conditional that prints a word only if it ends with the letter 'e'.

    #Template:
    testword = <your choice of word>
    if <condition using testword>:
         print(testword)

For Loops

A "for loop" allows us to apply the same steps to each element in a list or other iterable. In essence, loops let us automate tasks relative to some sequence that we might otherwise write like this:

In [32]:
sales = [5, 2, 7, 9, 3]
total_sales = 0
total_sales = total_sales + sales[0]
total_sales = total_sales + sales[1]
total_sales = total_sales + sales[2]
total_sales = total_sales + sales[3]
total_sales = total_sales + sales[4]
print(total_sales)
26

In the code above, we're essentially applying the same operation (cumulative summation) to each object in sales one by one. A loop will let us write this operation in a general way and apply it to each object in a list or sequence.

Loops take the form:

for <name> in <list>:
    <do something based on name>


  • <name> is completely arbitrary, though i, j, k, and n are relatively common. Use something that makes sense to you (and others)!
  • <list> is a pre-defined list or other iterable object.
  • Reminder: Indentation is very important in Python and must be used consistently across the loop(s). Only the code indented under the loop will be run in each iteration.
In [33]:
my_nums = list(range(6))

for n in my_nums:
    print(n)
0
1
2
3
4
5
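
Returning to the sales example, the five repeated lines can be collapsed into a loop. This is one straightforward way to write it; the loop variable name sale is arbitrary.

```python
sales = [5, 2, 7, 9, 3]
total_sales = 0

for sale in sales:
    # Add each element of sales to the running total, one at a time
    total_sales = total_sales + sale

print(total_sales)   # 26
```

The loop works unchanged for a list of any length, which the line-by-line version cannot do. For this particular task, the built-in sum function gives the same result: sum(sales).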

We can also loop within loops. Indentation is key to control which blocks of code are executed within which loop.

In [34]:
#Nesting loops - indentation is key!
listOfWords = ["blue", "yellow", "red", "green"]
newList = [] #initialize an empty list

for color in listOfWords:
    numLetters = 0 #resets to zero each time the loop runs
    for letter in color:
        numLetters += 1
    temporaryList = [color, numLetters]
    newList.append(temporaryList)
    
print(newList)
[['blue', 4], ['yellow', 6], ['red', 3], ['green', 5]]

Notice that before the loop begins we create an empty list. This is a common strategy to collect outputs from some or all of the loop's iterations. This can generalize to numbers by defining a zero-valued variable before the loop and modifying it with each iteration.

How could we write the code above with fewer lines? Is there a simpler way to find the length of each word?

For Loops with Conditionals

Loops become even more useful when combined with conditionals, to perform different steps based on conditions that change with each iteration of the loop.

In [35]:
for number in range(10):
    if number % 2 == 0: 
    # % denotes the modulo operation - the result is the remainder after dividing by 2 
    # (i.e. 6%2 = 0, but 5%2 = 1)
        print(number)
0
2
4
6
8

We can combine multiple conditions with and: the combined condition is True only when both parts are True.

In [36]:
scores=[95, 90, 66, 83, 71, 78, 93, 81, 87, 81]
grades=[]
for score in scores:
    if score >= 90:
        grade = "A"
    elif score >= 80:
        grade = "B"
    elif score >= 70 and score < 80:
        grade = "C"
    elif score >= 60 and score < 70:
        grade = "D"
    else:
        grade = "F"
    grades.append([score, grade])       
print(grades)
[[95, 'A'], [90, 'A'], [66, 'D'], [83, 'B'], [71, 'C'], [78, 'C'], [93, 'A'], [81, 'B'], [87, 'B'], [81, 'B']]
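
Besides and, Python also provides or (True when at least one part is True) and not (which flips True and False). The sketch below uses a made-up fare example; the variable names are hypothetical.

```python
age = 30
has_ticket = True

standard = age >= 18 and age < 65    # both parts must be True
reduced = age < 18 or age >= 65      # at least one part must be True
needs_ticket = not has_ticket        # flips True to False and vice versa

print(standard, reduced, needs_ticket)   # True False False

if standard and has_ticket:
    print("Welcome aboard at the standard fare")
```

Storing a condition in a variable, as done here, can make a long if statement easier to read.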

Exercises

  1. Why do I only specify score >= 80 etc. in the elif statements? Can any of these conditionals be simplified?
  2. How many numbers between 1 and 100 are divisible by 7?
  3. Make a new list of NATO codes keeping only those that use the letter "a" in their code.

Breaks

We can use the break statement with a conditional to stop the loop if a certain condition occurs.

First, let's get some new functions from a package called random.

  • from random import choices,seed makes the functions choices and seed from the random package available in our Python session. We already have random because it is part of the Python Standard Library
  • choices(population=range(100), k=50) will sample 50 random numbers (with replacement) from the numbers 0-99.
  • seed(1234) locks the pseudo-random number generator so your results should match mine - try running this again without seed!

You can read more about the functions in the random package here. We'll revisit packages later on.

In [37]:
from random import choices,seed 

seed(1234)

test=choices(population=range(100), k=50)
print(test)
[96, 44, 0, 91, 93, 58, 67, 8, 76, 23, 3, 78, 34, 62, 61, 14, 18, 11, 1, 48, 96, 6, 54, 46, 60, 8, 57, 26, 55, 64, 48, 35, 24, 93, 45, 53, 1, 50, 0, 14, 47, 37, 5, 58, 16, 55, 14, 93, 77, 95]
In [38]:
total = 0
for number in test:
    if number > 10:
        total=total+number
    else:
        print("This number is too low:",number)
        break
print(total)
This number is too low: 0
140

What does the above loop do? How would this run differently if we disabled the break by commenting (i.e. #break)?
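
Here is a second, smaller illustration of break, using made-up data: the loop stops at the first negative reading, and a counter shows how many items were examined before it stopped.

```python
readings = [21, 23, 22, -5, 24, 20]   # made-up example data
checked = 0

for reading in readings:
    checked = checked + 1
    if reading < 0:
        print("First negative reading:", reading)
        break   # leave the loop immediately; 24 and 20 are never examined

print("Readings checked:", checked)   # Readings checked: 4
```

Stopping early like this can save a lot of time when the list is long and only the first match matters.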

Exercise

  1. Use the choices function above to generate a random list of 50 numbers in 0-99. Write a loop that will find the sum of only the first six even numbers.

Try / Except - Robustness

Errors and warnings are very common while developing code, and an important part of the learning process. In some cases, they can also be useful in designing an algorithm. For example, suppose we have a stream of user entered data that is supposed to contain the user's age in years. You might expect to get a few errors or nonsense entries.

In [39]:
user_ages=["34", "27", "54", "19", "giraffe", "15", "83", "61", "43", "91", "sixteen"]

It would be useful to convert these values to a numeric type to get the average age of our users, but we want to build something that can set non-numeric values aside. We can attempt to convert to numeric and give Python instructions for errors with a try-except statement:

In [40]:
ages = []
problems = []

for age in user_ages:
    try:
        a = int(age)
        ages.append(a)
    except ValueError:  # int() raises ValueError when the text is not a whole number
        problems.append(age)
        
print(ages)
print(problems)
[34, 27, 54, 19, 15, 83, 61, 43, 91]
['giraffe', 'sixteen']
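
With the non-numeric entries set aside, the average age mentioned earlier can be computed. The sketch below repeats the conversion loop so that it runs on its own, catching ValueError specifically so that unrelated errors are not silently hidden. The round function, not covered so far, trims the result to one decimal place.

```python
user_ages = ["34", "27", "54", "19", "giraffe", "15", "83", "61", "43", "91", "sixteen"]

ages = []
problems = []

for age in user_ages:
    try:
        ages.append(int(age))
    except ValueError:          # raised by int() for text such as "giraffe"
        problems.append(age)

average_age = sum(ages) / len(ages)
print(round(average_age, 1))    # 47.4
print(problems)                 # ['giraffe', 'sixteen']
```

Keeping the problems list means the rejected entries can be reviewed later rather than lost.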

More Data Types

Earlier, we introduced a number of important data structures in Python: string and numeric types, as well as lists. We used indexing to specify particular parts of the sequential objects - strings and lists. Here we introduce dictionaries, which provide a useful alternative format for some types of information. List and dictionary comprehensions provide a more succinct way to generate lists and dictionaries.

Dictionaries

Dictionaries provide a "mapping object"; instead of an index, they use named "keys" to organize data. Looking up a value by its key is also typically faster than searching a list for it, because dictionaries are built on hash tables.

A dictionary is defined as follows:

In [41]:
class_dict = {"course":"Python II", "location":"Davis Library", "time":"4pm"}
type(class_dict)
Out[41]:
dict

In this case, "course", "location", and "time" serve as the "keys" for this dictionary. Keys play a similar role to the indices we use for lists (or strings). We can print a particular value by placing its key in the same square brackets [] used by list indices.

In [42]:
print(class_dict["location"])
Davis Library

A positional index will not work with dictionaries: class_dict[0] raises a KeyError, because 0 is not one of the keys.

We can also view all of the keys for a dictionary using the .keys() method. (The result is a dict_keys object; wrap it in list() if you need an actual list.)

In [43]:
print(class_dict.keys())
dict_keys(['course', 'location', 'time'])
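
A few other common dictionary operations are sketched below: adding a key, overwriting a value, testing for a key with in, and looping over key-value pairs with the .items() method. The "instructor" key and its value are hypothetical additions for illustration.

```python
class_dict = {"course": "Python II", "location": "Davis Library", "time": "4pm"}

class_dict["instructor"] = "TBD"      # add a new key (hypothetical value)
class_dict["time"] = "5pm"            # overwrite the value of an existing key

print("location" in class_dict)       # True: `in` checks the keys
print("Davis Library" in class_dict)  # False: the values are not searched

# .items() provides each key together with its value
for key, value in class_dict.items():
    print(key, "->", value)
```

Assigning to a key that already exists replaces its value, so each key appears only once in a dictionary.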

Comprehensions

Python provides some shortcuts for generating lists and dictionaries, especially those that you might (now) generate with a loop. For example, let's generate a list of the square of each number from 1 to 15.

In [44]:
squares=[]
for n in range(1, 16):
    squares.append(n**2)
print(squares)
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]

Using a "comprehension", we can shorten this to a single line, effectively bringing the loop inside the [] used to define the list.

In [45]:
squares=[x**2 for x in range(1, 16)]
print(squares)
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]

The same general format holds for defining dictionaries.

In [46]:
squaresdict={k:k**2 for k in range(1, 16)}
print(squaresdict)
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100, 11: 121, 12: 144, 13: 169, 14: 196, 15: 225}

We can include conditional statements at the end of the comprehension to build more flexible comprehensions.

In [47]:
sentence="the quick brown fox jumped over the lazy dog"
sentence=sentence.split(" ") #splits the string into a list with each space
print(sentence)
print([w for w in sentence if len(w)>4])
['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']
['quick', 'brown', 'jumped']
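
Conditions work the same way with numbers and in dictionary comprehensions. A brief sketch with made-up names:

```python
# Squares of the even numbers from 1 to 10 only
even_squares = [n**2 for n in range(1, 11) if n % 2 == 0]
print(even_squares)   # [4, 16, 36, 64, 100]

# A dictionary comprehension mapping each word to its length
word_lengths = {w: len(w) for w in ["quick", "brown", "fox"]}
print(word_lengths)   # {'quick': 5, 'brown': 5, 'fox': 3}
```

Reading a comprehension from the for onward ("for n in range(1, 11), if n is even") and then returning to the expression at the front ("keep n**2") can make it easier to follow.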

Exercise

  1. Write a list comprehension to create a list of just the values (i.e. the squares) from squaresdict.

Review

So far, we've introduced:

  • Numeric types (int, float):
    my_int = 4
  • Strings (str):
    my_string = "cat"
  • Lists (list):
    my_list = [my_int, my_string]
  • Dictionaries (dict):
    my_dict = {'course': 'Python', 'duration': 2}
  • For loops:
    for k in range(10):
      print(k)
  • Conditionals
    if my_string == "cat":
      print("This is a cat!")
    else:
      print("This is not a cat!")

Pseudocode and Comments

Pseudocode

As you get started coding in Python, there will be many many tasks and steps you aren't familiar with! As you learn new functions and approaches, you'll become better and better at searching for help online and reviewing documentation. Learning to write and use pseudocode where appropriate can help organize your plan for any individual script.

Pseudocode is essentially a first draft of your code, written in English for human consumption, though with the tools of your programming language in mind. For example, we might write pseudocode for extracting text from pdf files as:

1. Set Working Directory (tell the computer where we've saved our files)
2. Loop through each pdf in the directory:
    * open the pdf file
    * extract text
    * check length of text extracted
        * if length is zero: add to problems list
        * otherwise, add to output file
3. Write output file(s)

This process can divide a complicated task into more digestible parts. You may not know how to open a pdf file or extract text from it, but you'll often have better luck finding existing help online on smaller tasks like these than with your overall goal or project.
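
To show how such pseudocode can turn into working code, here is a rough sketch that follows the same steps. It makes several assumptions: plain .txt files stand in for pdfs (extracting text from real pdf files needs an additional package), the sample files are created in a temporary directory so the example runs anywhere, and all file names are made up.

```python
import tempfile
from pathlib import Path

# Setup for the sketch only: a throwaway directory with a few sample files.
workdir = Path(tempfile.mkdtemp())
(workdir / "a.txt").write_text("Some extracted text")
(workdir / "b.txt").write_text("")              # empty file: should land in problems
(workdir / "notes.csv").write_text("skip,me")   # not a .txt file: should be skipped

# 1. Set working directory (here, workdir already points to our files)
problems = []
output = []

# 2. Loop through each .txt file in the directory
for path in sorted(workdir.glob("*.txt")):
    text = path.read_text()                # open the file and extract text
    if len(text) == 0:                     # check length of text extracted
        problems.append(path.name)         # if length is zero: add to problems list
    else:
        output.append([path.name, text])   # otherwise, add to output

# 3. Write output file: one line per document, file name and text separated by a tab
lines = [name + "\t" + text for name, text in output]
(workdir / "combined_output.tsv").write_text("\n".join(lines))

print(problems)   # ['b.txt']
print(output)     # [['a.txt', 'Some extracted text']]
```

Each numbered comment corresponds to one step of the pseudocode, which makes it easy to see which part of the plan a given line carries out.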

Exercises

  1. Write pseudocode to summarize the following code:
In [48]:
random_words=["statement", "toy", "cars", "shoes", "ear", "busy", 
              "magnificent", "brainy", "healthy", "narrow", "join", 
              "decay", "dashing", "river", "gather", "stop", "satisfying", 
              "holistic", "reply", "steady", "event", "house", "amused", 
              "soak", "increase"]

vowels=["a", "e", "i", "o", "u", "y"]

output=[]

for word in random_words:
    count = 0
    for char in word:
        if char in vowels:
            count = count + 1
    if count >= 3:
        output.append([word, count])
  2. Write pseudocode to check an arbitrary list of numbers, my_numbers, to find all even numbers and convert them to odd numbers by adding one. Put the resulting numbers into a new list, my_numbers2. (Recall for loops, if conditions, and the modulo operator % from Python 1.)

Comments

Recall that Python ignores anything following a # as a comment. Comments are a vital part of your code, as they leave notes about how or why you're doing something. As you gain experience, you'll use comments in different ways.

Comments can also provide a link between pseudocode and real code. Once you've written your pseudocode, use comments to put the major steps into your code file itself. Then fill in the gaps with actual code as you figure it out.

Here's a possible answer to the previous exercise.

In [49]:
#1. Get or define the list my_numbers
my_numbers=list(range(100))

#2. Create an empty list for the new all-odd numbers, called my_numbers2.

#3. Use a loop to iterate through the list of numbers

    #3a. For a given number check to see if it is even.
    
    #3b. If the number is even, add 1.
    
    #3c. Append the resulting number to the my_numbers2 list.

Exercise

  1. Use your own pseudocode or the example above as an outline to fill in with Python code. Test your code with the my_numbers object defined above.

User-defined Functions

While Python (and its available packages) provide a wide variety of functions, sometimes it's useful to create your own. Python's syntax for defining a function is as follows:

def <function_name> ( <arguments> ):
    <code depending on arguments>
    return <value>

The mean function below returns the mean of a list of numbers. (Python has no built-in mean function, although the standard library's statistics module provides one.)

In [50]:
def mean(number_list):
    s = sum(number_list)
    n = len(number_list)
    m = s/n
    return m

numbers=list(range(1, 51))
print(mean(numbers))
25.5
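
Functions can take more than one argument, and an argument can be given a default value that is used when the caller leaves it out. The sketch below is a hypothetical count_vowels function illustrating both ideas; it also uses the string method .lower(), which returns a lowercase copy of a string.

```python
def count_vowels(word, vowels="aeiou"):
    """Return how many characters in word appear in vowels."""
    count = 0
    for char in word.lower():    # lowercase so that "M" and "m" are treated alike
        if char in vowels:
            count = count + 1
    return count

print(count_vowels("Magnificent"))               # 4
print(count_vowels("rhythm"))                    # 0
print(count_vowels("rhythm", vowels="aeiouy"))   # 1
```

The text in triple quotes directly under the def line is a docstring; it is what help(count_vowels) displays.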

Exercises

Choose one of the following (or both if you're feeling ambitious!):

  1. Define a function, median, to find the median of a list. The median is the middle number of an odd-length list or the average of the middle two numbers in an even-length list. (Hint: Use sorted(<your_list>) to create a list sorted from low to high values.)

  2. Test your function with the lists below:

In [51]:
data1 = list(range(1, 100))

#Normally Distributed Data:
from numpy.random import normal
data2 = normal(loc=0, scale=2, size=100) #scale=2 defines the standard deviation as 2

Coming up

  • Loading and Using Packages
  • Reading and writing external files
  • Survey of useful packages for Data Science
  • Introduction to pandas
  • Jupyter Notebooks

Getting Ready

  • If you do not have Anaconda downloaded yet, please do so (see Setup) since some of the material ahead cannot be completed using pyfiddle. If you are having trouble with installing or need to borrow a computer for the workshop, see one of the instructors!

  • I'm available for one-on-one consultations on Python if you need help. Contact me here.

Questions?

Please feel free to share any ideas or topics you'd like to see covered.

You can also share ideas while filling out our Feedback Survey.

Thanks for coming!

References and Resources