Useful Packages

Some of these packages may NOT be included in your Anaconda installation. Whenever you need to install a package, use the Anaconda Prompt window, NOT Python itself. On Windows, open the Anaconda Prompt from the Anaconda folder in the Start Menu. On a Mac, open Anaconda Navigator, go to the Environments tab, and right-click the Python 3 tab to open a terminal.

Packages known to Anaconda can be installed with the conda install <package name> command in your Anaconda Prompt window. Otherwise, you may need to use a different package manager such as pip, with pip install <package name>.

More information about managing packages in Python is available here.

You can also install packages from the Anaconda Navigator window.

Data Packages

Other Utilities

Numpy

Numpy provides the mathematical functionality (e.g. large arrays, linear algebra, random numbers, etc.) for many popular statistical and machine learning tasks in Python. It is a dependency for many of the packages we discuss below, including pandas. One of the foundational objects in numpy is the array:
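A minimal sketch of creating an array, assuming numpy is installed and imported under the conventional alias np. The values are illustrative only:

```python
import numpy as np

# Build a one-dimensional array from a Python list
values = np.array([1, 2, 3, 4])

print(values)        # the array contents
print(values.shape)  # the size of each dimension
print(values.dtype)  # an integer type; the exact width depends on your platform
```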

However, arrays in numpy are constrained to a single data type, unlike lists or DataFrames.
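As a rough illustration of the single-type constraint (the values here are made up), numpy quietly converts mixed inputs to one common type rather than keeping each element's original type:

```python
import numpy as np

# Integers mixed with a float are all stored as floats
numeric = np.array([1, 2.5, 3])
print(numeric.dtype)  # float64

# Numbers mixed with text are all stored as text
text = np.array([1, "two", 3.0])
print(text.dtype.kind)  # "U" means a unicode string type
```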

We can use numpy to do many numerical tasks, for example creating random values or matrices/DataFrames:
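One possible sketch, assuming a reasonably recent numpy (the default_rng generator was added in version 1.17). The seed, sizes, and ranges are arbitrary choices for illustration:

```python
import numpy as np

# A seeded generator gives the same "random" values on every run
rng = np.random.default_rng(seed=42)

# Five random floats, each at least 0 and less than 1
random_values = rng.random(5)

# A matrix with 3 rows and 4 columns of random integers from 0 to 9
matrix = rng.integers(low=0, high=10, size=(3, 4))

print(random_values)
print(matrix)
```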

Arrays make many mathematical operations easier than base Python. For example if we want to add a single value to every element of a list, we could try:
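A small sketch of what goes wrong (the list contents are illustrative). The attempt is wrapped in try/except so the example keeps running and shows the error message:

```python
numbers = [1, 2, 3]

try:
    # Lists only support + with another list, so this raises a TypeError
    result = numbers + 1
except TypeError as error:
    result = None
    print("TypeError:", error)
```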

To accomplish this in base Python, we instead need to use a comprehension (maybe even with an if statement if the data types vary!):
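A sketch of that comprehension, using a made-up list that mixes numbers and text. The if clause skips anything that is not a number:

```python
mixed_values = [1, 2.5, "three", 4]

# Add 1 to each numeric element; non-numeric elements are left out
plus_one = [
    value + 1
    for value in mixed_values
    if isinstance(value, (int, float))
]

print(plus_one)  # [2, 3.5, 5]
```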

With numpy arrays, we can use:
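A minimal sketch with illustrative values, assuming numpy is imported as np. The addition is applied to every element at once:

```python
import numpy as np

numbers = np.array([1, 2, 3])

# The single value is added to each element of the array
plus_one = numbers + 1

print(plus_one)  # [2 3 4]
```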

Since pandas DataFrames are built on numpy arrays, the same element-wise arithmetic works on DataFrame columns:
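A short sketch of element-wise arithmetic on a DataFrame column, assuming pandas is installed. The column names and values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"score": [1, 2, 3]})

# Adding a single value applies to every row of the column
df["score_plus_one"] = df["score"] + 1

print(df)
```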

`scipy` adds an array of mathematical and statistical functions that work with numpy objects.

Pandas and Data Visualization Packages

See our dedicated lesson on Pandas (linked above).

scikit-learn

scikit-learn provides a consolidated interface for machine learning in Python:
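As one illustration of that interface (not necessarily the example used in the workshop), the sketch below fits a logistic regression to the iris dataset bundled with scikit-learn. The model choice, split size, and random_state are assumptions made for the example:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small example dataset that ships with scikit-learn
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the rows to check the model on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Most scikit-learn models share the same fit / predict / score methods
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("Share of test rows classified correctly:", accuracy)
```

Swapping in a different model usually means changing only the import and the line that creates model, since the fit and score calls stay the same.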

Read more about using sklearn. Digging into the application of machine learning is beyond the scope of our workshop series.

BeautifulSoup (for parsing HTML or XML data)

Python's built-in urllib.request package makes it relatively easy to download the underlying HTML from a web page. Note that the `from package import function` notation used here allows you to selectively import only the parts of a package you need.

Be sure to check the terms of service for any website before scraping! We're scraping our own materials here to be safe!
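The sketch below shows the general pattern, assuming the bs4 package (BeautifulSoup) is installed. To avoid depending on a live website, it parses a small made-up HTML string. The fetch_html helper is hypothetical and is defined only to show how urlopen could supply the HTML; it is not called here:

```python
from urllib.request import urlopen  # import only the function we need
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML as text (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Made-up HTML standing in for a downloaded page
sample_html = """
<html><body>
  <h1>Workshop Materials</h1>
  <a href="/lesson-1">Lesson 1</a>
  <a href="/lesson-2">Lesson 2</a>
</body></html>
"""

# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(sample_html, "html.parser")

title = soup.find("h1").get_text()                   # text of the first <h1>
links = [tag["href"] for tag in soup.find_all("a")]  # every link target

print(title)
print(links)
```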

APIs

APIs (Application Programming Interfaces) provide a structured way to request data over the internet. APIs are generally a better option than web scraping because:

An API call is just a specific type of web address that you can use Python to help you generate (or cycle through many options):
For example: https://api.weather.gov/points/35.9132,-79.0558
This link pulls up the National Weather Service information for a particular lat-long pair (for Chapel Hill). The forecast field leads us to a new link:
https://api.weather.gov/gridpoints/LWX/96,70/forecast
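As a sketch of using Python to generate these addresses, the snippet below fills the points URL pattern shown above with several lat-long pairs. The Chapel Hill coordinates come from the example link; the Raleigh entry is an approximate, illustrative addition. No request is sent:

```python
# URL pattern taken from the example link above
base_url = "https://api.weather.gov/points/{lat},{lon}"

# Locations to look up (Raleigh is an illustrative addition)
locations = {
    "Chapel Hill": (35.9132, -79.0558),
    "Raleigh": (35.7796, -78.6382),
}

# Build one API address per location
urls = {
    name: base_url.format(lat=lat, lon=lon)
    for name, (lat, lon) in locations.items()
}

for name, url in urls.items():
    print(name, url)
```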

We can use Python to request and parse the content from these links, but often we can find a wrapper someone else has created to do some of that work for us!

Remember that we can install packages in the Anaconda Prompt or Terminal:

Then run the following to install the package:
pip install noaa_sdk

This provides us with a pretty complicated set of nested dictionaries that we can parse to find specific values:
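To illustrate parsing nested dictionaries, the sketch below uses a small hand-written dictionary loosely modeled on a forecast response. The keys and values are assumptions for the example, not actual output from noaa_sdk or the National Weather Service:

```python
# Made-up data loosely shaped like a forecast response
forecast = {
    "properties": {
        "periods": [
            {"name": "Today", "temperature": 72, "shortForecast": "Sunny"},
            {"name": "Tonight", "temperature": 55, "shortForecast": "Clear"},
        ]
    }
}

# Step down one level at a time: dictionary -> dictionary -> list -> dictionary
first_period = forecast["properties"]["periods"][0]
print(first_period["shortForecast"])

# Collect one value from every period in the list
temperatures = {
    period["name"]: period["temperature"]
    for period in forecast["properties"]["periods"]
}
print(temperatures)
```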

NLTK (text analysis)

The Natural Language Toolkit (`nltk`) provides a wide array of tools for processing and analyzing text. This includes operations like splitting text into sentences or words ("tokenization"), tagging them with their part of speech, classification, and more.

Let's take the example sentence: "The quick brown fox jumps over the lazy dog." and convert it into individual words.
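A minimal sketch, assuming nltk is installed. It uses wordpunct_tokenize, a simple rule-based tokenizer, because it needs no extra downloads; the more common word_tokenize function requires downloading additional tokenizer data first:

```python
from nltk.tokenize import wordpunct_tokenize

sentence = "The quick brown fox jumps over the lazy dog."

# Split the sentence into words, with punctuation as separate tokens
words = wordpunct_tokenize(sentence)

print(words)
# ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']
```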

Now we can extract the part of speech for each word in the sentence. Note that this function, like many of the functions in NLTK, uses machine learning to classify each word and therefore may have some level of error!

The meanings of these part-of-speech tags are available below:

Read more about getting data for Text and Data Mining projects via the Libraries.

PIL (Pillow)

Pillow is the updated version of the old Python Imaging Library (PIL), which provides fundamental tools for working with images. Pillow can work with many common formats (some of which may require extra packages or other dependencies) to automate a wide variety of image transformations.

Note: While pillow is how you install the package, you import functions with import PIL.

Mode "LA" is grayscale, preserving transparent image areas. Read more about modes.
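A short sketch of a mode conversion, assuming Pillow is installed. Instead of opening a file, it creates a small semi-transparent red image in memory; the size and color are arbitrary:

```python
from PIL import Image  # installed as pillow, imported as PIL

# A 64x64 image in RGBA mode: red, with roughly 50% transparency
original = Image.new("RGBA", (64, 64), (255, 0, 0, 128))

# Convert to grayscale while keeping the transparency (alpha) channel
grayscale = original.convert("LA")

# Each pixel now holds two values: lightness and alpha
lightness, alpha = grayscale.getpixel((0, 0))

print(grayscale.mode, grayscale.size)
print("alpha:", alpha)
```

For a real image, Image.open with a path to your own file would take the place of Image.new.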

Parallel Processing with joblib

As you move into more complicated processes in Python (or applying code to a wide variety of objects or files), processing time can become a major factor. Fortunately, most modern laptops have multiple processor cores that can do separate things at the same time. Python only uses one core by default. If you have a set of loops that don't depend on each other (e.g. processing lots of files one after another), you could split up your many loop iterations between processors to greatly increase speed.

The joblib package provides a straightforward way to split loops up between your computer's cores for faster performance on complicated code. Note that parallelization may not benefit you much, or may even slow you down, for very quick jobs, because setting up separate cores and consolidating their results creates overhead.
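A minimal sketch of splitting a loop across cores, assuming joblib is installed. The slow_square function is a hypothetical stand-in for real work such as processing a file, and the n_jobs value of 2 is an arbitrary choice:

```python
from joblib import Parallel, delayed


def slow_square(number):
    """Stand-in for a slow task that does not depend on other iterations."""
    return number ** 2


# Ordinary loop: one iteration at a time on a single core
serial_results = [slow_square(n) for n in range(10)]

# Same loop split across 2 worker processes; results come back in order
parallel_results = Parallel(n_jobs=2)(
    delayed(slow_square)(n) for n in range(10)
)

print(parallel_results)
```

With a task this quick, the parallel version is likely slower than the plain loop because of the overhead described above; the pattern pays off when each iteration takes substantial time.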

Conda envs

Anaconda provides the option to create separate Python environments inside your installation. All of the code below should be run in the Anaconda Prompt:

(PC) Start Menu > Anaconda3 > Anaconda Prompt

(Mac) Finder > Applications > Utilities > Terminal

conda create --name myenv python=3.5 creates an environment called myenv with Python version 3.5 instead of your main installation version.

conda activate myenv makes this environment active. From here you can install packages, and open software (e.g. spyder will open spyder after installation).

conda deactivate deactivates the active environment and returns to base Anaconda.

Conda environments are a great place to test out code, or run code that has very specific requirements. It's generally a good idea to be careful about vastly changing your environment (e.g. upgrading to a new version of Python), because it can break your project code! Environments provide a great way to test before making the change in your main environment.

Read more about conda environments