Download the csv file CountyHealthData_2014-2015.csv.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv("CountyHealthData_2014-2015.csv")
# df = pd.read_csv(r"C:\Users\npbyers\Desktop\Python\Jupyter\CountyHealthData_2014-2015.csv")
1. What are the 5 states with the highest average adult smoking rates? Show the states and their smoking rates.
.groupby()
and .mean()
functions to reorganize your data to feature state-level statistics..sort_values()
function can be used to sort your dataframe by column values like "Adult smoking". Becaue sort_values()
scrambles the existing indices, you'll need to reset the indices of your dataframe using .reset_index()
so that you can easily index your sorted rows later..loc
to index specific areas of your dataframe)2. Food Environment Index (FEI) measures access to food and food insecurity (0 is bad, 10 is good). According to countyhealthrankings.org, "In 2019, the average FEI value for counties was 7.5 and most counties fell between about 6.9 and 8.2." (Caveat: our data is from 2014-2015)
.shape
to see how many rows you have left.).value_counts()
function to output a ranked list of all the unique values in the column with the number of times each appears in your dataframe. In other words, this function will give you a list of the states in your new dataframe along with the number of times they appear.)
#example:
df['Column Name'].value_counts()
3. In many counties one can find more primary care physicians than dentists. However, in a few counties there are more than 5 times (!) as many dentists as there are primary care physicians.
Which counties are these? (hint: Using conditionals, create a new dataframe dfDent
with only these counties. Use .loc
to show the relevant information - you'll need "County", "State", "Dentists", "Primary care physicians", and "2011 population estimate" for this exercise.)
Note: It is important to use the .copy()
method when creating new dataframes instead of using slices of existing dataframes. This will prevent certain assignment errors when you begin working with your new dataframe.
# Example using .copy()
dfNew = df['Column1'].copy()
#example using slices
dfNew = df['Column1']
.sort_values
on your new dataframe, dfDent
).loc
to add columns (and rows) to your dataframe. You can set the column based on values in other columns using simple operators. After adding a new column, you can sort using the sort_values()
method.
# Example syntax: adding a column
df.loc[:, "Column3"] = df[Column1]*df[Column2]
#This adds a new column, "Column3," with values equal to the products of the values for Column1 and Column2 in each row.
4. Create a dataframe (again, use .copy()
) that only includes counties in the state of North Carolina. (Hint: use the Rhode Island example from "Indexing" section to select the values for a single state)
.loc
to display the County name, "State," "Adult obesity," and "Physical inactivity" for each county. Do you notice any patterns? How do "Pysical inactivity" values change as "Adult obesity" values go up or down?
#sorting example:
df.sortvalues(by="Column1", ascending=False)
#the 'by' parameters tells the function which column to sort by, while the 'ascending' parameter lets the computer know to sort high-low or low-high
matplotlib
to visualize this relationship. Does your plot look the way you expected it to?
#syntax example:
plt.scatter(x=df["Coumn1"], y=df["Column2"])
plt.xlabel("Column1")
plt.ylabel("Column2")
5. The "Some college" column denotes the percantage of the population that have taken any number of college classes.
.isin()
function. To do so, declare a list for each column(s) you want to filter by. Populate that list with the values you wish to retain. You can then use one or multiple .isin()
statements when indexing your data frame to filter based on column values. Note, all of the years in our sheet follow this format: "1/1/2015" or "1/1/2014.")
#example syntax:
Letters = ["X, Y, Z"]
Numbers = ["1, 2, 3"]
df[df.Column1.isin(Letters) & df.Column2.isin(Numbers)]
#this will return a dataframe with rows that have values for Column 1 and Column 2 that were in your Letters and Numbers lists, respectively
#Hint: Use the .mean() function. Syntax:
df["Column1"].mean()