PandasExercises

1. What are the 5 states with the highest average adult smoking rates? Show the states and their smoking rates.

You'll need to use the .groupby() and .mean() functions to reorganize your data to feature state-level statistics.
The .sort_values() function can be used to sort your dataframe by column values like "Adult smoking". Becaue sort_values() scrambles the existing indices, you'll need to reset the indices of your dataframe using .reset_index() so that you can easily index your sorted rows later.
Once your new smoking dataframe has been sorted and reset, you can index the top 5 rows to find the top 5 smoking states. (Hint: use .loc to index specific areas of your dataframe)

2. Food Environment Index (FEI) measures access to food and food insecurity (0 is bad, 10 is good). According to countyhealthrankings.org, "In 2019, the average FEI value for counties was 7.5 and most counties fell between about 6.9 and 8.2." (Caveat: our data is from 2014-2015)

Can you find all the counties with FEI values of 9 or higher? How many are there? (Hint: Use indexing and conditionals to select only the rows you need. You can use .shape to see how many rows you have left.)
What are the 5 most represented states in this group of counties and how many counties from these 5 states are included in your new dataframe? (hint: use the .value_counts() function to output a ranked list of all the unique values in the column with the number of times each appears in your dataframe. In other words, this function will give you a list of the states in your new dataframe along with the number of times they appear.)
```
  #example:
  df['Column Name'].value_counts()
```

3. In many counties one can find more primary care physicians than dentists. However, in a few counties there are more than 5 times (!) as many dentists as there are primary care physicians.

Which counties are these? (hint: Using conditionals, create a new dataframe dfDent with only these counties. Use .loc to show the relevant information - you'll need "County", "State", "Dentists", "Primary care physicians", and "2011 population estimate" for this exercise.)
- Note: It is important to use the .copy() method when creating new dataframes instead of using slices of existing dataframes. This will prevent certain assignment errors when you begin working with your new dataframe.
```
  # Example using .copy()
  dfNew = df['Column1'].copy()

  #example using slices
  dfNew = df['Column1']
```
Which of these counties has the most dentists? (hint: use .sort_values on your new dataframe, dfDent)
Which of these has the most dentists per capita? (hint: You can use .loc to add columns (and rows) to your dataframe. You can set the column based on values in other columns using simple operators. After adding a new column, you can sort using the sort_values() method.
```
  # Example syntax: adding a column
  df.loc[:, "Column3"] = df[Column1]*df[Column2]
  #This adds a new column, "Column3," with values equal to the products of the values for Column1 and Column2 in each row.
```

4. Create a dataframe (again, use .copy()) that only includes counties in the state of North Carolina. (Hint: use the Rhode Island example from "Indexing" section to select the values for a single state)

Sort your new dataframe by "Adult obesity." Use .loc to display the County name, "State," "Adult obesity," and "Physical inactivity" for each county. Do you notice any patterns? How do "Pysical inactivity" values change as "Adult obesity" values go up or down?
```
  #sorting example:
  df.sortvalues(by="Column1", ascending=False)
  #the 'by' parameters tells the function which column to sort by, while the 'ascending' parameter lets the computer know to sort high-low or low-high
```

Use matplotlib to visualize this relationship. Does your plot look the way you expected it to?

  #syntax example:
  plt.scatter(x=df["Coumn1"], y=df["Column2"])
  plt.xlabel("Column1")
  plt.ylabel("Column2")

5. The "Some college" column denotes the percantage of the population that have taken any number of college classes.

What is the "Some college" value for Orange County, North Carolina in 2015? (Hint: You can filter your dataframe using multiple columns using the .isin() function. To do so, declare a list for each column(s) you want to filter by. Populate that list with the values you wish to retain. You can then use one or multiple .isin() statements when indexing your data frame to filter based on column values. Note, all of the years in our sheet follow this format: "1/1/2015" or "1/1/2014.")
```
  #example syntax:
  Letters = ["X, Y, Z"]
  Numbers = ["1, 2, 3"]
  df[df.Column1.isin(Letters) & df.Column2.isin(Numbers)]
  #this will return a dataframe with rows that have values for Column 1 and Column 2 that were in your Letters and Numbers lists, respectively
```
How does it compare to the "Some college" value for Orange County, California?
How do the values for these two counties compare to the national average "Some college" value?
```
  #Hint: Use the .mean() function. Syntax:
  df["Column1"].mean()
```

Pandas Exercises¶

Setup¶

Exercises¶