I'm a huge San Francisco 49ers fan and also a data visualization enthusiast, so I've always looked for opportunities to mix sports data with data science and analytics. Fortunately for us, there is an awesome Python package called nfl_data_py that allows us to pull play-by-play NFL data and analyze it. In this article, I will walk through pulling in data using nfl_data_py and creating two visualizations for passing yards and touchdowns using plotly.

To get started, we need to make sure we have the required packages for our analysis. Now, let's import our packages and load three datasets for the 2021–22 NFL season: play-by-play data, roster data, and team info.

    # import packages
    import pandas as pd
    import plotly.graph_objects as go
    import nfl_data_py as nfl

    # load data for the 2021 season
    df_2021 = nfl.import_pbp_data([2021])
    df_players = nfl.import_rosters([2021])
    df_teams = nfl.import_team_desc()

The data we have loaded into df_2021 is extremely rich play-by-play data taken from the NFL API. Just as an example, let's print out the columns in the dataset:

    # print columns
    df_2021.columns
    > Index([...], dtype='object', length=372)

We have 372 unique columns for each play! There are so many that they don't all print by default. With this dataset, the possibilities for the analyses that you could do are basically endless.

In this article we will examine passing yards and touchdowns, so we will have to do a bit of filtering with pandas first to get the dataframe into the form that we want.

Filtering the data to passing plays

Firstly, the current play-by-play dataframe contains all of the plays from the regular season and postseason, so we want to filter to only plays from the regular season.

    # filter to regular season
    df_2021 = df_2021[df_2021["season_type"] == "REG"]

The next subtlety that we must deal with is to filter out any two-point conversions, since these yards do not contribute to the overall statistics. Luckily for us, there is a boolean column called two_point_attempt that we can use directly for this.

    # remove two point attempts
    df_2021 = df_2021[df_2021["two_point_attempt"] == False]

Now, naturally, since we are interested in passing statistics, we can additionally filter down to only the passing plays. Again, we have a key called play_type that we can use to filter to pass plays. Hopefully now you are starting to see the power of the high level of granularity presented to us by nfl_data_py.

    # filter to pass plays
    df_2021 = df_2021[df_2021["play_type"] == "pass"]

We should now have all of the plays of interest for our analysis. Now, we need to join our filtered dataframe with roster and team data to get player names and team colors for our plot. The data returned by import_pbp_data() does not include any player names, but we do have columns for player IDs.
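The three filters this article applies (regular season only, no two-point attempts, pass plays only) can also be combined into a single boolean mask. Here is a minimal sketch of that idea on a tiny made-up dataframe, so it runs without downloading anything. The rows and the helper name filter_to_regular_season_passes are hypothetical, and the passing_yards column is assumed here purely for illustration.

```python
import pandas as pd

# Toy stand-in for the play-by-play dataframe (made-up rows, not real NFL data).
toy_pbp = pd.DataFrame(
    {
        "season_type": ["REG", "REG", "POST", "REG", "REG"],
        "two_point_attempt": [0.0, 1.0, 0.0, 0.0, 0.0],
        "play_type": ["pass", "pass", "pass", "run", "pass"],
        "passing_yards": [12.0, 2.0, 30.0, None, 7.0],  # assumed column name
    }
)


def filter_to_regular_season_passes(df):
    """Keep regular-season pass plays that are not two-point attempts."""
    mask = (
        (df["season_type"] == "REG")
        & (df["two_point_attempt"] == 0)
        & (df["play_type"] == "pass")
    )
    return df[mask].reset_index(drop=True)


toy_passes = filter_to_regular_season_passes(toy_pbp)
print(toy_passes)
```

Building one mask keeps the three conditions in one place and avoids reassigning the dataframe three times. The step-by-step version is arguably easier to follow in a tutorial, so treat this as an alternative style, not a correction.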
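The join with roster and team data that this article describes, to attach player names and team colors to the pass plays, could look roughly like the sketch below. It uses small made-up dataframes in place of the real ones. The column names (passer_player_id, posteam, player_id, player_name, team_abbr, team_color) are assumptions about how the nfl_data_py tables are laid out, so check them against your own data before relying on this.

```python
import pandas as pd

# Made-up pass plays: a passer ID, the team on offense, and yards gained.
toy_passes = pd.DataFrame(
    {
        "passer_player_id": ["QB-1", "QB-1", "QB-2"],  # assumed column name
        "posteam": ["AAA", "AAA", "BBB"],              # assumed column name
        "passing_yards": [12.0, 7.0, 25.0],
    }
)

# Made-up roster and team tables (hypothetical IDs, names, and colors).
toy_players = pd.DataFrame(
    {"player_id": ["QB-1", "QB-2"], "player_name": ["Passer One", "Passer Two"]}
)
toy_teams = pd.DataFrame(
    {"team_abbr": ["AAA", "BBB"], "team_color": ["#AA0000", "#0000AA"]}
)

# Left joins keep every pass play even if a lookup row is missing.
toy_joined = (
    toy_passes
    .merge(toy_players, left_on="passer_player_id", right_on="player_id", how="left")
    .merge(toy_teams, left_on="posteam", right_on="team_abbr", how="left")
)

# Total passing yards per player, with the team color carried along for plotting.
toy_totals = toy_joined.groupby(
    ["player_name", "team_color"], as_index=False
)["passing_yards"].sum()
print(toy_totals)
```

A left join is used so that plays with an unmatched ID show up with a missing name instead of silently disappearing, which makes lookup problems easier to spot.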