Last Update: 2019-01-31 00:09:44

GitHub Repository: sv4u/goalie-and-skater-heat-maps

In the NHL, players have their *sweet-spots*. For skaters, it can be the top of a certain circle or right in front of the net. For goalies, it could be where their vision is best and where they have the best angle to cut down a shot. All players have these sweet-spots, but it is difficult to say analytically where they are. By using shot location data, we can determine these locations and create models that show where goalies and skaters succeed and where they need improvement.

Before we jump in, let's clean up our R environment and load the libraries we will be using.

```
rm(list = ls())
library(purrr)
library(ggplot2)
```

To start, we need to read in our data, which is formatted nicely as CSV. We have data from the 2015-2016, 2016-2017, 2017-2018, and 2018-2019 seasons (the last one up to 1/30/19). This data was downloaded from MoneyPuck. Let's start by loading in all four seasons of data:

```
data.2015 = read.csv("data/2015.csv")
data.2016 = read.csv("data/2016.csv")
data.2017 = read.csv("data/2017.csv")
data.2018 = read.csv("data/2018.csv")
```

Note: this will take a relatively long time to compute as the datasets are large. Each dataset contains all shot data (including playoffs).
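If load time or memory becomes a problem, one option is to skip unneeded columns while reading. The following is a minimal sketch, not the approach used in this post: `read.columns` and its `keep` argument are hypothetical names, and the small temporary file only stands in for a real season CSV. It relies on the `colClasses` argument of `read.csv`, where `"NULL"` skips a column and `NA` leaves type detection to R.

```r
# Hypothetical helper (not part of the original code): read only some columns
read.columns = function(path, keep) {
  # Read a single row just to learn the column names
  header = names(read.csv(path, nrows = 1))
  # "NULL" skips a column; NA lets read.csv infer the type as usual
  classes = rep("NULL", length(header))
  classes[header %in% keep] = NA
  read.csv(path, colClasses = classes)
}

# Small made-up file standing in for a season CSV
path = tempfile(fileext = ".csv")
write.csv(data.frame(goal = c(0, 1),
                     isPlayoffGame = c(0, 0),
                     unused = c("a", "b")),
          path, row.names = FALSE)

shots = read.columns(path, keep = c("goal", "isPlayoffGame"))
names(shots)
```

How much this helps depends on how many columns are skipped, so it is worth timing on the real files before adopting it.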

We'll only look at regular season data. The playoffs in the NHL are a beast of their own.

```
get.regular.season = function(data) {
  subset(data, isPlayoffGame == 0)
}
season.2015 = get.regular.season(data.2015)
season.2016 = get.regular.season(data.2016)
season.2017 = get.regular.season(data.2017)
season.2018 = get.regular.season(data.2018)
```

Now that we have our data, we can remove extraneous columns. Here is a table of what columns we are keeping, and what we are renaming them to:

Old Column | New Column |
---|---|
xCordAdjusted | x |
yCordAdjusted | y |
goal | goal |
shotAngleAdjusted | angle |
goalieNameForShot | goalie_name |
shooterName | skater_name |
game_id | game |

Now, here is the R code to do this subsetting of the original dataset.

```
get.helpful.data = function(data) {
  data.frame(x = data$xCordAdjusted,
             y = data$yCordAdjusted,
             goal = data$goal,
             angle = data$shotAngleAdjusted,
             goalie_name = data$goalieNameForShot,
             skater_name = data$shooterName,
             game = data$game_id)
}
analysis.2015 = get.helpful.data(season.2015)
analysis.2016 = get.helpful.data(season.2016)
analysis.2017 = get.helpful.data(season.2017)
analysis.2018 = get.helpful.data(season.2018)
```
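As an alternative sketch, the renaming table can be written once as a named vector and applied in a single step. The names `column.map` and `select.and.rename` are hypothetical, and the toy data frame below is made up to stand in for one season of shots.

```r
# Hypothetical mapping: new name = old column name
column.map = c(x = "xCordAdjusted",
               y = "yCordAdjusted",
               goal = "goal",
               angle = "shotAngleAdjusted",
               goalie_name = "goalieNameForShot",
               skater_name = "shooterName",
               game = "game_id")

# Hypothetical helper: keep the mapped columns and rename them
select.and.rename = function(data, mapping) {
  old = unname(mapping)
  # Stop early if an expected column is missing from the input
  missing.cols = setdiff(old, names(data))
  if (length(missing.cols) > 0) {
    stop("Missing columns: ", paste(missing.cols, collapse = ", "))
  }
  # Pick the old columns in order, then apply the new names
  setNames(data[, old, drop = FALSE], names(mapping))
}

# Made-up rows standing in for real shot data
toy = data.frame(xCordAdjusted = c(60, 75),
                 yCordAdjusted = c(-10, 5),
                 goal = c(0, 1),
                 shotAngleAdjusted = c(20, 8),
                 goalieNameForShot = c("Goalie A", "Goalie B"),
                 shooterName = c("Skater A", "Skater B"),
                 game_id = c(1, 1),
                 extra = c("a", "b"))

renamed = select.and.rename(toy, column.map)
names(renamed)
```

Keeping the mapping in one place means the table and the code cannot drift apart, at the cost of one extra helper.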

Now, we have all the data we need.

From our data, we can calculate some very important statistics like the following:

- Goal Percent: goals per total shots
- Save Percent: saves (total shots - goals) per total shots
- Shots per Goal: total shots per goal

Additionally, we can break up our data by game. There are some generic functions we can write to help for both goalies and skaters. Let's write them now!

```
# Fraction of shots that were goals
get.goal.percent = function(data) {
  shots = length(data$goal)
  temp = subset(data, goal == 1)
  goals = length(temp$goal)
  goals / shots
}
# Fraction of shots that were saved (shots that were not goals)
get.save.percent = function(data) {
  shots = length(data$goal)
  temp = subset(data, goal == 1)
  goals = length(temp$goal)
  (shots - goals) / shots
}
# Average number of shots taken for each goal scored
get.shots.per.goal = function(data) {
  shots = length(data$goal)
  temp = subset(data, goal == 1)
  goals = length(temp$goal)
  shots / goals
}
```

Note: when using `get.shots.per.goal`, if no goals were scored, the division by zero returns `Inf` in R. This is problematic when graphing data. I am still working on a good solution to this problem. Earlier, I used 200 as a substitute value. However, 200 still skews graphs, which is not ideal.
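One possible direction, offered only as a sketch and not as the solution used in this post, is to return `NA` instead of `Inf` when no goals were scored. Missing values can then be skipped explicitly in summaries with `na.rm = TRUE`, and ggplot2 geoms typically drop them with a warning rather than stretching the axis. The function name `get.shots.per.goal.na` is hypothetical.

```r
# Hypothetical variant (not part of the original code):
# report NA instead of Inf when no goals were scored
get.shots.per.goal.na = function(data) {
  shots = length(data$goal)
  goals = sum(data$goal == 1)
  if (goals == 0) {
    return(NA_real_)
  }
  shots / goals
}

# Made-up data: one sample with no goals, one with two goals in four shots
no.goals = data.frame(goal = c(0, 0, 0))
some.goals = data.frame(goal = c(0, 1, 0, 1))

without = get.shots.per.goal.na(no.goals)
with = get.shots.per.goal.na(some.goals)

# Summaries can skip the missing values explicitly
average = mean(c(without, with), na.rm = TRUE)
```

The trade-off is that games without a goal disappear from the graph entirely, which may or may not be acceptable depending on what the graph is meant to show.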

```
get.games = function(data) {
  unique(data$game)
}
get.single.game = function(data, game_id) {
  subset(data, game == game_id)
}
get.all.games = function(data) {
  games = get.games(data)
  Map(function(x) get.single.game(data, x), games)
}
```
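For comparison, base R's `split` produces a similar list of per-game data frames in one call. This is a sketch on made-up data rather than a drop-in replacement: `split` orders the groups by sorted game id, while the `unique`-based approach above keeps the order in which games first appear.

```r
# Made-up shots across three games
toy = data.frame(game = c(101, 101, 102, 103, 103, 103),
                 goal = c(0, 1, 0, 0, 0, 1))

# One data frame per game, named by game id
by.game = split(toy, toy$game)

names(by.game)
shots.per.game = sapply(by.game, nrow)
```

Because `split` scans the data once, it may also be quicker than subsetting the full data frame once per game, though that is worth measuring on the real seasons.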

Now, we can create our game by game statistic functions:

```
get.game.goal.percent = function(data) {
  gameframe = get.all.games(data)
  games.gp = map(gameframe, function(x) get.goal.percent(x))
  unlist(games.gp, use.names = FALSE)
}
get.game.save.percent = function(data) {
  gameframe = get.all.games(data)
  games.sp = map(gameframe, function(x) get.save.percent(x))
  unlist(games.sp, use.names = FALSE)
}
get.game.shots.per.goal = function(data) {
  gameframe = get.all.games(data)
  games.spg = map(gameframe, function(x) get.shots.per.goal(x))
  unlist(games.spg, use.names = FALSE)
}
```
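Since the three functions above differ only in the statistic they apply, they could be folded into one helper that takes the statistic as an argument. The sketch below is self-contained and uses hypothetical names (`get.game.stat`, `goal.percent`) with made-up data. It splits on a factor whose levels follow first appearance, so the results come back in the same game order as the `unique`-based approach.

```r
# Hypothetical helper: apply any per-game statistic, one value per game
get.game.stat = function(data, stat) {
  # Levels in order of first appearance keep the original game order
  groups = factor(data$game, levels = unique(data$game))
  by.game = split(data, groups)
  # Each statistic must return a single number per game
  unname(vapply(by.game, stat, numeric(1)))
}

# Stand-in statistic for illustration: fraction of shots that were goals
goal.percent = function(data) {
  mean(data$goal == 1)
}

# Made-up shots: game 2 appears first, then game 1
toy = data.frame(game = c(2, 2, 1, 1, 1),
                 goal = c(1, 0, 0, 0, 1))

per.game = get.game.stat(toy, goal.percent)
```

Here `per.game` holds the goal percent for game 2 followed by game 1. Using `vapply` with `numeric(1)` makes the helper fail loudly if a statistic ever returns something other than one number.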

Also, we'll need a function to get match-ups between a specific goalie and skater. Let's write that here, instead of in both our goalie *and* skater sections.

```
get.matchup.data = function(data, goalie, skater) {
  subset(data, goalie_name == goalie & skater_name == skater)
}
```

We've now written our generic data handling functions.

Let's start with a function to get data for a specific goalie.

```
get.goalie.data = function(data, name) {
  subset(data, goalie_name == name)
}
```

For skaters, we likewise start with a function to get data for a specific skater.

```
get.skater.data = function(data, name) {
  subset(data, skater_name == name)
}
```

Given specific data, we should be able to graph the location of shots. Let's write a function that uses ggplot to do so.

```
# Hex-binned shot map; geom_hex needs the hexbin package to be installed
graph.shot.locations = function(data, primary, secondary, name) {
  # Hexagon opacity scales with the log of the shot count in each bin
  plot = ggplot(data) +
    geom_hex(aes(x = x, y = y, alpha = log(..count..)), fill = primary, color = secondary) +
    labs(title = paste(name, "Shot Locations", sep = " "), x = "X Position", y = "Y Position") +
    theme_minimal()
  plot
}
```

To test this, let's quickly graph the shots Roberto Luongo faced.

```
luongo = get.goalie.data(analysis.2017, "Roberto Luongo")
plot = graph.shot.locations(luongo, "#041E42", "#C8102E", "Roberto Luongo")
plot
```