Welcome to the third part of the blog series about Load Forecasting. In this series of tutorials, I will guide you through the whole load forecasting workflow, from preparing the data to building a machine learning model. Along the way I will share many tips and tricks that I have found useful over the years.
This is a blog version of the tutorial originally made in a Jupyter notebook. If you are interested in replicating this tutorial, I strongly suggest checking the notebook, which is available here.
In this blog, I will guide you through the development of a load forecasting benchmark model based on Multiple Linear Regression (MLR). This model was also used at the Global Energy Forecasting Competition (GEFCom) as a baseline model. I will explain the basics behind the feature engineering step by step.
How to model the load with MLR?
MLR is the most widely used method for load forecasting and achieves remarkable results both in industry and in load forecasting competitions. Although MLR is easy to understand as an algorithm, modeling real problems with it is not that straightforward. You cannot just feed raw data into the model, as you might with neural networks, and expect it to magically work.
MLR is a linear model, but this does not mean you cannot model non-linear dependencies. All you need is appropriate feature engineering to linearize the problem.
Prerequisites to fully understand this tutorial:
I strongly suggest reading at least the first 120 pages of an excellent and free book: An Introduction to Statistical Learning.
If you haven't watched Andrew Ng's ML course, you have to watch it now. Watch at least the first part about MLR.
The input data consist of a load time series (denoted as y) in MW at hourly resolution from 2013-01-01 until 2017-12-31, and a temperature time series (denoted as temp) in °C for the same period.
To fully understand the data, please check the second blog from this tutorial about Exploratory Data Analysis.
When modeling with MLR it is all about good feature engineering.
Typical features used for load forecasting are:
categorical features derived from timestamps (calendar) and
numerical features:
temperature (or some other weather variables) and
past load (in case of short-term forecasting)
As I mentioned in my previous blog on Exploratory Data Analysis, load depends on three major factors:
Calendar (the hour of the day, the day of the week and the time of the year).
Weather (especially temperature, due to heating and cooling devices).
Economic growth - trend (load generally grows every year).
Now, let's follow the assumptions above and explore how to model this with MLR. In the following examples, we will use the whole dataset (no train/test splitting) to simplify the explanation and focus on how to capture a load time series with MLR.
The tutorial is divided into three major chapters: 1. modeling calendar features, 2. modeling temperature, and 3. modeling trend.
1. MODELING CALENDAR FEATURES
1.1. Features used: only constant
In most cases in machine learning, you want to minimize the Residual Sum of Squares (RSS). This means that if you searched for a single value that minimizes the RSS of your model, that value would be the mean of the target (in our case, the true load). Let's look at what this means. Let's build the dataset matrices 𝑋 and 𝑦, use only one constant feature to build our model, and set fit_intercept=False (because sklearn fits an intercept by default, which adds a constant).
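Here is a minimal sketch of this step, assuming 𝑦 is a pandas Series of the hourly load indexed by timestamps (the variable names are mine, not the original notebook's):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumption: `y` is a pandas Series of hourly load (MW) indexed by timestamps.
X = pd.DataFrame({"const": np.ones(len(y))}, index=y.index)

model = LinearRegression(fit_intercept=False)  # the constant column plays the role of the intercept
model.fit(X, y)

print(model.coef_[0])  # the single learned coefficient ...
print(y.mean())        # ... equals the mean of the target
```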
The resulting value, 16.944, is the same as the mean target value.
Remember: minimizing RSS is equivalent to minimizing the Mean Squared Error, and the single value that minimizes this criterion is the mean of the target. When minimizing the Mean Absolute Error, it is the median.
1.2. Features used: hours of the day
Now, let's try using only the hours of the day as features in our model. This time we will not fit a constant, and we will also not drop one dummy variable (this will be explained later).
What do we learn now?
Instead of using MLR, we could simply calculate the mean target value for every hour separately, and the result would be the same.
This is very important to understand and I suggest you keep this in mind!
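A sketch of this step, reusing the illustrative `y` Series from above:

```python
# One dummy column per hour of the day; no intercept and no dropped dummy (yet).
hours = pd.Series(y.index.hour, index=y.index)
X_hour = pd.get_dummies(hours, prefix="hour", dtype=float)

model = LinearRegression(fit_intercept=False).fit(X_hour, y)

# The 24 learned coefficients are exactly the per-hour means of the target.
hourly_means = y.groupby(hours).mean()
print(np.allclose(model.coef_, hourly_means.values))  # True
```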
Now, let's plot the prediction and the target for a randomly chosen week and observe the predicted profile.
The model cannot capture anything other than the hour of the day, which results in an average daily profile repeated over the whole period. When the actual profile is above average, the prediction is below it, and vice versa. It can be clearly seen that during the working days (6 until 10 January), when demand is usually higher, predictions are lower than the actual demand, and vice versa during the weekend (11 and 12 January). Since the model cannot distinguish between months or days, it produces the same daily prediction throughout the whole period.
What about collinearity and dropping one dummy?
When modeling categorical features with dummy variables, one category has to be dropped to avoid collinearity, or alternatively the constant has to be omitted by setting fit_intercept=False. We will obey this in further calculations, as sketched below.
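In practice this means one of two equivalent setups (again using the illustrative `y` from above):

```python
hours = pd.Series(y.index.hour, index=y.index)

# Option A: keep all 24 hourly dummies but omit the intercept.
X_a = pd.get_dummies(hours, prefix="hour", dtype=float)
model_a = LinearRegression(fit_intercept=False).fit(X_a, y)

# Option B: drop one dummy (the reference category) and keep the intercept.
X_b = pd.get_dummies(hours, prefix="hour", drop_first=True, dtype=float)
model_b = LinearRegression(fit_intercept=True).fit(X_b, y)
```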
1.3. Features used: month + day of the week + hour of the day
Now, let's add dummy variables for months, days of the week and hours so that our model is able to incorporate them.
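A sketch of how these dummies can be built from the timestamp index (column names are mine):

```python
# Calendar dummies: month, day of the week and hour of the day.
calendar = pd.DataFrame({
    "month": y.index.month,
    "dayofweek": y.index.dayofweek,
    "hour": y.index.hour,
}, index=y.index).astype("category")

# Drop the first level of each group to avoid collinearity and keep the intercept.
X_cal = pd.get_dummies(calendar, drop_first=True, dtype=float)

model = LinearRegression(fit_intercept=True).fit(X_cal, y)
y_pred = pd.Series(model.predict(X_cal), index=y.index)

mape = np.mean(np.abs((y - y_pred) / y)) * 100
print(f"MAPE: {mape:.2f} %")
```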
It can be seen that MAPE improves significantly. With the added features, the model is able to learn one parameter for every month, day of the week and hour separately, which also shapes the daily profile of the predictions.
1.4. Features used: month + day of the week x hour of the day
Now, let's add dummy variables for months and use the interaction between the day of the week and the hour of the day. If we model the day of the week and the hour of the day as in the previous section, we need 7 + 24 = 31 parameters (before dropping the last dummies). If we instead use the interaction between the day of the week and the hour of the day, this behavior is modeled with 7 x 24 = 168 parameters (before dropping the last dummies), which is many more and means that each hour of each day of the week is modeled separately.
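One way to sketch this interaction is as a combined day-of-week x hour category, assuming the same setup as before:

```python
# Combined categorical feature: one level per (day of week, hour) pair -> 7 x 24 = 168 levels.
cal = pd.DataFrame(index=y.index)
cal["month"] = y.index.month
cal["dow_hour"] = [f"{d}_{h:02d}" for d, h in zip(y.index.dayofweek, y.index.hour)]

X_int = pd.get_dummies(cal.astype("category"), drop_first=True, dtype=float)
model = LinearRegression(fit_intercept=True).fit(X_int, y)
```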
It can be seen that MAPE improves again. Let's compare the daily profile shapes learned by the different models in the figure below. For further understanding, I suggest taking a look at the original Jupyter Notebook and playing with the data to truly understand the underlying concepts.
2. MODELING TEMPERATURE
The correlation between the load and the temperature is shown below. To understand it, first take a look at my previous blog on Exploratory Data Analysis.
How to model this with MLR?
First, you have to understand the concept of a piece-wise and polynomial linear regression. I suggest reading this chapter from an excellent book.
In the next few sections I will guide you step by step through piece-wise and polynomial linear regression modeling. By understanding this concept you will gain the intuition behind modeling the load-temperature correlation. In a supervised learning task, the formulation is y = f(x) + ε, where f(x) is the underlying function we want to recover and ε is random noise.
Below is a sampled dataset. The actual function is marked in red and the sampled data, including Gaussian noise, is plotted as blue dots.
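Since the exact function from the notebook is not reproduced here, the sketch below generates comparable toy data: an assumed piecewise ground truth that changes behaviour at x = 2, sampled with Gaussian noise (both the function and the noise level are my own choices for illustration).

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Assumed ground truth (for illustration only): quadratic below the knot at x = 2,
# linear and decreasing above it.
def f(x):
    return np.where(x <= 2, 0.5 * x**2, 2.0 - 1.5 * (x - 2))

x_s = np.sort(rng.uniform(0, 4, 200))          # sampled inputs
y_s = f(x_s) + rng.normal(0, 0.3, len(x_s))    # targets with Gaussian noise

plt.plot(x_s, f(x_s), color="red", label="actual function")
plt.scatter(x_s, y_s, s=10, color="blue", label="sampled data")
plt.legend()
plt.show()
```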
2.1. Modeling the data using the original feature
First, let's try modeling the correlation using the original feature. As this correlation is not linear, our model fails to capture the data.
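A sketch of that fit on the toy data from above:

```python
from sklearn.linear_model import LinearRegression

# A plain straight line on the original feature x.
X_orig = x_s.reshape(-1, 1)
lin = LinearRegression().fit(X_orig, y_s)

plt.scatter(x_s, y_s, s=10, color="blue")
plt.plot(x_s, lin.predict(X_orig), color="green", label="linear fit on x")
plt.plot(x_s, f(x_s), color="red", label="actual function")
plt.legend()
plt.show()
```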
2.2. Segmentation of numerical feature into two features (piece-wise linear regression)
Instead of modeling the problem with the original numerical feature 𝑥, we create two features from it by adding a knot at 𝑥 = 2 (one feature holds the value of 𝑥 when 𝑥 <= 2 and is zero otherwise; the other holds 𝑥 when 𝑥 > 2 and is zero otherwise), since we can see that the data changes behaviour there. This is the same as modeling the interaction of one numerical feature with one categorical feature.
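A sketch of the segmentation on the toy data (the knot at x = 2 is the one assumed above):

```python
knot = 2.0
x_left = np.where(x_s <= knot, x_s, 0.0)    # x on the left of the knot, 0 elsewhere
x_right = np.where(x_s > knot, x_s, 0.0)    # x on the right of the knot, 0 elsewhere

X_pw = np.column_stack([x_left, x_right])
pw = LinearRegression().fit(X_pw, y_s)      # a separate slope for each segment
```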
Below are the results. It can be seen that the model fits the data much better than before.
2.3. Adding 2nd degree polynomial
Another important thing to understand is polynomial regression, where we add polynomials (such as 𝑥^2 or 𝑥^3) of the original feature 𝑥 to the model. Let's add a square, so that we are able to model a quadratic function. It can be seen below that the model fits the data considerably better than originally with only one feature.
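The polynomial variant, again as a sketch on the toy data:

```python
# Add the square of x so the model can bend: y ≈ b0 + b1*x + b2*x^2.
X_poly = np.column_stack([x_s, x_s**2])
poly = LinearRegression().fit(X_poly, y_s)
```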
2.4. Adding 2nd degree polynomial with segmentation
And finally, let's combine piece-wise and polynomial regression by adding a 2nd degree polynomial and segmenting it. Look at the magic below! The model fits the data properly!
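Combining the two ideas, as a sketch:

```python
# Segmented 2nd degree polynomial: separate x and x^2 terms on each side of the knot,
# plus an indicator so each segment gets its own intercept.
X_pw_poly = np.column_stack([
    x_left, x_left**2,
    x_right, x_right**2,
    (x_s > knot).astype(float),
])
pw_poly = LinearRegression().fit(X_pw_poly, y_s)
```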
If you truly understand this, you will be able to model the correlation between the load and the temperature!
It is very important to understand how to artificially generate data and then learn it back. It truly deepens your understanding.
Next, we will use this knowledge about piece-wise and polynomial linear regression to add temperature to our existing model. As I explained in my previous blog, the load-temperature correlation is weak when computed over the whole dataset, but it becomes stronger when the data is divided by month and hour. Therefore, we multiply the monthly and hourly dummy features with temperature polynomials (combining piece-wise and polynomial regression as in the example above) to segment the original correlation by month and hour.
2.5. Modeling temperature in load forecasting
Features used: interactions of temperature polynomials with the monthly and hourly dummies (detailed below).
The load-temperature correlation changes during the year (different seasons) and also differs between hours of the day; e.g., the correlation during the night, when people sleep, is not the same as during the day. For a detailed explanation, take a look at the previous blog about Exploratory Data Analysis. Since we want to capture non-linear dependencies, we also add 3rd degree polynomials.
Why 3rd and not 2nd?
Because a 2nd degree polynomial (a quadratic function) is symmetrical, whereas a 3rd degree polynomial can also model asymmetrical dependencies.
To model the different load-temperature correlations across months and hours, we add two sets of features (see the sketch after this list):
the interaction of temperature and its 3rd degree polynomials with the monthly dummies, and
the interaction of temperature and its 3rd degree polynomials with the hourly dummies.
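A sketch of these interaction features, assuming `temp` is a pandas Series of hourly temperature aligned with `y` and `X_int` is the calendar feature matrix from the previous chapter (all names are illustrative):

```python
# Temperature polynomials: temp, temp^2, temp^3.
temp_poly = pd.DataFrame({"t1": temp, "t2": temp**2, "t3": temp**3}, index=y.index)

# Monthly and hourly dummies used for the interactions. One level is dropped from the
# hourly set so the two interaction groups are not perfectly collinear with each other.
month_d = pd.get_dummies(pd.Series(y.index.month, index=y.index), prefix="m", dtype=float)
hour_d = pd.get_dummies(pd.Series(y.index.hour, index=y.index), prefix="h",
                        drop_first=True, dtype=float)

# Interactions: every temperature polynomial times every monthly and hourly dummy.
interactions = {}
for p_name, p in temp_poly.items():
    for dummies in (month_d, hour_d):
        for d_name in dummies.columns:
            interactions[f"{p_name}*{d_name}"] = p * dummies[d_name]

X_temp = pd.concat([X_int, pd.DataFrame(interactions, index=y.index)], axis=1)
model = LinearRegression(fit_intercept=True).fit(X_temp, y)
```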
As you can see, MAPE improves significantly when temperature is added to the model.
3. MODELING TREND
Electricity load generally grows every year due to economic growth, and we have to incorporate this in the model. Why? Even if behavior over the observed time span stays the same, the load still increases a little every year. A trend is a general systematic linear or nonlinear component that changes over time and does not repeat. The easiest way to model a trend is to add an additional feature, trend, that assigns an increasing natural number to each observation from the first timestamp to the last. Now, let's add the trend feature to our existing model and observe the performance.
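A sketch of the trend feature, appended to the feature matrix from the previous chapter:

```python
# Trend: a simple counter over the observations (1, 2, 3, ...).
trend = pd.Series(np.arange(1, len(y) + 1), index=y.index, name="trend")

X_final = pd.concat([X_temp, trend], axis=1)
model = LinearRegression(fit_intercept=True).fit(X_final, y)

y_pred = pd.Series(model.predict(X_final), index=y.index)
mape = np.mean(np.abs((y - y_pred) / y)) * 100
print(f"MAPE: {mape:.2f} %")
```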
By adding trend, the performance improves a little bit.
Congratulations, you have built your first MLR load forecasting model!
I suggest reading this excellent paper by Prof. Tao Hong, where this benchmark was originally proposed.
Suggestion: every time you start with a load forecasting problem, first apply this benchmark and then start improving it. There are lots of things that can be improved. You can also use this code and try new features to further improve the performance, and of course, use time-based cross-validation to evaluate your model's performance.
If you find this blog useful, please share it with others and let me know your thoughts on LinkedIn!
If you want to connect with other experts working in this field, join my LinkedIn group AI in Smart Grids where I post about this topic!