The Future of Australian Energy Prices
Time-Series Analysis of Historic Prices and Forecast for Future Prices
1. Introduction
1.1. Background
In recent months, media outlets have notified the public about fluctuations in energy prices with headlines such as Australia’s High Electricity Prices the ‘New Normal’, Report Says (Hutchens, 2018), Higher Energy Prices are Here to Stay – Here’s What We Can Do About It (Percival, 2018), and ‘No Likelihood of Relief Ahead’: Future Power Prices Continue to Rise (Latimer, 2018). These articles create a sense of concern because of the impact on Australians’ financial wellbeing. However, very little in these articles is grounded in statistical evidence.
While these articles may have lacked academic rigour, the sentiment is still reflected in academic literature. Sardar (2015) argues in his article Research and Development, Welfare and Efficiency: An Australian Energy Perspective that increasing numbers of Australians are being driven to welfare as a direct result of energy prices. Moreover, in Australian Energy Policy and Economic Rationalism, Horan et al. (2017) accuse the Australian Government of having irrational and inefficient energy policy, which is placing increasing and unnecessary financial pressure on Australian households and businesses. Furthermore, Lincoln (2012) proposes a succinct set of options for change which may curb this pressure, as articulated in his article Options for Change in the Australian Energy Profile. As shown, the landscape of the Australian energy market is changing, and this trend may have dire consequences for the future of the Australian economy.
Therefore, with the intent to add some statistical rigour to the discourse around Australian energy prices, this paper aims to model the aggregated monthly Energy Point Price in order to create a prediction for the future. The data is extracted from an Australian Government website, visualised, analysed, tested, and then forecast, in order to create such a prediction. The resulting prediction may help citizens to plan for the future, and may also inform Governmental Agencies when shaping future policy.
1.2. Research Question
Is the future energy price (monthly mean aggregate) able to be predicted solely using univariate time series data?
1.3. Data Source
The data used in this analysis has been obtained from the official website for the Australian Energy Market Operator (AEMO) (www.aemo.com.au). The data obtained from AEMO had the following characteristics and limitations:
- The raw data was an Energy Point Price, obtained at half hourly increments.
- The data was split by State.
- The prices for the Australian Capital Territory were included in the prices for New South Wales.
- The prices for the Northern Territory were missing from the website.
- The prices for Western Australia were in a format and time-stamp inconsistent with the rest of the country, and could not be merged with the other states.
- The website only included data that dated back to 1999.
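The step from half-hourly prices to a monthly mean can be illustrated with a minimal Python sketch. The record layout used here, a tuple of (timestamp, state, price), and the function name `monthly_mean_prices` are assumptions made for illustration; they are not the actual AEMO file format, nor the scripts used for this report.

```python
from collections import defaultdict
from datetime import datetime


def monthly_mean_prices(records):
    """Average half-hourly prices into one mean price per (state, month).

    `records` is an iterable of (timestamp, state, price) tuples. This layout
    is a hypothetical one chosen for the example.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (state, "YYYY-MM") -> [sum, count]
    for timestamp, state, price in records:
        key = (state, timestamp.strftime("%Y-%m"))
        totals[key][0] += price
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up sample values, only to show the shape of the input and output.
sample = [
    (datetime(2019, 1, 1, 0, 0), "NSW", 80.0),
    (datetime(2019, 1, 1, 0, 30), "NSW", 100.0),
    (datetime(2019, 2, 1, 0, 0), "NSW", 60.0),
    (datetime(2019, 1, 1, 0, 0), "VIC", 120.0),
]
monthly = monthly_mean_prices(sample)
```

Each key in `monthly` pairs a state with a year-month label, which gives one univariate monthly series per state.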
2. Data Exploration
Having aggregated the data into an average monthly price, the AEMO data can be explored to establish its trend, seasonality, stationarity, and regularity. With reference to Figure 1, the following conclusions can be drawn:
- The data does not indicate a seasonal increase and decrease in price; neither by month nor by year.
- The data trend indicates a steady increase in price per year, as shown by the fitted linear model line.
- At approximately the year 2017, there appears to be a step increase in prices across all states.
- Around the end of 2009 and the start of 2019, there appears to be a substantial spike in prices which is consistent across all states.
- Each state shows a different level of stability, along with a differing level of price increase. The following differences per state are noted:
- 5.1. QLD appears to have the least steep linear model, with the slope of the line being 1.8, while VIC has the steepest, with the slope being 2.4.
- 5.2. While QLD, SA and VIC appeared to have relatively stable prices between the years 2002 and 2008, NSW appeared to have much more unstable prices in the same period.
- 5.3. The prices for VIC and for SA appear to have the highest average price between the years of 2017 and 2019.
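The slope of a fitted linear model, as quoted for each state above, comes from an ordinary least-squares fit of price against time. The sketch below is a minimal Python illustration; the function name `linear_trend` and the sample prices are hypothetical, and the slope it returns is per observation (per month for monthly data) rather than in the units used in Figure 1.

```python
def linear_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept), where the slope is the change per time step.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up prices that rise by exactly 2 per step.
prices = [30.0, 32.0, 34.0, 36.0, 38.0]
slope, intercept = linear_trend(prices)
```

Comparing the slopes of the states' series is then a direct way to rank how steeply each state's prices have risen.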
In addition to visualising the observed prices, as displayed in Figure 1, the time-series data can also be decomposed into its relevant attributes. The two most pertinent attributes for time-series data are the Trend and the Seasonal attributes. As shown in Figure 2, the relevant states each have a distinct trend and seasonality. The Residual attribute is the data remaining after the Trend and Seasonal attributes have been removed.
This figure has had a smoothed trend line added to the Trend plot. This trend line shows a distinct and prominent upward trend for all states, suggesting that prices are likely to continue to increase into the future.
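To make the three attributes concrete, the following Python sketch performs a classical additive decomposition using a centred moving average. This is a simpler technique than the one that likely produced Figure 2, and it is shown only to illustrate how Trend, Seasonal and Residual relate to each other; the function name `decompose_additive` and the synthetic series are assumptions.

```python
def decompose_additive(series, period=12):
    """Split a series into trend, seasonal and residual parts (additive)."""
    n = len(series)
    half = period // 2

    # Trend: centred moving average spanning one full period.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: half-weight the two end points of the window.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period

    # Seasonal: average detrended value for each position within the period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    figure = [sum(b) / len(b) for b in buckets]
    centre = sum(figure) / period
    figure = [f - centre for f in figure]  # force the pattern to sum to zero
    seasonal = [figure[t % period] for t in range(n)]

    # Residual: what is left once trend and seasonal are removed.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic series: a straight-line trend plus a repeating 4-step pattern.
pattern = [1.0, -1.0, 2.0, -2.0]
series = [float(t) + pattern[t % 4] for t in range(24)]
trend, seasonal, residual = decompose_additive(series, period=4)
```

The trend is undefined (`None`) for the first and last half-period, because the centred window does not fit there.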
3. Testing the Data
In order to establish whether the data is suitable for time series analysis and prediction, a number of statistical tests need to be applied to the data. Namely:
- Test for Whitenoise
- Test for Stationarity
- (optional) Test for Seasonality
- (optional) Test for Regularity
- (optional) Test for Stability
- Test for Auto-Correlation
3.1. Test for Whitenoise
The test for Whitenoise is intended to establish whether or not the data is just random points across a period of time. If the data is ‘whitenoise’ (i.e. is random data points), then the data cannot be used for time-series forecasting.
For this, the Box-Ljung Test for Whitenoise (Ljung & Box, 1978; R-Core, nd.) was applied to the AEMO data, with the results displayed in Table 1. As each of the tests returned a p-value less than the threshold (0.02), none of the states’ data is whitenoise; therefore the data can be used for time-series forecasting.
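As an illustration of what the Box-Ljung test computes, the sketch below implements the Q statistic in plain Python and converts it to a p-value for the single-lag case, where the chi-square tail with one degree of freedom has a closed form. The function names and the synthetic series are hypothetical; the report itself relied on the R `Box.test` function cited above.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(series, lags=1):
    """Ljung-Box Q statistic over lags 1..`lags`."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, lags + 1)
    )


def p_value_one_lag(q):
    """Upper tail of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2))


# A steadily rising series is clearly not white noise.
rising = [float(t) for t in range(50)]
q = ljung_box_q(rising, lags=1)
p = p_value_one_lag(q)
is_white_noise = p >= 0.02  # 0.02 is the threshold used in this report
```

A small p-value rejects the hypothesis that the observations are independent, which is the outcome reported for every state in Table 1.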
3.2. Test for Stationarity
The test for Stationarity is intended to establish whether or not the data is stationary. Data is ‘stationary’ when its statistical properties (its mean, variance, and covariance) do not change from one period to the next. In this instance, the period is a year; thus, non-stationary data is data whose level or spread shifts from year to year, and it is this movement that the time-series forecast projects into the following periods. Srivastava (2015) provides three effective pictorial examples of non-stationary data, as displayed in Figure 3 for non-stationary Means, Figure 4 for non-stationary Variance, and Figure 5 for non-stationary Covariance.
The KPSS Unit Root Test (Kwiatkowski et al., 1992; Pfaff, nd.) takes stationarity as its null hypothesis, so a p-value less than the threshold (0.02) indicates that the data is not stationary. This test has been applied to the AEMO data set, with the results displayed in Table 2. The results of this test show that the data sets are not stationary; the Integrated (differencing) component of the ARIMA model accounts for this, so the data can be used for time-series forecasting.
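The KPSS statistic for level stationarity can be sketched in a few lines of Python. This is an illustrative re-implementation under stated assumptions, not the R `ur.kpss` routine used for Table 2: the lag rule `int(4 * (n / 100) ** 0.25)` and the Bartlett weights are common choices, and the statistic is compared against the published critical values (0.347 at 10% and 0.739 at 1%) rather than converted to a p-value.

```python
def kpss_level_statistic(series, lags=None):
    """KPSS statistic for the null hypothesis of stationarity around a level."""
    n = len(series)
    mean = sum(series) / n
    resid = [x - mean for x in series]

    # Partial sums of the residuals.
    partial_sums = []
    running = 0.0
    for e in resid:
        running += e
        partial_sums.append(running)

    # Long-run variance estimate with Bartlett (triangular) weights.
    if lags is None:
        lags = int(4 * (n / 100) ** 0.25)
    long_run_var = sum(e * e for e in resid) / n
    for k in range(1, lags + 1):
        weight = 1 - k / (lags + 1)
        gamma = sum(resid[t] * resid[t - k] for t in range(k, n)) / n
        long_run_var += 2 * weight * gamma

    return sum(s * s for s in partial_sums) / (n * n * long_run_var)


CRITICAL_1_PERCENT = 0.739   # reject stationarity above this value
CRITICAL_10_PERCENT = 0.347

trending = [float(t) for t in range(100)]   # level drifts upward
flat = [1.0, -1.0] * 50                     # level stays put

stat_trending = kpss_level_statistic(trending)
stat_flat = kpss_level_statistic(flat)
```

A statistic above the critical value rejects stationarity, which is the direction of the result reported for the AEMO series.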
3.3. Test for Seasonality
The test for Seasonality is not a necessary test before doing a time-series forecast. However, it is beneficial to understand to what extent the data varies or remains consistent throughout each period of analysis. To declare that the data is ‘seasonal’ is to say that the peaks and troughs are the same for each of the periods, and that the future seasonal periods could be predicted with a reasonable level of confidence.
For this, there are two seasonality tests that can be applied to the AEMO data: the first is the QS test (Ollech, 2019; Sax, nd.), and the second is the Seasonal Strength test (Yang & Hyndman, 2019; Hyndman, nd.(a)). The QS test determines whether seasonality is present, while the Seasonal Strength test determines the strength of that seasonality. The results of these tests are displayed in Table 3. The threshold p-value is 0.02: if a value is below this threshold, then the data is seasonal; but if it is above this threshold, then it is not seasonal. As shown, the data in SA is seasonal, while the other states are not seasonal. This information can be used to control the hyper-parameters in the ARIMA forecasting model.
In addition to the formal tests for Seasonality, the seasonality of the data can also be visualised. As shown by the ‘Seasonal’ plots within Figure 2 (and plotted again in Figure 6), the data for all four states can be seen as having a semi-seasonal trend. This is because there is no smooth undulation between the seasons; only sharp sporadic spikes throughout each period.
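The Seasonal Strength measure can be illustrated with a short Python sketch. It assumes the series has already been decomposed into seasonal and remainder components, and it uses the commonly quoted definition of one minus the ratio of the remainder variance to the variance of seasonal plus remainder, clipped to the range 0 to 1. The function name and the toy inputs are hypothetical, and the QS test is not covered here.

```python
import statistics


def seasonal_strength(seasonal, remainder):
    """Strength of seasonality on a 0 (none) to 1 (fully seasonal) scale."""
    detrended = [s + r for s, r in zip(seasonal, remainder)]
    ratio = statistics.pvariance(remainder) / statistics.pvariance(detrended)
    return max(0.0, min(1.0, 1.0 - ratio))


# Toy components: a clean repeating pattern with no noise ...
strong = seasonal_strength([1.0, -1.0, 1.0, -1.0], [0.0, 0.0, 0.0, 0.0])
# ... and pure noise with no repeating pattern.
weak = seasonal_strength([0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0])
```

Sharp, sporadic spikes end up in the remainder rather than the seasonal component, which pulls the score towards 0.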
3.4. Test for Regularity
Like testing for Seasonality, it is not necessary to test for Regularity in order to produce a time-series forecast. However, it is beneficial to do so, as it provides information about the attributes of the data. To state that the data is ‘regular’ is to say that the data points are evenly spaced, regularly collected, and not missing (i.e. do not contain excessive NA values). Logically, it is not always necessary to conduct the Test for Regularity on automatically collected data (for example, Energy Prices or Daily Temperature); however, if the data was collected manually then it is highly recommended. If the data does not meet the requirements of Regularity, then it is necessary to return to the data collection plan and revise the methodology used.
For the AEMO data, the Is Regular (Zeileis & Grothendieck nd.; Zeileis nd.) test was conducted, with the resulting outcome reported in Table 4. As shown, all of the states meet the requirements for regular data points, and thus can be used for time-series forecasting.
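The idea behind a regularity check can be sketched in Python as follows. This is an analogue of the description given above (evenly spaced, no missing points), not a port of the `is.regular` function; the function name and the (year, month) index layout are assumptions.

```python
def is_regular_monthly(months, values):
    """True when months are consecutive and no value is missing.

    `months` is an ordered list of (year, month) tuples and `values` the
    matching observations, with None marking a missing value.
    """
    if any(v is None for v in values):
        return False
    index = [year * 12 + (month - 1) for year, month in months]
    return all(later - earlier == 1 for earlier, later in zip(index, index[1:]))


complete = is_regular_monthly([(2018, 11), (2018, 12), (2019, 1)], [70.0, 82.5, 91.0])
gap = is_regular_monthly([(2018, 11), (2019, 1)], [70.0, 91.0])
missing = is_regular_monthly([(2018, 11), (2018, 12)], [70.0, None])
```

Converting each (year, month) pair to a single running month number makes the spacing check work across year boundaries.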
3.5. Test for Stability
Like with the Test for Seasonality and the Test for Regularity, the test for Stability is not a necessary test in order to perform time-series forecasting. It is, however, quite beneficial as a measure of how much the data varies over each period of time. If a data-set is to be ‘stable’, that means that the means of each time period do not vary dramatically over time. In other words, the higher the variance between the means of each time-period, the more unstable the data is.
For the AEMO data, there are two tests which can be used: the Test for Stability (Yang & Hyndman, 2019; Hyndman, nd.(a)) and the Test for Lumpiness (Yang & Hyndman, 2019; Hyndman, nd.(a)). While the Stability test measures the variance of the means, the Lumpiness test measures the variance of the variances. Both measures simply indicate the extent to which each series varies. The limits for this test are 0 and 1, whereby a score of 0 would indicate a perfectly stable (or perfectly smooth) data set, while a score of 1 would indicate a completely unstable (or completely sporadic) data set. As displayed in Table 5, the measures for NSW and VIC are somewhat stable, while the other states are not, noting that the measures are very close to the threshold. However, all four states are recorded as not lumpy.
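Both measures can be sketched in Python by cutting the standardised series into non-overlapping windows and comparing the windows with each other. The window width of one year (12 months), the use of sample variance, and the function name are assumptions made for this illustration; the values in Table 5 come from the R package cited above and may differ in detail.

```python
import statistics


def stability_and_lumpiness(series, width=12):
    """Variance of window means (stability) and of window variances (lumpiness)."""
    mean = statistics.mean(series)
    sd = statistics.stdev(series)
    scaled = [(x - mean) / sd for x in series]

    # Non-overlapping windows; a trailing partial window is dropped.
    windows = [scaled[i:i + width] for i in range(0, len(scaled) - width + 1, width)]
    means = [statistics.mean(w) for w in windows]
    variances = [statistics.variance(w) for w in windows]
    return statistics.variance(means), statistics.variance(variances)


# Same pattern every window: stable and not lumpy.
repeating = [1.0, 2.0, 3.0, 4.0] * 3
stable_score, lumpy_score = stability_and_lumpiness(repeating, width=4)

# Rising level: window means differ, so the series is unstable.
rising = [float(t) for t in range(24)]
rising_stability, rising_lumpiness = stability_and_lumpiness(rising, width=12)
```

At least two full windows are needed, otherwise there is nothing to compare and the variance calls will raise an error.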
3.6. Test for Auto-Correlation
An important test to do on time-series data is to measure its level of Auto-Correlation (McMurry & Politis, 2010; Hyndman, nd.(b)). While ‘correlation’ refers to how two variables change based on each other’s value, ‘auto-correlation’ is how a variable changes based on its own value over time (the prefix “auto” means “self”). The Auto-Correlation Function uses a ‘lag’ value. For example, a lag value of 0 is 100% correlated, which is logical, because that is the observation’s own value; whereas for a lag value of 1 or greater, the level of auto-correlation typically decreases as it gets further away from lag 0.
For well-structured time-series data sets, it would be expected to see a conical-shaped Auto-Correlation plot. If it were not a well-structured time-series data set, then this Auto-Correlation plot would look more like white noise, and there would not be any logical shape. The blue dotted lines are included as a reference point for determining if any of the observations are significantly different from zero.
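A minimal Python sketch of the Auto-Correlation Function and its reference lines is given below. The reference lines are drawn here at plus and minus 1.96 divided by the square root of the number of observations, which is the conventional approximate 95% bound; the function names and the synthetic series are hypothetical.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def significance_bound(n):
    """Approximate 95% bound for the autocorrelation of white noise."""
    return 1.96 / math.sqrt(n)


# A rising series gives the slowly decaying pattern described above.
prices = [float(t) for t in range(100)]
values = acf(prices, max_lag=20)
bound = significance_bound(len(prices))
significant_lags = [k for k, r in enumerate(values) if k > 0 and abs(r) > bound]
```

Lags whose autocorrelation falls outside the bound are the ones treated as significantly different from zero.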
Moreover, analysis of the data’s Auto-Correlation (ACF) should be combined with analysis of its Partial Auto-Correlation (PACF). While the ACF is the “direct” relationship between an observation and its relevant lag observation, the PACF removes the “indirect” relationship between these observations. Effectively, the Partial Auto-Correlation between lag 1 and lag 5 is the “actual” correlation between these two observations, after removing the influence that lag 2, lag 3, and lag 4 have on lag 5.
What this means is that the Partial Auto-Correlation plot would have a very high value at lag 0, which will drop very quickly at lag 1, and should remain within the blue reference lines for the remainder of the Correlogram. The observations at lag > 0 should resemble white noise data points. If they do not resemble white noise, and there is a distinct pattern occurring, then the data is not suitable for time-series forecasting.
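The removal of the “indirect” relationships can be made concrete with the Durbin-Levinson recursion, which derives each partial auto-correlation from the ordinary auto-correlations. The Python sketch below is an illustration under that choice of algorithm; the function names and the sample series are hypothetical, and lag 0 is set to 1 by convention.

```python
def autocorrelations(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson)."""
    rho = autocorrelations(series, max_lag)
    result = [1.0]
    previous = []  # coefficients of the AR model of order k - 1
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(previous[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(previous[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        previous = [
            previous[j] - phi_kk * previous[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        result.append(phi_kk)
    return result


sample = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0]
rho = autocorrelations(sample, 3)
partial = pacf(sample, 3)
```

At lag 1 the partial and ordinary auto-correlations coincide, because there are no intermediate lags whose influence could be removed.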
When applied to the AEMO data, as displayed in Figure 8, the following conclusions can be drawn:
- (ACF) All four states are suitable for use in time-series forecasting due to their conical shape.
- (PACF) The data is relatively stable, due to the fact that the vast majority of the data points fall within the blue limit lines.
- (ACF) There is a slight increase in correlation between lag 50 and lag 70, which is congruent with the trend pattern increase in price between 2013 and 2015 (Figure 2).
- (PACF) All four states are suitable for use in time-series forecasting due to:
- 4.1. Their rapid drop between lag 0 and lag 1; and
- 4.2. Their constant, random pattern at lag > 0.
4. Forecast
4.1. Context
Having applied this testing, the data can be forecast forward to create a prediction for the future. The chosen prediction model for this forecast is the ARIMA model. ‘ARIMA’ is an acronym for ‘Auto-Regressive Integrated Moving Average’ (Kang, 2017), and is broken into three parts in order to make the model fit the data as well as possible:
- Auto Regressive: Indicating the level to which an evolving variable (predictor) is regressed (predicted) based on its own lagged (previous) observed values.
- Integrated: Indicating the level of differencing to be applied to the data, that is, subtracting an observed value in the previous time step (previous) from the observed value (predictor). Effectively, this subtraction allows the properties of the time-series data to not depend on the time of the observation, thus eliminating trend and seasonality, and also stabilising the mean of the time series.
- Moving Average: Indicating the level of dependency between an observed value (predictor) and the residual error from a moving average model applied to its own lagged (previous) observed values.
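How the Auto Regressive and Integrated parts work together can be sketched with a deliberately small ARIMA(1,1,0) model in Python: the series is differenced once, an AR(1) model is fitted to the differences by least squares, and the forecast differences are summed back onto the last observed level. The model order, the least-squares fit and all function names are assumptions chosen for illustration; the orders actually used for the AEMO forecast are not stated in this report, and the Moving Average part is omitted here.

```python
def difference(series):
    """First difference: each value minus the one before it (the 'I' step)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] (the 'AR' step)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def forecast_arima_110(series, steps):
    """Forecast `steps` values ahead with an ARIMA(1,1,0) sketch."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # predict the next change
        level += last_diff                # undo the differencing
        forecasts.append(level)
    return forecasts


# Made-up series whose changes halve each step: 8, 4, 2, 1.
history = [0.0, 8.0, 12.0, 14.0, 15.0]
future = forecast_arima_110(history, steps=2)
```

Because the model is fitted on the changes rather than the levels, a steady upward trend in the history becomes a roughly constant value that the auto-regression can handle.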
4.2. Prediction
Applying this ARIMA model to the AEMO data produces the prediction shown in Figure 9. This figure has the following features:
- The coloured line indicates the state.
- The darker ribbon is the forecast prediction with an 80% confidence interval.
- The lighter ribbon is the forecast prediction with a 90% confidence interval.
- The thick black line is the actual observations, as the data has been split into Test and Train data sets.
Upon analysis of this forecast, the following predictions can be made:
- The forecasts for NSW and QLD have a ribbon shape, while those for SA and VIC have a conical shape.
- The forecasts for QLD and VIC are relatively stable and flat, while the forecasts for NSW and SA have a slightly upward trend.
- All four states have a wide level of uncertainty (≈±$40 per GWh).
4.3. Accuracy
Using the data in the Test/Train split, the level of accuracy for the prediction can be calculated.
There are four measurement scores shown in Table 6, being:
- Root Mean Square Error (RMSE),
- Mean Absolute Error (MAE),
- Mean Absolute Percentage Error (MAPE), and
- Mean Absolute Scaled Error (MASE).
The chosen metric for this analysis is the RMSE due to its ability to penalise larger errors more heavily, because each error is squared before averaging. For RMSE, a lower score is better, as it indicates a lower amount of error. As shown in this table, the NSW prediction scored best in all four metrics, while VIC scored the worst. This result aligns with the actual observations shown in Figure 9, because NSW is closest to the prediction while VIC is consistently the furthest away.
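The four scores can be written out in a few lines of Python. In this sketch the MASE scales the test error by the mean absolute error of a one-step naive forecast on the training data, which is one common (non-seasonal) definition and an assumption here; the function name and the sample numbers are made up for illustration.

```python
import math


def accuracy_scores(actual, predicted, train):
    """Return RMSE, MAE, MAPE (in percent) and MASE for a forecast."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    # Scale by the error of a naive "same as last month" forecast on the train set.
    naive_mae = sum(abs(b - a) for a, b in zip(train, train[1:])) / (len(train) - 1)
    mase = mae / naive_mae
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "MASE": mase}


scores = accuracy_scores(
    actual=[100.0, 110.0, 120.0],
    predicted=[90.0, 110.0, 130.0],
    train=[80.0, 90.0, 100.0],
)
```

With these inputs the RMSE (about 8.16) is larger than the MAE (about 6.67), which shows how squaring gives the two larger errors more weight.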
4.4. Long-Term Forecast
Using this trained model, the forecast is then projected forward to the year 2026, as displayed in Figure 10. Analysis of this projected forecast indicates the following:
- That the long-term energy prices will not be significantly different than the prices seen in the year 2019.
- Due to the sporadic nature of the historic prices, the level of uncertainty of this forecast gets wider in VIC and SA; as seen by the conical shape of the forecast.
- The shapes of the NSW and QLD forecasts are more ribbon-like, indicating less volatility in future prices.
5. Findings
Based off the analysis conducted, and the forecast predicted, there are four key findings.
- History of Energy Prices: Firstly, the Energy Prices for all four states were comparatively stable up until circa 2013, after which the prices began to escalate at an increasingly steep rate. This, combined with the sporadic nature of the seasonal prices for the states, indicates that a time-series forecast for the future of energy prices may not be perfectly accurate.
- Suitable for Forecasting: Secondly, while the data may not be perfectly accurate for time-series prediction, it is still suitable for forecasting. That is because it is:
- Not Whitenoise;
- Not Stationary;
- Not Seasonal (or Marginally Seasonal, as with the SA data);
- Is Regular;
- Is Moderately Stable (all scores close to the Stability Threshold);
- Not Lumpy; and
- Is Auto-Correlated.
- Relatively Predictable: Thirdly, having fed the data into an ARIMA forecasting model, the predictions are relatively accurate, with predictions falling between a 40% inaccuracy rate (as with NSW) and a 90% inaccuracy rate (as with VIC).
- Forward Projections: Fourthly, using the forecast model to project forward, and combined with the historic trends, it can be concluded that the energy prices will not be significantly different than the prices seen in the year 2019. The exact energy price could vary dramatically in this forecast due to the broad scope of the actual forecast model.
6. Limitations
Resulting from this research, a number of limitations have been identified, as listed:
- The Future prices are simply based on a single input variable: Past prices.
- The Time-Series model does not take into account external variables which may affect energy prices, such as energy production, energy capacity, or the percentage of renewable energy in the Grid.
- The Analysis does not take into account Government Policies which may have influenced the prices.
7. Opportunities for Future Research
In order to further increase the accuracy of the prediction model, and to address some of the identified limitations, there are some other opportunities for future research, including:
- Investigate the extent to which the Federal Legislative Landscape has changed since circa 2015, and the extent to which that has influenced energy prices.
- Correlate the energy price fluctuations with the closures of energy production plants.
- Use the results of this prediction as an ensemble feature in a multivariate regression model to predict Future Energy Prices. Other features could potentially include:
- 3.1. Percentage breakdown of Renewable vs. Non-Renewable Energy sources per state per month.
- 3.2. Amount of Energy Produced (or Energy Capacity) per state per month.
- 3.3. Average temperature (or temperature range) per state per month.
It is recommended that these opportunities be explored in full, so as to provide a detailed prediction model for future energy prices.
8. Conclusion
In conclusion, analysis of the AEMO historic energy prices provides a suitable method of forecasting future prices. Exploration of the data revealed that the prices have indeed been increasing over the last 20 years, and have risen at an increasingly steep rate since approximately 2013. Moreover, having applied various statistical tests to the data, the data is suitable for time-series forecasting, due to it being not stationary, not whitenoise, marginally seasonal, and moderately stable.
As to whether future energy prices can be predicted solely using univariate time series data, the answer is yes. The historic AEMO data is regular and univariate, and can be fed into an ARIMA forecasting model to predict future prices to a certain level of confidence. The confidence interval is tighter for NSW and QLD, and very broad for VIC and SA. However, the accuracy of this model could be greatly improved when modelled in conjunction with various other external influences, as outlined in Opportunities for Future Research. Therefore, this research is able to provide assistance to Australian households and businesses, and is able to inform Government policy to curb this trend.
9. References
Australian Energy Market Operator (AEMO) nd., viewed 1/Oct/2019, <https://www.aemo.com.au/>.
Horan, S, McGrath, T, & Santha, N 2017, ‘Australian energy policy and economic rationalism’, Energy News, vol. 35, no. 3, pp. 16–7, ISSN: 1445-2227.
Hutchens, G 2018, ‘Australia’s high electricity prices the ’new normal’, report says’, The Guardian, viewed 10/Oct/2019, <https://www.theguardian.com/australia-news/2018/jul/01/australias-high-electricity-prices-the-new-normal-report-says>.
Hyndman, R nd.(a), ‘stl_features’, R Documentation, viewed 28/Oct/2019, <https://www.rdocumentation.org/packages/tsfeatures/versions/1.0.1/topics/stl_features>.
Hyndman, R nd.(b), ‘Acf’, R Documentation, viewed 1/Nov/2019, <https://www.rdocumentation.org/packages/forecast/versions/8.9/topics/Acf>.
Kang, E 2017, ‘Time Series: ARIMA Model’, Medium, viewed 30/Oct/2019, <https://medium.com/@kangeugine/time-series-arima-model-11140bc08c6>.
Kwiatkowski, D., Phillips, P., Schmidt, P., & Shin, Y. 1992, ‘Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root: How Sure Are We That Economic Time Series Have a Unit Root?’, Journal of Econometrics, vol. 54, no. 1, pp. 159–178, DOI: 10.1016/0304-4076(92)90104-Y.
Latimer, C 2018, ‘‘No likelihood of relief ahead’: Future power prices continue to rise’, The Sydney Morning Herald, viewed 10/Oct/2019, <https://www.smh.com.au/business/the-economy/no-likelihood-of-relief-ahead-future-power-prices-continue-to-rise-20181030-p50cu1.html>.
Lincoln, S 2012, ‘Options for Change in the Australian Energy Profile’, AMBIO, vol. 41, no. 8, pp. 841–50, DOI: 10.1007/s13280-012-0315-0.
Ljung, G & Box, G 1978, ‘On a measure of lack of fit in time series models’, Biometrika, vol. 65, no. 2, pp. 297–303, DOI: 10.2307/2335207.
McMurry, T., & Politis, N. 2010, ‘Banded and tapered estimates for autocovariance matrices and the linear process bootstrap’, Journal of Time Series Analysis, vol. 31, no. 6, pp. 471–482, DOI: 10.1111/j.1467-9892.2010.00679.x.
Ollech, D 2019, ‘seastests — Seasonality tests’, R Vignette, viewed 28/Oct/2019, <https://cran.r-project.org/web/packages/seastests/vignettes/seastests-vignette.html>.
Percival, L 2018, ‘Higher energy prices are here to stay – here’s what we can do about it’, The Conversation, viewed 10/Oct/2019, <http://theconversation.com/higher-energy-prices-are-here-to-stay-heres-what-we-can-do-about-it-99187>.
Pfaff, B nd., ‘ur.kpss’, R Documentation, viewed 28/Oct/2019, <https://www.rdocumentation.org/packages/urca/versions/1.3-0/topics/ur.kpss>.
R-Core, nd., ‘Box.test’, R Documentation, viewed 28/Oct/2019, <https://www.rdocumentation.org/packages/stats/versions/3.6.1/topics/Box.test>.
Sax, C nd., ‘qs’, R Documentation, viewed 28/Oct/2019, <https://www.rdocumentation.org/packages/seasonal/versions/1.2.1/topics/qs>.
Srivastava, T 2015, ‘A Complete Tutorial on Time Series Modelling in R’, Analytics Vidhya, viewed 29/Oct/2019, <https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/>.
Sardar, P 2015, ‘Research and development, welfare and efficiency: an Australian energy perspective’, International Journal of Global Energy Issues, vol. 11, no. 1, pp. 155–60, ISSN: 0954-7118.
Yang, Y, & Hyndman, RJ 2019, ‘Introduction to the tsfeatures package’, R Vignette, viewed 28/Oct/2019, <https://cran.r-project.org/web/packages/tsfeatures/vignettes/tsfeatures.html>.
Zeileis, A, & Grothendieck, G nd., ‘zoo: An S3 Class and Methods for Indexed Totally Ordered Observations’, R Vignette, viewed 28/Oct/2019, <https://cran.r-project.org/web/packages/zoo/vignettes/zoo.pdf>.
Zeileis, A, nd. ‘is.regular’, R Documentation, viewed 28/Oct/2019, <https://www.rdocumentation.org/packages/zoo/versions/1.8-6/topics/is.regular>.
Post Script
Acknowledgements: This report was compiled with some assistance from others. Acknowledgements go to:
- Yan Holtz for his code for how to add the footer elements (https://holtzy.github.io/Pimp-my-rmd/ & https://github.com/holtzy/epuRate).
- Tim Holman for his code for how to add the GitHub corner (https://github.com/tholman/github-corners).
- William Dai for his assistance to write the scripts to web-scrape the AEMO website.
- Michael Gordon for his assistance to write the scripts to web-scrape the BOM website.
Publications: This report is also published on the following sites:
- RPubs: RPubs/chrimaho/AusEnergyPrices
- GitHub: GitHub/chrimaho/AusEnergyPrices
- Medium: Medium/chrimaho/AusEnergyPrices
- LinkedIn: LinkedIn/chrimaho/AusEnergyPrices