


CHEMICAL ENGINEERING TRANSACTIONS, VOL. 33, 2013
A publication of The Italian Association of Chemical Engineering
Online at: www.aidic.it/cet
Guest Editors: Enrico Zio, Piero Baraldi
Copyright © 2013, AIDIC Servizi S.r.l., ISBN 978-88-95608-24-2; ISSN 1974-9791

Singular Spectrum Analysis for Forecasting of Electric Load Demand

Héctor Briceño (a), Claudio M. Rocco (a,*), Enrico Zio (b,c)

(a) Universidad Central de Venezuela, Apartado Postal 47937, Caracas, Venezuela
(b) Chair on Systems Science and the Energetic Challenge, European Foundation for New Energy-Electricité de France, Ecole Centrale Paris and Supelec, F-92 295 CHÂTENAY-MALABRY Cedex
(c) Dipartimento di Energia, Politecnico di Milano, Nuclear Section-Cesnef, via Ponzio 34/3, 20133 Milano, Italy
croccoucv@gmail.com

This paper presents the technique of Singular Spectrum Analysis (SSA) and its application to electric load forecasting. SSA is a relatively new non-parametric, data-driven technique developed to model non-linear and/or non-stationary, noisy time series. SSA is able to decompose the original time series into the sum of independent components, which represent the trend, oscillatory behavior (periodic or quasi-periodic components) and noise. One of the main advantages of SSA compared to other non-parametric approaches is that only two parameters are required to model the time series under analysis. An example of application is given, with regard to forecasting the monthly electric load demand in a Venezuelan region served by wind power generators. In this case, careful demand estimation is required since the wind generation output could be highly variable and additional conventional generation or transmission links would be required to satisfy the load demand. A comparison with other classical time-series approaches (like exponential smoothing and Auto Regressive Integrated Moving Average models) is presented.
The results show that SSA is a powerful approach to model time series, capable of identifying sub-time series (trends along with seasonal periodic components).

1. Introduction

Power system planning and operation are important tasks that all electric utilities face. One of the factors that influence their success is load forecasting, which determines the amount of generation needed to satisfy the electric load demand. In the short term, this is accomplished by scheduling enough generation and transmission capacity (in real time); in the mid and long term, by planning the allocation of new generation. In the case of intermittent renewable generation sources, e.g. wind, the operator is faced not only with the variation of the load but also with that of the generation output.

In this paper, Singular Spectrum Analysis (SSA) (Elsner and Tsonis, 1996) is applied to forecasting electric power load demand. SSA was developed independently in the USA and UK under the name SSA, and in Russia under the name Caterpillar-SSA (Beneki et al., 2009). It draws on statistics and probability theory, dynamical systems and signal processing concepts, and is considered a non-parametric or data-driven technique (Beneki et al., 2009) for modeling non-linear, non-stationary and noisy short time series (Hassani, 2007). SSA does not rely on a priori defined functions, as the Fourier approach does (sine and cosine functions), or on statistical assumptions (e.g., stationarity, normality). One of the main advantages of SSA compared to other non-parametric approaches is that only two parameters are required to model the time series under analysis. The development of the technique and its application to electric load forecasting are illustrated by way of an example related to a Venezuelan region served by wind power generators.
The remainder of the paper is organized as follows: Section 2 introduces the SSA approach; Section 3 presents the results obtained using SSA and the comparison with other classical time series techniques; finally, Section 4 presents some conclusions.

Please cite this article as: Briceno H., Rocco C., Zio E., 2013, Singular spectrum analysis for forecasting of electric load demand, Chemical Engineering Transactions, 33, 919-924, DOI: 10.3303/CET1333154


2. Singular Spectrum Analysis (SSA)

The main idea of SSA is to decompose the original time series into the sum of independent components, which represent the trend, oscillatory behavior (periodic or quasi-periodic components) and noise. Some of these components are selected and retained for the forecasting task. SSA has been used successfully in several areas, such as hydrology (Vautard et al., 1992), geophysics (Vautard and Ghil, 1989), economics (Carvalho et al., 2012) and medical engineering (Sanei et al., 2011), among others.

Let $y = [y_1, y_2, \ldots, y_T]$ be a time series of length $T$. The SSA technique consists of two stages: decomposition and reconstruction (Golyandina et al., 2001).

2.1 Decomposition

This stage is subdivided into two steps: Embedding and Singular Value Decomposition.

2.1.1 Embedding

The main result of this step is the definition of a trajectory matrix, that is, a lagged version of the original time series $y$. The matrix is associated with a window length $L$ ($L \le T/2$, to be defined by the user). Let $K = T - L + 1$; the trajectory matrix is defined as:

$$
X = [X_1, \ldots, X_K] =
\begin{pmatrix}
y_1 & y_2 & y_3 & \cdots & y_K \\
y_2 & y_3 & y_4 & \cdots & y_{K+1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
y_L & y_{L+1} & y_{L+2} & \cdots & y_T
\end{pmatrix}
\qquad (1)
$$

The trajectory matrix is a Hankel matrix, that is, all the elements along each anti-diagonal $i + j = \text{const}$ are equal (Hassani, 2007).

2.1.2 Singular Value Decomposition (SVD)

From $X$, define the covariance matrix $XX^t$. The SVD of $XX^t$ produces a set of $L$ eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_L \ge 0$ and the corresponding eigenvectors $U_1, U_2, \ldots, U_L$ (often called Empirical Orthogonal Functions (EOF)). Then the SVD of the trajectory matrix can be written as $X = E_1 + E_2 + \cdots + E_d$, where $E_i = \sqrt{\lambda_i}\, U_i V_i^t$, $d$ is the rank of $XX^t$ (that is, the number of non-zero eigenvalues) and $V_1, V_2, \ldots, V_d$ are the Principal Components (PC), defined as $V_i = X^t U_i / \sqrt{\lambda_i}$. The collection $(\sqrt{\lambda_i}, U_i, V_i)$ is referred to as the $i$-th eigentriple of the matrix $X$. If $\lambda^* = \sum_{i=1}^{d} \lambda_i$, then $\lambda_i / \lambda^*$ is the proportion of the variance of $X$ explained by $E_i$: $E_1$ has the highest contribution while $E_d$ has the lowest (Hassani et al., 2009).
SVD could be time consuming if the time series is long (say $T > 1000$).

2.2 Reconstruction

The reconstruction stage consists of two steps: Grouping and Averaging.

2.2.1 Grouping

In this step, $r$ out of the $d$ eigentriples are selected by the user. Let $I = \{i_1, i_2, \ldots, i_r\}$ be a group of $r$ selected eigentriples and $X_I = X_{i_1} + X_{i_2} + \cdots + X_{i_r}$, where $X_{i_j}$ denotes the elementary matrix of eigentriple $i_j$. $X_I$ is related to the "signal" of $y$, while the remaining $(d - r)$ eigentriples describe the error term $\varepsilon$.

2.2.2 Averaging

The group of $r$ components selected in the previous step is then used to reconstruct the deterministic components of the time series. The basic idea is to transform each of the terms $X_{i_1}, X_{i_2}, \ldots, X_{i_r}$ into reconstructed time series $y_{i_1}, y_{i_2}, \ldots, y_{i_r}$ through the Hankelization process $H(\cdot)$, or diagonal averaging (Golyandina et al., 2001): if $z_{ij}$ is an element of a generic matrix $Z$, then the $k$-th term of the reconstructed time series is obtained by averaging $z_{ij}$ over all $i, j$ such that $i + j = k + 1$. That is, $H(Z)$ is a time series of length $T$ reconstructed from the matrix $Z$. At the end of the averaging step, the reconstructed time series is an approximation of $y$:

$$
y = H(X_{i_1}) + H(X_{i_2}) + \cdots + H(X_{i_r}) + \varepsilon \qquad (2)
$$
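The decomposition and reconstruction stages described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the Rssa implementation used later in the paper: the function names (`embed`, `diagonal_average`, `ssa_components`) and the synthetic series are assumptions made for the example. It relies on the fact that the singular values of X returned by the SVD are the square roots of the eigenvalues of XX^t.

```python
import numpy as np

def embed(y, L):
    """Build the L x K trajectory (Hankel) matrix, with K = T - L + 1."""
    y = np.asarray(y, dtype=float)
    K = y.size - L + 1
    # Column i is the lagged window y[i], ..., y[i + L - 1].
    return np.column_stack([y[i:i + L] for i in range(K)])

def diagonal_average(Z):
    """Hankelization H(Z): average Z over each anti-diagonal i + j = const."""
    L, K = Z.shape
    total = np.zeros(L + K - 1)
    count = np.zeros(L + K - 1)
    for i in range(L):
        # Row i (0-based) contributes to series positions i, ..., i + K - 1.
        total[i:i + K] += Z[i, :]
        count[i:i + K] += 1
    return total / count

def ssa_components(y, L):
    """Return one reconstructed series per eigentriple and the variance shares."""
    X = embed(y, L)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # s[i] equals sqrt(lambda_i), so s**2 are the eigenvalues of X X^t.
    share = s**2 / np.sum(s**2)
    rc = np.array([diagonal_average(s[i] * np.outer(U[:, i], Vt[i]))
                   for i in range(s.size)])
    return rc, share

# Hypothetical noise-free series: linear trend plus a 12-period harmonic.
t = np.arange(48)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)

rc, share = ssa_components(y, L=24)
signal = rc[:4].sum(axis=0)   # grouping: keep the first four eigentriples
residual = y - signal         # plays the role of the error term in Eq. (2)
```

In this synthetic case the trend and the harmonic each contribute two eigentriples, so four components should already reproduce the series almost exactly; with real, noisy data the grouping has to be chosen by the user.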


As pointed out by Alexandrov and Golyandina (2005), the reconstruction of a single eigentriple is based on the whole time series. This means that SSA is not a local method and is hence robust to outliers.

2.3 Forecasting

The SSA forecasting approach uses the notion of Linear Recurrent Formulae (LRF), defined as:

$$
y_{T-i} = \sum_{k=1}^{L-1} a_k\, y_{T-i-k}, \qquad 0 \le i \le T - L, \quad a_k \neq 0.
$$

Broad classes of continuous series are governed by LRF, including harmonic, polynomial and exponential series. The coefficients $a_k$ are determined from the eigenvectors obtained in the SVD stage (see Golyandina et al., 2001 for details).

2.4 Parameter selection

As previously mentioned, only two parameters are required in SSA: the window length $L$ and the number of components $r$. A detailed description of parameter selection is presented in Golyandina et al. (2001). Values for $L$ and $r$ can be defined using information provided by the time series under study or through additional indices:

• The window length $L$: The general rule is to select $L = T/2$. However, if the user knows that the time series may include a component with integer period $s$, then better separability of the components could be obtained by defining $L$ proportional to that period ($L = (T - js)/2$, $j = 1, 2, \ldots$).

• The number of components $r$: The theory of separability, that is, how well the components can be separated, is the basis for the definition of $r$. A general criterion is based on the contribution of each component to the variance of $X$, evaluated as $\lambda_i / \lambda^*$ (with $\lambda^* = \sum_{i=1}^{d} \lambda_i$): select $r$ out of $d$ components so that the sum of their contributions reaches at least a predefined threshold, for example 90 %. In general, noise components have a low contribution.

• An index, defined as the weighted correlation or w-correlation (Hassani, 2007), could also be used to quantify the degree of separability between two reconstructed components. Let $X_A$ and $X_B$ be two reconstructed time series: if w-correlation($X_A$, $X_B$) is zero, then the two series are separable.
On the other hand, if w-correlation($X_A$, $X_B$) is high, then the components could be grouped.

Finally, pairwise scatterplots of eigenvectors are used to detect harmonic components, whose frequencies can be estimated using a periodogram. In general, high-frequency components are associated with noise. Quasi-automatic algorithms are currently available to help the user determine $r$ (e.g., AutoSSA (Alexandrov and Golyandina, 2005) or Rssa (Korobeynikov, 2012)).

3. Case study

SSA is used to decompose and forecast the monthly electric load demand in a Venezuelan region served by wind power generators. The time series (in MW) consists of 240 observations. Following the usual procedure, the time series is divided into two sets: the first set (training data set), defined by the first T=228 observations, is used to "tune" the model, while the second set (testing data set), made up of the remaining 12 observations, is used to assess the performance of the model. Since data points are recorded monthly, at least a 12-period component is assumed (s=12). Results from SSA are compared with other classical time-series approaches (like exponential smoothing and Auto Regressive Integrated Moving Average models, as in Briceño (2012)). SSA evaluations are performed using the Rssa procedure in R (Korobeynikov, 2012). For each technique, the following performance indexes related to the residuals are presented: Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE) and the Box-Pierce statistic Q.

3.1 SSA evaluation

3.1.1 Decomposition stage: Embedding and SVD

Figure 1 shows the contribution of each eigenvalue after the SVD stage, using L = (T-12)/2 = 108 as the window length for the embedding step. Note that the first eight eigenvectors account for almost 95 % of the variance of the time series.

3.1.2 Reconstruction stage: Grouping and Averaging

Figure 2 shows the plot of the reconstructed components (RC).
At first glance, RC1 and RC2 are linked to slow-moving components (i.e., the trend), while the remaining RCs are associated with oscillating components. Figure 3 shows the plots associated with pairs of eigenvectors. The geometric figures associated with eigenvector pairs 5-6 and 7-8 show the existence of frequencies related to six and twelve months. The rest of the plots do not suggest any known geometric figure. The raw periodogram of the time series under study shows two significant components at 1/12 and 2/12 = 1/6, indicating the existence of twelve-month and six-month components, as previously detected. This suggests the use of a new integer L value, proportional to both detected periods (e.g., L = 96, 84, ...).

In order to evaluate the separability of the different eigentriples, Figure 6 shows a graphical representation of the w-correlation matrix. Each cell (Fi, Fj) represents the w-correlation between components i and j, coded from black (w-correlation = 1) to white (w-correlation = 0). Note that component 1 is clearly separable and components 2-4 and 9-10 are almost separable (w-correlation values < 0.50). Components RC5-RC6 and RC7-RC8 have w-correlations equal to 1 and, as previously mentioned, could be grouped. Finally, the reconstruction is based on eigentriples 1 to 10, as shown in Figure 4, along with the forecast for the next 12 months.

3.1.3 Comparison with other approaches

Table 1 shows the comparison of residual indexes between SSA and two classical approaches during the training phase. The additive Holt-Winters method was the best among the exponential smoothing techniques, while the best SARIMA model found was a $(2,1,0)(2,0,2)_{12}$:

$$
(1 + 0.28B + 0.17B^2)(1 - 0.48B^{12} - 0.5B^{24})\, X_t = (1 - 0.26B^{12} + 0.38B^{24})\, a_t \qquad (3)
$$

The SARIMA and SSA residuals are considered random. Note that SSA performs better than the other approaches. SSA also produces additional information about the decomposition of the time series, especially its trend and harmonic components. Table 1 also shows the MSE and MAPE indexes for the testing phase. Again, SSA produces the best results.

4. Conclusions

This paper presented the use of Singular Spectrum Analysis as a technique able to decompose a non-stationary and/or non-linear time series into a set of independent components and to forecast it.
In the example presented, related to the electric load demand of a Venezuelan region, SSA performed best compared with the other classical time series approaches. SSA relies only on the definition of two parameters: the window length L and the number of components r. As a general rule, L could be fixed as L=(T-s)/2, while r could be determined automatically from the values of the w-correlation matrix. Briceño (2012) shows that the best SSA model is obtained using L=84, with performance indexes slightly better than those of the SSA model presented here, which uses L=108.

Figure 1: Eigenvector decomposition
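As a worked check of the window-length rule L=(T-js)/2 from Section 2.4 for the case study (T=228, s=12), the following sketch lists the candidate values of L for the first few j. The helper name `candidate_window_lengths` and the extra filter that keeps only values that are whole multiples of s are assumptions made here, one possible reading of "proportional to that period"; they are not taken from the paper.

```python
def candidate_window_lengths(T, s, max_j=6):
    """List L = (T - j*s) / 2 for j = 1..max_j, keeping integer multiples of s.

    The multiple-of-s filter is an assumption of this sketch.
    """
    candidates = []
    for j in range(1, max_j + 1):
        numerator = T - j * s
        if numerator <= 0 or numerator % 2 != 0:
            continue  # L must be a positive integer
        L = numerator // 2
        if L % s == 0:
            candidates.append(L)
    return candidates

# Case-study values: 228 training observations, 12-month period.
candidates = candidate_window_lengths(T=228, s=12)
```

Under these assumptions the candidates are 108, 96 and 84 (for j = 1, 3 and 5), which match the window lengths mentioned in the case study.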
Figure 5: Reconstruction using eigentriples 1 to 10 (horizontal axis: year)

Table 1: Residual indexes comparison

| Method | Training MSE | Training MAPE (%) | Training Theil | Training Box-Pierce | Training Randomness | Testing MSE | Testing MAPE (%) |
|---|---|---|---|---|---|---|---|
| Additive Holt-Winters | 5.5 | 1.25 | 0.80 | 100.4 | No | 24.9 | 2.68 |
| ARIMA (2,1,0)(2,0,2)12 | 6.0 | 1.15 | 0.45 | 24.3 | Yes | 12.6 | 1.86 |
| SSA | 3.6 | 1.06 | 0.26 | 23.1 | Yes | 7.0 | 1.26 |

References

Alexandrov Th., Golyandina N., 2005, Automatic extraction and forecast of time series cyclic components within the framework of SSA, Proceedings of the Fifth Workshop on Simulation, June 26-July 2, 2005, St. Petersburg State University, St. Petersburg, 45-50.

Beneki C., Eeckels B., Leon C., 2009, Signal Extraction and Forecasting of the UK Tourism Income Time Series. A Singular Spectrum Analysis Approach,
