Wavelet-Based Elman Neural Network with the Modified Differential Evolution Algorithm for Forecasting Foreign Exchange Rates

Renquan HUANG; Jing TIAN

doi:10.21078/JSSI-2021-421-19

PDF(769 KB)

Journal of Systems Science and Information ›› 2021, Vol. 9 ›› Issue (4) : 421-439. DOI: 10.21078/JSSI-2021-421-19

Wavelet-Based Elman Neural Network with the Modified Differential Evolution Algorithm for Forecasting Foreign Exchange Rates

Author information +

History +

Abstract

It is challenging to forecast foreign exchange rates due to the non-linear characters of the data. This paper applied a wavelet-based Elman neural network with the modified differential evolution algorithm to forecast foreign exchange rates. Elman neural network has dynamic characters because of the context layer in the structure. It makes Elman neural network suit for time series problems. The main factors, which affect the accuracy of the Elman neural network, included the transfer functions of the hidden layer and the parameters of the neural network. We applied the wavelet function to replace the sigmoid function in the hidden layer of the Elman neural network, and we found there was a "disruption problem" caused by the non-linear performance of the wavelet function. It didn't improve the performance of the Elman neural network, but made it get worse in reverse. Then, the modified differential evolution algorithm was applied to train the parameters of the Elman neural network. To improve the optimizing performance of the differential evolution algorithm, the crossover probability and crossover factor were modified with adaptive strategies, and the local enhanced operator was added to the algorithm. According to the experiment, the modified algorithm improved the performance of the Elman neural network, and it solved the "disruption problem" of applying the wavelet function. These results show that the performance of the Elman neural network would be improved if both of the wavelet function and the modified differential evolution algorithm were applied integratedly.

Key words

dividend bonus / normal distribution function / differentiated taxation

Cite this article

EndNote

Ris (Procite)

Bibtex

Download Citations

Renquan HUANG , Jing TIAN. Wavelet-Based Elman Neural Network with the Modified Differential Evolution Algorithm for Forecasting Foreign Exchange Rates. Journal of Systems Science and Information, 2021, 9(4): 421-439 https://doi.org/10.21078/JSSI-2021-421-19

1 Introduction

There are non-linear and volatile characters of the foreign exchange rate data, and it is not efficient to forecast by current statistical models^[1]. Financial time series forecasting is one of the most challenging problems, and artificial neural networks (ANNs) are used to solve this problem. The performance of the time series forecasting model, however, is limited by its low accuracy for forecasting longer periods. In this paper, the wavelet-based Elman neural network, which was applied with the modified differential evolution algorithm integratedly, is proposed to forecast foreign exchange rates.

ANNs performs better than the traditional statistical models such as ARIMA in forecasting the foreign exchange rate. Kadilar, et al.^[2] explored both the ARIMA model and neural networks for the Turkish TL/US dollar exchange rate series. Results show that the ANN method has far better accuracy compared to the ARIMA time series model. Naeini, et al.^[3] compared the feedforward multilayer perceptron (MLP) with a modified Elman neural network (ENN) model to predict a company stock value. The results show that the modified ENN has lower MSE, MAPE, and MAE values in comparison with MLP. ENN has dynamic characteristics, which are provided by internal connections. To the structure of ENN, it does not require the state as an input or training signal. This makes it superior to static feedforward networks, and can be widely used in a dynamic system^[4-6]. The performance of ENN is greatly affected by parameters. PSO, GA, and other optimal algorithms are applied to train the parameters of ENN^[7-12]. According to the researches of Ciarlini^[13], Lei^[14], Ong^[15], et al., the transfer function of the hidden layer is another important factor that affects the forecasting accuracy of ENN. Zhang and Benveniste^[16] applied the wavelet function to the hidden layer of the BP neural network and proposed the wavelet neural network (WNN). The performance of WNN improved a lot by doing this. Zhang^[17] studied the frames of wavelet.

In this work, we firstly apply wavelet function to the hidden layer of ENN, and we expect the performance of it would improve as WNN. Lu, et al.^[18] proposed a recurrent wavelet-based Elman neural network, which was used to control hybrid offshore wind and wave power generation systems, and derived optimal learning rates to adjust the network parameters. Then, we use an improved differential evolution (DE) algorithm to train the parameters of ENN. DE, first reported by Storn and Price^[19], is easy to implement, has few parameters to control, and can improve the performance through just changing parameters without imposing computation burden^[20-24]. The DE family of algorithms has been frequently adopted to tackle multi-objective, constrained, dynamic, large-scale, and multimodal optimization problems^[25-29]. The simulation was made to find how did the wavelet function and the modified DE algorithm improve the performance of ENN.

2 Elman Neural Network with Wavelet Transfer Function

2.1 Elman Neural Network

ENN originally developed by Elman in 1990 based on the Jordan network^[30]. The structure of ENN contains four parts^{[31, 32]}: The input layer, hidden layer, context layer and output layer, which is illustrated in Figure 1. Considering the structure of ENN, it was a modified neural network based on the BP neural network. However, unlike the BP neural network, the context layer is used to store the previous information of the hidden layer and feedback it to the next moment of the hidden layer. In this way, the context layer improves the sensitivity to historical data and makes ENN have a dynamic memory function.

Figure 1 The network structure of ENN

Full size|PPT slide

2.2 Replacing the Sigmoid Function with the Wavelet Function

The WNN was proposed by Zhang^[16] in 1992. WNN is modified from the BP neural network by replacing the sigmoid function with wavelet function in the hidden layer. In the hidden layer of BP neural network, the sigmoid function is usually used as the transfer function. The wavelet function has a better non-linear performance than the sigmoid function, and the performance of WNN improves a lot. Inspired by WNN, we proposed the wavelet-based Elman neural network (WENN), which replaced the sigmoid function of ENN with wavelet function as the transfer function in the hidden layer. We expected to take advantage of the wavelet function and improve the performance of ENN as WNN.

2.3 Wavelet-Based Elman Neural Network

2.3.1 Input Layer

In the input layer, the input vector is of the

p

th sample defined as

(x_{1}^{p}, x_{2}^{p}, \dots, x_{m}^{p})

and the pureline function is the transfer function of the nodes. So, the output vector of the input layer is equal to the input vector:

\begin{aligned} (y_{1}^{(1) p}, y_{2}^{(1) p}, \dots, y_{m}^{(1) p}) = (x_{1}^{p}, x_{2}^{p}, \dots, x_{m}^{p}) . \end{aligned}

(1)

2.3.2 Hidden Layer

The input of the node in the hidden layer is from two parts: One is the output of the input layer, and the other is the context layer. The

k

th node input can be represented by:

\begin{aligned} x_{k}^{(2) p} = \sum_{m = 1}^{M} w_{m k} \cdot x_{m}^{p} + \sum_{h = 1}^{H} r_{h k} \cdot y_{h}^{(3) p}, \end{aligned}

(2)

where,

M

is the number of nodes in the input layer;

K

is the number of nodes in the hidden layer;

H (H = K)

is the number of nodes in the context layer;

w_{m k}

is the connection weight of the input layer to the hidden layer;

r_{h k}

is the connection weight of the context layer to the hidden layer;

y_{h}^{(3) p} = φ_{h}^{'}

is the output of the

h

th node to the context layer, which is the previous value of the hidden layer.

In WENN, the transfer function of the hidden layer is a wavelet function. There are many types of the wavelet function, and Morlet function is chosen in this paper^[17]:

\begin{aligned} ψ (x) = \cos (1.75 x) \cdot \exp (- x^{2} / 2) . \end{aligned}

(3)

The output of the node in the hidden layer is given by

y_{k}^{(2) p} = ψ (\frac{\sum_{m = 1}^{M} w_{m k} \cdot x_{m}^{p} + \sum_{h = 1}^{H} r_{h k} \cdot φ_{h}^{'} - b_{k}}{a_{k}}),

(4)

where

a_{k}

is the dilation coefficient and

b_{k}

is the translation coefficient to the node of the hidden layer.

2.3.3 Context Layer

In the context layer, the input and the output nodes are represented by

x_{h}^{(3) p} = y_{h}^{(3) p} = φ_{h}^{'} = y_{k}^{(2) p} (t - 1) .

(5)

2.3.4 Output Layer

In the output layer, the transfer function is pureline function. So, the input and output nodes are represented by

x_{n}^{(4) p} = y_{n}^{p} = \sum_{k = 1}^{K} v_{k n} \cdot y_{k}^{(2) p} = \sum_{k = 1}^{K} v_{k n} \cdot ψ (\frac{\sum_{m = 1}^{M} w_{m k} \cdot x_{m}^{p} + \sum_{h = 1}^{H} r_{h k} \cdot φ_{h}^{'} - b_{k}}{a_{k}}) .

(6)

2.4 Learning Algorithm of WENN

To create a WENN, the node number of the input layer

M

, hidden layer

K

, and output layer

N

should be defined.

M

and

N

are determined by the researching problems.

K = \sqrt{M + N} + c

^{[33, 34]},

c

is an integer between 1 and 10.

Once the WENN is initialized, supervised learning is used to adjust the parameters of the system. The gradient descent with momentum (GDM) algorithm is in common use to adjust the parameters of the network. To describe the parameter learning algorithm, the energy function is expressed as

E = \frac{1}{2} \sum_{p = 1}^{P} \sum_{n = 1}^{N} {(y_{n}^{p} - T_{n}^{p})}^{2},

(7)

where

T_{n}^{p}

is the

n

th expected value of the

p

th sample.

The main steps of GDM can be described as follows:

Step 1 Calculating the energy function. In the GDM algorithm, the recursive application of the chain rule is used to achieve backpropagation. Error is calculated according to (1)

\sim

(7).

Step 2 Adjust the learning rate. The learning rate

η

can be adjusted as follows. If

E (t) < E (t - 1)

, it seems the training process is moving towards optimization. In this term, the learning rate

η

should be increased

η (t) = α η (t - 1)

α > 1

. If

E (t) > 1.04 E (t - 1)

, it seems the training process is getting bad. The learning rate

η

should be decreased

η (t) = β η (t - 1)

β < 1

Step 3 Adjust the parameters of the network. We defined:

\begin{array}{rcl} {n e t}_{k}^{p} = \sum_{m = 1}^{M} w_{m k} x_{m}^{p} + \sum_{h = 1}^{H} r_{h k} φ_{h}^{'}, \end{array}

(8)

\begin{array}{rcl} ψ_{a, b} ({n e t}_{k}^{p}) = ψ (\frac{{n e t}_{k}^{p} - b_{k}}{a_{k}}) . \end{array}

(9)

In the output layer, the updated law for

v_{k n}

δ_{v_{k n}} = - \sum_{p = 1}^{P} \sum_{n = 1}^{N} (T_{n}^{p} - y_{n}^{p}) ψ_{a, b} ({n e t}_{k}^{p}) .

(10)

The weight

v_{k n}

is updated according to the equation:

v_{k n} (t + 1) = v_{k n} (t) - η (t) δ_{v_{n k}} + m_{c} Δ v_{n k} (t) .

(11)

In the hidden layer, the updated law for

w_{m k}

δ_{w_{m k}} = - \sum_{p = 1}^{P} \sum_{n = 1}^{N} {(T_{n}^{p} - y_{n}^{p})}_{k n} ψ_{a, b} ({n e t}_{k}^{p}) x_{m}^{p} / a_{k} .

(12)

The weight

w_{m k}

is updated according to the equation:

w_{m k} (t + 1) = w_{m k} (t) - η (t) δ_{w_{m k}} + m_{c} Δ w_{m k} (t) .

(13)

In the context layer, the updated law for

r_{h k}

δ_{r_{h k}} = - \sum_{p = 1}^{P} \sum_{n = 1}^{N} (T_{n}^{p} - y_{n}^{p}) v_{k n} ψ_{a, b} ({n e t}_{k}^{p}) φ_{h}^{'} / a_{k} .

(14)

The weight

r_{h k}

is updated according to the equation:

r_{h k} (t + 1) = r_{h k} (t) - η (t) δ_{r_{h k}} + m_{c} Δ r_{h k} (t) .

(15)

The update amounts for the translation

b_{k}

and dilation

a_{k}

are given by

\begin{array}{rcl} δ_{b_{k}} = - \sum_{p = 1}^{P} \sum_{n = 1}^{N} {(T_{n}^{p} - y_{n}^{p})}_{k n} ψ_{a, b} ({n e t}_{k}^{p}) / a_{k}, \end{array}

(16)

\begin{array}{rcl} δ_{a_{k}} = - \sum_{p = 1}^{P} \sum_{n = 1}^{N} {(T_{n}^{p} - y_{n}^{p})}_{k n} ψ_{a, b} ({n e t}_{k}^{p}) (\frac{{n e t}_{k}^{p} - b_{k}}{a_{k}}) / a_{k} . \end{array}

(17)

The translation and dilation are updated as follows:

\begin{array}{rcl} b_{k} (t + 1) = b_{k} (t) - η (t) δ_{b_{k}} + m_{c} Δ b_{k} (t), \end{array}

(18)

\begin{array}{rcl} a_{k} (t + 1) = a_{k} (t) - η (t) δ_{a_{k}} + m_{c} Δ a_{k} (t) . \end{array}

(19)

Step 4 Repeat Step 1 to Step 3 continuously, and output the result when the termination condition is met.

3 The Modified Differential Evolution Algorithm

Differential evolution (DE) converges fast in the optimization problems. To improve the optimizing performance of DE, the crossover probability and crossover factor are modified with adaptive strategies, and the local enhanced operator is added to the algorithm. With those improving strategies, the new differential evolution algorithm is called ADLEDE for short.

3.1 Differential Evolution algorithm

Differential evolution algorithm, which inherits the idea of survival of the fittest, is a kind of evolution algorithm^[35]. For each individual in the population, 3 points are randomly selected from the population. One point is taken as the basis, and the other 2 points are taken as the reference to make a disturbance. New points are generated after crossing, and the better one is retained by natural selection to achieve population evolution. Suppose the problem to be optimized is

min (f (x))

, the main steps of the algorithm are as follows:

Step 1 Initialization. Set the population size

N

, the number of variables

m

, cross probability

P_{c}

, and cross factor

P_{m}

. When the evolutional generation

t = 0

, initialize the low/up bound of the vector lb/ub and the initial population vector

\vec{X} (0) = {{\vec{X}}_{1} (0), {\vec{X}}_{2} (0), \dots, {\vec{X}}_{N} (0)}

, where

{\vec{X}}_{i} (0) = {x_{1}^{i} (0), x_{2}^{i} (0), \dots, x_{m}^{i} (0)}

Step 2 Evaluation. Calculate the fitness value

f ({\vec{X}}_{i} (t))

for each individual

{\vec{X}}_{i} (t)

Step 3 Mutation. For individual vector

{\vec{X}}_{i} (t)

in the population, three indices

r_{1}

r_{2}

r_{3} \in {1, 2, \dots, N}

and an integer

j_{r} \in {1, 2, \dots, m}

are randomly chosen.

x_{j}^{(i)^{'}} (t) = {\begin{cases} x_{j}^{(r_{1})} (t) + P_{m} \cdot (x_{j}^{(r_{2})} (t) - x_{j}^{(r_{3})} (t)), & if (rand < P_{c} or j = j_{r}), \\ x_{j}^{(i)} (t), & otherwise . \end{cases}

(20)

Step 4 Selection.

{\bar{X}}_{i} (t + 1) = {\begin{cases} {\bar{X}}_{i}^{'} (t), & if (f ({\bar{X}}_{i}^{'} (t)) < f ({\bar{X}}_{i} (t))), \\ {\bar{X}}_{i} (t), & otherwise . \end{cases}

(21)

Step 5 Terminal condition. If the individual vector

{\vec{X}}_{i} (t + 1)

satisfies the termination condition, then

{\vec{X}}_{i} (t + 1)

is the optimal solution, otherwise, turn to Step 2.

3.2 Self-Adaptive Strategies to $P_{c}$ and $P_{m}$

The crossover probability

P_{c}

and the crossover factor

P_{m}

are constant values in DE. When the optimization problems are complex, the optimizing efficiency is not efficient enough^[36]. In the adaptive improvement, the crossover probability

P_{c}

and the crossover factor

P_{m}

are adapted according to the individual fitness values. When the population tends to fall into the local optimal solution, it increases the

P_{c}

and

P_{m}

value accordingly. When the population tends to diverge, it reduces the

P_{c}

and

P_{m}

value. For individuals, whose fitness values are higher than the average fitness of the population, are corresponding to the lower

P_{c}

and

P_{m}

value, and the solutions are protected to enter the next generation. For individuals, whose fitness values are lower than the average fitness of the population, are corresponding to the higher

P_{c}

and

P_{m}

value, and the solutions are eliminated. The individual crossover probability

P_{c}

and crossover factor

P_{m}

are adapted according to

\begin{array}{rcl} P_{c} = {\begin{cases} P_{c 1} - \frac{(P_{c 1} - P_{c 2}) \cdot (f^{'} - f_{a v g})}{f_{max} - f_{a v g}} & f^{'} \geq f_{a v g}, \\ P_{c 1} & f^{'} < f_{a v g}, \end{cases} \end{array}

(22)

\begin{array}{rcl} P_{m} = {\begin{cases} P_{m 1} - \frac{(P_{m 1} - P_{m 2}) \cdot (f_{max} - f)}{f_{max} - f_{a v g}}, & f \geq f_{a v g}, \\ P_{m 1}, & f < f_{a v g}, \end{cases} \end{array}

(23)

where

P_{c 1}

is the higher crossover probability 0.7

\sim

0.9;

P_{c 2}

is the lower crossover probability 0.4

\sim

0.6;

P_{m 1}

is the higher crossover factor 0.08

\sim

0.1;

P_{m 2}

is the lower crossover factor 0.01

\sim

0.05;

f_{max}

is the maximum fitness value in the population;

f_{a v g}

is the average fitness value in the population;

f^{'}

is the higher fitness value of

{\vec{X}}_{r_{2}} (t)

and

{\vec{X}}_{r_{3}} (t)

;

f

is the fitness value of

{\vec{X}}_{r_{1}} (t)

According to (22) and (23),

P_{c}

and

P_{m}

can be adaptively adjusted, which improves the optimization performance of the algorithm.

3.3 Local Enhancement Strategy

Because DE generates a new intermediate individual through random deviation perturbation, the local search ability of it is weak. While approximating the optimal solution, it still needs to iterates several generations to get the optimal value, which affects the convergence speed of the algorithm^[37]. Therefore, the local enhancement operator

L E (0 < L E < 1)

is introduced to DE. After obtaining a new population, some individuals in the new population (excluding the current optimal individual) are reassigned with probability

L E

. In this way, the individuals are distributed by the optimal individual of the current population.

L E

could enhance the greediness of the individuals and speed up the convergence of DE.

{\vec{X}}_{i, t + 1} = {\vec{X}}_{best, t + 1} + P_{l} \cdot ({\vec{X}}_{r_{3}, t + 1} - {\vec{X}}_{r_{4}, t + 1}),

(24)

where

{\vec{X}}_{i, t + 1}

is the enhanced new individual;

{\vec{X}}_{r_{3}, t + 1}

and

{\vec{X}}_{r_{4}, t + 1}

are the original individuals;

{\vec{X}}_{b e s t, t + 1}

is the best individual of the current population;

P_{l}

is the perturbation factor. The indices

r_{3}

and

r_{4}

are mutually exclusive integers, which meet

r_{3} \neq r_{4} \neq i

It is the essence of local enhancement to DE, which is to make some individuals seek the solution by the optimal vector of the current population. While keeping the diversity of the population, the greed of the good individuals is increased to ensure that the algorithm finds the global optimal solution quickly. The local searching ability of the algorithm is improved by the perturbation factor

P_{l}

, which accelerates the convergence speed, especially when approximating the global optimal solution.

After the adaptive and local enhanced improvement of DE, the flow chart of AD

L E

DE is shown in Figure 2.

Figure 2 Flow chat of ADLEDE-WENN

Full size|PPT slide

4 ADLEDE-WENN and the Comparative Forecasting Models

4.1 ADLEDE-WENN Forecasting Model

In ENN, gradient descent algorithm is usually chosen as the parameter learning algorithm, including the gradient descent (GD), the GDM, the Levenberg-Marquardt (LM) etc. But there is a vital defect of those algorithms, it is easy to fall into local optimum. Therefore, it is essential to find a new parameter adjustment method of the ENN. In this paper, we applied the AD

L E

DE algorithm to adjust the parameters of ENN. Based on the analysis mentioned above, the main steps of AD

L E

DE-WENN are described in Figure 2.

4.2 Comparative Models

To evaluate the effectiveness of the AD

L E

DE-WENN model in forecasting the closing price of the foreign exchange rates, other comparative models including ENN, WENN and AD

L E

DE-ENN are selected. ENN (described in Subsection 2.1) is the basic model, and other models are modified based on it. WENN was replaced the sigmoid function with wavelet function in the hidden layer based on ENN, and it was introduced in Subsection 2.3. AD

L E

DE-WENN refers to AD

L E

DE-ENN, which was researched in Subsection 4.1. To the transfer function of the hidden layer, the sigmoid function was used in AD

L E

DE-ENN, and the wavelet function was used in AD

L E

DE-WENN.

First, by comparing WENN with ENN, the effect of wavelet function to the Elman neural network could be studied. Second, by comparing AD

L E

DE-ENN with ENN, the feasibility and effectiveness of the AD

L E

DE algorithm could be evaluated in adjusting the parameters of the Elman neural network. Third, by comparing AD

L E

DE-ENN with AD

L E

DE-WENN, the effect of wavelet function could be researched on the use of AD

L E

DE to the Elman neural network. At last, in order to evaluate the comprehensive effects of the AD

L E

DE algorithm and the wavelet function to the Elman neural network, AD

L E

DE-WENN and ENN could be selected.

4.3 Performance Measure

In this part, we comparatively evaluate the prediction effects of wavelet function and AD

L E

DE algorithm applied to ENN. For further analyzing the forecasting performance of ENN, WENN, AD

L E

DE-ENN and AD

L E

DE-WENN, we choose several measures to value error and trend performance, including mean square error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and error limit proportion (ELP). Those are all the error-type measures of the deviation between predicted values and actual data, and those indexes reflect the prediction of global error. The corresponding definitions are given as follows:

Mean square error (MSE):

M S E = \frac{1}{N} \sum_{i = 1}^{N} {({\hat{y}}_{i} - y_{i})}^{2} .

(25)

Mean absolute error (MAE):

M A E = \frac{1}{N} \sum_{i = 1}^{N} | {\hat{y}}_{i} - y_{i} | .

(26)

Mean absolute percentage error (MAPE):

M A P E = \frac{100 %}{N} \sum_{i = 1}^{N} | \frac{{\hat{y}}_{i} - y_{i}}{y_{i}} | .

(27)

Error limit proportion (ELP):

\begin{array}{rcl} N_{i}^{α} = {\begin{matrix} 1, if (| \frac{{\hat{y}}_{i} - y_{i}}{y_{i}} | \times 100 % \leq α %), \\ 0, otherwise . \end{matrix} \end{array}

(28)

\begin{array}{rcl} E L P - α % = \frac{100 %}{N} \sum_{i = 1}^{N} N_{i}^{α}, \end{array}

(29)

where

{\hat{y}}_{i}

is the predict value and

y_{i}

is the real value;

N

denotes the number of the evaluated data;

α %

is the limited level. MSE, MAE and MAPE are the negative indexes. The smaller MSE, MAE and MAPE value show the less deviation of the forecasting results from the actual values. ELP is the positive index. The value of ELP is higher, and the accuracy is more precise.

5 Experiment Method

5.1 The Foreign Exchange Rate Data

In order to study the validity of the models, we selected 4 kinds of foreign exchange rate closing price. They were EURUSD, USDCNH, GBPUSD and GBPCNY, and the data was from Wind. The closing prices of EURUSD, USDCNH, GBPUSD were from the International Foreign Exchange Market (IFEM), and the closing price of GBPCNY was from the China Foreign Exchange Trade System (CFETS). The information of the foreign exchange rate was shown in Table 1.

Table 1 Foreign exchange information

Foreign exchange	Data	Beginning	End
EURUSD	IFEM	2019.3.28	2020.2.26
GBPCNY	CFETS	2019.2.28	2020.2.26
GBPUSD	IFEM	2019.3.28	2020.2.26
USDCNH	IFEM	2019.3.28	2020.2.26

Note: Data from Wind.

For each foreign exchange rate, 240 days of closing price were chosen. The data set was divided into the training and testing data set. The training data set was composed of 170 days in front, and it was accounting for 71% of the total data. The testing data set was composed of the rest 70 days accounting for 29% of the total data.

5.2 Normalization Preprocessing

The observed foreign exchange rate closing price is the non-normal data. Before we use the ANNs to predict the price, the price data should be normalized. In the data set, the minimum and the maximum values are used to normalize the data:

x_{k}^{'} = \frac{x_{k} - x_{min}}{x_{max} - x_{min}} .

(30)

After the forecasting, anti-normalization need be used to obtain the true value by the formula:

x_{k} = x_{k}^{'} \times (x_{max} - x_{min}) + x_{min},

(31)

where

x_{k}

is the observed (anti-normalized) closing price;

x_{min}

and

x_{max}

are the minimum and the maximum prices of the data;

x_{k}^{'}

is the normalized (predicted) data. In this paper, MSE is also applied to measure the performance of the ANNs models. We defined MSE and MSE* for different use. MSE was used to mark the result, in which the data was normalized. MSE* was used to mark the result, in which the data was anti-normalized.

5.3 Experimental Tools and Configuration of System

In this paper, Matlab was selected to implement those 4 models. To the Elman neural network, the neural network toolbox of MATLAB R2016a was used in the experiment, and the main parameters described in Table 2. While the AD

L E

DE algorithm was used in the experiment, the parameters of AD

L E

DE were set in Table 3.

Table 2 Parameters of Elman neural network

Parameters	Description	Value
M	Number of the input layer nodes	5
N	Number of the output layer nodes	1
K	Number of the hidden layer nodes	10/12
H	Number of the context layer nodes	10/12
transferFcn	Transfer function of the hidden layer	tansig/Morlet
trainFcn	Training function	traingdm
epochs	Maximum training times	15, 000/1, 000*

Note: *1, 000 was set by using ADLEDE to the ANNs. Otherwise, it was set 15, 000.

Table 3 Parameters of ADLEDE

Parameters	Description	Value
nofv	Number of the variables	171/229*
lb	low bound of the variables	-1
ub	up bound of the variables	1
popsize	population size	100
pl	the perturbation factor	0.5
LE	the local enhancement operator	0.01
$p_{c 1}$	the higher crossover probability	0.8
$p_{c 2}$	the lower crossover probability	0.5
$p_{m 1}$	the higher crossover factor	0.09
$p_{m 2}$	the lower crossover factor	0.04

Note: * When the hidden layer nodes were 10 (12), the variables were 171 (229).

6 Results and Discussion

The data of the foreign exchange rate was described in Subsection 5.1, including EURUSD, GBPCNY, GBPUSD, and USDCNH. To study the performances of the models, which were ENN, WENN, AD

L E

DE-ENN and AD

L E

DE-WENN, all the 4 models were researched in the experiments to each group of the foreign exchange rate. There was a fatal defect to the ANNs——The randomness, which meant we couldn't get the same neural network exactly. The reason for the randomness was that: When the neural network was trained, the parameters to each of the neural network

(w_{m k}, r_{h k}, v_{n k}, a_{k}, b_{k},

etc) were different. To reduce the impact of the randomness, each model was simulated 20 times in the experiment, and the result was the average value of all the 20 times. The results were shown in Table 4. Not all the figures of the experiments were shown in this paper, and part figures of EURUSD were shown in Figures 3~6.

Table 4 The results of the experiments

Measurement	Models	MSE*		MAE		MAPE(%)		ELP-0.5%(%)
Measurement	Models	Training	Testing	Training	Testing	Training	Testing	Training	Testing	D-value*
EURUSD	ENN	1.1783 $\times 10^{- 5}$	1.4734e-5	2.6632e-3	2.9016e-3	0.2401	0.2583	89.3529	85.4615	-3.8914
	WENN	2.5488e-5	5.1806e-5	3.8458e-3	5.0302e-3	0.3470	0.4466	76.8235	68.9231	-7.9004
	AD $L E$ DE-ENN	9.5891e-6	1.2389e-5	2.3394e-3	2.6752e-3	0.2109	0.2381	91.4706	87.3846	-4.0860
	AD $L E$ DE-WENN	9.3268e-6	1.0823e-5	2.2803e-3	2.5051e-3	0.2055	0.2230	92.2941	89.2308	-3.0633
GBPCNY	ENN	2.6356e-3	2.1774e-3	4.0411e-2	3.5434e-2	0.4553	0.4023	62.6765	70.4615	7.7850
	WENN	7.4443e-3	4.0192e-3	6.3406e-2	4.9077e-2	0.7145	0.5576	47.4706	57.2308	9.7602
	AD $L E$ DE-ENN	1.9361e-3	1.9861e-3	3.4803e-2	3.1917e-2	0.3917	0.3623	67.8235	74.1538	6.3303
	AD $L E$ DE-WENN	1.9274e-3	1.9554e-3	3.4624e-2	3.1572e-2	0.3896	0.3584	68.1765	76.3077	8.1312
	AD $L E$ DE-WENN*	7.8462e-4	8.7937e-4	5.5716e-3	6.1457e-3	0.2486	0.2879	73.0534	81.4625	8.4091
EURUSD	ENN	5.2576e-5	2.5823e-5	5.562e-3	4.0969e-3	0.4385	0.3186	66.2647	80.0015	13.7368
	WENN	1.0515e-4	8.5562e-5	7.8274e-3	7.0712e-3	0.6177	0.5510	51.2941	57.8462	6.5521
	AD $L E$ DE-ENN	4.3148e-5	2.0371e-5	5.0465e-3	3.5886e-3	0.3980	0.2790	72.2353	85.3846	13.1493
	AD $L E$ DE-WENN	4.1936e-5	1.9777e-5	5.0064e-3	3.5425e-3	0.3949	0.2753	72.4706	85.8462	13.3756
EURUSD	ENN	6.2212e-4	4.9941e-3	1.8443e-2	4.9553e-2	0.2627	0.7338	87.2647	56.0491	-31.2156
	WENN	1.4898e-3	3.0409e-2	2.9172e-2	11.0220e-2	0.4158	1.6330	69.6518	41.5385	-28.1133
	AD $L E$ DE-ENN	4.8748e-4	4.8459e-3	1.5751e-2	1.9072e-2	0.2242	0.2810	91.6471	61.8462	-29.8009
	AD $L E$ DE-WENN	4.8557e-4	6.3973e-4	1.5792e-2	4.4246e-2	0.2248	0.6555	91.8824	84.1372	-7.7452

Note: D-value* = Testing value − Training value;
ADLEDE-WENN* marks the structure of the neural network was changed. The numbers of the hidden and context layer nodes increased from 10 to 12.

6.1 The Impact of Wavelet Function to ENN

If the wavelet (Morlet) function was used in ENN, the efficiency of the traditional learning method (GDM in this paper) was decreased a lot. In the experiments of ENN and WENN, the training function was set as the "traingdm" which referred to the GDM. According to the result in Table 4, all the negative indexes (MSE*, MAE and MAPE) values of WENN were higher and the positive index (ELP-0.5%) was lower than the results of ENN to the same foreign exchange rate. The MSE of ENN and WENN was described in Figure 3(c) and Figure 4(c). MSE decreased smoothly in Figure 3(c), and it got the minimum value 2.7284

\times 10^{- 3}

at the maximum time 15, 000. However, MSE was fluctuating in Figure 4(c), and it got the minimum value 5.6847

\times 10^{- 3}

at 8, 669 times. Therefore, we could conclude that the wavelet function in the WENN disrupted the convergence of the traditional learning algorithm. The "disruption problem" was caused by the non-linear performance of the wavelet function. Therefore, in the experiments of WENN, it didn't improve the performance of ENN. However, the results were getting worse in reverse.

To evaluate the effect of a wavelet function, the AD

L E

DE algorithm was applied to both of ENN (AD

L E

DE-ENN) and WENN (AD

L E

DE-WENN). The performance of AD

L E

DE was described in Subsection 6.2. According to Table 4, the negative indexes value (MSE*, MAE and MAPE) of AD

L E

DE-ENN was higher and the positive index (ELP) was lower than the value of AD

L E

DE-WENN to the same foreign exchange rate. Comparing the experiments of AD

L E

DE-ENN and AD

L E

DE-WENN, the only difference was the transfer function in the hidden layer. The transfer function of AD

L E

DE-WENN was the wavelet (Morlet) function, and the sigmoid function was for AD

L E

DE-ENN. So, we could conclude that the differences in the result of the experiment were caused by the wavelet function. According to the testing result of EURUSD, the MSE, MAE and MAPE values of AD

L E

DE-WENN decreased by 12.64%, 6.36% and 6.34%, and the value of ELP-0.5% increased by 1.85%.

6.2 The Performance of the AD $L E$ DE Algorithm

Based on the analysis of Subsection 6.1, it was necessary to adopt a new parameter training method in the WENN, which would give full play to the non-linear performance of wavelet function. The AD

L E

DE algorithm, which had the characters of fast convergence and global optimization capability, was applied to train the parameters of the neural network in the experiments.

The performance of the AD

L E

DE algorithm was researched by comparing ENN and AD

L E

DE-ENN. According to Table 4, the negative indexes value (MSE*, MAE and MAPE) of ENN was higher and the positive index (ELP-0.5%) was lower than the value of AD

L E

DE-ENN to the same foreign exchange rate. Comparing the experiments of ENN and AD

L E

DE-ENN, the difference was that the AD

L E

DE algorithm was used to train the parameters of the neural network in AD

L E

DE-ENN. According to the testing result of EURUSD, the MSE*, MAE and MAPE values of AD

L E

DE-ENN decreased by 15.92%, 7.80% and 7.82%, and the value of ELP-0.5% increased by 1.92%.

The performance of the AD

L E

DE algorithm was also researched by comparing AD

L E

DE-ENN and AD

L E

DE-WENN. In the training process of AD

L E

DE, the fitness values (MSE) of AD

L E

DE-ENN and AD

L E

DE-WENN were shown in Figure 7. In the experiment of AD

L E

DE-ENN, the AD

L E

DE algorithm terminated at 12 generations and got the optimal result 2.523

\times 10^{- 3}

. In the experiment of AD

L E

DE-WENN, the AD

L E

DE algorithm terminated at 11 generations and got the optimal result 2.507

\times 10^{- 3}

. The parameters were trained by the AD

L E

DE algorithm at first. Then, the parameters were sent to the neural network and trained by the traditional algorithm (GDM). According to Figure 5(c) and Figure 6(c), the MSE of AD

L E

DE-ENN was 2.5226

\times 10^{- 3}

, and the MSE* of AD

L E

DE-WENN was 2.5046

\times 10^{- 3}

. After 1, 000 times trained by the traditional learning algorithm, the MSE of both neural networks had limited improvement. Therefore, we concluded that it was effective to apply the AD

L E

DE algorithm to train the parameters of the neural network. After the training process of the AD

L E

DE algorithm, it almost got the global optimal solution. The traditional algorithm of the Elman neural network had limited performance to the forecasting result. AD

L E

DE algorithm could be applied to train the parameters of the neural network. In our experiments, it not only solved the "disruption problem" caused by the wavelet function, but also could take advantage of the non-linear characters belonging to wavelet function. By applying the AD

L E

DE algorithm to WENN, the forecasting performance of the neural network improved a lot.

**Figure 7 The fitness (MSE*) of ADLEDE**

Full size|PPT slide

According to the testing results of EURUSD, GBPCNY, GBPUSD and USDCNH, all the indexes show that: AD

L E

DE-WENN

≻

L E

DE-ENN

≻

ENN ("

≻

" means better than). AD

L E

DE-ENN

≻

ENN meant that the AD

L E

DE algorithm was an effective method to train the parameters of the neural network. AD

L E

DE-WENN

≻

L E

DE-ENN meant that the wavelet (Morlet) function of the hidden layer improved the performance of the neural network.

6.3 The Structure of the Models and the Over-Fitting Problem

The structure of ENN had a significant impact on the performance of ENN, including the number of the input layer nodes, the hidden layer nodes and the context layer nodes and so on. In the experiments, the same structure was applied to forecast different foreign exchange rates, and the parameters were shown in Table 2. According to the testing results of ELP-0.5%, the structure suited for EURUSD (89.2308)

≻

GBPUSD (85.8462)

≻

USDCNH (84.1372)

≻

GBPCNY (81.4625). Therefore, the experiments show that the same neural network structure had a different performance to forecast the foreign exchange rate. In other words, we should select a certain structure, which could reflect the low of fluctuation to the foreign exchange rate, to predict the close price. In the AD

L E

DE-WENN experiment of GBPCNY (marked AD

L E

DE-WENN* in Table 4), if the numbers of the hidden and context layer nodes increased from 10 to 12, all the indexes including MSE*, MAE, MAPE and ELP-0.5% improved a lot.

The over-fitting problem is common in the neural network. It means the neural network performs well in the training process, but performs badly in the testing process. In the experiments of USDCNH, the over-fitting problems were very serious to ENN and AD

L E

DE-ENN. According to the results of MSE*, MAE, MAPE and ELP-0.5%, the performance of the neural network was very bad in the testing process. It meant that the trained neural network was failed. By comparing the D-value of AD

L E

DE-ENN and AD

L E

DE-WENN, if the testing value was less than training value, the absolute D-value decreased; if the testing value was greater than training value, the absolute D-value increased. According to the results, we could conclude that the over-fitting problem in AD

L E

DE-WENN was less serious than that in AD

L E

DE-ENN, and the "improvement" was also brought about by the wavelet function in the hidden layer of WENN.

7 Conclusion

In the present paper, the AD

L E

DE-WENN predicting model is established aim to forecast the fluctuations of the foreign exchange rate. Based on the experiments of EURUSD, GBPCNY, GBPUSD and USDCNH, the follows could be concluded:

Firstly, if the wavelet function was applied to be the transfer function in the hidden layer of ENN, the non-linear character of the wavelet function could improve the performance of ENN in forecasting. But it would decrease the efficiency of the traditional learning algorithm at the same time. So, a new parameter training algorithm was needed under the circumstances.

Secondly, it was a feasible and effective way to train the parameters of the neural network with the AD

L E

DE algorithm. When the AD

L E

DE algorithm was applied in WENN, it could solve the "disruption problem" and take advantage of the non-linear character belonging to wavelet function. In this way, the performance of the neural network could be improved a lot.

Thirdly, the structure of the neural network had a significant impact on the performance of the neural network. Different structures were needed to forecast the foreign exchange rates.

At last, the over-fitting problem was common in the application of neural networks. The application of wavelet function in the neural network was conducive to weaken the problem of over-fitting.

In the present paper, other problems needed to be studied. There were many other wavelet functions, and the Morlet function was studied in this paper. It was necessary to study the performance of other wavelet functions applied to the neural network. Considering the impact of the structure on the neural network, it was a new problem about how to find a suitable structure for each foreign exchange rate.

References

Publishing order | Descend order by publishing year | Descend order by cited within

1	Das S R, Mishra D, Rout M. A hybridized ELM-Jaya forecasting model for currency exchange prediction. Journal of King Saud University-Computer and Information Sciences, 2020, 32 (3): 345- 366. https://doi.org/10.1016/j.jksuci.2017.09.006 Cited in this article [1]

2	Kadilar C, Alada H. Forecasting the exchange rates series with ANN. Istanbul University Econometrics and Statistics e-Journal, 2009, 9, 17- 29. Cited in this article [1]

3	Naeini M P, Taremian H, Hashemi H B. Stock market value prediction using neural networks. Computer Information Systems and Industrial Management Applications (CISIM), 2010, 132- 136. Cited in this article [1]

4	Gholamzadeh-Chitgar A, Berenjian J. Elman ANNs along with two different sets of inputs for predicting the properties of SCCs. Computers and Concrete, 2019, 24 (5): 399- 412. Cited in this article [1]

5	Krishnan S, Lokesh S, Devi M R. An efficient Elman neural network classifier with cloud supported internet of things structure for health monitoring system. Computer Networks, 2019, 151, 201- 210. https://doi.org/10.1016/j.comnet.2019.01.034

6	Cao S X, Wang Y, Tang Z H. Adaptive elman model of gene regulation network based on time series data. Current Bioinformatics, 2019, 14 (6): 551- 561. https://doi.org/10.2174/1574893614666190126145431 Cited in this article [1]

7	Shin Y, Kim Z, Yu J, et al. Development of NOx reduction system utilizing artificial neural network (ANN) and genetic algorithm (GA). Journal of Cleaner Production, 2019, 232, 1418- 1429. https://doi.org/10.1016/j.jclepro.2019.05.276 Cited in this article [1]

8	Malleswaran M, Vaidehi V, Sivasankari N. A novel approach to the integration of GPS and INS using recurrent neural networks with evolutionary optimization techniques. Aerospace Science and Technology, 2014, 32 (1): 169- 179. https://doi.org/10.1016/j.ast.2013.09.011

9	Ruiz L G B, Rueda R, Cullar M P, et al. Energy consumption forecasting based on Elman neural networks with evolutive optimization. Expert Systems with Applications, 2018, 92, 380- 389. https://doi.org/10.1016/j.eswa.2017.09.059

10	Wang J, Lü Z, Liang Y, et al. Fouling resistance prediction based on GA?Elman neural network for circulating cooling water with electromagnetic anti-fouling treatment. Journal of the Energy Institute, 2019, 92 (5): 1519- 1526. https://doi.org/10.1016/j.joei.2018.07.022

11	Zhou C, Ding L Y, He R. PSO-based Elman neural network model for predictive control of air chamber pressure in slurry shield tunneling under Yangtze River. Automation in Construction, 2013, 36, 208- 217. https://doi.org/10.1016/j.autcon.2013.03.001

12	Xie K, Yi H, Hu G, et al. Short-term power load forecasting based on Elman neural network with particle swarm optimization. Neurocomputing, 2019, 308, 1324- 1338. Cited in this article [1]

13	Ciarlini P, Maniscalco U. Wavelets and Elman Neural Networks for monitoring environmental variables. Journal of Computational and Applied Mathematics, 2008, 221 (2): 302- 309. https://doi.org/10.1016/j.cam.2007.10.040 Cited in this article [1]

14	Lei L, Chen W, Xue Y, et al. A comprehensive evaluation method for indoor air quality of buildings based on rough sets and a wavelet neural network. Building and Environment, 2019, 162, 106296. https://doi.org/10.1016/j.buildenv.2019.106296 Cited in this article [1]

15	Ong P, Zainuddin Z. Optimizing wavelet neural networks using modified cuckoo search for multi-step ahead chaotic time series prediction. Applied Soft Computing, 2019, 80, 374- 386. https://doi.org/10.1016/j.asoc.2019.04.016 Cited in this article [1]

16	Zhang Q, Benveniste A. Wavelet networks. IEEE Transactions on Neural Networks, 1992, 3 (6): 889- 998. https://doi.org/10.1109/72.165591 Cited in this article [2]

17	Zhang Q. Wavelet Frame and System Identification. IFAC Proceedings Volumes, 1997, 30 (11): 35- 40. https://doi.org/10.1016/S1474-6670(17)42819-9 Cited in this article [2]

18	Lu K H, Hong C M, Xu Q. Recurrent wavelet-based Elman neural network with modified gravitational search algorithm control for integrated offshore wind and wave power generation systems. Energy, 2019, 170, 40- 52. https://doi.org/10.1016/j.energy.2018.12.084 Cited in this article [1]

19	Storn R, Price K. Differential evolution: A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 1997, 11 (4): 341- 359. https://doi.org/10.1023/A:1008202821328 Cited in this article [1]

20	Byszewski L, Akca H. Existence of solutions of a semilinear functional-differential evolution nonlocal problem. Nonlinear Analysis: Theory, Methods & Applications, 1998, 34 (1): 65- 72. Cited in this article [1]

21	Dong W. Relaxation theorem for the evolution differential inclusions. Journal of Mathematical Analysis and Applications, 1999, 237 (1): 188- 200. https://doi.org/10.1006/jmaa.1999.6473

22	Lopez Cruz I L, Van Willigenburg L G, Van Straten G. Optimal control of nitrate in lettuce by gradient and differential evolution algorithms. IFAC Proceedings Volumes, 2001, 34 (26): 119- 124. https://doi.org/10.1016/S1474-6670(17)33643-1

23	Chen C W, Chen D Z, Cao G Z. An improved differential evolution algorithm in training and encoding prior knowledge into feedforward networks with application in chemistry. Chemometrics and Intelligent Laboratory Systems, 2002, 64 (1): 27- 43. https://doi.org/10.1016/S0169-7439(02)00048-5

24	Nearchou A C. A differential evolution algorithm for simple assembly line balancing. IFAC Proceedings Volumes, 2005, 38 (1): 247- 252. Cited in this article [1]

25	Babu B V, Chakole P G, Syed Mubeen J H. Multiobjective differential evolution (MODE) for optimization of adiabatic styrene reactor. Chemical Engineering Science, 2005, 60 (17): 4822- 4837. https://doi.org/10.1016/j.ces.2005.02.073 Cited in this article [1]

26	Tvrdk J. Adaptation in differential evolution: A numerical comparison. Applied Soft Computing, 2009, 9 (3): 1149- 1155. https://doi.org/10.1016/j.asoc.2009.02.010

27	Yüzge U. Performance comparison of differential evolution techniques on optimization of feeding profile for an industrial scale baker's yeast fermentation process. ISA Transactions, 2010, 49 (1): 167- 176. https://doi.org/10.1016/j.isatra.2009.10.006

28	Civicioglu P, Besdok E. Bernstain-search differential evolution algorithm for numerical function optimization. Expert Systems with Applications, 2019, 138, 112831. https://doi.org/10.1016/j.eswa.2019.112831

29	Moussa T M, Awotunde A A. Self-adaptive differential evolution with a novel adaptation technique and its application to optimize ES-SAGD recovery process. Computers & Chemical Engineering, 2018, 118, 64- 76. Cited in this article [1]

30	Bulsari A B, Saxn H. A Recurrent Neural Network Model, Aleksander I, Taylor J, editor, Artificial Neural Networks, Amsterdam: North-Holland, 1992: 1091-1094. Cited in this article [1]

31	Krishnan S, Lokesh S, Ramya Devi M. An efficient Elman neural network classifier with cloud supported internet of things structure for health monitoring system. Computer Networks, 2019, 151, 201- 210. https://doi.org/10.1016/j.comnet.2019.01.034 Cited in this article [1]

32	Jon R, Wang Z, Luo C, et al. Adaptive robust speed control based on recurrent elman neural network for sensorless PMSM servo drives. Neurocomputing, 2017, 227, 131- 141. https://doi.org/10.1016/j.neucom.2016.09.095 Cited in this article [1]

33	Yang L, Wang F, Zhang J, et al. Remaining useful life prediction of ultrasonic motor based on Elman neural network with improved particle swarm optimization. Measurement, 2019, 143, 27- 38. https://doi.org/10.1016/j.measurement.2019.05.013 Cited in this article [1]

34	Ren G, Cao Y, Wen S, et al. A modified Elman neural network with a new learning rate scheme. Neurocomputing, 2018, 286, 11- 18. https://doi.org/10.1016/j.neucom.2018.01.046 Cited in this article [1]

35	Islam S M, Das S, Ghosh S, et al. An adaptive differential evolution algorithm with novel mutation and crossover strategies for global numerical optimization. IEEE Transactions on Systems, Man & Cybernetics-Part B: Cybernetics, 2012, 42 (2): 482- 500. Cited in this article [1]

Vali M H, Aghagolzadeh A, Baleghi Y. Optimized watermarking technique using self-adaptive differential evolution based on redundant discrete wavelet transform and singular value decomposition. Expert Systems with Applications, 2018, 114, 296- 312.

https://doi.org/10.1016/j.eswa.2018.07.004

Cited in this article [1]

37	Zhang R, Chang P C, Song S, et al. Local search enhanced multi-objective PSO algorithm for scheduling textile production processes with environmental considerations. Applied Soft Computing, 2017, 61, 447- 467. https://doi.org/10.1016/j.asoc.2017.08.013 Cited in this article [1]

PDF(769 KB)

214

Accesses

Citation

Detail

Sections

Recommended

Abstract
Key words
Cite this article
1 Introduction
2 Elman Neural Network with Wavelet Transfer Function
2.1 Elman Neural Network
Figure 1 The network structure of ENN
2.2 Replacing the Sigmoid Function with the Wavelet Function
2.3 Wavelet-Based Elman Neural Network
2.3.1 Input Layer
2.3.2 Hidden Layer
2.3.3 Context Layer
2.3.4 Output Layer
2.4 Learning Algorithm of WENN
3 The Modified Differential Evolution Algorithm
3.1 Differential Evolution algorithm
3.2 Self-Adaptive Strategies to $P_{c}$ and $P_{m}$
3.3 Local Enhancement Strategy
Figure 2 Flow chat of ADLEDE-WENN
4 ADLEDE-WENN and the Comparative Forecasting Models
4.1 ADLEDE-WENN Forecasting Model
4.2 Comparative Models
4.3 Performance Measure
5 Experiment Method
5.1 The Foreign Exchange Rate Data
Table 1 Foreign exchange information
5.2 Normalization Preprocessing
5.3 Experimental Tools and Configuration of System
Table 2 Parameters of Elman neural network
Table 3 Parameters of ADLEDE
6 Results and Discussion
Table 4 The results of the experiments
Figure 3 Forecasting EURUSD with ENN
Figure 4 Forecasting EURUSD with WENN
Figure 5 Forecasting EURUSD with ADLEDE-ENN
Figure 6 Forecasting EURUSD with ADLEDE-WENN
6.1 The Impact of Wavelet Function to ENN
6.2 The Performance of the AD $L E$ DE Algorithm
Figure 7 The fitness (MSE*) of ADLEDE
6.3 The Structure of the Models and the Over-Fitting Problem
7 Conclusion
References

Received	Accepted	Published
2020-08-18	2021-01-21	2021-08-25
Issue Date
2021-08-25

模态框（Modal）标题

Please choose a citation manager

Content to export

Abstract

Key words

Cite this article

1 Introduction

2 Elman Neural Network with Wavelet Transfer Function

2.1 Elman Neural Network

Figure 1 The network structure of ENN

2.2 Replacing the Sigmoid Function with the Wavelet Function

2.3 Wavelet-Based Elman Neural Network

2.3.1 Input Layer

2.3.2 Hidden Layer

2.3.3 Context Layer

2.3.4 Output Layer

2.4 Learning Algorithm of WENN

3 The Modified Differential Evolution Algorithm

3.1 Differential Evolution algorithm

3.2 Self-Adaptive Strategies to Pc and Pm

3.3 Local Enhancement Strategy

Figure 2 Flow chat of ADLEDE-WENN

4 ADLEDE-WENN and the Comparative Forecasting Models

4.1 ADLEDE-WENN Forecasting Model

4.2 Comparative Models

4.3 Performance Measure

5 Experiment Method

5.1 The Foreign Exchange Rate Data

Table 1 Foreign exchange information

5.2 Normalization Preprocessing

5.3 Experimental Tools and Configuration of System

Table 2 Parameters of Elman neural network

Table 3 Parameters of ADLEDE

6 Results and Discussion

Table 4 The results of the experiments

Figure 3 Forecasting EURUSD with ENN

Figure 4 Forecasting EURUSD with WENN

Figure 5 Forecasting EURUSD with ADLEDE-ENN

Figure 6 Forecasting EURUSD with ADLEDE-WENN

6.1 The Impact of Wavelet Function to ENN

6.2 The Performance of the ADLEDE Algorithm

Figure 7 The fitness (MSE*) of ADLEDE

6.3 The Structure of the Models and the Over-Fitting Problem

7 Conclusion

{{custom_sec.title}}

{{custom_sec.title}}

References

{{custom_fnGroup.title_en}}

Footnotes

3.2 Self-Adaptive Strategies to $P_{c}$ and $P_{m}$

6.2 The Performance of the AD $L E$ DE Algorithm

**Figure 7 The fitness (MSE*) of ADLEDE**