A Case Similarity Calculation Model Based on the Urban Flooding Case with Stratified Data Characteristics

Xiaoyu ZHU, Yuxiang FAN, Junguang GAO

Journal of Systems Science and Information ›› 2018, Vol. 6 ›› Issue (2) : 134-151.

DOI: 10.21078/JSSI-2018-134-18
 


Abstract

As urbanization accelerates, an increasing amount of floodplain is being planned as the site of future cities. Consequently, urban flooding is being studied by emergency management experts worldwide because of its growing impact. Existing research on flood emergency management based on the case-based reasoning (CBR) method has made considerable progress, but urban flooding cases, with their stratified data characteristics, require a methodology different from those applied to flash floods. Building on the CBR method, this paper therefore proposes a CPIE-CBR model with four layers (classification filtration, punctiform similarity, interval similarity and the entropy weight method) to calculate the similarity among urban flooding cases with stratified data characteristics. We then carry out numerical simulations with real data from China and compare the results with those of the original approaches to assess the validity and efficiency of our model.

Key words

urban flooding / CBR / case similarity / emergency management

Cite this article

Xiaoyu ZHU , Yuxiang FAN , Junguang GAO. A Case Similarity Calculation Model Based on the Urban Flooding Case with Stratified Data Characteristics. Journal of Systems Science and Information, 2018, 6(2): 134-151 https://doi.org/10.21078/JSSI-2018-134-18

1 Introduction

Flooding is one of the most frequent and destructive disasters on earth, affecting millions of lives around the world every year. According to statistics from the international disaster database EM-DAT[1], from the 1970s to the early 21st century alone, flooding broke out 2156 times worldwide, causing economic losses of 386 billion dollars and 206303 deaths in total. The population affected by flooding has risen from about 4 million a year in 1950 to more than one percent of the current global population. It is noteworthy that these numbers result not only from natural factors but also, and more importantly, from man-made ones.
Over the past two centuries, urbanization has led to the formation of urban agglomerations around the world, and at least 61% of the world's population is expected to live in cities by 2030. This trend will significantly increase the number of people exposed to flooding and the concentration of economic resources in urban areas located on floodplains. Owing to changing demographic and socio-economic conditions within these cities, unplanned urbanization, the development of areas at high risk of flood disaster, environmental degradation and other issues, the problem of urban flooding has been worsening. Climate variability and extreme weather events caused by urbanization, including sea level rise, increasing storms and extreme torrential rains, have also made urban flooding more visible.
The development of cities has changed our living environment and way of life. At the same time, high-intensity urbanization activities influence the original climate characteristics, so that urban rainbands gradually form. The construction of underground garages, the over-exploitation of groundwater and the increasing number of high-rise buildings have made urban areas increasingly vulnerable and exposed to the growing impact of heavy rains, floods and other extreme events. According to the Bulletin of Flood and Drought Disasters in China[2], an average of 158 cities above the county level were flooded or suffered serious waterlogging annually from 2006 to 2014 in China, with an evident upward trend.
As a result, urban flooding forecasting, rescue and decision-making support are attracting increasing attention from disaster emergency response management experts and governments worldwide. Among these topics, the role of emergency response decision support in flooding management has become a hot research issue[3]. In emergency decision making, decision makers face limited time and great psychological pressure, so using the cases in a historical flooding case database to provide experience and information can improve the efficiency and quality of emergency response decisions, which also saves valuable time and resources for the follow-up flooding rescue.
Thus, case-based reasoning methods based on the calculation of case similarity can help decision makers quickly extract from the case base the historical cases that share much in common with the current target case[4, 5]. In general, the process of CBR consists of two core parts: describing new cases in terms of the characteristic attributes of source cases to facilitate retrieval, and assigning a proper algorithm to calculate the similarities between different cases. Building the characteristic attributes and a similarity algorithm for urban flooding is therefore the task of this paper.
First, every flooding case is an independent event. To apply the case-based reasoning approach to flooding cases, we first need to construct a measurable attribute system for their basic characteristics. The construction of characteristic attributes for flooding has been studied extensively. Fang, et al.[6] proposed six aspects to describe flooding cases (affected population, deaths, engineering loss, affected area, collapsed houses, and direct or indirect economic loss) in the Chinese flood case database from 1736 to 1991 that they built. Kelsch[7] illustrated that flash floods are phenomena in which the important hydrologic processes occur on the same spatial and temporal scales as the intense precipitation. Hadihardaja, et al.[8] investigated six attributes of flooding cases (rainfall rate, duration, flood area, flood depth, submerged time and travel time) in their decision support system for predicting flood characteristics based on database modeling development.
Second, case retrieval is the key part of case-based reasoning because it provides the case in the case database most similar to the current problem. The core of case retrieval is to calculate the degree of similarity between source cases and the current problem. General similarity calculation methods include the k-nearest neighbors method, the decision tree method, the knowledge-guided method and so on. In addition, Zhang, et al.[9] proposed a universal method for describing and organizing emergency cases based on a three-tier architecture and designed a similarity algorithm with a two-layer structure according to the attribute features of emergency cases, to avoid the defects of the traditional nearest neighbor algorithm. Liu[10] designed a geographical case reasoning (GCR) model that integrates case-based reasoning (CBR) and rule-based reasoning (RBR) on a geographic information system (GIS), with a geographic index method, a similarity method, and a case index and match method, and applied this model to the disaster emergency intelligent system of a pipeline company. Zhong, et al.[11] focused on case-based reasoning and its application in emergency commanding and decision-making. They analyzed the features of emergency cases and proposed a description and storage pattern for them, elaborated the retrieval algorithm, which is the kernel of CBR, together with the process of case-based reasoning in emergency commanding and decision-making, and finally proposed a CBR prototype system for emergency commanding and decision-making. Wang, et al.[12] argued that a widely accepted approach to assisting disaster management is to find and learn from the experience of similar historical cases. They offered a digital disaster case structure that records the whole procedure of each disaster, considering the importance of spatial and temporal information.
Zhao, et al.[13] proposed a new textual case similarity algorithm, the sentence vector space model, to overcome the disadvantage that the traditional algorithm based on the vector space model neglects word order and sentence structure, which affects the accuracy of similarity computing. Fan, et al.[14] developed a new method for hybrid similarity measurement with five formats of attribute values: crisp symbols, crisp numbers, interval numbers, fuzzy linguistic variables and random variables. The proper historical cases are then retrieved according to the hybrid similarities, which are obtained by aggregating the attribute similarities with the simple weighting method.
Overall, the research described above on the characteristic attributes of flooding cases lacks consideration of the diversity and complexity of the data when different attributes are denoted by various types of data. Meanwhile, emergency cases are typically unstructured and carry complex data, so the previous studies on similarity calculation in emergency decision-making cannot ensure the validity of the calculation when the attributes of the cases are inconsistent, especially for unconventional emergencies.
Therefore, to address the stratified data characteristics of urban flooding cases, this paper constructs the CPIE-CBR model, in which a different calculation process is designed for each type of attribute data, which can improve the validity of the calculation significantly. The remainder of this paper is organized as follows. First, we describe the CPIE-CBR model in detail, including its four layers: classification filtration, punctiform similarity calculation, interval similarity calculation and the entropy weighting method. Then, we discuss the results of our numerical simulations based on real data from the Bulletin of Flood and Drought Disasters in China, and finally we make concluding remarks.

2 Model

The attribute data of an urban flooding case are diverse and complex. Different types of data must be processed by similarity measures suited to them, so we stratify the attribute data and construct the CPIE-CBR model to calculate the similarity among cases of urban flooding.
The CPIE-CBR model consists of four layers: classification filtration, punctiform similarity calculation, interval similarity calculation and the entropy weighting method. The attribute data processed in the first layer are categorized data, such as the type of city, the season and the type of flooding. The values of categorized data only indicate different types, so it is pointless to compare their magnitudes. In the second layer, the data are measured by ordinal and exact numbers. For example, the intensity of flooding is expressed by ordinal numbers because it is defined by grade, while the exact numbers mainly involve casualties (affected population, transferred population, deaths and missing population), direct economic loss and duration. The data of the third layer are intervals, such as the affected area, which is given as an interval instead of an exact number. We propose a different similarity calculation algorithm for each of these three types of data.
In addition, to obtain the final case similarity between the target case and the source cases, we need to assign reasonable weightings to the similarities on the various attributes. In the fourth layer, we adopt the entropy weight method to determine these weightings. The entropy weight method is a comprehensive evaluation method for multiple indicators and objects whose results depend only on objective information, without interference from human factors.
To summarize, the calculation process of the CPIE-CBR model for urban flooding cases with stratified data characteristics is shown in Figure 1.
Figure 1 The flowchart of CPIE-CBR model based on the urban flooding cases with stratified data characteristics


2.1 Classification Filtration: Utilize Categorized Data to Screen Cases

Similarity and dissimilarity are important concepts in data mining techniques such as clustering, nearest neighbor classification and deviation detection. Usually, the raw data are no longer required once the similarity has been calculated, so the calculation can be regarded as a process that transforms the data into a similarity (dissimilarity) space before the subsequent analysis. The first layer of the CPIE-CBR model is classification filtration. Considering the different types of data among the attributes, this layer performs a coarse filter on the source case base under a specific filtering requirement, according to the similarity of the categorized attributes. The process is as follows.
Let $C=\{C_1,C_2,\ldots,C_k,\ldots,C_p\}$ be the set of source cases in the case base, let case $T$ be the target case, and let $P=\{P_1,P_2,\ldots,P_i,\ldots,P_m\}$ be the set of attributes. In particular, the set of categorized attributes of urban flooding is {the type of city, season, the type of flooding}. If the value of attribute $P_i$ in case $T$ equals that in case $C_k$ from the source case base, the dissimilarity between the two cases on this attribute is 0; otherwise it is 1. Thus, the dissimilarity matrix for the categorized attributes of target case $T$ over the whole source case base can be obtained as
$$
dis_{p\times m}=\begin{bmatrix}
dis_{11} & \cdots & dis_{1i} & \cdots & dis_{1m}\\
\vdots & & \vdots & & \vdots\\
dis_{k1} & \cdots & dis_{ki} & \cdots & dis_{km}\\
\vdots & & \vdots & & \vdots\\
dis_{p1} & \cdots & dis_{pi} & \cdots & dis_{pm}
\end{bmatrix},\tag{1}
$$
where $dis_{ki}$ is binary. In particular, if the value of $P_i$ is not a single number, for example when $P_i$ takes its values from the set $A=\{x\mid P_i(x)\}$ for target case $T$ but from $B=\{y\mid P_i(y)\}$ for source case $C_k$, we set $dis_{ki}=1$ when $A\cap B=\emptyset$; otherwise it is 0.
Then, we can calculate the total dissimilarity between $C_k$ and $T$ by
$$
dis_{C_k}=\sum_{i=1}^{m}dis_{ki}.\tag{2}
$$
Since the final screening requires similarity, we transform dissimilarity into similarity. Generally, the formula $sim=1-dis$ is suitable, but the dissimilarity in this paper takes values in the interval $[0,\infty)$. To confine the similarity to $[0,1]$, we use the nonlinear transformation
$$
sim_{C_k}=\frac{1}{1+dis_{C_k}}.\tag{3}
$$
Finally, we sort these similarities and apply a specific filtering requirement. For example, we may filter out the source cases whose $sim_{C_k}$ ranks in the bottom half, so that only the remaining source cases are examined in the next layer.
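The filtration steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the set-based encoding of attribute values and the `drop_fraction` parameter are assumptions, while the attribute codes are the categorized rows of Table 2 (type of city, type of flooding, season). Ties at the cutoff are broken by input order, a choice the paper does not specify.

```python
def categorical_dissimilarity(target_value, source_value):
    """Return 0 if the two value sets share at least one category, else 1."""
    return 0 if set(target_value) & set(source_value) else 1


def classification_filter(target, sources, drop_fraction):
    """Rank source cases by sim = 1 / (1 + dis) and drop the lowest-ranked share.

    target:  list of value sets, one per categorized attribute
    sources: dict mapping a case label to its list of value sets
    Returns (kept case labels, similarity per case).
    """
    sims = {}
    for label, attrs in sources.items():
        dis = sum(categorical_dissimilarity(t, s) for t, s in zip(target, attrs))
        sims[label] = 1.0 / (1.0 + dis)
    ranked = sorted(sims, key=sims.get, reverse=True)  # stable: ties keep input order
    n_drop = round(len(ranked) * drop_fraction)
    kept = ranked[: len(ranked) - n_drop]
    return kept, sims


# Categorized attributes from Table 2: [type of city, type of flooding, season].
target_case = [{1}, {1}, {2}]
source_cases = {
    1: [{1, 2}, {1}, {1}],
    2: [{1, 4}, {1}, {2}],
    3: [{1}, {1}, {2}],
    4: [{2}, {10}, {2}],
    5: [{1, 4}, {1}, {3}],
    6: [{1, 2, 4}, {2}, {1}],
    7: [{1, 4}, {1}, {2}],
    8: [{4}, {1}, {2}],
    9: [{1, 2, 4}, {10}, {2}],
    10: [{1, 2, 4}, {10}, {3}],
}

kept_cases, similarities = classification_filter(target_case, source_cases, 0.3)
print(sorted(kept_cases))  # cases 4, 6 and 10 are dropped
```

With a 30% drop rate the three cases removed are exactly those with a total dissimilarity of 2, so no tie has to be broken for this data set.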

2.2 Punctiform Similarity: Utilize Punctiform Data to Calculate the Similarity of Attributes

In the second layer, the punctiform data we use include ordinal and exact numbers. For example, the intensity of flooding is expressed by ordinal numbers: like categorized data it results from classifying the attribute, but its grades have a meaningful order and size. Other attributes, such as urban population density, casualties (affected population, transferred population, deaths and missing population), direct economic loss and duration, are described by exact numbers.
We assume that $a$ is the value of an attribute in target case $T$ and $b$ is the corresponding value in a source case $C_k$. Then the punctiform similarity can be defined as
$$
sim_{kj}(a,b)=1-\frac{|b-a|}{\beta-\alpha},\quad a,b\in[\alpha,\beta],\tag{4}
$$
where $\alpha$ and $\beta$ are the minimum and maximum values of this punctiform attribute among all cases. Thus, we obtain the similarity matrix for punctiform attributes:
$$
sim_{p\times n}=\begin{bmatrix}
sim_{11} & \cdots & sim_{1j} & \cdots & sim_{1n}\\
\vdots & & \vdots & & \vdots\\
sim_{k1} & \cdots & sim_{kj} & \cdots & sim_{kn}\\
\vdots & & \vdots & & \vdots\\
sim_{p1} & \cdots & sim_{pj} & \cdots & sim_{pn}
\end{bmatrix},\tag{5}
$$
where $n$ is the total number of attributes measured by ordinal or exact numbers.
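As a small illustration of Formula (4), the sketch below computes the punctiform similarity for the attribute "deaths" using the values listed in Table 2. The function name and the handling of a zero range are assumptions made for this example; the range $[\alpha,\beta]$ is taken over the source cases and the target case together.

```python
def punctiform_similarity(a, b, alpha, beta):
    """Formula (4): 1 - |b - a| / (beta - alpha) for values in [alpha, beta]."""
    if beta == alpha:
        # Assumption: an attribute with no spread is treated as fully similar.
        return 1.0
    return 1.0 - abs(b - a) / (beta - alpha)


# "Deaths" row of Table 2: source cases 1..10, then the target case T.
deaths_sources = [15, 19, 19, 25, 47, 39, 71, 29, 50, 30]
deaths_target = 8

alpha = min(deaths_sources + [deaths_target])  # 8
beta = max(deaths_sources + [deaths_target])   # 71

death_sims = [
    punctiform_similarity(deaths_target, b, alpha, beta) for b in deaths_sources
]
print([round(s, 4) for s in death_sims[:2]])  # similarities for cases 1 and 2
```

For cases 1 and 2 this gives 0.8889 and 0.8254, the values listed for those cases in the "Deaths" row of Table 4.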

2.3 Interval Similarity: Utilize Interval Data to Calculate the Similarity of Attributes

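However, some attributes of urban flooding are not exact values but intervals, so we need a separate design for calculating the similarity between two interval attributes. In this section, we refer to the similarity algorithm that Zhao, et al.[15] adopted to solve a missile conceptual design problem.
We assume that the value of attribute $P_s$ in a source case $C_k$ is given by an interval $[a_1,a_2]$, while that in target case $T$ is $[b_1,b_2]$. Then the interval similarity is defined as follows:
$$
sim_{ks}([a_1,a_2],[b_1,b_2])=\frac{\int_{a_1}^{a_2}\int_{b_1}^{b_2}sim(x,y)\,dy\,dx}{(a_2-a_1)(b_2-b_1)},\tag{6}
$$
where $a_1,a_2,b_1,b_2\in[\alpha,\beta]$. In other words, Formula (6) is the average of the punctiform similarity $sim(x,y)$ over all $x\in[a_1,a_2]$ and $y\in[b_1,b_2]$. Substituting Formula (4) into Formula (6), we obtain:
$$
sim_{ks}([a_1,a_2],[b_1,b_2])=\frac{\int_{a_1}^{a_2}\int_{b_1}^{b_2}sim(x,y)\,dy\,dx}{(a_2-a_1)(b_2-b_1)}=1-\frac{\int_{a_1}^{a_2}\int_{b_1}^{b_2}|y-x|\,dy\,dx}{(a_2-a_1)(b_2-b_1)(\beta-\alpha)}.\tag{7}
$$
When $a_1\le b_1$, the form of Formula (7) depends only on the position of $a_2$, for which there are three conditions: $a_2\le b_1$, $b_1<a_2\le b_2$ or $b_2<a_2$. When $a_2\le b_1$, Formula (7) can be rewritten as
$$
sim_{ks}([a_1,a_2],[b_1,b_2])=1-\frac{\int_{a_1}^{a_2}\int_{b_1}^{b_2}(y-x)\,dy\,dx}{(a_2-a_1)(b_2-b_1)(\beta-\alpha)},\tag{8}
$$
where
$$
\int_{a_1}^{a_2}\int_{b_1}^{b_2}(y-x)\,dy\,dx=\frac{(b_2-b_1)(a_2-a_1)(b_2+b_1-a_2-a_1)}{2}.\tag{9}
$$
Substituting Formula (9) into Formula (8), we obtain
$$
sim_{ks}([a_1,a_2],[b_1,b_2])=1-\frac{b_2+b_1-a_2-a_1}{2(\beta-\alpha)}.\tag{10}
$$
When $b_1<a_2\le b_2$, Formula (7) can be rewritten as
$$
sim_{ks}([a_1,a_2],[b_1,b_2])=1-\frac{1}{6(a_2-a_1)(b_2-b_1)(\beta-\alpha)}\left[3(b_2-b_1)(b_1-a_1)(b_2-a_1)+(a_2-b_1)^3+(a_2-b_2)^3+(b_2-b_1)^3\right].\tag{11}
$$
When $b_2<a_2$, Formula (7) can be rewritten as
$$
\begin{aligned}
sim_{ks}([a_1,a_2],[b_1,b_2])&=1-\frac{\int_{a_1}^{b_1}\int_{b_1}^{b_2}(y-x)\,dy\,dx+\int_{b_2}^{a_2}\int_{b_1}^{b_2}(x-y)\,dy\,dx}{(a_2-a_1)(b_2-b_1)(\beta-\alpha)}\\
&\quad-\frac{\int_{b_1}^{b_2}\int_{b_1}^{x}(x-y)\,dy\,dx+\int_{b_1}^{b_2}\int_{x}^{b_2}(y-x)\,dy\,dx}{(a_2-a_1)(b_2-b_1)(\beta-\alpha)}\\
&=1-\frac{b_2^2+b_1^2+b_2b_1}{3(a_2-a_1)(\beta-\alpha)}-\frac{a_2^2+a_1^2-(a_2+a_1)(b_2+b_1)}{2(a_2-a_1)(\beta-\alpha)}.
\end{aligned}\tag{12}
$$
Overall, the similarity of interval attributes can be formulated as
$$
sim_{ks}([a_1,a_2],[b_1,b_2])=\begin{cases}
1-\dfrac{b_2+b_1-a_2-a_1}{2(\beta-\alpha)}, & a_2\le b_1,\\[2ex]
1-\dfrac{3(b_2-b_1)(b_1-a_1)(b_2-a_1)+(a_2-b_1)^3+(a_2-b_2)^3+(b_2-b_1)^3}{6(a_2-a_1)(b_2-b_1)(\beta-\alpha)}, & b_1<a_2\le b_2,\\[2ex]
1-\dfrac{b_2^2+b_1^2+b_2b_1}{3(a_2-a_1)(\beta-\alpha)}-\dfrac{a_2^2+a_1^2-(a_2+a_1)(b_2+b_1)}{2(a_2-a_1)(\beta-\alpha)}, & b_2<a_2.
\end{cases}\tag{13}
$$
If $a_1>b_1$, we exchange the values of $a_1$ and $b_1$ and of $a_2$ and $b_2$ respectively, and can still use Formula (13) to calculate the similarity between $[a_1,a_2]$ and $[b_1,b_2]$. According to the formulas described above, we obtain the similarity matrix for interval attributes
$$
sim_{p\times t}=\begin{bmatrix}
sim_{11} & \cdots & sim_{1s} & \cdots & sim_{1t}\\
\vdots & & \vdots & & \vdots\\
sim_{k1} & \cdots & sim_{ks} & \cdots & sim_{kt}\\
\vdots & & \vdots & & \vdots\\
sim_{p1} & \cdots & sim_{ps} & \cdots & sim_{pt}
\end{bmatrix},\tag{14}
$$
where $t$ is the number of interval attributes and $sim_{ks}$ denotes the similarity between target case $T$ and source case $C_k$ on interval attribute $P_s$.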
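The piecewise closed form can be checked numerically. The sketch below implements the three cases of the interval similarity and compares them with a midpoint-rule approximation of the double integral in its definition. The function names, the attribute range $[\alpha,\beta]=[0,200]$ and the test intervals are assumptions chosen for illustration (only the first pair uses the affected-area intervals of case 7 and case T from Table 10), and both intervals are assumed to have positive width.

```python
def interval_similarity(a, b, alpha, beta):
    """Closed-form interval similarity (three cases), assuming positive widths."""
    (a1, a2), (b1, b2) = a, b
    if a1 > b1:  # exchange the intervals so that the first one starts first
        a1, a2, b1, b2 = b1, b2, a1, a2
    wa, wb, span = a2 - a1, b2 - b1, beta - alpha
    if a2 <= b1:  # disjoint (or touching) intervals
        return 1.0 - (b2 + b1 - a2 - a1) / (2.0 * span)
    if a2 <= b2:  # partial overlap
        num = (3.0 * wb * (b1 - a1) * (b2 - a1)
               + (a2 - b1) ** 3 + (a2 - b2) ** 3 + wb ** 3)
        return 1.0 - num / (6.0 * wa * wb * span)
    # [b1, b2] lies inside [a1, a2]
    return (1.0
            - (b2 * b2 + b1 * b1 + b2 * b1) / (3.0 * wa * span)
            - (a2 * a2 + a1 * a1 - (a2 + a1) * (b2 + b1)) / (2.0 * wa * span))


def interval_similarity_numeric(a, b, alpha, beta, steps=400):
    """Midpoint-rule average of 1 - |y - x| / (beta - alpha) over both intervals."""
    (a1, a2), (b1, b2) = a, b
    hx, hy = (a2 - a1) / steps, (b2 - b1) / steps
    total = 0.0
    for i in range(steps):
        x = a1 + (i + 0.5) * hx
        for j in range(steps):
            y = b1 + (j + 0.5) * hy
            total += 1.0 - abs(y - x) / (beta - alpha)
    return total / (steps * steps)


ALPHA, BETA = 0.0, 200.0  # hypothetical attribute range
pairs = [
    ((160.0, 170.0), (145.0, 155.0)),  # disjoint, needs the exchange step
    ((145.0, 155.0), (150.0, 160.0)),  # partial overlap
    ((140.0, 170.0), (150.0, 160.0)),  # containment
]
for a, b in pairs:
    closed = interval_similarity(a, b, ALPHA, BETA)
    numeric = interval_similarity_numeric(a, b, ALPHA, BETA)
    print(a, b, round(closed, 6), round(numeric, 6))
```

For disjoint intervals the result reduces to one minus the scaled distance between the two midpoints; the width of the intervals only matters once they overlap.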

2.4 Entropy Weight Method: Calculate the Weightings Assigned to the Similarity of Attributes

The concept of entropy appears in three domains: physics, molecular motion theory and information theory. In information theory, entropy is a measure of the disorder of a system. When we use it to assign weightings to attributes, if the values of an attribute fluctuate relatively strongly among the cases, the entropy is relatively small, which indicates that the attribute contains much valid information, so we assign a larger weighting to it. Conversely, if the attribute fluctuates only slightly, the entropy is large and there is little valid information, so its weighting should be small. In particular, when an attribute has the same value in every case, its entropy reaches its maximum, which implies that the attribute provides no valid information for decision making; removing it from the attribute index is then recommended.
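Therefore, the entropy weight method is an objective weighting methodology. In the fourth layer of the CPIE-CBR model, we use this method to assign a weighting to each attribute of urban flooding. Based on the similarity matrices for punctiform and interval attributes, we obtain the overall similarity matrix for all attributes
$$
sim_{p\times(n+t)}=\begin{bmatrix}
sim_{11} & sim_{12} & \cdots & sim_{1j} & \cdots & sim_{1n} & \cdots & sim_{1s} & \cdots & sim_{1(n+t)}\\
sim_{21} & sim_{22} & \cdots & sim_{2j} & \cdots & sim_{2n} & \cdots & sim_{2s} & \cdots & sim_{2(n+t)}\\
\vdots & \vdots & & \vdots & & \vdots & & \vdots & & \vdots\\
sim_{k1} & sim_{k2} & \cdots & sim_{kj} & \cdots & sim_{kn} & \cdots & sim_{ks} & \cdots & sim_{k(n+t)}\\
\vdots & \vdots & & \vdots & & \vdots & & \vdots & & \vdots\\
sim_{p1} & sim_{p2} & \cdots & sim_{pj} & \cdots & sim_{pn} & \cdots & sim_{ps} & \cdots & sim_{p(n+t)}
\end{bmatrix}.\tag{15}
$$
Then, there are three steps to determine the weightings of the attributes:
1) Standardize the raw data. Formula (15) can be rewritten as a matrix with $n+t$ attributes (rows) and $p$ source cases (columns):
$$
sim_{(n+t)\times p}=\begin{bmatrix}
sim_{11} & sim_{12} & \cdots & sim_{1k} & \cdots & sim_{1p}\\
sim_{21} & sim_{22} & \cdots & sim_{2k} & \cdots & sim_{2p}\\
\vdots & \vdots & & \vdots & & \vdots\\
sim_{j1} & sim_{j2} & \cdots & sim_{jk} & \cdots & sim_{jp}\\
\vdots & \vdots & & \vdots & & \vdots\\
sim_{n1} & sim_{n2} & \cdots & sim_{nk} & \cdots & sim_{np}\\
\vdots & \vdots & & \vdots & & \vdots\\
sim_{s1} & sim_{s2} & \cdots & sim_{sk} & \cdots & sim_{sp}\\
\vdots & \vdots & & \vdots & & \vdots\\
sim_{(n+t)1} & sim_{(n+t)2} & \cdots & sim_{(n+t)k} & \cdots & sim_{(n+t)p}
\end{bmatrix}.\tag{16}
$$
Standardize the matrix $sim_{(n+t)\times p}$ as
$$
S=(s_{jk})_{(n+t)\times p},\tag{17}
$$
where $s_{jk}$ is the standardized value of the similarity on attribute $P_j$ for source case $C_k$:
$$
s_{jk}=\frac{sim_{jk}-\min_k\{sim_{jk}\}}{\max_k\{sim_{jk}\}-\min_k\{sim_{jk}\}},\tag{18}
$$
and $s_{jk}\in[0,1]$.
2) Define entropy. The entropy of attribute $P_j$, one of the $n+t$ attributes, over the $p$ source cases can be defined as
$$
H_j=-w\sum_{k=1}^{p}f_{jk}\ln f_{jk},\quad j=1,2,\ldots,n,\ldots,n+t,\tag{19}
$$
where $f_{jk}=s_{jk}/\sum_{k=1}^{p}s_{jk}$ and $w=1/\ln n$. Let $f_{jk}\ln f_{jk}=0$ when $f_{jk}=0$.
3) Define the entropy weight. After calculating $H_j$, we assign the entropy weight of $P_j$ as
$$
g_j=\frac{1-H_j}{(n+t)-\sum_{j=1}^{n+t}H_j},\tag{20}
$$
where $0\le g_j\le 1$ and $\sum_{j=1}^{n+t}g_j=1$. Through the entropy weight method, we thus obtain the weightings of all attributes. The overall similarity between source case $C_k$ and the target case $T$ can then be formulated as follows:
$$
sim_{C_kT}=\sum_{j=1}^{n+t}g_j\times sim_{jk}.\tag{21}
$$
Applying Formula (21) to all the cases retained after the classification filtration in the first layer, we obtain the overall similarity between the target case and every source case. Finally, the source case with the highest overall similarity is the outcome of case-based reasoning and is selected as the reference for the emergency response decision on the target case.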
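The three weighting steps and the final weighted sum can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' Matlab code: the function names and the small similarity matrix are hypothetical, a constant attribute is assigned the maximum entropy of 1 (so its weight is 0), and the entropy is normalized by $\ln p$, the logarithm of the number of source cases, which is the usual choice for keeping $H_j$ within $[0,1]$, whereas the text above writes $w=1/\ln n$.

```python
import math


def entropy_weights(sim):
    """Entropy weights for an attributes-by-cases similarity matrix sim[j][k]."""
    p = len(sim[0])  # number of source cases
    entropies = []
    for row in sim:
        lo, hi = min(row), max(row)
        if hi == lo:
            # Assumption: a constant attribute carries no information (H = 1).
            entropies.append(1.0)
            continue
        s = [(v - lo) / (hi - lo) for v in row]          # min-max standardization
        total = sum(s)
        f = [v / total for v in s]                       # share of each case
        # Terms with f = 0 contribute 0; normalized by ln p (assumption, see lead-in).
        h = -sum(x * math.log(x) for x in f if x > 0.0) / math.log(p)
        entropies.append(h)
    denom = len(sim) - sum(entropies)
    if denom == 0:
        raise ValueError("all attributes are constant; no weights can be assigned")
    return [(1.0 - h) / denom for h in entropies]


def overall_similarity(sim, weights):
    """Weighted sum of attribute similarities for every source case."""
    p = len(sim[0])
    return [sum(w * row[k] for w, row in zip(weights, sim)) for k in range(p)]


# Hypothetical similarities: 3 attributes (rows) x 4 source cases (columns).
sim_matrix = [
    [0.9, 0.2, 0.5, 0.7],
    [0.6, 0.6, 0.6, 0.6],  # constant attribute
    [0.1, 0.8, 0.3, 0.9],
]
weights = entropy_weights(sim_matrix)
overall = overall_similarity(sim_matrix, weights)
best_case = max(range(len(overall)), key=overall.__getitem__)
print([round(w, 4) for w in weights], best_case)
```

The constant second attribute receives zero weight, which mirrors the remark that an attribute with the same value in every case provides no valid information.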

3 Results

According to the definition of flooding in the Bulletin of Flood and Drought Disasters in China, the urban flooding discussed in this paper refers to floods, waterlogging, flash floods, landslides, debris flows and their secondary disasters occurring in cities, caused by storm water, snowmelt, jumble ice, dam breaks, storm surges, tsunamis, etc.
In this section, we perform numerical simulations of the CPIE-CBR model described above to verify its feasibility and effectiveness, using data from the Bulletin of Flood and Drought Disasters in China, 2013 and 2014. In the first layer, the categorized attributes are the type of city, the type of flooding and the season. In the second layer, the punctiform attributes selected are the intensity of flooding, affected population, transferred population, deaths, missing population, direct economic loss, collapsed houses and duration. The only interval attribute in the third layer is the affected area. The last layer assigns weightings to the attributes described above in order to produce the overall similarity between the target case and each source case. The specific attributes in the first three layers are shown in Table 1.
Table 1 Attributes with stratified data in three layers of the CPIE-CBR model

| Layer | Type of data | Attribute |
|---|---|---|
| 1 | Categorized data | Type of city |
| | | Type of flooding |
| | | Season |
| 2 | Ordinal data | Intensity of flooding |
| | Exact data | Affected population |
| | | Transferred population |
| | | Deaths |
| | | Missing population |
| | | Direct economic loss |
| | | Collapsed houses |
| | | Duration |
| 3 | Interval data | Affected area |

In the first layer, we divide the attribute "type of city" into five categories: cities in a riverine area, coastal area, alpine zone or mountainous region, and cities without significant geographical features. The attribute "type of flooding" has ten subdivisions: storm water, flash flood, snowmelt flood, jumble ice, outburst flood, urban waterlog, lake flood, astronomical tide, tsunamis and storm surge. The attribute "season" refers to spring, summer, autumn and winter. In the second layer, the intensity of flooding is measured by the rainfall per 24 hours. It is graded as drizzle, light rain, moderate rain, heavy rain, extremely heavy rain, torrential rain and extremely torrential rain.
Apart from that, the attributes affected population, transferred population, deaths, missing population, direct economic loss, collapsed houses, duration and affected area, which are denoted by exact numbers or intervals, require no additional interpretation. The real data used for all attributes in the source cases and the target case are shown in Table 2, and the interpretation and subdivision of the categorized and ordinal attributes are given in Table 3.
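Table 2 The data of attributes on source cases

| Layer | Type of data | Attribute | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 | Case 6 | Case 7 | Case 8 | Case 9 | Case 10 | Case T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Categorized data | Type of city | 1, 2 | 1, 4 | 1 | 2 | 1, 4 | 1, 2, 4 | 1, 4 | 4 | 1, 2, 4 | 1, 2, 4 | 1 |
| | | Type of flooding | 1 | 1 | 1 | 10 | 1 | 2 | 1 | 1 | 10 | 10 | 1 |
| | | Season | 1 | 2 | 2 | 2 | 3 | 1 | 2 | 2 | 2 | 3 | 2 |
| 2 | Ordinal data | Intensity of flooding | 7 | 4 | 5 | 7 | 6 | 6 | 7 | 6 | 7 | 7 | 6 |
| | Exact data | Affected population (ten thousand) | 134.22 | 18.62 | 410.75 | 325.83 | 251.12 | 125.57 | 349.05 | 63 | 837.93 | 982.22 | 318 |
| | | Transferred population (ten thousand) | 24.17 | 0.51 | 54.13 | 38.6 | 12.51 | 0 | 0 | 0 | 126.66 | 82.8 | 3.2 |
| | | Deaths | 15 | 19 | 19 | 25 | 47 | 39 | 71 | 29 | 50 | 30 | 8 |
| | | Missing population | 2 | 34 | 3 | 6 | 17 | 7 | 177 | 0 | 4 | 1 | 10 |
| | | Direct economic loss (hundred million yuan) | 47.54 | 6.13 | 62.88 | 119.52 | 39.07 | 33.58 | 203.09 | 9.46 | 167.06 | 230.82 | 20.5 |
| | | Collapsed houses (ten thousand) | 1.96 | 0.16 | 0.59 | 2.32 | 1.6 | 0.95 | 1.4 | 1.2 | 2.61 | 0.89 | 1.04 |
| | | Duration (days) | 5 | 5 | 8 | 2 | 4 | 3 | 5 | 7 | 2 | 2 | 4 |
| 3 | Interval data | Affected area (thousand hectares) | | | | | | | | | | | |
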
Table 3 The interpretation and subdivision for categorized and ordinal attributes

| Attribute | Subdivision | Code |
|---|---|---|
| Type of city | Riverine area | 1 |
| | Coastal area | 2 |
| | Alpine zone | 3 |
| | Mountainous region | 4 |
| | No significant geographical features | 5 |
| Type of flooding | Storm water | 1 |
| | Flash flood | 2 |
| | Snowmelt flood | 3 |
| | Jumble ice | 4 |
| | Outburst flood | 5 |
| | Urban waterlog | 6 |
| | Lake flood | 7 |
| | Astronomical tide | 8 |
| | Tsunamis | 9 |
| | Storm surge | 10 |
| Season | Spring: March, April, May | 1 |
| | Summer: June, July, August | 2 |
| | Autumn: September, October, November | 3 |
| | Winter: December, January, February | 4 |
| Intensity of flooding | Drizzle: < 0.1 mm | 1 |
| | Light rain: < 10 mm | 2 |
| | Moderate rain: < 25 mm | 3 |
| | Heavy rain: < 50 mm | 4 |
| | Extremely heavy rain: < 100 mm | 5 |
| | Torrential rain: < 250 mm | 6 |
| | Extremely torrential rain: ≥ 250 mm | 7 |

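The ordinal coding of flooding intensity in Table 3 amounts to a threshold lookup on the 24-hour rainfall. A minimal sketch, assuming each grade covers the rainfall from the previous upper bound up to (but not including) its own, and that grade 7 covers 250 mm and above; the function and constant names are hypothetical.

```python
from bisect import bisect_right

# Upper bounds (mm per 24 h) of grades 1..6 from Table 3; anything at or above
# the last bound is assumed to fall into grade 7.
UPPER_BOUNDS_MM = [0.1, 10, 25, 50, 100, 250]


def intensity_grade(rainfall_mm):
    """Map 24-hour rainfall in millimetres to the ordinal grade 1..7."""
    if rainfall_mm < 0:
        raise ValueError("rainfall cannot be negative")
    return bisect_right(UPPER_BOUNDS_MM, rainfall_mm) + 1


print(intensity_grade(120))  # torrential rain, grade 6
```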
We now conduct the numerical simulations. Using the classification filtration method proposed in Section 2, we filter out the source cases whose similarity on the categorized attributes ranks in the last 30%. According to the results of our simulation, source cases 4, 6 and 10 are screened out, and the rest of the source case base is processed further in the CPIE-CBR model.
Next, we calculate the similarity of the punctiform attributes and the interval attribute in the second and third layers of our model for the source cases that passed the first layer. The results are shown in Table 4.
Table 4 The punctiform similarity and interval similarity for attributes of urban flooding

| Layer | Attributes | Case 1 | Case 2 | Case 3 | Case 5 | Case 7 | Case 8 | Case 9 |
|---|---|---|---|---|---|---|---|---|
| Punctiform similarity | Intensity of flooding | 0.6667 | 0.3333 | 0.6667 | 0.6667 | 1.0000 | 1.0000 | 0.6667 |
| | Affected population | 0.7757 | 0.6346 | 0.8868 | 0.9904 | 0.9184 | 0.7651 | 0.9621 |
| | Transferred population | 0.8344 | 0.9788 | 0.5979 | 0.7205 | 0.9265 | 0.9747 | 0.9747 |
| | Deaths | 0.8889 | 0.8254 | 0.8254 | 0.7302 | 0.3810 | 0.5079 | 0.0000 |
| | Missing population | 0.9548 | 0.8644 | 0.9605 | 0.9774 | 0.9605 | 0.9831 | 0.0565 |
| | Direct economic loss | 0.8627 | 0.9270 | 0.7848 | 0.4973 | 0.9057 | 0.9336 | 0.0730 |
| | Collapsed houses | 0.6245 | 0.6408 | 0.8163 | 0.4776 | 0.7714 | 0.9633 | 0.8531 |
| | Duration | 0.8333 | 0.8333 | 0.3333 | 0.6667 | 1.0000 | 0.8333 | 0.8333 |
| Interval similarity | Affected area | 83.8084 | 67.3888 | 60.0912 | 87.4572 | 96.5792 | 72.0639 | 33.8655 |

Then, using the similarity matrices obtained in the second and third layers, we assign an entropy weight to each punctiform and interval attribute after standardization and normalization (Table 5). Finally, the overall similarities between the seven remaining source cases and the target case are shown in Table 6.
Table 5 The entropy weights for attributes

| Layer | Attributes | Entropy weight |
|---|---|---|
| Punctiform similarity | Intensity of flooding | 0.1203 |
| | Affected population | 0.1233 |
| | Transferred population | 0.1193 |
| | Deaths | 0.1090 |
| | Missing population | 0.0876 |
| | Direct economic loss | 0.1003 |
| | Collapsed houses | 0.1324 |
| | Duration | 0.0977 |
| Interval similarity | Affected area | 0.1102 |

Table 6 The overall similarity between the target case and filtered cases

| No. | Case 1 | Case 2 | Case 3 | Case 5 | Case 7 | Case 8 | Case 9 |
|---|---|---|---|---|---|---|---|
| Overall similarity | 9.9404 | 8.0848 | 7.2750 | 10.2676 | 11.4013 | 8.7148 | 4.2530 |

Table 6 indicates that source case 7 has the highest similarity with the target case, so case 7 is recommended as the reference case for the target case in our simulations. The whole simulation discussed above ran for 56.9559 seconds in Matlab.
To show the strengths of the CPIE-CBR model on urban flooding cases with stratified data, we also test it without the stratified treatment and observe the changes in the final result and the running time.
In the first test, we discard the first layer, classification filtration, but keep the stratification into punctiform similarity and interval similarity. We then put the similarities of all attributes into the original matrix of the entropy weight method (Formula (16)) and obtain the weightings assigned to all attributes, shown in Table 7.
Table 7 Entropy weights for all attributes without classification filtration

| Attribute | Entropy weight |
|---|---|
| Type of city | 0.0835 |
| Type of flooding | 0.1912 |
| Season | 0.1912 |
| Intensity of flooding | 0.0615 |
| Affected population | 0.0615 |
| Transferred population | 0.0534 |
| Deaths | 0.0564 |
| Missing population | 0.0397 |
| Direct economic loss | 0.0794 |
| Collapsed houses | 0.0697 |
| Duration | 0.0607 |
| Affected area | 0.0519 |

We then obtain the overall similarity between the target case and all source cases without classification filtration, shown in Table 8. This time, the simulations take 73.7512 seconds.
Table 8 Overall similarity between the target case and all source cases without classification filtration

| No. | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 | Case 6 | Case 7 | Case 8 | Case 9 | Case 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall similarity | 5.0095 | 4.3272 | 3.9378 | 5.5694 | 5.2266 | 4.8658 | 5.7556 | 4.5254 | 2.2523 | 4.4716 |

The best recommendation is again case 7 when no source case is removed in the first layer, which implies that the classification filtration in our CPIE-CBR model does not change the outcome. However, with the pretreatment of classification filtration in the first layer, the running time drops by 16.7953 seconds (from 73.7512 to 56.9559 seconds), so the efficiency of the whole calculation increases by about 30% when measured against the filtered running time. We can therefore expect that, for a massive source case base, changing the proportion of discarded cases from 30% to 50% or higher should further improve the performance of the CPIE-CBR model. In the emergency management of urgent urban flooding, this model can help decision makers shorten the decision-making time, so that more lives can be saved and unnecessary economic losses reduced in the time gained.
In the second test, we discard not only the first layer, classification filtration, but also the third layer, interval similarity. We therefore take the average of the maximum and minimum values of the affected-area interval as a new punctiform value for this attribute. As a result, the overall similarities change as shown in Table 9.
Table 9 Overall similarity between the target case and all source cases without stratification

| No. | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 | Case 6 | Case 7 | Case 8 | Case 9 | Case 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall similarity | 0.7054 | 0.8662 | 0.8515 | 0.5802 | 0.7351 | 0.5499 | 0.7956 | 0.8243 | 0.5127 | 0.3899 |

Surprisingly, the overall similarities differ from those in the first test and in the original simulation: they now indicate that case 2 has the highest similarity with the target case. For further analysis, we list the raw data of cases 2 and 7 and of the target case in Table 10 for a detailed comparison.
Table 10 The comparison between the target case and case 2, 7 on all attributes

| Type of data | Attribute | Case 2 | Case 7 | Case T |
|---|---|---|---|---|
| Categorized data | Type of city | 1, 4 | 1, 4 | 1 |
| | Type of flooding | 1 | 1 | 1 |
| | Season | 2 | 2 | 2 |
| Ordinal data | Intensity of flooding | 4 | 7 | 6 |
| Exact data | Affected population (ten thousand) | 18.62 | 349.05 | 318 |
| | Transferred population (ten thousand) | 0.51 | 0 | 3.2 |
| | Deaths | 19 | 71 | 8 |
| | Missing population | 34 | 177 | 10 |
| | Direct economic loss (hundred million yuan) | 6.13 | 203.09 | 20.5 |
| | Collapsed houses (ten thousand) | 0.16 | 1.4 | 1.04 |
| | Duration (days) | 5 | 5 | 4 |
| Interval data | Affected area (thousand hectares) | 6.5–7.5 | 160–170 | 145–155 |

In Table 10, the two source cases show no significant difference in type of city, type of flooding, season, transferred population and duration. However, case 7 is more consistent with the target case than case 2 in affected population, collapsed houses and affected area, while it is less consistent in deaths, missing population and direct economic loss. Overall, the CPIE-CBR model is more effective regarding the tradeoff between accuracy and efficiency, especially for interval attributes.
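A short worked case, offered as an illustration rather than as part of the original simulation, shows what the midpoint conversion in the second test discards. Take a source case and a target case with the same interval $[a_1,a_2]$ on an attribute whose range is $[\alpha,\beta]$. Averaging $|y-x|$ over the square $[a_1,a_2]\times[a_1,a_2]$ in Formula (7) gives

$$
\frac{\int_{a_1}^{a_2}\int_{a_1}^{a_2}|y-x|\,dy\,dx}{(a_2-a_1)^2}=\frac{a_2-a_1}{3},
\qquad
sim_{ks}([a_1,a_2],[a_1,a_2])=1-\frac{a_2-a_1}{3(\beta-\alpha)}<1 .
$$

If both intervals are first replaced by their common midpoint, Formula (4) returns a similarity of exactly 1 regardless of the interval width. The interval similarity therefore carries information about the spread of the attribute that the midpoint value cannot represent.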

4 Conclusions

In this study, we investigate the characteristic attributes of urban flooding cases with stratified data and construct the CPIE-CBR model to calculate the similarity between the target case and the source cases. The proposed model processes the source cases through four layers to obtain the overall similarity and provide a recommendation to decision makers. The numerical simulation shows that the CPIE-CBR model offers a valid solution with efficient performance.
Through numerical calculation, we observe the following findings. (i) The CPIE-CBR model runs 30% faster than the algorithm without initial classification filtration. We can therefore provide effective information to decision makers more quickly, which saves valuable time for urban flooding relief. (ii) Interval data differ from punctiform data: if an interval is converted into a punctiform value using only its mean, part of the information in the interval is lost, which affects the accuracy of the similarity calculation. In the third layer of the CPIE-CBR model, the interval similarity calculated through a definite integral gives a more valid match with the target case. (iii) In previous studies of flooding case-based reasoning, the weight of each attribute is often assigned by expert scoring, which introduces subjective factors. This paper therefore uses information entropy to measure the effective information offered by the data and determines the weight of each characteristic attribute of urban flooding by the entropy weight method, making the calculation more rational.
With the worldwide tide of urbanization, more floodplains are being used for urban development. Emergency management for urban flooding is therefore becoming a cornerstone of future urbanization strategy. Urban flooding is characterized by strong uncertainty, rapid expansion and massive destructiveness, so governments must respond within a limited time. If the optimal action is not taken, urban flooding causes more damage to cities and may even lead to secondary disasters. However, because the capacity and knowledge of decision makers are limited, it is difficult for them to grasp comprehensive information in a short time and to analyze the situation quickly enough to make the best decision. Fortunately, many historical urban flooding cases provide experience in disaster prevention and mitigation. The proposed model can be applied to extract similar historical cases and offer the best available reference to decision makers.
However, the operations that follow case matching are also a key process in disaster relief. Studying them is therefore a potential direction for improving research on urban flooding and disaster emergency management.
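Finding (iii) relies on the entropy weight method. A minimal sketch of its common textbook form is given below; the normalization and notation used earlier in the paper may differ in detail. The function name `entropy_weights` and the matrix `example` are hypothetical and are not data from the paper.

```python
import math


def entropy_weights(matrix):
    """Return one weight per column of a non-negative matrix (rows = cases).

    Assumes at least two rows, a positive sum in every column, and at least
    one column whose values are not all equal.
    """
    n = len(matrix)
    m = len(matrix[0])
    k = 1.0 / math.log(n)  # scales the entropy of each column into [0, 1]
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total  # share of this case in the column
            if p > 0:  # treat 0 * ln(0) as 0
                entropy -= k * p * math.log(p)
        divergences.append(1.0 - entropy)  # low entropy means more information
    total_divergence = sum(divergences)
    return [d / total_divergence for d in divergences]


# Hypothetical attribute values for four cases (NOT data from the paper).
example = [
    [0.5, 0.9, 0.1],
    [0.5, 0.8, 0.9],
    [0.5, 0.7, 0.2],
    [0.5, 0.9, 0.8],
]
weights = entropy_weights(example)
print(weights)
```

In this form, an attribute that takes the same value in every case carries no discriminating information and receives a weight of essentially zero, while the attribute whose values vary most across cases receives the largest weight. No expert scoring is involved.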

References

[1] EM-DAT. The OFDA/CRED international disaster database. www.em-dat.net.
[2] State flood control and drought relief headquarters. Bulletin of Flood and Drought Disasters in China. Beijing: The Ministry of Water Resources of the People's Republic of China, 2014.
[3] Liu J. Research on the urban flood vulnerability quantitative model and its dynamic evolution. Harbin: Harbin Institute of Technology, 2014.
[4] Da S M, Shen H Z, Liu H. Research on case-based reasoning combined with rule-based reasoning for emergency. IEEE International Conference on Service Operations and Logistics and Informatics, 2007, 510-514.
[5] Francesco R, Paolo A, Anna P. Cases on fire: applying CBR to emergency management. The New Review of Applied Expert Systems, 1999, 5 (6): 175-190.
[6] Fang W H, Wang J A. Establishment of case databases concerning the historical floods in China. Journal of Beijing Normal University (Natural Science), 1998, (2): 269-275.
[7] Kelsch M. 2.1 COMET Flash flood cases: Summary of characteristics. 2002.
[8] Hadihardaja I K, Indrawati D, Suryadi Y, et al. Decision support system for predicting flood characteristics based on database modelling development (Case study: Upper Citarum, West Java, Indonesia). WIT Transactions on Ecology and the Environment, 2011, 167: 376-385.
[9] Zhang Y J, Zhong Q Y, Ye X, et al. Research on method of emergency aid decision-making based on CBR. Application Research of Computers, 2009, 26 (4): 1412-1415.
[10] Liu Z W, Li L. Research on emergency intelligent decision based on GIS and case-based reasoning. 2010 International Conference on System Science, Engineering Design and Manufacturing Informatization, 2010, 231-234.
[11] Zhong Q Y, Zhang Y J, Qu X F, et al. Research on method of CBR and its application in emergency commanding and decision-making. The 4th International Conference on Wireless Communications, Networking and Mobile Computing, Dalian, 2008, 11787-11790.
[12] Wang F, Huang Q Y. The importance of spatial-temporal issues for case-based reasoning in disaster management. The 18th International Conference on Geoinformatics, Beijing, 2010, 1-5.
[13] Zhao X H, Wu J, Dong H N, et al. Research on textual case similarity algorithm. Journal of Northwest University (Natural Science Edition), 2010, 40 (6): 991-994.
[14] Fan Z P, Li Y H, Wang X, et al. Hybrid similarity measure for case retrieval in CBR and its application to emergency response towards gas explosion. Expert Systems with Applications, 2014, 41 (5): 2526-2534.
[15] Zhao K B, Feng S, Li F. Research of the similarity calculation models based on the features of case properties. Journal of WUT (Information & Management Engineering), 2003, 25 (1): 24-27.

Funding

Beijing Natural Science Foundation (9162003)