Optimization Analysis on Dynamic Reduction Algorithm

Yizhou CHEN, Jiayang WANG

Journal of Systems Science and Information, 2018, Vol. 6, Issue 5: 447-458. DOI: 10.21078/JSSI-2018-447-12
Abstract

On the basis of rough set theory, the strengths of dynamic reduction over traditional non-dynamic methods are elaborated. A systematic treatment of dynamic reduction, from the sampling process to the generation of the reduct set, is presented. A new sampling method is created to avoid the defect of being too subjective. To address the excessive time consumption of the traditional dynamic reduction process, a quick algorithm is proposed under certain constraint conditions. We also prove that the dynamic core possesses the essential characteristics of a reduction core, on the basis of the formalized definition of the multi-layered dynamic core.

Key words

knowledge discovery / rough set / dynamic reduction / machine learning


1 Introduction

The information surrounding us is usually inaccurate, incomplete or uncertain, yet research and analysis both rest on processing that information; to draw conclusions, we must be able to handle rough information. Rough set theory was proposed by Pawlak in 1982[1]. It treats knowledge through partition structures, where each set of a partition is viewed as a concept. The roughness of a concept reflects its classification ability: The rougher the concept, the weaker its classification ability; conversely, the more accurate the concept, the stronger its classification ability and the higher its recognition degree. As a result, a knowledge base can be obtained with which inaccurate or uncertain knowledge can be described.
Rough set theory is a mathematical tool for dealing with inaccurate, inconsistent or incomplete information systems. After decades of research, it has been applied to a variety of areas, especially the medical field[2-5]. A number of classification algorithms based on rough set theory analyze the information system. Data reduction is a fundamental step of knowledge discovery and a significant means of seeking the most simplified and generalized rules in an information system. Various results have been achieved by applying traditional rough set methods to compute the minimal reduct set of a static data system[6]. However, when an information system contains a massive amount of data, especially chaotic data, traditional methods perform comparatively poorly in forecasting previously unseen objects. Finding the most stable reduct set therefore becomes a key problem in reduction analysis[7].
Classic statistical techniques perform remarkably on ideal static data systems, but owing to their sensitivity to noise, heuristic methods are not appropriate for chaotic data. Reduction in standard rough sets is based on the indiscernibility relation[8], which can be changed by even one polluted element, so a completely different reduct set may be obtained. In reality most data is chaotic, especially in incomplete data tables, where the accuracy of the attribute relations plays a vital role. Faced with a massive amount of changing data, static algorithms are not always stable: they cannot reflect local changes and waste time on repeated work.
To solve the problems listed above, the concept of dynamic reduction[9] was proposed by Bazan in 1994. With dynamic reduction, the most stable reduct set of a given decision table can be calculated. The reduct set obtained this way generates rules with stronger generalization ability, which predict unseen objects while better tolerating noise. Moreover, dynamic reduction performs much better on incomplete and chaotic data[10]. Dynamic reducts have also found wide application[11, 12].
The reduct core plays a vital part in rough set theory, for it carries the essential meaning of a given decision table; we have therefore also obtained several results on the dynamic core and proved its properties. Although a number of algorithms work on dynamic reduction, there is little literature on the sampling process, since researchers assume that random sampling of the decision table makes the sample set representative. In this paper, we first point out the flaws in the former sampling method. We then present a new sampling algorithm that can also ascertain the size of the sampling set. Furthermore, to cut down the time consumed in computing dynamic reductions, we propose a quick reduction algorithm with lower computational complexity.

2 Dynamic Core

The idea of dynamic reduction is built on rough set theory. It has an advantage over static methods, especially for massive and chaotic data systems, because the dynamic reduct set contains the reducts that remain most stable under random sampling of the original decision table.
Let $S=(U, C\cup D)$ be a decision table, $U$ be the universe, $D$ be its decision attribute. By $P(S)$ we denote the set of all subtables of $S$. Let $F \subseteq P(S)$; then by $DR(S,F)$ we denote the set
$$DR(S,F) = RED(S,d) \cap \bigcap_{B \in F} RED(B,d), \tag{1}$$
and any element of $DR(S,F)$ is called an $F$-dynamic reduct of $S$.
The definition of the $F$-dynamic reduct can be generalized to the $(F,\lambda)$-dynamic reduct, defined by
$$DR_\lambda(S,F) = \left\{R \in RED(S) : \frac{card(\{B \in F : R \in RED(B)\})}{card(F)} \geq \lambda\right\}. \tag{2}$$
Let $S=(U, C\cup D)$ be a decision table. By $GDR(S,F)$ we denote the set
$$GDR(S,F) = \bigcup_{B \in F} RED(B,d), \tag{3}$$
and by $GDR_\lambda(S,F)$ we denote the set
$$GDR_\lambda(S,F) = \left\{R \subseteq C : \frac{card(\{B \in F : R \in RED(B)\})}{card(F)} \geq \lambda\right\}. \tag{4}$$
Any element of $GDR_\lambda(S,F)$ is called an $(F,\lambda)$-generalized dynamic reduct of the decision table $S$, and the number $\frac{card(\{B \in F : R \in RED(B)\})}{card(F)}$ is called its stability coefficient. Since $GDR_\lambda(S,F)$ is used frequently in dynamic reduct theory, its elements are called generalized reducts for short. Similarly, we can define the notion of core in dynamic reduction theory just as in static rough set theory.
If $R \in \bigcup_{B \in F} RED(B,d)$, then the number
$$SCS(F,R) = \frac{card(\{B \in F : R \in RED(B)\})}{card(F)} \tag{5}$$
is called the stability coefficient of the generalized dynamic reduct $R$ (relative to $F$).
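To make these set operations concrete, here is a minimal Python sketch of Formulas (1) and (3)-(5); it assumes the reduct sets of the original table and of every subtable in $F$ have already been produced by some external reduct algorithm, and represents each reduct as a frozenset of attribute names (all function names are illustrative, not the authors' implementation):

```python
from typing import FrozenSet, List, Set

Reduct = FrozenSet[str]

def stability_coefficient(R: Reduct, subtable_reducts: List[Set[Reduct]]) -> float:
    """Formula (5): the fraction of subtables B in F with R in RED(B)."""
    hits = sum(1 for red_b in subtable_reducts if R in red_b)
    return hits / len(subtable_reducts)

def f_dynamic_reducts(reducts_of_S: Set[Reduct],
                      subtable_reducts: List[Set[Reduct]]) -> Set[Reduct]:
    """Formula (1): reducts of S that are also reducts of every subtable in F."""
    common = set(reducts_of_S)
    for red_b in subtable_reducts:
        common &= red_b
    return common

def generalized_dynamic_reducts(subtable_reducts: List[Set[Reduct]],
                                lam: float) -> Set[Reduct]:
    """Formulas (3)-(4): subtable reducts whose stability coefficient >= lambda."""
    candidates: Set[Reduct] = set().union(*subtable_reducts)  # GDR(S, F)
    return {R for R in candidates
            if stability_coefficient(R, subtable_reducts) >= lam}
```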
Attribute reduction is one of the key topics in rough set theory, and the attribute core is of considerable significance for computing the reduct set. The elements of the attribute core exist in every single attribute reduct; otherwise the indiscernibility relation of the information system cannot be preserved. A number of heuristic algorithms exploit this feature to construct the reduct set starting from the attribute core[13], consuming less time and thus improving efficiency. We take the precision coefficient into account to elaborate the features of the dynamic core, which completes the theoretical system and lays a foundation for deeper discussion of dynamic reducts and the dynamic core.
A great deal of literature gives a thorough discussion of the static attribute core of the traditional static reduct set of a decision table: the core set consists of the most stable attributes in the information system, and if any element belonging to the core is eliminated from the attribute set, the indiscernibility relation will change[14].
The dynamic core is still worth discussing. Due to the limitations of the sample, the attribute core of an information system can be unstable and sensitive to random changes of the objects. The dynamic reduct set consists of, to some extent, the most stable reducts. So it is necessary to seek the dynamic core in order to obtain those reducts with high reliability, which possess better generalization ability and robustness.
Let $S=(U, C\cup D)$ be a decision table. We denote the set
$$CORE(S) = \bigcap_{R \in RED(S)} R \tag{6}$$
as the reduct core of $S$.
The reduct core of any subtable $B \subseteq S$ is denoted as
$$CORE(B) = \bigcap_{R \in RED(B)} R. \tag{7}$$
By $DCORE(S,F)$ we denote the set
$$DCORE(S,F) = CORE(S) \cap \bigcap_{B \in F} CORE(B) \tag{8}$$
as the $F$-dynamic core of $S$ (relative to $F$).
Theorem 1 Let $S=(U, C\cup D)$ be a decision table, $U$ be the universe, $D$ be its decision attribute, and $F \subseteq P(S)$. For a given decision table, the union of the $F$-dynamic reducts includes its $F$-dynamic core.
Proof.
From Formula (1), we know that
$$DR(S,F) = RED(S) \cap \bigcap_{B \in F} RED(B),$$
thus $DR(S,F) \subseteq RED(S)$ and $DR(S,F) \subseteq RED(B)$ for any $B \in F$. Since every reduct contains the intersection of all reducts of the same table,
$$\bigcup_{R \in DR(S,F)} R \supseteq \bigcap_{R \in RED(S)} R, \qquad \bigcup_{R \in DR(S,F)} R \supseteq \bigcap_{R \in RED(B)} R,$$
which can also be written as
$$\bigcup_{R \in DR(S,F)} R \supseteq CORE(S), \qquad \bigcup_{R \in DR(S,F)} R \supseteq CORE(B).$$
Then,
$$\bigcup_{R \in DR(S,F)} R \supseteq DCORE(S,F). \tag{9}$$
From Theorem 1, we can see that the $F$-dynamic core exists in the union of all $F$-dynamic reducts. As the most stable core of a decision table, the dynamic core possesses the features of the core and can serve as the basis for computing the reduct set heuristically. The definition of the $F$-dynamic core can sometimes be too strict, so in order to apply it to noisy data processing we need a more general notion, the $(F,\lambda)$-dynamic core, where $\lambda \in (0.5, 1]$. The definition is as follows.
Let $S=(U, C\cup D)$ be a decision table, $U$ be the universe, $D$ be its decision attribute, and $F \subseteq P(S)$:
$$DCORE_\lambda(S,F) = \left\{a \in CORE(S) : \frac{|\{B \in F : a \in CORE(B)\}|}{|F|} \geq \lambda\right\}. \tag{10}$$
Let $F \subseteq P(S)$ and $\lambda \in (0.5, 1]$, and let $RED(S)$ denote the set of all reducts of the decision table $S$. $DCORE(S,F)$ and $DCORE_\lambda(S,F)$ denote the $F$-dynamic core and the $(F,\lambda)$-dynamic core respectively. Apparently, the following properties hold.
1) If $F=\{S\}$, then $DCORE(S,F) = CORE(S)$;
2) $DCORE_1(S,F) = DCORE(S,F)$;
3) If $\lambda_1 \leq \lambda_2$, then $DCORE_{\lambda_1}(S,F) \supseteq DCORE_{\lambda_2}(S,F)$.
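Continuing the sketch above, the core operators of Formulas (6)-(8) and (10) reduce to intersections over the same precomputed reduct sets (again an illustrative sketch, not the authors' implementation):

```python
def core(reducts: Set[Reduct]) -> FrozenSet[str]:
    """Formulas (6)-(7): the intersection of all reducts of a table."""
    return frozenset(set.intersection(*map(set, reducts)))

def dynamic_core(reducts_of_S: Set[Reduct],
                 subtable_reducts: List[Set[Reduct]]) -> FrozenSet[str]:
    """Formula (8): CORE(S) intersected with the core of every subtable in F."""
    acc = set(core(reducts_of_S))
    for red_b in subtable_reducts:
        acc &= core(red_b)
    return frozenset(acc)

def dynamic_core_lambda(reducts_of_S: Set[Reduct],
                        subtable_reducts: List[Set[Reduct]],
                        lam: float) -> FrozenSet[str]:
    """Formula (10): attributes of CORE(S) lying in at least lambda*|F| subtable cores."""
    n = len(subtable_reducts)
    return frozenset(
        a for a in core(reducts_of_S)
        if sum(1 for red_b in subtable_reducts if a in core(red_b)) / n >= lam)
```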
Theorem 2 Let $S=(U, C\cup D)$ be a decision table, $U$ be the universe, $D$ be its decision attribute, and $F \subseteq P(S)$. For a given decision table, the union of the $(F,\lambda)$-dynamic reducts includes its $(F,\lambda)$-dynamic core.
Proof. From Formula (10), we know that
$$DCORE_\lambda(S,F) \subseteq CORE(S).$$
And from Formula (2),
$$DR_\lambda(S,F) = \left\{R \in RED(S) : \frac{card(\{B \in F : R \in RED(B)\})}{card(F)} \geq \lambda\right\}.$$
Apparently $DR_\lambda(S,F) \subseteq RED(S)$; then
$$\bigcup_{R \in DR_\lambda(S,F)} R \supseteq \bigcap_{R \in RED(S)} R = CORE(S).$$
So,
$$\bigcup_{R \in DR_\lambda(S,F)} R \supseteq DCORE_\lambda(S,F). \tag{11}$$

3 Generalized Dynamic Core

The definition of the $F$-dynamic core can sometimes be too strict, for it requires the $F$-dynamic core to be included in the core of the decision table $S$, so we also apply a generalization of the $F$-dynamic core, called the $F$-generalized dynamic core and denoted $GDCORE(S,F)$:
$$GDCORE(S,F) = \bigcap_{B \in F} CORE(B). \tag{12}$$
Similarly, we propose the definition of the $(F,\lambda)$-generalized dynamic core:
$$GDCORE_\lambda(S,F) = \left\{a \in C : \frac{|\{B \in F : a \in CORE(B)\}|}{|F|} \geq \lambda\right\}. \tag{13}$$
Similarly, it is called the generalized reduct core for short.
From Theorem 1, Theorem 2 and Formula (13), the following properties can easily be proved (a small worked example follows the list).
Let $S=(U, C\cup D)$ be a decision table, $U$ be the universe, $D$ be its decision attribute, $F \subseteq P(S)$, $\lambda \in (0.5, 1]$:
1) $DCORE(S,F) \subseteq GDCORE(S,F)$;
2) $DCORE_\lambda(S,F) \subseteq GDCORE_\lambda(S,F)$;
3) If $S \in F$, then $DCORE(S,F) = GDCORE(S,F)$.
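For instance (a hypothetical illustration), take $F=\{B_1, B_2\}$ with $CORE(S)=\{a\}$, $CORE(B_1)=\{a,c\}$ and $CORE(B_2)=\{a,c\}$. Then $DCORE(S,F) = \{a\} \cap \{a,c\} \cap \{a,c\} = \{a\}$, while $GDCORE(S,F) = \{a,c\} \cap \{a,c\} = \{a,c\}$, so $DCORE(S,F) \subsetneq GDCORE(S,F)$, in line with property 1).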
Theorem 3 Let $S=(U, C\cup D)$ be a decision table, $U$ be the universe, $D$ be its decision attribute, $F \subseteq P(S)$, $\lambda \in (0.5, 1]$:
1) The union of the $F$-generalized dynamic reducts includes the $F$-generalized dynamic core;
2) The union of the $(F,\lambda)$-generalized dynamic reducts includes the $(F,\lambda)$-generalized dynamic core.
Proof. 1) From Formula (3), we know that
$$GDR(S,F) = \bigcup_{B \in F} RED(B).$$
For any $B \in F$, $GDR(S,F) \supseteq RED(B)$, so
$$\bigcup_{R \in GDR(S,F)} R \supseteq \bigcap_{R \in RED(B)} R = CORE(B) \supseteq \bigcap_{B \in F} CORE(B).$$
Then
$$\bigcup_{R \in GDR(S,F)} R \supseteq GDCORE(S,F). \tag{14}$$
2) By definition, any attribute $a \in GDCORE_\lambda(S,F)$ fulfills the condition $\frac{|\{B \in F : a \in CORE(B)\}|}{|F|} \geq \lambda$. Suppose $a$ exists in the cores of $k$ subtables, that is, $k = |\{B \in F : a \in CORE(B)\}|$ with $k \geq \lambda |F|$; then $a \in CORE(B_1), a \in CORE(B_2), \ldots, a \in CORE(B_k)$, so
$$a \in \bigcap_{R \in RED(B_1)} R, \quad a \in \bigcap_{R \in RED(B_2)} R, \quad \ldots, \quad a \in \bigcap_{R \in RED(B_k)} R.$$
From the definition of the $(F,\lambda)$-generalized dynamic reduct,
$$GDR_\lambda(S,F) = \left\{R \subseteq C : \frac{|\{B \in F : R \in RED(B)\}|}{|F|} \geq \lambda\right\}.$$
Since $\lambda > 0.5$, any $R \in GDR_\lambda(S,F)$ is a reduct of more than half of the subtables in $F$, while $a$ lies in the cores of at least $\lambda|F|$ of them, so the two collections of subtables must overlap:
$$R \in RED(B_1) \cup RED(B_2) \cup \cdots \cup RED(B_k),$$
i.e., $R \in RED(B_i)$ for some $i$, whence $a \in CORE(B_i) \subseteq R$. Therefore,
$$\bigcup_{R \in GDR_\lambda(S,F)} R \supseteq GDCORE_\lambda(S,F). \tag{15}$$
Let $S=(U, C\cup D)$ be a decision table, $F \subseteq P(S)$, $B \in F$; the properties below can be concluded.
1) $DCORE(S,F) \subseteq CORE(S)$;
2) $DCORE_\lambda(S,F) \subseteq CORE(S)$;
3) $GDCORE(S,F) \subseteq CORE(S)$;
4) $GDCORE_\lambda(S,F) \subseteq CORE(S)$;
5) $DCORE(S,F) \subseteq GDCORE(S,F) \subseteq CORE(S)$.
By comparing the dynamic core with the static core, we can observe that the dynamic core is included in the static core, which indicates that the dynamic core is more stable and characterizes the information system more reliably.

4 Statistical Sampling Analysis

Two methods were formerly proposed to calculate the size of $F$: one based on the trace of subtables and the other based on the probability of each subtable, both of which suffer from being too subjective. To overcome this setback, Bazan proposed an algorithm to calculate $|F|$ when the probability of $R$ existing in each subtable is unknown. First, the family of all subtables of the universal decision table is denoted by $G$, and the probability of $R$ in each subtable of $G$ is denoted $P_G(R)$. Since $P_G(R)$ cannot actually be obtained, the stability coefficient of the generalized dynamic reduct $R$ of the decision table $S$ is taken as the maximum likelihood estimator of the probability $P_G(R)$[15].
From the de Moivre-Laplace theorem, we know that
$$\frac{MLE(P_G(R)) - P_G(R)}{\sqrt{\dfrac{MLE(P_G(R))\,(1 - MLE(P_G(R)))}{card(F)}}} \tag{16}$$
has approximately a standard normal distribution. Hence,
$$P\left[-t_{\alpha/2} < \frac{MLE(P_G(R)) - P_G(R)}{\sqrt{\dfrac{MLE(P_G(R))\,(1 - MLE(P_G(R)))}{card(F)}}} < t_{\alpha/2}\right] = 1-\alpha. \tag{17}$$
If $\Delta MLE(P_G(R))$ is the maximal acceptable estimation error of $MLE(P_G(R))$, then
$$card(F) \geq \frac{t_{\alpha/2}^2\, P_G(R)(1-P_G(R))}{(\Delta MLE(P_G(R)))^2}. \tag{18}$$
Clearly, $P_G(R)(1-P_G(R))$ attains its maximum value of $0.25$ at $P_G(R)=0.5$. So,
$$card(F) \geq \frac{t_{\alpha/2}^2}{4\,(\Delta MLE(P_G(R)))^2}. \tag{19}$$
However, observe that the value of $\frac{t_{\alpha/2}^2 P_G(R)(1-P_G(R))}{(\Delta MLE(P_G(R)))^2}$ is the same for $P_G(R)=0.1$ and $P_G(R)=0.9$, which is not reasonable. The practical meaning of $P_G(R)$ is the probability of the reduct $R$ existing in an arbitrary subtable, so we want to obtain those $R$'s with $P_G(R)$ as large as possible. When the value of $P_G(R)$ changes, the value of $|F|$ should change accordingly, instead of remaining constant.
We propose a new method for computing the size of the subtable family $|F|$. First, a new measure of the similarity between a subtable and the original decision table is given, based on which the stability of the family $F$ is defined. Then the new algorithm for confirming the size of $|F|$ is elaborated, and several parameters are given to evaluate the quality of the family $F$.
The stability of the dynamic reduct set shows how effective the sample set is, and whether it is as representative as the original decision table. From rough set theory, we know that the relative positive region represents the classification ability of a given decision table, so we can compare the positive regions of a subtable and of the original decision table to see whether they are similar, and thereby evaluate the whole sample set.
Let $S=(U, C\cup D)$ be a decision table, and let the partitions of $U$ (relative to $D$ and $C$ respectively) be $U/D = \{Y_1, Y_2, \ldots, Y_l\}$ and $U/C = \{X_1, X_2, \ldots, X_k\}$, $l, k \leq |U|$. Then
$$POS(U,C,D) = \bigcup \{X \in U/C \mid \exists Y \in U/D,\ X \subseteq Y\}. \tag{20}$$
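As an illustration of Formula (20), a small Python sketch follows; it assumes objects are dictionaries mapping attribute names to values (an assumed representation) and returns the positive region as a set of object indices:

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set

Obj = Dict[str, Hashable]

def partition(objects: List[Obj], attrs: List[str]) -> List[Set[int]]:
    """Indiscernibility classes: group object indices by their values on attrs."""
    blocks: Dict[tuple, Set[int]] = defaultdict(set)
    for i, obj in enumerate(objects):
        blocks[tuple(obj[a] for a in attrs)].add(i)
    return list(blocks.values())

def positive_region(objects: List[Obj],
                    cond_attrs: List[str],
                    dec_attrs: List[str]) -> Set[int]:
    """Formula (20): union of the U/C blocks contained in some U/D block."""
    dec_blocks = partition(objects, dec_attrs)
    pos: Set[int] = set()
    for block in partition(objects, cond_attrs):
        if any(block <= y for y in dec_blocks):
            pos |= block
    return pos
```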
Let $S=(U, C\cup D)$ be a decision table, $F \subseteq P(S)$, and let $S' = (U_{S'}, C\cup D)$ be the full sample decision table, where $U_{S'} = \bigcup\{U_B \mid B = (U_B, C\cup D) \in F\}$. Then the parameter evaluating the similarity of the subtable $B$ to the decision table $S$ is defined as follows:
$$\beta = \frac{|POS(U_B, C, D)|}{|POS(U_{S'}, C, D)|}, \quad \text{where } |POS(U_{S'}, C, D)| \neq 0. \tag{21}$$
Considering that the cardinalities of $B$ and $S'$ differ, the size of the universe has to be taken into account when comparing sets of different sizes. So the criterion can be transformed as follows: when
$$\frac{|POS(U_B, C, D)|}{|U_B|} \approx \frac{|POS(U_{S'}, C, D)|}{|U_{S'}|}, \tag{22}$$
the decision tables $B$ and $S'$ have the same representativeness. When $|U_B| \neq 0$ and $|U_{S'}| \neq 0$, this can be transformed into
$$\beta \approx \frac{|U_B|}{|U_{S'}|}. \tag{23}$$
Let $S=(U, C\cup D)$ be a decision table, $F \subseteq P(S)$, $S' = (U_{S'}, C\cup D)$, $B \in F$, $U_B \subseteq U$, $U_{S'} \subseteq U$. The relative stability parameter of the sample family $F$ is
$$SCS_{POS}(F) = \frac{\left|\left\{B \in F : \beta \approx \frac{|U_B|}{|U_{S'}|}\right\}\right|}{|F|}. \tag{24}$$
It can easily be observed that the following properties hold (a short computational sketch follows the list):
1) $0 \leq SCS_{POS}(F) \leq 1$;
2) When $0 \leq SCS_{POS}(F) < 0.5$, the present sample set is not proper, so we should apply a sample set meeting $0.5 \leq SCS_{POS}(F) \leq 1$.
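Reusing positive_region from the sketch above, Formulas (21)-(24) can be evaluated as follows; subtables are represented as sets of row indices into the original table, and the approximate equality of Formula (23) is checked within an explicit tolerance tol, our own assumption since the text only requires the two quantities to be close:

```python
def scs_pos(U: List[Obj], family: List[Set[int]],
            cond_attrs: List[str], dec_attrs: List[str],
            tol: float = 0.05) -> float:
    """Formula (24): the fraction of subtables B whose similarity parameter beta
    (Formula (21)) is close to |U_B| / |U_S'| (Formula (23))."""
    union_idx = set().union(*family)                    # U_S', the full sample set
    full = [U[i] for i in sorted(union_idx)]
    pos_full = len(positive_region(full, cond_attrs, dec_attrs))
    hits = 0
    for idx in family:
        sub = [U[i] for i in sorted(idx)]
        beta = len(positive_region(sub, cond_attrs, dec_attrs)) / pos_full
        if abs(beta - len(idx) / len(union_idx)) <= tol:
            hits += 1
    return hits / len(family)
```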
The algorithm for confirming the size of $F$, based on the concepts presented above, is as follows:
Algorithm 1: Quick sampling algorithm in dynamic reduction
Input
1) The original decision table $S=(U, C\cup D)$;
2) The threshold of $SCS_{POS}(F)$, denoted $\alpha$, e.g., $\alpha = 0.9$.
Output
1) The value of $|F|$;
2) All the subtables in the family $F$.
Step 1 Extract a set of random subtables from $U$, denoted $F$;
Step 2 Calculate $SCS_{POS}(F)$;
Step 3 Compare $SCS_{POS}(F)$ with $\alpha$; if $SCS_{POS}(F) > \alpha$, go to Step 7, otherwise go to Step 4;
Step 4 Randomly choose an integer from 1 to $|U|$ as the size of a new subtable sample extracted from the original decision table; the new subtable must differ from those already in $F$;
Step 5 Add the new subtable extracted in Step 4 to $F$;
Step 6 Go to Step 2;
Step 7 Output $|F|$ and all the subtables.
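A minimal sketch of Algorithm 1 built on scs_pos above; the initial family size and the cap on iterations are safeguards we add for illustration, not part of the algorithm as stated:

```python
import random

def quick_sampling(U, cond_attrs, dec_attrs,
                   alpha=0.9, init_size=10, max_rounds=1000):
    """Algorithm 1: grow a family of random subtables (as index sets) until
    SCS_POS(F) exceeds the threshold alpha."""
    n = len(U)
    # Step 1: extract an initial family of random subtables
    family = [set(random.sample(range(n), random.randint(1, n)))
              for _ in range(init_size)]
    for _ in range(max_rounds):
        # Steps 2-3: stop once the stability of the family is high enough
        if scs_pos(U, family, cond_attrs, dec_attrs) > alpha:
            break
        # Step 4: draw a new subtable of random size, distinct from those in F
        while True:
            new = set(random.sample(range(n), random.randint(1, n)))
            if new not in family:
                break
        family.append(new)            # Step 5, then loop back to Step 2 (Step 6)
    # Step 7: output |F| and the subtables themselves
    return len(family), family
```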
Generally speaking, the more subtables the family $F$ contains, the more stable the resulting reduct set will be. But by the de Moivre-Laplace theorem, once the size of the family reaches a threshold, the effect of extracting further subtables becomes negligible.
However, another aspect must be emphasized: the quality of the sample set. The distribution of the sample set greatly affects the result of the reduction. We introduce several rates to evaluate the quality of a computed sample set.
Let $S=(U, C\cup D)$ be a decision table, $F \subseteq P(S)$, and $S' = (U_{S'}, C\cup D)$ the full sample decision table with $U_{S'} = \bigcup\{U_B \mid B = (U_B, C\cup D) \in F\}$. The central sample decision table is defined as $S_{CF} = (U_{CF}, C\cup D)$, where $U_{CF} = \bigcap\{U_B \mid B = (U_B, C\cup D) \in F\}$, namely the central sample set.
1) Entire sampling rate $r_{es} = \frac{|U_{S'}|}{|U|}$.
The value of $r_{es}$ intuitively reflects how many objects have been taken into account in the final decision. When $r_{es}$ is too low, the given sample set is relatively concentrated and not appropriate for representing the given decision table. Clearly, $0 \leq r_{es} \leq 1$.
2) Entire equilibrium rate $r_{ee} = \frac{|U_{CF}|}{|U_{S'}|}$.
The value of $r_{ee}$ represents the overlapping coefficient of the sample set. If it is too big, too many repeated objects have been extracted during the sampling process, which reduces the efficiency of the result. Clearly, $0 \leq r_{ee} \leq 1$.
3) Local sampling rate $r_{ls} = \frac{|U_B|}{|U_{S'}|}$.
The value of $r_{ls}$ describes how much each subtable contributes to the whole sampling set. When some subtables have a relatively high $r_{ls}$, the sample set will not be proper, owing to the high proportion of local repetition, even if $r_{es}$ is acceptable. Clearly, $0 \leq r_{ls} \leq 1$.
4) Local equilibrium rate $r_{le} = \frac{|U_{CF}|}{|U_B|}$.
$r_{le}$ shows the containment degree of $U_{CF}$ in $U_B$. When a certain number of subtables have an over-high $r_{le}$, the sampling set is not efficient even if the entire equilibrium rate meets the demand. Clearly, $0 \leq r_{le} \leq 1$.
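Under the same index-set representation of subtables, the four rates take only a few lines (a sketch; $U_{CF}$ is taken as the intersection of the subtable universes, as defined above):

```python
def sampling_rates(U, family):
    """Compute r_es, r_ee and the per-subtable r_ls, r_le of a sample family."""
    union_idx = set().union(*family)                  # U_S'
    central = set.intersection(*family)               # U_CF
    r_es = len(union_idx) / len(U)                    # entire sampling rate
    r_ee = len(central) / len(union_idx)              # entire equilibrium rate
    r_ls = [len(b) / len(union_idx) for b in family]  # local sampling rates
    r_le = [len(central) / len(b) for b in family]    # local equilibrium rates
    return r_es, r_ee, r_ls, r_le
```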

5 The Quick Generalized Dynamic Reduction Algorithm

Traditional reduction algorithms require calculating all the reducts of the subtables in the family $F$, and then comparing the stability coefficient of each reduct with the threshold to see whether it qualifies as a dynamic reduct. However, calculating in this way consumes both time and space. Based on our research, it is not necessary to calculate all the reducts: the dynamic reducts can be obtained by computing the reducts of only part of the subtables in $F$. The definition of the generalized dynamic reduct is as follows:
Let $S=(U, C\cup D)$ be a decision table, $U$ be the universe, $D$ be its decision attribute, $F \subseteq P(S)$, $\lambda \in (0.5, 1]$; then by $GDR_\lambda(S,F)$ we denote the set
$$GDR_\lambda(S,F) = \left\{R \subseteq C : \frac{card(\{B \in F : R \in RED(B)\})}{card(F)} \geq \lambda\right\} \tag{25}$$
of $(F,\lambda)$-generalized dynamic reducts of $S$.
From this definition, a reduct $R$ of some subtable is a dynamic reduct if the rate at which it appears as a reduct of the subtables in $F$ is no smaller than the threshold $\lambda$. On the other hand, it is much easier to judge whether a given attribute set $R$ is a reduct of a decision table than to calculate the reducts of that table: we only need to count the frequency and compare it with the threshold $\lambda$.
Let $S=(U, C\cup D)$ be a decision table, $U$ be the universe, $D$ be its decision attribute. Let the sets $F_1$, $F_2$ and $F_3$ meet the following requirements: $F_1 \subseteq P(S)$, $F_2 \subseteq P(S)$, $F_3 \subseteq P(S)$, $F_1 \cap F_2 = \emptyset$, $F_1 \cup F_2 = F$. The set $F_1$ contains the subtables whose reducts have to be calculated fully; $F_3$ is the union of $F_1$ and the subtables of $F_2$ that have already been judged, with initial status $F_3 = F_1$. Given that the size of the family $F$ is already known, the key problem is to ascertain the cardinalities of $F_1$ and $F_2$:
$$|F_1| = |F| - \lceil |F|\lambda \rceil + 1, \qquad |F_2| = |F| - |F_1| = \lceil |F|\lambda \rceil - 1, \qquad F_1 \subseteq F_3,\ F_2 \cap F_3 \neq \emptyset \ \text{(except in the initial status)}.$$
It is obvious that reducts not belonging to the family $F$ cannot be dynamic reducts. When $\lambda \in (0.5, 1]$, the inequality $|F_1| < |F_2|$ holds, so we only need to calculate the reducts of the smaller part of the subtables to obtain the dynamic reducts.
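For example, with $|F| = 50$ and $\lambda = 0.9$, the formulas above give $|F_1| = 50 - 45 + 1 = 6$ and $|F_2| = 44$: the full reduct computation runs on only 6 subtables, while the remaining 44 need only the cheaper membership test.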
It is easy to prove that if $R$ is a generalized dynamic reduct of the decision table $S=(U, C\cup D)$, then $R$ is also a reduct of at least one decision table in the family $F_1$.
Proof. Let $NUM(F,R)$ denote the number of members of $F$ among whose reducts $R$ appears. Suppose some reduct $R$ in the set $GDR_\lambda(S,F)$ is not a reduct of any member of $F_1$; then
$$SC(R) = \frac{NUM(F,R)}{|F|} = \frac{NUM(F_2,R)}{|F|} \leq \frac{|F_2|}{|F|} = \frac{\lceil |F|\lambda \rceil - 1}{|F|} < \frac{|F|\lambda}{|F|} = \lambda.$$
The stability coefficient would be smaller than the threshold $\lambda$, which is impossible.
To decide whether a given attribute set $R$ is a generalized dynamic reduct, we need criteria for judging whether $R$ is effective. To raise the efficiency of this judgment, we can use the information already gathered in $F_3$ to estimate the best possible frequency of $R$ in $F$ and compare it with the stability threshold. If this estimate is smaller than the threshold, $R$ is definitely not a generalized dynamic reduct; otherwise, further checks are needed to make the final decision.
Let $S=(U, C\cup D)$ be a decision table, $U$ be the universe, $D$ be its decision attribute, and let the sets $F_1$, $F_2$, $F_3$ satisfy $F_1 \subseteq P(S)$, $F_2 \subseteq P(S)$, $F_3 \subseteq P(S)$, $F_1 \cap F_2 = \emptyset$, $F_1 \cup F_2 = F$. Three ratios, the present ratio $\eta_p$, the optimal ratio $\eta_o$ and the real ratio $\eta_r$, are defined respectively as
$$\eta_p = \frac{NUM(F_3,R)}{|F|}, \qquad \eta_o = \frac{NUM(F_3,R) + (|F| - |F_3|)}{|F|}, \qquad \eta_r = \frac{NUM(F_1,R) + NUM(F_2,R)}{|F|}.$$
Obviously, if the optimal ratio $\eta_o$ of a reduct $R$ is smaller than the given stability threshold, then $R$ cannot be a generalized dynamic reduct. Based on the analysis above, we propose a quick algorithm for calculating the set of generalized dynamic reducts.
Algorithm 2: Quick generalized dynamic reduction algorithm
Input $S=(U, C\cup D)$, $\lambda$, $|F|$.
Output The $(F,\lambda)$-generalized dynamic reduct set of $S$.
Step 1 Calculate the cardinalities of $F_1$ and $F_2$, namely $|F_1|$ and $|F_2|$, and extract $|F|$ subtables from $S$ to form the original family $F$;
Step 2 Take $|F_1|$ subtables from $F$ to form the set $F_1$; the rest form $F_2$;
Step 3 Calculate the reducts of the tables in $F_1$ and put them into the candidate set; the cardinality of the candidate set is $K$, $i=0$;
Step 4 Take the next reduct $R$ from the candidate set. $F_3 = F_1$, $NUM(F_3,R) = NUM(F_1,R)$, $i{+}{+}$, $j=0$;
Step 5 Take a table $B$ from $F_2$, $j{+}{+}$, $F_3 = F_3 \cup \{B\}$;
Step 6 If $R \in RED(B)$: $NUM(F_3,R){+}{+}$, calculate $\eta_p$ and $\eta_o$; if $j < |F_2|$, go to Step 5, otherwise go to Step 8. If $R \notin RED(B)$, go to Step 7;
Step 7 Recalculate $\eta_o$; if $\eta_o < \lambda$, discard $R$ and go to Step 4; otherwise, go to Step 5;
Step 8 $\eta_r = \eta_p$. If $\eta_r \geq \lambda$: add $R$ into the generalized dynamic reduct set. If $i < K$, go to Step 4;
Step 9 Output the generalized dynamic reduct set.
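The control flow of Algorithm 2 can be sketched as follows; compute_reducts(B) (a full reduct algorithm) and is_reduct(R, B) (the cheap membership test of a candidate against a subtable) are assumed helpers, and only the pruning logic based on the optimal ratio is shown:

```python
import math
from typing import Callable, FrozenSet, List, Set

Reduct = FrozenSet[str]

def quick_gdr(family: List, lam: float,
              compute_reducts: Callable[[object], Set[Reduct]],
              is_reduct: Callable[[Reduct, object], bool]) -> Set[Reduct]:
    """Algorithm 2: generalized dynamic reducts with eta_o-based pruning."""
    f = len(family)
    n1 = f - math.ceil(f * lam) + 1                 # |F1|, as derived above
    F1, F2 = family[:n1], family[n1:]               # Steps 1-2
    reducts_F1 = [compute_reducts(B) for B in F1]   # Step 3: candidates from F1 only
    candidates: Set[Reduct] = set().union(*reducts_F1)
    result: Set[Reduct] = set()
    for R in candidates:                            # Step 4
        hits = sum(1 for red in reducts_F1 if R in red)   # NUM(F1, R)
        checked = len(F1)                           # |F3|
        pruned = False
        for B in F2:                                # Step 5
            checked += 1
            if is_reduct(R, B):                     # Step 6: cheap membership test
                hits += 1
            else:
                # Step 7: eta_o = best frequency still reachable for R
                if (hits + (f - checked)) / f < lam:
                    pruned = True
                    break                           # R can no longer reach lambda
        if not pruned and hits / f >= lam:          # Step 8: real ratio test
            result.add(R)
    return result                                   # Step 9
```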
Based on our research, for a given decision table $S=(U, C\cup D)$ with $m = card(C)$, the time complexity of calculating all the reducts is $O(2^m \cdot m \cdot |U|^2)$, while the time complexity of judging whether a given attribute set is a reduct of the decision table is $O(|U|^2)$. So for traditional reduction algorithms, the time complexity of processing the family $F_2$ is $O\!\left(2^m \cdot m \cdot \sum_{i=1}^{|F_2|} |B_i|^2\right)$, whereas for the quick algorithm it is $O(K \cdot |F_2| \cdot |U|^2)$, where $K$ is the cardinality of the candidate reduct set obtained from $F_1$.

6 Experimental Analyses

We selected several information tables from the UCI database and compared the results of the former algorithm, run in the ROSETTA software and denoted R, with our method, denoted A. The results are presented in Table 1.
Table 1 Comparison of experimental results between our algorithm (A) and ROSETTA (R)

Database                  O(n)  A(n)  |F| (R/A)  |RED| (R/A)  LEN(n) (R/A)  N(r) (R/A)    LEN(r) (R/A)
Auto original             405   9     50/29      26/25        3.385/3.331   10436/9775    3.385/3.211
Breast cancer             197   11    50/24      141/130      3.262/3.002   15003/12332   3.249/3.012
CPU performance machine   208   10    50/27      136/112      4.404/4.102   27127/18004   4.399/4.013
Echo cardiogram           63    13    50/12      74/52        2.338/2.043   3787/3214     2.381/2.011
Flag                      24    11    50/12      181/152      3.945/3.233   3153/2756     4.083/3.356
Glass                     213   11    50/26      35/20        2.286/2.280   7222/5801     2.288/2.286
Servo                     166   5     50/23      1/1          4.000/4.000   166/166       4.000/4.000
Wine                      177   14    50/20      178/89       2.646/2.444   31282/28046   2.648/2.454
Zoo                       100   18    50/18      1065/813     6.255/6.101   25625/24610   6.424/6.121

*The trace method was applied for sampling in R;
O(n): the number of objects;
A(n): the number of attributes;
|RED|: the number of reducts;
LEN(n): the average length of reducts;
N(r): the number of rules;
LEN(r): the average length of rules.
It is clear that the sample set of our method is markedly smaller than that of the ROSETTA method. Moreover, comparing all the other parameters in the table, our method is consistently slightly better than the ROSETTA method. In conclusion, our method obtains a better result with a smaller sample set, which demonstrates its efficiency.

7 Conclusion

The dynamic core is the most stable attribute set of a given information system, and it cannot be eliminated from any reduct set; it reflects the essential characteristics of dynamic reduction. We have proved several theorems showing that the dynamic core is contained in the union of the dynamic reducts, as well as in the static core set, which means the dynamic core plays a vital role in dynamic reduct computation. A new systematic method of dynamic reduction is presented, from the sampling process to the reduct computation stage. The experiments show that the new sampling method and the quick algorithm outperform traditional rough set reduction methods: they avoid the sample set being too subjective and decrease the computing time. The results prove efficient.

References

[1] Pawlak Z. Rough sets. International Journal of Computer and Information Sciences, 1982, 11(5): 341-356.
[2] Tsumoto S. Medical diagnosis: Rough set view. 2017.
[3] Aličković E, Subasi A. Breast cancer diagnosis using GA feature selection and Rotation Forest. Neural Computing & Applications, 2017, 28(4): 753-763.
[4] Peng X, Wen J, Li Z, et al. Rough set theory applied to pattern recognition of partial discharge in noise affected cable data. IEEE Transactions on Dielectrics & Electrical Insulation, 2017, 24(1): 147-156.
[5] Lupyan M, Kuzminov K. Analysing possible applications for available mathematical models of tracked vehicle movement over the rough terrain to examine tracked chain dynamic processes. Science & Education of Bauman MSTU, 2014, 14(11): 110-117.
[6] Pawlak Z. Rough set approach to knowledge-based decision support. European Journal of Operational Research, 1995, 99(1): 48-57.
[7] Bazan J G. A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables. Rough Sets in Knowledge Discovery, Heidelberg: Physica-Verlag, 1998: 321-365.
[8] Skowron A, Rauszer C. The discernibility matrices and functions in information systems. Theory & Decision Library, 1992, 11: 331-362.
[9] Bazan J G, Skowron A, Synak P. Dynamic reducts as a tool for extracting laws from decision tables. International Symposium on Methodologies for Intelligent Systems. Springer-Verlag, 1994: 346-355.
[10] Shu W, Qian W. An incremental approach to attribute reduction from dynamic incomplete decision systems in rough set theory. Data & Knowledge Engineering, 2015, 100: 116-132.
[11] Mukamakuza C P, Wang J, Li L. Dynamic reducts computation analysis based on rough sets. International Conference on Natural Computation. IEEE, 2014: 480-485.
[12] Guo Y, Jiao L, Wang S, et al. A novel dynamic rough subspace based selective ensemble. Pattern Recognition, 2014, 48(5): 1638-1652.
[13] Trabelsi S, Elouedi Z, Lingras P. Belief rough set classification for web mining based on dynamic core. The Tenth International Conference on Intelligent Systems Design and Applications (ISDA). 2010: 403-408.
[14] Wang G Y. Attribute core of decision table. Lecture Notes in Computer Science, 2002, 2475: 213-217.
[15] Rice J A. Mathematical Statistics and Data Analysis. Wadsworth & Brooks/Cole, 1988.

Acknowledgements

The authors gratefully acknowledge the editor and two anonymous referees for their insightful comments and helpful suggestions that led to a marked improvement of the article.

Funding

This work was supported by the National Natural Science Foundation of China (61772031), the Funds for Energy Conservation of Changsha, and the Fundamental Research Funds for Central Universities of Central South University (2017zzts514).