CN102880603A - Method and equipment for filtering ranking list data - Google Patents


Info

Publication number
CN102880603A
Authority
CN
China
Prior art keywords
data
day
probability
brothers
seniority
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011101925152A
Other languages
Chinese (zh)
Inventor
陈欢
罗佳佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN2011101925152A
Publication of CN102880603A

Abstract

An embodiment of the invention discloses a method for filtering ranking list data. The method comprises the following steps: data filtering equipment acquires, from data storage equipment, the raw data within a preset time period before the ranking day; the data filtering equipment imputes missing values in the raw data according to the degree of missingness of the raw data to obtain historical data; the data filtering equipment performs a calculation on the historical data and the ranking-day data to determine the list-entry probability of the ranking-day data; and the data filtering equipment filters out the ranking-day data when their list-entry probability is less than or equal to a probability threshold. The technical scheme thus uses historical data as the basis for calculating the probability that the ranking-day data can enter the ranking list, and decides from that probability whether to filter the ranking-day data, which improves the accuracy of ranking list data filtering.

Description

Method and equipment for filtering ranking list data
Technical field
The present application relates to the field of computer technology, and in particular to a method and equipment for filtering ranking list data.
Background
Ranking lists attract a great deal of attention in modern society and have become part of everyday life. For example, an e-commerce website may provide sales-volume ranking lists for various products as a reference for consumers (hereinafter "buyers") or merchants (hereinafter "sellers").
In the prior art, a ranking list is usually obtained by sorting all of the raw data (for a commodity sales ranking list, for example, the raw data are the sales volumes of all commodities). Computing a ranking list this way is inefficient; when the volume of raw data is very large, it is difficult to obtain the ranking list quickly.
To overcome this problem, the prior art provides a ranking list data filtering method, hereinafter called the threshold filtering method. It presets a threshold, sorts only the data above the threshold, and filters out the data below it. This reduces the amount of data that takes part in sorting, but the threshold may be set unreasonably, so that data that should not be filtered out are removed. For a sales ranking list with a threshold of 5, for example, data with a sales volume below 5 are filtered out; yet the overall sales volume of some commodities is inherently low, so a commodity with a sales volume below 5 may still deserve a place on the list. The existing threshold filtering method therefore cannot filter the raw data accurately.
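The weakness of the threshold filtering method can be seen in a minimal sketch. The commodity names and sales volumes below are hypothetical, and treating a value equal to the threshold as kept is an assumption, since the text only speaks of data above and below the threshold.

```python
def threshold_filter(sales, threshold):
    """Prior-art style filter: keep items whose sales volume reaches the threshold.

    Keeping values equal to the threshold is an assumption of this sketch.
    """
    return {item: volume for item, volume in sales.items() if volume >= threshold}


# Hypothetical sales volumes: phones sell in bulk, rare books sell rarely.
sales = {"phone_a": 120, "phone_b": 95, "rare_book_a": 4, "rare_book_b": 2}
kept = threshold_filter(sales, threshold=5)
```

With a threshold of 5, both rare books are dropped even though 4 sales may be a strong result for a low-volume commodity, which is the inaccuracy the application sets out to remove.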
Summary of the invention
The embodiments of the present application provide a method and equipment for filtering ranking list data, to solve the prior-art problem that a filtering method which simply sets a sales-volume threshold cannot filter the raw data accurately.
To achieve the above object, the embodiments of the present application provide, in one aspect, a ranking list data filtering method, comprising:
The data filtering equipment obtains, from a data storage device, the raw data within a preset time period before the ranking day;
The data filtering equipment performs imputation on the raw data according to the degree of missingness of the raw data to obtain historical data;
The data filtering equipment performs a calculation on the historical data and the ranking-day data to determine the list-entry probability of the ranking-day data;
The data filtering equipment filters out the ranking-day data when the list-entry probability of the ranking-day data is less than or equal to a probability threshold.
In another aspect, the embodiments of the present application also provide data filtering equipment, comprising:
An acquiring unit, configured to obtain, from a data storage device, the raw data within a preset time period before the ranking day;
A processing unit, configured to process the raw data according to the degree of missingness of the raw data to obtain historical data;
A computing unit, configured to perform a calculation on the historical data and the ranking-day data to determine the list-entry probability of the ranking-day data;
A filter unit, configured to filter out the ranking-day data when the list-entry probability of the ranking-day data is less than or equal to a probability threshold.
Compared with the prior art, the embodiments of the present application have the following advantages:
The list-entry probability of the ranking-day data is calculated from the probability that the historical data entered the ranking list and from a comparison between the ranking-day data and the historical data of the statistical object. If the list-entry probability is greater than the probability threshold, the ranking-day data are not filtered; if it is less than the probability threshold, they are filtered out. The method thus calculates, on the basis of historical data, the probability that the ranking-day data can enter the ranking list, and decides whether to filter them according to whether that probability exceeds a set threshold, which improves the accuracy of ranking list data filtering.
Description of drawings
Fig. 1 is a schematic flowchart of a ranking list data filtering method proposed in an embodiment of the present application;
Fig. 2 is a schematic flowchart of a missing-data imputation method proposed in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of data filtering equipment proposed in an embodiment of the present application.
Embodiment
As described in the Background, the prior-art ranking list filtering method sets a threshold and filters out data whose values are below it; this method cannot filter the raw data accurately.
To address this deficiency, the present application proposes a ranking list data filtering method. When the data currently taking part in the ranking are filtered, the raw data from a preceding time period are used as a reference to decide whether the current data need to be filtered. The length of this time period must be configured in advance; it may be an empirical value or be obtained according to a preset strategy.
The method is explained below using a sales-volume ranking list as an example, with commodities as the statistical objects. Note that the method also applies to other types of ranking list, such as a popularity ranking list, and to the statistical quantities corresponding to those lists, such as a popularity index.
As shown in Fig. 1, the method comprises the following steps:
Step S101: the data filtering equipment obtains the raw data within the predefined time period from the data storage device.
The data filtering equipment obtains the sales-volume data of the different commodities directly from the data storage device that stores the data to be filtered, and records the obtained data in a form suitable for filtering. In this embodiment the form may be, but is not limited to, a matrix; an array, for example, may also be used. The data acquisition request that the data filtering equipment sends to the data storage device may carry the number of commodities and the time period of the requested data.
For convenience, the raw data are assumed below to be in matrix form, as follows:
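$$
T = (T_1, T_2, \ldots, T_m)' =
\begin{pmatrix}
t_{11} & t_{12} & \cdots & t_{1j} & \cdots & t_{1n} \\
t_{21} & t_{22} & \cdots & t_{2j} & \cdots & t_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
t_{i1} & t_{i2} & \cdots & t_{ij} & \cdots & t_{in} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
t_{m1} & t_{m2} & \cdots & t_{mj} & \cdots & t_{mn}
\end{pmatrix},
$$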
where $T$ denotes the data of all commodities. $T_i$ denotes the data of the i-th commodity, $i = 1, \ldots, m$, where $m$ is the number of commodities in the raw data; this number is configured in advance and may be an empirical value or be obtained according to a preset strategy. $T_i = (t_{i1}, t_{i2}, \ldots, t_{in})$, where $n$ is the number of trading days in the time period. $P_j = (t_{1j}, t_{2j}, \ldots, t_{mj})'$ denotes the data of the j-th trading day, $j = 1, \ldots, n$. $t_{ij}$ denotes the datum of the i-th commodity on the j-th trading day; in this embodiment, $t_{ij}$ is the sales volume of the commodity on that trading day. In practice, for example in a popularity ranking list, $t_{ij}$ may be the evaluation score of the commodity.
This embodiment is described using the sales-volume data of m commodities over n days as an example; the filtering method can also be applied to other forms of two-dimensional data, and to other data that can be transformed into two-dimensional data.
Step S102: the data filtering equipment determines the degree of missingness of the raw data and decides from it whether the data need to be filtered. If no data filtering is needed, the procedure ends; if data filtering is needed, step S103 is performed.
Step S103: determine whether the raw data need missing-data imputation. If so, perform step S104; otherwise, use the raw data directly as the historical data and perform step S105.
Besides the order of judgment shown in steps S102 and S103, it is also possible first to determine from the degree of missingness whether the raw data need imputation and, if so, to perform step S104; otherwise, to go on to determine whether data filtering is needed, ending the procedure if it is not and performing step S105 if it is.
The raw data may have missing values. When a commodity is not on sale at some time, the corresponding data for that time are missing; for example, if commodity a in step S101 is not on sale on trading day b, the datum $t_{ab}$ is missing. This embodiment sets a missing-data threshold to measure the degree of missingness of a commodity's data. When the missing data exceed the threshold, the raw data are neither imputed nor filtered. When the missing data do not exceed the threshold, it must be determined whether the raw data need imputation. The threshold may be a percentage of missing data or a specific number of days; for example, when the time period is 10 days, the threshold may be set to 5 days, so that if the data are incomplete and more than 5 days are missing, neither imputation nor data filtering is performed.
Determining whether the raw data need missing-data imputation specifically comprises:
If the data within the time period are complete, no imputation is needed; if the data within the time period are incomplete and the missing data are below the missing-data threshold, imputation is performed.
In this embodiment, the above imputation processing is carried out separately for each commodity.
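The decision made in steps S102 and S103 for a single commodity can be sketched as follows. The function name, the three return labels and the use of `None` to mark a missing trading day are illustrative assumptions, and the threshold is taken as a number of days. A missing count exactly equal to the threshold is treated here as not exceeding it, which is one reading of the text.

```python
def imputation_decision(series, missing_threshold_days):
    """Classify one commodity's data for the time period.

    series: one value per trading day, None where the datum is missing.
    Returns "skip" (neither impute nor filter), "use_as_is" (complete data,
    used directly as historical data) or "impute" (fill the gaps first).
    """
    missing = sum(1 for value in series if value is None)
    if missing > missing_threshold_days:
        return "skip"
    if missing == 0:
        return "use_as_is"
    return "impute"


# Hypothetical 10-day sales series for three commodities.
complete = [3, 4, 2, 5, 6, 4, 3, 5, 4, 6]
few_gaps = [3, None, 2, 5, 6, None, 3, 5, 4, 6]
many_gaps = [None, None, 2, None, None, None, 3, None, 4, 6]
decisions = [imputation_decision(s, 5) for s in (complete, few_gaps, many_gaps)]
```

Because the check is made per commodity, one sparsely recorded commodity can be skipped without affecting the others.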
Step S104: the data filtering equipment uses entropy theory to simulate the missing data from the raw data and fills the simulated values into the raw data to obtain the historical data.
Missing data seriously affect the later data filtering. A commodity that is not on sale does not thereby have a sales volume of 0, so representing missing data directly as 0 has a large effect on the accuracy of the ranking result; a sounder approach is to simulate the transaction volume the commodity would have had on that trading day. In information theory, information entropy measures the amount of information contained in a message emitted by a source: the more certain the message, the smaller the entropy of the source. Entropy is a measure of the disorder of a system and represents its average uncertainty. The entropy method determines weight coefficients from the amount of information that the attribute values provide; it is objective, and its evaluation process is transparent and reproducible. For example, for a given attribute j, the larger the differences between the values of attribute j across the data, the greater the relative importance of that indicator: its information content is larger and its entropy smaller.
In this embodiment, the missing raw data are simulated according to entropy theory.
As shown in Fig. 2, suppose the datum $t_{ab}$ of commodity a on trading day b is missing from the raw data. Simulating the missing value $t_{ab}$ is described as follows:
Step S1041: from the non-missing data of commodity a in the raw data, calculate the entropy I of each trading day whose data are not missing.
Let f be any trading day whose data are not missing; the entropy of trading day f is given by:
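$$I_f = -k \cdot p_f \cdot \ln(p_f)$$
where $k = 1/\ln(n)$ and
$$p_f = \frac{d_f}{\sum_{i=1}^{n} d_i},$$
where $d_f$ (its defining formula appears only as an image in the source, Figure BSA00000534884500062) denotes the distance between the data of trading day f and the data of the missing trading day b, and reflects the correlation between the two. Because $t_{ab}$ is missing, i does not take the value a when the distance is calculated.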
Step S1042: from the entropy of trading day f, calculate the difference coefficient of trading day f.
The difference coefficient of trading day f is
$$r_f = 1 - I_f, \quad f = 1, 2, \ldots, n.$$
The difference coefficient reflects how much influence the data have: the larger its value, the greater the influence of the data of trading day f, and vice versa.
Step S1043: from the difference coefficient of trading day f, calculate the weight coefficient of trading day f:
$$w_f = \frac{r_f}{\sum_{i=1}^{m} r_i}, \quad f = 1, 2, \ldots, n.$$
Step S1044: from the weight coefficients calculated for the trading days whose data are not missing, calculate the missing value $t_{ab}$ of trading day b:
$$t_{ab} = w_1 t_{a1} + w_2 t_{a2} + \cdots + w_{b-1} t_{a(b-1)} + w_{b+1} t_{a(b+1)} + \cdots + w_n t_{an}$$
This fills in the missing value $t_{ab}$.
The above describes imputation for a single missing datum. When several data are missing from the raw data and need imputation, each missing datum is calculated separately from the non-missing data in the raw data.
The imputation method provided in this embodiment takes into account both the horizontal and the vertical data correlations in the raw data; it is objective, and it has lower time complexity than other imputation methods.
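Steps S1041 to S1044 can be sketched as below. The formula for the distance $d_f$ is not recoverable from the text, so the sketch takes the distances as an input supplied by the caller; the function names and the dictionary layout are likewise illustrative assumptions. The weights are normalized over the trading days whose data are not missing, so that they sum to 1, which is an assumption about the intended summation range.

```python
import math


def entropy_weights(distances, n_days):
    """Weights w_f for the non-missing trading days.

    distances: {day index f: distance d_f to the missing day b}; how d_f is
    computed is NOT specified here and must be supplied by the caller.
    n_days: length n of the time period, used in k = 1 / ln(n).
    """
    if n_days < 2:
        raise ValueError("n_days must be at least 2 so that ln(n) is non-zero")
    k = 1.0 / math.log(n_days)
    total = sum(distances.values())
    coefficients = {}
    for day, d in distances.items():
        p = d / total
        entropy = -k * p * math.log(p) if p > 0 else 0.0  # I_f
        coefficients[day] = 1.0 - entropy                 # r_f = 1 - I_f
    norm = sum(coefficients.values())
    return {day: r / norm for day, r in coefficients.items()}


def impute_missing_day(series, distances):
    """Simulated value t_ab: weighted sum of the commodity's non-missing data."""
    weights = entropy_weights(distances, n_days=len(series))
    return sum(w * series[day] for day, w in weights.items())


# Hypothetical series for commodity a: day 2 is missing, distances are made up.
series_a = [10.0, 14.0, None]
filled = impute_missing_day(series_a, {0: 1.0, 1: 1.0})
```

With equal distances the two known days receive equal weight, so the simulated value is their mean; unequal distances shift the weights through the entropy term.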
Step S105: from the historical data and the ranking-day data, calculate the list-entry probability of the commodity's ranking-day data.
The data filtering equipment appends the data of the ranking day itself, that is, the ranking-day data, to the historical data.
In this embodiment, a Bayes model is used to calculate the list-entry probability of the ranking-day data. The core idea is to judge from the commodity's sales volume on the ranking day whether the commodity can enter the list with that sales volume: if not, its data are filtered out; otherwise they are kept. That is, the probability that the commodity can enter the list on that day with that sales volume is calculated, and if the probability is below a specified value the data are filtered out; otherwise they are kept. The list-entry probability is a value normalized to the interval [0, 1]. The result given by the naive Bayes model of this embodiment does not need to take different commodity categories into account, that is, it does not classify first and filter afterwards: comparing a commodity's historical data with its ranking-day data already yields the probability that the current ranking-day data enter the list, and the idea of classifying commodities is in effect subsumed by this probabilistic view.
Specifically, step S105 comprises:
Step S1051: calculate the probability that the commodity's historical data entered the ranking list, the probability that the historical data entered the ranking list when the historical data are less than the ranking-day data, and the probability that the historical data are less than the ranking-day data.
The probability that the historical data entered the ranking list when the historical data are less than the ranking-day data is the ratio of the number of times the historical data were less than the ranking-day data and entered the ranking list to the number of times the commodity's data entered the ranking list. The probability that the historical data entered the ranking list is the ratio of the number of times the commodity's data entered the ranking list to the total number of times within the time period. The probability that the historical data are less than the ranking-day data is the ratio of the number of times the historical data were less than the ranking-day data to the total number of times within the time period. In this embodiment ranking is performed once a day, so the number of times can be expressed as a number of days.
Step S1052: from the probability that the historical data entered the ranking list, the probability that the historical data entered the ranking list when the historical data are less than the ranking-day data, and the time period, calculate the list-entry probability of the commodity's ranking-day data.
The process of calculating the list-entry probability of the commodity's ranking-day data in step S105 is described below with a concrete example. Combining the historical data with the ranking-day data gives the complete data:
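$$
X = (X_1, X_2, \ldots, X_m)' =
\begin{pmatrix}
t_{11} & t_{12} & \cdots & t_{1j} & \cdots & t_{1n} & x_{11} \\
t_{21} & t_{22} & \cdots & t_{2j} & \cdots & t_{2n} & x_{21} \\
\vdots & \vdots &        & \vdots &        & \vdots & \vdots \\
t_{i1} & t_{i2} & \cdots & t_{ij} & \cdots & t_{in} & x_{i1} \\
\vdots & \vdots &        & \vdots &        & \vdots & \vdots \\
t_{m1} & t_{m2} & \cdots & t_{mj} & \cdots & t_{mn} & x_{m1}
\end{pmatrix}
$$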
where $x = (x_{11}, x_{21}, \ldots, x_{m1})'$ is the ranking-day data, which may be the trading volume of each commodity on that day. Taking the i-th commodity as an example, its trading volume on the ranking day is $x_{i1}$ and its historical data are $(t_{i1}, t_{i2}, \ldots, t_{in})$. Within the ranked data, U is the set of data that entered the list and V is the set of data that did not. Both sets may be recorded by the data server; when the data filtering equipment obtains the raw data from the data server, it can obtain both sets at the same time.
(1) The probability that the historical data entered the ranking list when the historical data are less than the ranking-day data, P(B|A):
$$P(B \mid A) = \frac{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij} < x_{i1})}{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij})}, \quad (t_{ij} \in U)$$
(2) The probability that the historical data entered the ranking list, P(A):
$$P(A) = \frac{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij})}{n}, \quad (t_{ij} \in U)$$
(3) The probability that the historical data are less than the ranking-day data, P(B):
$$P(B) = \frac{\operatorname{count}_{j=1,2,\ldots,n}(t'_{ij} < x_{i1})}{n}, \quad (t'_{ij} \in (U \cup V))$$
The calculations in (1), (2) and (3) may be carried out in any order.
(4) The list-entry probability of the commodity's ranking-day data, P(A|B):
From the results of (1), (2) and (3), the list-entry probability of the commodity's ranking-day data is obtained:
$$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
$$P(A \mid B = x_{i1}) = \frac{\dfrac{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij} < x_{i1})}{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij})} \cdot \dfrac{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij})}{n}}{\dfrac{\operatorname{count}_{j=1,2,\ldots,n}(t'_{ij} < x_{i1})}{n}}$$
where $t_{ij} \in U$ and $t'_{ij} \in (U \cup V)$.
In step S105, the list-entry probability of each commodity's ranking-day data may be calculated separately; in this embodiment, calculating the probability for one commodity requires only that commodity's historical data. Alternatively, the data in the matrix may be used to calculate the list-entry probabilities of all commodities' ranking-day data at the same time.
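The count-based calculation of P(A|B) for one commodity can be sketched as follows. The function name and the representation of the sets U and V as a per-day boolean flag are illustrative assumptions; returning 0.0 when a denominator would be zero is also an assumption, since the text does not cover that case.

```python
def list_entry_probability(history, entered_list, ranking_day_value):
    """P(A|B) = P(B|A) * P(A) / P(B) for one commodity.

    history: historical values t_i1..t_in, one per trading day.
    entered_list: booleans, True where that day's value belongs to U.
    ranking_day_value: the ranking-day datum x_i1.
    """
    n = len(history)
    in_u = [t for t, flag in zip(history, entered_list) if flag]
    if n == 0 or not in_u:
        return 0.0  # assumption: no list history means probability 0
    p_a = len(in_u) / n
    p_b_given_a = sum(1 for t in in_u if t < ranking_day_value) / len(in_u)
    p_b = sum(1 for t in history if t < ranking_day_value) / n
    if p_b == 0:
        return 0.0  # assumption: no historical value lies below x_i1
    return p_b_given_a * p_a / p_b


# Hypothetical five-day history; the commodity was on the list on three days.
history = [3, 5, 2, 8, 6]
entered = [False, True, False, True, True]
probability = list_entry_probability(history, entered, ranking_day_value=6)
```

In this example three historical values lie below 6 and one of them was on the list, so the result is one third; the factors count(t_ij) and n cancel, as they do in the expanded formula.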
Step S106: compare the list-entry probability with the probability threshold and decide from the result whether to filter the commodity's ranking-day data. Specifically, if the list-entry probability is greater than the probability threshold, the ranking-day data of the commodity are not filtered; if it is less than the probability threshold, they are filtered out.
Before step S106, the method further comprises setting the probability threshold.
The probability threshold is a value normalized to the interval [0, 1]; it is an empirical value that can be obtained from the results of actual data analysis.
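The comparison in step S106 can be sketched as below. The commodity names, probabilities and threshold are hypothetical. A probability exactly equal to the threshold is filtered out here, following the summary of the method, which filters when the probability is less than or equal to the threshold.

```python
def filter_ranking_day_data(probabilities, probability_threshold):
    """Return the commodities whose ranking-day data survive filtering.

    probabilities: {commodity: list-entry probability in [0, 1]}.
    Data are kept only when the probability exceeds the threshold.
    """
    return [item for item, p in probabilities.items() if p > probability_threshold]


# Hypothetical list-entry probabilities for three commodities.
probabilities = {"commodity_a": 0.8, "commodity_b": 0.2, "commodity_c": 0.5}
survivors = filter_ranking_day_data(probabilities, probability_threshold=0.5)
```

Only the surviving commodities need to be sorted to build the ranking list, which is where the saving over sorting all raw data comes from.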
Note that using entropy theory for missing-data imputation in step S104 is the preferred imputation method; other imputation algorithms may also be used, for example a missing-data imputation algorithm based on the generalized Mahalanobis distance.
In this embodiment, the list-entry probability of a commodity's ranking-day data is calculated from the probability that the commodity's historical data entered the ranking list and from a comparison between the ranking-day data and the historical data. If the list-entry probability is greater than the probability threshold, the ranking-day data of the commodity are not filtered; if it is less than the probability threshold, they are filtered out. The method thus calculates, on the basis of historical data, the probability that the ranking-day data can enter the ranking list, and decides whether to filter them according to whether that probability exceeds a set threshold, which improves the accuracy of ranking list data filtering.
The method of the above embodiment takes into account the relationship between the data of the same commodity on different trading days and produces fairly accurate filtering results. Within a single trading day, however, the data of different commodities also influence one another. A further embodiment of the invention provides a ranking list data filtering method that also takes into account this mutual influence between the data of different commodities on the same trading day, that is, the vertical (column-wise) correlation information in the complete data above.
In this embodiment, steps S201 to S204 are the same as steps S101 to S104.
Step S205: from the complete data, calculate the list-entry probability of the commodity's ranking-day data.
In this embodiment ranking is performed once a day, so the number of times is expressed as a number of days in the calculation.
Specifically, step S205 comprises:
Step S2051: calculate the probability of the valid data difference when the commodity entered the ranking list, the probability of the data difference when the commodity entered the ranking list, and the overall probability of the commodity's data difference.
In the complete data, the data of the last day are the ranking-day data, and the data before them are the historical data. The probability of the valid data difference when the commodity entered the ranking list is the ratio of the accumulated valid data difference over the days on which the commodity entered the list to the accumulated data difference over those days. The probability of the data difference when the commodity entered the ranking list is the ratio of the accumulated data difference over the days on which the commodity entered the list to the accumulated data difference over the whole time period. The overall probability of the commodity's data difference is the ratio of the accumulated valid data difference over the time period to the accumulated data difference over the time period.
Step S2052: from the probability of the valid data difference when the commodity entered the ranking list, the probability of the data difference when the commodity entered the ranking list, and the overall probability of the commodity's data difference, calculate the list-entry probability of the commodity's ranking-day data.
The process of calculating the commodity's list-entry probability in step S205 is described below with a concrete example. The complete data are:
$$
X = (X_1, X_2, \ldots, X_m)' =
\begin{pmatrix}
t_{11} & t_{12} & \cdots & t_{1j} & \cdots & t_{1n} & x_{11} \\
t_{21} & t_{22} & \cdots & t_{2j} & \cdots & t_{2n} & x_{21} \\
\vdots & \vdots &        & \vdots &        & \vdots & \vdots \\
t_{i1} & t_{i2} & \cdots & t_{ij} & \cdots & t_{in} & x_{i1} \\
\vdots & \vdots &        & \vdots &        & \vdots & \vdots \\
t_{m1} & t_{m2} & \cdots & t_{mj} & \cdots & t_{mn} & x_{m1}
\end{pmatrix}
$$
where $x = (x_{11}, x_{21}, \ldots, x_{m1})'$ is the ranking-day data.
The data filtering method of this embodiment introduces the concept of a data index, which is used in calculating the list-entry probability of the ranking-day data. Let $r_{ij}$ denote the data index of the i-th commodity on the j-th trading day. In this embodiment $r_{ij}$ is a sales-volume index: the data of some of the commodities on the j-th trading day are taken as a data sample for calculating $r_{ij}$, which expresses the standing of the i-th commodity's sales on the j-th trading day within that sample.
For a datum $t_{ij}$, the corresponding index is $r_{ij} = \operatorname{rank}(t_{ij}), (j \in S_{ij})$, where $S_{ij}$ is the data sample. The $N_s$ data adjacent to $t_{ij}$ may be taken as the sample $S_{ij}$, so that $r_{ij}$ is the rank of $t_{ij}$ among the $N_s$ adjacent data in column j. Likewise, $r_i$ is the rank of the ranking-day datum $x_{i1}$ among the $N_s$ adjacent data on the ranking day: $r_i = \operatorname{rank}(x), (j \in S_i)$.
The sample size $N_s$ may be set from experience before the calculation. For example, if the sample size is 30 when calculating $r_{ij}$, the 30 data adjacent to $t_{ij}$ are used. The sample may consist of the $N_s$ data of the j-th trading day centered on $t_{ij}$, or of the $N_s$ data starting from $t_{ij}$; that is, the 30 data may be $(t_{(i-14)j}, t_{(i-13)j}, \ldots, t_{(i+15)j})'$ or $(t_{ij}, t_{(i+1)j}, \ldots, t_{(i+29)j})'$.
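The data index can be sketched as a rank within a window of one trading day's column. The text does not say whether rank 1 is the largest or the smallest value, nor how the window behaves at the top and bottom of the column; this sketch assumes rank 1 is the largest value and shifts the window inward at the edges. The function name and arguments are illustrative.

```python
def data_index(column, i, sample_size, centered=True):
    """Rank of column[i] within a window of sample_size neighbouring values.

    column: data of all commodities on one trading day (one matrix column).
    centered=True takes a window around position i (for sample_size 30 this
    is i-14 .. i+15); centered=False takes the window starting at position i.
    Rank 1 means the largest value in the window (an assumption).
    """
    start = i - (sample_size - 1) // 2 if centered else i
    # Shift the window so that it stays inside the column (an assumption).
    start = max(0, min(start, len(column) - sample_size))
    sample = column[start:start + sample_size]
    return 1 + sum(1 for value in sample if value > column[i])


# Hypothetical sales volumes of five commodities on one trading day.
day_column = [50, 10, 40, 30, 20]
centered_rank = data_index(day_column, 2, sample_size=3)
forward_rank = data_index(day_column, 0, sample_size=3, centered=False)
```

Ranking within a small window rather than the whole column keeps the cost per datum proportional to the sample size, which matches the stated aim of controlling the amount of computation.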
Setting the data index in effect adds a weight factor to the influence of a trading day's data on the ranking-day data. With the amount of computation kept under reasonable control, the overall market situation on that trading day is brought into the calculation, the influence of the historical data on the ranking-day data is measured more accurately, and the estimate of the list-entry probability of the ranking-day data becomes more accurate.
The process of calculating the list-entry probability of a commodity's ranking-day data is described in detail below:
(1) The probability of the valid data difference when historical data entered the ranking list is P(B|A):
$$P(B \mid A) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) \cdot \frac{r_{ij}}{r_i} + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} \cdot \frac{r_{ij}}{r_i}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}}}, \quad (t_{ij} \in U)$$
(2) The probability of the data difference when historical data entered the ranking list is P(A):
$$P(A) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} + \sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t'_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t'_{ij} - x_{i1}}}, \quad (t_{ij} \in U,\ t'_{ij} \in (U \cup V))$$
(3) The total probability of the data difference is P(B):
$$P(B) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) \cdot \frac{r_{ij}}{r_i} + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} \cdot \frac{r_{ij}}{r_i}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} + \sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t'_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t'_{ij} - x_{i1}}}, \quad (t_{ij} \in U,\ t'_{ij} \in (U \cup V))$$
The calculations in (1), (2) and (3) may be performed in any order.
(4) The list-entry probability of the commodity's ranking-day data is P(A|B):
From the results of (1), (2) and (3), the list-entry probability of the commodity's ranking-day data is obtained:
$$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
$$P(A \mid B = x_{i1}) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) \cdot \frac{r_{ij}}{r_i} + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} \cdot \frac{r_{ij}}{r_i}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) \cdot \frac{r_{ij}}{r_i} + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} \cdot \frac{r_{ij}}{r_i} + \sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t'_{ij}) \cdot \frac{r_{ij}}{r_i} + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t'_{ij} - x_{i1}} \cdot \frac{r_{ij}}{r_i}}$$
where $(t_{ij} \in U,\ t'_{ij} \in (U \cup V))$.
Step S206 is the same as step S106.
In this embodiment of the application, the list-entry probability of a commodity's ranking-day data is calculated from the probability that the commodity's historical data entered the ranking list and from a comparison between the ranking-day data and the historical data. If the list-entry probability is greater than the probability threshold, the commodity's ranking-day data are not filtered; if it is less than the probability threshold, they are filtered. The ranking list data filtering method provided by this embodiment thus calculates, on the basis of historical data, the probability that the ranking-day data can enter the ranking list, and decides whether to filter the ranking-day data according to whether that probability exceeds a set probability threshold, which improves the accuracy of ranking list data filtering.
To implement the technical solution of the embodiments of the application, and based on the same technical concept as the method embodiments above, an embodiment of the application further provides a data filtering device. Its structure is shown schematically in Figure 3, and it specifically comprises:
an acquiring unit 11, configured to obtain, from a data storage device, raw data within a preset time period before the ranking day;
a processing unit 12, configured to process the raw data according to the degree of missingness of the raw data obtained by the acquiring unit 11, to obtain historical data;
a computing unit 13, configured to perform a calculation based on the historical data and the ranking-day data and determine the list-entry probability of the ranking-day data;
a filtering unit 14, configured to filter the ranking-day data when the list-entry probability calculated by the computing unit 13 is less than or equal to a probability threshold.
The computing unit 13 is specifically configured to:
calculate, from the historical data and the ranking-day data, the probability that historical data entered the ranking list, the probability of having entered the ranking list when the historical data value is less than the ranking-day data value, and the probability that historical data is less than the ranking-day data; and calculate the list-entry probability of the ranking-day data from these three probabilities.
The probability of having entered the ranking list when the historical data value is less than the ranking-day data value is
$$P(B \mid A) = \frac{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij} < x_{i1})}{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij})}, \quad (t_{ij} \in U)$$
where $x_{i1}$ is the ranking-day data of the i-th commodity, $t_{ij}$ is the data of the i-th commodity on the j-th trading day, U is the set of historical data that entered the ranking list, V is the set of historical data that did not enter the ranking list, and n is the preset time period;
The probability that historical data entered the ranking list is
$$P(A) = \frac{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij})}{n}, \quad (t_{ij} \in U)$$
The probability that historical data is less than the ranking-day data is
$$P(B) = \frac{\operatorname{count}_{j=1,2,\ldots,n}(t'_{ij} < x_{i1})}{n}, \quad (t_{ij} \in U,\ t'_{ij} \in (U \cup V))$$
The list-entry probability of the ranking-day data is
$$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(B \mid A) \cdot P(A)}{P(B)}.$$
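The count-based calculation can be sketched in Python as below. This is a minimal illustration, not the device's implementation: the function name is hypothetical, n is assumed to equal the number of trading days in U and V together, and returning 0 when a denominator is zero is an assumed convention that the text does not specify.

```python
def entry_probability_by_count(x, entered, not_entered):
    """List-entry probability P(A|B) = P(B|A) * P(A) / P(B) from counts.

    x           -- ranking-day data x_i1
    entered     -- historical values t_ij in U (entered the ranking list)
    not_entered -- historical values in V (did not enter the ranking list)
    """
    n = len(entered) + len(not_entered)  # assumed: every day is in U or V
    if n == 0 or not entered:
        return 0.0  # assumed convention for an empty history
    # P(B|A): share of list-entering days whose value is below x
    p_b_given_a = sum(1 for t in entered if t < x) / len(entered)
    # P(A): share of days on which the historical data entered the list
    p_a = len(entered) / n
    # P(B): share of all days whose value is below x
    p_b = sum(1 for t in entered + not_entered if t < x) / n
    if p_b == 0:
        return 0.0  # assumed convention; no historical value is below x
    return p_b_given_a * p_a / p_b


# Made-up history: three days on the list, two days off the list.
probability = entry_probability_by_count(25, [10, 20, 30], [5, 40])
```

In this made-up example two of the three days with a value below 25 belong to U, so the result is 2/3; the ranking-day data would then be filtered only if the chosen probability threshold were at least that large.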
Alternatively, the computing unit 13 is specifically configured to:
calculate, from the historical data and the ranking-day data, the probability of the valid data difference when historical data entered the ranking list, the probability of the data difference when historical data entered the ranking list, and the total probability of the data difference; and determine the list-entry probability of the ranking-day data from these three probabilities.
The probability of the valid data difference when historical data entered the ranking list is
$$P(B \mid A) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) \cdot \frac{r_{ij}}{r_i} + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} \cdot \frac{r_{ij}}{r_i}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}}}, \quad (t_{ij} \in U);$$
where $x_{i1}$ is the data of the i-th commodity on the ranking day, $t_{ij}$ is the data of the i-th commodity on the j-th trading day, $r_{ij}$ is the rank of $t_{ij}$ within the data sample formed from the data of a subset of commodities in column j, $r_i$ is the rank of $x_{i1}$ within the data sample formed from the data of a subset of commodities on the ranking day, U is the set of historical data that entered the ranking list, and n is the time period;
The probability of the data difference when historical data entered the ranking list is
$$P(A) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} + \sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t'_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t'_{ij} - x_{i1}}}, \quad (t_{ij} \in U,\ t'_{ij} \in (U \cup V));$$
where V is the set of historical data that did not enter the ranking list;
The total probability of the data difference is
$$P(B) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) \cdot \frac{r_{ij}}{r_i} + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} \cdot \frac{r_{ij}}{r_i}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} + \sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t'_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t'_{ij} - x_{i1}}}, \quad (t_{ij} \in U,\ t'_{ij} \in (U \cup V));$$
The list-entry probability of the ranking-day data is
$$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(B \mid A) \cdot P(A)}{P(B)}.$$
The processing unit 12 is specifically configured to:
calculate the entropy of each trading day whose data are not missing, from the non-missing data in the raw data within the time period; calculate the coefficient of variation of that trading day from its entropy; calculate the weight coefficient of that trading day from its coefficient of variation;
and calculate the missing value from the weight coefficients calculated for the trading days whose data are not missing.
Calculating the missing value comprises:
The entropy of the f-th trading day is $I_f = -k\,p_f \ln(p_f)$;
where $k = 1/\ln(n)$,
[Formula image BSA00000534884500164 not reproduced]
n denotes the length of the time period taking part in the ranking;
where
[Formula image BSA00000534884500165 not reproduced]
$d_f$ denotes the distance between the data of the f-th trading day and the missing data; $t_{if}$ is the data value of the i-th commodity on the f-th trading day, and $t_{ib}$ is the data of the i-th commodity on the b-th trading day;
Calculating the coefficient of variation of the trading day from its entropy comprises:
$r_f = 1 - I_f$, where $f = 1, 2, \ldots, n$;
Calculating the weight coefficient of the trading day from its coefficient of variation comprises:
$$w_f = \frac{r_f}{\sum_{i=1}^{m} r_i}, \quad f = 1, 2, \ldots, m$$
Calculating the missing value from the weight coefficients calculated for the trading days whose data are not missing comprises:
$$t_{ab} = w_1 t_{a1} + w_2 t_{a2} + \cdots + w_{(b-1)} t_{a(b-1)} + w_{(b+1)} t_{a(b+1)} + \cdots + w_n t_{an}$$
where $t_{ab}$ is the missing value.
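The entropy-weighted filling of a missing value can be sketched in Python as follows. The sketch takes the per-day proportions p_f as an input, because their definition is given only in a formula image that is not reproduced here; it also assumes that the same count of non-missing days is used for n in k = 1/ln(n) and for m in the weight normalisation. The function names are hypothetical.

```python
import math


def entropy_weights(p):
    """Weight coefficients w_f from per-day proportions p_f.

    p -- proportions for the trading days whose data are not missing,
         each in (0, 1]; how p_f is derived from the distances d_f is
         not reproduced in the text, so it is supplied by the caller.
    """
    n = len(p)
    if n < 2 or any(pf <= 0 for pf in p):
        raise ValueError("need at least two days and positive proportions")
    k = 1.0 / math.log(n)
    entropy = [-k * pf * math.log(pf) for pf in p]   # I_f = -k p_f ln(p_f)
    variation = [1.0 - i_f for i_f in entropy]       # r_f = 1 - I_f
    total = sum(variation)
    return [r_f / total for r_f in variation]        # w_f = r_f / sum of r_i


def fill_missing(values, weights):
    """Missing value t_ab as the weighted sum of the non-missing days."""
    return sum(w * t for w, t in zip(weights, values))


# Made-up example: three non-missing days around one missing day.
weights = entropy_weights([0.5, 0.25, 0.25])
filled = fill_missing([10.0, 20.0, 30.0], weights)
```

With these made-up proportions the three entropies happen to be equal (0.5·ln 2 = 0.25·ln 4), so the weights are all 1/3 and the filled value is the plain average of the three days.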
The data filtering device provided in this embodiment of the application calculates the list-entry probability of a commodity's ranking-day data from the probability that the commodity's historical data entered the ranking list and from a comparison between the ranking-day data and the historical data. If the list-entry probability is greater than the probability threshold, the commodity's ranking-day data are not filtered; if it is less than the probability threshold, they are filtered. The ranking list data filtering method provided by this embodiment thus calculates, on the basis of historical data, the probability that the ranking-day data can enter the ranking list, and decides whether to filter the ranking-day data according to whether that probability exceeds a set probability threshold, which improves the accuracy of ranking list data filtering.
From the description of the embodiments above, those skilled in the art will clearly understand that the embodiments of the application can be implemented in hardware, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the embodiments can be embodied as a software product, which may be stored on a non-volatile storage medium (a CD-ROM, USB flash drive, portable hard disk, etc.) and which includes instructions that cause a computer device (a personal computer, server, network device, etc.) to carry out the methods described in the implementation scenarios of the embodiments.
Those skilled in the art will appreciate that the accompanying drawings are schematic diagrams of a preferred implementation scenario, and that the modules or flows in the drawings are not necessarily required to implement the embodiments of the application.
Those skilled in the art will appreciate that the modules of the device in an implementation scenario may be distributed in the device as described for that scenario, or may be changed accordingly and located in one or more devices different from that scenario. The modules of the above implementation scenarios may be merged into one module or further split into several sub-modules.
The sequence numbers of the above embodiments are for description only and do not indicate the relative merit of the implementation scenarios.
What is disclosed above is merely several specific implementation scenarios of the embodiments of the application; the embodiments are not limited to them, and any variation that a person skilled in the art can conceive of shall fall within the scope of protection of the embodiments of the application.

Claims (14)

1. A ranking list data filtering method, characterized by comprising:
a data filtering device obtaining, from a data storage device, raw data within a preset time period before the ranking day;
the data filtering device imputing missing values in the raw data according to the degree of missingness of the raw data, to obtain historical data;
the data filtering device performing a calculation based on the historical data and the ranking-day data to determine the list-entry probability of the ranking-day data;
the data filtering device filtering the ranking-day data when the list-entry probability of the ranking-day data is less than or equal to a probability threshold.
2. The method of claim 1, characterized in that performing a calculation based on the historical data and the ranking-day data to determine the list-entry probability of the ranking-day data comprises:
calculating, from the historical data and the ranking-day data, the probability that historical data entered the ranking list, the probability of having entered the ranking list when the historical data value is less than the ranking-day data value, and the probability that the historical data value is less than the ranking-day data value; and calculating the list-entry probability of the ranking-day data from these three probabilities.
3. The method of claim 2, characterized in that the probability of having entered the ranking list when the historical data value is less than the ranking-day data value is obtained as follows:
$$P(B \mid A) = \frac{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij} < x_{i1})}{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij})}, \quad (t_{ij} \in U)$$
the probability that historical data entered the ranking list is obtained as follows:
$$P(A) = \frac{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij})}{n}, \quad (t_{ij} \in U)$$
the probability that historical data is less than the ranking-day data is obtained as follows:
$$P(B) = \frac{\operatorname{count}_{j=1,2,\ldots,n}(t'_{ij} < x_{i1})}{n}, \quad (t_{ij} \in U,\ t'_{ij} \in (U \cup V))$$
the list-entry probability of the ranking-day data is obtained as follows:
$$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(B \mid A) \cdot P(A)}{P(B)};$$
where $x_{i1}$ is the ranking-day data of the i-th statistical object, $t_{ij}$ is the data of the i-th statistical object on the j-th trading day, U is the set of historical data that entered the ranking list, V is the set of historical data that did not enter the ranking list, and n is the preset time period.
4. The method of claim 1, characterized in that performing a calculation based on the historical data and the ranking-day data to determine the list-entry probability of the ranking-day data comprises:
calculating, from the historical data and the ranking-day data, the probability of the valid data difference when historical data entered the ranking list, the probability of the data difference when historical data entered the ranking list, and the total probability of the data difference; and determining the list-entry probability of the ranking-day data from these three probabilities.
5. The method of claim 4, characterized in that the probability of the valid data difference when historical data entered the ranking list is obtained as follows:
$$P(B \mid A) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) \cdot \frac{r_{ij}}{r_i} + \sum_{j=1,\, x_{i1} \le t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} \cdot \frac{r_{ij}}{r_i}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}}}, \quad (t_{ij} \in U);$$
the probability of the data difference when historical data entered the ranking list is obtained as follows:
$$P(A) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} + \sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t'_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t'_{ij} - x_{i1}}}, \quad (t_{ij} \in U,\ t'_{ij} \in (U \cup V));$$
the total probability of the data difference is obtained as follows:
$$P(B) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) \cdot \frac{r_{ij}}{r_i} + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} \cdot \frac{r_{ij}}{r_i}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} + \sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t'_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t'_{ij} - x_{i1}}}, \quad (t_{ij} \in U,\ t'_{ij} \in (U \cup V));$$
the list-entry probability of the ranking-day data is obtained as follows:
$$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(B \mid A) \cdot P(A)}{P(B)};$$
where $x_{i1}$ is the data of the i-th statistical object on the ranking day, $t_{ij}$ is the data of the i-th statistical object on the j-th trading day, $r_{ij}$ is the rank of $t_{ij}$ within the data sample formed from the data of a subset of statistical objects in column j, $r_i$ is the rank of $x_{i1}$ within the data sample formed from the data of a subset of statistical objects on the ranking day, U is the set of historical data that entered the ranking list, V is the set of historical data that did not enter the ranking list, and n is the time period.
6. The method of claim 1, characterized in that imputing missing values in the raw data according to the degree of missingness of the raw data to obtain historical data comprises:
calculating the entropy of each trading day whose data are not missing, from the non-missing data in the raw data within the time period; calculating the coefficient of variation of that trading day from its entropy; calculating the weight coefficient of that trading day from its coefficient of variation;
and calculating the missing value from the weight coefficients calculated for the trading days whose data are not missing.
7. The method of claim 6, characterized in that calculating the missing value comprises:
calculating the entropy of the f-th trading day, $I_f = -k\,p_f \ln(p_f)$;
where $k = 1/\ln(n)$, and n denotes the length of the time period taking part in the ranking;
where $d_f$ denotes the distance between the data of the f-th trading day and the missing data; $t_{if}$ is the data of the i-th statistical object on the f-th trading day, and $t_{ib}$ is the data of the i-th statistical object on the b-th trading day;
calculating the coefficient of variation of the trading day from its entropy comprises:
$r_f = 1 - I_f$, where $f = 1, 2, \ldots, n$;
calculating the weight coefficient of the trading day from its coefficient of variation comprises:
$$w_f = \frac{r_f}{\sum_{i=1}^{m} r_i}, \quad f = 1, 2, \ldots, m$$
calculating the missing value from the weight coefficients calculated for the trading days whose data are not missing comprises:
$$t_{ab} = w_1 t_{a1} + w_2 t_{a2} + \cdots + w_{(b-1)} t_{a(b-1)} + w_{(b+1)} t_{a(b+1)} + \cdots + w_n t_{an}$$
where $t_{ab}$ is the missing value.
8. A data filtering device, characterized by comprising:
an acquiring unit, configured to obtain, from a data storage device, raw data within a preset time period before the ranking day;
a processing unit, configured to impute missing values in the raw data according to the degree of missingness of the raw data, to obtain historical data;
a computing unit, configured to perform a calculation based on the historical data and the ranking-day data and determine the list-entry probability of the ranking-day data;
a filtering unit, configured to filter the ranking-day data when the list-entry probability of the ranking-day data is less than or equal to a probability threshold.
9. The device of claim 8, characterized in that the computing unit is specifically configured to:
calculate, from the historical data and the ranking-day data, the probability that historical data entered the ranking list, the probability of having entered the ranking list when the historical data value is less than the ranking-day data value, and the probability that the historical data value is less than the ranking-day data value; and calculate the list-entry probability of the ranking-day data from these three probabilities.
10. The device of claim 9, characterized in that the probability of having entered the ranking list when the historical data value is less than the ranking-day data value is obtained as follows:
$$P(B \mid A) = \frac{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij} < x_{i1})}{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij})}, \quad (t_{ij} \in U)$$
the probability that historical data entered the ranking list is obtained as follows:
$$P(A) = \frac{\operatorname{count}_{j=1,2,\ldots,n}(t_{ij})}{n}, \quad (t_{ij} \in U)$$
the probability that historical data is less than the ranking-day data is obtained as follows:
$$P(B) = \frac{\operatorname{count}_{j=1,2,\ldots,n}(t'_{ij} < x_{i1})}{n}, \quad (t_{ij} \in U,\ t'_{ij} \in (U \cup V))$$
the list-entry probability of the ranking-day data is obtained as follows:
$$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(B \mid A) \cdot P(A)}{P(B)};$$
where $x_{i1}$ is the ranking-day data of the i-th statistical object, $t_{ij}$ is the data of the i-th statistical object on the j-th trading day, U is the set of historical data that entered the ranking list, V is the set of historical data that did not enter the ranking list, and n is the preset time period.
11. The device of claim 8, characterized in that the computing unit is specifically configured to:
calculate, from the historical data and the ranking-day data, the probability of the valid data difference when historical data entered the ranking list, the probability of the data difference when historical data entered the ranking list, and the total probability of the data difference; and determine the list-entry probability of the ranking-day data from these three probabilities.
12. The device of claim 11, characterized in that the probability of the valid data difference when historical data entered the ranking list is obtained as follows:
$$P(B \mid A) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) \cdot \frac{r_{ij}}{r_i} + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} \cdot \frac{r_{ij}}{r_i}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}}}, \quad (t_{ij} \in U);$$
the probability of the data difference when historical data entered the ranking list is obtained as follows:
$$P(A) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} + \sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t'_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t'_{ij} - x_{i1}}}, \quad (t_{ij} \in U,\ t'_{ij} \in (U \cup V));$$
the total probability of the data difference is obtained as follows:
$$P(B) = \frac{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) \cdot \frac{r_{ij}}{r_i} + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} \cdot \frac{r_{ij}}{r_i}}{\sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t_{ij} - x_{i1}} + \sum_{j=1,\, x_{i1} > t_{ij}}^{n} (x_{i1} - t'_{ij}) + \sum_{j=1,\, x_{i1} < t_{ij}}^{n} \frac{1}{t'_{ij} - x_{i1}}}, \quad (t_{ij} \in U,\ t'_{ij} \in (U \cup V));$$
the list-entry probability of the ranking-day data is obtained as follows:
$$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(B \mid A) \cdot P(A)}{P(B)};$$
where $x_{i1}$ is the data of the i-th statistical object on the ranking day, $t_{ij}$ is the data of the i-th statistical object on the j-th trading day, $r_{ij}$ is the rank of $t_{ij}$ within the data sample formed from the data of a subset of statistical objects in column j, $r_i$ is the rank of $x_{i1}$ within the data sample formed from the data of a subset of statistical objects on the ranking day, U is the set of historical data that entered the ranking list, V is the set of historical data that did not enter the ranking list, and n is the time period.
13. The device of claim 8, characterized in that the processing unit is specifically configured to:
calculate the entropy of each trading day whose data are not missing, from the non-missing data in the raw data within the time period; calculate the coefficient of variation of that trading day from its entropy; calculate the weight coefficient of that trading day from its coefficient of variation;
and calculate the missing value from the weight coefficients calculated for the trading days whose data are not missing.
14. The device of claim 13, characterized in that calculating the missing value comprises:
calculating the entropy of the f-th trading day, $I_f = -k\,p_f \ln(p_f)$;
where $k = 1/\ln(n)$,
[Formula image FSA00000534884400071 not reproduced]
n denotes the length of the time period taking part in the ranking;
where
[Formula image FSA00000534884400072 not reproduced]
$d_f$ denotes the distance between the data of the f-th trading day and the missing data; $t_{if}$ is the data value of the i-th statistical object on the f-th trading day, and $t_{ib}$ is the data of the i-th statistical object on the b-th trading day;
calculating the coefficient of variation of the trading day from its entropy comprises:
$r_f = 1 - I_f$, where $f = 1, 2, \ldots, n$;
calculating the weight coefficient of the trading day from its coefficient of variation comprises:
$$w_f = \frac{r_f}{\sum_{i=1}^{m} r_i}, \quad f = 1, 2, \ldots, m$$
calculating the missing value from the weight coefficients calculated for the trading days whose data are not missing comprises:
$$t_{ab} = w_1 t_{a1} + w_2 t_{a2} + \cdots + w_{(b-1)} t_{a(b-1)} + w_{(b+1)} t_{a(b+1)} + \cdots + w_n t_{an}$$
where $t_{ab}$ is the missing value.
CN2011101925152A 2011-07-11 2011-07-11 Method and equipment for filtering ranking list data Pending CN102880603A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011101925152A CN102880603A (en) 2011-07-11 2011-07-11 Method and equipment for filtering ranking list data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011101925152A CN102880603A (en) 2011-07-11 2011-07-11 Method and equipment for filtering ranking list data

Publications (1)

Publication Number Publication Date
CN102880603A true CN102880603A (en) 2013-01-16

Family

ID=47481932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011101925152A Pending CN102880603A (en) 2011-07-11 2011-07-11 Method and equipment for filtering ranking list data

Country Status (1)

Country Link
CN (1) CN102880603A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1423222A (en) * 2001-11-27 2003-06-11 株式会社世界 Commodity sorting system based on dynamic state and method thereof
US20110060733A1 (en) * 2009-09-04 2011-03-10 Alibaba Group Holding Limited Information retrieval based on semantic patterns of queries

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
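Chen Huan, Huang Decai: "Missing-data imputation algorithm based on generalized Mahalanobis distance" (基于广义马氏距离的缺损数据补值算法), Computer Science (《计算机科学》) *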

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970761A (en) * 2013-01-28 2014-08-06 阿里巴巴集团控股有限公司 Commodity data searching method and device
CN105164706A (en) * 2013-03-13 2015-12-16 空中食宿公司 Automated determination of booking availability for user sourced accommodations
US10467553B2 (en) 2013-03-13 2019-11-05 Airbnb, Inc. Automated determination of booking availability for user sourced accommodations
CN105164706B (en) * 2013-03-13 2022-01-25 空中食宿公司 Automatic determination of subscription availability for user sourced accommodation
US11257010B2 (en) 2013-03-13 2022-02-22 Airbnb, Inc. Automated determination of booking availability for user sourced accommodations
CN104063378A (en) * 2013-03-18 2014-09-24 深圳市金阶网科技有限公司 Method, device and system for ranking and eliminating for network ranking list
WO2015051752A1 (en) * 2013-10-10 2015-04-16 Beijing Zhigu Rui Tuo Tech Co., Ltd Ranking fraud detection for application
US10606845B2 (en) 2013-10-10 2020-03-31 Beijing Zhigu Rui Tuo Tech Co., Ltd Detecting leading session of application
CN106933855A (en) * 2015-12-30 2017-07-07 阿里巴巴集团控股有限公司 Object order method, apparatus and system
CN106933855B (en) * 2015-12-30 2020-06-23 阿里巴巴集团控股有限公司 Object sorting method, device and system
CN106294691A (en) * 2016-08-04 2017-01-04 广州交易猫信息技术有限公司 List method for refreshing, device and service end

Similar Documents

Publication Publication Date Title
CN102880603A (en) Method and equipment for filtering ranking list data
CN109978893B (en) Training method, device, equipment and storage medium of image semantic segmentation network
Brechmann et al. Flexible dependence modeling of operational risk losses and its impact on total capital requirements
CN111507521B (en) Method and device for predicting power load of transformer area
CN113591380B (en) Traffic flow prediction method, medium and equipment based on graph Gaussian process
WO2015187372A1 (en) Digital event profile filters
CN109635010B (en) User characteristic and characteristic factor extraction and query method and system
Sellers et al. Conway–Maxwell–Poisson regression models for dispersed count data
CN111080360B (en) Behavior prediction method, model training method, device, server and storage medium
CN111047078B (en) Traffic characteristic prediction method, system and storage medium
Okhrin et al. gofCopula: Goodness-of-Fit tests for copulae
Zou et al. Mixture modeling of freeway speed and headway data using multivariate skew-t distributions
CN111160959A (en) User click conversion estimation method and device
CN114463727A (en) Subway driver behavior identification method
Mrkvička et al. New methods for multiple testing in permutation inference for the general linear model
CN105590026A (en) PCA (Principal Component Analysis) based satellite telemetering regression method
Jalali et al. Using the method of simulated moments for system identification
CN111898249A (en) Landslide displacement nonparametric probability density prediction method, equipment and storage medium
RU2632124C1 (en) Method of predictive assessment of multi-stage process effectiveness
CN105825347A (en) Economy prediction model building method and economy prediction model prediction method
DE112022000915T5 (en) CREATE A STATISTICAL MODEL AND EVALUATE MODEL PERFORMANCE
CN113177603A (en) Training method of classification model, video classification method and related equipment
Park et al. A Markov chain Monte Carlo-based origin destination matrix estimator that is robust to imperfect intelligent transportation systems data
Liu et al. Functional L-optimality subsampling for massive data
CN113627950B (en) Method and system for extracting user transaction characteristics based on dynamic diagram

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1175553; Country of ref document: HK)
RJ01 Rejection of invention patent application after publication (Application publication date: 20130116)
RJ01 Rejection of invention patent application after publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: WD; Ref document number: 1175553; Country of ref document: HK)