CN102622447A - Hadoop-based frequent closed itemset mining method - Google Patents
Hadoop-based frequent closed itemset mining method
- Publication number
- CN102622447A
- Authority
- CN
- China
- Prior art keywords
- list
- frequent
- frequent closed
- hadoop
- itemset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a Hadoop-based method for mining frequent closed itemsets. The method comprises the following steps: parallel counting, in which the database is scanned once in parallel and the number of occurrences (the frequency count) of each data item is counted; constructing a global frequency list (F-List) and group list (G-List); and mining local frequent closed itemsets in parallel, in which the database is scanned again, a first algorithm is used on each node to mine the local frequent closed itemsets, and only the global frequent closed itemsets are kept. Because computation tasks are distributed by group, the computational load is spread evenly; the method is also simple, since the mining task completes in only three steps (two MapReduce passes).
Description
Technical field
The present invention relates to a Hadoop-based method for mining frequent closed itemsets.
Background art
Mining frequent closed itemsets from large databases is an important research topic in data mining. It is widely used to discover association rules in data, for example in market basket analysis and collaborative filtering. It often outperforms other data mining algorithms in applicability, mining efficiency, accuracy, and interpretability.
Many methods for mining frequent closed itemsets exist, but almost all of them run on a single machine. Faced with massive data sets, single-machine algorithms for mining frequent closed itemsets often fall short.
Summary of the invention
Object of the invention: in view of the problems and shortcomings of the prior art described above, the object of the present invention is to provide a Hadoop-based method for mining frequent closed itemsets that performs the computation in parallel.
Technical solution: to achieve this object, the present invention adopts a Hadoop-based method for mining frequent closed itemsets, comprising the following steps:
(1) Parallel counting: scan the database once in parallel and count the number of occurrences (the frequency count) of each data item in the database;
(2) Constructing the global F-List and G-List:
1) taking the output of the parallel counting in step (1) as input, construct the global F-List;
2) construct the G-List on the basis of this global F-List;
(3) Mining local frequent closed itemsets in parallel: scan the database again, use a first algorithm on each node to mine the local frequent closed itemsets, and keep only the global frequent closed itemsets.
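The parallel counting of step (1) can be sketched in plain Python as a map phase and a reduce phase. This is only an illustration of the idea, not the patented implementation and not Hadoop API code: the names `map_count`, `reduce_count`, and `shards` are hypothetical, and counting each item at most once per transaction is an assumption.

```python
from collections import Counter
from itertools import chain


def map_count(transaction):
    """Map phase: emit (item, 1) once per distinct item in a transaction."""
    return [(item, 1) for item in set(transaction)]


def reduce_count(pairs):
    """Reduce phase: sum the emitted counts for each item."""
    totals = Counter()
    for item, n in pairs:
        totals[item] += n
    return dict(totals)


# Hypothetical database split into two shards, one per worker node.
shards = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c", "d"]],
]

# Each shard could be mapped independently; here the work is done in sequence.
emitted = chain.from_iterable(map_count(t) for shard in shards for t in shard)
item_counts = reduce_count(emitted)
```

In a real Hadoop job the shuffle phase would route all pairs with the same item to one reducer; the single `reduce_count` call stands in for that here.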
In step 1), the output of the parallel counting in step (1) is taken as input, the items that satisfy the minimum support min_sup are selected and sorted in descending order of frequency count, and the result is stored in the F-List.
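A minimal sketch of step (2) follows, under stated assumptions. The text specifies only that the F-List holds the items meeting min_sup in descending order of frequency count; treating `min_sup` as an absolute count, breaking ties by item name, and assigning items to groups round-robin are all assumptions made for illustration, and `build_f_list` and `build_g_list` are hypothetical names.

```python
def build_f_list(item_counts, min_sup):
    """Keep items whose count meets min_sup, sorted by descending count.

    Assumption: min_sup is an absolute count; ties are broken by item name.
    """
    frequent = [(item, n) for item, n in item_counts.items() if n >= min_sup]
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in frequent]


def build_g_list(f_list, num_groups):
    """Map each F-List item to a group id.

    Assumption: round-robin assignment by F-List rank; the source does not
    say how groups are formed.
    """
    return {item: rank % num_groups for rank, item in enumerate(f_list)}


# Hypothetical counts, as produced by a counting pass over the database.
counts = {"a": 4, "b": 3, "c": 3, "d": 1, "e": 2}
f_list = build_f_list(counts, min_sup=2)
g_list = build_g_list(f_list, num_groups=2)
```

Round-robin assignment spreads high-frequency and low-frequency items across groups, which is one simple way to pursue the balanced distribution the method aims for.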
The first algorithm is preferably the AFOPT-Closed algorithm (AFOPT: Ascending Frequency Ordered Prefix Tree).
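To make the target of the mining step concrete: an itemset is frequent if its support meets min_sup, and closed if no proper superset has the same support. The sketch below is a naive brute-force enumeration that illustrates this definition on a tiny data set; it is not the AFOPT-Closed algorithm and would not scale. The function names and the sample transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_closed_itemsets(transactions, min_sup):
    """Brute force: enumerate frequent itemsets, then keep the closed ones."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_sup:
                frequent[candidate] = s
    # Closed: no proper superset with the same support. Such a superset
    # would itself be frequent, so searching `frequent` is sufficient.
    return {
        x: s
        for x, s in frequent.items()
        if not any(x < y and s == sy for y, sy in frequent.items())
    }


db = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["b", "d"]]
closed = frequent_closed_itemsets(db, min_sup=2)
```

Here `{a}` is frequent but not closed, because `{a, b}` occurs in exactly the same three transactions; the closed sets therefore summarize the frequent ones without losing support information.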
Beneficial effects: the method uses Hadoop to perform the computation in parallel and distributes computation tasks by group, which balances the computational load more evenly and improves efficiency. The method is also more concise: the mining task completes in only three steps (two MapReduce passes).
Brief description of the drawings
Fig. 1 is a flowchart of the method of the invention.
Detailed description
The present invention is further illustrated below with reference to the accompanying drawing and a specific embodiment. It should be understood that this embodiment only illustrates the invention and does not limit its scope; after reading this disclosure, modifications of equivalent form made by those skilled in the art all fall within the scope defined by the appended claims.
As shown in Fig. 1, the method comprises the following steps:
Claims (3)
1. A Hadoop-based method for mining frequent closed itemsets, comprising the following steps:
(1) Parallel counting: scan the database once in parallel and count the number of occurrences (the frequency count) of each data item in the database;
(2) Constructing the global F-List and G-List:
1) taking the output of the parallel counting in step (1) as input, construct the global F-List;
2) construct the G-List on the basis of this global F-List;
(3) Mining local frequent closed itemsets in parallel: scan the database again, use a first algorithm on each node to mine the local frequent closed itemsets, and keep only the global frequent closed itemsets.
2. The Hadoop-based method for mining frequent closed itemsets according to claim 1, characterized in that, in step 1), the output of the parallel counting in step (1) is taken as input, the items that satisfy the minimum support min_sup are selected and sorted in descending order of frequency count, and the result is stored in the F-List.
3. The Hadoop-based method for mining frequent closed itemsets according to claim 1, characterized in that the first algorithm is the AFOPT-Closed algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210072524.2A CN102622447B (en) | 2012-03-19 | 2012-03-19 | Hadoop-based frequent closed itemset mining method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210072524.2A CN102622447B (en) | 2012-03-19 | 2012-03-19 | Hadoop-based frequent closed itemset mining method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102622447A (en) | 2012-08-01 |
CN102622447B (en) | 2014-03-05 |
Family
ID=46562366
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210072524.2A, Active, CN102622447B (en) | 2012-03-19 | 2012-03-19 | Hadoop-based frequent closed itemset mining method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102622447B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104765847A (en) * | 2015-04-20 | 2015-07-08 | 西北工业大学 | Frequent closed item set mining method based on order-preserving characteristic and preamble tree |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6044366A (en) * | 1998-03-16 | 2000-03-28 | Microsoft Corporation | Use of the UNPIVOT relational operator in the efficient gathering of sufficient statistics for data mining |
CN101446978A (en) * | 2008-12-11 | 2009-06-03 | 南京大学 | Core node discovery method based on frequent itemset mining |
CN101799810A (en) * | 2009-02-06 | 2010-08-11 | 中国移动通信集团公司 | Association rule mining method and system thereof |
CN101996102A (en) * | 2009-08-31 | 2011-03-30 | 中国移动通信集团公司 | Method and system for mining data association rule |
Non-Patent Citations (1)
Title |
---|
Miao Yuqing: "Research on parallel mining algorithms for frequent closed itemsets" (频繁闭合项目集的并行挖掘算法研究), Computer Science (《计算机科学》) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103324712A (en) * | 2013-06-19 | 2013-09-25 | 西北工业大学 | Extraction method for non-redundancy plot rule |
CN103714009A (en) * | 2013-12-20 | 2014-04-09 | 华中科技大学 | MapReduce realizing method based on unified management of internal memory on GPU |
CN104008185A (en) * | 2014-06-11 | 2014-08-27 | 西北工业大学 | Frequent close scenario mining method based on same node table and scenario tree |
CN104834709A (en) * | 2015-04-29 | 2015-08-12 | 南京理工大学 | Parallel cosine mode mining method based on load balancing |
CN104834709B (en) * | 2015-04-29 | 2018-07-31 | 南京理工大学 | Parallel cosine pattern mining method based on load balancing |
Also Published As
Publication number | Publication date |
---|---|
CN102622447B (en) | 2014-03-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Koseleva et al. | Big data in building energy efficiency: understanding of big data and main challenges | |
CN102622447B (en) | Hadoop-based frequent closed itemset mining method | |
CN103150163A (en) | Map/Reduce mode-based parallel relating method | |
CN103761236A (en) | Incremental frequent pattern increase data mining method | |
CN107229751A (en) | A kind of concurrent incremental formula association rule mining method towards stream data | |
CN103678671A (en) | Dynamic community detection method in social network | |
CN105959372A (en) | Internet user data analysis method based on mobile application | |
Liao et al. | MRPrePost—A parallel algorithm adapted for mining big data | |
CN103020163A (en) | Node-similarity-based network community division method in network | |
CN103218692A (en) | Workflow excavating method based on inter-movement dependency relation analysis | |
CN102799625B (en) | Method and system for excavating topic core circle in social networking service | |
CN105279187A (en) | Edge clustering coefficient-based social network group division method | |
CN106294390A (en) | A kind of data mining analysis method and system | |
Xu et al. | Distributed maximal clique computation and management | |
CN105138650A (en) | Hadoop data cleaning method and system based on outlier mining | |
CN104216889B (en) | Data dissemination analyzing and predicting method and system based on cloud service | |
Abdullah et al. | Density grid-based clustering for wireless sensors networks | |
CN104834557A (en) | Data analysis method based on Hadoop | |
CN111475837B (en) | Network big data privacy protection method | |
CN105069290A (en) | Parallelization critical node discovery method for postal delivery data | |
Xie et al. | Vital node identification in hypergraphs via gravity model | |
Le et al. | A novel algorithm for mining high utility itemsets | |
CN103984723A (en) | Method used for updating data mining for frequent item by incremental data | |
CN104573864A (en) | Data analysis alarm method based on autoregressive prediction | |
KR101693727B1 (en) | Apparatus and method for reorganizing social issues from research and development perspective using social network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||