DE69938339D1 - A scalable system for clustering of large databases - Google Patents

A scalable system for clustering of large databases

Info

Publication number
DE69938339D1
Authority
DE
Germany
Prior art keywords
data
database
clusters
models
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
DE69938339T
Other languages
English (en)
Other versions
DE69938339T2 (de)
Inventor
Usama Fayyad
Paul S Bradley
Cory Reina
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Corp
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1998-03-17
Filing date
1999-03-16
Publication date
2008-04-24
Application filed by Microsoft Corp
Publication of DE69938339D1
Application granted
Publication of DE69938339T2
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99942 Manipulating data structure, e.g. compression, compaction, compilation
DE69938339T 1998-03-17 1999-03-16 A scalable system for clustering of large databases Expired - Lifetime DE69938339T2 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US09/040,219 US6374251B1 (en) 1998-03-17 1998-03-17 Scalable system for clustering of large databases
US40219 1998-03-17
PCT/US1999/005759 WO1999048018A1 (en) 1998-03-17 1999-03-16 A scalable system for clustering of large databases

Publications (2)

Publication Number Publication Date
DE69938339D1 (de) 2008-04-24
DE69938339T2 (de) 2009-03-12

Family

ID=21909787

Family Applications (1)

Application Number Title Priority Date Filing Date
DE69938339T Expired - Lifetime DE69938339T2 (de) 1998-03-17 1999-03-16 A scalable system for clustering of large databases

Country Status (5)

Country Link
US (1) US6374251B1 (de)
EP (1) EP1062590B1 (de)
AT (1) ATE389213T1 (de)
DE (1) DE69938339T2 (de)
WO (1) WO1999048018A1 (de)

Families Citing this family (127)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6581058B1 (en) * 1998-05-22 2003-06-17 Microsoft Corporation Scalable system for clustering of large databases having mixed data attributes
GB9903697D0 (en) * 1999-02-19 1999-04-14 Pattern Computing Limited A computer-based method for matching patterns
US6549907B1 (en) * 1999-04-22 2003-04-15 Microsoft Corporation Multi-dimensional database and data cube compression for aggregate query support on numeric dimensions
US6957189B2 (en) * 1999-08-30 2005-10-18 Sabre Inc. Apparatus and method for creating a marketing initiative
US8935198B1 (en) * 1999-09-08 2015-01-13 C4Cast.Com, Inc. Analysis and prediction of data using clusterization
US7424439B1 (en) * 1999-09-22 2008-09-09 Microsoft Corporation Data mining for managing marketing resources
US7106329B1 (en) 1999-09-30 2006-09-12 Battelle Memorial Institute Methods and apparatus for displaying disparate types of information using an interactive surface map
US6990238B1 (en) * 1999-09-30 2006-01-24 Battelle Memorial Institute Data processing, analysis, and visualization system for use with disparate data types
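CN1241135C (zh) * 1999-10-21 2006-02-08 International Business Machines Corporation System and method for ordering categorical attributes to better visualize multidimensional data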
US6643645B1 (en) * 2000-02-08 2003-11-04 Microsoft Corporation Retrofitting recommender system for achieving predetermined performance requirements
US7162482B1 (en) * 2000-05-03 2007-01-09 Musicmatch, Inc. Information retrieval engine
US6633882B1 (en) * 2000-06-29 2003-10-14 Microsoft Corporation Multi-dimensional database record compression utilizing optimized cluster models
US6584433B1 (en) * 2000-10-04 2003-06-24 Hewlett-Packard Development Company Lp Harmonic average based clustering method and system
US6944607B1 (en) * 2000-10-04 2005-09-13 Hewlett-Packard Development Company, L.P. Aggregated clustering method and system
US6922660B2 (en) * 2000-12-01 2005-07-26 Microsoft Corporation Determining near-optimal block size for incremental-type expectation maximization (EM) algorithms
US6615205B1 (en) * 2000-12-22 2003-09-02 Paul M. Cereghini Horizontal implementation of expectation-maximization algorithm in SQL for performing clustering in very large databases
US6519591B1 (en) * 2000-12-22 2003-02-11 Ncr Corporation Vertical implementation of expectation-maximization algorithm in SQL for performing clustering in very large databases
US6658423B1 (en) * 2001-01-24 2003-12-02 Google, Inc. Detecting duplicate and near-duplicate files
US7398270B1 (en) * 2001-01-31 2008-07-08 Choi Lawrence J Method and system for clustering optimization and applications
US6928434B1 (en) * 2001-01-31 2005-08-09 Rosetta Marketing Strategies Group Method and system for clustering optimization and applications
US6745184B1 (en) * 2001-01-31 2004-06-01 Rosetta Marketing Strategies Group Method and system for clustering optimization and applications
US6785684B2 (en) * 2001-03-27 2004-08-31 International Business Machines Corporation Apparatus and method for determining clustering factor in a database using block level sampling
US7155668B2 (en) * 2001-04-19 2006-12-26 International Business Machines Corporation Method and system for identifying relationships between text documents and structured variables pertaining to the text documents
US20030028504A1 (en) * 2001-05-08 2003-02-06 Burgoon David A. Method and system for isolating features of defined clusters
US7272617B1 (en) * 2001-11-30 2007-09-18 Ncr Corp. Analytic data set creation for modeling in a customer relationship management system
US7747624B2 (en) * 2002-05-10 2010-06-29 Oracle International Corporation Data summarization
US7174343B2 (en) * 2002-05-10 2007-02-06 Oracle International Corporation In-database clustering
US7133811B2 (en) * 2002-10-15 2006-11-07 Microsoft Corporation Staged mixture modeling
US20040093261A1 (en) * 2002-11-08 2004-05-13 Vivek Jain Automatic validation of survey results
US6993516B2 (en) * 2002-12-26 2006-01-31 International Business Machines Corporation Efficient sampling of a relational database
US20050033723A1 (en) * 2003-08-08 2005-02-10 Selby David A. Method, system, and computer program product for sorting data
US7403640B2 (en) * 2003-10-27 2008-07-22 Hewlett-Packard Development Company, L.P. System and method for employing an object-oriented motion detector to capture images
US7539690B2 (en) * 2003-10-27 2009-05-26 Hewlett-Packard Development Company, L.P. Data mining method and system using regression clustering
US7225200B2 (en) 2004-04-14 2007-05-29 Microsoft Corporation Automatic data perspective generation for a target variable
US7349914B1 (en) 2004-05-04 2008-03-25 Ncr Corp. Method and apparatus to cluster binary data transactions
US8078559B2 (en) 2004-06-30 2011-12-13 Northrop Grumman Systems Corporation System and method for the automated discovery of unknown unknowns
US8631347B2 (en) * 2004-11-15 2014-01-14 Microsoft Corporation Electronic document style matrix
US7415487B2 (en) * 2004-12-17 2008-08-19 Amazon Technologies, Inc. Apparatus and method for data warehousing
GB2422081B (en) * 2005-01-05 2010-09-29 Ling Dynamic Systems Statistical streaming
US7359913B1 (en) 2005-05-13 2008-04-15 Ncr Corp. K-means clustering using structured query language (SQL) statements and sufficient statistics
US7739314B2 (en) * 2005-08-15 2010-06-15 Google Inc. Scalable user clustering based on set similarity
US20070055708A1 (en) * 2005-09-07 2007-03-08 Ncr Corporation Processing formulae in rules for profitability calculations for financial processing in a relational database management system
US7840774B2 (en) * 2005-09-09 2010-11-23 International Business Machines Corporation Compressibility checking avoidance
US8001526B2 (en) * 2005-09-15 2011-08-16 Microsoft Corporation Hierarchical property storage
US7783971B2 (en) * 2005-09-13 2010-08-24 Microsoft Corporation Graphic object themes
US20070061351A1 (en) * 2005-09-13 2007-03-15 Microsoft Corporation Shape object text
US20070061349A1 (en) * 2005-09-15 2007-03-15 Microsoft Corporation Hierarchically describing shapes
US7721205B2 (en) * 2005-09-15 2010-05-18 Microsoft Corporation Integration of composite objects in host applications
US7716169B2 (en) * 2005-12-08 2010-05-11 Electronics And Telecommunications Research Institute System for and method of extracting and clustering information
US20070250476A1 (en) * 2006-04-21 2007-10-25 Lockheed Martin Corporation Approximate nearest neighbor search in metric space
US20070255684A1 (en) * 2006-04-29 2007-11-01 Yahoo! Inc. System and method using flat clustering for evolutionary clustering of sequential data sets
US8266147B2 (en) * 2006-09-18 2012-09-11 Infobright, Inc. Methods and systems for database organization
US8700579B2 (en) * 2006-09-18 2014-04-15 Infobright Inc. Method and system for data compression in a relational database
US8655623B2 (en) * 2007-02-13 2014-02-18 International Business Machines Corporation Diagnostic system and method
US7636715B2 (en) * 2007-03-23 2009-12-22 Microsoft Corporation Method for fast large scale data mining using logistic regression
EP2497256B1 (de) * 2009-11-03 2014-01-08 Telefonaktiebolaget LM Ericsson (publ) Reducing computational complexity during user data analysis
US8402027B1 (en) * 2010-02-11 2013-03-19 Disney Enterprises, Inc. System and method for hybrid hierarchical segmentation
US8417727B2 (en) 2010-06-14 2013-04-09 Infobright Inc. System and method for storing data in a relational database
US8521748B2 (en) 2010-06-14 2013-08-27 Infobright Inc. System and method for managing metadata in a relational database
US9002859B1 (en) 2010-12-17 2015-04-07 Moonshadow Mobile, Inc. Systems and methods for high-speed searching and filtering of large datasets
US8977656B2 (en) 2011-01-10 2015-03-10 Moonshadow Mobile, Inc. Inline tree data structure for high-speed searching and filtering of large datasets
EP2671168A4 (de) 2011-02-03 2017-03-08 Voxeleron LLC Method and system for image analysis and interpretation
US9026591B2 (en) 2011-02-28 2015-05-05 Avaya Inc. System and method for advanced communication thread analysis
US9547693B1 (en) 2011-06-23 2017-01-17 Palantir Technologies Inc. Periodic database search manager for multiple data sources
US9026560B2 (en) * 2011-09-16 2015-05-05 Cisco Technology, Inc. Data center capability summarization
US8886651B1 (en) 2011-12-22 2014-11-11 Reputation.Com, Inc. Thematic clustering
US9171054B1 (en) 2012-01-04 2015-10-27 Moonshadow Mobile, Inc. Systems and methods for high-speed searching and filtering of large datasets
US8990204B1 (en) 2012-01-17 2015-03-24 Roy W. Ward Processing and storage of spatial data
US8751499B1 (en) 2013-01-22 2014-06-10 Splunk Inc. Variable representative sampling under resource constraints
US8682906B1 (en) 2013-01-23 2014-03-25 Splunk Inc. Real time display of data field values based on manual editing of regular expressions
US10394946B2 (en) 2012-09-07 2019-08-27 Splunk Inc. Refining extraction rules based on selected text within events
US9753909B2 (en) 2012-09-07 2017-09-05 Splunk, Inc. Advanced field extractor with multiple positive examples
US20140208217A1 (en) 2013-01-22 2014-07-24 Splunk Inc. Interface for managing splittable timestamps across event records
US8751963B1 (en) 2013-01-23 2014-06-10 Splunk Inc. Real time indication of previously extracted data fields for regular expressions
US9594814B2 (en) 2012-09-07 2017-03-14 Splunk Inc. Advanced field extractor with modification of an extracted field
US10013477B2 (en) 2012-11-19 2018-07-03 The Penn State Research Foundation Accelerated discrete distribution clustering under wasserstein distance
US9720998B2 (en) * 2012-11-19 2017-08-01 The Penn State Research Foundation Massive clustering of discrete distributions
US8977589B2 (en) 2012-12-19 2015-03-10 International Business Machines Corporation On the fly data binning
US9152929B2 (en) 2013-01-23 2015-10-06 Splunk Inc. Real time display of statistics and values for selected regular expressions
US8909642B2 (en) 2013-01-23 2014-12-09 Splunk Inc. Automatic generation of a field-extraction rule based on selections in a sample event
KR102029055B1 (ko) * 2013-02-08 2019-10-07 Samsung Electronics Co., Ltd. Method and apparatus for visualizing high-dimensional data
US10275778B1 (en) 2013-03-15 2019-04-30 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive investigation based on automatic malfeasance clustering of related data in various data structures
US8818892B1 (en) 2013-03-15 2014-08-26 Palantir Technologies, Inc. Prioritizing data clusters with customizable scoring strategies
US9965937B2 (en) 2013-03-15 2018-05-08 Palantir Technologies Inc. External malware data item clustering and analysis
CN103309940B (zh) * 2013-05-03 2017-03-08 Shanghai Stock Exchange A method for sorting out-of-order data streams
US9116975B2 (en) 2013-10-18 2015-08-25 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores
WO2015088780A1 (en) * 2013-12-10 2015-06-18 University Of Southern California Noise-enhanced clustering and competitive learning
US10579647B1 (en) 2013-12-16 2020-03-03 Palantir Technologies Inc. Methods and systems for analyzing entity performance
US10356032B2 (en) 2013-12-26 2019-07-16 Palantir Technologies Inc. System and method for detecting confidential information emails
US8832832B1 (en) 2014-01-03 2014-09-09 Palantir Technologies Inc. IP reputation
US9535974B1 (en) 2014-06-30 2017-01-03 Palantir Technologies Inc. Systems and methods for identifying key phrase clusters within documents
US9619557B2 (en) 2014-06-30 2017-04-11 Palantir Technologies, Inc. Systems and methods for key phrase characterization of documents
US9202249B1 (en) 2014-07-03 2015-12-01 Palantir Technologies Inc. Data item clustering and analysis
US9256664B2 (en) 2014-07-03 2016-02-09 Palantir Technologies Inc. System and method for news events detection and visualization
CN104156418B (zh) * 2014-08-01 2015-09-30 Beijing Institute of System Engineering An evolutionary clustering method based on knowledge reuse
US9501851B2 (en) 2014-10-03 2016-11-22 Palantir Technologies Inc. Time-series analysis system
US9767172B2 (en) 2014-10-03 2017-09-19 Palantir Technologies Inc. Data aggregation and analysis system
US10255345B2 (en) * 2014-10-09 2019-04-09 Business Objects Software Ltd. Multivariate insight discovery approach
US9984133B2 (en) 2014-10-16 2018-05-29 Palantir Technologies Inc. Schematic and database linking system
US9043894B1 (en) 2014-11-06 2015-05-26 Palantir Technologies Inc. Malicious software detection in a computing system
US10362133B1 (en) 2014-12-22 2019-07-23 Palantir Technologies Inc. Communication data processing architecture
US9348920B1 (en) 2014-12-22 2016-05-24 Palantir Technologies Inc. Concept indexing among database of documents using machine learning techniques
US10552994B2 (en) 2014-12-22 2020-02-04 Palantir Technologies Inc. Systems and interactive user interfaces for dynamic retrieval, analysis, and triage of data items
US9367872B1 (en) 2014-12-22 2016-06-14 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures
US20160253672A1 (en) * 2014-12-23 2016-09-01 Palantir Technologies, Inc. System and methods for detecting fraudulent transactions
US9817563B1 (en) 2014-12-29 2017-11-14 Palantir Technologies Inc. System and method of generating data points from one or more data stores of data items for chart creation and manipulation
US10103953B1 (en) 2015-05-12 2018-10-16 Palantir Technologies Inc. Methods and systems for analyzing entity performance
US10191966B2 (en) * 2015-07-08 2019-01-29 Business Objects Software Ltd. Enabling advanced analytics with large data sets
US9454785B1 (en) 2015-07-30 2016-09-27 Palantir Technologies Inc. Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data
US9456000B1 (en) 2015-08-06 2016-09-27 Palantir Technologies Inc. Systems, methods, user interfaces, and computer-readable media for investigating potential malicious communications
US10489391B1 (en) 2015-08-17 2019-11-26 Palantir Technologies Inc. Systems and methods for grouping and enriching data items accessed from one or more databases for presentation in a user interface
US10102280B2 (en) * 2015-08-31 2018-10-16 International Business Machines Corporation Determination of expertness level for a target keyword
US10467204B2 (en) * 2016-02-18 2019-11-05 International Business Machines Corporation Data sampling in a storage system
US10521411B2 (en) 2016-08-10 2019-12-31 Moonshadow Mobile, Inc. Systems, methods, and data structures for high-speed searching or filtering of large datasets
US10318630B1 (en) 2016-11-21 2019-06-11 Palantir Technologies Inc. Analysis of large bodies of textual data
US10620618B2 (en) 2016-12-20 2020-04-14 Palantir Technologies Inc. Systems and methods for determining relationships between defects
US10691700B1 (en) * 2016-12-30 2020-06-23 Uber Technologies, Inc. Table replica allocation in a replicated storage system
US10325224B1 (en) 2017-03-23 2019-06-18 Palantir Technologies Inc. Systems and methods for selecting machine learning training data
US10606866B1 (en) 2017-03-30 2020-03-31 Palantir Technologies Inc. Framework for exposing network activities
US10579663B2 (en) * 2017-05-02 2020-03-03 International Business Machines Corporation Data insight discovery using a clustering technique
US10235461B2 (en) 2017-05-02 2019-03-19 Palantir Technologies Inc. Automated assistance for generating relevant and valuable search results for an entity of interest
US10482382B2 (en) 2017-05-09 2019-11-19 Palantir Technologies Inc. Systems and methods for reducing manufacturing failure rates
US10503526B2 (en) * 2017-06-13 2019-12-10 Western Digital Technologies, Inc. Method and system for user experience event processing and analysis
CN109685092B (zh) * 2018-08-21 2024-02-06 Ping An Life Insurance Company of China, Ltd. Big-data-based clustering method, device, storage medium and apparatus
US11507961B2 (en) 2018-09-14 2022-11-22 Raytheon Technologies Corporation Fabricated data detection method
CN110222030B (zh) * 2019-05-13 2021-08-06 Fujian Tianquan Education Technology Co., Ltd. Method for dynamic database capacity expansion, and storage medium
US11551024B1 (en) * 2019-11-22 2023-01-10 Mastercard International Incorporated Hybrid clustered prediction computer modeling

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5706503A (en) 1994-05-18 1998-01-06 Etak Inc Method of clustering multi-dimensional related data in a computer database by combining the two vertices of a graph connected by an edge having the highest score
US5832182A (en) 1996-04-24 1998-11-03 Wisconsin Alumni Research Foundation Method and system for data clustering for very large databases
US5884305A (en) 1997-06-13 1999-03-16 International Business Machines Corporation System and method for data mining from relational data by sieving through iterated relational reinforcement

Also Published As

Publication number Publication date
EP1062590B1 (de) 2008-03-12
EP1062590A4 (de) 2004-05-12
ATE389213T1 (de) 2008-03-15
WO1999048018A1 (en) 1999-09-23
DE69938339T2 (de) 2009-03-12
US6374251B1 (en) 2002-04-16
EP1062590A1 (de) 2000-12-27

Similar Documents

Publication Publication Date Title
DE69938339D1 (de) A scalable system for clustering of large databases
McKenzie Some simple models for discrete variate time series
Glymour et al. TETRAD: discovering causal Structure
Jacobs The pathologies of big data
Plattner A course in in-memory data management
Brinkhoff et al. A storage and access architecture for efficient query processing in spatial database systems
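DE69712635T2 (de) State-based cache for antivirus software
CN102629269B (zh) Retrieval and storage method for an embedded database
CN103324724A (zh) Data processing method and apparatus
DE60128704D1 (de) Intelligent model system and method for public spaces
CN102779138B (zh) Hard disk access method for real-time data
CN107016501A (zh) An efficient multidimensional analysis method for industrial big data
CN108595664A (zh) An agricultural data monitoring method in a Hadoop environment
KR20180096780A (ko) Method and apparatus for data mining from core traces
CN109154933A (zh) Distributed database system and method for distributing and accessing data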
US20230141891A1 (en) Autonomous Column Selection for Columnar Cache
SE0300353D0 (sv) Method and system for managing energy information
WO2006130768A3 (en) Transactional file system with client partitioning
FR2746526B1 (fr) Method for maintaining a database with temporal and spatial organization
Xiong et al. SZTS: A novel big data transportation system benchmark suite
Bordawekar et al. Flexible workload-aware clustering of XML documents
Cardenas et al. Modeling and analysis of data base organization: the doubly chained tree structure
Orlandic Design, analysis and applications of compact zero-complete trees.
Magalhaes IMPROVING THE PERFORMANCE OF DATA BASE SYSTEMS.
Sloan et al. Partitioning of vector-topological data for parallel gis operations: Assessment and performance analysis

Legal Events

Date Code Title Description
8364 No opposition during term of opposition