ISSN 0253-2778

CN 34-1054/N


A density-based hierarchical clustering algorithm of gene data based on MapReduce

  • The scale of gene expression data is increasing sharply with the rapid development of bioinformatics technology, which poses a serious challenge for traditional clustering algorithms. Density-based hierarchical clustering (DHC) can handle the nested cluster structure of gene expression data and is robust, but it is inefficient when handling huge amounts of data. Therefore, a density-based hierarchical clustering algorithm on MapReduce (DisDHC) was proposed. It partitioned the data set into smaller blocks, clustered each block with DHC in parallel, gathered the local results for re-clustering, and produced the density centers of all clusters. Experiments on the GAL, Cell cycle, and Serum datasets show that DisDHC reduces clustering time while achieving high performance.
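The partition, local-cluster, gather, and re-cluster pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the DisDHC implementation or the original DHC algorithm: the local clustering step is a simplified attraction-forest rule (each point links to its nearest higher-density neighbour within a radius `eps`, and unlinked points become density centers), distances are Euclidean, blocks are formed round-robin, and a `ThreadPoolExecutor` stands in for MapReduce mappers. All function names (`attraction_forest`, `map_block`, `reduce_centers`, `dis_dhc`) and the parameters `eps` and `n_blocks` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def attraction_forest(points, weights, eps):
    """Return the root (density center) index of every point.

    Simplified stand-in for DHC (assumption): density[i] is the total weight
    of points within eps of i, itself included. Points are ranked by
    (density, -index) so ties break deterministically; each point links to
    its nearest higher-ranked point if that point lies within eps, otherwise
    it becomes a root. O(n^2), which is acceptable only for a sketch.
    """
    n = len(points)
    density = [sum(weights[j] for j in range(n)
                   if math.dist(points[i], points[j]) <= eps)
               for i in range(n)]

    def rank(k):
        return (density[k], -k)

    parent = list(range(n))
    for i in range(n):
        higher = [j for j in range(n) if rank(j) > rank(i)]
        if higher:
            j = min(higher, key=lambda k: math.dist(points[i], points[k]))
            if math.dist(points[i], points[j]) <= eps:
                parent[i] = j

    def root(i):
        # A parent always has strictly higher rank, so this loop terminates.
        while parent[i] != i:
            i = parent[i]
        return i

    return [root(i) for i in range(n)]


def map_block(block, eps):
    """Map phase: cluster one block of (global_id, point) pairs locally.

    Emits one (local_center, cluster_size, member_ids) record per local cluster.
    """
    ids = [gid for gid, _ in block]
    pts = [p for _, p in block]
    roots = attraction_forest(pts, [1] * len(pts), eps)
    groups = {}
    for i, r in enumerate(roots):
        groups.setdefault(r, []).append(ids[i])
    return [(pts[r], len(members), members) for r, members in groups.items()]


def reduce_centers(local_results, eps):
    """Reduce phase: re-cluster the gathered local centers, weighted by size."""
    centers = [c for c, _, _ in local_results]
    weights = [w for _, w, _ in local_results]
    roots = attraction_forest(centers, weights, eps)
    labels = {}
    for k, (_, _, members) in enumerate(local_results):
        for gid in members:
            labels[gid] = centers[roots[k]]  # label = final density center
    final_centers = sorted({centers[r] for r in roots})
    return labels, final_centers


def dis_dhc(data, n_blocks, eps):
    """Partition round-robin, cluster blocks in parallel, then re-cluster."""
    indexed = list(enumerate(data))
    blocks = [indexed[b::n_blocks] for b in range(n_blocks)]
    with ThreadPoolExecutor() as pool:
        mapped = pool.map(lambda blk: map_block(blk, eps), blocks)
        local_results = [rec for part in mapped for rec in part]
    return reduce_centers(local_results, eps)


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
            (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1)]
    labels, centers = dis_dhc(data, n_blocks=2, eps=0.5)
    print(centers)  # [(0, 0), (5, 5)]
```

In this toy run each of the two blocks finds two local centers; the reduce step merges the four local centers into the two global density centers and relabels every original point through its local center. Only local centers and member ids travel to the reduce step, which is the reason this kind of scheme can cut clustering time on large data.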
