ISSN 0253-2778

CN 34-1054/N

Open AccessOpen Access JUSTC Original Paper

Research on stream program task scheduling and cache optimization for X86 multi-core processor

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2016.03.004
  • Received Date: 27 August 2015
  • Accepted Date: 01 December 2015
  • Rev Recd Date: 01 December 2015
  • Publish Date: 30 March 2016
  • Stream programming as a kind of programming paradigm widely used in multi-core systems, in which the parallel scheduling and access latency of main memory has a great influence on the performance of the program. To solve this problem, a task scheduling and cache optimization method was proposed for the stream program by combining the characteristics of X86 multi-core processors. To improve the parallel scheduling granularity and locality of the target program, expanded scheduling in the pre-processing phase was used firstly. Then the compiler exploitsed the data parallelism and task parallelism to keep load balancing and construct the corresponding software pipeline scheduling. According to cache hierarchies of the target system, the interference were reduced among parallel scheduling cores were reduced by eliminating cache false sharing, and the optimized mapping between logic threads and physical cores in line with the inter-thread communication distribution were implemented. The compiler took a COStream dataflow program as input and outputs the optimized program. The common algorithms in media processing technology were chosen as the test programs for the experiment. The experimental result shows that the optimized test programs achieve linear speedup, which indicates the effectiveness of the compiling optimization system.
    Stream programming as a kind of programming paradigm widely used in multi-core systems, in which the parallel scheduling and access latency of main memory has a great influence on the performance of the program. To solve this problem, a task scheduling and cache optimization method was proposed for the stream program by combining the characteristics of X86 multi-core processors. To improve the parallel scheduling granularity and locality of the target program, expanded scheduling in the pre-processing phase was used firstly. Then the compiler exploitsed the data parallelism and task parallelism to keep load balancing and construct the corresponding software pipeline scheduling. According to cache hierarchies of the target system, the interference were reduced among parallel scheduling cores were reduced by eliminating cache false sharing, and the optimized mapping between logic threads and physical cores in line with the inter-thread communication distribution were implemented. The compiler took a COStream dataflow program as input and outputs the optimized program. The common algorithms in media processing technology were chosen as the test programs for the experiment. The experimental result shows that the optimized test programs achieve linear speedup, which indicates the effectiveness of the compiling optimization system.
  • loading
  • [1]
    THIES W, KARCZMAREK M, AMARASINGHE S. StreamIt: A language for streaming applications[C]// Proceedings of the 11th International Conference on Compiler Construction. London, UK: ACM Press, 2002: 179-196.
    [2]
    BUCK I, FOLEY T, HOM D, et al. Brook for GPUs: Stream computing on graphics hardware[J]. ACM Transactions on Graphics, 2004, 23(3): 777-786.
    [3]
    MARK W R, GLANVILLE R S, AKELEY K, et al. Cg: A system for programming graphics hardware in a C-like language[J]. ACM Transactions on Graphics, 2003, 22(3): 896-907.
    [4]
    KUDLUR M, MAHLKE S. Orchestrating the execution of stream programs on multicore platforms[J]. ACM SIGPLAN Notices, 2008, 43(6): 114-124.
    [5]
    GORDON M, THIES W, AMARASINGHE S. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs[C]// Proceedings of 14th International Conference on Architectural Support for Programming Languages and Operating Systems. San Jose, USA: ACM Press, 2006: 151-162.
    [6]
    ZHURAVLEV S, BLAGODUROV S, FEDOROVA A. Addressing shared resource contention in multicore processors via scheduling[C]// Proceedings of the 15th Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems. New York, USA: ACM Press, 2010: 129-142.
    [7]
    张维维,魏海涛,于俊清, 等. COstream:一种面向数据流的编程语言和编译器实现[J].计算机学报,2013,36(10): 1993-2006.
    ZHANG Weiwei, WEI Haitao, YU Junqing, et al. COstream: A language for datafloe application and complier[J]. Chinese Journal of Computers, 2013, 36(10): 1993-2006.
    [8]
    THIELE L, BACIVAROV I, HAID W, et al. Mapping applications to tiled multiprocessor embedded systems[C]// Proceedings of the 7th Application of Concurrency to System Design. Bratislava, Slovak: ACM Press, 2007: 29-40.
    [9]
    ABOU-RJEILI A, KARYPIS G. Multilevel algorithms for partitioning power-law graphs[C]// Proceedings of 20th International Parallel and Distributed Processing Symposium. Washington USA: IEEE Press, 2006: 1-10.
    [10]
    MOLKA D, HACKENBERG D, SCHONE R, et al. Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system[C]// Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques. Raleigh, USA: IEEE Press, 2009: 261-270.
    [11]
    TORRELLAS J, LAM H S, HENNESSY J L. False sharing and spatial locality in multiprocessor caches[J]. IEEE Transactions on Computers, 1994, 43(6): 651-663.
    [12]
    MELLOR-CRUMMEY J, SCOTT M. L Algorithms for scalable synchronization on shared-memory multiprocessors[J]. ACM Transactions on Computer Systems, 1991, 9(1): 21-65.)
  • 加载中

Catalog

    [1]
    THIES W, KARCZMAREK M, AMARASINGHE S. StreamIt: A language for streaming applications[C]// Proceedings of the 11th International Conference on Compiler Construction. London, UK: ACM Press, 2002: 179-196.
    [2]
    BUCK I, FOLEY T, HOM D, et al. Brook for GPUs: Stream computing on graphics hardware[J]. ACM Transactions on Graphics, 2004, 23(3): 777-786.
    [3]
    MARK W R, GLANVILLE R S, AKELEY K, et al. Cg: A system for programming graphics hardware in a C-like language[J]. ACM Transactions on Graphics, 2003, 22(3): 896-907.
    [4]
    KUDLUR M, MAHLKE S. Orchestrating the execution of stream programs on multicore platforms[J]. ACM SIGPLAN Notices, 2008, 43(6): 114-124.
    [5]
    GORDON M, THIES W, AMARASINGHE S. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs[C]// Proceedings of 14th International Conference on Architectural Support for Programming Languages and Operating Systems. San Jose, USA: ACM Press, 2006: 151-162.
    [6]
    ZHURAVLEV S, BLAGODUROV S, FEDOROVA A. Addressing shared resource contention in multicore processors via scheduling[C]// Proceedings of the 15th Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems. New York, USA: ACM Press, 2010: 129-142.
    [7]
    张维维,魏海涛,于俊清, 等. COstream:一种面向数据流的编程语言和编译器实现[J].计算机学报,2013,36(10): 1993-2006.
    ZHANG Weiwei, WEI Haitao, YU Junqing, et al. COstream: A language for datafloe application and complier[J]. Chinese Journal of Computers, 2013, 36(10): 1993-2006.
    [8]
    THIELE L, BACIVAROV I, HAID W, et al. Mapping applications to tiled multiprocessor embedded systems[C]// Proceedings of the 7th Application of Concurrency to System Design. Bratislava, Slovak: ACM Press, 2007: 29-40.
    [9]
    ABOU-RJEILI A, KARYPIS G. Multilevel algorithms for partitioning power-law graphs[C]// Proceedings of 20th International Parallel and Distributed Processing Symposium. Washington USA: IEEE Press, 2006: 1-10.
    [10]
    MOLKA D, HACKENBERG D, SCHONE R, et al. Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system[C]// Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques. Raleigh, USA: IEEE Press, 2009: 261-270.
    [11]
    TORRELLAS J, LAM H S, HENNESSY J L. False sharing and spatial locality in multiprocessor caches[J]. IEEE Transactions on Computers, 1994, 43(6): 651-663.
    [12]
    MELLOR-CRUMMEY J, SCOTT M. L Algorithms for scalable synchronization on shared-memory multiprocessors[J]. ACM Transactions on Computer Systems, 1991, 9(1): 21-65.)

    Article Metrics

    Article views (31) PDF downloads(80)
    Proportional views

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return