Research on stream program task scheduling and cache optimization for X86 multi-core processor

TANG Jiufei; LI He; YU Junqing

doi:10.3969/j.issn.0253-2778.2016.03.004

PDF( 1946 KB)

Open Access JUSTC Original Paper

Research on stream program task scheduling and cache optimization for X86 multi-core processor

1.
Center of Network and Computation, Huazhong University of Science and Technology, Wuhan 430074, China
2.
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Cite this:

https://doi.org/10.3969/j.issn.0253-2778.2016.03.004

Received Date: 27 August 2015
Accepted Date: 01 December 2015
Rev Recd Date: 01 December 2015
Publish Date: 30 March 2016

Abstract Full text PDF

Abstract

Abstract

Stream programming as a kind of programming paradigm widely used in multi-core systems, in which the parallel scheduling and access latency of main memory has a great influence on the performance of the program. To solve this problem, a task scheduling and cache optimization method was proposed for the stream program by combining the characteristics of X86 multi-core processors. To improve the parallel scheduling granularity and locality of the target program, expanded scheduling in the pre-processing phase was used firstly. Then the compiler exploitsed the data parallelism and task parallelism to keep load balancing and construct the corresponding software pipeline scheduling. According to cache hierarchies of the target system, the interference were reduced among parallel scheduling cores were reduced by eliminating cache false sharing, and the optimized mapping between logic threads and physical cores in line with the inter-thread communication distribution were implemented. The compiler took a COStream dataflow program as input and outputs the optimized program. The common algorithms in media processing technology were chosen as the test programs for the experiment. The experimental result shows that the optimized test programs achieve linear speedup, which indicates the effectiveness of the compiling optimization system.

Abstract

Stream programming as a kind of programming paradigm widely used in multi-core systems, in which the parallel scheduling and access latency of main memory has a great influence on the performance of the program. To solve this problem, a task scheduling and cache optimization method was proposed for the stream program by combining the characteristics of X86 multi-core processors. To improve the parallel scheduling granularity and locality of the target program, expanded scheduling in the pre-processing phase was used firstly. Then the compiler exploitsed the data parallelism and task parallelism to keep load balancing and construct the corresponding software pipeline scheduling. According to cache hierarchies of the target system, the interference were reduced among parallel scheduling cores were reduced by eliminating cache false sharing, and the optimized mapping between logic threads and physical cores in line with the inter-thread communication distribution were implemented. The compiler took a COStream dataflow program as input and outputs the optimized program. The common algorithms in media processing technology were chosen as the test programs for the experiment. The experimental result shows that the optimized test programs achieve linear speedup, which indicates the effectiveness of the compiling optimization system.

FullText(HTML)

References(12)

References

[1]	THIES W, KARCZMAREK M, AMARASINGHE S. StreamIt: A language for streaming applications[C]// Proceedings of the 11th International Conference on Compiler Construction. London, UK: ACM Press, 2002: 179-196.
[2]	BUCK I, FOLEY T, HOM D, et al. Brook for GPUs: Stream computing on graphics hardware[J]. ACM Transactions on Graphics, 2004, 23(3): 777-786.
[3]	MARK W R, GLANVILLE R S, AKELEY K, et al. Cg: A system for programming graphics hardware in a C-like language[J]. ACM Transactions on Graphics, 2003, 22(3): 896-907.
[4]	KUDLUR M, MAHLKE S. Orchestrating the execution of stream programs on multicore platforms[J]. ACM SIGPLAN Notices, 2008, 43(6): 114-124.
[5]	GORDON M, THIES W, AMARASINGHE S. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs[C]// Proceedings of 14th International Conference on Architectural Support for Programming Languages and Operating Systems. San Jose, USA: ACM Press, 2006: 151-162.
[6]	ZHURAVLEV S, BLAGODUROV S, FEDOROVA A. Addressing shared resource contention in multicore processors via scheduling[C]// Proceedings of the 15th Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems. New York, USA: ACM Press, 2010: 129-142.
[7]	张维维，魏海涛，于俊清, 等. COstream：一种面向数据流的编程语言和编译器实现[J].计算机学报，2013，36(10): 1993-2006. ZHANG Weiwei, WEI Haitao, YU Junqing, et al. COstream: A language for datafloe application and complier[J]. Chinese Journal of Computers, 2013, 36(10): 1993-2006.
[8]	THIELE L, BACIVAROV I, HAID W, et al. Mapping applications to tiled multiprocessor embedded systems[C]// Proceedings of the 7th Application of Concurrency to System Design. Bratislava, Slovak: ACM Press, 2007: 29-40.
[9]	ABOU-RJEILI A, KARYPIS G. Multilevel algorithms for partitioning power-law graphs[C]// Proceedings of 20th International Parallel and Distributed Processing Symposium. Washington USA: IEEE Press, 2006: 1-10.
[10]	MOLKA D, HACKENBERG D, SCHONE R, et al. Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system[C]// Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques. Raleigh, USA: IEEE Press, 2009: 261-270.
[11]	TORRELLAS J, LAM H S, HENNESSY J L. False sharing and spatial locality in multiprocessor caches[J]. IEEE Transactions on Computers, 1994, 43(6): 651-663.
[12]	MELLOR-CRUMMEY J, SCOTT M. L Algorithms for scalable synchronization on shared-memory multiprocessors[J]. ACM Transactions on Computer Systems, 1991, 9(1): 21-65.)

Supplements(0)

Track Citations

Proportional views

Proportional views

Get Citation

PDF

XML

[1]	THIES W, KARCZMAREK M, AMARASINGHE S. StreamIt: A language for streaming applications[C]// Proceedings of the 11th International Conference on Compiler Construction. London, UK: ACM Press, 2002: 179-196.
[2]	BUCK I, FOLEY T, HOM D, et al. Brook for GPUs: Stream computing on graphics hardware[J]. ACM Transactions on Graphics, 2004, 23(3): 777-786.
[3]	MARK W R, GLANVILLE R S, AKELEY K, et al. Cg: A system for programming graphics hardware in a C-like language[J]. ACM Transactions on Graphics, 2003, 22(3): 896-907.
[4]	KUDLUR M, MAHLKE S. Orchestrating the execution of stream programs on multicore platforms[J]. ACM SIGPLAN Notices, 2008, 43(6): 114-124.
[5]	GORDON M, THIES W, AMARASINGHE S. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs[C]// Proceedings of 14th International Conference on Architectural Support for Programming Languages and Operating Systems. San Jose, USA: ACM Press, 2006: 151-162.
[6]	ZHURAVLEV S, BLAGODUROV S, FEDOROVA A. Addressing shared resource contention in multicore processors via scheduling[C]// Proceedings of the 15th Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems. New York, USA: ACM Press, 2010: 129-142.
[7]	张维维，魏海涛，于俊清, 等. COstream：一种面向数据流的编程语言和编译器实现[J].计算机学报，2013，36(10): 1993-2006. ZHANG Weiwei, WEI Haitao, YU Junqing, et al. COstream: A language for datafloe application and complier[J]. Chinese Journal of Computers, 2013, 36(10): 1993-2006.
[8]	THIELE L, BACIVAROV I, HAID W, et al. Mapping applications to tiled multiprocessor embedded systems[C]// Proceedings of the 7th Application of Concurrency to System Design. Bratislava, Slovak: ACM Press, 2007: 29-40.
[9]	ABOU-RJEILI A, KARYPIS G. Multilevel algorithms for partitioning power-law graphs[C]// Proceedings of 20th International Parallel and Distributed Processing Symposium. Washington USA: IEEE Press, 2006: 1-10.
[10]	MOLKA D, HACKENBERG D, SCHONE R, et al. Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system[C]// Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques. Raleigh, USA: IEEE Press, 2009: 261-270.
[11]	TORRELLAS J, LAM H S, HENNESSY J L. False sharing and spatial locality in multiprocessor caches[J]. IEEE Transactions on Computers, 1994, 43(6): 651-663.
[12]	MELLOR-CRUMMEY J, SCOTT M. L Algorithms for scalable synchronization on shared-memory multiprocessors[J]. ACM Transactions on Computer Systems, 1991, 9(1): 21-65.)

TrendMD

Volume 46 Issue 3 page: 200-207

Cover

Keywords

Article Metrics

Article views (31) PDF downloads(80)

Research on stream program task scheduling and cache optimization for X86 multi-core processor

Abstract

Abstract

References

Proportional views

Catalog

Recommended articles

TrendMD

Article Metrics

Proportional views

Authors

Browse

Contact Us

About

Research on stream program task scheduling and cache optimization for X86 multi-core processor

Share

Tools

Abstract

Abstract

References

Proportional views

Catalog

Recommended articles

TrendMD

Article Metrics

Proportional views

Authors

Browse

Contact Us

About

Export File

Citation

Format

Content