CN102968295A - Speculation thread partitioning method based on weighting control flow diagram - Google Patents


Info

Publication number
CN102968295A
Authority
CN
China
Prior art keywords
thread
control flow
flow graph
weighting control
procedure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012104914563A
Other languages
Chinese (zh)
Inventor
李川
杨洪斌
吴悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN2012104914563A
Publication of CN102968295A
Legal status: Pending

Abstract

The invention discloses a speculative thread partitioning method based on a weighted control flow graph. The method comprises the following steps: (1) the original serial program is scanned from top to bottom to find all procedure calls; (2) all procedure calls found in step (1) are processed; (3) a weighted control flow graph T is built, according to profiling information, for the serial program processed in step (2); (4) the whole weighted control flow graph T is traversed from top to bottom and all loop regions in it are found; (5) all loop regions found in step (4) are processed, yielding a super control flow graph F; and (6) executable speculative threads are extracted from the super control flow graph F using a coloring method. The method takes into account thread size, control and data dependences between threads, and memory-access load balance between threads; problems that the partitioning method itself cannot easily solve are handled by hardware techniques. The method makes full use of the computer's resources and improves the execution efficiency of the program.

Description

Speculative thread partitioning method based on a weighted control flow graph
Technical field
The present invention relates to a speculative thread partitioning method based on a weighted control flow graph (WCFG), and belongs to the field of computer technology.
Background
With the development of chip multiprocessors (CMPs), multi-core technology has become widespread in many application areas. However, because most existing software is written serially, how to exploit a CMP to accelerate serial programs has become a common concern across many application fields.
Thread-level speculation (TLS) creates and speculatively executes, under uncertainty, threads that will need to run in the future. Through speculative execution and a corresponding misspeculation detection mechanism, TLS can remove redundant data synchronization and discover and maintain true data dependences. This reduces the difficulty of thread partitioning, improves execution efficiency, and makes it possible to accelerate on multi-core hardware serial programs that have traditionally been hard to parallelize by hand or automatically. Thread partitioning methods cannot do without efficient support for thread-level speculative execution.
Thread partitioning also requires close cooperation between the compiler and the architecture. Certain optimization strategies applied during compilation can reduce the complexity of partitioning. The architecture, for its part, is not only the carrier that accepts the partitioning result and realizes thread-level parallelism: some of its parameters, such as the latency overhead of thread control and speculative execution, are also an important basis for choosing heuristic rules during partitioning. At the same time, partitioning must take into account thread size, control and data dependences between threads, and memory-access load balance between threads. Only threads that are similar in size, have few control and data dependences, and are load-balanced can realize the performance of a multithreaded system.
Summary of the invention
The object of the invention is to address the problems of the prior art by providing a comparatively better speculative thread partitioning method, namely one based on a weighted control flow graph. The method couples compilation and architecture closely, so that the speculative threads obtained after partitioning are similar in size and have few control and data dependences. This makes full use of the resources of a multi-core processor and improves the execution efficiency of the parallel program.
To achieve this object, the concept of the invention is as follows: optimize repeatedly before and after building the weighted control flow graph, eliminating the loops and procedure calls that are not suitable for parallelization, so that the executable candidate threads finally extracted are speculative threads with a high degree of parallelism. The whole partitioning process combines the macroscopic and microscopic levels, and the resulting speculative threads can fully exploit the performance of a chip multiprocessor (CMP).
Based on this concept, the invention adopts the following technical solution:
1. A speculative thread partitioning method based on a weighted control flow graph, characterized by the following operation steps:
(1) scan the original serial program from top to bottom and find all procedure calls in the program;
(2) process all procedure calls found in step (1); a procedure call that does not meet the requirements is inserted directly into the original serial program and executed directly;
(3) build a weighted control flow graph T, according to profiling information, for the serial program processed in step (2);
(4) traverse the whole weighted control flow graph T from top to bottom and find all loop regions in the graph;
(5) process all loops found in step (4); a loop that does not meet the requirements is inserted directly into the serial program and executed directly, and a loop that meets the requirements is reduced to a single-entry, single-exit node, which yields the super control flow graph F;
(6) extract the executable speculative threads from the super control flow graph F using the coloring method.
2. The speculative thread partitioning method based on a weighted control flow graph according to claim 1, characterized in that the processing in step (2) of all procedure calls found in step (1) comprises the following steps:
(21) specify the lower limit LOWER_LIMIT and the upper limit UPPER_LIMIT of the candidate thread size, and extract all procedure calls;
(22) for a procedure call whose size is less than the lower limit LOWER_LIMIT, insert its function body directly into the original serial program and execute it directly;
(23) for a procedure call whose size is greater than the upper limit UPPER_LIMIT, apply a suitable speculative partitioning to it so that it becomes several threads that can execute in parallel.
3. The speculative thread partitioning method based on a weighted control flow graph according to claim 1, characterized in that the processing in step (5) of all loops found in step (4) comprises the following steps:
(51) traverse the whole weighted control flow graph T from top to bottom and extract all loops or loop regions;
(52) judge whether each loop or loop region is suitable for parallelization; if it is not, insert it directly into the original serial program and execute it directly.
4. The speculative thread partitioning method based on a weighted control flow graph according to claim 1, characterized in that extracting the executable speculative threads from the super control flow graph F with the coloring method in step (6) comprises the following steps:
(61) in the directed graph F, select a vertex that has no predecessor and output it;
(62) stipulate that once a vertex of the graph has been selected as a thread, the vertices adjacent to it can no longer be selected as threads;
(63) delete the vertex that has been output, together with the vertices and edges adjacent to it;
(64) repeat the above three steps until all executable candidate threads have been found.
Compared with the prior art, the WCFG-based speculative thread partitioning method of the invention has the following evident substantive features and notable advantages. The method couples compilation and architecture closely and takes into account thread size, control and data dependences between threads, and memory-access load balance between threads. In addition, the thread description takes the program's profiling information into account, which makes it easy for partitioning to find the most frequently executed parts of the program and to partition and optimize them. For problems that the partitioning method cannot solve by itself, corresponding solutions are provided by hardware techniques.
Description of drawings
The purpose, specific structural features, and advantages of the invention can be further understood from the following description of an example of the WCFG-based speculative thread partitioning method, taken together with the accompanying drawings, in which:
Fig. 1 is a flow chart of the WCFG-based speculative thread partitioning method.
Fig. 2 is a flow chart of the procedure call processing in step 2 of Fig. 1.
Fig. 3 is a flow chart of the loop processing in step 5 of Fig. 1.
Fig. 4 is the weighted control flow graph T of the serial program built in step 3 of Fig. 1.
Fig. 5 shows the derivation of the super control flow graph F of the serial program from its weighted control flow graph T; Fig. 5(b) is the super control flow graph F after the derivation.
Fig. 6 shows the process of extracting the executable candidate threads in step 6 of Fig. 1.
Embodiment
The scheme described in the embodiments applies to a thread-level parallel execution environment for multi-core processors. The embodiments do not limit the multi-core processor architecture or the thread partitioning mode of that environment.
The invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment one:
With reference to Fig. 1, the speculative thread partitioning method based on a weighted control flow graph is characterized by the following operation steps:
(1) scan the original serial program from top to bottom and find all procedure calls in the program;
(2) process all procedure calls found in step (1); a procedure call that does not meet the requirements is inserted directly into the original serial program and executed directly;
(3) build a weighted control flow graph T, according to profiling information, for the serial program processed in step (2);
(4) traverse the whole weighted control flow graph T from top to bottom and find all loop regions in the graph;
(5) process all loops found in step (4); a loop that does not meet the requirements is inserted directly into the serial program and executed directly, and a loop that meets the requirements is reduced to a single-entry, single-exit node, which yields the super control flow graph F;
(6) extract the executable speculative threads from the super control flow graph F using the coloring method.
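Step (3) does not spell out how the weights of T are derived from the profile. The following is a minimal Python sketch, not the patent's own procedure. It assumes that the profile is a table of edge execution counts and that each edge weight is the fraction of executions leaving its source block; the names `build_wcfg` and `edge_counts` are hypothetical.

```python
from collections import defaultdict


def build_wcfg(edge_counts):
    """Build a weighted control flow graph from profiled edge counts.

    edge_counts maps (src, dst) -> number of times the edge was taken in a
    profiling run (an assumed profile format). The result maps each source
    block to {successor: probability of taking that edge}.
    """
    # Total number of times control left each source block.
    out_total = defaultdict(int)
    for (src, _dst), count in edge_counts.items():
        out_total[src] += count

    # Normalise each edge count by the total for its source block.
    wcfg = defaultdict(dict)
    for (src, dst), count in edge_counts.items():
        total = out_total[src]
        wcfg[src][dst] = count / total if total else 0.0
    return dict(wcfg)


# Hypothetical profile: block A branches to B 90 times and to C 10 times.
profile = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10}
wcfg = build_wcfg(profile)
```

With weights of this kind, the hot path A, B, D stands out from the rarely taken path through C, which is the sort of information the description says profiling contributes to partitioning.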
Embodiment two:
This embodiment is essentially the same as embodiment one; its particular features are as follows:
The processing in step (2) of all procedure calls found in step (1) comprises the following steps:
(21) specify the lower limit LOWER_LIMIT and the upper limit UPPER_LIMIT of the candidate thread size, and extract all procedure calls;
(22) for a procedure call whose size is less than the lower limit LOWER_LIMIT, insert its function body directly into the original serial program and execute it directly;
(23) for a procedure call whose size is greater than the upper limit UPPER_LIMIT, apply a suitable speculative partitioning to it so that it becomes several threads that can execute in parallel.
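The size test in steps (21) to (23) can be sketched as below. This is an illustration under assumptions: the threshold values, the `ProcedureCall` record, the use of an instruction count as the size, and the treatment of in-range calls as candidate threads are all hypothetical, since the text fixes none of them.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the patent only names the two limits.
LOWER_LIMIT = 20
UPPER_LIMIT = 200


@dataclass
class ProcedureCall:
    name: str
    size: int  # assumed measure of size, e.g. an instruction count


def classify_call(call, lower=LOWER_LIMIT, upper=UPPER_LIMIT):
    """Decide how one procedure call is handled during partitioning."""
    if call.size < lower:
        return "inline"     # step (22): too small, run serially in place
    if call.size > upper:
        return "split"      # step (23): too large, partition further
    return "candidate"      # assumption: in-range calls become candidates


calls = [ProcedureCall("init", 5),
         ProcedureCall("solve", 120),
         ProcedureCall("simulate", 900)]
decisions = {c.name: classify_call(c) for c in calls}
```

Keeping the limits as parameters reflects the background discussion, which ties such heuristics to architecture-dependent overheads.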
The processing in step (5) of all loops found in step (4) comprises the following steps:
(51) traverse the whole weighted control flow graph T from top to bottom and extract all loops or loop regions;
(52) judge whether each loop or loop region is suitable for parallelization; if it is not, insert it directly into the original serial program and execute it directly.
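The text does not say how the loops in step (51) are located. One standard technique, shown here as a swapped-in sketch rather than the patent's own method, is to find back edges with a depth-first search and then collect the natural loop of each back edge. It assumes a reducible control flow graph given as an adjacency dictionary; the function names are hypothetical.

```python
def find_back_edges(cfg, entry):
    """Return edges (u, v) whose target v is still on the DFS stack."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    back_edges = []

    def visit(u):
        color[u] = GREY
        for v in cfg.get(u, ()):
            state = color.get(v, WHITE)
            if state == WHITE:
                visit(v)
            elif state == GREY:
                back_edges.append((u, v))  # v is an ancestor: loop header
        color[u] = BLACK

    visit(entry)
    return back_edges


def natural_loop(cfg, back_edge):
    """Collect the nodes of the natural loop of back edge (tail, head)."""
    tail, head = back_edge
    preds = {}
    for u, successors in cfg.items():
        for v in successors:
            preds.setdefault(v, set()).add(u)

    body = {head, tail}
    stack = [tail] if tail != head else []
    while stack:
        node = stack.pop()
        for p in preds.get(node, ()):
            if p not in body:  # walk backwards, never past the header
                body.add(p)
                stack.append(p)
    return body


# Hypothetical graph: A -> B -> C -> D, with C looping back to B.
cfg = {"A": ["B"], "B": ["C"], "C": ["B", "D"], "D": []}
loops = [natural_loop(cfg, e) for e in find_back_edges(cfg, "A")]
```

Each set in `loops` corresponds to one element L_i of the loop set; the suitability test of step (52) is left out because the text gives no criteria for it.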
Extracting the executable speculative threads from the super control flow graph F with the coloring method in step (6) comprises the following steps:
(61) in the directed graph F, select a vertex that has no predecessor and output it;
(62) stipulate that once a vertex of the graph has been selected as a thread, the vertices adjacent to it can no longer be selected as threads;
(63) delete the vertex that has been output, together with the vertices and edges adjacent to it;
(64) repeat the above three steps until all executable candidate threads have been found.
Embodiment three:
With reference to Fig. 1, the specific operation steps of the speculative thread partitioning method based on a weighted control flow graph are as follows:
Step 101: search the serial program to be partitioned and extract the procedure calls of all functions.
Step 102: process all procedure calls extracted in step 101; a procedure call that does not meet the requirements is inserted directly into the original serial program and executed directly. The specific steps are as follows:
Step 1021: scan the original serial program, extract all procedure calls, and build the set P = {P_i | P_i denotes the i-th procedure call in the serial program};
Step 1022: judge whether each procedure call in set P meets the condition; if it does, jump to step 1023; if it does not, jump to step 1024;
Step 1023: if the size of the procedure call is less than the thread size lower limit LOWER_LIMIT, insert it directly into the original serial program and execute it directly.
Step 1024: judge whether all procedure calls in set P have been scanned; if no procedure call remains, finish; otherwise jump to step 1022.
Step 103: for the serial program processed above, build the weighted control flow graph T according to profiling information, as shown in Fig. 4.
Step 104: traverse the whole weighted control flow graph T from top to bottom, find all loops, and build the set L = {L_i | L_i denotes the i-th loop or loop region in the WCFG}.
Step 105: process all elements of set L. A loop that is not suitable for parallelization is inserted directly into the serial program and executed directly; a loop that meets the requirements is reduced to a single-entry, single-exit node. After all elements of L have been scanned, the super control flow graph F is derived from the weighted control flow graph T; the derivation is shown in Fig. 5. With reference to Fig. 3, the specific steps are as follows:
Step 1051: traverse the weighted control flow graph T from top to bottom, extract all loops, and build the set L = {L_i | L_i denotes the i-th loop or loop region in the WCFG};
Step 1052: judge whether each loop in L meets the requirements for parallelization; if it does not, jump to step 1053; if it does, jump to step 1054;
Step 1053: a loop that does not meet the requirements is inserted directly into the serial program and executed directly. As shown in Fig. 4, region 1 does not meet the requirements, so it is inserted directly into the serial program and deleted from T; then jump to step 1055;
Step 1054: a loop that meets the requirements is reduced to a single-entry, single-exit node. As shown in Fig. 5, region 2 is such a loop, so it can be reduced to node B, which has a single entry and a single exit; then jump to step 1055;
Step 1055: judge whether set L still contains a loop or region to be processed; if not, jump to step 1056; if so, jump to step 1052;
Step 1056: after the above processing, a new super control flow graph F can be built, as shown in Fig. 5(b).
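The reduction of a suitable loop to one node in step 1054 can be sketched as a graph contraction. This is an illustration under assumptions: the graph is an adjacency dictionary, the loop is already known to have a single entry and a single exit (the sketch does not verify this), and the name `collapse_loop` is hypothetical.

```python
def collapse_loop(cfg, loop_nodes, name):
    """Replace the nodes in loop_nodes by one node called name.

    Edges inside the loop disappear; edges entering or leaving the loop
    are redirected to or from the new node.
    """
    loop_nodes = set(loop_nodes)
    new_cfg = {}
    for u, successors in cfg.items():
        src = name if u in loop_nodes else u
        targets = new_cfg.setdefault(src, set())
        for v in successors:
            dst = name if v in loop_nodes else v
            if src == name and dst == name:
                continue  # edge internal to the loop, drop it
            targets.add(dst)
    return new_cfg


# Hypothetical graph T with the loop {B, C} between A and D.
t_graph = {"A": ["B"], "B": ["C"], "C": ["B", "D"], "D": []}
f_graph = collapse_loop(t_graph, {"B", "C"}, "L1")
```

Applying the contraction once per accepted loop leaves an acyclic graph when every loop has been either collapsed or removed, which is what the vertex selection in step 106 relies on to find a vertex without a predecessor.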
Step 106: extract the final executable speculative threads from the super control flow graph F using the coloring method. With reference to Fig. 6, the specific steps are as follows:
Step 1061: Fig. 6(a) is the same graph as Fig. 5(b). Using the coloring method, first select a vertex that has no predecessor and output it (once a vertex of the graph has been selected as a thread, the vertices adjacent to it can no longer be selected as threads); then delete from graph (a) the vertices and edges adjacent to that vertex, which yields the new graph (b).
Step 1062: from graph (b), again select a vertex that has no predecessor, vertex 8; then delete from graph (b) the vertices and edges adjacent to vertex 8, which yields the new graph (c).
Step 1063: from graph (c), again select a vertex that has no predecessor, vertex 9; then delete from graph (c) the vertices and edges adjacent to vertex 9, which yields the new graph (d).
Step 1064: from graph (d), again select a vertex that has no predecessor, vertex 15; then delete from graph (d) the vertices and edges adjacent to vertex 15. This yields the set of executable speculative threads ST = {1, 8, 9, 15}, and the partitioning ends.
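The selection loop of steps 1061 to 1064 can be sketched in Python as follows. The sketch rests on assumptions: F is given as a dictionary from each vertex to its successors, ties between several predecessor-free vertices are broken by taking the smallest label (the text does not specify a rule), and the example graph is hypothetical rather than the graph of Fig. 6.

```python
def extract_threads(graph):
    """Greedy thread extraction from a directed graph.

    graph maps each vertex to an iterable of its successors. Repeatedly
    pick a vertex with no predecessor, output it, and delete it together
    with its adjacent vertices and their edges.
    """
    # Copy the graph and make sure every vertex appears as a key.
    succ = {v: set(ws) for v, ws in graph.items()}
    for ws in graph.values():
        for w in ws:
            succ.setdefault(w, set())

    threads = []
    while succ:
        has_pred = {w for ws in succ.values() for w in ws}
        sources = [v for v in succ if v not in has_pred]
        if not sources:
            break  # only cycles remain; nothing more can be selected
        chosen = min(sources)            # assumed tie-breaking rule
        threads.append(chosen)
        removed = {chosen} | succ[chosen]  # the vertex and its neighbours
        succ = {u: ws - removed for u, ws in succ.items() if u not in removed}
    return threads


# Hypothetical super control flow graph (not the one in Fig. 6).
f_graph = {1: {2, 3}, 2: {4}, 3: {4}, 4: {5, 6}, 5: {7}, 6: {7}, 7: set()}
selected = extract_threads(f_graph)  # [1, 4, 7]
```

Because a selected vertex has no predecessor, its adjacent vertices are exactly its successors, so no two selected vertices are ever directly connected.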
The WCFG-based thread partitioning method has been described in detail above. The explanation given here with reference to the drawings and specific embodiments is only intended to help in understanding the core idea of the invention. A person of ordinary skill in the art may, following the method and idea of the invention, make changes to the specific embodiments and to the scope of application. In summary, this description should not be construed as limiting the invention.

Claims (4)

1. A speculative thread partitioning method based on a weighted control flow graph, characterized by the following operation steps:
(1) scan the original serial program from top to bottom and find all procedure calls in the program;
(2) process all procedure calls found in step (1); a procedure call that does not meet the requirements is inserted directly into the original serial program and executed directly;
(3) build a weighted control flow graph T, according to profiling information, for the serial program processed in step (2);
(4) traverse the whole weighted control flow graph T from top to bottom and find all loop regions in the graph;
(5) process all loops found in step (4); a loop that does not meet the requirements is inserted directly into the serial program and executed directly, and a loop that meets the requirements is reduced to a single-entry, single-exit node, which yields the super control flow graph F;
(6) extract the executable speculative threads from the super control flow graph F using the coloring method.
2. The speculative thread partitioning method based on a weighted control flow graph according to claim 1, characterized in that the processing in step (2) of all procedure calls found in step (1) comprises the following steps:
(21) specify the lower limit LOWER_LIMIT and the upper limit UPPER_LIMIT of the candidate thread size, and extract all procedure calls;
(22) for a procedure call whose size is less than the lower limit LOWER_LIMIT, insert its function body directly into the original serial program and execute it directly;
(23) for a procedure call whose size is greater than the upper limit UPPER_LIMIT, apply a suitable speculative partitioning to it so that it becomes several threads that can execute in parallel.
3. The speculative thread partitioning method based on a weighted control flow graph according to claim 1, characterized in that the processing in step (5) of all loops found in step (4) comprises the following steps:
(51) traverse the whole weighted control flow graph T from top to bottom and extract all loops or loop regions;
(52) judge whether each loop or loop region is suitable for parallelization; if it is not, insert it directly into the original serial program and execute it directly.
4. The speculative thread partitioning method based on a weighted control flow graph according to claim 1, characterized in that extracting the executable speculative threads from the super control flow graph F with the coloring method in step (6) comprises the following steps:
(61) in the directed graph F, select a vertex that has no predecessor and output it;
(62) stipulate that once a vertex of the graph has been selected as a thread, the vertices adjacent to it can no longer be selected as threads;
(63) delete the vertex that has been output, together with the vertices and edges adjacent to it;
(64) repeat the above three steps until all executable candidate threads have been found.
CN2012104914563A 2012-11-28 2012-11-28 Speculation thread partitioning method based on weighting control flow diagram Pending CN102968295A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012104914563A CN102968295A (en) 2012-11-28 2012-11-28 Speculation thread partitioning method based on weighting control flow diagram

Publications (1)

Publication Number Publication Date
CN102968295A true CN102968295A (en) 2013-03-13

Family

ID=47798454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012104914563A Pending CN102968295A (en) 2012-11-28 2012-11-28 Speculation thread partitioning method based on weighting control flow diagram

Country Status (1)

Country Link
CN (1) CN102968295A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030226133A1 (en) * 2002-05-30 2003-12-04 Microsoft Corporation System and method for improving a working set
CN1906578A (en) * 2003-11-14 2007-01-31 英特尔公司 Apparatus and method for an automatic thread-partition compiler
CN102063291A (en) * 2011-01-13 2011-05-18 上海大学 Multilevel parallel execution method of speculation thread

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699365A (en) * 2014-01-07 2014-04-02 西南科技大学 Thread division method for avoiding unrelated dependence on many-core processor structure
CN103699365B (en) * 2014-01-07 2016-10-05 西南科技大学 The thread dividing method of unrelated dependence is avoided in a kind of many-core processor structure
CN104699464A (en) * 2015-03-26 2015-06-10 中国人民解放军国防科学技术大学 Dependency mesh based instruction-level parallel scheduling method
CN104699464B (en) * 2015-03-26 2017-12-26 中国人民解放军国防科学技术大学 A kind of instruction level parallelism dispatching method based on dependence grid
CN110069347A (en) * 2019-04-29 2019-07-30 河南科技大学 A kind of thread dividing method of Kernel-based methods different degree
CN110069347B (en) * 2019-04-29 2022-10-25 河南科技大学 Thread dividing method based on process importance
CN111459633A (en) * 2020-03-30 2020-07-28 河南科技大学 Irregular program-oriented self-adaptive thread partitioning method
CN111459633B (en) * 2020-03-30 2023-04-11 河南科技大学 Irregular program-oriented self-adaptive thread partitioning method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C05 Deemed withdrawal (patent law before 1993)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130313