Instr Sched 1
-
Upload
dilip-thelip -
Category
Documents
-
view
228 -
download
0
Transcript of Instr Sched 1
-
7/28/2019 Instr Sched 1
1/20
Instruction Scheduling - Part 1
Y.N. Srikant
Department of Computer ScienceIndian Institute of Science
Bangalore 560 012
NPTEL Course on Compiler Design
Y.N. Srikant Instruction Scheduling
http://find/ -
7/28/2019 Instr Sched 1
2/20
Outline
Instruction Scheduling
Simple Basic Block Scheduling
Automaton-based SchedulingInteger programming based schedulingOptimal Delayed-load Scheduling (DLS) for treesTrace, Superblock and Hyperblock scheduling
Y.N. Srikant Instruction Scheduling
http://find/ -
7/28/2019 Instr Sched 1
3/20
Instruction Scheduling
Reordering of instructions so as to keep the pipelines of
functional units full with no stalls
NP-Complete and needs heuristcs
Applied on basic blocks (local)
Global scheduling requires elongation of basic blocks
(super-blocks)
Y.N. Srikant Instruction Scheduling
http://find/ -
7/28/2019 Instr Sched 1
4/20
Instruction Scheduling - Motivating Example
time: load - 2 cycles, op - 1 cycle
This code has 2 stalls, at i3 and at i5,
due to the loads
i1 i2 i4
i3
i5
i7
i6
load load load
add
sub
st
mult
Y.N. Srikant Instruction Scheduling
http://goforward/http://find/http://goback/ -
7/28/2019 Instr Sched 1
5/20
Scheduled Code - no stalls
There are no stalls, but dependences are indeed satisfied
Y.N. Srikant Instruction Scheduling
http://find/http://goback/ -
7/28/2019 Instr Sched 1
6/20
Definitions - Dependences
Consider the following code:
i1 : r1 load(r2)i2 : r3 r1 + 4i3 : r1 r4 + r5
The dependences are
i1 i2 (flow dependence) i2 i3 (anti-dependence)
i1 o i3 (output dependence)
anti- and ouput dependences can be eliminated by register
renaming
Y.N. Srikant Instruction Scheduling
G
http://find/ -
7/28/2019 Instr Sched 1
7/20
Dependence DAG
full line: flowdependence, dash line: anti-dependence
dash-dot line: outputdependence
some anti- and output dependences are because memorydisambiguation could not be done
st
add
mult st
add
ldld
add sub
i1
i3 i4
i7
i8
i6
i2
i5
i9
Y.N. Srikant Instruction Scheduling
B i Bl k S h d li
http://find/ -
7/28/2019 Instr Sched 1
8/20
Basic Block Scheduling
Basic block consists of micro-operation sequences (MOS),
which are indivisible
Each MOS has several steps, each requiring resources
Each step of an MOS requires one cycle for executionPrecedence constraints and resource constraints must besatisfied by the scheduled program
PCs relate to data dependences and execution delaysRCs relate to limited availability of shared resources
Y.N. Srikant Instruction Scheduling
Th B i Bl k S h d li P bl
http://find/ -
7/28/2019 Instr Sched 1
9/20
The Basic Block Scheduling Problem
Basic block is modelled as a digraph, G = (V, E)
R: number of resourcesNodes (V): MOS; Edges (E): PrecedenceLabel on node v
resource usage functions, v(i) for each step of the MOSassociated with v
length l(v) of node vLabel on edge e: Execution delay of the MOS, d(e)
Problem: Find the shortest schedule : V N such thate= (u, v) E, (v) (u) d(e) and
i,
vVv(i (v)) R, where
length of the schedule is maxvV
{(v) + l(v)}
Y.N. Srikant Instruction Scheduling
I t ti S h d li P d d R
http://goforward/http://find/http://goback/ -
7/28/2019 Instr Sched 1
10/20
Instruction Scheduling - Precedence and Resource
Constraints
Y.N. Srikant Instruction Scheduling
A Si l Li t S h d li Al ith
http://find/ -
7/28/2019 Instr Sched 1
11/20
A Simple List Scheduling Algorithm
Find the shortest schedule : V N, such that precedenceand resource constraints are satisfied. Holes are filled with
NOPs.FUNCTION ListSchedule (V,E)
BEGIN
Ready = root nodes of V; Schedule= ;
WHILE Ready= DO
BEGINv = highest priority node in Ready;
Lb = SatisfyPrecedenceConstraints (v, Schedule, );(v) = SatisfyResourceConstraints (v, Schedule, , Lb);
Schedule = Schedule + {v};Ready = Ready {v} + {u | NOT (u Schedule)AND (w, u) E, w Schedule};
END
RETURN ;
ENDY.N. Srikant Instruction Scheduling
List Scheduling Ready Queue Update
http://find/ -
7/28/2019 Instr Sched 1
12/20
List Scheduling - Ready Queue Update
Y.N. Srikant Instruction Scheduling
Constraint Satisfaction Functions
http://find/ -
7/28/2019 Instr Sched 1
13/20
Constraint Satisfaction Functions
FUNCTION SatisfyPrecedenceConstraint(v, Sched, )
BEGINRETURN ( max
uSched(u) + d(u, v))
END
FUNCTION SatisfyResourceConstraint(v, Sched, , Lb)BEGIN
FOR i := Lb TO DO
IF 0 j < l(v), v(j) +uSched
u(i + j (u)) R THEN
RETURN (i);END
Y.N. Srikant Instruction Scheduling
Precedence Constraint Satisfaction
http://find/ -
7/28/2019 Instr Sched 1
14/20
Precedence Constraint Satisfaction
Y.N. Srikant Instruction Scheduling
http://find/ -
7/28/2019 Instr Sched 1
15/20
List Scheduling - Priority Ordering for Nodes
-
7/28/2019 Instr Sched 1
16/20
List Scheduling - Priority Ordering for Nodes
1 Height of the node in the DAG (i.e., longest path from the
node to a terminal node2 Estart, and Lstart, the earliest and latest start times
Violating Estartand Lstartmay result in pipeline stallsEstart(v) = max
i=1, ,k(Estart(ui) + d(ui, v))
where u1, u2, , uk are predecessors of v. Estart value ofthe source node is 0.Lstart(u) = min
i=1, ,k(Lstart(vi) d(u, vi))
where v1, v2, , vk are successors of u. Lstart value of thesink node is set as its Estart value.
Estart and Lstart values can be computed using atop-down and a bottom-up pass, respectively, eitherstatically (before scheduling begins), or dynamically duringscheduling
Y.N. Srikant Instruction Scheduling
Estart and Lstart Computation
http://find/http://goback/ -
7/28/2019 Instr Sched 1
17/20
Estart and Lstart Computation
Y.N. Srikant Instruction Scheduling
List Scheduling - Slack
http://find/ -
7/28/2019 Instr Sched 1
18/20
List Scheduling Slack
1 A node with a lower Estart (or Lstart) value has a higher
priority2 Slack
=Lstart
Estart
Nodes with lower slack are given higher priorityInstructions on the critical path may have a slack value ofzero and hence get priority
Y.N. Srikant Instruction Scheduling
http://find/ -
7/28/2019 Instr Sched 1
19/20
Simple List Scheduling - Example - 2
-
7/28/2019 Instr Sched 1
20/20
Simple List Scheduling Example 2
latenciesadd,sub,store: 1 cycle; load: 2 cycles; mult: 3 cycles
path lengthand slackare shown on the left side and right
side of the pair of numbers in parentheses
ld
sub
mult st
add
st
add
ld
add
5(3, 3)0
6(2, 2)0
8(0, 0)0
3(2, 5)3 1(2, 7)5
7(0, 1)1
0(8, 8)0
2(6, 6)0
1(7, 7)0
i1
i3 i4
i7
i8
i6
i2
i5
i9
Y.N. Srikant Instruction Scheduling
http://find/