About Maurice Tutor

Levels Taught:
Elementary, Middle School, High School, College, University, PhD

Expertise:

Algebra, Applied Sciences, Biology, Calculus, Chemistry, Economics, English, Essay writing, Geography, Geology, Health & Medical, Physics, Science

Teaching Since:	May 2017
Questions Answered:	66690
Tutorials Posted:	66688

Education

MCS, PhD
Argosy University / Phoenix University
Nov-2005 - Oct-2011

Experience

Professor
Phoenix University
Oct-2001 - Nov-2016

Category: Computer Science. Posted 22 Sep 2017. Price: 8.00

Abstract version

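Suppose we wish to write a procedure that computes the inner product of two vectors. An abstract version of the function has a CPE of 54 for both integer and floating-point data. Moving the length computation out of the loop, accessing the vector elements directly, and accumulating the result in a local variable gives the following code: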

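void inner4(vec_ptr u, vec_ptr v, data_t *dest)
{
    int i;
    int length = vec_length(u);         /* computed once, outside the loop */
    data_t *udata = get_vec_start(u);   /* direct access to u's elements */
    data_t *vdata = get_vec_start(v);   /* direct access to v's elements */
    data_t sum = (data_t) 0;            /* accumulate in a local variable */

    for (i = 0; i < length; i++) {
        sum = sum + udata[i] * vdata[i];
    }

    *dest = sum;                        /* single store at the end */
}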
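The listing relies on a vector abstraction (`vec_ptr`, `vec_length`, `get_vec_start`) that is not defined on this page. To compile and try `inner4`, here is a minimal self-contained sketch. The `vec_rec` layout, the helpers `new_vec` and `free_vec`, and the choice `data_t = int` are hypothetical stand-ins chosen for illustration, not the original library.

```c
#include <stdlib.h>

/* Assumption: integer data; use double for the floating-point case. */
typedef int data_t;

/* Hypothetical vector representation: element count plus a heap array. */
typedef struct {
    int len;
    data_t *data;
} vec_rec, *vec_ptr;

/* Hypothetical constructor: returns a zero-filled vector, or NULL on failure. */
vec_ptr new_vec(int len)
{
    vec_ptr v = malloc(sizeof(vec_rec));
    if (v == NULL)
        return NULL;
    v->len = len;
    /* calloc with a zero count may return NULL, so allocate at least one slot. */
    v->data = calloc(len > 0 ? (size_t) len : 1, sizeof(data_t));
    if (v->data == NULL) {
        free(v);
        return NULL;
    }
    return v;
}

/* Hypothetical destructor: releases the array and the header. */
void free_vec(vec_ptr v)
{
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}

int vec_length(vec_ptr v) { return v->len; }
data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Same loop as inner4 in the text; assumes v has at least as many
 * elements as u. */
void inner4(vec_ptr u, vec_ptr v, data_t *dest)
{
    int i;
    int length = vec_length(u);
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = (data_t) 0;

    for (i = 0; i < length; i++)
        sum = sum + udata[i] * vdata[i];

    *dest = sum;
}
```

For example, filling `u` with 1 2 3 4 5 and `v` with 2 2 2 2 2 through `get_vec_start` and calling `inner4(u, v, &r)` leaves 30 in `r`.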

Our measurements show that this function requires 3.11 cycles per iteration for integer data. The assembly code for the inner loop is as follows:

.L24:
    movl (%esi,%edx,4),%eax     # Get udata[i]
    imull (%ebx,%edx,4),%eax    # Multiply by vdata[i]
    addl %eax,%ecx              # Add to sum
    incl %edx                   # i++
    cmpl %edi,%edx              # Compare i:length
    jl .L24                     # If <, goto .L24

Assume that integer multiplication is performed by the general integer functional unit and that this unit is pipelined. This means that one cycle after a multiplication has started, a new integer operation (multiplication or otherwise) can begin. Assume also that the Integer/Branch functional unit can perform simple integer operations.

A) Show the translation of these lines of assembly code into a sequence of operations. The movl instruction translates into a single load operation. Register %eax gets updated twice in the loop; label the two versions %eax.1a and %eax.1b.

B) Explain how the function can go faster than the number of cycles required for integer multiplication.

C) Explain what factor limits the performance of this code to at best a CPE of 2.5.

D) For floating-point data, we get a CPE of 3.5. Without needing to examine the assembly code, describe a factor that will limit the performance to at best 3 cycles per iteration.

---------------------------------------------------

Write a version of the inner product procedure described in the previous problem that uses four-way loop unrolling.
Our measurements for this procedure give a CPE of 2.20 for integer data and 3.50 for floating-point data.

A) Explain why any version of any inner product procedure cannot achieve a CPE less than 2.
B) Explain why the performance for floating-point data did not improve with loop unrolling.
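One possible shape for the four-way unrolled loop is sketched below; it is an illustration, not an official solution. To stay self-contained, the hypothetical `inner_unroll4` takes raw arrays and a length instead of `vec_ptr` arguments (in the setting above they would come from `get_vec_start` and `vec_length`), and it assumes `data_t = int`.

```c
/* Assumption: integer data; use double for the floating-point case. */
typedef int data_t;

/* Hypothetical array-based signature; assumes both arrays hold at least
 * `length` elements. */
void inner_unroll4(const data_t *udata, const data_t *vdata, int length,
                   data_t *dest)
{
    int limit = length - 3;     /* loop runs while 4 elements remain */
    data_t sum = (data_t) 0;
    int i;

    /* Main loop: four products per iteration. The additions group left to
     * right, so each one still depends on the previous value of sum. */
    for (i = 0; i < limit; i += 4) {
        sum = sum + udata[i] * vdata[i]
                  + udata[i + 1] * vdata[i + 1]
                  + udata[i + 2] * vdata[i + 2]
                  + udata[i + 3] * vdata[i + 3];
    }

    /* Finish the remaining 0 to 3 elements one at a time. */
    for (; i < length; i++)
        sum = sum + udata[i] * vdata[i];

    *dest = sum;
}
```

Because the sums are evaluated in the same order as in `inner4`, the result is unchanged even for floating-point data; the unrolling removes loop overhead but leaves a single sequential chain of additions through `sum`.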

-------------------------------------------------------

Write a version of the inner product procedure described in the first problem that uses four-way loop unrolling and two-way parallelism.
Our measurements for this procedure give a CPE of 2.25 for floating-point data. Describe two factors that limit the performance to a CPE of at best 2.0.
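A hedged sketch of four-way unrolling with two-way parallelism follows. The name `inner_unroll4x2` and its array-based signature are hypothetical (the arrays and length would come from `get_vec_start` and `vec_length`), and `data_t` is assumed to be `double` because this problem quotes floating-point measurements. Even-indexed and odd-indexed products go to two separate accumulators, so the two addition chains can proceed independently.

```c
/* Assumption: floating-point data, as in the measurement above. */
typedef double data_t;

/* Hypothetical array-based signature; assumes both arrays hold at least
 * `length` elements. */
void inner_unroll4x2(const data_t *udata, const data_t *vdata, int length,
                     data_t *dest)
{
    int limit = length - 3;     /* loop runs while 4 elements remain */
    data_t sum0 = (data_t) 0;   /* products at indices i and i + 2 */
    data_t sum1 = (data_t) 0;   /* products at indices i + 1 and i + 3 */
    int i;

    /* Four elements per iteration, split across two independent chains. */
    for (i = 0; i < limit; i += 4) {
        sum0 = sum0 + udata[i] * vdata[i]
                    + udata[i + 2] * vdata[i + 2];
        sum1 = sum1 + udata[i + 1] * vdata[i + 1]
                    + udata[i + 3] * vdata[i + 3];
    }

    /* Finish the remaining 0 to 3 elements on the first chain. */
    for (; i < length; i++)
        sum0 = sum0 + udata[i] * vdata[i];

    /* Combine the two partial sums. */
    *dest = sum0 + sum1;
}
```

Splitting the sum this way reassociates the floating-point additions, so in general the result can differ in the last bits from `inner4`; with small integer-valued inputs, as in a quick check, the sums are exact.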

Answers

Maurice Tutor

Status NEW Posted 22 Sep 2017 08:09 PM My Price 8.00
