ComputerScienceExpert

Category > Programming Posted 04 May 2017 My Price 9.00

Bandwidth-intensive application A runs on computer C.

1) Bandwidth-intensive application A runs on computer C. A's performance in ops/s on C equals the interconnection network's bisection bandwidth in words/s. During execution, the processor aggregate dissipates 25 MWs, the memory aggregate dissipates 20 MWs, and the interconnection network dissipates a whopping 160 MWs. As a rule, for this network, power grows as (bisection bandwidth)^1.5. Appalled by the power consumption, the designers build computer D with half of C's bisection bandwidth. By what factor has D improved on C's energy efficiency, i.e., its figure of merit ops/J? K = 2^10
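A rough way to check the arithmetic for question 1, sketched in Python. This assumes "MWs" means megawatts, that performance tracks bisection bandwidth on D just as it does on C, and that the processor and memory aggregates keep dissipating the same power on D. That last point is an assumption; the question does not say so. The function name and parameters are made up for illustration.

```python
def efficiency_gain(p_proc, p_mem, p_net, bw_scale, exponent=1.5):
    """Ratio of D's ops/J to C's ops/J when bandwidth is scaled by bw_scale.

    Assumption: only the network power changes, as bw_scale ** exponent;
    processor and memory power are held constant.
    """
    p_total_c = p_proc + p_mem + p_net
    p_net_d = p_net * bw_scale ** exponent
    p_total_d = p_proc + p_mem + p_net_d
    # ops/J = (ops/s) / (J/s); performance scales linearly with bandwidth.
    return (bw_scale / p_total_d) / (1.0 / p_total_c)


gain = efficiency_gain(p_proc=25.0, p_mem=20.0, p_net=160.0, bw_scale=0.5)
print(f"D / C energy efficiency: {gain:.4f}")  # about 1.0092
```

Under these assumptions the network drops to roughly 56.6 MW, total power to roughly 101.6 MW, and throughput halves, so ops/J improves by only about 1%.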

2) The 'M' machine is a memory-memory architecture. For example, its floating-point multiply instruction is: 'mmul.d a,b,c', meaning "take the 64-bit floating-point values starting at memory addresses 'b' and 'c', respectively, multiply them, and store the 64-bit floating-point result starting at memory address 'a'". The 'M' machine is implemented by a smaller, embedded 'J' machine. A program translates each 'M' machine instruction into one or more 'J' machine instructions, listed below.

    lw     r1,a      // load 32 bits starting at 'a'
    sw     r1,a      // store 32 bits starting at 'a'
    pack   f0,r1,r2  // pack two 'r' registers into one 'f' register
    unpack f0,r1,r2  // unpack one 'f' register into two 'r' registers
    mul.d  f0,f2,f4  // perform floating-point multiply f2 * f4

Write a J-machine program that implements 'mmul.d a,b,c'.

3) Imagine a computer with no cache, but with a reasonable-size register file. The computer has a single floating-point multiplier. The effect of these assumptions is that each floating-point multiply (operation) will, with probability 1, find one of its two operands in the register file, but will need its other operand delivered from memory, and this for each floating-point multiply. Let the floating-point multiplier have a peak performance of 16 GFs/s. At present, the achievable bandwidth from the memory to the processor is 7 GWs/s. (Here, 'W' stands for word, not Watt, and, by assumption, one word can hold one floating-point value). K = 10^3
a) [5 marks] Describe this situation as _compute bound_ or _bandwidth bound_.
b) [5 marks] We buy a second DRAM module and more interconnection links with the same aggregate capacities as the first. Describe this new situation as _compute bound_ or _bandwidth bound_.
c) What would it take to achieve sustained performance equal to the peak performance?
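For question 3, the compute-bound versus bandwidth-bound call can be sketched as a simple min() of two ceilings. This is only an illustration: it assumes one word of memory traffic per multiply (as the question states) and that doubling the DRAM and links doubles the achievable bandwidth to 14 GWs/s. The helper names are hypothetical.

```python
def sustained_gflops(peak_gflops, bandwidth_gwords, words_per_flop=1.0):
    """Sustained rate is capped by the multiplier and by operand delivery."""
    return min(peak_gflops, bandwidth_gwords / words_per_flop)


def classify(peak_gflops, bandwidth_gwords, words_per_flop=1.0):
    """Label the regime by which ceiling is lower."""
    if bandwidth_gwords / words_per_flop < peak_gflops:
        return "bandwidth bound"
    return "compute bound"


PEAK = 16.0  # GFs/s, from the question
for label, bw in [("a) one DRAM module", 7.0), ("b) two DRAM modules", 14.0)]:
    print(label, classify(PEAK, bw), sustained_gflops(PEAK, bw))

# c) Bandwidth needed to sustain peak at one word per multiply.
print("c) required GWs/s:", PEAK * 1.0)
```

With these numbers both a) and b) come out bandwidth bound (7 and 14 GFs/s sustained). Reaching 16 GFs/s would take at least 16 GWs/s of memory bandwidth, or some way of cutting the memory traffic per multiply to 7/16 of a word or less, for example through operand reuse.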
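For question 2, one possible translation of 'mmul.d a,b,c' can be tried out on a toy J-machine interpreter in Python. Several things here are assumptions that the question leaves open: byte addressing (so the second 32-bit half sits at address + 4), the 'b+4' operand syntax, and the convention that the first 'r' register in pack/unpack holds the low half. The interpreter and helper names are hypothetical.

```python
import struct

# Candidate J-machine sequence for 'mmul.d a,b,c' (a sketch, not the only answer).
PROGRAM = [
    ("lw", "r1", "b"), ("lw", "r2", "b+4"), ("pack", "f2", "r1", "r2"),
    ("lw", "r3", "c"), ("lw", "r4", "c+4"), ("pack", "f4", "r3", "r4"),
    ("mul.d", "f0", "f2", "f4"),
    ("unpack", "f0", "r1", "r2"),
    ("sw", "r1", "a"), ("sw", "r2", "a+4"),
]


def resolve(operand, symbols):
    """Turn 'b' or 'b+4' into a numeric byte address."""
    base, _, offset = operand.partition("+")
    return symbols[base] + int(offset or 0)


def run(program, memory, symbols):
    """Execute the instructions on a dict mapping byte address -> 32-bit word."""
    regs = {}
    for op, *args in program:
        if op == "lw":
            regs[args[0]] = memory[resolve(args[1], symbols)]
        elif op == "sw":
            memory[resolve(args[1], symbols)] = regs[args[0]]
        elif op == "pack":  # assumed: first 'r' register is the low half
            bits = regs[args[1]] | (regs[args[2]] << 32)
            regs[args[0]] = struct.unpack("<d", struct.pack("<Q", bits))[0]
        elif op == "unpack":
            bits = struct.unpack("<Q", struct.pack("<d", regs[args[0]]))[0]
            regs[args[1]], regs[args[2]] = bits & 0xFFFFFFFF, bits >> 32
        elif op == "mul.d":
            regs[args[0]] = regs[args[1]] * regs[args[2]]
    return memory


def store_double(memory, addr, value):
    """Write a 64-bit float as two 32-bit words, low half first."""
    memory[addr], memory[addr + 4] = struct.unpack("<II", struct.pack("<d", value))


def load_double(memory, addr):
    """Read two 32-bit words back as one 64-bit float."""
    return struct.unpack("<d", struct.pack("<II", memory[addr], memory[addr + 4]))[0]


symbols = {"a": 0, "b": 8, "c": 16}
memory = {}
store_double(memory, symbols["b"], 1.5)
store_double(memory, symbols["c"], -4.0)
run(PROGRAM, memory, symbols)
print(load_double(memory, symbols["a"]))  # -6.0
```

The sequence uses four loads, two packs, one multiply, one unpack, and two stores, so each 'M' instruction expands to ten 'J' instructions under these assumptions.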
