Levels Taught:
Elementary, High School, College, University, PhD
| Teaching Since: | May 2017 |
| Questions Answered: | 20103 |
| Tutorials Posted: | 20155 |
MBA, PhD
Phoenix
Jul-2007 - Jun-2012
Corporate Manager
ChevronTexaco Corporation
Feb-2009 - Nov-2016
Consider a bandit problem with two arms. It is known that one of the arms
yields rewards uniformly distributed in the interval [0, 1], while for the
other, rewards in [0, 2] are possible. How many exploratory actions will you
need to take, on average, to identify which of the two arms has the higher
average reward? For the same example, give a value for the optimistic
initialisation of the reward estimates that is sufficient to identify the
optimal solution in at least 80% of all cases.
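The question leaves the update rule and the horizon open, so the following Python sketch is only one way to explore both parts numerically. It assumes the second arm is uniform on [0, 2] (mean 1.0, versus 0.5 for the first arm). `p_correct_exploration` estimates how often n pulls of each arm rank the arms correctly by sample mean. As a sanity check, for n = 1 the exact probability is P(B > A) = 3/4, where A ~ U[0, 1] and B ~ U[0, 2]. `optimistic_greedy_success` simulates a purely greedy agent whose estimates both start at q0. It assumes a constant step size alpha = 0.1 and a horizon of 100 pulls, both hypothetical choices not given in the question. All function names are illustrative.

```python
import random
from math import erf, sqrt

# Assumed reward distributions: arm 0 ~ U[0, 1] (mean 0.5), arm 1 ~ U[0, 2] (mean 1.0).
ARMS = [(0.0, 1.0), (0.0, 2.0)]
BEST_ARM = 1


def pull(arm, rng):
    """Draw one reward from the given arm."""
    lo, hi = ARMS[arm]
    return rng.uniform(lo, hi)


def p_correct_exploration(n, trials=20000, seed=0):
    """Monte Carlo estimate of P(better arm has the higher sample mean) after n pulls of each arm."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        means = [sum(pull(arm, rng) for _ in range(n)) / n for arm in (0, 1)]
        hits += means[BEST_ARM] > means[1 - BEST_ARM]
    return hits / trials


def p_correct_normal_approx(n):
    """CLT approximation: the difference of sample means has mean 0.5 and variance (1/12 + 1/3) / n."""
    z = 0.5 / sqrt((1 / 12 + 1 / 3) / n)
    return 0.5 * (1 + erf(z / sqrt(2)))


def optimistic_greedy_success(q0, alpha=0.1, steps=100, trials=5000, seed=0):
    """Fraction of runs in which a purely greedy agent, starting from estimate q0 for both
    arms and updating with a constant step size alpha (an assumption), ranks the better
    arm first after `steps` pulls."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        q = [q0, q0]
        for _ in range(steps):
            # Greedy choice; ties (e.g. at the start) are broken uniformly at random.
            if q[0] == q[1]:
                arm = rng.randrange(2)
            else:
                arm = 0 if q[0] > q[1] else 1
            # Constant step-size update toward the observed reward.
            q[arm] += alpha * (pull(arm, rng) - q[arm])
        hits += q[BEST_ARM] > q[1 - BEST_ARM]
    return hits / trials


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"n={n}: simulated {p_correct_exploration(n):.3f}, "
              f"normal approx {p_correct_normal_approx(n):.3f}")
    for q0 in (0.0, 1.0, 2.0, 5.0):
        print(f"q0={q0}: greedy success {optimistic_greedy_success(q0):.3f}")
```

With q0 = 0 the agent locks onto whichever arm it happens to try first, because rewards are non-negative. Its success rate is therefore about 50%. Scanning q0 upward gives a rough idea of how optimistic the initial estimate must be for the success rate to reach 80% under these assumptions. A different step size, update rule, or horizon will shift that threshold.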