ComputerScienceExpert

Category: Programming. Posted 24 May 2017.

Jaccard coefficient

Question:

1. [14] Data preprocessing.

(a) [8] Present the value range for each of the following measures:

(1) Jaccard coefficient

Answer:

(2) covariance

Answer:

(3) F-measure

Answer:

(4) Kulczynski measure

Answer:
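
As a rough illustration for part (a), the sketch below computes each of the four measures on small hand-made inputs. The function names and sample values are assumptions made for illustration and are not part of the original question. Under the usual definitions, the Jaccard coefficient, F-measure, and Kulczynski measure all fall in [0, 1], while covariance is unbounded and can be any real number.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; treated here as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def covariance(xs, ys) -> float:
    """Population covariance: mean of (x - mean_x) * (y - mean_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 form)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def kulczynski(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """Average of the conditional probabilities P(B|A) and P(A|B)."""
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)
```

Covariance depends on the scale of the data, so rescaling either variable rescales the result; the other three are ratios and stay within [0, 1].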

 

(b) [6] Give three example distance measures for each of the following two kinds:

(1) the distance between two objects

Answer:

 

(2) the distance between two clusters

Answer:
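
To make part (b) concrete, here is a minimal sketch of three common object-level distances (Euclidean, Manhattan, and supremum) and three common cluster-level distances (single, complete, and average link). These are one possible selection among several valid ones; the function names and the tuple/list representation of points and clusters are assumptions for illustration.

```python
import math
from itertools import product


def euclidean(p, q) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def supremum(p, q) -> float:
    """Chebyshev distance: the largest difference along any one dimension."""
    return max(abs(a - b) for a, b in zip(p, q))


def single_link(c1, c2, dist=euclidean) -> float:
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p, q in product(c1, c2))


def complete_link(c1, c2, dist=euclidean) -> float:
    """Distance between the farthest pair of points, one from each cluster."""
    return max(dist(p, q) for p, q in product(c1, c2))


def average_link(c1, c2, dist=euclidean) -> float:
    """Mean distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p, q in product(c1, c2)) / (len(c1) * len(c2))
```

The cluster-level functions take the object-level distance as a parameter, so any of the three point distances can be plugged in.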

 

2. [16] Data Warehousing, OLAP and Data Cube Computation

(a) [8] The standard deviation of n observations x1, x2, ..., xn is defined as

 

where x̄ is the average (i.e., mean) value of x1, ..., xn.

i. [3] What kind of measure does standard deviation belong to: distributive, algebraic, or holistic? Justify your answer.

Answer:
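
One way to see why standard deviation can be treated as an algebraic measure is that it can be derived from three distributive aggregates: the count, the sum, and the sum of squares. The sketch below assumes the population form of the standard deviation (dividing by n); the sample form (dividing by n - 1) can be computed from the same three aggregates. The `Agg` type and the function names are hypothetical.

```python
import math
from typing import NamedTuple


class Agg(NamedTuple):
    n: int      # number of observations
    s: float    # sum of x
    ss: float   # sum of x squared


def summarize(values) -> Agg:
    """Compute the three distributive aggregates for one partition."""
    return Agg(len(values), sum(values), sum(v * v for v in values))


def merge(a: Agg, b: Agg) -> Agg:
    """Combine two partitions without revisiting the raw data."""
    return Agg(a.n + b.n, a.s + b.s, a.ss + b.ss)


def std_dev(a: Agg) -> float:
    """Population standard deviation: sqrt(E[x^2] - (E[x])^2)."""
    mean = a.s / a.n
    # max(..., 0.0) guards against tiny negative values caused by rounding
    return math.sqrt(max(a.ss / a.n - mean * mean, 0.0))
```

For example, `std_dev(merge(summarize([2, 4, 4, 4]), summarize([5, 5, 7, 9])))` gives the standard deviation of all eight values using only the per-partition aggregates.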

 

ii. [5] Outline an efficient algorithm that computes an iceberg cube with standard deviation as the measure, where the iceberg condition is n ≥ 100 and σ ≥ 2.

Answer:

 

(b) [8] It is desirable to construct an AlbumCube to facilitate multidimensional search through digital photo collections, such as by date, photographer, location, theme, content, color, etc.

i. [2] What should be the dimensions and measures for such a data cube?

Answer:

 

ii. [3] What analytical functions can you provide?

Answer:

 

iii. [3] What are the major challenges in implementing AlbumCube, and how would you propose to handle them?

Answer:

3. [19] Frequent pattern and association mining

(a) [6] Since items have different expected frequencies of sales, it is desirable to use group-based minimum support thresholds set up by users. For example, one may set up a small minimum support for the group of cameras but a rather large one for the group of bread. Outline an FP-growth-like algorithm that derives the set of frequent items efficiently in a transaction database.

Answer: Suppose each item is associated with a group ID.
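
Building on the group-ID assumption above, the first scan of an FP-growth-like procedure could be sketched as follows: count each item's support once, then keep an item only if it meets the threshold of its own group. This covers only the single-item filtering step, not the tree construction or pattern growth. The names `item_group` and `group_min_sup` are hypothetical, and supports are absolute transaction counts.

```python
from collections import Counter


def frequent_items(transactions, item_group, group_min_sup):
    """Return {item: support} for items meeting their group's min support.

    transactions  : iterable of item lists
    item_group    : dict mapping item -> group ID
    group_min_sup : dict mapping group ID -> minimum support count
    """
    # set(t) ensures an item is counted at most once per transaction
    counts = Counter(item for t in transactions for item in set(t))
    return {
        item: c
        for item, c in counts.items()
        if c >= group_min_sup[item_group[item]]
    }
```

With a low threshold for the camera group and a high one for the bread group, a camera bought once can survive the filter while a food item bought twice is dropped.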

(b) [7] Suppose a BestBuy analyst is interested in only the frequent patterns (i.e., itemsets) from the sales transactions that satisfy certain constraints. For the following cases, state the characteristics (i.e., categories) of every constraint in each case and how to mine such patterns most efficiently.

i. The profit range for the items in each pattern must be within $50.

Answer:

 

ii. The sum of the price of all the items with profit over $5 in each pattern is at least $100.

Answer:

 

iii. The average profit for those items priced over $50 in each pattern must be less than $10.

Answer:

 

(c) [6] Frequent pattern mining often generates many somewhat "similar" patterns that carry little new information. Give one such example. Then outline one method that may generate fewer (i.e., compressed) but interesting patterns.

Answer:

4. [27] Classification and Prediction

(a) [6] You are given a training set of 10 million tuples with 10 attributes, each taking 8 bytes of space. One attribute is a class label with two distinct values, whereas each of the other attributes has 50 distinct values. Assume your machine cannot hold the whole dataset in main memory. Outline an efficient method that constructs a Naive Bayes classifier, and answer the following questions explicitly:

 

(i) How many scans of the database does your algorithm take?

Answer:

(ii) What is the maximum memory space your algorithm will use in your induction?

Answer:
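
A minimal sketch of one way to approach part (a): Naive Bayes needs only class counts and per-class attribute-value counts, which can be accumulated in a single streaming pass without holding the tuples in memory. With 9 non-class attributes, 50 values each, and 2 classes, that is at most 9 × 50 × 2 = 900 value counters plus 2 class counters. The function names, the `(values, label)` row layout, and the use of Laplace smoothing are assumptions for illustration.

```python
from collections import defaultdict


def train_counts(rows):
    """Single pass over an iterable of (attribute_values, label) pairs.

    `rows` can be a generator reading from disk, so only the counters
    are kept in memory, never the tuples themselves.
    """
    class_counts = defaultdict(int)
    value_counts = defaultdict(int)  # key: (attribute index, value, label)
    for values, label in rows:
        class_counts[label] += 1
        for i, v in enumerate(values):
            value_counts[(i, v, label)] += 1
    return class_counts, value_counts


def conditional(value_counts, class_counts, i, v, label, n_values=50):
    """Laplace-smoothed estimate of P(attribute i = v | label)."""
    return (value_counts[(i, v, label)] + 1) / (class_counts[label] + n_values)
```

The memory footprint depends on the number of distinct (attribute, value, class) combinations, not on the 10 million tuples.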

 

(b) [6] For each of the following measures, give a situation in which it is the most appropriate one for measuring the quality of classification:

(1) sensitivity

Answer:


(2) specificity

Answer:

(3) ROC curve

Answer:
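
For reference when working through part (b), sensitivity and specificity can be computed from confusion-matrix counts as sketched below. The function names and the example counts are made up for illustration. The example shows a rare positive class where overall accuracy looks high while sensitivity is poor, which is the kind of situation where accuracy alone can mislead.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: fraction of actual positives that are detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: fraction of actual negatives correctly rejected."""
    return tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical rare-class case: 10 positives among 1000 tuples
tp, fn, tn, fp = 1, 9, 990, 0
print(accuracy(tp, tn, fp, fn))   # high overall accuracy
print(sensitivity(tp, fn))        # but most positives are missed
print(specificity(tn, fp))
```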

(c) [5] People say that if each classifier is better than a random guess, an ensemble of multiple such classifiers will lead to a nontrivial increase in classification accuracy. Do you agree with this statement? Give your reasoning.

Answer:
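
One way to probe the statement in part (c) is to compute the accuracy of a majority vote under the idealized assumption that the classifiers make independent errors and each is correct with the same probability p. That assumption is mine, not the question's, and it rarely holds exactly in practice; when the classifiers' errors are strongly correlated, the gain shown here can shrink or disappear. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, k: int) -> float:
    """P(majority of k independent voters is correct), for odd k.

    Each voter is assumed correct with probability p, independently.
    """
    need = k // 2 + 1  # smallest number of correct votes forming a majority
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(need, k + 1))


for k in (1, 3, 11, 51):
    print(k, round(majority_vote_accuracy(0.6, k), 4))
```

Under independence the value grows with k when p > 0.5 and shrinks when p < 0.5, which is why the diversity of the base classifiers matters as much as their individual accuracy.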

 

(d) [5] What are the similarities and differences between semi-supervised classification and active learning?

Answer:

 

(e) [5] Suppose one would like to work out a model to classify U. of Michigan webpages based on the model you have learned from the UIUC website. Is it easy to do so by transfer learning? How would you suggest the person proceed?

Answer:

 

5. [24] Clustering

(a) [6] Use one sentence to distinguish each of the following pairs of methods:

(1) k-means vs. KNN

Answer:

 

(2) STING vs. CLIQUE

Answer:

 

(3) BIRCH vs. CHAMELEON

Answer:

(b) [6] Outline the best clustering method for the following tasks (and briefly explain why you make such a design):

(i) finding oil spills along a coastline

Answer:

 

(ii) clustering employees in a company based on their salaries and years of working experience

Answer:

 

(c) [6] Why is subspace clustering a good choice for high-dimensional data? Outline one efficient and effective subspace clustering method that can cluster a very high-dimensional (e.g., thousands of dimensions) data set.

Answer:

 

(d) [6] Cross-validation can be useful in both classification and clustering. What are the differences in these two cases?
