The world’s Largest Sharp Brain Virtual Experts Marketplace Just a click Away
Levels Tought:
Elementary,Middle School,High School,College,University,PHD
| Teaching Since: | Apr 2017 |
| Last Sign in: | 103 Weeks Ago, 3 Days Ago |
| Questions Answered: | 4870 |
| Tutorials Posted: | 4863 |
MBA IT, Mater in Science and Technology
Devry
Jul-1996 - Jul-2000
Professor
Devry University
Mar-2010 - Oct-2016
using python 2.7 solve the problem. comments and directions are included.
​
Project 5 (Markov Model)
Clarifications and Hints 1 / 11 Prologue
Project goal: use a Markov chain to create a statistical model of a piece of English text
and use the model to generate stylized pseudo-random text and decode noisy messages
The zip file (http://www.swamiiyer.net/cs110/markov_model.zip) for the project contains
• project specification (markov_model.pdf)
• starter files
• markov_model.py
• text_generator.py
• fix_corrupted.py • test script (run_tests.py)
• test data (data/)
• report template (report.txt) This checklist will help only if you have read the writeup for the project and have
a good understanding of the problems involved. So, please read the project
writeupm before you continue with this checklist. 2 / 11 Prologue
Understanding how dictionaries work is crucial for this project, so make sure you
understand the following example which illustrates how you create a dictionary of
dictionaries and how you manipulate them
>>> M = {}
>>> M . setdefault ( ’ ba ’ , {})
{}
>>> M
{ ’ ba ’: {}}
>>> M [ ’ ba ’]. setdefault ( ’n ’ , 0)
0
>>> M
{ ’ ba ’: { ’n ’: 0}}
>>> M [ ’ ba ’][ ’ n ’] += 1
>>> M
{ ’ ba ’: { ’n ’: 1}}
>>> M [ ’ ba ’]. setdefault ( ’n ’ , 42)
1 >>> M
{ ’ ba ’: { ’n ’: 1}}
>>> M . setdefault ( ’ an ’ , {})
{}
>>> M
{ ’ ba ’: { ’n ’: 1} , ’an ’: {}} #
#
#
#
# create an empty dictionary M
add key / value pair ’ba ’/{} to M
since ’ba ’ didn ’ t exist in M ,
{} ( the value just added ) is returned
check M #
#
#
#
# add key / value pair ’n ’/0 to the
dictionary M [ ’ ba ’]
since ’n ’ didn ’ t exist in M [ ’ ba ’] ,
0 ( the value just added ) is returned
check M # increment the value corresponding to the
# key ’n ’ in the dictionary M [ ’ ba ’] by 1
# check M
#
#
#
#
#
# add key / value pair ’n ’/42 to the
dictionary M [ ’ ba ’]
since ’n ’ exists in M [ ’ ba ’] ,
setdefault () simply returns ( without
changing ) the corresponding value , 1
check M # add key / value pair ’an ’/{} to M
# check M 3 / 11 Prologue >>> M [ ’ an ’]. setdefault ( ’a ’ , 0)
0
>>> M
{ ’ ba ’: { ’n ’: 1} , ’an ’: { ’a ’: 0}}
>>> M [ ’ an ’][ ’ a ’] += 1
>>> M
{ ’ ba ’: { ’n ’: 1} , ’an ’: { ’a ’: 1}}
>>> M [ ’ an ’][ ’ a ’] += 1
>>> M
{ ’ ba ’: { ’n ’: 1} , ’an ’: { ’a ’: 2}}
>>> M . keys ()
[ ’ ba ’ , ’an ’]
>>> M . values ()
[{ ’n ’: 1} , { ’a ’: 2}]
>>> M [ ’ ba ’]. keys ()
[ ’n ’]
>>> M [ ’ ba ’]. values ()
[1]
>>> M [ ’ an ’]. keys ()
[ ’a ’]
>>> M [ ’ an ’]. values ()
[2] # add key / value pair ’a ’/0 to the
# dictionary M [ ’ an ’]
# check M
# increment the value corresponding to the
# key ’a ’ in the dictionary M [ ’ an ’] by 1
# check M
# increment the value corresponding to the
# key ’a ’ in the dictionary M [ ’ an ’] by 1
# check M
# get the keys of M
# get the values of M
# get the keys of M [ ’ ba ’]
# get the values of M [ ’ ba ’]
# get the keys of M [ ’ an ’]
# get the values of M [ ’ an ’] 4 / 11 Problems Problem 1 (Markov Model Data Type) Create a data type MarkovModel to represent a
Markov model of order k from a given text string, and supporting the following API:
method
MarkovModel(text, k)
model.order()
model.kgram_freq(kgram)
model.char_freq(kgram, c)
model.rand(kgram) model.gen(kgram, T) description
create a Markov model model of order k from text
order k of Markov model
number of occurrences of kgram in text
number of times that character c follows kgram
a random character following the given kgram
a string of length T characters generated by simulating
a trajectory through the
corresponding Markov chain, the
first k characters of which is kgram Hints
• Instance variables
• Order of the Markov model, _k
• A dictionary to keep track of character frequencies, _st 5 / 11 Problems
• MarkovModel(text, k)
• Initialize instance variables appropriately
• Construct circular text circ_text from text by appending the first k characters to the
end; for example, if text = ’gagggagaggcgagaaa’ and k = 2, then
circ_text = ’gagggagaggcgagaaaga’
• For each kgram from circ_text, and the character next_char that immediately follows
kgram, increment the frequency of next_char in the dictionary _st[kgram] by 1; for the
above example, the dictionary _st, at the end of this step, should look like the following:
{ ’ aa ’:
’ag ’:
’cg ’:
’ga ’:
’gc ’:
’gg ’: { ’a ’:
{ ’a ’:
{ ’a ’:
{ ’a ’:
{ ’g ’:
{ ’a ’: 1 , ’g ’:
3 , ’g ’:
1} ,
1 , ’g ’:
1} ,
1 , ’c ’: 1} ,
2} ,
4} ,
1 , ’g ’: 1}} • model.order()
• Return the order of the Markov model • model.kgram_freq(kgram)
• Return the frequency of kgram, which is simply the sum of the values of _st[kgram] • model.char_freq(kgram, c)
• Return the number of times c immediately follows kgram, which is simply the value of c
in _st[kgram] 6 / 11 Problems • model.rand(kgram)
• Use stdrandom.discrete() to randomly select and return a character that immediately
follows kgram • model.gen(kgram, T)
• Initialize a variable text to kgram
• Perform T - _k iterations, where each iteration involves appending to text a random
character obtained using a call to self.rand() and updating kgram to the last _k
characters of kgram
• Return text 7 / 11 Problems Problem 2 (Random Text Generator) Write a client program text_generator.py that
takes two command-line integers k and T , reads the input text from standard input
and builds a Markov model of order k from the input text; then, starting with the
k-gram consisting of the first k characters of the input text, prints out T characters
generated by simulating a trajectory through the corresponding Markov chain, followed
by a new line.
Hints
• Read command-line arguments k and T
• Initialize text to text read from standard input using sys.stdin.read()
• Create a Markov model model using text and k
• Use model.gen() to generate a random text of length T and starting with the first k characters of text • Write the random text to standard output 8 / 11 Problems
Problem 3 (Noisy Message Decoder) Write a client program fix_corrupted.py that takes
an integer k (model order) and a string s (noisy message) as command-line arguments,
reads the input text from standard input, and prints out the most likely original string.
Hints
• Main idea behind model.replace_unknown(corrupted) When we fix the corrupted messages, we have to look at the missing letter in the
context of what comes before it and what comes after it. For example, let the
corrupted text be ’it w~s th’, k = 4, and let the characters that follow the 4-gram
’it w’ be ’a’, ’b’, and ’c’. So you want to pick the best of three hypotheses (call
them Ha , Hb , and Hc ). Let’s use the notation ’abcd’|’e’ to mean the probability
of finding an ’e’ after the 4-gram ’abcd’. This probability is 0 if ’e’ does not
follow ’abcd’ in the text.
The likelihood of Ha is the product of (k + 1) probabilities:
’ was’|’ ’, ’was ’|’t’, and ’as t’|’h’. ’it w’|’a’, ’t wa’|’s’, The likelihood of Hb is the product of the following (k +
’it w’|’b’, ’t wb’|’s’, ’ wbs’|’ ’, ’wbs ’|’t’, and ’bs t’|’h’. 1) probabilities: The likelihood of Hc is the product of the following (k +
’it w’|’c’, ’t wc’|’s’, ’ wcs’|’ ’, ’wcs ’|’t’, and ’cs t’|’h’. 1) probabilities: Now, the character that you use to replace ~ with is the one with the maximum
likelihood. So if max(Ha , Hb , Hc ) = Ha , then you would replace ~ by the
character ’a’. Use the argmax() function for this.
9 / 11 Problems • Pseudocode for model.replace_unknown(corrupted)
if corrupted [ i ] == ’~ ’:
kgram_before = kgram before ~
kgram_after = kgram after ~
probs = for each hypothesis from hypotheses ( characters that can replace ~):
context = kgram_before + hypothesis + kgram_after
p = 1.0
for i from 0 to _k + 1:
kgram = kgram from context starting at i
char = character from context that follows kgram
if kgram or char is non - existent , then set p to 0 and break
Otherwise , multiply p by probability of char following kgram
add p to probs
add to original the hypothesis that maximizes probs ( use argmax ()) • Implement fix_corrupted.py as follows:
• Read command-line arguments k and s
• Initialize text to text read from standard input using sys.stdin.read()
• Create a Markov model model using text and k
• Use model.replace_unknown() to decode the corrupted text s
• Write the decoded text to standard output 10 / 11 Epilogue
Your project report (use the given template, report.txt) must include • time (in hours) spent on the project
• short description of how you approached each problem, issues you encountered, and how you resolved those issues
• acknowledgement of any help you received
• other comments (what you learned from the project, whether or not you enjoyed working on it, etc.)
Before you submit your files
• make sure your programs meet the input and output specifications by running the following command on the terminal
$ python run_tests . py -v [ < problems >] • make sure your programs meet the style requirements by running the following command on the terminal
$ pep8 < program > • make sure your report isn’t too verbose, doesn’t contain lines that exceed 80 characters, and doesn’t contain spelling/grammatical mistakes 11 / 11