The world’s Largest Sharp Brain Virtual Experts Marketplace Just a click Away
Levels Tought:
Elementary,Middle School,High School,College,University,PHD
| Teaching Since: | Apr 2017 |
| Last Sign in: | 56 Weeks Ago, 5 Days Ago |
| Questions Answered: | 7570 |
| Tutorials Posted: | 7352 |
BS,MBA, PHD
Adelphi University/Devry
Apr-2000 - Mar-2005
HOD ,Professor
Adelphi University
Sep-2007 - Apr-2017
JAVA PROGRAM
Â
Normally text files are encoded in binary by translating each character to a fixed-length binary pattern. For example, ASCII uses an 8-bit string to encode each character, and Unicode uses a 16-bit string. Thus the number of bits to encode a file would be 8 * number of characters for an ASCII encoded file. One simple method for compressing text files is Huffman Encoding. Instead of using fixed-length bit patterns for the characters, the patterns vary in length, so that words that are more frequently used have shorter patterns and words that are less commonly used have longer patterns. Thus the overall number of bits to encode a file may be less than in the original coding scheme. In Huffman Encoding no character's bit pattern is a prefix for another character's. It is known where the bit pattern for one character ends and the bit pattern for the next character begins. Thus there's no need for spaces between the characters. Huffman encoding can also be done on words rather than characters.
To implement Huffman encoding we put all the characters (or words) into the leaves of a binary tree, with the more frequent characters higher up in the tree, and less frequent characters lower down in the tree. Every internal node in the tree has two children, and again, the characters are stored in the leaves. To find the bit pattern for a character, just trace the path from the root to the node containing that character and note the direction taken at each level. Start at the root and set the initial bit pattern to "". If you go left toward the leaf, add a '0' to the end of the bit pattern, and if you go right, add a '1' to the end of the bit pattern. When you get to the leaf, the bit pattern is the representation of the character in that leaf. Note that this ensures that no character's bit pattern is a prefix of another, as no path to a leaf is a prefix to another leaf's path.
To construct the tree, we begin by putting each character (or word) into a single-node tree by itself. The node contains the character and its frequency count. We combine two nodes together into a tree by creating a root (with empty key as only leaves have keys), setting the frequency count to be the sum of the frequency counts of the two nodes, making one node the left node of the root, and the other node the right node of the root (it doesn't matter which is which). Similarly, we can combine two trees (with multiple leaves) into one tree by making a new root (with no key), setting its frequency count to the sum of the counts in the roots of the two trees, and making the two trees the children of this new root.
To construct the whole tree, we start with the single-node trees, pick the two nodes with the smallest two frequency counts, and combine them into a single tree. We then repeat - always finding the two trees with the smallest frequency counts and combining them in to a single tree. This is repeated until there's only one tree, and that tree is used to create the bit patterns as noted above.
We can then use the bit patterns to encode the file, by replacing each character (or word) by its bit pattern. Again, there's no need for spaces being the patterns.
Â
Â
Â
Â
Â
Â
An example
Here's an example of using Huffman encoding on words. Given the input:
this is a test this is only a test in the the event of an
actual emergency you would be told where to tune on your radio dial
this is only a test
We would first find the frequency of the words:
to 1
is 3
test 3
your 1
told 1
a 3
emergency 1
dial 1
you 1
the 2
tune 1
where 1
in 1
of 1
would 1
event 1
an 1
radio 1
on 1
only 2
actual 1
this 3
be 1
and then assign bit patterns to the words as follows:
to 10100
is 001
test 1110
your 10010
told 01101
a 000
emergency 11000
dial 10011
you 01000
the 1000
tune 01011
where 11001
in 111110
of 111111
would 11110
event 01110
an 01111
radio 01001
on 01100
only 1011
actual 01010
this 1101
be 10101
In this example, the original file has 113 characters (not including spaces), for a total of 904 bits (using 8-bit ASCII), while the encoded file would have only 145 bits.
Program
For this assignment, write a Java program that will read in a file, count the word frequencies, and then assign Huffman encodings to all the (unique) words based on those frequencies. The program should print out the number of bits need to encoded the original file (you may exclude counting spaces), and the number of bits for the encoded file. Note that you need not encode file, but just say how many bits it would take. the program should read from standard input and write to standard output (can you read the file just once?). Run your program on the bible.The program should be efficient. State what ADT(s) you use and how they are implemented. Give the asymptotic running time of your program.
Â
Before you start tell me what ADT(s)andData Structureyou use and how they are implemented.
Â
Â
JAV-----------A P-----------ROG-----------RAM----------- An-----------swe-----------r-----------