Assignment 5 - Spelling Checking

due Tuesday, February 23rd

Output updated 2/18

The goal of this assignment is to give you practice with function and class templates and with applying other concepts covered in the course.

You are to create a generic (class template) hash table. You will use this template to create a "Dictionary" of words. The words will be stored in a class type, placed in a library, that you will derive from string. You will use your hash table Dictionary to spell check a document and to analyze the efficiency of your hash table.

The Dictionary file: ass5words
The document to be spell checked: ihaveadream.txt

Program Steps

  1. Derive a class from string, called Mystring.  You will use this class to store words from the Dictionary file.  The class should contain the necessary constructors, a conversion operator to convert a Mystring to unsigned int, a tolower function to convert the letters in a word to lower case, and a removePunctuation function to strip non-alphabetic characters from both ends of the word.  The function should also remove any 's from the word.  This class should be tested and then placed in a library.
  2. Create a generic linked list.  Do this by creating a Node and List class template.  You will use this by instantiating a template class of type Mystring.
  3. Create a hash-table class template, called Myhash.  You may use the hash function shown below for your hashing algorithm.  To resolve "collisions" in your hash table, use "chaining" implemented with your generic linked list from the previous step.  You will instantiate your hash table class template to create a template class of Mystring.  The Myhash class template should have size, insert, and find member functions, plus the member functions needed to produce the 3 statistics shown in the output below (percent of buckets used, average bucket size, and largest bucket size).  The Myhash insert function should throw a DuplicateError exception when an attempt is made to insert a duplicate Mystring into the Myhash template class object (the dictionary).
  4. Create a DuplicateError class template, derived from the logic_error standard exception.  This class template will be instantiated to create a template class of Mystring.
  5. Run the code found in the suggested main function below.  Your output should identify the duplicate words (type Mystring) that the program attempted to insert into the Dictionary, the number of words in the Dictionary, the hash-table bucket statistics (shown in the sample output), and the misspelled words (plus their total).
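
One possible shape for the Mystring class of step 1 is sketched here. The member names come from the assignment and the suggested main function; everything inside the function bodies is an assumption. In particular, the multiplier 31 used to fold the characters into an unsigned key is an arbitrary choice, and the sketch assumes that "any 's" means a trailing 's left after the punctuation has been stripped.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Sketch only: function bodies are assumptions, not the required solution.
class Mystring : public std::string
{
public:
    Mystring() = default;
    Mystring(const std::string& s) : std::string(s) {}
    Mystring(const char* s) : std::string(s) {}

    // Fold the characters into a single unsigned key for the hash function.
    // The multiplier 31 is an arbitrary choice (assumption).
    operator unsigned() const
    {
        unsigned key = 0;
        for (unsigned char ch : *this)
            key = key * 31u + ch;
        return key;
    }

    // Convert every letter of the word to lower case, in place.
    void tolower()
    {
        for (char& ch : *this)
            ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }

    // Strip non-alphabetic characters from both ends, then a trailing 's.
    void removePunctuation()
    {
        std::size_t first = 0;
        std::size_t last = size();
        while (first < last && !std::isalpha(static_cast<unsigned char>((*this)[first])))
            ++first;
        while (last > first && !std::isalpha(static_cast<unsigned char>((*this)[last - 1])))
            --last;
        assign(substr(first, last - first));

        if (size() >= 2 && compare(size() - 2, 2, "'s") == 0)
            resize(size() - 2);
    }
};
```

With this sketch, a token such as "Nation's, (with a leading quote and a trailing comma) becomes nation after tolower and removePunctuation, and a token made only of punctuation becomes empty, which is why the suggested main function skips words of size zero.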

    // hash function adapted from: Thomas Wang https://gist.github.com/badboy/6267743
    // buckets is assumed to be the number of buckets in the hash table
    unsigned hash(unsigned key) {
        const unsigned c2 = 0x27d4eb2d; // a prime or an odd constant
        key = (key ^ 61) ^ (key >> 16);
        key = key + (key << 3);
        key = key ^ (key >> 4);
        key = key * c2;
        key = key ^ (key >> 15);
        return key % buckets;           // reduce the mixed key to a bucket index
    }

Note: if you use this function as a member function, make it a const member function.
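
As a rough illustration of steps 2 and 3, the following sketch combines a minimal Node/List template with a Myhash template that uses the hash function above as a const member function. The List interface (push_front, contains) is hypothetical, the element type is assumed to be convertible to unsigned and comparable with ==, and std::logic_error is thrown as a stand-in where the assignment requires DuplicateError.

```cpp
#include <cstddef>
#include <stdexcept>

// Minimal singly linked list; the interface names are assumptions.
template <typename T>
struct Node
{
    T data;
    Node* next;
};

template <typename T>
class List
{
public:
    List() : head(nullptr), count(0) {}
    ~List()
    {
        while (head) { Node<T>* dead = head; head = head->next; delete dead; }
    }
    List(const List&) = delete;
    List& operator=(const List&) = delete;

    void push_front(const T& value) { head = new Node<T>{value, head}; ++count; }
    bool contains(const T& value) const
    {
        for (const Node<T>* p = head; p; p = p->next)
            if (p->data == value) return true;
        return false;
    }
    std::size_t size() const { return count; }

private:
    Node<T>* head;
    std::size_t count;
};

// Hash table with chaining: one List per bucket.
template <typename T, unsigned buckets>
class Myhash
{
public:
    Myhash() : items(0) {}

    void insert(const T& value)
    {
        List<T>& chain = table[hash(static_cast<unsigned>(value))];
        if (chain.contains(value))
            throw std::logic_error("duplicate"); // stand-in for DuplicateError<T>
        chain.push_front(value);
        ++items;
    }
    bool find(const T& value) const
    {
        return table[hash(static_cast<unsigned>(value))].contains(value);
    }
    std::size_t size() const { return items; }

private:
    // The hash function from the assignment, as a const member function.
    unsigned hash(unsigned key) const
    {
        const unsigned c2 = 0x27d4eb2d;
        key = (key ^ 61) ^ (key >> 16);
        key = key + (key << 3);
        key = key ^ (key >> 4);
        key = key * c2;
        key = key ^ (key >> 15);
        return key % buckets;
    }

    List<T> table[buckets];
    std::size_t items;
};
```

Because find is const and calls hash, hash must be const as well, which is the reason for the note above. The statistics member functions are left out of this sketch.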

Submission Requirements

Place all of your source files, header files, and your archived Mystring library file in one folder, and submit that folder as a single compressed (zipped) file.  The instructor will unzip your files, then compile and execute your program.  Also, make sure you identify your compiler and OS in a comment in your main function.

Suggested main function

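#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>
// also include your own Mystring, Myhash, and DuplicateError headers here
using namespace std;

int main()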
{
    Myhash<Mystring,1500> Dictionary;
    Mystring buffer;

    const string DictionaryFileName = "ass5words";
    const string DocumentFileName = "ihaveadream.txt";

    ifstream fin(DictionaryFileName.c_str());
    if (!fin)
    {
        cerr << "Can't find " << DictionaryFileName << endl;
        exit(-1);
    }
    while (getline(fin, buffer))
    {
        // remove \r if present (needed on Mac/Linux when the file has Windows line endings)
        if (!buffer.empty() && buffer[buffer.size()-1] == '\r')
            buffer.resize(buffer.size() - 1);
        buffer.tolower();
        try
        {
            Dictionary.insert(buffer);
        }
        catch (const DuplicateError<Mystring>& error)
        {
            cout << error.what() << endl;
        }
    }

    cout << "Number of words in the dictionary = " << Dictionary.size() << endl;
    cout << "Percent of hash table buckets used = " << setprecision(2) << fixed << 100 * Dictionary.percentOfBucketsUsed() << '%' << endl;
    cout << "Average non-empty bucket size = " << Dictionary.averageNonEmptyBucketSize() << endl;
    cout << "Largest bucket size = " << Dictionary.largestBucketSize() << endl;

    fin.close();
    fin.clear();

    // Spellcheck
    unsigned misspelledWords = 0;

    fin.open(DocumentFileName.c_str());
    if (!fin)
    {
        cerr << "Can't find " << DocumentFileName << endl;
        exit(-1);
    }
    while (fin >> buffer)
    {
        buffer.tolower();
        buffer.removePunctuation();
        if (!buffer.size())
            continue;
        if (!Dictionary.find(buffer))
        {
            misspelledWords++;
            cout << "Not found in the dictionary: " << buffer << endl;
        }
    }
    cout << "Total misspelled words = " << misspelledWords << endl;
}
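
The catch clause in the main function above expects a DuplicateError<Mystring> whose what() message names the duplicate word. A minimal sketch of such a class template follows. The two-argument constructor, with a caller-supplied type label, is an assumption made for illustration; the assignment only requires that the template derive from logic_error.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Sketch only: the constructor signature is hypothetical.
template <typename T>
class DuplicateError : public std::logic_error
{
public:
    // label is the text printed for the element type, e.g. "Mystring".
    DuplicateError(const std::string& label, const T& item)
        : std::logic_error(format(label, item)) {}

private:
    // Build a message such as "Duplicate Mystring: clone".
    static std::string format(const std::string& label, const T& item)
    {
        std::ostringstream out;
        out << "Duplicate " << label << ": " << item;
        return out.str();
    }
};
```

Under these assumptions, the insert function would throw DuplicateError<T>("Mystring", value), and the handler in main would print the message through error.what().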


Sample Output

Your output should look "similar" to the following:   (Output updated 2/18)

Duplicate Mystring: clone
Duplicate Mystring: duplicate
Duplicate Mystring: resting
Duplicate Mystring: triplet
...
Number of words in the dictionary = 24044
Percent of hash table buckets used = 100.00%     <--- You might see something different here (like 57.60%)
Average non-empty bucket size = 16.03
Largest bucket size = 43
Not found in the dictionary: seared
Not found in the dictionary: flames
Not found in the dictionary: withering
Not found in the dictionary: captivity
Not found in the dictionary: sadly
Not found in the dictionary: manacles
Not found in the dictionary: segregation
Not found in the dictionary: discrimination
Not found in the dictionary: lonely
Not found in the dictionary: prosperity
Not found in the dictionary: languishing
Not found in the dictionary: finds
Not found in the dictionary: dramatize
Not found in the dictionary: appalling
Not found in the dictionary: architects
Not found in the dictionary: independence
Not found in the dictionary: guaranteed
Not found in the dictionary: happiness
Not found in the dictionary: defaulted
Not found in the dictionary: concerned
Not found in the dictionary: marked
Not found in the dictionary: funds
Not found in the dictionary: funds
Not found in the dictionary: vaults
Not found in the dictionary: opportunity
...
Not found in the dictionary: snowcapped
Not found in the dictionary: peaks
Not found in the dictionary: jews
Not found in the dictionary: gentiles
Not found in the dictionary: protestants
Not found in the dictionary: catholics
Total misspelled words = ???                 <--- The number should be between 100 and 125
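
The three bucket statistics in the sample output can all be derived from the chain length of each bucket. The sketch below shows the arithmetic on a plain vector of chain lengths; the BucketStats struct and the bucketStats function are hypothetical names, and in your program the same computation would live in Myhash member functions. Since the suggested main function multiplies percentOfBucketsUsed() by 100, the sketch returns a fraction between 0 and 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper type holding the three statistics.
struct BucketStats
{
    double fractionUsed;     // non-empty buckets / all buckets (0..1)
    double averageNonEmpty;  // stored items / non-empty buckets
    std::size_t largest;     // longest chain
};

BucketStats bucketStats(const std::vector<std::size_t>& chainLengths)
{
    std::size_t used = 0;
    std::size_t total = 0;
    std::size_t largest = 0;
    for (std::size_t n : chainLengths)
    {
        if (n > 0)
            ++used;
        total += n;
        largest = std::max(largest, n);
    }

    BucketStats stats;
    stats.fractionUsed = chainLengths.empty()
        ? 0.0
        : static_cast<double>(used) / chainLengths.size();
    stats.averageNonEmpty = used
        ? static_cast<double>(total) / used
        : 0.0;
    stats.largest = largest;
    return stats;
}
```

As a check against the sample output: with 24044 words spread over 1500 buckets that are all in use, the average non-empty bucket size is 24044 / 1500, which rounds to 16.03.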