The goal of this assignment is to give you practice in function and class templates and applying other concepts covered in the course.
You are to create a generic (class template) hash table. You will use this template to create a "Dictionary" of words. The words will be stored in a (library) class type that you will derive from string. You will use your hash table Dictionary to spell check a document and perform some analysis of the efficiency of your hash table.
The Dictionary file
// hash
function
adapted from: Thomas Wang https://gist.github.com/badboy/6267743 unsigned hash(unsigned key) { int c2 = 0x27d4eb2d; // a prime or an odd constant key = (key ^ 61) ^ (key >> 16); key = key + (key << 3); key = key ^ (key >> 4); key = key * c2; key = key ^ (key >> 15); return key % buckets; } |
int main() { Myhash<Mystring,1500> Dictionary; Mystring buffer; const string DictionaryFileName = "ass5words"; const string DocumentFileName = "ihaveadream.txt"; ifstream fin(DictionaryFileName.c_str()); if (!fin) { cerr << "Can't find " << DictionaryFileName << endl; exit(-1); } while (getline(fin, buffer)) { // remove \r if present (this for Mac/Linux) if (buffer[buffer.size()-1] == '\r') buffer.resize(buffer.size() - 1); buffer.tolower(); try { Dictionary.insert(buffer); } catch (const DuplicateError<Mystring>& error) { cout << error.what() << endl; } } cout << "Number of words in the dictionary = " << Dictionary.size() << endl; cout << "Percent of hash table buckets used = " << setprecision(2) << fixed << 100 * Dictionary.percentOfBucketsUsed() << '%' << endl; cout << "Average non-empty bucket size = " << Dictionary.averageNonEmptyBucketSize() << endl; cout << "Largest bucket size = " << Dictionary.largestBucketSize() << endl; fin.close(); fin.clear(); // Spellcheck unsigned misspelledWords = 0; fin.open(DocumentFileName.c_str()); if (!fin) { cerr << "Can't find " << DocumentFileName << endl; exit(-1); } while (fin >> buffer) { buffer.tolower(); buffer.removePunctuation(); if (!buffer.size()) continue; if (!Dictionary.find(buffer)) { misspelledWords++; cout << "Not found in the dictionary: " << buffer << endl; } } cout << "Total mispelled words = " << misspelledWords << endl; } |
Your output should look "similar" to the following: (Output updated 2/18)
Duplicate
Mystring: clone Duplicate Mystring: duplicate Duplicate Mystring: resting Duplicate Mystring: triplet ... Number of words in the dictionary = 24044 Percent of hash table buckets used = 100.00% <--- You might see something different here (like 57.60%) Average non-empty bucket size = 16.03 Largest bucket size = 43 Not found in the dictionary: seared Not found in the dictionary: flames Not found in the dictionary: withering Not found in the dictionary: captivity Not found in the dictionary: sadly Not found in the dictionary: manacles Not found in the dictionary: segregation Not found in the dictionary: discrimination Not found in the dictionary: lonely Not found in the dictionary: prosperity Not found in the dictionary: languishing Not found in the dictionary: finds Not found in the dictionary: dramatize Not found in the dictionary: appalling Not found in the dictionary: architects Not found in the dictionary: independence Not found in the dictionary: guaranteed Not found in the dictionary: happiness Not found in the dictionary: defaulted Not found in the dictionary: concerned Not found in the dictionary: marked Not found in the dictionary: funds Not found in the dictionary: funds Not found in the dictionary: vaults Not found in the dictionary: opportunity ... Not found in the dictionary: snowcapped Not found in the dictionary: peaks Not found in the dictionary: jews Not found in the dictionary: gentiles Not found in the dictionary: protestants Not found in the dictionary: catholics Total mispelled words = ??? <--- The number should be between 100 and 125 |