Assignment 1 - A Spell Checker

due Tuesday, 1/12  1:30 pm

Follow these instructions for Programming Assignments:

Programming Assignments

This assignment is a review of some of the introductory concepts covered in CIS22B. It involves writing classes with constructors and destructors, allocating memory dynamically, performing file I/O, and planning a program. You are to use the two classes, Word and Dictionary, the suggested main() (or a very similar main()), the dictionary "word" file, and the input file, "theroadnottaken.txt".

Your program should adhere to the following specifications:

  1. You are to use the Word and Dictionary classes defined below and write all member functions and any necessary supporting functions to achieve the specified result.
  2. The Word class should dynamically allocate memory for each word to be stored in the dictionary.
  3. The Dictionary class should contain an array of pointers to Word. Memory for this array must be dynamically allocated. You will have to read the words in from the file. Since you do not know the size of the "word" file, you do not know how large an array of pointers to allocate, so let the array grow dynamically as you read the file. Start with an array size of 8. When the array is full, double its size, copy the existing pointers into the new array, and continue. You can see the expected behavior in the sample program output listed below.
  4. You can assume the "word" file is sorted, so your Dictionary::find() function must contain a binary search algorithm. You might want to save this requirement for later - until you get the rest of your program running.
  5. Make sure you store words in the dictionary as lower case and that you convert the input text to the same case - that way your Dictionary::find() function will successfully find "Two" even though it is stored as "two" in your Dictionary.
  6. Make sure you remove leading and trailing punctuation from each word, such as the comma in "wood,".
  7. Do not use the string class at all for this assignment.  All text data should be stored and managed as C-strings.
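
The doubling step in specification 3 can be sketched as follows. This is only an illustration under stated assumptions: growPointerArray is a hypothetical free function (the assignment asks for a private Dictionary::resize() member instead), and the empty Word class is a stand-in so the sketch compiles on its own.

```cpp
#include <cassert>
#include <iostream>

// Stand-in for the assignment's Word class, just enough to compile this sketch.
class Word {};

// Hypothetical helper: doubles the capacity of a dynamically allocated
// array of Word pointers and keeps the first `count` pointers.
void growPointerArray(Word**& words, unsigned int& capacity, unsigned int count)
{
    unsigned int newCapacity = capacity * 2;
    Word** bigger = new Word*[newCapacity];

    // Copy the pointers, not the Word objects they point to.
    for (unsigned int i = 0; i < count; ++i)
        bigger[i] = words[i];

    // Null out the unused slots so they are never read uninitialized.
    for (unsigned int i = count; i < newCapacity; ++i)
        bigger[i] = nullptr;

    delete[] words;          // frees the old array only; the Words survive
    words = bigger;
    capacity = newCapacity;
    std::cout << "Dictionary resized to capacity: " << capacity << '\n';
}
```

Only the array of pointers is reallocated here. The Word objects stay where they are, so each resize costs one pointer copy per stored word and no string copies.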
Warning: If you are using a Mac or Linux compiler, you may want to remove the \r at the end of each line in both input files.  You can do that like this:

    strtok(line,"\r");
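
One thing to watch with the strtok approach: strtok skips leading delimiters, so a line that consists of nothing but "\r" is left unchanged and strtok returns a null pointer. A possible alternative, sketched here with a hypothetical helper named stripLineEnding, is to trim the end of the buffer directly:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not part of the assignment): removes any trailing
// '\r' or '\n' characters from a C-string, in place.
void stripLineEnding(char* line)
{
    std::size_t len = std::strlen(line);
    while (len > 0 && (line[len - 1] == '\r' || line[len - 1] == '\n'))
        line[--len] = '\0';
}
```

Calling stripLineEnding(line) right after reading each line would handle both "two\r" and a bare "\r" the same way.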

class Word
{
    char* word;
public:
    Word(const char* text = nullptr);
    ~Word();
    const char* getWord() const;
};
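
A minimal sketch of how the Word member functions might be written, using only C-string functions. Treating a null argument as an empty string is an assumption made for this sketch; the assignment does not say what the default argument should produce.

```cpp
#include <cassert>
#include <cstring>

class Word
{
    char* word;
public:
    Word(const char* text = nullptr);
    ~Word();
    const char* getWord() const;
};

Word::Word(const char* text)
{
    // Assumption: a null argument is stored as an empty string.
    if (text == nullptr)
        text = "";

    // Allocate exactly enough room for the characters plus the terminator,
    // then copy so the Word owns its own storage.
    word = new char[std::strlen(text) + 1];
    std::strcpy(word, text);
}

Word::~Word()
{
    delete[] word;   // matches the new[] in the constructor
}

const char* Word::getWord() const
{
    return word;
}
```

Because the class declares no copy constructor or assignment operator, copying a Word by value would leave two objects deleting the same buffer. Storing pointers to Word, as the Dictionary class does, avoids making such copies.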

class Dictionary
{
    Word** words;
    unsigned int capacity;
    unsigned int numberOfWordsInDictionary;
    void resize();
    void addWordToDictionary(char* word);
public:
    Dictionary(const char* filename);
    ~Dictionary();
    bool find(const char* word);
};
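
The binary search required for Dictionary::find() could look roughly like the sketch below. To keep it self-contained it is written as a hypothetical free function, containsWord, over a plain array of C-strings; in the assignment the same loop would compare against words[mid]->getWord() instead.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical free function illustrating the search; assumes the array
// is sorted in the same order that strcmp uses.
bool containsWord(const char* const* sortedWords, unsigned int count,
                  const char* target)
{
    unsigned int low = 0;
    unsigned int high = count;   // search the half-open range [low, high)

    while (low < high) {
        unsigned int mid = low + (high - low) / 2;
        int cmp = std::strcmp(target, sortedWords[mid]);
        if (cmp == 0)
            return true;         // exact match
        if (cmp < 0)
            high = mid;          // target sorts before sortedWords[mid]
        else
            low = mid + 1;       // target sorts after sortedWords[mid]
    }
    return false;
}
```

The half-open range is a deliberate choice: with unsigned indexes, the common "high = mid - 1" form wraps around to a huge value when mid is 0. Note also that the search is only correct if the stored words are in strcmp order, which is worth checking once the words have been converted to lower case.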

const unsigned short MaxWordSize = 32;

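// wordfile and document are assumed to be const char* file paths defined
// elsewhere: the dictionary "word" file and the input text file.
int main()
{
   char buffer[MaxWordSize];
   Dictionary Websters(wordfile);
   ifstream fin(document);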
   if (!fin)
   {
      cerr << "Unable to open file: " << document << endl;
      exit(-2);
   }
   cout << "\nSpell checking " << document << "\n\n";
   while (fin >> buffer) {
      // remove leading/trailing punctuation, change to lowercase
      if (cleanupWord(buffer)) {
         if (!Websters.find(buffer)) {
            cout << buffer << " not found in the Dictionary\n";
         }
      }
   }
}
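
The suggested main() calls cleanupWord() but leaves it for you to write. One possible shape is sketched below. The meaning of the return value (true when something is left after trimming) is an assumption inferred from how main() uses it, and the choice of std::ispunct to decide what counts as punctuation is also an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Sketch: strips leading/trailing punctuation and lowercases the word in place.
// Returns true if at least one character remains (assumed meaning).
bool cleanupWord(char* word)
{
    // Skip leading punctuation.
    char* start = word;
    while (*start != '\0' && std::ispunct(static_cast<unsigned char>(*start)))
        ++start;

    // Find the length that remains once trailing punctuation is ignored.
    std::size_t len = std::strlen(start);
    while (len > 0 && std::ispunct(static_cast<unsigned char>(start[len - 1])))
        --len;

    // Shift the kept characters to the front of the buffer, lowercasing each.
    for (std::size_t i = 0; i < len; ++i)
        word[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(start[i])));
    word[len] = '\0';

    return len > 0;
}
```

Punctuation inside a word, such as an apostrophe, is left alone in this sketch; only the two ends are trimmed. The casts to unsigned char are there because the <cctype> functions have undefined behavior for negative char values.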

Program output

Dictionary resized to capacity: 16      <--- This illustrates dictionary resizing     
Dictionary resized to capacity: 32
Dictionary resized to capacity: 64
Dictionary resized to capacity: 128
Dictionary resized to capacity: 256
Dictionary resized to capacity: 512
Dictionary resized to capacity: 1024
Dictionary resized to capacity: 2048
Dictionary resized to capacity: 4096
Dictionary resized to capacity: 8192
Dictionary resized to capacity: 16384
Dictionary resized to capacity: 32768

Spell checking c:/temp/theroadnottaken.txt    <--- Spell checking begins

looked not found in the Dictionary     <--- Words not found in the dictionary
undergrowth not found in the Dictionary
having not found in the Dictionary
better not found in the Dictionary
wanted not found in the Dictionary
passing not found in the Dictionary
really not found in the Dictionary
morning not found in the Dictionary
...

Hint: There are between 10 and 20 misspelled words.

"word" file and "gettysburg.txt" (this is a zip file)