In this project, students work individually or in pairs to write a text-identification class that uses a Naive Bayes classifier to determine which author most likely wrote a given text. Students design a class that compiles features from a text, such as word lengths, sentence lengths, and the distribution of word stems, and then use these features in the Naive Bayes algorithm to classify texts. In addition to reviewing string manipulation, the project requires students to design and document a class, which makes it particularly useful for students who need additional practice with strings and class design.
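The project description does not prescribe an implementation, so the following is only a minimal Python sketch of the kind of class students might build. It assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing, which is one common variant and not necessarily the one the course uses. The names `extract_features`, `crude_stem`, and `AuthorClassifier`, the bucketing of sentence lengths into groups of five words, and the cap on word length at 12 are all hypothetical choices. `crude_stem` is a deliberately rough suffix stripper standing in for a real stemming algorithm such as Porter's.

```python
import math
import re
from collections import Counter, defaultdict


def crude_stem(word):
    """Hypothetical, very rough suffix stripper (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        # Only strip if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(text):
    """Turn a text into a bag of discrete feature tokens (assumed encoding)."""
    features = Counter()
    # Split into sentences at runs of '.', '!', or '?'; drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        if not words:
            continue
        # Sentence length, bucketed into groups of 5 words (assumption).
        features[f"slen:{len(words) // 5}"] += 1
        for word in words:
            # Word length, capped at 12 so very long words share one bucket.
            features[f"wlen:{min(len(word), 12)}"] += 1
            # Distribution of (crudely) stemmed words.
            features[f"stem:{crude_stem(word)}"] += 1
    return features


class AuthorClassifier:
    """Hypothetical multinomial Naive Bayes over feature tokens, add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()                 # author -> number of training texts
        self.feature_counts = defaultdict(Counter)  # author -> feature-token counts
        self.vocabulary = set()                     # all feature tokens seen in training

    def train(self, author, text):
        feats = extract_features(text)
        self.doc_counts[author] += 1
        self.feature_counts[author].update(feats)
        self.vocabulary.update(feats)

    def log_score(self, author, feats):
        # Log prior: fraction of training texts written by this author.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[author] / total_docs)
        counts = self.feature_counts[author]
        total = sum(counts.values())
        vocab_size = len(self.vocabulary)
        for token, n in feats.items():
            if token not in self.vocabulary:
                continue  # ignore features never seen during training
            # Laplace smoothing keeps unseen (author, token) pairs from scoring zero.
            p = (counts[token] + 1) / (total + vocab_size)
            score += n * math.log(p)
        return score

    def classify(self, text):
        """Return the author with the highest log posterior score."""
        feats = extract_features(text)
        return max(self.doc_counts, key=lambda a: self.log_score(a, feats))


if __name__ == "__main__":
    clf = AuthorClassifier()
    clf.train("A", "The cat sat. The cat ran quickly.")
    clf.train("B", "Extraordinarily verbose sentences meander interminably through paragraphs.")
    print(clf.classify("The cat sat quickly."))  # prints A
```

Summing log probabilities rather than multiplying raw probabilities avoids floating-point underflow on long texts; in a real assignment, students would train on several passages per author and could swap in or drop features to see how accuracy changes.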
Integrate pair programming when it is appropriate for students to collaborate on a lab, assignment, or project.
Uses texts by a well-known author (J.K. Rowling) to create a Meaningful and Relevant Context. Incorporates Student Choice by offering a word cloud as an extra-credit option and by encouraging students to choose which features and texts their program uses.