Textual Analysis Lab Series

Refactor the Textual Analysis Project

Practice with Classes, Refactoring


One of the most important tasks in designing and implementing object-oriented programs is deciding which classes or objects are responsible for executing which behavior.

Up until now, all of the code you've written has been in the main method. By now, however, the main method is starting to get long and tedious to work with.

It's time to refactor your program to make it more object-oriented.

As a program evolves, it is not uncommon to decide that a different way of organizing the classes or methods in the program would make it easier to read, to understand, or to enhance. It is not a good idea to change the structure of existing code at the same time as adding new functionality, because if the program stops working correctly it will be difficult to know where things went wrong. It is better to rearrange or redesign the existing code first, test that it still works correctly, and only then add any new functionality. The restructuring or redesigning phase, without changing the program's behavior, is called refactoring.

Getting Started

The first step is to create two new classes. You could do this using the "New Class" button in BlueJ, but we recommend that you download our class templates instead. They are simple, and they show the kind of class documentation we look for when grading student work.

Refactoring Step One: Using the New Classes

This first step is similar to the first step in the first lab in this series: create a WordReader object, read the first line of your book, and print some basic information about it.

Improving (Refactoring) Step One

Usually a constructor just initializes the object's state; it doesn't start doing the object's work. In this exercise, you will take some of the code from the constructor in your Analyzer class and move it to a separate method.
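As a rough illustration of that move, here is a minimal sketch. It is not the lab's actual code: the WordReader API is not shown in this handout, so a plain list of lines stands in for the book, and the summary format returned by processFirstLine is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Small driver so the sketch can be run on its own.
class FirstLineDemo {
    public static void main(String[] args) {
        Analyzer analyzer = new Analyzer(Arrays.asList("Call me Ishmael.", "Some years ago"));
        // Construction did no analysis; the work happens only when asked for.
        System.out.println(analyzer.processFirstLine());
    }
}

// Hypothetical sketch of an Analyzer. A list of lines stands in for the
// lab's WordReader, whose methods are not shown in this handout.
class Analyzer {
    private final List<String> lines;

    // The constructor only initializes the object's state.
    Analyzer(List<String> lines) {
        this.lines = lines;
    }

    // Code that used to sit in the constructor now lives in its own method.
    String processFirstLine() {
        String first = lines.isEmpty() ? "" : lines.get(0);
        String trimmed = first.trim();
        int wordCount = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return "\"" + first + "\" has " + wordCount + " words and "
                + first.length() + " characters";
    }
}
```

With this split, constructing an Analyzer has no visible side effects, and main decides when the first line is processed.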

Step Two: Skipping Lines

Refactoring Step One Again!

The next step after skipping a bunch of lines is to read the next line and print information about it. That is essentially the same as what processFirstLine already does!
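One way to exploit that similarity is to generalize the method so it processes whichever line comes next. The sketch below assumes a hypothetical processNextLine name and a simple position counter. A list of lines again stands in for the lab's WordReader, so skipLines here is only an illustration of the idea.

```java
import java.util.Arrays;
import java.util.List;

// Small driver so the sketch can be run on its own.
class NextLineDemo {
    public static void main(String[] args) {
        Analyzer analyzer = new Analyzer(Arrays.asList(
                "Title page", "Copyright notice", "", "It was the best of times"));
        System.out.println(analyzer.processNextLine()); // first line
        analyzer.skipLines(2);                          // skip front matter
        System.out.println(analyzer.processNextLine()); // same method, later line
    }
}

// Hypothetical sketch: one method handles both the first line and any
// later line, so the code is written (and tested) only once.
class Analyzer {
    private final List<String> lines;
    private int next = 0; // index of the next unread line

    Analyzer(List<String> lines) {
        this.lines = lines;
    }

    // Advance past the given number of lines without reporting on them.
    void skipLines(int count) {
        next = Math.min(lines.size(), next + Math.max(0, count));
    }

    // Read the next unread line and describe it.
    String processNextLine() {
        if (next >= lines.size()) {
            return "(no more lines)";
        }
        String line = lines.get(next);
        next++;
        String trimmed = line.trim();
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return "\"" + line + "\": " + line.length() + " characters, " + words + " words";
    }
}
```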

Getting Averages

Word Analysis

Base Word Frequency

This method, like the one above (and several to come), uses the list of distinct words. Rather than get that list over and over, we could get it once in the constructor and store it as an instance variable.
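A minimal sketch of that idea follows. It assumes the words arrive as a list of strings (the lab's own way of obtaining distinct words is not shown here), and the method names numDistinctWords and countLongWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Small driver so the sketch can be run on its own.
class DistinctWordsDemo {
    public static void main(String[] args) {
        Analyzer analyzer = new Analyzer(
                Arrays.asList("the", "whale", "the", "sea", "whale", "harpoon"));
        System.out.println("Distinct words: " + analyzer.numDistinctWords());
        System.out.println("Words of 5+ letters: " + analyzer.countLongWords(5));
    }
}

// Hypothetical sketch: the distinct-word list is computed once in the
// constructor and stored in an instance variable for later methods.
class Analyzer {
    private final List<String> distinctWords;

    Analyzer(List<String> allWords) {
        // LinkedHashSet removes duplicates while keeping first-seen order.
        this.distinctWords = new ArrayList<>(new LinkedHashSet<>(allWords));
    }

    // Uses the stored list instead of recomputing it.
    int numDistinctWords() {
        return distinctWords.size();
    }

    // Also uses the stored list; counts distinct words of at least minLength letters.
    int countLongWords(int minLength) {
        int count = 0;
        for (String word : distinctWords) {
            if (word.length() >= minLength) {
                count++;
            }
        }
        return count;
    }
}
```

Storing the list as an instance variable fits the earlier advice about constructors, since building the list is part of setting up the object's state rather than reporting results.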

Character Analysis

Most Frequently-Used Words

By now, you have figured out the pattern for bringing code from your old, monolithic main method to your new class that provides the same functionality in smaller, more coherent chunks. Apply those techniques to create two more methods:

Remember to update the javadoc comments for each new method and to call the method from your new main method.

Refactoring the Refactored Class (optional)

Ideally, a main method does very little: it constructs a couple of objects and calls a couple of methods to get everything going, and then the objects take over. In this case, the main method constructs a single object and then calls a long series of methods. An alternative is to create a new analyzeBook method in the Analyzer class that calls all of the individual methods, and then replace those method calls in main with a single call to analyzeBook. Note that analyzeBook would need its own parameters to cover every argument that main currently passes to the individual methods.
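The shape of such a method might look like the sketch below. The individual methods have placeholder bodies, and their names and parameters (linesToSkip, minLength) are assumptions for illustration. Your own Analyzer will have different methods and arguments.

```java
// Small driver: main now constructs one object and makes one call.
class AnalyzeBookDemo {
    public static void main(String[] args) {
        Analyzer analyzer = new Analyzer("Moby Dick");
        System.out.println(analyzer.analyzeBook(30, 8));
    }
}

// Hypothetical sketch: analyzeBook gathers the calls that main used to make.
class Analyzer {
    private final String bookName;

    Analyzer(String bookName) {
        this.bookName = bookName;
    }

    // Placeholder for a real step; returns a short summary of what it did.
    String skipLines(int linesToSkip) {
        return "skipped " + linesToSkip + " lines";
    }

    // Placeholder for a real step; returns a short summary of what it did.
    String reportLongWords(int minLength) {
        return "counted words of " + minLength + "+ letters";
    }

    // Takes every argument the individual methods need and passes each along.
    String analyzeBook(int linesToSkip, int minLength) {
        return bookName + ": " + skipLines(linesToSkip) + "; " + reportLongWords(minLength);
    }
}
```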

Multiple Books

Once you have your program working for one book, try it with other books. Download two more books, if you haven't already. (The directions for doing this are in the original Textual Analysis mini-lab.) Edit your main method to construct three Analyzer objects, not just one, passing each the name of a different book. Ask all three objects to analyze their book contents.
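A bare-bones sketch of such a main method is shown here. The file names are placeholders, and the one-line analyzeBook stands in for the real analysis. The point is only that each object keeps its own state, so the same code works for all three books.

```java
// Sketch of a main method that analyzes three books.
class MultipleBooksDemo {
    public static void main(String[] args) {
        // Placeholder file names; use the names of the books you downloaded.
        Analyzer first = new Analyzer("book1.txt");
        Analyzer second = new Analyzer("book2.txt");
        Analyzer third = new Analyzer("book3.txt");

        // Each object analyzes only the book it was constructed with.
        System.out.println(first.analyzeBook());
        System.out.println(second.analyzeBook());
        System.out.println(third.analyzeBook());
    }
}

// Hypothetical stand-in for the lab's Analyzer class.
class Analyzer {
    private final String fileName; // each instance remembers its own book

    Analyzer(String fileName) {
        this.fileName = fileName;
    }

    // Placeholder for the real analysis.
    String analyzeBook() {
        return "Analysis of " + fileName;
    }
}
```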

Look at the results to see whether they reveal real differences between the books. The names of the chief characters will obviously differ, but look for other differences too: in the number of distinct words, the number of singletons, the number of long words, and the most frequently used words (for example, which pronouns appear most often). Do any of these suggest a difference in the target audience or main topics of the books?

Zip and Submit Your Program.