Textual Analysis Lab Series

Lab: Reading from a File

Employing Loops and Strings


In this lab you will construct an object of the WordReader class (provided to you), giving it the name of a book to read. Over the course of several labs and mini-labs, you will carry out some very simple computational textual analysis of the book, of the kind used in the digital humanities. To get started, in this lab you will read lines from a piece of literature, print some, skip some, and report the length of some.

Before Getting Started

The first step is to choose a book that is available in plain text format. Project Gutenberg is a well-established library of over 60,000 free eBooks, focusing mostly on books published before 1924, whose copyright has expired. A good place to start is with their Top 100 or with their Recently added eBooks. Another interesting source of accessible materials is Wikisource, which has documents in many languages.

Getting Started

In the first exercise, you will download some useful code and write a program that reads and prints out the first non-empty line in the book you chose. For example, your program's output at the end of this exercise might be something like:

    Welcome to the Word Counting Program.
    The first line in SherlockHolmes.txt is: 
       Project Gutenberg's The Adventures of Sherlock Holmes, by Arthur Conan Doyle
    The length of the first line is 76.

Implementation: A good software development practice is to start by writing the smallest amount of code that you can test, test it, then continue by adding small, incremental changes and testing all along the way. (This is sometimes known as Agile Development, or Iterative, Incremental Development, or "always have working code.")

Your first testable step will be to get the classes you need and to use them to read the first line.
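As a rough illustration of that first step, here is a minimal sketch, assuming the lab is written in Java. The methods of the provided WordReader class are not shown in this handout, so the sketch uses the standard java.util.Scanner as a stand-in; the class name FirstLineSketch, the helper firstNonEmptyLine, and the default file name are all hypothetical. Your own program should use WordReader as the lab directs.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Stand-in sketch: java.util.Scanner is used in place of the lab's
// WordReader class, whose methods are not shown in this handout.
class FirstLineSketch {

    // Returns the first line that contains something other than
    // whitespace, or null if the input has no such line.
    static String firstNonEmptyLine(Scanner in) {
        while (in.hasNextLine()) {
            String line = in.nextLine();
            if (!line.trim().isEmpty()) {
                return line;
            }
        }
        return null;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Assumed file name; substitute the book you chose.
        String fileName = args.length > 0 ? args[0] : "SherlockHolmes.txt";

        System.out.println("Welcome to the Word Counting Program.");
        try (Scanner in = new Scanner(new File(fileName))) {
            String first = firstNonEmptyLine(in);
            if (first == null) {
                System.out.println(fileName + " has no non-empty lines.");
            } else {
                System.out.println("The first line in " + fileName + " is: ");
                System.out.println("   " + first);
                System.out.println("The length of the first line is "
                        + first.length() + ".");
            }
        }
    }
}
```

Keeping the loop in a small helper that takes an already-open reader makes it easy to test on a short string before pointing the program at a whole book, which fits the "always have working code" practice described above.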

Skipping Ahead

Print An Extended Quote

Average Line Length

One simple type of digital humanities analysis is to measure the difficulty and variety of the vocabulary used in a work of literature. For example, how many different words appear? How long are the words? Etc. We'll see how to break a line of text up into individual words in a future lab, but for now we could ask simpler questions: how many lines are there, and what is the average length of a line?
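The line count and average line length can be computed in a single pass with two running totals. The following is only a sketch under the same assumptions as before: Java as the language, java.util.Scanner standing in for WordReader, and hypothetical names (LineStatsSketch, averageLineLength). It counts every line, including empty ones; whether blank lines should count is a choice you may want to make differently.

```java
import java.util.Scanner;

// Stand-in sketch: Scanner replaces the lab's WordReader class,
// whose methods are not shown in this handout.
class LineStatsSketch {

    // Returns the average number of characters per line, counting
    // empty lines as length 0. Returns 0.0 if there are no lines.
    static double averageLineLength(Scanner in) {
        int lineCount = 0;
        long totalChars = 0;
        while (in.hasNextLine()) {
            totalChars += in.nextLine().length();
            lineCount++;
        }
        if (lineCount == 0) {
            return 0.0;
        }
        // Cast before dividing so the result is not truncated
        // by integer division.
        return (double) totalChars / lineCount;
    }

    public static void main(String[] args) {
        // A small in-memory sample instead of a real book.
        String sample = "abcd\n\nab";
        double average = averageLineLength(new Scanner(sample));
        System.out.println("Average line length: " + average);
    }
}
```

The cast to double matters: dividing one integer by another in Java discards the fractional part, which would silently round every average down.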

Optional: Reading the Whole Work

Zip and Submit Your Program

You will be adding to this program, but submitting it at this point will allow you to get feedback before you submit a final version.