Patterns for Linear Indexed Data Structures

Version 1

Alyce Faulstich Brady
Kalamazoo College, Kalamazoo, MI
abrady@kzoo.edu

This paper presents elementary patterns for constructing and using Linear Indexed Data Structures (often known as Vectors and Arrays).

Patterns for building and using these structures include:



Linear Indexed Data Structure (Container)

(Also known as: Array, Vector, Contiguous Homogenous Data Structure)

You are in a situation in which you need a collection of objects of a given type, such that at least one of the following conditions is true.

Therefore, use a Linear Indexed Data Structure in a contiguous block of memory. Examples in C-based languages include the STL or Java Vector and String classes, apvector and apstring, and built-in arrays.

A linear indexed data structure is either a Fixed-Size Linear Data Structure or an Expandable Linear Data Structure.

Patterns for building and using these structures include Declare-Construct-Initialize (Variables), Indexed Random Access, Valid Index, Appended Item, Linear Indexed Traversal, and Reverse Linear Indexed Traversal.

A Linear Indexed Data Structure might not be the best choice if



Fixed-Size Linear Data Structure (Linear Indexed Data Structure)

You are in a situation that calls for a Linear Indexed Data Structure and you know up front how many such objects there will be.

Therefore, specify the number of items that will be in the collection as you construct it, using the Declare-Construct-Initialize pattern.

For example, the syntax for creating a C++ STL Vector of known capacity, or an apvector of known size, whose elements are of type T are essentially the same:

Built-in arrays in C, C++, and Java are always Fixed-Size Linear Data Structures. In C and C++, the fixed size for a statically-defined array must be a constant expression known at compile time. The syntax for creating arrays of type T is

Note that whether the items in the collection are actually constructed and initialized when the data structure is constructed depends on the particular data structure and the type of the objects in it. In the case of an STL Vector, for example, the number of objects in the Vector is zero until Appended Items are added to it explicitly. On the other hand, with C++ built-in arrays and apvectors, the data structure is filled to capacity with "default items" of the underlying data element type. If that type is a primitive type, the individual items are uninitialized. If the individual items are objects, they are constructed and initialized using the default constructor for their class. Whether these data elements are meaningful depends on the type of the objects, the behavior of the default constructor, and what the program wishes to do with the data items.

(The C++ AP classes are available online from the College Board's web site.)



Pattern: Expandable Linear Indexed Data Structure (Linear Indexed Data Structure)

You are in a situation that calls for a Linear Indexed Data Structure and you don't know up front how many objects it will need to hold.

Therefore, declare and construct an empty collection, using the Declare-Construct-Initialize pattern. Resize the collection when you have items to add. You may need to resize it explicitly, or this may be done automatically by methods that add items to the data structure.

You may want to keep all the meaningful items (items that contain meaningful, initialized values) in the Expandable Linear Indexed Data Structure. together at the beginning of the collection. Doing this means that if you know how many meaningful items are in the collection, you also know where they are (at which indices). If the items are not contiguous, in other words, there are holes in the collection, then you don't know which items in the collection are meaningful and which are not without asking them. If the order of the items in the collection does not matter but you want to keep them together at the front of the data structure, use the Appended Item pattern to add items.

Some examples of Expandable Linear Indexed Data Structures include the STL Vector and Advanced Placement program apvector classes in C++ and the ArrayList and Vector classes in Java.

The STL Vector and Java Vector and ArrayList classes support expanding the data structure by simply appending items to it (see Appended Item). An apvector object must be expanded using the resize operation. Other classes also often allow the user to explicitly resize the capacity of an Expandable Linear Indexed Data Structure to reduce the time necessary to append items.



Pattern: Appended Item (Linear Indexed Data Structure)

You want to add an element to an Expandable Linear Indexed Data Structure. You either want the new element to be the last item in the list or you don't care where it's added but you want all meaningful items (items that contain meaningful, initialized values) together at the beginning of the collection, so that knowing how many meaningful items you have also tells you the indices of the meaningful items. (If they are not contiguous, in other words, there are holes in the collection, then you wouldn't know which items are meaningful and which are not without asking them.) Thus, an Expandable Linear Indexed Data Structure with n meaningful items would store those items in the first n entries in the data structure (for example, at indices 0 through n - 1 in C, C++, or Java). If you add an item to such a data structure, it should have n + 1 items in the first n + 1 entries in the data structure (for example, at indices 0 through n in C, C++, or Java).

Therefore, add your new object to the collection by appending it to the end.

Many Expandable Linear Indexed Data Structures provide a method for doing this, automatically growing the size of the data structure if necessary. Some, however, grow only when their capacity is increased explicitly (such as apvector). In such a case, you must check the capacity of the data structure and resize it if necessary before appending a new item.

For example, a Java Vector is an Expandable Linear Indexed Data Structure that supports dynamic growth as items are appended. If deck is a Java Vector of playing cards, then the following statement adds a new Card to the end of the deck:

Note that this examples uses an Anonymous Object. It is not necessary to create a named variable for every element in the collection. The collection is a single named object; we refer to the individual elements in it by specifying its location using Indexed Random Access rather than by using a unique variable name. Of course, it is also possible to append an already existing object to a collection.

If an Expandable Linear Indexed Data Structure with n elements does not provide a method for appending items, use an Indexed Random Access to add a new element to the end. If the index for the new element would not be a Valid Index, resize the data structure first.

NOTE: This pattern is also appropriate for a Fixed-Size Linear Indexed Data Structure, but only if the data structure is already large enough to accomodate the appended item.



Indexed Random Access (Linear Indexed Data Structure)

You want to access (retrieve or set) a specified entry in a Linear Indexed Data Structure.

Therefore, access it directly, specifying the index of the desired entry. The index should be a Valid Index for the specific Linear Indexed Data Structure.

For example, if deck is an STL Vector of playing cards, then one might use the following statement to display the 1st card in the deck (provided that there is at least 1 card in the deck):

One could display the 6th card just as easily (provided that there are at least 6 cards in the deck): (Note that indexing of Linear Indexed Data Structures in C, C++, and Java, along with some other languages, begins at 0. See Valid Index for more details.)

Using a Java Vector, one could display the 6th card of a deck with at least 6 cards with

Since a Java Vector does not know what type of objects it contains, you need to specify (by casting) the type of the element you are retrieving before you can use it as an element of that type. In this example, the 6th element must be cast as a Card before the display method can be invoked.

This pattern can also be used to modify an element in the data structure. For example, to swap the 1st and 52nd cards in the C++ deck above, the following code might be used:

Using the Java Vector above, the code would be



Valid Index (Linear Indexed Data Structure)

You want to access a valid element in a Linear Indexed Data Structure.

Therefore, use an index in the range determined by the programming language and the size of the data structure. In many languages, such as C, C++, and Java, valid indices for a structure with N elements ranges from 0 to N-1, where 0 is the index of the 1st element in the Linear Indexed Data Structure and N-1 is the index of the last element. In other languages, such as Pascal, valid indices for a Linear Indexed Data Structure with N elements ranges from 1 to N.

How to determine the size of a data structure (the number of elements in it) also depends on the language and the particular data structure. For example, if your data structure is an STL or Java Vector, then the size of the data structure refers to the number of Appended Items that have been explicitly added to it. You can use

to obtain the number of elements in any STL or Java Vector V.

Technically, the "number of elements" in a built-in array or apvector depends on its capacity (which depends only on the amount of memory that has been allocated to it). In fact, when a built-in array or apvector is constructed, it is filled with objects created using the default constructor for that type. Whether these are meaningful data elements or not depends on the type of the objects and what the program wishes to do with them. Thus, it is possible with these data structures to have an index that is valid as far as the language and the specification of the data structure is concerned, but refers to an element that never had meaningful data assigned to it. Such an index is, in fact, a Valid Index for the left-hand side of an assignment, and a Linear Indexed Traversal is an appropriate way to add new elements to this kind of data structure. The range of valid indices for assignment purposes, therefore, is 0 to C-1, where C is the capacity of the data structure. (For an apvector V, you can use

to refer to its capacity.)

On the other hand, an index to a data structure element that has not been initialized with meaningful data (as far as the program is concerned) is not a Valid Index for retrieving data from the data structure. In fact, with these data structures, there is no rule that requires that all meaningful data precede the uninitialized data. In other words, it is possible to have a Valid Index that is preceded by an invalid index, at least for data retrieval purposes. In order to have a meaningful range of valid indices for data access (which would be 0 to N-1, where N is the number of elements explicitly added as new elements to the data structure), one must add each new element to the data structure as an Appended Item, or add all the elements to entries 0 through N-1 in the data structure, in whatever order, before accessing them. It is also necessary to explicitly keep track of N, as these data structures do not do that.



Acknowledgements

Autumn Spaulding (Kalamazoo College class of '01) wrote the initial versions of Expandable Linear Data Structure and Appended Item.



Copyright Alyce Faulstich Brady. Last modified on February 17, 2001.