File Processing
In the previous installment of this column (November 2000) I illustrated the classes in java.io that provide basic stream and character I/O. A distinguishing feature of many of these classes is their tiered relationship, implemented via the Decorator pattern. For example, a low-level class such as FileReader opens a file. To add line-oriented I/O capability, you can wrap the FileReader object in a BufferedReader, like this:
FileReader f = new FileReader("file.txt");
BufferedReader b = new BufferedReader(f);
The basic classes discussed last time all extend an input or output superclass, either InputStream or OutputStream, for byte streams, or Reader and Writer for character streams. One thing I didn’t mention last time was that the file output classes have an overloaded constructor with a second boolean argument for appending data to files. The program in Listing 1 uses such a FileWriter to implement file logging. Often when writing to a log file it’s necessary to open and close the file each time you access it. Although certainly more costly than keeping the file open continuously, it is often the only way to guarantee that all the log data gets written. For this reason the LogFile constructor just stores the name of the log file.
The log() method opens a FileWriter in append mode by using the two-arg constructor with a second argument of true. If the file doesn’t exist already, it is created. I decorate that FileWriter with a BufferedWriter not for buffering (I really don’t want any!), but for the newLine() method. You might be tempted to just use a FileWriter and write a ‘\n’ to it to terminate a line, but not all platforms use that as a line terminator. The technique in Listing 1 ports nicely because newLine() uses the system property line.separator to find the correct characters to push onto the output stream. The test program in Listing 2 uses both LogFile methods to write to a log file.
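The effect of newLine() can be checked without touching the disk. The following is only a minimal sketch: the class name NewLineDemo and its method are hypothetical (they are not part of Listing 1), and it assumes the line.separator property has not been changed while the program runs.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

// Hypothetical demo class; not one of the article's listings.
class NewLineDemo {
    // Writes text followed by the platform line terminator into an
    // in-memory StringWriter and returns everything that was written.
    static String terminated(String text) throws IOException {
        StringWriter target = new StringWriter();
        BufferedWriter w = new BufferedWriter(target);
        w.write(text);
        w.newLine();   // platform terminator, not a hard-coded '\n'
        w.close();     // flushes the buffer into the StringWriter
        return target.toString();
    }

    public static void main(String[] args) throws IOException {
        String sep = System.getProperty("line.separator");
        String line = terminated("hello");
        // True when newLine() wrote exactly the line.separator value:
        System.out.println(line.equals("hello" + sep));
        System.out.println("terminator length: " + sep.length());
    }
}
```

On a Windows system the terminator length should come out as 2 (carriage return plus line feed), and on UNIX as 1.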
So much for the basics. I’ll now cover the “rest of the story” for file I/O in Java 2.
Random Access Files
You’ve probably noticed by now that there is no basic stream class that provides both input and output capability simultaneously, like iostream does in C++. C++ can do that because it has multiple inheritance (although iostream also requires virtual inheritance, one of the most confusing features in the C++ language; count your blessings, Java people!). What Java does offer is RandomAccessFile, a class that supports input and output as well as file positioning. A RandomAccessFile traffics in bytes, not characters, so there are methods for reading and writing single bytes and byte arrays, although it also can read and write strings (converting to and from bytes, of course). RandomAccessFile also implements the DataInput and DataOutput interfaces, so you can also work with primitive types.
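To make the DataInput/DataOutput side concrete, here is a small sketch that writes two primitives to a RandomAccessFile, repositions, and reads one of them back. The class PrimitiveRoundTrip and the temporary file name are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical demo class; not one of the article's listings.
class PrimitiveRoundTrip {
    // Writes an int followed by a double, then repositions the file
    // pointer and reads only the double back.
    static double writeThenReadDouble(File file, int i, double d)
            throws IOException {
        RandomAccessFile f = new RandomAccessFile(file, "rw");
        try {
            f.setLength(0);         // start from an empty file
            f.writeInt(i);          // DataOutput: 4 bytes
            f.writeDouble(d);       // DataOutput: 8 bytes
            f.seek(4);              // skip over the int
            return f.readDouble();  // DataInput
        } finally {
            f.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("primitives", ".dat");
        try {
            System.out.println(writeThenReadDouble(tmp, 7, 2.5));
        } finally {
            tmp.delete();
        }
    }
}
```

The same object serves as both the output and the input channel, which is the point of the class.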
A traditional application for random access files processes fixed-length data records, so you can access a particular record directly with file positioning. (Database systems used this technique in days of yore.) The program in Listing 3 defines a fixed-size Employee class with the following layout:
Employee number   1 int           (4 bytes)
Last name         15 characters   (30 bytes)
First name        15 characters   (30 bytes)
For the convenience of users of the Employee class it stores the name fields as String objects, but when it comes time to read or write Employee objects, these fields need to be treated as byte arrays. Furthermore, strings over 15 characters must be truncated and shorter ones need to be filled out (I chose a fill byte of 0xFF, something that wouldn’t occur in user data). You can see this technique illustrated in the stringToBytes() method. To write an Employee record to a RandomAccessFile, Employee.write() calls stringsToBytes(), which builds a buffer large enough for both name fields and calls stringToBytes() to fill them, after which it writes the employee number. To read a record back in, Employee.read() calls RandomAccessFile.readFully(), which fills the fixed-length byte array with the name data. To correctly build each name field string I have to search for the first occurrence of fillByte to determine its length.
As you can see in Listing 4, to open a RandomAccessFile for both reading and writing, you need to specify a second argument of “rw” in the constructor. After writing a couple of Employee records to the file I move the file pointer between record boundaries by calling RandomAccessFile.seek() with a multiple of the record size as the argument. (Seek positions are always relative to the beginning of the file.) This particular example writes two employee records and then swaps them by reading them back in reverse order.
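The arithmetic behind record positioning is simply index times record size. The sketch below uses a made-up 8-byte record of two ints rather than the Employee layout, and the class RecordSeek is hypothetical; it is meant only to show an in-place update of one record.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical demo class with a made-up 8-byte record (two ints).
class RecordSeek {
    static final int RECORD_SIZE = 8;

    // Record n starts n * RECORD_SIZE bytes from the start of the file.
    static long offsetOf(int index) {
        return (long) index * RECORD_SIZE;
    }

    static void writeRecord(RandomAccessFile f, int index, int id, int value)
            throws IOException {
        f.seek(offsetOf(index));
        f.writeInt(id);
        f.writeInt(value);
    }

    static int[] readRecord(RandomAccessFile f, int index) throws IOException {
        f.seek(offsetOf(index));
        int id = f.readInt();
        int value = f.readInt();
        return new int[] { id, value };
    }

    // Writes three records, overwrites the middle one in place, and
    // returns what is now stored in the requested slot.
    static int[] demo(File file, int index) throws IOException {
        RandomAccessFile f = new RandomAccessFile(file, "rw");
        try {
            f.setLength(0);
            for (int i = 0; i < 3; ++i)
                writeRecord(f, i, i + 1, 10 * (i + 1));
            writeRecord(f, 1, 2, 99);   // update record 1 only
            return readRecord(f, index);
        } finally {
            f.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("records", ".dat");
        try {
            int[] r = demo(tmp, 1);
            System.out.println("{" + r[0] + "," + r[1] + "}");
        } finally {
            tmp.delete();
        }
    }
}
```

Because every record has the same size, the neighbors of the updated record are untouched.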
Although this is the first month that this column appears in this Java Solutions supplement, and therefore I am not obliged to mention C or C++ at all, I still can’t resist showing how to do the above in C for comparison. The program in Listing 5 accomplishes the same thing as Listings 3 and 4, but in 50 lines instead of 152! In fairness to Java, however, I must admit a lot of safety is inherent in the Java version. For example, there is no danger of overflowing a String or even an array in Java, but if I make an error in my array access in C, I’m dog meat! The C version also lacks the advantages of object-orientation, and if I had implemented a C Employee class, then more lines would have resulted as well. Nonetheless, if you’re coming from the C world one of the first things you notice about Java is its verbosity. Like it or lump it.
The complement to the seek() method is RandomAccessFile.getFilePointer(), which returns the offset of the current file position as a long[1]. As a final example of the file positioning methods, the program in Listings 6 through 8 illustrates a file viewer: an application that scrolls through a file a screen at a time, both forward and backward[2]. The FileViewer class in Listing 6 uses a read-only RandomAccessFile so it can move around, and a stack to keep track of where it’s been so it can scroll backwards. The constructor opens the file and displays the first screen. The topPos field keeps track of the file position of the first line currently in the display. To scroll down, the next() method pushes topPos on the stack and then displays the next screen, while previous() undoes that operation.
You might think it strange that I bother to separate the read and display operations, storing the current screen’s lines in an ArrayList (which is like a Vector), instead of just displaying the lines immediately. The reason is to support the last() method, which scrolls immediately to the end of the file. I need to read sequentially, stacking each screen as I go, so I can scroll backwards once I reach the end, but I certainly don’t need to display as I go.
The program in Listing 7 provides a simple command-line interface for viewing a file with FileViewer. Just to be useful it allows redundant commands for each operation (such as ‘n’ and ‘d’ (down) for viewing the next screen). I must admit that I like the way Java forces me to design in a higher-level, object-oriented fashion. The C version of this program I wrote years ago, while shorter, doesn’t separate the file positioning from the viewing, like the FileViewer and ViewFile classes do. It just came automatically now that I’ve been using Java for a number of years.
In Listing 8 you can see that I implemented a stack with Java 2’s LinkedList class. For more on LinkedList, ArrayList, and other collections, see the September 2000 issue of this column.
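Stripped of the screen handling, the mechanism FileViewer relies on is a bookmark: remember getFilePointer(), read ahead, then seek() back. The following sketch shows just that step; the class MarkAndReturn and the three-line sample file are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical demo class; not one of the article's listings.
class MarkAndReturn {
    // Remembers where the second line starts, reads on past it,
    // then seeks back and reads that line again.
    static String rereadSecondLine(File file) throws IOException {
        RandomAccessFile f = new RandomAccessFile(file, "r");
        try {
            f.readLine();                     // consume line 1
            long mark = f.getFilePointer();   // offset of line 2
            f.readLine();                     // line 2
            f.readLine();                     // line 3
            f.seek(mark);                     // back to the bookmark
            return f.readLine();
        } finally {
            f.close();
        }
    }

    // Creates a small three-line sample file for the demo.
    static void writeSample(File file) throws IOException {
        RandomAccessFile f = new RandomAccessFile(file, "rw");
        try {
            f.setLength(0);
            f.writeBytes("one\ntwo\nthree\n");
        } finally {
            f.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("lines", ".txt");
        try {
            writeSample(tmp);
            System.out.println(rereadSecondLine(tmp));
        } finally {
            tmp.delete();
        }
    }
}
```

A stack of such bookmarks, one per screen, is all that backward scrolling needs.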
Exploring the File System
Working with files is often more than just doing input and output. Sometimes you need to know what files are in a directory, or whether a certain file exists at all, or you may need to delete a file. All this and more is possible with the methods of the File class. A File object represents a path, not a file stream. In fact, the corresponding file doesn’t even have to exist, although subsequent operations may fail if that is the case. File objects are based on hierarchical directory structures such as are found in UNIX and DOS/Windows[3]. Since UNIX uses a forward slash to separate components of a path, and Windows uses a backslash, you can determine these characters at runtime via the file.separator system property. The program in Listing 9 shows the properties of interest for file processing; the output is for a Windows 2000 system.
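Since a File is only a path, you can construct one before anything exists on disk and then ask about it. A minimal sketch follows; the class PathNotStream and the generated file name are made up, and the name is assumed not to exist in the target directory beforehand.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical demo class; not one of the article's listings.
class PathNotStream {
    // Returns { exists before creation, created, exists after creation,
    //           deleted, exists after deletion }.
    static boolean[] lifecycle(File dir) throws IOException {
        // Made-up name, assumed not to exist yet:
        File f = new File(dir, "pathdemo-" + System.nanoTime() + ".tmp");
        boolean before = f.exists();          // constructing a File creates nothing
        boolean created = f.createNewFile();
        boolean during = f.exists();
        boolean deleted = f.delete();
        boolean after = f.exists();
        return new boolean[] { before, created, during, deleted, after };
    }

    public static void main(String[] args) throws IOException {
        File tmpDir = new File(System.getProperty("java.io.tmpdir"));
        boolean[] r = lifecycle(tmpDir);
        for (int i = 0; i < r.length; ++i)
            System.out.print(r[i] + " ");
        System.out.println();
        // File.separator normally mirrors the file.separator property:
        System.out.println(
            File.separator.equals(System.getProperty("file.separator")));
    }
}
```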
A File object can represent either a directory name or a file name, since both are valid path names. You can query which is the case with the isDirectory() and isFile() methods respectively. You can retrieve the name of the path in two basic forms: absolute and relative. The absolute name of a path is the full path name from its root (e.g., C:\), and the relative name is the last component of the absolute name (such as PropTest.java). An alternate form of absolute name, called the canonical path, is a system-dependent rendition of an absolute path name. Most of the time it is just the same as the absolute path, but on UNIX systems, if the absolute path has symbolic links, then the canonical path will resolve those links to give the true physical path. In other words, a canonical path is more “real” than an absolute path.
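The difference between the path forms shows up most clearly with a path that contains a ".." component. This sketch is illustrative only: the class PathForms is hypothetical, the file named need not exist, and the exact strings printed depend on your current directory and platform.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical demo class; not one of the article's listings.
class PathForms {
    // Returns { path as given, absolute path, canonical path }.
    static String[] forms(String path) throws IOException {
        File f = new File(path);
        return new String[] {
            f.getPath(),            // exactly what was passed in
            f.getAbsolutePath(),    // prefixed with the current directory
            f.getCanonicalPath()    // cleaned up: ".." and links resolved
        };
    }

    public static void main(String[] args) throws IOException {
        String rel = "temp" + File.separator + ".."
                   + File.separator + "PropTest.java";
        String[] r = forms(rel);
        System.out.println("as given:  " + r[0]);
        System.out.println("absolute:  " + r[1]);
        System.out.println("canonical: " + r[2]);
    }
}
```

The absolute form still carries the ".." detour, while the canonical form should name PropTest.java directly in the current directory.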
The File class has methods for listing the contents of a directory, deleting and renaming files, requesting file attributes such as size, time last modified, and a user’s read and write permissions, and for navigating directories. The program in Listing 10 lists the names of the entries in an entire subdirectory tree. If you don’t specify a starting directory, it uses the current user directory. The File.listFiles() method returns an array of File objects representing the contents of the given directory; getName() returns the relative pathname of an entry. If the entry is a directory, I call the list() method recursively. This particular example shows the files from this article, and a subdirectory named “temp”.
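Renaming and deleting follow the same pattern as the query methods: each returns a boolean rather than throwing. Here is a minimal sketch, with a hypothetical class RenameDelete and made-up file names, that renames a scratch file within one directory and then removes it.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical demo class; not one of the article's listings.
class RenameDelete {
    // Returns { rename succeeded, old name gone, new name present,
    //           delete succeeded, new name gone }.
    static boolean[] renameThenDelete(File dir) throws IOException {
        File original = File.createTempFile("olddemo", ".tmp", dir);
        File renamed = new File(dir, "renamed-" + original.getName());

        boolean renamedOk = original.renameTo(renamed);
        boolean oldGone = !original.exists();
        boolean newThere = renamed.exists();
        boolean deleted = renamed.delete();
        boolean newGone = !renamed.exists();

        if (!renamedOk)
            original.delete();   // clean up if the rename did not happen
        return new boolean[] { renamedOk, oldGone, newThere, deleted, newGone };
    }

    public static void main(String[] args) throws IOException {
        File tmpDir = new File(System.getProperty("java.io.tmpdir"));
        boolean[] r = renameThenDelete(tmpDir);
        for (int i = 0; i < r.length; ++i)
            System.out.print(r[i] + " ");
        System.out.println();
    }
}
```

Checking the return values matters, because a failed rename or delete is otherwise silent.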
Listing 11 shows how you can control which files come back from a call to listFiles(). The nested class SuffixFilter implements the FilenameFilter interface, which has a single method: accept(File dir, String name). When you call the overloaded version of listFiles() that takes a FilenameFilter, it calls accept for each entry and only returns those for which your accept method returns true. This example reads a suffix from the command line, stores it in the static field ListSomeFiles.suffix, and displays only the matching files from the current directory.
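As a variation on the static-field approach, a filter can carry its own suffix. The sketch below is one possible arrangement, not the version in Listing 11: the class SuffixListing, its nested filter, and the scratch directory and file names are all made up, and it uses File.list() to get names rather than File objects.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical demo class; a variation on the idea, not Listing 11 itself.
class SuffixListing {
    // A filter that stores its suffix instead of reading a static field.
    static class SuffixFilter implements FilenameFilter {
        private final String suffix;

        SuffixFilter(String suffix) {
            this.suffix = suffix;
        }

        public boolean accept(File dir, String name) {
            return name.endsWith(suffix);
        }
    }

    // Returns the sorted names in dir that end with suffix.
    static String[] namesEndingWith(File dir, String suffix) {
        String[] names = dir.list(new SuffixFilter(suffix));
        if (names == null)          // dir is not a readable directory
            return new String[0];
        Arrays.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a scratch directory with three made-up entries:
        File dir = new File(System.getProperty("java.io.tmpdir"),
                            "suffixdemo-" + System.nanoTime());
        String[] all = { "a.java", "b.txt", "c.java" };
        dir.mkdir();
        try {
            for (int i = 0; i < all.length; ++i)
                new File(dir, all[i]).createNewFile();
            System.out.println(Arrays.toString(namesEndingWith(dir, ".java")));
        } finally {
            for (int i = 0; i < all.length; ++i)
                new File(dir, all[i]).delete();
            dir.delete();
        }
    }
}
```

Passing the suffix through the constructor lets two filters with different suffixes coexist, which a single static field cannot do.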
The ListFiles class in Listing 12 illustrates the informational methods in the File class. It is basically a traditional directory lister that displays directory information in fixed-length columns. If you’re a little rusty on the format classes in java.text, see my article in the ??? issue of this column. The program in Listing 13 shows how easy it is to find a file in a subdirectory tree by applying listFiles() recursively. It uses File.getCanonicalPath() to print the full pathname of where it found the file.
Summary
Java gives you as much control over files and the file system as a “write once run anywhere” language can claim. Although not necessarily fit for implementing a DBMS, the RandomAccessFile class gives you simultaneous input and output on a file of bytes (more or less an expandable byte array on disk), which can be useful. The File class gives you almost everything you need for navigating and tweaking your file system. It’s not POSIX, but it’s close. Magazine real estate won’t allow me to explore it in this issue, but Java does supply classes that support ZIP and JAR[4] files. Just as a teaser, the program in Listing 14 displays information for each entry in a ZIP file.
Listing 1 - LogFile.java: A Class for Writing Log Files
import java.io.*;

class LogFile {
    String fileName;

    public LogFile(String fileName) {
        this.fileName = fileName;
    }

    public void log(String message) throws IOException {
        FileWriter file = new FileWriter(fileName, true);
        BufferedWriter w = new BufferedWriter(file);
        w.write(message);
        w.newLine();
        w.close();
    }

    public void log(String prefix, String message) throws IOException {
        log(prefix + ": " + message);
    }
}
Listing 2 - LogFileTest.java: Tests the LogFile Class
import java.io.*;

class LogFileTest {
    public static void main(String[] args) {
        LogFile log = new LogFile(args[0]);
        try {
            log.log("A First message");
            log.log("WARNING", "A second message");
        }
        catch (IOException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}

/* Contents of args[0]:
A First message
WARNING: A second message
*/
Listing 3 - Employee.java: A Fixed-length Employee Data Record Class
import java.io.*;

// Illustrates fixed-length-record I/O
public class Employee {
    // Attributes:
    int empno;
    String last;
    String first;

    // Class constants:
    static final int LAST_MAX = 15;
    static final int FIRST_MAX = 15;
    static final int size = LAST_MAX*2 + FIRST_MAX*2 + 4;
    static final byte fillByte = (byte) 0xFF;

    public Employee(String last, String first, int empno) {
        this.last = last;
        this.first = first;
        this.empno = empno;
    }

    static void stringToBytes(String s, int max, byte[] dest, int offset) {
        // Note that max must be even, so we
        // don't get half a char.
        byte[] bytes = s.getBytes();
        for (int i = 0; i < max; ++i) {
            if (i < bytes.length)
                dest[i + offset] = bytes[i];
            else
                dest[i + offset] = fillByte;
        }
    }

    public byte[] stringsToBytes() {
        byte[] buffer = new byte[LAST_MAX*2 + FIRST_MAX*2];
        stringToBytes(last, LAST_MAX*2, buffer, 0);
        stringToBytes(first, FIRST_MAX*2, buffer, LAST_MAX*2);
        return buffer;
    }

    public void write(RandomAccessFile f) throws IOException {
        f.write(stringsToBytes());
        f.writeInt(empno);
    }

    public void read(RandomAccessFile f) throws IOException {
        byte[] buffer = new byte[LAST_MAX*2 + FIRST_MAX*2];
        f.readFully(buffer);
        last = new String(buffer, 0, findDelim(buffer, 0, LAST_MAX*2));
        first = new String(buffer, LAST_MAX*2,
                           findDelim(buffer, LAST_MAX*2, FIRST_MAX*2));
        empno = f.readInt();
    }

    public String toString() {
        return "{" + last + "," + first + "," + empno + "}";
    }

    int findDelim(byte[] buffer, int start, int max) {
        // Find first occurrence of 'fillbyte' in
        // a trailing substring:
        int i;
        for (i = 0; i < max; ++i)
            if (buffer[i + start] == fillByte)
                break;
        return i;   // 0 <= i <= max
    }
}
Listing 4 - ProcessRecords.java: Processes a Random Access File of Employee Records
import java.io.*;

class ProcessRecords {
    public static void main(String[] args) {
        Employee e1 = new Employee("doe", "john", 1);
        Employee e2 = new Employee("dough", "jane", 2);
        RandomAccessFile f = null;

        try {
            // Create file; add two records:
            System.out.println("Populating file...");
            f = new RandomAccessFile("employees.dat", "rw");
            e1.write(f);
            e2.write(f);
            System.out.println("e1 = " + e1);
            System.out.println("e2 = " + e2);
            System.out.println();

            // Swap on re-reading:
            System.out.println("Reading file...");
            f.seek(Employee.size);
            e1.read(f);
            f.seek(0);
            e2.read(f);
            System.out.println("e1 = " + e1);
            System.out.println("e2 = " + e2);
        }
        catch (IOException e) {
            e.printStackTrace();
            return;
        }
        finally {
            if (f != null) {
                try {
                    f.close();
                }
                catch (IOException e) {
                    System.out.println("File close error: " + e);
                }
            }
        }
    }
}

/* Output:
Populating file...
e1 = {doe,john,1}
e2 = {dough,jane,2}

Reading file...
e1 = {dough,jane,2}
e2 = {doe,john,1}
*/
Listing 5 - records.c: A C Version of Listings 3 and 4
#include <stdio.h>

#define LAST_MAX 15
#define FIRST_MAX 15

typedef struct {
    char last[LAST_MAX+1];
    char first[FIRST_MAX+1];
    int empno;
} Employee;

void toString(Employee* e, FILE* out) {
    fprintf(out, "{%s, %s, %d}", e->last, e->first, e->empno);
}

int main() {
    Employee e1 = {"doe", "john", 1};
    Employee e2 = {"dough", "jane", 2};
    FILE* f;

    /* Build 2 records: */
    toString(&e1, stdout);
    putchar('\n');
    toString(&e2, stdout);
    putchar('\n');

    /* Create file: */
    if ((f = fopen("employees.dat","w+b")) == NULL)
        return -1;
    if (fwrite(&e1,sizeof(Employee),1,f) != 1)
        return -1;
    if (fwrite(&e2,sizeof(Employee),1,f) != 1)
        return -1;

    /* Swap on re-reading: */
    fseek(f, sizeof(Employee), SEEK_SET);
    fread(&e1, sizeof(Employee), 1, f);
    rewind(f);
    fread(&e2, sizeof(Employee), 1, f);
    toString(&e1, stdout);
    putchar('\n');
    toString(&e2, stdout);
    putchar('\n');

    fclose(f);
    return 0;
}
Listing 6 - FileViewer.java: A Class for Scrolling through a Random Access File
import java.io.*;
import java.util.*;

public class FileViewer {
    private RandomAccessFile f;
    private Stack stk;
    private long topPos;
    ArrayList lines;
    private static final int SCREEN_SIZE = 24;

    public FileViewer(String fileName) throws IOException {
        f = new RandomAccessFile(fileName, "r");
        stk = new Stack();
        topPos = f.getFilePointer();
        lines = new ArrayList();
        readAndDisplay();
    }

    public void next() throws IOException {
        stk.push(new Long(topPos));
        topPos = f.getFilePointer();
        readAndDisplay();
    }

    public void previous() throws IOException {
        topPos = ((Long)stk.pop()).longValue();
        f.seek(topPos);
        readAndDisplay();
    }

    public void first() throws IOException {
        stk.clear();
        topPos = 0;
        f.seek(topPos);
        readAndDisplay();
    }

    public void last() throws IOException {
        do {
            stk.push(new Long(topPos));
            topPos = f.getFilePointer();
        } while (read());
        display();
    }

    public void close() throws IOException {
        f.close();
    }

    boolean read() throws IOException {
        String line = null;
        lines.clear();
        for (int i = 0;
             i < SCREEN_SIZE && (line = f.readLine()) != null;
             ++i) {
            lines.add(line);
        }
        return line != null;
    }

    void display() {
        for (int i = 0; i < lines.size() && i < SCREEN_SIZE; ++i)
            System.out.println((String) lines.get(i));
    }

    void readAndDisplay() throws IOException {
        read();
        display();
    }
}
Listing 7 - ViewFile.java: Uses FileViewer to View a File on the Console
import java.io.*;

class ViewFile {
    public static void main(String[] args) throws Exception {
        FileViewer fv = new FileViewer(args[0]);
        BufferedReader in = new BufferedReader(
            new InputStreamReader(System.in)
        );

        boolean stillViewing = true;
        while (stillViewing) {
            switch (getCommand(in)) {
                case 'n':
                case 'd':
                    fv.next();
                    break;
                case 'p':
                case 'u':
                    fv.previous();
                    break;
                case 'f':
                case 't':
                    fv.first();
                    break;
                case 'l':
                case 'b':
                    fv.last();
                    break;
                case 'q':
                case 'e':
                case 'c':
                case 'x':
                    stillViewing = false;
                    break;
                default:
                    System.out.println("=== Try again ===");
            }
        }
        fv.close();
    }

    static char getCommand(BufferedReader in) throws IOException {
        // Prompt for a user command:
        System.out.print("===> Command? ");
        System.out.flush();
        String line = in.readLine();
        if (line.length() == 0)
            return 'n';     // defaults to Next
        return Character.toLowerCase(line.charAt(0));
    }
}
Listing 8 - Stack.java: A Stack Class based on the LinkedList Collection
import java.util.*;

class Stack {
    private LinkedList data;

    public Stack() {
        data = new LinkedList();
    }

    public void push(Object o) {
        data.addFirst(o);
    }

    public Object pop() throws NoSuchElementException {
        return data.removeFirst();
    }

    public int size() {
        return data.size();
    }

    public void clear() {
        data.clear();
    }
}
Listing 9 - PropTest.java: Illustrates Common File-related System Properties
class PropTest {
    static void displayProperty(String name) {
        String prop = System.getProperty(name);
        System.out.println(name + ": " + "\"" + prop + "\"");
    }

    public static void main(String[] args) {
        displayProperty("file.separator");
        displayProperty("path.separator");
        displayProperty("user.name");
        displayProperty("user.home");
        displayProperty("user.dir");
        displayProperty("line.separator");

        // Display bytes of line.separator:
        String lineSep = System.getProperty("line.separator");
        byte[] bytes = lineSep.getBytes();
        for (int i = 0; i < bytes.length; ++i)
            System.out.print(bytes[i] + " ");
        System.out.println();
    }
}

/* Output:
file.separator: "\"
path.separator: ";"
user.name: "Administrator"
user.home: "C:\Documents and Settings\Administrator"
user.dir: "C:\CUJ"
line.separator: "
"
13 10
*/
Listing 10 - ListAllFiles.java: Lists a Subdirectory Recursively
import java.io.*;
import java.util.*;
import java.text.*;

class ListAllFiles {
    static int indentLevel = 0;

    public static void main(String[] args) throws IOException {
        if (args.length > 0)
            list(new File(args[0]));
        else
            list(new File(System.getProperty("user.dir")));
    }

    static void list(File dir) throws IOException {
        ++indentLevel;
        File[] files = dir.listFiles();
        for (int i = 0; i < files.length; ++i) {
            display(files[i].getName());
            if (files[i].isDirectory())
                list(files[i]);
        }
        --indentLevel;
    }

    static void display(String name) {
        for (int i = 0; i < indentLevel; ++i)
            System.out.print(" ");
        System.out.println(name);
    }
}

/* Output:
 Compare.java
 Employee.java
 employees.dat
 FileViewer.java
 FindFile.java
 ListAllFiles.class
 ListAllFiles.java
 ListFiles.java
 ListSomeFiles.java
 LogFile.java
 logfile1.txt
 LogFileTest.java
 ProcessRecords.java
 PropTest.java
 records.c
 Stack.java
 temp
  bar
  baz
  foo
 Test.java
 ViewFile.java
*/
Listing 11 - ListSomeFiles.java: Uses a FilenameFilter to List Only Certain Files
import java.io.*;
import java.util.*;
import java.text.*;

class ListSomeFiles {
    static int indentLevel = 0;
    static String suffix = null;

    public static void main(String[] args) throws IOException {
        suffix = args[0];
        list(new File(System.getProperty("user.dir")));
    }

    static void list(File dir) throws IOException {
        ++indentLevel;
        File[] files = dir.listFiles(new SuffixFilter());
        for (int i = 0; i < files.length; ++i) {
            display(files[i].getName());
            if (files[i].isDirectory())
                list(files[i]);
        }
        --indentLevel;
    }

    static void display(String name) {
        for (int i = 0; i < indentLevel; ++i)
            System.out.print(" ");
        System.out.println(name);
    }

    static class SuffixFilter implements FilenameFilter {
        public boolean accept(File dir, String name) {
            return name.endsWith(suffix);
        }
    }
}

/* Output from 'ListSomeFiles .java':
 Employee.java
 FileViewer.java
 FindFile.java
 ListAllFiles.java
 ListFiles.java
 ListSomeFiles.java
 ListZip.java
 LogFile.java
 LogFileTest.java
 ProcessRecords.java
 PropTest.java
 Stack.java
 Test.java
 ViewFile.java
*/
Listing 12 - ListFiles.java: Lists Directory Entries with Full Information
import java.io.*;
import java.util.*;
import java.text.*;

class ListFiles {
    public static void main(String[] args) throws IOException {
        listRoots();

        // Print current directory name:
        String curDir = System.getProperty("user.dir");
        File dir = new File(curDir);
        System.out.println(dir.getCanonicalPath() + ":");
        System.out.println("\trelative path: " + dir.getPath());
        System.out.println("\tabsolute path: " + dir.getAbsolutePath());
        System.out.println("\tas URL: " + dir.toURL());
        System.out.println("==========");

        // List files:
        File[] files = dir.listFiles();
        SimpleDateFormat dateFormat = new SimpleDateFormat(
            "MM-dd-yyyy kk:mm:ss"
        );
        DecimalFormat sizeFormat = new DecimalFormat("########");
        for (int i = 0; i < files.length; ++i) {
            String name = buildColumn(files[i].getName(), 20);
            System.out.print(name + " ");
            String size = buildColumn(
                sizeFormat.format(files[i].length()), 8
            );
            System.out.print(size + " ");
            Date when = new Date(files[i].lastModified());
            System.out.print(dateFormat.format(when) + " ");
            if (files[i].isDirectory())
                System.out.print("d");
            if (files[i].canRead())
                System.out.print("r");
            if (files[i].canWrite())
                System.out.print("w");
            if (files[i].isHidden())
                System.out.print("h");
            System.out.println();
        }
    }

    static void listRoots() {
        File[] roots = File.listRoots();
        System.out.println("Roots on system:");
        for (int i = 0; i < roots.length; ++i)
            System.out.println("\t" + roots[i].getAbsolutePath());
        System.out.println();
    }

    static String buildColumn(String s, int len) {
        // Force a string into a fixed-size column:
        if (s.length() >= len)
            return s.substring(0, len);
        else {
            StringBuffer buf = new StringBuffer(s);
            for (int i = s.length(); i < len; ++i)
                buf.append(' ');
            return buf.toString();
        }
    }
}

/* Output:
Roots on system:
    C:\
    D:\

C:\CUJ:
    relative path: C:\CUJ
    absolute path: C:\CUJ
    as URL: file:/C:/CUJ/
==========
Compare.java         414      11-22-2000 11:47:39 rw
Employee.java        2308     11-21-2000 15:51:27 rw
employees.dat        128      11-18-2000 17:59:28 rw
FileViewer.java      1830     11-18-2000 18:03:21 rw
FindFile.java        962      11-21-2000 15:50:08 rw
ListAllFiles.java    1310     11-22-2000 12:04:11 rw
ListFiles.class      2504     11-22-2000 12:10:51 rw
ListFiles.java       2631     11-21-2000 15:47:07 rw
ListSomeFiles.java   1400     11-22-2000 11:26:44 rw
ListZip.java         470      11-18-2000 23:48:01 rw
LogFile.java         578      11-21-2000 15:14:11 rw
logfile1.txt         44       11-21-2000 15:15:55 rw
LogFileTest.java     471      11-21-2000 15:37:27 rw
ProcessRecords.java  1341     11-21-2000 16:05:52 rw
PropTest.java        990      11-22-2000 11:07:15 rw
records.c            1059     11-21-2000 15:54:37 rw
Stack.java           463      11-08-2000 12:00:04 rw
temp                 0        11-22-2000 12:01:27 drw
Test.java            947      11-20-2000 18:36:01 rw
ViewFile.java        1538     11-21-2000 15:55:47 rw
*/
Listing 13 - FindFile.java: Searches a Subdirectory Tree for an Entry
import java.io.*;

class FindFile {
    public static void main(String[] args) {
        String dir = null;
        if (args.length < 2)
            dir = new String(".");
        else
            dir = args[1];

        try {
            search(new File(dir), args[0]);
        }
        catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }

    static void search(File dir, String name) throws IOException {
        File[] files = dir.listFiles();
        if (files == null)
            throw new IOException("not a valid directory");

        for (int i = 0; i < files.length; ++i) {
            if (files[i].getName().compareToIgnoreCase(name) == 0) {
                System.out.println(files[i].getCanonicalPath());
            }
            if (files[i].isDirectory())
                search(files[i], name);
        }
    }
}

/* Output of 'java FindFile foo':
C:\CUJ\temp\foo
*/
Listing 14 - ListZip.java: Lists the Contents of a ZIP File
import java.util.*;
import java.util.zip.*;

class ListZip {
    public static void main(String[] args) throws Exception {
        ZipFile zf = new ZipFile(args[0]);
        Enumeration files = zf.entries();
        while (files.hasMoreElements()) {
            ZipEntry z = (ZipEntry)files.nextElement();
            System.out.println(z.getName() + ","
                + z.getSize() + ","
                + z.getCompressedSize() + ","
                + new Date(z.getTime()));
        }
    }
}

/* Output from 'java ListZip cuj.zip':
ViewFile.java,1573,533,Sat Nov 18 18:03:42 MST 2000
Employee.java,2433,811,Sat Nov 18 17:57:38 MST 2000
FileViewer.java,1830,533,Sat Nov 18 18:03:22 MST 2000
ListFiles.java,1785,523,Thu Nov 09 17:29:50 MST 2000
ListZip.java,295,204,Sat Nov 18 23:42:56 MST 2000
ProcessRecords.java,1370,519,Sat Nov 18 17:59:20 MST 2000
PropTest.java,216,126,Wed Nov 08 23:26:46 MST 2000
records.c,974,398,Tue Nov 07 22:22:50 MST 2000
Stack.java,463,206,Wed Nov 08 12:00:04 MST 2000
*/
Notes
- C/C++ programmers: remember that a long in Java is much larger (64 bits!), so there is no practical need for a special type like fpos_t, as in C.
- Yes, I know it's an antiquated command-line style example, but it's fun, so bear with me.
- Much of File's functionality is a no-op on the Macintosh.
- Jar files are ZIP files that also contain manifest information. See the September 1999 issue of this column.