Monday, July 20, 2009

Experiment with building your own CFDirectory List in Java - Part 2

Commons Way
When I first explored this idea, I considered using the org.apache.commons.io package. It has a lot of interesting classes, like DirectoryWalker, a nifty class that handles all the tedious work of traversing directories so you can focus on what you want to do with the results, such as storing the matching file names in a List.

The filefilter package also has a nice set of time-saving classes that handle many common filtering needs. For example, AgeFileFilter allows you to search files by last modified date, and the aptly named SizeFileFilter allows searching by file size.

You can also chain multiple filters together. The FileFilterUtils class provides an easy way to combine filters: the andFileFilter and orFileFilter methods each take two filters and return a new filter that matches files accepted by both input filters, or by either one. If you need to combine more than two filters, you can use the AndFileFilter and OrFileFilter classes directly. Both classes have a constructor and a setFileFilters method that accept a List of filters.


FileFilterUtils Example
<cfscript>
Utils = createObject("java", "org.apache.commons.io.filefilter.FileFilterUtils");
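// Find files modified after July 1st
// (acceptOlder = false; with a single argument, ageFileFilter matches files modified on or BEFORE the cutoff)
earliestDate = parseDateTime("2009-07-01");
laterThan = Utils.ageFileFilter( earliestDate.getTime(), javacast("boolean", false) );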
// Find files that are at least 10,000 bytes in size
minimumSize = Utils.sizeFileFilter(10000);

// Create a filter that finds files matching BOTH conditions
findBoth = Utils.andFileFilter(laterThan, minimumSize);
// Create a filter that finds files matching EITHER condition
findEither = Utils.orFileFilter(laterThan, minimumSize);
//...
</cfscript>
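To show what the AND/OR chaining boils down to, here is a minimal plain Java sketch that uses only java.io.FileFilter rather than Commons IO. The class name FilterSketch and its methods (modifiedAfter, atLeast, and, or) are hypothetical names made up for illustration. They are not part of FileFilterUtils, and this is not how the library implements its filters.

```java
import java.io.File;
import java.io.FileFilter;
import java.time.LocalDate;
import java.time.ZoneId;

/** Stdlib-only sketch of age/size filters and AND/OR composition (hypothetical names). */
final class FilterSketch {

    /** Accepts files last modified strictly after the cutoff (milliseconds since the epoch). */
    static FileFilter modifiedAfter(long cutoffMillis) {
        return file -> file.lastModified() > cutoffMillis;
    }

    /** Accepts files whose length is at least minBytes. */
    static FileFilter atLeast(long minBytes) {
        return file -> file.length() >= minBytes;
    }

    /** Accepts a file only when BOTH filters accept it. */
    static FileFilter and(FileFilter a, FileFilter b) {
        return file -> a.accept(file) && b.accept(file);
    }

    /** Accepts a file when EITHER filter accepts it. */
    static FileFilter or(FileFilter a, FileFilter b) {
        return file -> a.accept(file) || b.accept(file);
    }

    public static void main(String[] args) {
        // Midnight on July 1st, 2009 in the system time zone
        long cutoff = LocalDate.of(2009, 7, 1)
                .atStartOfDay(ZoneId.systemDefault())
                .toInstant()
                .toEpochMilli();

        FileFilter findBoth = and(modifiedAfter(cutoff), atLeast(10_000));
        FileFilter findEither = or(modifiedAfter(cutoff), atLeast(10_000));

        // listFiles returns null when the path is not a readable directory
        File[] both = new File(".").listFiles(findBoth);
        File[] either = new File(".").listFiles(findEither);
        System.out.println("both: " + (both == null ? 0 : both.length));
        System.out.println("either: " + (either == null ? 0 : either.length));
    }
}
```

Because each combinator returns another FileFilter, the results can be nested to combine more than two conditions.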


My Way
Ultimately, I decided to roll my own filter, borrowing a bit of the logic from the DirectoryWalker.walk() method. What I ended up with is pretty simple. The bulk of the class is just your basic getter/setter methods. The only interesting sections are the walk() method, which traverses the directory tree, and the accept() method, which filters the results to match whatever criteria you supply. You can filter on last modified date, file size and file type, and perform a simple name search or use a regular expression. The class also allows you to limit the search to a specific number of subdirectory levels.

protected void walk(File file, int depth, SearchResult results) {
    int childDepth = depth + 1;
    File[] contents = file.listFiles();
    // if this directory level should be processed ..
    if (contents != null && (maxDepth < ...)) {
        for (int i = 0; ...) {
            File child = contents[i];
            // ...
        }
    }
}

public boolean accept(File file) {
    boolean isAllowed = true;
    long size = file.length();
    // ...
    if (...) {
        isAllowed = false;
    }
    // ... (two more "else if" checks, each setting isAllowed = false)
    else if (... > maxSize) {
        isAllowed = false;
    }
    // File date is EARLIER than the minimum date desired
    else if (startDate != null && file.lastModified() < startDate.getTime()) {
        isAllowed = false;
    }
    else if (endBeforeDate != null && file.lastModified() > endBeforeDate.getTime()) {
        isAllowed = false;
    }
    // The name does not match our filter
    else if (simpleFilter != null && !simpleFilter.accept(file)) {
        isAllowed = false;
    }
    else if (regexFilter != null && !regexFilter.accept(file)) {
        isAllowed = false;
    }

    return isAllowed;
}
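The traverse-then-filter idea can be sketched in a self-contained form as follows. This is not the downloadable class: WalkSketch and its members are hypothetical names, it collects plain File objects in a List instead of a SearchResult, and it assumes that a negative maxDepth means "no limit" (as with DirectoryWalker's depth limit).

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

/** Sketch of a depth-limited recursive directory walk (hypothetical class). */
final class WalkSketch {
    private final int maxDepth;       // assumption: a negative value means "no limit"
    private final FileFilter filter;  // decides which entries are kept

    WalkSketch(int maxDepth, FileFilter filter) {
        this.maxDepth = maxDepth;
        this.filter = filter;
    }

    /** Returns every entry below root that the filter accepts. */
    List<File> search(File root) {
        List<File> results = new ArrayList<>();
        walk(root, 0, results);
        return results;
    }

    private void walk(File dir, int depth, List<File> results) {
        int childDepth = depth + 1;
        // listFiles returns null if dir is unreadable or is not a directory
        File[] contents = dir.listFiles();
        if (contents == null || (maxDepth >= 0 && childDepth > maxDepth)) {
            return; // nothing to read, or this level is deeper than allowed
        }
        for (File child : contents) {
            if (filter.accept(child)) {
                results.add(child);
            }
            if (child.isDirectory()) {
                walk(child, childDepth, results); // descend one level
            }
        }
    }

    public static void main(String[] args) {
        // List directories up to two levels below the current directory
        List<File> dirs = new WalkSketch(2, File::isDirectory).search(new File("."));
        for (File d : dirs) {
            System.out.println(d.getPath());
        }
    }
}
```

With this layout the filter is applied to directories as well as files, so a type filter such as File::isDirectory works without special handling in the walk itself.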

I wrapped the whole thing in a function and called the class with createObject. Since there is no easy way to create a CF query object from Java, I used the QueryAddColumn hack to convert the results to a query for easy usage.
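The QueryAddColumn approach works best when the Java side hands back one list per column. Here is a rough sketch of how such a column-oriented result holder might look. The getter names match the ones called from the CFML function in this post, but everything else (the class name ColumnResultSketch, the add method, the field types and the "Dir"/"File" labels) is an assumption for illustration, not the actual SearchResult class.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/**
 * Hypothetical column-oriented result holder: one list per query column.
 * Package-private only to keep the sketch in one file; a class loaded
 * with createObject would normally be public.
 */
final class ColumnResultSketch {
    private final List<String> names = new ArrayList<>();
    private final List<String> directories = new ArrayList<>();
    private final List<Long> sizes = new ArrayList<>();
    private final List<String> types = new ArrayList<>();
    private final List<Date> modified = new ArrayList<>();
    private final List<String> paths = new ArrayList<>();

    /** Appends one row by adding a value to every column list. */
    public void add(File file) {
        names.add(file.getName());
        directories.add(file.getParent()); // null when the path has no parent
        sizes.add(file.length());
        types.add(file.isDirectory() ? "Dir" : "File");
        modified.add(new Date(file.lastModified()));
        paths.add(file.getAbsolutePath());
    }

    public List<String> getNameList() { return names; }
    public List<String> getDirectoryList() { return directories; }
    public List<Long> getSizeList() { return sizes; }
    public List<String> getTypeList() { return types; }
    public List<Date> getModifiedList() { return modified; }
    public List<String> getPathList() { return paths; }
}
```

Since every call to add appends to all six lists, the columns always have the same length, which is what queryAddColumn needs to build aligned rows.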

If you are interested in viewing the code, you can download it from the box.net widget in the right menu. (Since it is loosely based on the DirectoryWalker class, I threw the Apache license in there.) The Java code is pretty well documented, but questions or suggestions on how to improve it are always welcome.


<!--- USAGE --->
<cfset results = DirectoryList(
directory = "c:\myFiles\",
maxDepth = -1,
type="Dir",
startDate = "01/01/2009",
endBeforeDate = "07/20/2009"
) />

<!--- FUNCTION --->
<cffunction name="DirectoryList" returntype="query" access="public" output="false">
<cfargument name="directory" type="string" required="true" />
<cfargument name="maxDepth" type="numeric" required="false" hint="Maximum directory level to search. Default = 1" />
<cfargument name="type" type="string" required="false" hint="Include only this type of object: File, Dir or All" />
<cfargument name="startDate" type="date" required="false" hint="Include only files modified on or after this date" />
<cfargument name="endBeforeDate" type="date" required="false" hint="Include only files modified on or before this date" />
<cfargument name="minSize" type="numeric" required="false" hint="Minimum file size (in bytes)" />
<cfargument name="maxSize" type="numeric" required="false" hint="Maximum file size (in bytes). Note, directories have a size of zero" />
<cfargument name="regex" type="string" required="false" hint="Regular expression search pattern" />
<cfargument name="filters" type="string" required="false" hint="List of one or more file name filters" />
<cfargument name="delim" type="string" default="," hint="Delimiter for file name filters. Default is a comma ','" />
<cfargument name="makeExclusive" type="boolean" default="false" hint="If true, return files that do NOT match the filters" />
<cfargument name="filterCase" type="string" default="system" hint="Case sensitivity used for file filters" />

<cfset var dir = "" />
<cfset var patterns = "" />
<cfset var result = "" />
<cfset var qry = "" />

<cfif NOT directoryExists( arguments.directory )>
<cfthrow type="InvalidArgument" message="The specified directory cannot be found">
</cfif>

<cfscript>
dir = createObject("java", "org.cfsearching.DirectoryList");

// maximum directory level to search
if (structKeyExists(arguments, "maxDepth")) {
dir.setMaxDepth( arguments.maxDepth );
}

// find only objects of this type
if (structKeyExists(arguments, "type")) {
dir.setFileType( arguments.type );
}

// find only files modified on or after this date
if (structKeyExists(arguments, "startDate")) {
dir.setStartDate( arguments.startDate );
}

// find only files modified on or before this date
if (structKeyExists(arguments, "endBeforeDate")) {
dir.setEndBeforeDate( arguments.endBeforeDate );
}

// find only files at least this size
if (structKeyExists(arguments, "minSize")) {
dir.setMinSize( arguments.minSize );
}

// find only files no larger than this size
if (structKeyExists(arguments, "maxSize")) {
dir.setMaxSize( arguments.maxSize );
}

if (structKeyExists(arguments, "filters")) {
patterns = listToArray(arguments.filters, arguments.delim);
dir.setSimpleNameFilter( patterns, arguments.makeExclusive, arguments.filterCase );
}

if (structKeyExists(arguments, "regex")) {
dir.setRegexNameFilter( arguments.regex );
}

// do the search
result = dir.search( arguments.directory );

// convert results into a query object
qry = queryNew("");
queryAddColumn( qry, "NAME", result.getNameList() );
queryAddColumn( qry, "DIRECTORY", result.getDirectoryList() );
queryAddColumn( qry, "SIZE", result.getSizeList() );
queryAddColumn( qry, "TYPE", result.getTypeList() );
queryAddColumn( qry, "DATELASTMODIFIED", result.getModifiedList() );
queryAddColumn( qry, "FULLPATH", result.getPathList() );
</cfscript>

<cfreturn qry />
</cffunction>

