MS Word metadata with POI and ColdFusion
I have been playing around with POI's HWPF library (Horrible Word Processor Format) and found it provides an easy way to extract metadata from an MS Word file. Now CF and Java gurus were probably aware of this already ;) but it was a cool find for me.
I used the JavaLoader.cfc and POI 3.0.1 from poi.apache.org. The summary information object returns the key document properties, everything from the subject and comments to the number of words in the document.
Here is the sample code I used. If you already have the right version of POI on your classpath, you can simply replace the javaLoader.create(..) statement with a call to the createObject(..) function.
Time to see what else POI can do ;)
Code
<!--- NOTE I am storing my javaLoader in the server scope --->
<!--- read why here; http://www.compoundtheory.com/?action=displayPost&ID=212 --->
<cfset javaLoader = server[MyUniqueKeyForJavaLoader]>
<!--- open a word document with POI and get the summary information --->
<cfset inputFilePath = ExpandPath('fromMSWord.doc')>
<cfset inputStream = createObject("java", "java.io.FileInputStream").init( inputFilePath )>
<!--- load HWPFDocument through the javaLoader so the POI 3.0.1 jars are used --->
<cfset document = javaLoader.create("org.apache.poi.hwpf.HWPFDocument").init( inputStream )>
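<cfset summary = document.getSummaryInformation()>
<!--- the document has been read into memory, so release the file handle --->
<cfset inputStream.close()>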
<b>HWPF Summary Information:</b><br>
<cfoutput>
getSubject = #summary.getSubject()# <br>
getTemplate = #summary.getTemplate()# <br>
getAuthor = #summary.getAuthor()# <br>
getTitle = #summary.getTitle()# <br>
getSecurity = #summary.getSecurity()# <br>
getApplicationName = #summary.getApplicationName()# <br>
getKeywords = #summary.getKeywords()# <br>
getComments = #summary.getComments()# <br>
getLastAuthor = #summary.getLastAuthor()# <br>
getRevNumber = #summary.getRevNumber()# <br>
getEditTime = #summary.getEditTime()# <br>
getLastPrinted = #summary.getLastPrinted()# <br>
getCreateDateTime = #summary.getCreateDateTime()# <br>
getLastSaveDateTime = #summary.getLastSaveDateTime()# <br>
getPageCount = #summary.getPageCount()# <br>
getWordCount = #summary.getWordCount()# <br>
getCharCount = #summary.getCharCount()# <br>
</cfoutput>
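The code above assumes a JavaLoader instance is already sitting in the server scope. Here is a rough sketch of how that might be set up, for anyone starting from scratch. Note the assumptions: the key value, the lib/ folder, and the exact jar file names are placeholders I picked for illustration (HWPF ships in the POI scratchpad jar, so both jars are listed), and the component path "javaloader.JavaLoader" depends on where you installed JavaLoader.

```cfml
<!--- ASSUMPTION: hypothetical key; any string unique to your application will do --->
<cfset MyUniqueKeyForJavaLoader = "javaLoader_poi_example">

<!--- create the loader only once; double-checked inside a named lock --->
<cfif NOT structKeyExists(server, MyUniqueKeyForJavaLoader)>
    <cflock name="#MyUniqueKeyForJavaLoader#" type="exclusive" timeout="30">
        <cfif NOT structKeyExists(server, MyUniqueKeyForJavaLoader)>
            <!--- ASSUMPTION: jar names and the lib/ folder are placeholders; use your own paths --->
            <cfset paths = arrayNew(1)>
            <cfset arrayAppend(paths, expandPath("lib/poi-3.0.1-FINAL-20070705.jar"))>
            <cfset arrayAppend(paths, expandPath("lib/poi-scratchpad-3.0.1-FINAL-20070705.jar"))>
            <!--- ASSUMPTION: JavaLoader is installed under a /javaloader mapping --->
            <cfset server[MyUniqueKeyForJavaLoader] = createObject("component", "javaloader.JavaLoader").init(paths)>
        </cfif>
    </cflock>
</cfif>
```

Run this once (for example at application start) and the javaLoader lookup at the top of the sample will find the instance it expects.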