So I have foolishly wandered into the open source jungle in search of non-commercial Word/RTF to PDF converters. While I have not found the holy grail (or perfect solution) yet, it has been interesting learning about some of the different projects. I particularly liked the Flying Saucer project. (Though it essentially serves the same purpose as cfdocument, you have gotta love that name!)
For those who have already gone down this route, you probably will not read anything new here. But I thought I would write up some of my findings and experiments in case they help others on a similar quest.
Where to start
I rolled the dice and started with Google and sourceforge.net. After looking over about a gazillion projects, I came to the conclusion that there were two main possibilities:
- Using a single application (typically an executable) to do a direct conversion: rtf to pdf
- Using a combination of tools: converting rtf to html/xml -> then html/xml to pdf
RTFEditorKit/HTMLEditorKit (Poor man's converter)
RTFEditorKit and HTMLEditorKit (javax.swing.text.rtf.RTFEditorKit and javax.swing.text.html.HTMLEditorKit) are two core Java classes that can be used to read an rtf file and convert it to html. The html can then be turned into a pdf using cfdocument.
This is probably one of the simplest options. It can be used right out of the box. No additional jars or programs are required.
There are some bugs and limitations to these classes. For example, the RTFEditorKit does not handle images in rtf files. But in a pinch, it might work for a down and dirty converter.
<cfscript>
    inputFile  = ExpandPath("sample.rtf");
    outputFile = ExpandPath("./sample_converted.pdf");

    // create editor objects used for the conversion
    fis        = createObject("java", "java.io.FileInputStream").init( inputFile );
    rtfEditor  = createObject("java", "javax.swing.text.rtf.RTFEditorKit");
    htmlEditor = createObject("java", "javax.swing.text.html.HTMLEditorKit");

    // create a default document and load the rtf file
    document = rtfEditor.createDefaultDocument();
    rtfEditor.read(fis, document, 0);
    // release the file handle once the rtf has been read
    fis.close();

    // convert the document to html
    stringWriter = createObject("java", "java.io.StringWriter").init( document.getLength() );
    htmlEditor.write(stringWriter, document, 0, document.getLength());

    // get the html content as a string
    htmlOutput = stringWriter.getBuffer().toString();
</cfscript>
<cfoutput>
<cfdocument format="pdf" filename="#outputFile#" overwrite="true">
    #htmlOutput#
</cfdocument>
Finished converting file: #outputFile#
</cfoutput>
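For anyone who wants to try the same rtf to html step outside ColdFusion (or just see what the cfscript above does under the hood), here is a rough plain-Java sketch using the same two JDK classes. It is a minimal illustration under my own naming (RtfToHtml and convert are hypothetical), not a drop-in replacement: it only returns the html string, and the pdf step would still be cfdocument's job.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfToHtml {

    /** Reads RTF from the stream and returns the same content rendered as HTML. */
    public static String convert(InputStream rtf) {
        try {
            RTFEditorKit rtfKit = new RTFEditorKit();
            // the kit's default document is a DefaultStyledDocument
            Document doc = rtfKit.createDefaultDocument();
            rtfKit.read(rtf, doc, 0);

            // for a styled (non-HTML) document, HTMLEditorKit.write emits
            // simple HTML through javax.swing.text.html.MinimalHTMLWriter
            StringWriter out = new StringWriter();
            new HTMLEditorKit().write(out, doc, 0, doc.getLength());
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            throw new IllegalStateException("invalid document offset", e);
        }
    }

    /** Convenience overload for RTF already held in memory (RTF is 7-bit text). */
    public static String convert(String rtfText) {
        return convert(new ByteArrayInputStream(rtfText.getBytes(StandardCharsets.ISO_8859_1)));
    }

    public static void main(String[] args) {
        System.out.println(convert("{\\rtf1\\ansi Hello from RTF\\par}"));
    }
}
```

The sketch leaves closing the stream to the caller, so when reading a real file, open it with something like Files.newInputStream(path) inside a try-with-resources block. The same limitations mentioned above (no images, for example) apply here too.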
Majix
Majix is a Java library for converting rtf files to xhtml. The resulting xhtml can be tweaked and used to create a pdf with the help of cfdocument.
Since there were several rtf to html/xml converters, I decided to pick just one. Seth Duffey had already posted a great example of using Majix with ColdFusion, so I used it rather than reinventing the wheel. (Just keep in mind it is an older blog entry.)
Overall, it worked relatively well in my small tests and does not require installing an executable program. Its behavior can be customized somewhat via xsl.
The project has not been updated in a few years, and it was not always able to handle some of the more exotic rtf files I threw at it.
There was also one thing in the source that bothered me a bit: the application's use of System.exit(...). In my initial testing, I used some wrong parameters and ended up shutting down the JVM, and ColdFusion with it. Now in all fairness, the application was probably geared towards desktop usage, where exiting is less critical. Plus, your typical CF sandbox settings might not allow this behavior anyway. However, I would probably modify the source, or add a SecurityManager, just to be safe.
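Since the xhtml Majix produces is just XML, one way to tweak it before handing it to cfdocument is an ordinary XSLT pass. The sketch below uses the JDK's built-in JAXP transformer, not Majix's own xsl configuration (which is not shown here). The class name and the stylesheet, which drops empty paragraphs, are hypothetical examples of the kind of cleanup you might want.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XhtmlTweak {

    /**
     * Hypothetical example stylesheet: an identity transform that copies the
     * xhtml unchanged, except that it drops empty paragraphs (no text, no child
     * elements). local-name() makes the match work with or without the xhtml
     * namespace.
     */
    public static final String DROP_EMPTY_PARAGRAPHS =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"*[local-name()='p' and not(*) and not(normalize-space())]\"/>"
        + "</xsl:stylesheet>";

    /** Applies an XSLT 1.0 stylesheet to an xhtml string using the JDK's JAXP processor. */
    public static String transform(String xhtml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("xsl transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p>Keep me</p><p>   </p><p/></body></html>";
        System.out.println(transform(xhtml, DROP_EMPTY_PARAGRAPHS));
    }
}
```

The identity template keeps everything by default, so each extra rule only has to describe the one thing you want changed.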
POI and docx4j
I looked into these two options for possible doc to pdf conversion, but unfortunately I have not had much luck so far. I had problems with several documents right off the bat, though that may well have been due to my own ignorance of their APIs. So I plan to come back to these tools once I have studied them a bit more.
iText
Since iText is part of what powers cfdocument, and supports the creation of rtf files, I had a look around to see if it provided any options for converting rtf to pdf format. Though it is not quite complete, this feature is apparently in the works. So I am keeping this option in mind for the future.
On the plus side, it is a more stable, mature product that is already part of ColdFusion.
On the minus side, it is not ready yet ;)
Update November 19, 2009: Unfortunately, it looks like iText's RTF support will be abandoned and moved to an incubator project.
JODConverter (..all roads lead to Rome)
Searching the Java and ColdFusion forums led me back to the JODConverter on sourceforge.net, and eventually to a helpful example on Todd Sharp's blog. For basic file conversion, the JODConverter is a lot simpler to use than the OpenOffice API. Plus, the project site has excellent documentation on how to install and use the JODConverter.
Within minutes I was converting different file formats to pdf, including doc, xls and docx files. While I am sure it has some quirks too, overall the output quality was the best of all the options I tried.
It has a simple interface and supports the conversion of multiple formats, not just doc or rtf.
On the downside, it requires installing OpenOffice and running it as a service. Conversions from one format to another may also be a bit slow on less powerful machines.
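Because the conversion fails if OpenOffice is not listening, it can be handy to check for the service before calling the converter. This is a minimal sketch that simply tries to open a TCP connection. It assumes OpenOffice was started on 127.0.0.1 port 8100 (the default port JODConverter 2.x connects to), and the class name is hypothetical. A successful connection only proves that something is listening on that port, not that it is OpenOffice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class OfficeServiceCheck {

    /** Returns true if something accepts TCP connections on host:port within timeoutMs. */
    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // connection refused, timed out, or host unreachable
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the OpenOffice service listens on 127.0.0.1:8100
        boolean up = isListening("127.0.0.1", 8100, 2000);
        System.out.println(up
                ? "Something is listening on port 8100 (presumably OpenOffice)"
                : "Nothing listening on port 8100; start the OpenOffice service first");
    }
}
```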
Update: There is a newer version of the JODConverter available on Google Code.
Example (using the command line option)
Note: OpenOffice must be running as a service or this will not work.
<!--- grab the path to java.exe --->
<cfset system = createObject("java", "java.lang.System")>
<cfset pathToJavaExe = system.getProperty("java.home") & "/bin/java.exe">

<!--- create path to input and output files --->
<cfset inputFile = ExpandPath("./CFFAQ_Test.rtf")>
<cfset outputFile = ExpandPath("./CFFAQ_Test_Jar_converted.pdf")>

<!---
    Construct the path to jodconverter jars. Example: My jar files are stored beneath the web root
    c:\coldFusion8\wwwroot\jodconverter-2.2.2\lib\jodconverter-cli-2.2.2.jar
    c:\coldFusion8\wwwroot\jodconverter-2.2.2\lib\jodconverter-2.2.2.jar
    ... etcetera ...
--->
<cfset pathToJodJar = ExpandPath("/jodconverter-2.2.2/lib/jodconverter-cli-2.2.2.jar")>

<!---
    Construct the command to convert a single file ie:
    java -jar c:\pathTo\jodconverter-cli-2.2.2.jar c:\myInputFile.doc myOutputFile.pdf
    (every path is quoted in case it contains spaces)
--->
<cfset argString = '-jar "#pathToJodJar#" -v "#inputFile#" "#outputFile#"'>

<!--- delete any file left over from a previous run so the check below is meaningful --->
<cfif FileExists(outputFile)>
    <cffile action="delete" file="#outputFile#">
</cfif>

<cfexecute name="#pathToJavaExe#" arguments='#argString#' timeout="60" />

<cfif FileExists(outputFile)>
    Success. File Created <cfoutput>#outputFile#</cfoutput>
<cfelse>
    Error. Unable to create file
</cfif>
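The same command line can also be launched from plain Java. In this sketch the class and method names are my own and the paths in main are hypothetical. Each argument is passed as its own list entry, so paths with spaces need no extra quoting, and a timeout is enforced much like cfexecute's. As with cfexecute, the converter runs in a separate JVM, so a System.exit(...) inside the tool only ends the child process, not the server.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class JodCliRunner {

    /** Path to the java launcher of the running JVM (java.exe on Windows, java elsewhere). */
    public static String javaLauncher() {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        String bin = System.getProperty("java.home") + File.separator + "bin" + File.separator;
        return bin + (windows ? "java.exe" : "java");
    }

    /** Builds "java -jar cliJar -v input output" with one list entry per argument. */
    public static List<String> buildCommand(String cliJar, String input, String output) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaLauncher());
        cmd.add("-jar");
        cmd.add(cliJar);
        cmd.add("-v");
        cmd.add(input);
        cmd.add(output);
        return cmd;
    }

    /** Runs the command in a child process; returns its exit code, or -1 if it timed out. */
    public static int run(List<String> cmd, long timeoutSeconds) {
        try {
            Process p = new ProcessBuilder(cmd)
                    .redirectErrorStream(true)                        // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.INHERIT) // show it on our console
                    .start();
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                return -1;
            }
            return p.exitValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the converter", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical paths; replace with your own jar, input and output locations
        String jar = "c:\\jodconverter-2.2.2\\lib\\jodconverter-cli-2.2.2.jar";
        String input = "c:\\docs\\CFFAQ_Test.rtf";
        String output = "c:\\docs\\CFFAQ_Test_Jar_converted.pdf";

        int exit = run(buildCommand(jar, input, output), 60);
        boolean created = new File(output).exists();
        System.out.println("exit code " + exit + (created ? ", created " + output : ", no file created"));
    }
}
```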
That is all she wrote...
I will probably post more about POI and docx4j later. (Once I figure it out.) Hopefully my trials, tribulations and explorations were helpful to someone.
Update: Related Entries
ColdFusion: Experiment Converting MS Word to HTML/PDF (At Last) - Part 1