Friday, April 24, 2009

ColdFusion: In search of Word/RTF to PDF Converters

So I have foolishly wandered into the open source jungle, in search of non-commercial Word/RTF to PDF converters. While I have not found the holy grail (or perfect solution) yet, it has been interesting learning about some of the different projects. I particularly liked the Flying Saucer project. (Though it essentially serves the same purpose as cfdocument, you have gotta love that name!)

For those that have already gone down this route, you probably will not read anything new here. But I thought I would write up some of my findings and experiments in the event it is helpful to others on a similar quest.

Where to start
I rolled the dice and started with Google and SourceForge.net. After looking over about a gazillion projects, I came to the conclusion there were two main possibilities:

  1. Using a single application (typically an executable) to do a direct conversion: rtf to pdf
  2. Using a combination of tools: converting rtf to html/xml, then html/xml to pdf
I decided to try a few approaches and compare the results, along with the pros and cons of each. The tools I tested were a combination of old and new. I suspected some would work better than others, but decided to try them all for comparison purposes. Note: the advantages and disadvantages are just my own opinions based on my tests.


RTFEditorKit/HTMLEditorKit (Poor man's converter)
RTFEditorKit and HTMLEditorKit are two core Java (Swing) classes that can be used to read an rtf file and convert it to html. The html can then be converted to pdf using cfdocument.
Advantages:
This is probably one of the simplest options. It can be used right out of the box. No additional jars or programs are required.

Disadvantages:
There are some bugs and limitations to these classes. For example, the RTFEditorKit does not handle images in rtf files. But in a pinch, it might work for a down and dirty converter.
Example:
<cfscript>
inputFile = ExpandPath("sample.rtf");
outputFile = ExpandPath("./sample_converted.pdf");

// create editor objects used for the conversion
fis = createObject("java", "java.io.FileInputStream").init( inputFile );
rtfEditor = createObject("java", "javax.swing.text.rtf.RTFEditorKit");
htmlEditor = createObject("java", "javax.swing.text.html.HTMLEditorKit");

// create a default document and load the rtf file
document = rtfEditor.createDefaultDocument();
rtfEditor.read(fis, document, 0);
// release the file handle once the rtf has been read
fis.close();

// convert the document to html
stringWriter = createObject("java", "java.io.StringWriter").init( document.getLength() );
htmlEditor.write(stringWriter, document, 0, document.getLength());

// get the html content as a string
htmlOutput = stringWriter.getBuffer().toString();
</cfscript>

<cfoutput>
<cfdocument format="pdf" filename="#outputFile#" overwrite="true">
#htmlOutput#
</cfdocument>
Finished converting file: #outputFile#
</cfoutput>
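For comparison, here is the same RTF to HTML step written directly in Java, using the same two JDK classes the ColdFusion code above reaches through createObject. It is a minimal sketch: the class and method names (RtfToHtml, convert) are my own, it only produces the HTML string (turning that into a pdf is still cfdocument's job), and, like RTFEditorKit itself, it drops any images.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper: RTF -> HTML using only the JDK's Swing text kits (images are dropped).
final class RtfToHtml {

    private RtfToHtml() {}

    // Parses RTF from the stream and returns the document re-serialized as HTML.
    static String convert(InputStream rtfIn) throws IOException, BadLocationException {
        RTFEditorKit rtfKit = new RTFEditorKit();
        Document doc = rtfKit.createDefaultDocument(); // a DefaultStyledDocument
        rtfKit.read(rtfIn, doc, 0);

        // For a styled (non-HTML) document, HTMLEditorKit falls back to a minimal HTML writer
        StringWriter out = new StringWriter();
        new HTMLEditorKit().write(out, doc, 0, doc.getLength());
        return out.toString();
    }

    // Convenience overload for RTF held in a string; RTF is plain 7-bit text, so Latin-1 is safe.
    static String convert(String rtf) throws IOException, BadLocationException {
        try (InputStream in = new ByteArrayInputStream(rtf.getBytes(StandardCharsets.ISO_8859_1))) {
            return convert(in);
        }
    }
}
```

Reading a file works the same way: pass a FileInputStream to convert(InputStream). Expect fairly plain markup, since the minimal writer only carries over basic character and paragraph styling.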


Majix
Majix is a java library for converting rtf files to xhtml. The resulting xhtml can be tweaked and used to create a pdf with the help of cfdocument.

As there were several rtf to html/xml converters, I decided to pick only one. Since Seth Duffey had already posted a great example on using Majix with ColdFusion, I used it rather than reinventing the wheel. (Just keep in mind it is an older blog entry.)

Advantages:
Overall, it worked relatively well in my small tests, and does not require installing an executable program. Behavior can be customized somewhat via xsl.

Disadvantages:
The project has not been updated in a few years, and it was not always able to handle some of the more exotic rtf files I threw at it.

There was also one thing in the source that bothered me a bit: the application's use of System.exit(...). In my initial testing, I used some wrong parameters and ended up shutting down the jvm, and ColdFusion with it. Now in all fairness, the application was probably geared towards desktop usage, where exiting is less critical. Plus, your typical CF sandbox settings might not allow this behavior anyway. However, I would probably modify the source, or add a SecurityManager, just to be safe.

POI/docx4j
I looked into these two options for possible doc to pdf conversion, but unfortunately I have not had much luck so far. I had problems with several documents right off the bat, though that may well have been due to my own ignorance of the APIs. So I decided to come back to these tools once I have studied them a bit more.
iText
Since iText is part of what powers cfdocument, and supports the creation of rtf files, I had a look around to see if it provided any options for converting rtf to pdf format. Though it is not quite complete, this feature is apparently in the works. So I am keeping this option in mind for the future.

Advantages:
More stable/mature product already part of ColdFusion.

Disadvantages:
It is not ready yet ;)

Update November 19, 2009: Unfortunately, it looks like RTF will be abandoned and moved to an incubator project.
http://www.mail-archive.com/itext-questions%40lists.sourceforge.net/msg47892.html

JODConverter (...all roads lead to Rome)
Searching the Java and ColdFusion forums led me back to the JODConverter on SourceForge.net, and eventually to a helpful example on Todd Sharp's blog. For basic file conversion, the JODConverter is a lot simpler to use than the OpenOffice API. Plus, the project site has excellent documentation on how to install and use the JODConverter.

Within minutes I was converting different file formats to pdf, including doc, xls, and docx files. While I am sure it has some quirks too, overall, the output quality was the best of all the options I tried.

Advantages:
Simple interface and supports the conversion of multiple formats, not just doc or rtf.

Disadvantages:
It requires installing OpenOffice and running the program as a service. Conversions from one format to another may be a bit slow on less powerful machines.

Update: There is a newer version of the JODConverter available on Google Code


Example - (Using command line option)
Note: OpenOffice must be running as a service or this will not work

<!--- grab the path to java.exe (Windows; on other platforms the binary is just "java") --->
<cfset system = createObject("java", "java.lang.System")>
<cfset pathToJavaExe = system.getProperty("java.home") &"/bin/java.exe">

<!--- create path to input and output files --->
<cfset inputFile  = ExpandPath("./CFFAQ_Test.rtf")>
<cfset outputFile = ExpandPath("./CFFAQ_Test_Jar_converted.pdf")>

<!---
Construct the path to jodconverter jars.

Example:  My jar files are stored beneath the web root
c:\coldFusion8\wwwroot\jodconverter-2.2.2\lib\jodconverter-cli-2.2.2.jar
c:\coldFusion8\wwwroot\jodconverter-2.2.2\lib\jodconverter-2.2.2.jar
... etcetera ...
--->
<cfset pathToJodJar  = ExpandPath("/jodconverter-2.2.2/lib/jodconverter-cli-2.2.2.jar")>

<!---
Construct the command to convert a single file.
Every path is wrapped in double quotes so paths containing spaces still work.

ie: java -jar c:\pathTo\jodconverter-cli-2.2.2.jar -v c:\myInputFile.doc c:\myOutputFile.pdf
--->
<cfset argString = '-jar "#pathToJodJar#" -v "#inputFile#" "#outputFile#"'>

<cfexecute name="#pathToJavaExe#"
arguments='#argString#'
timeout="60" />

<cfif FileExists(outputFile)>
Success. File Created <cfoutput>#outputFile#</cfoutput>
<cfelse>
Error. Unable to create file
</cfif>


That is all she wrote ..

I will probably post more about POI and docx4j later. (Once I figure it out.) Hopefully my trials, tribulations and explorations were helpful to someone.

Update: I finally had to give up on using docx4j with Adobe ColdFusion. It is nothing against docx4j, but there were just too many "jar hell" type conflicts with CF's own internal jars.

Cheers




Embedding PDF form in CFDOCUMENT - java.lang.NullPointerException

I came across an interesting post on the Adobe forums about embedding a pdf form in cfdocument. Never having had that exact need before, I was a little surprised to discover it was possible.

But when I tested it, I did encounter one issue. As mentioned in the documentation, you must use at least one cfdocumentsection tag alongside the cfpdfform tag. But apparently it also needs to be placed before the cfpdfform tag. When I placed it after the cfpdfform tag and tried to write the output to a file, an error occurred:

An exception occurred when performing document processing. The cause of this exception was that:

java.lang.NullPointerException at com.adobe.internal.pdftoolkit.services.manipulations.impl.PMMPages.propagateAnnots(Unknown Source)


Save pdf to a file:

<!---
This FAILS
--->
<cfdocument format="pdf" filename="#pathToOutputFile#" overwrite="true">
<cfpdfform action="populate" source="#pathToForm#">
<cfpdfformparam name="firstName" value="Cookie">
<cfpdfformparam name="lastName" value="Monster">
</cfpdfform>
<cfdocumentsection>
<p>Some content</p>
</cfdocumentsection>
</cfdocument>

<!---
This works
--->
<cfdocument format="pdf" filename="#pathToOutputFile#" overwrite="true">
<cfdocumentsection>
<p>Some content</p>
</cfdocumentsection>
<cfpdfform action="populate" source="#pathToForm#">
<cfpdfformparam name="firstName" value="Cookie">
<cfpdfformparam name="lastName" value="Monster">
</cfpdfform>
</cfdocument>

When I attempted to display the pdf in the browser, no output appeared in Firefox, though the CF logs contained a java.lang.NullPointerException error.

Display in browser:

<!---
This FAILS
--->
<cfdocument format="pdf">
<cfpdfform action="populate" source="#pathToForm#">
<cfpdfformparam name="firstName" value="Cookie">
<cfpdfformparam name="lastName" value="Monster">
</cfpdfform>
<cfdocumentsection>
<p>Some content</p>
</cfdocumentsection>
</cfdocument>

<!---
This works
--->
<cfdocument format="pdf">
<cfdocumentsection>
<p>Some content</p>
</cfdocumentsection>
<cfpdfform action="populate" source="#pathToForm#">
<cfpdfformparam name="firstName" value="Cookie">
<cfpdfformparam name="lastName" value="Monster">
</cfpdfform>
</cfdocument>

The one exception was when I saved the output to a variable. That did work without error.

Save output to variable:

<!---
This works
--->
<cfdocument format="pdf" name="pdfOutput">
<cfpdfform action="populate" source="#pathToForm#">
<cfpdfformparam name="firstName" value="Cookie">
<cfpdfformparam name="lastName" value="Monster">
</cfpdfform>
<cfdocumentsection>
<p>Some content</p>
</cfdocumentsection>
</cfdocument>

So I guess the options are either move the cfdocumentsection before the cfpdfform tag, or write the output to a variable.


iText - Preview of things to come .. someday (RTF to PDF)

Update November 19, 2009: Unfortunately, it looks like RTF will be abandoned and moved to an incubator project.
http://www.mail-archive.com/itext-questions%40lists.sourceforge.net/msg47892.html

So I have been searching around for non-commercial tools for converting rtf/word documents to pdf. (Yes, I know some of you are chuckling as you read this). The search has been interesting, and though I have not found the magic bullet yet, I did learn about some neat tools along the way, which I may write about later.

While commercial tools are probably still the best option at this point, I did come across some promising updates on the iText site. They mentioned expanding the rtf functionality to include partial support for:
  • reading rtf files
  • converting rtf to pdf format.

Of course it is still under development right now. But I am looking forward to the first official release. I played around with version 2.1.5 a bit, and surprisingly I was actually able to convert an rtf file to pdf. Now since the feature is not finished, I was pleased to get any output at all. Mind you some of the rtfs I tested worked a bit better than others. A few were a bit garbled, but I am impressed. It is definitely coming along nicely. I do not know where the developers find the time ..

Now, examples for the new RTF jars are understandably a little sparse at this point. (The code has undergone some drastic changes.) But for the curious, here is the code I came up with. Keep in mind I am still learning the api, so do not take this as a model for correct code usage ;)

Java Example

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.rtf.parser.RtfParser;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ConvertRTFToPDF {

    public static void main(String[] args) {
        String inputFile = "sample.rtf";
        String outputFile = "sample_converted.pdf";

        // create a new document
        Document document = new Document();
        InputStream in = null;

        try {
            // attach a PDF writer so the new document is saved to disk
            PdfWriter.getInstance(document, new FileOutputStream(outputFile));
            // open the document for modifications
            document.open();

            // create a new parser to load the RTF file
            RtfParser parser = new RtfParser(null);
            // read the rtf file into the pdf document
            in = new FileInputStream(inputFile);
            parser.convertRtfDocument(in, document);

            // save the pdf to disk
            document.close();

            System.out.println("Finished");

        } catch (DocumentException e) {
            e.printStackTrace();
        } catch (IOException e) {
            // also covers FileNotFoundException
            e.printStackTrace();
        } finally {
            // always release the rtf file handle
            if (in != null) {
                try {
                    in.close();
                } catch (IOException ignored) {
                    // nothing useful to do here
                }
            }
        }
    }
}

ColdFusion
Requirements: iText 2.1.5 and JavaLoader

Instructions for installing a newer version of iText without breaking ColdFusion


<h1>Convert RTF to PDF Example</h1>
As of 04/24/2009, the feature is <b>not</b> fully implemented
in iText. (In other words, do not expect it to work perfectly at this
time)<br><hr>

<cfscript>
// get a reference to the javaLoader
javaLoader = server[application.myJavaLoaderKey];

// initialize file paths
pathToInputFile = ExpandPath("./sample.rtf");
pathToOutputFile = ExpandPath("./sample.pdf");

// create a new document
document = javaLoader.create("com.lowagie.text.Document").init();

try {
	// create a PDF writer to save the new document to disk
	outStream = createObject("java", "java.io.FileOutputStream").init( pathToOutputFile );
	PdfWriter = javaLoader.create("com.lowagie.text.pdf.PdfWriter");
	writer = PdfWriter.getInstance( document, outStream );

	// open the document for modifications first
	document.open();

	// create a new parser to load the RTF file
	parser = javaLoader.create("com.lowagie.text.rtf.parser.RtfParser").init( javacast("null", "") );

	// read the rtf file into the document
	inStream = createObject("java", "java.io.FileInputStream").init( pathToInputFile );
	parser.convertRtfDocument( inStream, document );

	// save the converted document (ie pdf) to disk
	document.close();

	WriteOutput("Finished! File saved: "& pathToOutputFile);
}
catch (any e) {
	// keep the whole exception object (len() cannot be used on it)
	savedError = e;
}

// always close the streams
if ( isDefined("inStream") )
{
	inStream.close();
}
if ( isDefined("outStream") )
{
	outStream.close();
}
</cfscript>

<!--- show any errors --->
<cfif isDefined("savedError")>
Error - unable to create document
<cfdump var="#savedError#">
</cfif>


Tuesday, April 21, 2009

SOT: Watch those java jars (follow the bouncing server)

Well this is a new one for me. Earlier I was testing a java jar I had read about on an older blog entry. But when I ran the test code the sample output I was expecting never appeared. Just a lot of whitespace and the standard debug output. At first I thought maybe I had used the wrong parameters or that the jar had changed. But there was no error anywhere. Not even in the logs. At least as far as I could tell.

Finally, I checked the windows services panel and noticed the ColdFusion 8 service said "Starting ...". After it started, I ran the script again and re-checked the services panel. Sure enough the status of the ColdFusion service was back to "Starting..". What the heck ...?

After scratching my head for a few minutes, I finally checked the jar source. (Thank goodness for open source!). That is when I noticed System.exit(..) was called in several places, for some reason. Perhaps because it has a gui component as well. So instead of throwing an exception, the code shut down the jvm ... and of course the ColdFusion Server! Shutting down might be okay for desktop usage, but it is usually an undesirable behavior on a server.

Today's note to self: Always check the java source of new jars. Especially those with gui components. Bad System.exit(..)!
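One way to keep this from happening in your own jars is to push System.exit to the very edge of the program. Below is a hypothetical converter (ConverterCli and its methods are made-up names, not taken from the jar I was testing) where the library code throws an exception, a run() method turns that into an exit code, and only main(), the desktop or command-line entry point, actually exits. Code running inside ColdFusion would call convert() or run() and never touch main().

```java
import java.io.PrintStream;

// Hypothetical converter CLI, structured so that only main() may terminate the JVM.
final class ConverterCli {

    private ConverterCli() {}

    // Thrown for bad arguments instead of calling System.exit deep inside the library.
    static final class UsageException extends RuntimeException {
        UsageException(String message) {
            super(message);
        }
    }

    // Library entry point: validates its arguments and throws rather than exiting.
    static void convert(String inputPath, String outputPath) {
        if (inputPath == null || inputPath.isEmpty()) {
            throw new UsageException("missing input file");
        }
        if (outputPath == null || outputPath.isEmpty()) {
            throw new UsageException("missing output file");
        }
        // ... the actual conversion would go here (omitted in this sketch) ...
    }

    // Shared command-line logic: returns a process exit code, but never exits itself.
    static int run(String[] args, PrintStream err) {
        if (args.length != 2) {
            err.println("usage: ConverterCli <input> <output>");
            return 2;
        }
        try {
            convert(args[0], args[1]);
            return 0;
        } catch (UsageException e) {
            err.println("error: " + e.getMessage());
            return 1;
        }
    }

    // Only the real desktop/command-line entry point is allowed to stop the JVM.
    public static void main(String[] args) {
        System.exit(run(args, System.err));
    }
}
```

With this split, a server caller that passes bad parameters gets an exception or a non-zero code back instead of a restarting ColdFusion service.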


Sunday, April 19, 2009

ColdFusion: Small tip for using java objects

While searching the documentation the other day, I came across this small item that falls under the "I cannot believe I never knew that" category.

ColdFusion can automatically invoke getPropertyName() and setPropertyName(value) methods if a Java class conforms to the JavaBeans pattern. As a result, you can set or get the property by referencing it directly, without having to explicitly invoke a method.

In other words, in some cases you can just use myObject.propertyName instead of myObject.getPropertyName() or myObject.setPropertyName(). I do not know that I will use it just yet. But it is good to know, and might come in handy down the road ;)


Example using java.net.URL

<cfset urlString = "http://www.google.com/intl/en_ALL/images/logo.gif">
<cfset imageURL = createObject("java", "java.net.URL").init(urlString)>
<cfset img = createObject("java", "javax.imageio.ImageIO").read(imageURL)>
<cfdump var="#imageURL#" label="Methods of java.net.URL class"><br>

<b>Using Methods: getPropertyName</b><hr>
<cfoutput>
imageURL.getHost() = #imageURL.getHost()#<br>
imageURL.getPath() = #imageURL.getPath()#<br>
imageURL.getPort() = #imageURL.getPort()#<br>
imageURL.getProtocol() = #imageURL.getProtocol()#<br>
imageURL.getFile() = #imageURL.getFile()#<br>
</cfoutput>
<br><br>

<b>Using Properties: PropertyName</b><hr>
<cfoutput>
imageURL.Host = #imageURL.Host#<br>
imageURL.Path = #imageURL.Path#<br>
imageURL.Port = #imageURL.Port#<br>
imageURL.Protocol = #imageURL.Protocol#<br>
imageURL.File = #imageURL.File#<br>
</cfoutput>
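The shortcut relies on the standard JavaBeans naming convention: a public, no-argument getHost() method defines a readable property named host. The JDK's java.beans.Introspector applies those same rules, so it is a handy way to see which properties a class exposes before trying the shortcut from CF. This is only a sketch: I am assuming ColdFusion follows the same getX/setX convention, not that it uses Introspector internally, and BeanProperties is a made-up helper name.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: lists the readable JavaBeans properties derived from a class's getters.
final class BeanProperties {

    private BeanProperties() {}

    static Set<String> readableProperties(Class<?> type) throws IntrospectionException {
        // stop at Object.class so the inherited getClass() does not show up as a "class" property
        BeanInfo info = Introspector.getBeanInfo(type, Object.class);
        Set<String> names = new TreeSet<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            if (pd.getReadMethod() != null) {
                names.add(pd.getName()); // getHost() -> "host", getPath() -> "path", ...
            }
        }
        return names;
    }
}
```

For java.net.URL the result includes host, path, port, protocol and file, which matches the getter/property pairs in the CF example above. Introspector only inspects method signatures, so listing a property such as content does not open a connection.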


Friday, April 17, 2009

MS SQL 2005: More on the underused OUTPUT clause

In a previous entry, I mentioned the sometimes under-used OUTPUT clause as an alternative to CF8's result.IDENTITYCOL. If you have used MS SQL for a while, you are no doubt aware of it already. But for those that are not, OUTPUT can be very useful. Especially when working with multiple records.

When manipulating sets of data, I often need to answer questions like "which records were updated?", "what were the original values, before the update?", "which records were deleted?". The OUTPUT clause lets you do just that.

When using OUTPUT, two special prefixes are available: Inserted and Deleted. The Inserted prefix is used to obtain values that were added. Such as a newly inserted record, or the new value of updated records. The Deleted prefix is used to grab information about deleted records, or the original values of updated records.

Grabbing the new ids of multiple records
Say you are inserting multiple records into a table and need to grab the ids of the new records. Simply add an OUTPUT clause and use the Inserted prefix to grab the new ids. You can either return the values as a query, or insert them into another table:

Example


-- Create sample table
-- Use a table variable (no messy clean up)
DECLARE @Customer TABLE
(
CustomerID INT IDENTITY(1,1),
FirstName VARCHAR(50),
LastName VARCHAR(50)
)

--- Create a table for storing the new ids
DECLARE @NewRecords TABLE
(
InsertedCustomerID INT
)

--- A) Insert the new ids into a table variable
INSERT INTO @Customer ( FirstName, LastName )
OUTPUT Inserted.CustomerID INTO @NewRecords ( InsertedCustomerID )
SELECT 'Alan', 'Mercury' UNION ALL
SELECT 'Michelle', 'Dupree' UNION ALL
SELECT 'Josh', 'Echor' UNION ALL
SELECT 'Alan', 'Roberts'

--- Display the ids of the inserted records
SELECT InsertedCustomerID
FROM @NewRecords

--- B) Return the new ids as a resultset
INSERT INTO @Customer ( FirstName, LastName )
OUTPUT Inserted.CustomerID
SELECT 'Alan', 'Mercury' UNION ALL
SELECT 'Michelle', 'Dupree' UNION ALL
SELECT 'Josh', 'Echor' UNION ALL
SELECT 'Alan', 'Roberts'


Finding out which records changed
You can apply the same technique to UPDATE statements. But with updates, you can take advantage of both prefixes: using the Deleted prefix to grab the original values (before the update) and Inserted to grab the new values (after the update).



--- Change a few records
UPDATE @Customer
SET FirstName = 'Adam'
OUTPUT Deleted.CustomerID AS UpdatedCustomerID,
Deleted.FirstName AS OldFirstNameValue,
Inserted.FirstName AS NewFirstNameValue
WHERE FirstName = 'Alan'


Determining which records were deleted
Of course the same applies to DELETE statements. Just use the Deleted prefix to return information about the deleted records.


--- Delete the records whose last name contains an "o"
DELETE FROM @Customer
OUTPUT Deleted.CustomerID AS DeletedCustomerID,
Deleted.FirstName AS DeletedFirstName,
Deleted.LastName AS DeletedLastName
WHERE LastName LIKE '%o%'


Just another tool to keep in mind when using 2005 ;)


MS SQL 2005: The under-used OUTPUT clause

While not new, some of the nicer features first introduced with MS SQL 2005 are often overlooked. After spending time working with different, or older, databases I sometimes forget about them myself. One of the more useful features worth remembering is the OUTPUT clause.

If you are not familiar with it, the OUTPUT clause allows you to retrieve information about records that were just inserted, updated or deleted. This information can be returned as a resultset or even inserted into another table (or table variable). It is incredibly useful when working with multiple records, but comes in handy in other situations as well.

result.IDENTITYCOL alternative
One example is with INSERT statements. While ColdFusion 8 introduced the much-needed result attribute (which can return the ID of a newly inserted record), it has a few limitations. It only works for single record inserts, and also has a few quirks. The OUTPUT clause offers an interesting alternative to using CF 8's result.IDENTITYCOL value.

Take the simple insert below that adds a record into a table with an identity column. By adding an OUTPUT clause, and using the special Inserted prefix, you can access the id of the newly created record (or any other value in the record).

By default, OUTPUT returns the information as a resultset. So you end up with a query containing the new record ID, just as if you were using a SELECT statement instead. On the "upside", this seems to work consistently with the CF 8, MS SQL JDBC 1.0 and 1.2 drivers, unlike the result attribute. Unfortunately ... it did not work with the jTDS driver, at least not in my tests. Still, it is an interesting alternative to result.IDENTITYCOL.

Using OUTPUT


CREATE TABLE Customer
(
CustomerID INT IDENTITY(1,1),
FirstName VARCHAR(50),
LastName VARCHAR(50)
)

<cfquery name="addRecord" datasource="#dsn#">
INSERT INTO Customer ( FirstName, LastName )
OUTPUT Inserted.CustomerID
VALUES
(
<cfqueryparam value="Alan" cfsqltype="cf_sql_varchar"> ,
<cfqueryparam value="Mercury" cfsqltype="cf_sql_varchar">
)
</cfquery>

<cfdump var="#addRecord#" label="Show the new Record ID">
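For anyone running the same statement outside of ColdFusion, the key point is that INSERT ... OUTPUT behaves like a SELECT: it produces a result set, so over plain JDBC you read it with executeQuery() rather than executeUpdate(). This is a minimal sketch assuming the Customer table above. The tests in this post only covered cfquery with the CF 8 and MS SQL JDBC 1.0/1.2 drivers, so I am assuming plain JDBC with those drivers behaves the same (and that jTDS may still not cooperate). CustomerInsert and insertCustomer are hypothetical names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: reads the new identity value from an INSERT ... OUTPUT statement.
final class CustomerInsert {

    private CustomerInsert() {}

    // Same statement as the cfquery above; OUTPUT makes SQL Server return the new row's id.
    static final String SQL =
        "INSERT INTO Customer ( FirstName, LastName ) "
        + "OUTPUT Inserted.CustomerID "
        + "VALUES ( ?, ? )";

    // Inserts one customer and returns the CustomerID produced by the OUTPUT clause.
    static int insertCustomer(Connection conn, String firstName, String lastName) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, firstName);
            ps.setString(2, lastName);
            // executeQuery, not executeUpdate: the statement returns rows
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("OUTPUT clause returned no rows");
                }
                return rs.getInt(1);
            }
        }
    }
}
```

For a multi-row insert, like the UNION ALL example in the newer OUTPUT post, the same loop over rs.next() would collect one id per inserted row.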


One of these days I will have to chart out all of the options and the various issues with each, and see which one equates to the fewest headaches.

* Improving inserts, updates and deletes with OUTPUT


SOT: jQuery Sparklines

I just came across this cool jQuery plugin for creating sparklines and thought I would pass it along for anyone else living under a jQuery rock like me.

http://omnipotent.net/jquery.sparkline/


Tuesday, April 14, 2009

Concatenating values in SQL

A relatively common need is concatenating a bunch of values in a query. While cfoutput with "group" is one option, there are also some elegant sql options, ones that do not involve temp tables or descending into cursor psychosis. There are different reasons to use each, but if you are not aware of the sql options, Barney Boisvert sums them up quite nicely in this recent post at houseoffusion.com:

http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:59194

