Automating Microsoft Word with .NET

7 07 2009

Well, this week I was back in the technical world, getting my hands dirty. A couple of our products integrate seamlessly with Microsoft Office, which works well for the business, so we already have our own technical service that handles the integration with Word. This time, however, something a little different was required. A client needed us to generate ordering catalogues automatically, and we are talking Word documents of around 200 pages. The automation would happen within our LOB .NET application.

Their formatting was very specific and, in some cases, dependent on the type of products being added to the catalogue. That rules out the typical option of using bookmarks and/or data fields within a template, especially as I wanted to use a single generic template for all their catalogues (they have 5, which they produce every 6 months).

 

Referencing the Word Object Library

This is quick and easy, and I am presuming you know how to use Visual Studio. In your project, choose to add a reference. Select the COM tab, then make your way down to “Microsoft Word 12.0 Object Library” (the version number depends on the version of Word installed on your machine).

 

Referencing Word from .NET

NB: Once you have added this, you will notice two references to Microsoft Word in your project. Look at their paths and you will see they point to 2 different locations, one of which is within “Microsoft.Office.Interop.Word”.

 

Imports statement

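It’s good practice to import the namespaces you are working with into your class. So at the top of your class file add the Imports statements (VB.NET). The second one is needed for the Marshal calls used later in this post:

Imports Microsoft.Office.Interop
Imports System.Runtime.InteropServices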

 

Opening a word document / template

Some posts talk about how to create a Word document on the fly. This is great, but it’s much better and easier if you have a document to work with from the beginning. A template is ideal, as you can then use your own styles from that document in your .NET code. In my own case, I created a Word document that contained only a table of contents and a second page. The document is otherwise blank, but it does carry some customised styles that I have created. Also, our template is stored within our workFile ECM repository, so I can grab a copy of the template from the repository at any time with our application and use it.

You will need to create a Word.Application object in your code. I like to set up module variables that hold my Word application object and my Word document. In addition, I choose to open the Word document in its own procedure, which is simply good coding practice. Also remember I am inserting this code into an already existing Word service layer that we have written.

Private Sub openWordDocument()
 
        'check if we have an active word application object
        If myWordApplication Is Nothing Then
            myWordApplication = New Word.Application
        End If
 
        'set the word document
        myWordDocument = myWordApplication.Documents.Open(myWordDocFile)
 
End Sub

myWordDocFile in this case is a String value holding the location and name of the document I wish to work with.
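For reference, the module variables used by openWordDocument could be declared along these lines. This is only a minimal sketch: the class name and the file path are hypothetical placeholders, and the variable names simply match the ones used in the procedure above.

```vbnet
Imports Microsoft.Office.Interop

'hypothetical class name, standing in for whatever service class hosts this code
Public Class CatalogueWordService

    'module-level variables shared by the procedures in this post
    Private myWordApplication As Word.Application
    Private myWordDocument As Word.Document

    'hypothetical path, in practice this would be the local copy of the template
    Private myWordDocFile As String = "C:\Temp\CatalogueTemplate.docx"

End Class
```

Keeping the application and document at module level means every procedure works against the same instance of Word, and there is one obvious place to release them when you are done.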

 

Inserting a paragraph, text and style

Now if you don’t have the luxury of using bookmarks or data fields, simply because you are not sure what text you are inserting, you are going to need to create paragraphs and text lines and give them some form of style.

Creating a paragraph object is easy, however, make sure you are inserting it into the document where you want. Typically this will be the end of your document as you are appending to it. Remember in the code below, myWordDocument is the actual word document itself we opened earlier.

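Public Sub insertParagraph(ByVal pText As String, Optional ByVal pStyleName As String = vbNullString)

        'add a new paragraph at the end of the document
        Dim para As Word.Paragraph = myWordDocument.Content.Paragraphs.Add(myWordDocument.Bookmarks.Item("\endofdoc").Range)

        Try
            para.Range.Text = pText
            'only apply a style if one was supplied, otherwise keep the current style
            If Not String.IsNullOrEmpty(pStyleName) Then
                para.Range.Style = pStyleName
            End If
            para.Range.InsertParagraphAfter()
        Catch ex As Exception
            System.Diagnostics.Debug.WriteLine(ex.Message)
            'rethrow without resetting the stack trace
            Throw
        Finally
            Marshal.ReleaseComObject(para)
            para = Nothing
            'collect first, then wait for the finalizers to release any remaining com wrappers
            GC.Collect()
            GC.WaitForPendingFinalizers()
        End Try

End Sub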

Points to notice here.

  1. A paragraph is a “range”, if you like, a selection within the Word document.
  2. We simply set the text and style value for the paragraph.
  3. We insert a paragraph after it, so that the next time we call this method we are again at the end of the document and working in a new paragraph.
  4. I choose to marshal the paragraph object out of memory. This is because it is a COM-based object, and as such we can get memory issues and weird errors raised by the DLL when generating larger documents.
  5. I call the garbage collector to ensure everything is cleaned up properly. This isn’t over the top: without it I received error messages for larger documents, such as “The callee refused the call”. Nice…

 

Inserting text in a range without a new paragraph

If you simply want to add text and don’t want to create a new paragraph, then again you need to create a range; this time, however, it is just a range (not a paragraph).

Public Sub insertTextLine(ByVal pText1 As String, ByVal pText2 As String, ByVal pText3 As String)
 
        Dim textPart1 As Word.Range = myWordDocument.Bookmarks.Item("\endofdoc").Range
        Dim textPart2 As Word.Range
        Dim textPart3 As Word.Range
 
        Try
            textPart1.Style = "BookTitle"
            textPart1.Bold = True
            textPart1.InsertAfter(pText1)
            textPart1.Bold = True
 
            If pText2 <> vbNullString Then
                textPart2 = myWordDocument.Bookmarks.Item("\endofdoc").Range
                textPart2.Bold = False
                textPart2.InsertAfter(" " & pText2)
            End If
 
            If pText3 <> vbNullString Then
                textPart3 = myWordDocument.Bookmarks.Item("\endofdoc").Range
                textPart3.Bold = False
                'tab across to the correct location
                textPart3.InsertAfter(vbTab & pText3)
            End If
 
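            'insert a new paragraph after the last piece of text that was added
            '(textPart2 and textPart3 are Nothing when their text was not supplied)
            Dim lastPart As Word.Range = If(textPart3, If(textPart2, textPart1))
            lastPart.InsertParagraphAfter()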
 
        Catch ex As Exception
            System.Diagnostics.Debug.WriteLine(ex.Message)
            Throw
        Finally
            If Not textPart1 Is Nothing Then
                Marshal.ReleaseComObject(textPart1)
                textPart1 = Nothing
            End If
            If Not textPart2 Is Nothing Then
                Marshal.ReleaseComObject(textPart2)
                textPart2 = Nothing
            End If
            If Not textPart3 Is Nothing Then
                Marshal.ReleaseComObject(textPart3)
                textPart3 = Nothing
            End If
            GC.Collect()
            GC.WaitForPendingFinalizers()
        End Try
 
    End Sub

What we have done here is effectively append 3 text values as a single line of text within our Word document. Notice that by using “InsertAfter” on our Range object, we are literally inserting text, no paragraphs. I have also used vbTab to space out the last value. My Word document has a set tab stop within the selected style, which ensures I know where the text will be inserted on that line.

Again ensure you clean up your objects and marshal them out of memory.

 

Saving your document

In my case, we are working with a temporary file that has been copied locally from the workFile repository. You may instead be working with a template sitting on your hard drive somewhere, so make sure you don’t save your document over the top of that template! Schoolboy error if you do…

Saving the file is really easy: provide your directory and file name and you are almost done:

Public Sub saveCatalogue(ByVal pCatalogueName As String, ByVal pCatalogueLocation As String)

        If Not System.IO.Directory.Exists(pCatalogueLocation) Then
            System.IO.Directory.CreateDirectory(pCatalogueLocation)
        End If

        'Path.Combine adds the directory separator if the location does not end with one
        myWordDocument.SaveAs(System.IO.Path.Combine(pCatalogueLocation, pCatalogueName))
        myWordDocument.Close(False)

End Sub

 

Tidy up memory and word

Your file has been saved, but you have yet to finish. If you look in Task Manager you will notice that WINWORD.exe is still running, and its memory footprint could be quite large. If you don’t kill this off correctly and you continue to create Word documents in this fashion, you will cause havoc with performance. So, we have to clean up after ourselves.

    Private Sub closeWord()

        Try
            'quit the word application (the document was already closed in saveCatalogue)
            If myWordApplication IsNot Nothing Then
                myWordApplication.Quit()
            End If
        Catch ex As Exception
            'word may already have gone away, log it and carry on with the clean up
            System.Diagnostics.Debug.WriteLine(ex.Message)
        Finally
            'marshal out the com objects, don't want any memory leaks here...
            If myWordDocument IsNot Nothing Then
                Marshal.ReleaseComObject(myWordDocument)
                myWordDocument = Nothing
            End If
            If myWordApplication IsNot Nothing Then
                Marshal.ReleaseComObject(myWordApplication)
                myWordApplication = Nothing
            End If
        End Try

    End Sub

Quit Word, then clean up. Again we are marshalling objects out of memory and cleaning up everything.
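Putting the pieces together, the overall flow might look something like the sketch below. It assumes it lives in the same class as openWordDocument, insertParagraph, saveCatalogue and closeWord from this post. The buildCatalogue name and its parameters are hypothetical, and “Normal” is just Word’s built-in default style name in an English installation.

```vbnet
'hypothetical wrapper that drives the procedures shown in this post
Public Sub buildCatalogue(ByVal pProductLines As IEnumerable(Of String), _
                          ByVal pCatalogueName As String, _
                          ByVal pCatalogueLocation As String)

    Try
        'open the local copy of the template
        openWordDocument()

        'append one paragraph per product line
        For Each productLine As String In pProductLines
            insertParagraph(productLine, "Normal")
        Next

        'save under a new name so the template itself is never overwritten
        saveCatalogue(pCatalogueName, pCatalogueLocation)
    Finally
        'always quit word and release the com objects, even if something failed
        closeWord()
    End Try

End Sub
```

The Try/Finally is the important part: if an insert fails halfway through a 200 page catalogue, WINWORD.exe still gets shut down instead of being left running in the background.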

 

Using fields

If you have the luxury of knowing the format of the document (such as populating an invoice, a letter etc), then you can use fields to make life a lot easier for yourself. Again, set up a template Word document with the content you desire. For any content that is to be added dynamically, insert a data field (see the help within your version of Office for how to do this).

From .NET, once you have the document open, you can loop round or search for those fields in the document. The fields are exposed by the Word document object itself, as a collection of Word.Field objects.

You can then update the field text and carry on. See the sample line of code below, which inserts an invoice reference into the data field.

field.Result.Text = CStr(invoiceRef)
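For context, the loop around that line might look something like this. It is only a sketch: setFieldText and its parameters are hypothetical names, and matching on the text of the field code is just one simple way of finding the right field, assuming the name you pass in appears in that code.

```vbnet
'hypothetical helper that sets the result text of every field whose code mentions pFieldName
Public Sub setFieldText(ByVal pFieldName As String, ByVal pValue As String)

    For Each field As Word.Field In myWordDocument.Fields
        'Code.Text holds the field code, for example " MERGEFIELD InvoiceRef "
        If field.Code.Text.IndexOf(pFieldName, StringComparison.OrdinalIgnoreCase) >= 0 Then
            field.Result.Text = pValue
        End If
    Next

End Sub
```

You would then call it with something like setFieldText("InvoiceRef", CStr(invoiceRef)), where “InvoiceRef” stands for whatever name you gave the field in your template.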

 

Conclusion

Word is great to automate and can be very powerful for your .NET applications. Sometimes you may struggle to find great documentation on this; however, it’s worth searching for. Just remember: always clean up your code and look after your memory. If you don’t, you will get some weird and wonderful error messages once you start processing larger files…