Friday 30 January 2009

Using Microsoft Excel to feed batches of files to a command-line utility

Overview:

This is a kind of Swiss Army Knife: whenever I need to process batches of files, I pull out this simple application and tweak the code. I don't recommend it as production code - there are far better ways of doing this on a server - but ad-hoc jobs are the daily bread of a desktop developer. You could write it as a script - and a sysadmin would do exactly that - but Excel was the tool to hand when I realised that I had to do this job every week or so.


Highlights for novice VBA coders:

  • Looping through the files in a folder using VBA.FileSystem.Dir()
  • Selecting and opening folders with the Office.FileDialog Object
  • Using the Windows Scripting Host Object Model - wshom.ocx:
    • The File System Object - IWshRuntimeLibrary.FileSystemObject
    • The Execution Object - IWshRuntimeLibrary.WshExec
    • The Shell Object - IWshRuntimeLibrary.WshShell
  • Reading STDIO, the standard input/output stream of a command-line executable
  • Reading and logging STERR, the error message stream of a command-line executable.

The background to this...

Processing batches of files... Don't we just love the daily grind of the COBOL era? Except that in the real world, almost every trading and invoicing system has some kind of back-end audit and reconciliation function, separated from the end-to-end streams of XML that you hope are running smoothly in the foreground. These audit systems tend to run on FTP file drops from counterparties - brokers, custodians, banks providing prime brokerage services etc - and they form part of every company's overnight processing.


Mostly, we don't see any of this ('We' as in developers and users: sysadmins see a lot of it). But every now and again, you'll be asked for a one-off job of loading or reprocessing a bunch of files where the regular process has failed, or was never written because someone in the Back Office only does it once a year.

The worst of these jobs consist of feeding files into a command-line executable; this particular example involves decrypting them with pgp.exe - and no, I didn't choose to do it that way, but getting a pgp add-in and transferring the keyring was just too much work. The short version is, every single file in a daily batch of 27 needs a long and fiddly command-line with five parameters, and I've automated the process with a cuddly point-and-click interface.

You can adapt this code to work with any command-line executable, even the good old DOS COPY command if you feel that the exercise will be good for you. Ambitious students can implement a drag-and drop, using controls available in MSForms.2 - please put the code into a comment when you do, so we can all see how you did it: the drag-and-drop events don't quite do what you expect in Excel!


Here's a picture of a spreadsheet used as a form:


The sheet or userform looks like this

The VBA code behind the one and only worksheet:


Feel free to put it in your own userform instead of a worksheet. Also: watch out when you copy-and-paste. I've tested this, but there's always a possibility that Blogger has reformatted a line break or inserted HTML that breaks the VBA syntax.





Option Explicit





' This is a simple batch-processing application: a wrapper for a command-line
' programme that processes all the files in a folder. The example application
' is for batch decryption using pgp.exe, but this can be adapted to almost any
' command-line application, script or shell command.

' This was built into a worksheet; if you prefer, you can put all this code
' into a userform. But watch out for the dependency on named ranges; you'll
' need to recode all this to use Textboxes or Win32 common-dialog controls.

' The following components are assumed to exist and are mandatory:

' Excel Named Ranges:


' SourceFolder
' TargetFolder

' The following components are optional but the code is difficult to understand
' if they are not included:

' MSForms2 CommandButton Objects:

' cmdGo
' cmdGetSourceFolder
' cmdGetTargetFolder
' cmdOpenSource
' cmdOpenTarget

' In order to use this specific example, you will need pgp.exe and an RSA keyring.
' DOS 'copy' would work just as well: the minimum features are that it has Command-
' Line parameters for a source file and an output file.

Private Sub cmdGo_Click()

' The Big Button Marked 'GO'
' This launches the batch process.
' Runs the command-line utility on every matching file in the source folder.

' Creates a log file.

On Error GoTo ErrSub

Const FILE_PATTERN As String = "*.pgp"
Const LOG_FILE As String = "LogFile.txt"
Const ERROR_FILE As String = "Error.txt"
Const KEYRING_NAME As String = ".PRIVATE3"

Dim strCmd As String
Dim strFile As String
Dim strFolder_Target As String
Dim strFolder_Source As String
Dim iResult As Integer
Dim iFile As Integer
Dim iFileCount As Integer
Dim strFileTarget As String
Dim boolError As Boolean

' Help on the Windows Scripting host is available here:
' http://msdn2.microsoft.com/en-us/library/98591fh7.aspx

' Windows Script Host Object Model
' C:\WINDOWS\system32\wshom.ocx

Dim objFSO As IWshRuntimeLibrary.FileSystemObject
Dim oxLogFile As IWshRuntimeLibrary.TextStream
Dim objExec As IWshRuntimeLibrary.WshExec
Dim objShell As IWshRuntimeLibrary.WshShell
Dim objNetwork As IWshRuntimeLibrary.WshNetwork

' Read the user-specified folder paths:
strFolder_Target = ThisWorkbook.Names("TargetFolder").RefersToRange.Value
strFolder_Target = Trim(strFolder_Target)


If Right(strFolder_Target, 1) <> "\" Then

strFolder_Target = strFolder_Target & "\"

End If


'If the folder does not exist, create it:
If Len(Dir(strFolder_Target, vbDirectory)) < 1 Then

MkDir strFolder_Target

End If

strFolder_Source = ThisWorkbook.Names("SourceFolder").RefersToRange.Value
strFolder_Source = Trim(strFolder_Source)

If Right(strFolder_Source, 1) <> "\" Then

strFolder_Source = strFolder_Source & "\"

End If

' Create File system objects:
Set objNetwork = New IWshRuntimeLibrary.WshNetwork
Set objShell = New IWshRuntimeLibrary.WshShell
Set objFSO = New IWshRuntimeLibrary.FileSystemObject

' Create a log file:
Set oxLogFile = objFSO.OpenTextFile(strFolder_Target & "LogFile.txt", ForWriting, True, TristateUseDefault)

oxLogFile.WriteLine Now() & vbTab & "START FOLDER" & vbTab & strFolder_Source

iFileCount = objFSO.GetFolder(strFolder_Source).Files.Count

' Select the first of the source files for your command-line app.
strFile = Dir(strFolder_Source & FILE_PATTERN) ' Yes, it seems odd to use Dir() when there's a file
' system object, but it enumerates files with wildcards

If objFSO.FileExists(strFolder_Target & ERROR_FILE) Then

objFSO.DeleteFile strFolder_Target & ERROR_FILE, True

End If

'Start loop, processing every matching file in the folder...
boolError = False
iFile = 1

Do While Len(strFile) > 1

Application.StatusBar = "Reading file " & iFile & " of " & iFileCount

oxLogFile.WriteLine Now() & vbTab & "START FILE" & vbTab & Chr(34) & strFolder_Source & strFile & Chr(34)

strFileTarget = ""
strFileTarget = strFolder_Target & Left(strFile, Len(strFile) - 4)

If objFSO.FileExists(strFileTarget) Then

On Error Resume Next

objFSO.DeleteFile strFileTarget, True

End If


If objFSO.FileExists(strFileTarget) Then

strFileTarget = strFolder_Target & "Copy_of_ & Left(strFile, Len(strFile) - 4)"

End If

' Concatenate your command-line: use chr(34) instead of playing with nested quote marks!

strCmd = ""
strCmd = strCmd & "pgp.exe -v "
strCmd = strCmd & "--decrypt "
strCmd = strCmd & Chr(34) & strFolder_Source & strFile & Chr(34)
strCmd = strCmd & " --output "
strCmd = strCmd & Chr(34) & strFileTarget & Chr(34)
strCmd = strCmd & " --passphrase "
strCmd = strCmd & Chr(34) & KEYRING_NAME & Chr(34)

iResult = 0
iResult = objShell.Run(strCmd, 0, True)

If iResult <> 0 Then 'Failed run: retry, with logging. ObjExec can see the message stream

oxLogFile.WriteLine "" & vbTab & "RETRY FILE" & vbTab & "FAILED WITH ERROR " & iResult & vbTab & strCmd

Set objExec = objShell.Exec(strCmd)

Do While objExec.Status <= WshRunning

Do Until objExec.StdOut.AtEndOfStream

oxLogFile.WriteLine "" & vbTab & "MESSAGE" & vbTab & objExec.StdOut.ReadLine

Loop

Loop


' No, this isn't redundant.
Do Until objExec.StdOut.AtEndOfStream

oxLogFile.WriteLine "" & vbTab & "MESSAGE" & vbTab & objExec.StdOut.ReadLine

Loop


Do Until objExec.StdErr.AtEndOfStream

boolError = True

oxLogFile.WriteLine "" & vbTab & "ERROR MESSAGE" & vbTab & objExec.StdErr.ReadLine

Loop

End If 'iretry <> 0

Set objExec = Nothing

oxLogFile.WriteLine Now() & vbTab & "END FILE" & vbTab & Chr(34) & strFileTarget & Chr(34)

strFile = Dir
iFile = iFile + 1

Loop

oxLogFile.WriteLine Now() & vbTab & "END FOLDER" & vbTab & objNetwork.ComputerName & vbTab & objNetwork.UserName
oxLogFile.Close

' If the error flag is TRUE, rename the logfile as the error file.
If boolError Then

objFSO.CopyFile strFolder_Target & LOG_FILE, strFolder_Target & ERROR_FILE, True

End If

ExitSub:

Application.StatusBar = False
Application.Cursor = xlDefault
Set oxLogFile = Nothing
Set objExec = Nothing
Set objFSO = Nothing
Set objShell = Nothing
Set objNetwork = Nothing

Exit Sub

ErrSub:

Resume ExitSub

End Sub



Private Sub cmdGetSourceFolder_Click()

' Open a file dialogue enabling the user to create or select a folder.
' Populates the SourceFolder named range if successful

' Recommended icon for this button is the 'folder search' magnifying glass

Dim strFolder_Source As String
Dim objFD As FileDialog
Dim vrtSelectedItem As Variant

Set objFD = Application.FileDialog(msoFileDialogFolderPicker)

strFolder_Source = ThisWorkbook.Names("SourceFolder").RefersToRange.Value

With objFD

.AllowMultiSelect = False
.InitialFileName = strFolder_Source
.Title = "Specify the source folder..."
.InitialView = msoFileDialogViewDetails

If .Show = -1 Then

If .SelectedItems.Count > 0 Then

strFolder_Source = ""
strFolder_Source = .SelectedItems(1)

If strFolder_Source <> "" Then

ThisWorkbook.Names("SourceFolder").RefersToRange.Value = strFolder_Source

End If

End If

End If

End With


End Sub



Private Sub cmdGetTargetFolder_Click()

' Open a file dialogue enabling the user to create or select a folder.
' Populates the TargetFolder named range if successful
' Recommended icon for this button is the 'folder search' magnifying glass

Dim strFolder_Target As String
Dim objFD As FileDialog
Dim vrtSelectedItem As Variant

Set objFD = Application.FileDialog(msoFileDialogFolderPicker)

strFolder_Target = ThisWorkbook.Names("TargetFolder").RefersToRange.Value

With objFD

.AllowMultiSelect = False
.InitialFileName = strFolder_Target
.Title = "Specify the Target folder..."
.InitialView = msoFileDialogViewDetails

If .Show = -1 Then

If .SelectedItems.Count > 0 Then

strFolder_Target = ""
strFolder_Target = .SelectedItems(1)

If strFolder_Target <> "" Then

ThisWorkbook.Names("TargetFolder").RefersToRange.Value = strFolder_Target

End If

End If

End If

End With

End Sub



Private Sub cmdOpenSource_Click()

' Reads the SourceFolder named range.
' Opens a Windows Explorer window displaying the folder and its contents.
' Recommended icon for this button is the standard Windows 'Open Folder'

On Error Resume Next

Dim strCmd As String
Dim strFile As String
Dim strFolder_Source As String

strFolder_Source = ThisWorkbook.Names("SourceFolder").RefersToRange.Value

strCmd = "Explorer " & Chr(34) & strFolder_Source & Chr(34)

Shell strCmd, vbNormalFocus


End Sub



Private Sub cmdOpenTarget_Click()

' Reads the TargetFolder named range.
' Opens a Windows Explorer window displaying the folder and its contents.
' Recommended icon for this button is the standard Windows 'Open Folder'

On Error Resume Next

Dim strCmd As String
Dim strFile As String
Dim strFolder_Target As String

strFolder_Target = ThisWorkbook.Names("TargetFolder").RefersToRange.Value

strCmd = "Explorer " & Chr(34) & strFolder_Target & Chr(34)

Shell strCmd, vbNormalFocus

End Sub



Private Sub RedoAllFolders()

' Redo EVERY folder
' Start with the parent folders specified in source and target
' WARNING - this overwrites everything in the target folders.
' It is inadvisable to provide the users with this function -
' leave it for administrators doing large one-off batches

Dim strFile As String
Dim strFolder_Target As String
Dim strFolder_Source As String

' Help on the Windows Scripting host is available here:
' http://msdn2.microsoft.com/en-us/library/98591fh7.aspx

' Windows Script Host Object Model
' C:\WINDOWS\system32\wshom.ocx

Dim objFSO As IWshRuntimeLibrary.FileSystemObject
Dim objSourceFolder As IWshRuntimeLibrary.Folder
Dim objSourceParentFolder As IWshRuntimeLibrary.Folder
Dim objTargetFolder As IWshRuntimeLibrary.Folder
Dim objTargetParentFolder As IWshRuntimeLibrary.Folder

Set objFSO = New IWshRuntimeLibrary.FileSystemObject

strFolder_Target = ThisWorkbook.Names("TargetFolder").RefersToRange.Value
strFolder_Target = Trim(strFolder_Target)

If Right(strFolder_Target, 1) <> "\" Then

strFolder_Target = strFolder_Target & "\"

End If

'If the folder does not exist, create it:
If Not objFSO.FolderExists(strFolder_Target) Then

MkDir strFolder_Target

End If


If Not objFSO.FolderExists(strFolder_Target) Then

MsgBox "Target folder does not exist and cannot be created: " & vbCrLf & vbCrLf & strFolder_Target, vbCritical, "No target folder..."
GoTo ExitSub

End If

Set objTargetParentFolder = objFSO.GetFolder(strFolder_Target)

strFolder_Source = ThisWorkbook.Names("SourceFolder").RefersToRange.Value
strFolder_Source = Trim(strFolder_Source)

If Right(strFolder_Source, 1) <> "\" Then

strFolder_Source = strFolder_Source & "\"

End If


If Not objFSO.FolderExists(strFolder_Source) Then

MsgBox "The source folder does not exist: " & vbCrLf & vbCrLf & strFolder_Target, vbCritical, "No target folder..."
GoTo ExitSub

End If

Application.Cursor = xlWait

Set objSourceParentFolder = objFSO.GetFolder(strFolder_Source)

For Each objSourceFolder In objSourceParentFolder.SubFolders

ThisWorkbook.Names("SourceFolder").RefersToRange.Value = objSourceFolder.Path

strFolder_Target = objFSO.BuildPath(objTargetParentFolder.Path, objSourceFolder.Name)

ThisWorkbook.Names("TargetFolder").RefersToRange.Value = strFolder_Target

cmdGo_Click

Next objSourceFolder


ExitSub:

Application.Cursor = xlDefault
Set objSourceParentFolder = Nothing
Set objSourceFolder = Nothing
Set objFSO = Nothing

Exit Sub

ErrSub:

Resume ExitSub

End Sub




A few observations...


Are there better ways of doing this? Yes, definitely. There are scripting mavens out there who could can tap out a three-line shell script that'll do the lot in seconds... But I used the tools I had to hand; you're probably not the wizard you thought you were, and will find out after an hour of fruitless debugging; and unfamiliar command-line executables with finnicky switches need to be wrapped in a way that allows you to see, clearly and plainly, exactly what it is that you're doing. Note, also, that the error-logger writes out the command line, verbatim - a habit I would urge all novice developers to adopt today, because knowing what your application did is the key to effective testing.

I've used a separate logfile in this application but, this being Excel, I might've done better to write it all out onto a worksheet. I mean, it's already open and running, why not use it? Hindsight's a wonderful thing.

Meanwhile, feel free to offer examples of other tasks that you wrapped up in Excel.

Introduction to Excellerando


Welcome to Excellerando, a blog devoted to arcane VBA-Geekery, mostly in Microsoft Excel.

I don't claim to be a definitive source on the subject (that would be the book Professional Excel Development by Stephen Bullen, Rob Bovey and John Green). Instead, you'll get odd code scraps, fixes to uncommon problems, and usable code that saves reinventing the wheel.

Much of it will have a financial and banking perspective, because that's what I've spent the last ten years doing for a living: bashing out spreadsheets and add-ins for traders, analysts, risk managers and anyone else who asks in banks and hedge funds.

Comments are welcome; questions will be answered - eventually - but your first port of call for any Excel question should be the Usenet boards or a quick search through Daily Dose of Excel, a blog that you really should be reading every day if you're programming with Excel for a living.

Meanwhile, sit back, relax, and reflect upon the knowledge that the worst job you've ever been asked to do in Excel has surely been done already by someone else, and the source code's out there. Or, quite possibly, here.