
Bulk exporting Word documents to other formats with JavaScript


Ali - November 24, 2013 - 0 comments

JavaScript is a hot commodity for server-side and client-side web development. But when it comes to command line programming, few people would ditch the likes of Python and Ruby for JavaScript.

I don’t have the luxury of picking my toolset at work. I’m a business consultant with no access to anything beyond the essential enterprise software. Heck, I don’t even have access to PowerShell. All I’m left with is JavaScript. Life is fun.

I assume that I’m not the only one facing this challenge, so it may be worth sharing how I do some basic automation at work using JavaScript.

Problem

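The team needed to convert a large number (200+) of Word documents saved in different formats into a single format, DOTX to be specific. (Listing 1 as shown uses extension 'DOT' with fileFormat 1, which produces the older DOT template format; for DOTX, set extension to 'DOTX' and fileFormat to 14.)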

Assumptions

You have access to a Windows machine with Microsoft Office 2010 installed.

Solution

  1. Open up a text editor, paste the code in Listing 1, and save the file as bulk_word_exporter.js.
  2. Set the value of sourcePath (line 9) to the folder containing the source files. Use double backslashes to separate the folders.
    In my example, different file types (DOC, DOCX, DOT, etc.) are stored in different source folders, which is why extension is appended to sourcePath. Feel free to change this, but note that the same variable also sets the extension of the exported files.
  3. Set the value of destPath (line 10) to the folder where you want the exported files to be saved, keeping the trailing double backslash. Create the folder if it doesn’t exist.
  4. Open a command line and run the script:
    CScript.exe bulk_word_exporter.js
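
The double backslashes in steps 2 and 3 are needed because a single backslash starts an escape sequence inside a JavaScript string literal, so "\\" in the source code is one backslash in the resulting path. The snippet below is a minimal sketch of that idea. The folder C:\Docs\Source and the helper withTrailingBackslash are assumptions made for illustration and are not part of Listing 1.

```javascript
var extension = 'DOT';

// "C:\\Docs\\Source\\" is a hypothetical folder; each "\\" is ONE backslash
// in the resulting string, so this path is C:\Docs\Source\DOT\
var sourcePath = "C:\\Docs\\Source\\" + extension + "\\";

// Hypothetical helper (not in Listing 1): makes sure a folder path ends with
// a backslash, so that destPath + fileName always yields a valid file path.
function withTrailingBackslash(path) {
	return path.charAt(path.length - 1) === "\\" ? path : path + "\\";
}

var destPath = withTrailingBackslash("C:\\Docs\\Exported");
```

With a helper like this, forgetting the trailing separator on destPath would no longer glue the folder name and the file name together.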

Listing 1

var alert = function (s) { WScript.Echo(s); };

var fso,
	folder,
	wordDoc,
	wordApp,
	extension = 'DOT',
	fileFormat = 1, // 1 = wdFormatTemplate; see http://msdn.microsoft.com/en-us/library/office/ff839952.aspx
	sourcePath = "\\SOURCE\\PATH\\" + extension + "\\",
	destPath = "\\DESTINATION\\PATH\\";

try {
	fso = new ActiveXObject("Scripting.FileSystemObject");

	wordApp = new ActiveXObject("Word.Application");
	wordApp.Visible = false;

	folder = fso.GetFolder(sourcePath);

	alert("Preparing ...");

	var fileURI, fileName;
	for (var files = new Enumerator(folder.Files); !files.atEnd(); files.moveNext()) {
		fileURI = '' + files.item();
		// file name without the folder path or the extension
		fileName = fso.GetBaseName(fileURI);

		alert("exporting '" + fileName + "'...");

		wordDoc = wordApp.Documents.Open(fileURI);

		alert("opened '" + fileName + "'...");

		wordDoc.SaveAs(destPath + fileName + "." + extension, fileFormat);
		wordDoc.Close();
		wordDoc = null; // already closed, so the finally block must not close it again

		alert("exported '" + fileName + "' to " + extension);
	}

	alert("\n\nDone!\n");

} catch (error) {
	alert(serialize(error));
} finally {
	// close handles and clean up the memory
	if (wordDoc) {
		wordDoc.Close(0); // 0 = wdDoNotSaveChanges; only reached if an export failed midway
	}
	if (wordApp) {
		wordApp.Quit();
	}
	wordDoc = null;
	wordApp = null;
	fso = null;
}

/* serialize function, thanks to http://blog.stchur.com/2007/04/06/serializing-objects-in-javascript/ */
function serialize(_obj)
{
   // null and undefined have no properties to inspect, so handle them first
   if (_obj === null || typeof _obj === 'undefined')
   {
      return String(_obj);
   }

   // Let engines that support toSource() (Gecko) do this the easy way
   if (typeof _obj.toSource !== 'undefined' && typeof _obj.callee === 'undefined')
   {
      return _obj.toSource();
   }

   // Other engines, including Windows Script Host, must do it the hard way
   switch (typeof _obj)
   {
      // numbers, booleans, and functions are trivial:
      // just return the object itself since its default .toString()
      // gives us exactly what we want
      case 'number':
      case 'boolean':
      case 'function':
         return _obj;

      // strings need to be wrapped in quotes
      case 'string':
         return '\'' + _obj + '\'';

      case 'object':
         var str;
         if (_obj.constructor === Array || typeof _obj.callee !== 'undefined')
         {
            // arrays and arguments objects; an empty one serializes to []
            var i, len = _obj.length, parts = [];
            for (i = 0; i < len; i++) { parts.push(serialize(_obj[i])); }
            str = '[' + parts.join(',') + ']';
         }
         else
         {
            str = '{';
            var key;
            for (key in _obj) { str += key + ':' + serialize(_obj[key]) + ','; }
            str = str.replace(/\,$/, '') + '}';
         }
         return str;

      default:
         return 'UNKNOWN';
   }
}

This code uses the Scripting.FileSystemObject library to access the file system (line 13 of Listing 1):
fso = new ActiveXObject("Scripting.FileSystemObject");

It also uses Word.Application to open and save Word documents (line 15 of Listing 1):
wordApp = new ActiveXObject("Word.Application");
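
Because the loop hands every file that FileSystemObject finds in the source folder to Word, it may be worth filtering the names first. The sketch below skips files with other extensions as well as the temporary "~$" owner files Word creates next to open documents. The helper isWordDocument and the extension list are assumptions for illustration; inside the loop it could be called with fso.GetFileName(fileURI).

```javascript
// Extensions this sketch treats as Word documents (an assumed, editable list).
var WORD_EXTENSIONS = { DOC: true, DOCX: true, DOT: true, DOTX: true, RTF: true };

// Hypothetical helper (not in Listing 1): decides from the bare file name
// (no folder path) whether the file should be opened in Word.
function isWordDocument(fileName) {
	var dot = fileName.lastIndexOf(".");

	// "~$..." files are Word's temporary owner files; names without a real
	// extension (no dot, or a dot in first position) are skipped too.
	if (fileName.substr(0, 2) === "~$" || dot < 1) {
		return false;
	}

	var ext = fileName.substr(dot + 1).toUpperCase();
	return WORD_EXTENSIONS.hasOwnProperty(ext);
}
```

A line such as `if (!isWordDocument(fso.GetFileName(fileURI))) { continue; }` at the top of the loop body would then skip anything Word should not open.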

Note: You can refactor the script so that extension and fileFormat are read from command line arguments instead of being hard-coded. That would be more elegant.
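
One way the refactoring in the note might look is sketched below. It assumes the extension is passed as the first argument and the fileFormat number as the second; the function names parseSettings and readHostArguments are hypothetical, and the defaults mirror the values hard-coded in Listing 1.

```javascript
// Hypothetical helper: turns a plain array of argument strings into settings.
function parseSettings(args) {
	var settings = { extension: 'DOT', fileFormat: 1 }; // defaults from Listing 1

	if (args.length > 0) {
		settings.extension = String(args[0]).toUpperCase();
	}
	if (args.length > 1) {
		settings.fileFormat = parseInt(args[1], 10);
		if (isNaN(settings.fileFormat)) {
			throw new Error("fileFormat must be a number, got '" + args[1] + "'");
		}
	}
	return settings;
}

// Under Windows Script Host the arguments live in WScript.Arguments.
// Copying them into an array keeps parseSettings independent of the host;
// outside WSH this simply returns an empty array.
function readHostArguments() {
	var args = [], i;
	if (typeof WScript !== 'undefined') {
		for (i = 0; i < WScript.Arguments.length; i++) {
			args.push(WScript.Arguments.Item(i));
		}
	}
	return args;
}

var settings = parseSettings(readHostArguments());
```

The script could then be started as `CScript.exe bulk_word_exporter.js PDF 17`, with settings.extension and settings.fileFormat taking the place of the two hard-coded variables.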

Bonus

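Set extension (line 7) to ‘PDF’ and fileFormat (line 8) to 17, then run the code. It will bulk export a batch of Word documents to PDFs. Because extension is also part of sourcePath, adjust sourcePath if your source folder is not named PDF. This could be a life saver if you deal with managing information and records.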
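
To avoid looking up the number each time, the extension and the fileFormat value could be kept together in a small lookup table. The sketch below lists a few values of Word's WdSaveFormat enumeration; the table name SAVE_FORMATS and the helper fileFormatFor are assumptions for illustration and are not part of Listing 1.

```javascript
// A few WdSaveFormat values, keyed by the file extension they produce.
var SAVE_FORMATS = {
	DOC: 0,   // wdFormatDocument
	DOT: 1,   // wdFormatTemplate
	TXT: 2,   // wdFormatText
	RTF: 6,   // wdFormatRTF
	DOCX: 12, // wdFormatXMLDocument
	DOTX: 14, // wdFormatXMLTemplate
	PDF: 17,  // wdFormatPDF
	XPS: 18   // wdFormatXPS
};

// Hypothetical helper: returns the fileFormat number for an extension and
// fails loudly for an extension the table does not know about.
function fileFormatFor(extension) {
	var key = String(extension).toUpperCase();
	if (!SAVE_FORMATS.hasOwnProperty(key)) {
		throw new Error("No save format known for extension '" + extension + "'");
	}
	return SAVE_FORMATS[key];
}
```

With this in place, line 8 of Listing 1 could become `fileFormat = fileFormatFor(extension)`, so only the extension has to be changed.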

Other Tools of the Trade

Don’t forget about MS-Excel and MS-Access when you’re working on more serious automation projects. Both applications provide structured data storage and can be programmed with VBA. You can use many libraries and do really cool things with these tools.

Efficiency as a goal

When my colleagues ask me how long it will take me to automate job X by writing a script, I always ask them how long it would take to do it manually.

Writing scripts and automating work is fun and rewarding, but some tasks, no matter how mundane, are done faster manually. The main goal here is to improve efficiency, not to write code. So always consider the faster route.

However, if you are about to repeat the same manual process a few times a month or even a few times a year, it is worth spending some time and automating it.

Another benefit of automation is that when you are not around, or when you’ve moved on (to a better position), others will still be able to automate mundane work, thanking you for years to come. Trust me, you’ll be thanked for some time.

 
