Creating a DOCX out of thin air
Suppose you were stranded on a desert island, and you wanted to create a document. A “Hello World!” document, just to let the world know you’re out there.
And let’s say you’re going to send it to a bunch of people back at the office who are all running Office 2007. (You’ve been on this island for quite a while, see, or maybe you work in DPE.) Office 2007 can open anything from a text file to a DOCX, but you’d like to send a DOCX, just to look cool and to show your support for open non-proprietary technology.
But you only have a crude computer with no application software installed. Maybe it’s running Windows XP, or maybe even Win95. Or it could be a Mac, or a Linux box, or even some old DOS machine or an Apple II or something.
Sounds like you need to create a DOCX out of thin air. No problem. You’ll need three things to do it:
1) A folder-based file system. CP/M or TSO won’t do — ideally you want something designed in the last 25 years or so. I used XP.
2) A text editor. I used Notepad.
3) Some way to create a ZIP archive, and it needs to be a version that handles folders within the archive. I used WinZIP.
Here’s what to do …
First, create two folders. On my XP laptop, I just put them on the desktop. These folders should be named _rels and word.
Next create a text file at the same level as these folders (e.g., also on the desktop), and put this chunk of XML in it:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>
That’s just some XML that defines two content types: package relationships and a WordProcessingML document. Save the text file, and rename it [Content_Types].xml. You should then have two empty folders and a content-type file, all in the same folder or on your desktop.
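To see how a consumer uses this file, here is a minimal Python sketch of the lookup (Python and the helper name content_type_for are this example’s choices, not anything the format prescribes). An Override naming the exact part wins; otherwise the Default for the part’s file extension applies. That single rels Default is what will cover the .rels file created in the next step.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace of the [Content_Types].xml part.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

# The [Content_Types].xml from this article, held in a string for the sketch.
CONTENT_TYPES_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""


def content_type_for(part_name: str, types_xml: str) -> Optional[str]:
    """Look up a part's content type: an Override naming the exact part wins;
    otherwise use the Default registered for the part's file extension."""
    root = ET.fromstring(types_xml.encode("utf-8"))
    # Part names are compared case-insensitively.
    for override in root.findall(f"{{{CT_NS}}}Override"):
        if override.get("PartName", "").lower() == part_name.lower():
            return override.get("ContentType")
    last_segment = part_name.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return None
    extension = last_segment.rsplit(".", 1)[1].lower()
    for default in root.findall(f"{{{CT_NS}}}Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None


print(content_type_for("/_rels/.rels", CONTENT_TYPES_XML))
# application/vnd.openxmlformats-package.relationships+xml
print(content_type_for("/word/document.xml", CONTENT_TYPES_XML))
# application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml
```

A part such as a JPEG image would get None here, which is why adding an image means adding a content type for it.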
OK, we’ve created three of the five things we’ll need to make this document work. We’re over half done!
The next thing we need to create is the relationships file. Create a text file in the _rels folder, and put this content in it:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="MyRelationship" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>
That’s some XML that basically says “this package’s main officeDocument part is the document.xml file in the word folder.” Save this text file, and name it .rels. (Yes, just an extension. Note that in the content types we said, in essence, “anything with a .rels extension defines package relationships.”)
If you’re running Windows, the GUI shell won’t let you give a file a name that consists only of an extension, so you’ll need to rename it from the command prompt or find some other way. Deal with it.
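A consumer finds the main document, roughly, by reading this part and picking the relationship whose Type is officeDocument; the Id is only a label that must be unique within the file. A minimal Python sketch of that lookup, assuming a hypothetical helper name main_document_part and ignoring absolute targets and external relationships:

```python
import xml.etree.ElementTree as ET
from typing import Optional

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
)

# The _rels/.rels part from this article.
ROOT_RELS_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="MyRelationship" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""


def main_document_part(rels_xml: str) -> Optional[str]:
    """Return the part name of the package's officeDocument part, or None."""
    root = ET.fromstring(rels_xml.encode("utf-8"))
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        # Match on Type, not Id: the Id is just a label unique within this file.
        if rel.get("Type") == OFFICE_DOCUMENT_TYPE:
            # Targets in the package-level .rels resolve against the package root "/".
            return "/" + rel.get("Target", "").lstrip("/")
    return None


print(main_document_part(ROOT_RELS_XML))  # /word/document.xml
```

This is also why "MyRelationship" works as well as any other Id would.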
The final step is to create the document.xml file in the word folder. That one should contain this chunk of WordProcessingML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t xml:space="preserve">Hello World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
Note that we could also have a _rels folder within the word folder, holding a document.xml.rels file that defines relationships from document.xml to other parts. But we didn’t bother, because there are no other parts in this document for document.xml to have a relationship with. When you have no relationships, life’s simple.
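The nesting in document.xml is the core of WordprocessingML: the body holds paragraphs (w:p), a paragraph holds runs (w:r), and a run holds text (w:t). A small Python sketch that walks that structure to pull out each paragraph’s text; it assumes the namespace from the published ECMA-376 standard, http://schemas.openxmlformats.org/wordprocessingml/2006/main, so change W_NS if your document.xml declares a different one. The function name paragraph_texts is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML namespace as published in ECMA-376 (an assumption of this sketch).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

DOCUMENT_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t xml:space="preserve">Hello World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>"""


def paragraph_texts(document_xml: str) -> List[str]:
    """Return the text of each paragraph (w:p), joining the w:t text of its runs."""
    root = ET.fromstring(document_xml.encode("utf-8"))
    texts = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Concatenate every w:t under this paragraph, in document order.
        texts.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return texts


print(paragraph_texts(DOCUMENT_XML))  # ['Hello World!']
```

Splitting "Hello World!" across two runs would give the same paragraph text; runs exist so that each piece of text can carry its own formatting.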
That’s it, you’ve created the contents of a DOCX file. Now you just need to package it up. Go back to your top-level folder and ZIP the three pieces (the two folders and the content-types file) into an archive, so that they sit at the root of the archive rather than inside an enclosing folder. Then rename the archive to something like HelloWorld.docx, open it in Word 2007, and it should look something like this:
This document is missing many key parts and items. It has no document properties and no application properties. It declares only the two content types it actually uses. It has no headers, footers, styles, themes, tables, fonts, or other typical document elements. But it’s an Office Open XML document, and Word 2007 will open it without complaint.
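If you happen to have Python on the island (an assumption well beyond the Notepad-and-WinZIP toolkit above), the packaging step can be scripted with the standard zipfile module. This is a sketch, not the method used here: it writes the same three parts, naming each ZIP entry by its part name relative to the archive root. Its document.xml uses the ECMA-376 namespace with a w:document root, and build_docx is a hypothetical name.

```python
import zipfile

CONTENT_TYPES_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

ROOT_RELS_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="MyRelationship" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""

# document.xml using the ECMA-376 WordprocessingML namespace and a w:document root.
DOCUMENT_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t xml:space="preserve">Hello World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>"""


def build_docx(target) -> None:
    """Write the three parts into a ZIP archive at `target` (a file path or a
    writable binary file object). Entry names are part names relative to the
    archive root, with no enclosing folder."""
    parts = {
        "[Content_Types].xml": CONTENT_TYPES_XML,
        "_rels/.rels": ROOT_RELS_XML,
        "word/document.xml": DOCUMENT_XML,
    }
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, xml in parts.items():
            # Store each part as UTF-8, matching its XML declaration.
            zf.writestr(name, xml.encode("utf-8"))
```

Calling build_docx("HelloWorld.docx") writes the file to the current directory. zipfile doesn’t add separate entries for the folders; the paths in the entry names should be all a consumer needs.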
Suppose you want to add an image to this document. That gets a little messier. You have to add a content type for the image format (JPEG, say), define a relationship from document.xml to the image (in word/_rels/document.xml.rels), and insert some WordProcessingML into document.xml that refers to the image through that relationship’s ID. You also have to put the image file somewhere in the package. Word puts images in a media folder, which is a good idea, but you can just throw the file in the word folder if you’re feeling lazy and make the relationship point to it there. If you do all of those steps, you get a document that looks something like this in Word 2007:
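The content-type and relationship steps for an image are just more XML. As a rough Python sketch, here are the two parts that change: [Content_Types].xml gains a Default entry for the .jpeg extension, and document.xml gets its own relationships part with an image relationship. The relationship Id rIdImage1 and the target media/image1.jpeg are hypothetical names chosen for illustration.

```python
from typing import Dict

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
IMAGE_REL_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"

# Hypothetical location of the image, relative to the word/ folder.
IMAGE_TARGET = "media/image1.jpeg"


def image_parts(rel_id: str = "rIdImage1") -> Dict[str, str]:
    """Return the two XML parts that change when a JPEG is added, keyed by
    their names in the archive. rel_id is a hypothetical relationship Id."""
    content_types = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="{CT_NS}">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="jpeg" ContentType="image/jpeg"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""
    # Relationships *from* word/document.xml live in word/_rels/document.xml.rels,
    # and their relative targets resolve against the word/ folder.
    document_rels = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="{REL_NS}">
  <Relationship Id="{rel_id}" Type="{IMAGE_REL_TYPE}" Target="{IMAGE_TARGET}"/>
</Relationships>"""
    return {
        "[Content_Types].xml": content_types,
        "word/_rels/document.xml.rels": document_rels,
    }
```

The image bytes would go into the archive as word/media/image1.jpeg. What this sketch leaves out is the WordProcessingML that actually places the picture: a w:drawing element inside a run, whose nested a:blip element points at the image with r:embed="rIdImage1". That markup is considerably longer and isn’t reproduced here.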
So you can create a DOCX that Word 2007 (or any other Open XML consumer) will open, and you can do it without Office 2007, or prior versions of Office, or any other software written in the last decade for that matter. You can create it right out of thin air. This isn’t the easiest way to go about things, of course, and it’s certainly not a best practice, but it demonstrates the openness of the new Office Open XML file formats in a simple and straightforward way.
The obvious variation, to create a DOC file from thin air, will be left as an exercise for the reader.