White Paper
Microsoft Office Document Inspector
Metadata Management for Microsoft Office 2007
by Randall Farrar
With the advent of Microsoft Office 2007 (MSO07) and its addition of the Document Inspector, our clients (and prospects) are asking if there's a need for iScrub, our enterprise metadata management application. Has Microsoft, in one single blow, nullified many companies' investment in iScrub? If metadata scrubbing is now built-in, why use another application?
In this white paper, I compare two approaches to metadata management and show that the MSO07 Document Inspector is significantly lacking as an enterprise-metadata management tool. The first approach is the out-of-the-box MSO07 Document Inspector (DI) and its object model. The second approach is Esquire Innovations' metadata management application, iScrub.
MSO07 DI is Microsoft's response to the market's outcry about the hidden data that can so easily be stored in Microsoft Office documents. When sharing these files outside of the company or firm, there's a risk of disclosing discoverable, unintentional, confidential or hidden information that might be adverse to client representation or, at least, extremely embarrassing. Prior to MSO07 DI, Microsoft provided the “Remove Hidden Data Tool,” which was barely usable and kludgy at best, so the MSO07 DI was a needed addition.
MSO07 Document Inspector
Microsoft's idea behind the MSO07 DI is to provide a central location where users can inspect MSO07 documents for personal, hidden, or sensitive information. To view or remove this information, a user can use the built-in DI (see Figure 1). An organization can extend the DI with additional development using the DI Object Model.

Figure 1
The MSO07 DI is Composed of Three Modules
The MSO07 DI is composed of three modules that users can access to inspect and remove specific metadata from a document: MSO07 Word DI, MSO07 Excel DI, and MSO07 PowerPoint DI.
Metadata Elements for MSO07 Word DI
- Comments
- Revision marks from tracked changes
- Document version information
- Ink annotations
- Document properties, including information from the Summary, Statistics, and Custom tabs of the Document Properties dialog box
- E-mail headers
- Routing slips
- Send-for-review information
- Document server properties
- Document Management Policy information
- Databinding link information for databound fields (last value will be converted to text)
Note: Does not handle some linked fields such as IncludeText
- User name
- Template name
- Text that is formatted as hidden (a font effect that is available in the Font dialog box)
Metadata Elements for MSO07 Excel DI
- Comments
- Ink annotations
- Document properties, including information from the Summary, Statistics, and Custom tabs of the Document Properties dialog box
- E-mail headers
- Routing slips
- Send-for-review information
- Document server properties
- Document Management Policy information
- User name
- Printer path information
- Scenario comments
- File path for publishing Web pages
- Comments for defined names and table names
- Inactive external data connections
- Information in worksheet headers
- Information in worksheet footers
- Hidden rows
- Hidden columns that contain data
- Objects that are not visible because they are formatted as invisible
Metadata Elements for MSO07 PowerPoint DI
- Comments
- Ink annotations
- Document properties, including information from the Summary, Statistics, and Custom tabs of the Document Properties dialog box
- E-mail headers
- Routing slips
- Send-for-review information
- Document server properties
- Document Management Policy information
- File path for publishing Web pages
- Objects that are not visible because they are formatted as invisible
- Text that was added to the Notes section of a presentation
- Custom XML data that might be stored within a presentation
Removing Metadata from the MSO07 Document using DI
Once the user selects Inspect (see Figure 1), the DI dialog box displays the type of metadata found in the document. The MSO07 DI provides buttons to remove specific metadata elements contained in that document (see Figure 2).

Figure 2
Once the user has selected which metadata to remove, they can recheck the document for remaining metadata by selecting the Reinspect button (see Figure 2).
Extending the MSO07 DI
MSO07 DI can be extended using VBA or managed code (such as Visual Basic .NET). Microsoft has added a new DocumentInspectors collection to the object models in MSO07 Word (Document object), MSO07 Excel (Workbook object), and MSO07 PowerPoint (Presentation object). This means that an organization with the programming resources can use either VBA or .NET to develop its own custom DI modules.
iScrub
iScrub is the premier enterprise solution for metadata removal and metadata management in document-intensive organizations. iScrub uses sophisticated technologies to remove the visible document properties and to scrub the hard-to-reach file elements, such as the list of past authors (all document authors) and Deleted Text.
iScrub provides a centralized administration feature that allows firms to establish and control the metadata removal settings. This is called an enterprise-metadata management approach.
iScrub publishes the clean version of a document, separate from the original file, inside or outside of a document management system.
iScrub works with Outlook, Lotus Notes and GroupWise to prompt users to scrub e-mail attachments before sending them, automatically helping to prevent sensitive metadata from leaving the organization.
MSO07 DI Limitations
MSO07 DI lacks extensive out-of-the-box metadata management capabilities and is not suited to an enterprise-metadata management approach. The onus is on individual users to "inspect" their documents and then decide what to remove.
Given the Federal Rules of Civil Procedure provisions relating to electronically stored information, relying on MSO07 DI places the company or firm at risk of "...inadvertent production." The firm, not individual users, should decide how to manage a document's electronic information (metadata), and it should do so through an enterprise-wide approach.
MSO07 DI Removes Metadata from the Original
MSO07 DI does not publish a result document, which makes accidental removal of metadata from the original very easy. If the user unintentionally removes metadata using the MSO07 DI, there are metadata items whose removal cannot be “undone” (see Figure 3).

Figure 3
In firms where document collaboration and client work product are the currency, accidental metadata destruction can be quite costly. For instance, an attorney asks a secretary to send an agreement the attorney has been working on all night to a client. This particular document contains the attorney's and a colleague’s comments along with their tracked changes. The attorney tells the secretary to send the client a copy with the metadata scrubbed. The secretary uses MSO07 DI to inspect the document and notices that this document has “Revision marks and Comments” with a red EXCLAMATION POINT! (see Figure 4). This must not be good, so the secretary selects “Remove All” and then realizes in a panic that this was the original. The secretary tries to “undo” the removal and can’t.
MSO07 DI does not preserve the original and makes it too easy to lose important metadata from the original. In addition, MSO07 DI uses the term “Revision” for what the rest of Word calls Track Changes, which is confusing.

Figure 4
Below is a list of metadata the inspector removes that cannot be undone for MSO07 Word:
- Comments
- Revisions (Track Changes)
- Versions
- Annotations
- Custom Properties
- Template Name
- Statistics
- Data binding link information for data bound fields (last value will be converted to text)
CAUTION: MSO07 DI does not always remove personal information in MSO07 Word. In Office 2003, when personal information was removed the author info was removed from track changes. In MSO07 DI the author information is NOT removed.
Header Footer Removal is Destructive
Here’s a feature of MSO07 DI I just don’t get. When the “Headers, Footers and Watermarks” (see Figure 5) are removed, the MSO07 DI removes everything in the footer, including the page number.

Figure 5
For long documents such as agreements, contracts and corporate documents where the footers are complex, this can cause some major problems... and heartache! The removal can be undone, but if you forget and save the document first, the footer content is gone.
Not all Databinding Link Information Is Removed in MSO07 Word
All versions of Microsoft Word support fields that can contain linked data in the form of text, pictures and hyperlinks. These fields can reference files on a server, and MSO07 DI neither removes them nor unlinks them (converts the field to text).
Here are examples of link fields that are not removed from a document using the MSO07 DI (notice the server name and path information):
- { HYPERLINK "\\\\PRODEV\\People\\JDoe\\DOCS-" \l "609447-v18-Bylaws.DOC" }
- { INCLUDEPICTURE "\\\\PRODEV\\People\\JDoe\\iRedlineLogo.gif" \* MERGEFORMAT }
- { LINK Equation.3 "\\\\PRODEV\\People\\JDoe\\iRedlineLogo.gif" \p }
- { INCLUDETEXT "\\\\PRODEV\\People\\JDoe\\DOCS-#609447-v18-Bylaws.DOC" \* PRODEV }
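To make the exposure concrete, here is a minimal Python sketch that scans field-code text for link fields that reveal a server name. It is an illustration only, not how MSO07 DI or iScrub works internally. The function name `find_network_links` and the matching rule are assumptions made for this example; the sketch also assumes the field codes have already been extracted from the document as plain strings.

```python
import re

# Field types from the examples above that can carry a server path.
LINK_FIELD_TYPES = {"HYPERLINK", "INCLUDEPICTURE", "INCLUDETEXT", "LINK"}

# A UNC path: a run of two or more backslashes that starts a token
# (preceded by whitespace, a quote, or nothing), then a server name and
# another backslash. Word doubles backslashes inside field codes, so the
# run is usually four characters long.
UNC_SERVER = re.compile(r'(?<![^\s"])\\{2,}\s*([A-Za-z0-9_.-]+)\s*\\')


def find_network_links(field_codes):
    """Return (field_type, server) pairs for link fields exposing a UNC path."""
    hits = []
    for code in field_codes:
        words = code.strip("{} ").split()
        if not words:
            continue
        field_type = words[0].upper()
        if field_type not in LINK_FIELD_TYPES:
            continue  # e.g. PAGE or DATE fields carry no link target
        match = UNC_SERVER.search(code)
        if match:
            hits.append((field_type, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = [
        r'{ HYPERLINK "\\\\PRODEV\\People\\JDoe\\DOCS-" \l "609447-v18-Bylaws.DOC" }',
        r'{ PAGE }',
    ]
    print(find_network_links(sample))
```

A local path such as `C:\\Docs\\a.doc` is deliberately not reported, because its backslashes do not start a token.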
MSO07 Excel Formula Errors Will Occur when Hidden Rows, Columns and Worksheets are Deleted
If formulas in a workbook reference values in hidden rows, columns or worksheets, those formulas return an error (#REF!) once MSO07 DI removes the hidden items. iScrub, on the other hand, converts the formulas to values before unhiding or deleting these items.
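The difference between the two orderings can be shown with a toy model in Python. This is a sketch under simplifying assumptions, not Excel's calculation engine or iScrub's implementation: a sheet is a dictionary, a formula is represented as a list of cells to sum, and the names `evaluate` and `delete_hidden` are hypothetical.

```python
def evaluate(sheet, cell):
    """Return a cell's value; a formula is a list of referenced cells to sum."""
    content = sheet.get(cell, "#REF!")  # a missing cell is a broken reference
    if isinstance(content, list):
        values = [evaluate(sheet, ref) for ref in content]
        return "#REF!" if "#REF!" in values else sum(values)
    return content


def delete_hidden(sheet, hidden_cells, freeze_first):
    """Delete hidden cells, optionally converting formulas to values first."""
    result = dict(sheet)
    if freeze_first:
        # Replace every formula with its current value before deleting anything.
        result = {cell: evaluate(sheet, cell) for cell in sheet}
    for cell in hidden_cells:
        result.pop(cell, None)
    return result


if __name__ == "__main__":
    sheet = {"A1": 10, "A2": 5, "A3": ["A1", "A2"]}  # A3 = A1 + A2, A2 is hidden
    print(evaluate(delete_hidden(sheet, ["A2"], freeze_first=False), "A3"))
    print(evaluate(delete_hidden(sheet, ["A2"], freeze_first=True), "A3"))
```

Deleting first leaves the formula pointing at a cell that no longer exists; freezing first keeps the visible result intact, at the cost of the formula itself.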
Metadata Elements That Can’t Be Managed
MSO07 DI cannot manage many metadata elements. Unless a firm invests in development efforts to extend it, MSO07 DI is not robust enough to implement an enterprise-metadata management policy.
Additionally, the MSO07 DI does not show what the metadata is, or where it is. For instance, once the user inspects the document, DI will tell them that there are document properties (built-in and custom), but doesn't show what those document properties are. This metadata may contain case-supporting evidence that should be disclosed or discovered.
Because DI cannot display specific metadata, that metadata can easily be left in a document. If the document becomes part of an e-discovery process, this could prove costly and embarrassing, because at that point the metadata could be revealed outside the walls of the firm. The "...producing party must notify the opposing party and court and retrieve that information should privileged information be inadvertently produced."1
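As an illustration of the kind of visibility the DI dialog box does not give, the following Python sketch lists the built-in (core) document properties stored in an Office 2007 Open XML file (.docx, .xlsx or .pptx). It is not part of MSO07 DI or iScrub. The function name `read_core_properties` is made up for this example, and the sketch assumes the properties live in the package part `docProps/core.xml`, which is where Office 2007 normally writes them; legacy binary formats such as .doc are not covered.

```python
import xml.etree.ElementTree as ET
import zipfile

# Assumed location of the core properties part inside the package.
CORE_PART = "docProps/core.xml"


def read_core_properties(path):
    """Return {property_name: text} from an Open XML package's core properties."""
    with zipfile.ZipFile(path) as package:
        if CORE_PART not in package.namelist():
            return {}
        root = ET.fromstring(package.read(CORE_PART))
    properties = {}
    for child in root:
        name = child.tag.split("}", 1)[-1]  # drop the XML namespace prefix
        properties[name] = (child.text or "").strip()
    return properties


if __name__ == "__main__":
    import sys

    for name, value in read_core_properties(sys.argv[1]).items():
        print(f"{name}: {value}")
```

Run against a document, this prints entries such as the creator and last-modified-by names, which is exactly the information a reviewer needs to see before deciding what to remove.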
The table below shows the metadata MSO07 DI removes, compared to the metadata iScrub manages.
| Metadata Element | MSO07 DI Removes | iScrub2 Manages |
|---|---|---|
| Multiple Document Scrub (Batch Scrubbing) | No | Yes |
| Word | | |
| Comments | Yes | Yes |
| Change Author Names | No | Yes |
| Track Changes | Yes | Yes |
| Document server properties | Yes | Yes |
| Document Management Policy information | Yes | Yes |
| Keep Track Changes Remove Author | No | Yes |
| Revision Number | Yes | Yes |
| Versions | Yes | Yes |
| Annotations | Yes | Yes |
| Built-in Properties | Yes | Yes |
| Custom properties | Yes | Yes |
| Preserve specific Custom properties | No | Yes |
| Personal Information | Yes | Yes |
| Custom XML Data | Yes | Yes |
| E-mail headers | Yes | Yes |
| Hidden Text | Yes | Yes |
| Keep Track Changes Remove date and time | No | Yes |
| Bookmarks | No | Yes |
| Unused Styles | No | Yes |
| Normalize Custom Styles names | No | Yes |
| Set Compatibility | No | Yes |
| Diminutive Fonts | No | Yes |
| Document Variables | No | Yes |
| Embedded True Type Fonts | No | Yes |
| Field Codes | No | Yes |
| Hyperlinks | No | Yes |
| Hyperlink history | No | Yes |
| Include Text Fields That contain network paths | No | Yes |
| Invisible Ink | No | Yes |
| Linguistic Data | No | Yes |
| Linked Objects | No | Yes |
| Random Number | No | Yes |
| Routing Slips | Yes | Yes |
| Smart Tags | No | Yes |
| Style Sheets | No | Yes |
| IncludePicture Fields | No | Yes |
| Edit Time | Yes | Yes |
| Print Date | No | Yes |
| Creation Date | No | Yes |
| Modified Date | No | Yes |
| Convert Legacy document to Docx | No | Yes |
| Send-for-review information | Yes | Yes |
| Template name | Yes | Yes |
| Excel | | |
| Comments | Yes | Yes |
| All external data connections | No | Yes |
| Keep Comments Remove Author | No | Yes |
| Comments for defined names and table names | Yes | Yes |
| Annotations | Yes | Yes |
| Built-in Properties | Yes | Yes |
| Custom properties | Yes | Yes |
| E-mail headers | Yes | Yes |
| Personal Information | Yes | Yes |
| Custom XML Data | Yes | Yes |
| Document server properties | Yes | Yes |
| Document Management Policy information | Yes | Yes |
| Headers and Footers | Yes | Yes |
| Headers and Footers | No | Yes |
| Delete Hidden Rows and Columns | Yes | Yes |
| Unhide Hidden Rows and Columns | No | Yes |
| Delete Hidden Sheets | No | Yes |
| Unhide Hidden Sheets | No | Yes |
| Linked Objects | No | Yes |
| Invisible Objects. Note: DI cannot detect text that was hidden by other methods (for example, white text on a white background). | Yes | Yes |
| Printer path information | Yes | Yes |
| Track Changes | No | Yes |
| Custom Number Formats | No | Yes |
| Custom Style | No | Yes |
| Custom Views | No | Yes |
| Diminutive Fonts | No | Yes |
| External Links | No | Yes |
| Fonts Matching Cell Color | No | Yes |
| Formulas | No | Yes |
| Hyperlinks | No | Yes |
| Hyperlink history | No | Yes |
| Normalize Sheet Names | No | Yes |
| Pivot Tables – disable refresh | No | Yes |
| Pivot Tables – remove cache Data | No | Yes |
| Pivot Tables – remove Data Connection | No | Yes |
| Pivot Tables – remove Refresh Authors | No | Yes |
| Range Names | No | Yes |
| Scenarios | No | Yes |
| Smart Tags | No | Yes |
| PowerPoint | | |
| Comments | Yes | Yes |
| Annotations | Yes | Yes |
| Built-in Properties | Yes | Yes |
| Custom properties | Yes | Yes |
| E-mail headers | Yes | Yes |
| Personal Information | Yes | Yes |
| Custom XML Data | Yes | Yes |
| Invisible On-Slide Content | Yes | Yes |
| Document server properties | Yes | Yes |
| Document Management Policy information | Yes | Yes |
| Presentation Notes | Yes | Yes |
| Headers Footers | No | Yes |
| Delete Hidden Slides | No | Yes |
| Unhide Hidden Slides | No | Yes |
| Hyperlinks | No | Yes |
| Hyperlink history | No | Yes |
| Linked Objects | No | Yes |
| Notes Master | No | Yes |
| Slide Master | No | Yes |
| PDF documents | No | Yes |
| Document Title | No | Yes |
| Document Author | No | Yes |
| Document Subject | No | Yes |
| Keywords | No | Yes |
| Application Creator | No | Yes |
| Application Producer | No | Yes |
Different Levels of Metadata Management
Metadata should be managed differently depending on who will receive the document or what its intended purpose is. If a document is going to a client or collaborator, then perhaps only certain metadata elements need to be removed. If the document is going to an adverse party, then most (if not all) of the document's metadata should be removed. A company may wish to provide several standardized levels of metadata management to its users, thus removing the decision-making responsibility from the individual and transforming it into a conscious enterprise approach.
MSO07 DI does not provide different levels of inspection and removal. This disadvantage makes MSO07 DI a poor choice for enterprise-metadata management. MSO07 DI relies on each user to understand and remove the metadata components they believe to be potentially damaging. To be effective, it therefore requires extensive user education and training.
iScrub enables a company to set up fixed standards for metadata removal and enforce those standards. There are up to five different levels of scrubbing. Users simply select one of the levels available to them, so there is no guesswork and little training is needed.
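The idea of centrally defined scrub levels can be sketched as a small policy table in Python. This is a hypothetical illustration, not iScrub's actual configuration format: the level names (`client`, `adverse_party`), the element identifiers, and the function `plan_scrub` are all assumptions made for this example, and only two of the possible levels are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrubLevel:
    """One centrally administered level: the elements it removes."""
    name: str
    remove: frozenset


# Hypothetical firm-wide policy, defined once by an administrator.
POLICY = {
    "client": ScrubLevel(
        "client",
        frozenset({"personal_information", "document_variables"}),
    ),
    "adverse_party": ScrubLevel(
        "adverse_party",
        frozenset({
            "comments", "track_changes", "hidden_text",
            "personal_information", "document_variables",
        }),
    ),
}


def plan_scrub(level_name, found_elements, policy=POLICY):
    """Return the sorted list of found elements that the chosen level removes."""
    level = policy[level_name]
    return sorted(set(found_elements) & level.remove)


if __name__ == "__main__":
    found = ["comments", "personal_information", "track_changes"]
    print(plan_scrub("client", found))
    print(plan_scrub("adverse_party", found))
```

The user's only decision is which level applies to the recipient; what each level removes is fixed by the firm.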
Preventing Metadata Disclosure for Email Attachments
iScrub prompts the user to scrub documents attached to outgoing e-mail. When a message carries an attachment, iScrub detects it and removes the metadata as the document exits the company’s electronic walls.
MSO07 DI only works within its intrinsic Office Object Model, and will not prompt users to remove the metadata from within Outlook3. Once again, the firm must put its trust in the individual user: trust in the user's memory to actually apply the DI before attaching the document, and trust in the user's judgment or knowledge to remove the proper elements for that specific transaction.
Lack of E-Discovery Features
As more and more companies institute e-discovery processes for managing internal electronic information (metadata), the ability to report on what metadata is in a document and what has been removed becomes paramount. MSO07 DI lacks any reporting capability and, in fact, the user has no idea what has been removed or where it was in the document.
iScrub for Office 2007, on the other hand, will provide an XML output file that can be utilized in any number of ways. iScrub's report will detail all the metadata in the document and also report on what was removed.
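To show what a machine-readable scrub report might look like, here is a short Python sketch that builds one with the standard library. The element and attribute names (`scrubReport`, `metadata`, `status`) and the function `build_report` are invented for this illustration; iScrub's actual XML schema is not described in this paper and may differ.

```python
import xml.etree.ElementTree as ET


def build_report(document_name, found, removed):
    """Return an XML string listing each found element as removed or retained."""
    root = ET.Element("scrubReport", document=document_name)
    for element in sorted(found):
        status = "removed" if element in removed else "retained"
        ET.SubElement(root, "metadata", name=element, status=status)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_report("Bylaws.docx", {"comments", "bookmarks"}, {"comments"}))
```

Because the output is plain XML, it can be archived with the matter file or loaded into whatever e-discovery tooling the firm already uses.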
Summary
MSO07 DI is significantly lacking as an enterprise-metadata management tool. The limited number of metadata elements that can be removed (much less viewed or actually managed) makes it a poor choice for document-intensive organizations that truly need to manage their discoverable metadata. Extending the MSO07 DI into a richer metadata management model would cost far more, in both money and effort, than investing in a proven metadata product such as iScrub, which does significantly more out of the box at a much lower overall cost.
1 Michele C.S. Lange, Esq. "New FRCP Rules: What Does it Mean for You" MSBA Computer and Technology Law Section. December 01, 2006, http://mntech.typepad.com/msba/2006/12/new_frcp_rules_.html
2 iScrub version 5 for Microsoft Office 2007
3 iScrub also works from within Lotus Notes and GroupWise




