Web-Based Tracking Strings
It is possible to add a “Web Based Tracker String” (WBTS) to Microsoft Word documents. A WBTS could allow an author to track where a document is being read and how often. In addition, the author can watch how a “tracked” document is passed from one person to another or from one organization to another.
WBTSs are made possible by the ability of Microsoft Word documents to link to an image file that is located on a remote Web server. Because only the URL of the image is stored in the document and not the image itself, Microsoft Word must fetch the image from the Web server every time the document is opened. This image linking feature puts the remote server in a position to monitor when and where a document file is being opened. The server knows the IP address and host name of the computer that is opening the document. A host name will typically include a company name if the computer is located at a business. The host name of a home computer usually includes the name of the user’s Internet Service Provider (ISP). (This is, of course, a moot point if the document never has contact with the Internet.)
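To make concrete what the remote server learns from a single image request, here is a minimal sketch in Python. It assumes the server writes its access log in the widely used Common Log Format; the sample line, the address, and the image path are invented for illustration, and the names parse_hit and Hit are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical log line in Common Log Format. The address and the
# image path are made up for illustration.
SAMPLE_LINE = (
    '192.0.2.44 - - [30/Aug/2000:14:05:12 -0400] '
    '"GET /images/pixel.gif HTTP/1.0" 200 43'
)

# client, identity, user, [timestamp], "method path protocol", status, size
LOG_PATTERN = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+$'
)


class Hit(NamedTuple):
    client: str  # IP address, or host name if the server resolves names
    when: str    # timestamp exactly as written by the server
    path: str    # requested URL path, including any query string


def parse_hit(line: str) -> Optional[Hit]:
    """Return the fields of one log line, or None if it does not parse."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return Hit(match.group("client"), match.group("when"), match.group("path"))


if __name__ == "__main__":
    print(parse_hit(SAMPLE_LINE))
```

Each time a tagged document is opened on a connected computer, one such line would be recorded: who asked (the client field), when, and for which image.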
An additional issue, and one that could magnify the potential for surveillance, is that WBTSs in Word documents can also read and write browser cookies belonging to Internet Explorer. Cookies could allow an author to match the viewer of a Word document to that person’s visits to the author’s Web site.
WBTSs are used extensively today by Internet advertising companies on Web pages and in HTML-based e-mail messages for tracking. They are typically 1 pixel by 1 pixel in size, which makes them invisible on the screen and disguises the fact that they are used for tracking. Short of removing the feature that allows linking to Web images in Microsoft Word, there does not appear to be a good preventive solution. As a partial measure, a software patch could disable cookies inside Word documents. In addition to Word documents, WBTSs can also be used in Excel 2000 and PowerPoint 2000 documents.
One reason to use this tracking ability is to monitor the path of a confidential document, either within or beyond a company’s computer network. The confidential document could be “instructed” to alert someone each time it is opened. If the company’s Web server ever received a “server hit” for the WBTS from an IP address outside the organization, the company would learn immediately about the leak. Because the server log would include the host name of the computer where the document was opened, the company could tell whether the organization that received the leaked document was a competitor, a media outlet, or something else. All original copies of a confidential document could also be numbered so that the company could trace the source of a leak. A unique serial number could be encoded in the query string of the WBTS’s URL. If the document is leaked, the server hit for the WBTS will indicate which copy was leaked. A serial number could be added to a WBTS either manually, right before a copy of the document is saved, or automatically through a simple utility program. The utility program would scan a document for the WBTS’s URL and add a serial number to the query string. A Perl script (Perl being a computer language much used on the Internet these days) of fewer than 20 lines could easily be written to do this sort of serialization.
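The serialization step can be sketched in a few lines. The example below uses Python rather than Perl, and it covers only the URL handling: stamping a serial number into the query string, reading it back from a server hit, and checking whether a hit came from outside the organization. The parameter name "serial", the example URLs, and the internal network range are assumptions made for illustration; locating and rewriting the URL inside an actual Word file is not shown.

```python
import ipaddress
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_serial(url: str, serial: str, param: str = "serial") -> str:
    """Return url with param=serial in its query string.

    Any earlier value of the same parameter is replaced, so a copy
    carries exactly one serial number.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, serial))
    return urlunsplit(parts._replace(query=urlencode(query)))


def serial_from_hit(path: str, param: str = "serial") -> Optional[str]:
    """Extract the serial number from the path recorded for a server hit."""
    return dict(parse_qsl(urlsplit(path).query)).get(param)


def is_outside(client_ip: str, internal_network: str) -> bool:
    """True if client_ip does not belong to the organization's network."""
    network = ipaddress.ip_network(internal_network)
    return ipaddress.ip_address(client_ip) not in network


if __name__ == "__main__":
    tagged = add_serial("http://www.example.com/images/pixel.gif", "0042")
    print(tagged)
    print(serial_from_hit("/images/pixel.gif?serial=0042"))
    print(is_outside("198.51.100.7", "10.0.0.0/8"))
```

A hit whose address falls outside the internal range, combined with the serial number recovered from the request, would identify both that a leak occurred and which numbered copy was involved.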
Another use of WBTSs in Word documents is to detect copyright infringement. For example, a publishing company could “bug” all outgoing copies of its newsletter. (The ÆGIS e-journal does not do this, as we encourage sharing of our experience.) The WBTS in a newsletter could contain a unique customer ID number to detect how widely an individual copy of the newsletter is copied and distributed.
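As a rough illustration of how a publisher might read such hits, the Python sketch below counts how many distinct computers requested the image for each customer ID. The parameter name "cust", the sample addresses, and the function name readers_per_customer are hypothetical; the input is assumed to be (client, path) pairs already extracted from the server log.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple
from urllib.parse import parse_qsl, urlsplit


def readers_per_customer(
    hits: Iterable[Tuple[str, str]], param: str = "cust"
) -> Dict[str, int]:
    """Count distinct client addresses seen for each customer ID.

    hits is an iterable of (client, path) pairs taken from the log.
    Hits whose path carries no customer ID are ignored.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for client, path in hits:
        customer = dict(parse_qsl(urlsplit(path).query)).get(param)
        if customer is not None:
            seen[customer].add(client)
    return {customer: len(clients) for customer, clients in seen.items()}


if __name__ == "__main__":
    sample = [
        ("192.0.2.1", "/n.gif?cust=A17"),
        ("192.0.2.1", "/n.gif?cust=A17"),     # same computer, opened twice
        ("198.51.100.9", "/n.gif?cust=A17"),  # a second computer
        ("203.0.113.5", "/n.gif?cust=B02"),
    ]
    print(readers_per_customer(sample))
```

A customer ID that shows up from many different addresses suggests that one subscriber’s copy has been passed around, although shared proxies and changing dial-up addresses would blur the count.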
A third possible use of WBTSs is for market research. For example, a company could place WBTSs in a press release distributed as a Word document. The server log hits for the WBTS would then tell the company which organizations have actually viewed the press release. The company could also observe how a press release is passed along within an organization, or to other organizations.
In an academic setting, WBTSs might be used to detect plagiarism. A document could be tagged before it is distributed. An invisible WBTS could be placed within each paragraph in the document. If text were to be cut and pasted from the document, it is likely that a WBTS would be picked up as well and copied into the new document.
The use of WBTSs is not an issue unique to Word. Any file format that supports automatic linking to Web pages or images could lead to the same problem. Software engineers should take this privacy issue into consideration when designing new file formats. This issue is potentially critical for music file formats such as MP3 files where piracy concerns are high. For example, it is easy to imagine an extended MP3 file format that supports embedded HTML for showing song credits, cover artwork, lyrics, and so on. The embedded HTML with embedded WBTSs could also be used to track how many times a song is played and by which computer, identified by its IP address.
Recommendations: Short of removing the ability of Word documents to link to Web images, there is no real way to prevent Word documents from being tracked using WBTSs, and this linking ability is a useful feature. However, Web browser cookies could be disabled inside Word documents. There appears to be very little need for cookies outside a Web browser. In general, cookies should be disabled by default any time Internet Explorer is reused inside other applications such as Word, Excel, or Outlook.
Users concerned about being tracked can use a firewall such as ZoneAlarm (available free at http://www.zonelabs.com) to warn about WBTSs in Word documents. ZoneAlarm monitors all software and warns when a program that you have not explicitly authorized attempts to access the Internet. (In this case it will ask if you want Word to be able to access the Internet. The prudent answer is NO.) ZoneAlarm is designed to catch Trojan horses and spyware. However, because Word typically does not access the Internet, ZoneAlarm can also be used to catch “tagged” Word documents.
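A complementary check, offered only as a rough sketch, is to look inside a document file for Web addresses before opening it. The Python example below scans raw bytes for http:// or https:// URLs stored either as 8-bit text or as UTF-16 text. It rests on the assumption that the link is stored uncompressed in the file; it cannot tell an ordinary hyperlink from a tracking image, so any result is a prompt for closer inspection rather than proof. The function name find_urls and the sample data are hypothetical.

```python
import re
from typing import List

# URL stored as plain 8-bit text.
ASCII_URL = re.compile(rb'https?://[^\x00-\x20"<>\x7f-\xff]+')

# The same URL stored as UTF-16LE: each character is followed by a zero byte.
UTF16_URL = re.compile(
    rb'h\x00t\x00t\x00p\x00(?:s\x00)?:\x00/\x00/\x00'
    rb'(?:[^\x00-\x20"<>\x7f-\xff]\x00)+'
)


def find_urls(data: bytes) -> List[str]:
    """Return the distinct URLs found in data, sorted alphabetically."""
    found = set()
    for match in ASCII_URL.finditer(data):
        found.add(match.group().decode("ascii"))
    for match in UTF16_URL.finditer(data):
        found.add(match.group().decode("utf-16-le"))
    return sorted(found)


if __name__ == "__main__":
    # Made-up bytes standing in for the contents of a document file.
    sample = (
        b"\x00\x01junk http://www.example.com/pixel.gif?serial=0042\x00more"
        + 'INCLUDEPICTURE "http://tracker.example.org/b.gif"'.encode("utf-16-le")
    )
    for url in find_urls(sample):
        print(url)
```

To check a real file, its contents could be read with open(path, "rb").read() and passed to find_urls; an image URL carrying a long query string would be the most suspicious kind of result.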