Browse Definitions:
Definition

document sanitization

Contributor(s): Ivy Wigmore

Document sanitization is the process of ensuring that only the intended information can be accessed from a document.

In addition to making sure the document text doesn’t openly divulge anything it shouldn’t, document sanitization includes removing document metadata that could pose a privacy or security risk. Document metadata can contain the names of authors and modifiers, the dates of creation and changes, file size, edit changes, revision histories and comment exchanges between authors and editors. Because that metadata may contain sensitive information, it's important safeguard it from unauthorized access. 

A common way to remove metadata from a document is to convert it to PDF format before releasing it; however, there are processes that must be followed to ensure the document contains no unintended information.  The National Security Agency (NSA) recommends the following six-step process for secure conversion and redaction of Word documents:

  1. Create a copy of the original document.
  2. Turn off “Track Changes” on the copy and remove all visible comments.
  3. Delete any sensitive information from the document that you wish to redact.
  4. Use the Microsoft Office Document Inspector to check for any unwanted metadata.
  5. Save the new document and convert it to a PDF file.
  6. Use the Sanitize Document tool in Acrobat Professional as a second check before releasing the redacted PDF.

See also: metadata management, metadata security

This was last updated in August 2014

Continue Reading About document sanitization

Join the conversation

1 comment

Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

Document sanitization can be automated now with simple add-on to your secure email or web gateway to ensure all documents leaving (or entering) your organization are clean.
Cancel

-ADS BY GOOGLE

File Extensions and File Formats

SearchCompliance

SearchSecurity

  • black hat

    Black hat refers to a hacker who breaks into a computer system or network with malicious intent.

  • copyright

    Copyright is a legal term describing ownership of control of the rights to the use and distribution of certain works of creative ...

  • keylogger (keystroke logger or system monitor)

    A keylogger, sometimes called a keystroke logger or system monitor, is a type of surveillance technology used to monitor and ...

SearchHealthIT

  • population health management (PHM)

    Population health management (PHM) is a discipline within the healthcare industry that studies and facilitates care delivery ...

  • ICD-10-PCS

    The International Classification of Diseases, 10th Revision, Procedure Coding System (ICD-10-PCS) is a U.S. cataloging system for...

  • U.S. National Library of Medicine (NLM)

    The U.S. National Library of Medicine (NLM) is the largest biomedical library in the world.

SearchDisasterRecovery

  • business continuity plan (BCP)

    A business continuity plan (BCP) is a document that consists of the critical information an organization needs to continue ...

  • call tree

    A call tree -- sometimes referred to as a phone tree -- is a telecommunications chain for notifying specific individuals of an ...

  • mass notification system (MNS)

    A mass notification system is a platform that sends one-way messages to inform employees and the public of an emergency.

SearchStorage

  • open source storage

    Open source storage is data storage software developed in a public, collaborative manner that permits the free use, distribution ...

  • CompactFlash card (CF card)

    A CompactFlash card (CF card) is a memory card format developed by SanDisk in 1994 that uses flash memory technology to store ...

  • email archiving

    Email archiving (also spelled e-mail archiving) is a systematic approach to saving and protecting the data contained in email ...

SearchSolidStateStorage

  • RRAM or ReRAM (resistive RAM)

    RRAM or ReRAM (resistive random access memory) is a form of nonvolatile storage that operates by changing the resistance of a ...

  • JEDEC

    JEDEC is a global industry group that develops open standards for microelectronics.

  • M.2 SSD

    An M.2 SSD is a solid-state drive (SSD) that conforms to a computer industry specification written for internally mounted storage...

SearchCloudStorage

  • RESTful API

    A RESTful application program interface breaks down a transaction to create a series of small modules, each of which addresses an...

  • cloud storage infrastructure

    Cloud storage infrastructure is the hardware and software framework that supports the computing requirements of a private or ...

  • Zadara VPSA and ZIOS

    Zadara Storage provides block, file or object storage with varying levels of compute and capacity through its ZIOS and VPSA ...

Close