Browse Definitions:
Definition

document sanitization

Contributor(s): Ivy Wigmore

Document sanitization is the process of ensuring that only the intended information can be accessed from a document.

In addition to making sure the document text doesn’t openly divulge anything it shouldn’t, document sanitization includes removing document metadata that could pose a privacy or security risk. Document metadata can contain the names of authors and modifiers, the dates of creation and changes, file size, edit changes, revision histories and comment exchanges between authors and editors. Because that metadata may contain sensitive information, it's important safeguard it from unauthorized access. 

A common way to remove metadata from a document is to convert it to PDF format before releasing it; however, there are processes that must be followed to ensure the document contains no unintended information.  The National Security Agency (NSA) recommends the following six-step process for secure conversion and redaction of Word documents:

  1. Create a copy of the original document.
  2. Turn off “Track Changes” on the copy and remove all visible comments.
  3. Delete any sensitive information from the document that you wish to redact.
  4. Use the Microsoft Office Document Inspector to check for any unwanted metadata.
  5. Save the new document and convert it to a PDF file.
  6. Use the Sanitize Document tool in Acrobat Professional as a second check before releasing the redacted PDF.

See also: metadata management, metadata security

This was last updated in August 2014

Continue Reading About document sanitization

Join the conversation

1 comment

Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

Document sanitization can be automated now with simple add-on to your secure email or web gateway to ensure all documents leaving (or entering) your organization are clean.
Cancel

-ADS BY GOOGLE

File Extensions and File Formats

SearchCompliance

  • PCAOB (Public Company Accounting Oversight Board)

    The Public Company Accounting Oversight Board (PCAOB) is a Congressionally-established nonprofit that assesses audits of public ...

  • cyborg anthropologist

    A cyborg anthropologist is an individual who studies the interaction between humans and technology, observing how technology can ...

  • RegTech

    RegTech, or regulatory technology, is a term used to describe technology that is used to help streamline the process of ...

SearchSecurity

  • email spam

    Email spam, or junk email, is unsolicited bulk messages sent through email with commercial, fraudulent or malicious intent.

  • distributed denial of service (DDoS) attack

    A distributed denial-of-service attack occurs when an attack originates from multiple computers or devices, usually from multiple...

  • application whitelisting

    Application whitelisting is the practice of identifying applications that have been deemed safe for execution and restricting all...

SearchHealthIT

  • national provider identifier (NPI)

    A national provider identifier (NPI) is a unique ten-digit identification number required by HIPAA for covered healthcare ...

  • athenahealth Inc.

    Based in Watertown, Mass., athenahealth Inc. is a leading vendor of cloud-based EHRs for small to medium-sized physician ...

  • Affordable Care Act (ACA or Obamacare)

    The Affordable Care Act (ACA) is legislation passed in 2010 that changed how uninsured Americans enroll in and receive healthcare...

SearchDisasterRecovery

  • disaster recovery as a service (DRaaS)

    One approach to a strong disaster recovery plan is DRaaS, where companies offload data replication and restoration ...

  • data recovery

    Data recovery restores data that has been lost, accidentally deleted, corrupted or made inaccessible. Learn how data recovery ...

  • disaster recovery plan (DRP)

    A company's disaster recovery policy is enhanced with a documented DR plan that formulates strategies, and outlines preparation ...

SearchStorage

  • GlusterFS (Gluster File System)

    GlusterFS (Gluster File System) is an open source distributed file system that can scale out in building-block fashion to store ...

  • virtual memory

    Virtual memory is a memory management capability of an OS that allows a computer to compensate for physical memory shortages by ...

  • yottabyte (YB)

    A yottabyte is a measure of theoretical storage capacity and is 2 to the 80th power bytes, or, in decimal, approximately 1,000 ...

SearchSolidStateStorage

  • PCIe SSD (PCIe solid-state drive)

    A PCIe SSD (PCIe solid-state drive) is a high-speed expansion card that attaches a computer to its peripherals.

  • SSD caching

    SSD caching, also known as flash caching, is the temporary storage of data on NAND flash memory chips in a solid-state drive so ...

  • NVDIMM (Non-Volatile Dual In-line Memory Module)

    An NVDIMM (non-volatile dual in-line memory module) is hybrid computer memory that retains data during a service outage.

SearchCloudStorage

  • RESTful API

    A RESTful application program interface breaks down a transaction to create a series of small modules, each of which addresses an...

  • cloud storage infrastructure

    Cloud storage infrastructure is the hardware and software framework that supports the computing requirements of a private or ...

  • Zadara VPSA and ZIOS

    Zadara Storage provides block, file or object storage with varying levels of compute and capacity through its ZIOS and VPSA ...

Close