Browse Definitions :
Definition

semi-structured data

Contributor(s): Ivy Wigmore

Semi-structured data is data that has not been organized into a specialized repository, such as a database, but that nevertheless has associated information, such as metadata, that makes it more amenable to processing than raw data.

The difference between structured data, unstructured data and semi-structured data:
Unstructured data has not been organized into a format that makes it easier to access and process. In reality, very little data is completely unstructured. Even things that are often considered unstructured data, such as documents and images, are structured to some extent. Structured data is basically the opposite of unstructured: It has been reformatted and its elements organized into a data structure so that elements can be addressed, organized and accessed in various combinations to make better use of the information. Semi-structured data lies somewhere between the two. It is not organized in a complex manner that makes sophisticated access and analysis possible; however, it may have information associated with it, such as metadata tagging, that allows elements contained to be addressed.

Here's an example: A Word document is generally considered to be unstructured data. However, you can add metadata tags in the form of keywords and other metadata that represent the document content and make it easier for that document to be found when people search for those terms -- the data is now semi-structured. Nevertheless, the document still lacks the complex organization of the database, so falls short of being fully structured data.

In reality, there is considerable overlap between the boundaries of the three categories, which are sometimes described collectively as the data continuum.

This was last updated in November 2014

Continue Reading About semi-structured data

Join the conversation

1 comment

Send me notifications when other members comment.

Please create a username to comment.

can you provide me the examples of semistructured data
Cancel

-ADS BY GOOGLE

File Extensions and File Formats

SearchCompliance

  • risk management

    Risk management is the process of identifying, assessing and controlling threats to an organization's capital and earnings.

  • compliance as a service (CaaS)

    Compliance as a Service (CaaS) is a cloud service service level agreement (SLA) that specified how a managed service provider (...

  • data protection impact assessment (DPIA)

    A data protection impact assessment (DPIA) is a process designed to help organizations determine how data processing systems, ...

SearchSecurity

  • spyware

    Spyware is a type of malicious software -- or malware -- that is installed on a computing device without the end user's knowledge.

  • application whitelisting

    Application whitelisting is the practice of specifying an index of approved software applications or executable files that are ...

  • botnet

    A botnet is a collection of internet-connected devices, which may include PCs, servers, mobile devices and internet of things ...

SearchHealthIT

SearchDisasterRecovery

  • business continuity plan (BCP)

    A business continuity plan (BCP) is a document that consists of the critical information an organization needs to continue ...

  • disaster recovery team

    A disaster recovery team is a group of individuals focused on planning, implementing, maintaining, auditing and testing an ...

  • cloud insurance

    Cloud insurance is any type of financial or data protection obtained by a cloud service provider. 

SearchStorage

  • DRAM (dynamic random access memory)

    Dynamic random access memory (DRAM) is a type of semiconductor memory that is typically used for the data or program code needed ...

  • RAID 10 (RAID 1+0)

    RAID 10, also known as RAID 1+0, is a RAID configuration that combines disk mirroring and disk striping to protect data.

  • PCIe SSD (PCIe solid-state drive)

    A PCIe SSD (PCIe solid-state drive) is a high-speed expansion card that attaches a computer to its peripherals.

Close