Browse Definitions :
Definition

semi-structured data

Contributor(s): Ivy Wigmore

Semi-structured data is data that has not been organized into a specialized repository, such as a database, but that nevertheless has associated information, such as metadata, that makes it more amenable to processing than raw data.

The difference between structured data, unstructured data and semi-structured data:
Unstructured data has not been organized into a format that makes it easier to access and process. In reality, very little data is completely unstructured. Even things that are often considered unstructured data, such as documents and images, are structured to some extent. Structured data is basically the opposite of unstructured: It has been reformatted and its elements organized into a data structure so that elements can be addressed, organized and accessed in various combinations to make better use of the information. Semi-structured data lies somewhere between the two. It is not organized in a complex manner that makes sophisticated access and analysis possible; however, it may have information associated with it, such as metadata tagging, that allows elements contained to be addressed.

Here's an example: A Word document is generally considered to be unstructured data. However, you can add metadata tags in the form of keywords and other metadata that represent the document content and make it easier for that document to be found when people search for those terms -- the data is now semi-structured. Nevertheless, the document still lacks the complex organization of the database, so falls short of being fully structured data.

In reality, there is considerable overlap between the boundaries of the three categories, which are sometimes described collectively as the data continuum. 

Chris Selland explains the big data continuum:

 

This was last updated in November 2014

Continue Reading About semi-structured data

Join the conversation

1 comment

Send me notifications when other members comment.

Please create a username to comment.

can you provide me the examples of semistructured data
Cancel

-ADS BY GOOGLE

File Extensions and File Formats

SearchCompliance

  • smart contract

    A smart contract, also known as a cryptocontract, is a computer program that directly controls the transfer of digital currencies...

  • risk map (risk heat map)

    A risk map, also known as a risk heat map, is a data visualization tool for communicating specific risks an organization faces. A...

  • internal audit (IA)

    An internal audit (IA) is an organizational initiative to monitor and analyze its own business operations in order to determine ...

SearchSecurity

SearchHealthIT

  • Health IT (health information technology)

    Health IT (health information technology) is the area of IT involving the design, development, creation, use and maintenance of ...

  • fee-for-service (FFS)

    Fee-for-service (FFS) is a payment model in which doctors, hospitals, and medical practices charge separately for each service ...

  • biomedical informatics

    Biomedical informatics is the branch of health informatics that uses data to help clinicians, researchers and scientists improve ...

SearchDisasterRecovery

  • risk mitigation

    Risk mitigation is a strategy to prepare for and lessen the effects of threats faced by a data center.

  • ransomware recovery

    Ransomware recovery is the process of resuming options following a cyberattack that demands payment in exchange for unlocking ...

  • natural disaster recovery

    Natural disaster recovery is the process of recovering data and resuming business operations following a natural disaster.

SearchStorage

  • RAID 5

    RAID 5 is a redundant array of independent disks configuration that uses disk striping with parity.

  • non-volatile storage (NVS)

    Non-volatile storage (NVS) is a broad collection of technologies and devices that do not require a continuous power supply to ...

  • petabyte

    A petabyte is a measure of memory or data storage capacity that is equal to 2 to the 50th power of bytes.

Close