Browse Definitions:
Definition

semi-structured data

Contributor(s): Ivy Wigmore

Semi-structured data is data that has not been organized into a specialized repository, such as a database, but that nevertheless has associated information, such as metadata, that makes it more amenable to processing than raw data.

The difference between structured data, unstructured data and semi-structured data:
Unstructured data has not been organized into a format that makes it easier to access and process. In reality, very little data is completely unstructured. Even things that are often considered unstructured data, such as documents and images, are structured to some extent. Structured data is basically the opposite of unstructured: It has been reformatted and its elements organized into a data structure so that elements can be addressed, organized and accessed in various combinations to make better use of the information. Semi-structured data lies somewhere between the two. It is not organized in a complex manner that makes sophisticated access and analysis possible; however, it may have information associated with it, such as metadata tagging, that allows elements contained to be addressed.

Here's an example: A Word document is generally considered to be unstructured data. However, you can add metadata tags in the form of keywords and other metadata that represent the document content and make it easier for that document to be found when people search for those terms -- the data is now semi-structured. Nevertheless, the document still lacks the complex organization of the database, so falls short of being fully structured data.

In reality, there is considerable overlap between the boundaries of the three categories, which are sometimes described collectively as the data continuum. 

Chris Selland explains the big data continuum:

 

This was last updated in November 2014 ???publishDate.suggestedBy???

Continue Reading About semi-structured data

Join the conversation

1 comment

Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

can you provide me the examples of semistructured data
Cancel

-ADS BY GOOGLE

File Extensions and File Formats

Powered by:

SearchCompliance

  • risk map (risk heat map)

    A risk map, also known as a risk heat map, is a data visualization tool for communicating specific risks an organization faces. A...

  • internal audit (IA)

    An internal audit (IA) is an organizational initiative to monitor and analyze its own business operations in order to determine ...

  • pure risk (absolute risk)

    Pure risk, also called absolute risk, is a category of threat that is beyond human control and has only one possible outcome if ...

SearchCloudProvider

  • cloud ecosystem

    A cloud ecosystem is a complex system of interdependent components that all work together to enable cloud services.

  • cloud services

    Cloud services is an umbrella term that may refer to a variety of resources provided over the internet, or to professional ...

  • uncloud (de-cloud)

    The term uncloud describes the action or process of removing applications and data from a cloud computing platform.

SearchSecurity

  • cyberextortion

    Cyberextortion is a crime involving an attack or threat of an attack coupled with a demand for money or some other response in ...

  • Cybercrime

    Cybercrime is any criminal activity that involves a computer, networked device or a network.

  • National Security Agency (NSA)

    The National Security Agency is the official U.S. cryptologic organization of the United States Intelligence Community under the ...

SearchHealthIT

  • Practice Fusion

    Practice Fusion Inc. is a San Francisco-based company that developed a free electronic health record (EHR) system available to ...

  • RHIA (Registered Health Information Administrator)

    An RHIA, or registered health information administrator, is a certified professional who oversees the creation and use of patient...

  • 21st Century Cures Act

    The 21st Century Cures Act is a wide-ranging healthcare bill that funds medical research and development, medical device ...

SearchDisasterRecovery

SearchStorage

  • Random Access Memory (RAM)

    Random Access Memory (RAM) is the hardware in a computing device where the operating system (OS), application programs and data ...

  • floating gate transistor (FGT)

    A floating gate transistor (FGT) is a complementary metal-oxide semiconductor (CMOS) technology capable of holding an electrical ...

  • bad block

    A bad block is an area of storage media that is no longer reliable for storing and retrieving data because it has been physically...

SearchSolidStateStorage

  • hybrid hard disk drive (HDD)

    A hybrid hard disk drive is an electromechanical spinning hard disk that contains some amount of NAND Flash memory.

Close