Browse Definitions:
Definition

SequenceFile

Contributor(s): Matthew Haughn

A SequenceFile is a flat, binary file type that serves as a container for data to be used in Apache Hadoop distributed computing projects. SequenceFiles are used extensively with MapReduce.

Since Hadoop functions best with larger files, SequenceFiles are used to store and compress files that are smaller than the optimum size for operating efficiently with Hadoop, which can help reduce required disk space capacity and I/O  requirements.

SequenceFiles serve as a container for a sequence of files. Keys are listed for reference and values, and the contents of the file are referenced in a given key. SequenceFiles support a Writer, a Reader and a Sorter class for respective functions in relation to keys. As an example, a SequenceFile might contain a massive number of log files for a server where the key would be a timestamp and the value would be the entire log file. Normally, the small text files would be very inefficient in Hadoop. After packaging into SequenceFiles, however, they can be used effectively.

Beyond packaging files into a manageable size for Hadoop, SequenceFiles support compression of the keys, the values or both.  When both are compressed, the file keys and values are collected into blocks and separately compressed. The type of compression chosen determines the file format.

This was last updated in December 2017

Continue Reading About SequenceFile

Join the conversation

1 comment

Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

What steps are you taking to monitor Hadoop processing and ensure that big data workloads run efficiently?
Cancel

-ADS BY GOOGLE

File Extensions and File Formats

Powered by:

SearchCompliance

  • smart contract

    A smart contract, also known as a cryptocontract, is a computer program that directly controls the transfer of digital currencies...

  • risk map (risk heat map)

    A risk map, also known as a risk heat map, is a data visualization tool for communicating specific risks an organization faces. A...

  • internal audit (IA)

    An internal audit (IA) is an organizational initiative to monitor and analyze its own business operations in order to determine ...

SearchSecurity

SearchHealthIT

SearchDisasterRecovery

  • incident management plan (IMP)

    An incident management plan (IMP), sometimes called an incident response plan or emergency management plan, is a document that ...

  • crisis communication

    Crisis communication is a method of corresponding with people and organizations during a disruptive event to provide them with ...

  • Zerto

    Zerto is a storage software vendor that specializes in enterprise-class business continuity and disaster recovery in virtual and ...

SearchStorage

  • SSD write cycle

    An SSD write cycle is the process of programming data to a NAND flash memory chip in a solid-state storage device.

  • data storage

    Data storage is the collective methods and technologies that capture and retain digital information on electromagnetic, optical ...

  • hard disk

    A hard disk is part of a unit -- often called a disk drive, hard drive or hard disk drive -- that stores and provides relatively ...

SearchSolidStateStorage

  • hybrid hard disk drive (HDD)

    A hybrid hard disk drive is an electromechanical spinning hard disk that contains some amount of NAND Flash memory.

Close