SequenceFile

Contributor(s): Matthew Haughn

A SequenceFile is a flat, binary file type that serves as a container for data to be used in Apache Hadoop distributed computing projects. SequenceFiles are used extensively with MapReduce.

Because Hadoop works best with large files, SequenceFiles are used to store and compress files that are smaller than the size Hadoop handles efficiently. Packaging small files this way can help reduce the disk space and I/O required.

A SequenceFile serves as a container for a sequence of records, each made up of a key and a value. When small files are packaged, the key identifies a file and the value holds that file's contents. SequenceFiles provide a Writer, a Reader and a Sorter class for writing, reading and sorting records by key, respectively. As an example, a SequenceFile might contain a massive number of log files for a server, where the key would be a timestamp and the value would be the entire log file. Stored individually, the small text files would be very inefficient in Hadoop. After packaging into SequenceFiles, however, they can be used effectively.
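The key-value packaging described above can be illustrated with a minimal Python sketch. It is not Hadoop's actual on-disk format or API: the function names `write_records` and `read_records`, the length-prefixed layout and the sample log data are all assumptions made for illustration. It only shows how many small files can be stored as a sequence of records in one container and read back in order.

```python
import io
import struct


def write_records(stream, records):
    """Append each (key, value) pair as two 4-byte lengths followed by the bytes."""
    for key, value in records:
        stream.write(struct.pack(">II", len(key), len(value)))
        stream.write(key)
        stream.write(value)


def read_records(stream):
    """Yield (key, value) pairs in the order they were written."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of the container
        if len(header) < 8:
            raise ValueError("truncated record header")
        key_len, value_len = struct.unpack(">II", header)
        key = stream.read(key_len)
        value = stream.read(value_len)
        yield key, value


# Hypothetical small log files: the key is a timestamp, the value is the whole file.
logs = [
    (b"2017-12-01T00:00:00", b"server started\n"),
    (b"2017-12-01T00:05:00", b"disk usage 71%\nbackup ok\n"),
]

buffer = io.BytesIO()          # stands in for one large container file
write_records(buffer, logs)
buffer.seek(0)
restored = list(read_records(buffer))
```

In this sketch, one container holds every log file, so a reader makes a single sequential pass instead of opening many small files.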

Beyond packaging files into a manageable size for Hadoop, SequenceFiles support compression. With record compression, only the value in each record is compressed. With block compression, keys and values are collected into blocks and compressed separately. The type of compression chosen determines the file format: uncompressed, record-compressed or block-compressed.
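The difference between compressing each value on its own and compressing keys and values together in blocks can be sketched as follows. This is a rough illustration using Python's `zlib`, not Hadoop's codecs or file layout; the helper names (`compress_per_record`, `compress_block` and so on) and the sample records are hypothetical.

```python
import zlib


def compress_per_record(records):
    """Compress every value separately; keys are stored as they are."""
    return [(key, zlib.compress(value)) for key, value in records]


def decompress_per_record(compressed):
    return [(key, zlib.decompress(value)) for key, value in compressed]


def _join(items):
    """Concatenate byte strings, prefixing each with its 4-byte length."""
    return b"".join(len(item).to_bytes(4, "big") + item for item in items)


def _split(blob):
    """Reverse _join: recover the list of byte strings."""
    items, pos = [], 0
    while pos < len(blob):
        size = int.from_bytes(blob[pos:pos + 4], "big")
        pos += 4
        items.append(blob[pos:pos + size])
        pos += size
    return items


def compress_block(records):
    """Collect all keys and all values into a block, then compress each part."""
    records = list(records)
    keys = _join(key for key, _ in records)
    values = _join(value for _, value in records)
    return zlib.compress(keys), zlib.compress(values)


def decompress_block(block):
    keys_blob, values_blob = block
    keys = _split(zlib.decompress(keys_blob))
    values = _split(zlib.decompress(values_blob))
    return list(zip(keys, values))


# Hypothetical, highly repetitive log records.
records = [
    (f"2017-12-01T00:{minute:02d}:00".encode(), b"GET /index.html 200\n" * 20)
    for minute in range(50)
]

per_record = compress_per_record(records)
block = compress_block(records)

per_record_size = sum(len(k) + len(v) for k, v in per_record)
block_size = len(block[0]) + len(block[1])
```

Comparing `per_record_size` with `block_size` on data like this gives a feel for why grouping many similar records before compressing them can save space, although the result depends on the data and the codec.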

This was last updated in December 2017
