Browse Definitions :
Definition

SequenceFile

A SequenceFile is a flat, binary file type that serves as a container for data to be used in Apache Hadoop distributed computing projects. SequenceFiles are used extensively with MapReduce.

Since Hadoop functions best with larger files, SequenceFiles are used to store and compress files that are smaller than the optimum size for operating efficiently with Hadoop, which can help reduce required disk space capacity and I/O  requirements.

SequenceFiles serve as a container for a sequence of files. Keys are listed for reference and values, and the contents of the file are referenced in a given key. SequenceFiles support a Writer, a Reader and a Sorter class for respective functions in relation to keys. As an example, a SequenceFile might contain a massive number of log files for a server where the key would be a timestamp and the value would be the entire log file. Normally, the small text files would be very inefficient in Hadoop. After packaging into SequenceFiles, however, they can be used effectively.

Beyond packaging files into a manageable size for Hadoop, SequenceFiles support compression of the keys, the values or both.  When both are compressed, the file keys and values are collected into blocks and separately compressed. The type of compression chosen determines the file format.

This was last updated in December 2017

Continue Reading About SequenceFile

SearchCompliance
  • risk reporting

    Risk reporting is a method of identifying risks tied to or potentially impacting an organization's business processes.

  • risk avoidance

    Risk avoidance is the elimination of hazards, activities and exposures that can negatively affect an organization and its assets.

  • risk profile

    A risk profile is a quantitative analysis of the types of threats an organization, asset, project or individual faces.

SearchSecurity
SearchHealthIT
SearchDisasterRecovery
  • What is risk mitigation?

    Risk mitigation is a strategy to prepare for and lessen the effects of threats faced by a business.

  • fault-tolerant

    Fault-tolerant technology is a capability of a computer system, electronic system or network to deliver uninterrupted service, ...

  • synchronous replication

    Synchronous replication is the process of copying data over a storage area network, local area network or wide area network so ...

SearchStorage
  • cloud archive

    A cloud archive is storage as a service for long-term data retention.

  • cache

    A cache -- pronounced CASH -- is hardware or software that is used to store something, usually data, temporarily in a computing ...

  • archive

    An archive is a collection of data moved to a repository for long-term retention, to keep separate for compliance reasons or for ...

Close