Browse Definitions :
Definition

SequenceFile

A SequenceFile is a flat, binary file type that serves as a container for data to be used in Apache Hadoop distributed computing projects. SequenceFiles are used extensively with MapReduce.

Since Hadoop functions best with larger files, SequenceFiles are used to store and compress files that are smaller than the optimum size for operating efficiently with Hadoop, which can help reduce required disk space capacity and I/O  requirements.

SequenceFiles serve as a container for a sequence of files. Keys are listed for reference and values, and the contents of the file are referenced in a given key. SequenceFiles support a Writer, a Reader and a Sorter class for respective functions in relation to keys. As an example, a SequenceFile might contain a massive number of log files for a server where the key would be a timestamp and the value would be the entire log file. Normally, the small text files would be very inefficient in Hadoop. After packaging into SequenceFiles, however, they can be used effectively.

Beyond packaging files into a manageable size for Hadoop, SequenceFiles support compression of the keys, the values or both.  When both are compressed, the file keys and values are collected into blocks and separately compressed. The type of compression chosen determines the file format.

This was last updated in December 2017

Continue Reading About SequenceFile

SearchCompliance

  • compliance risk

    Compliance risk is an organization's potential exposure to legal penalties, financial forfeiture and material loss, resulting ...

  • information governance

    Information governance is a holistic approach to managing corporate information by implementing processes, roles, controls and ...

  • enterprise document management (EDM)

    Enterprise document management (EDM) is a strategy for overseeing an organization's paper and electronic documents so they can be...

SearchSecurity

  • information security (infosec)

    Information security, often shortened to infosec, is the practice, policies and principles to protect data and other kinds of ...

  • denial-of-service attack

    A denial-of-service (DoS) attack is a security event that occurs when an attacker makes it impossible for legitimate users to ...

  • user authentication

    User authentication verifies the identity of a user attempting to gain access to a network or computing resource by authorizing a...

SearchHealthIT

SearchDisasterRecovery

  • risk mitigation

    Risk mitigation is a strategy to prepare for and lessen the effects of threats faced by a business.

  • call tree

    A call tree is a layered hierarchical communication model that is used to notify specific individuals of an event and coordinate ...

  • Disaster Recovery as a Service (DRaaS)

    Disaster recovery as a service (DRaaS) is the replication and hosting of physical or virtual servers by a third party to provide ...

SearchStorage

  • cloud storage

    Cloud storage is a service model in which data is transmitted and stored on remote storage systems, where it is maintained, ...

  • cloud testing

    Cloud testing is the process of using the cloud computing resources of a third-party service provider to test software ...

  • storage virtualization

    Storage virtualization is the pooling of physical storage from multiple storage devices into what appears to be a single storage ...

Close