SequenceFile

Contributor(s): Matthew Haughn

A SequenceFile is a flat, binary file type that serves as a container for data to be used in Apache Hadoop distributed computing projects. SequenceFiles are used extensively with MapReduce.

Because Hadoop works most efficiently with large files, SequenceFiles are often used to pack files that are smaller than Hadoop's optimal size into a single larger container, optionally compressing them. This can reduce both disk space and I/O requirements.

A SequenceFile stores a sequence of binary key-value pairs. When it is used to package small files, each key identifies one file and the corresponding value holds that file's contents. Hadoop provides Writer, Reader and Sorter classes for writing, reading and sorting SequenceFiles, respectively. As an example, a SequenceFile might contain a massive number of log files for a server, where each key is a timestamp and each value is an entire log file. Stored individually, the small text files would be very inefficient for Hadoop to process. Packaged into SequenceFiles, they can be processed effectively.
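The packaging idea can be sketched in a few lines of Python. This is a toy, length-prefixed key-value container written only to illustrate the concept; it is not Hadoop's actual on-disk format, and the names `write_records`, `read_records` and the sample log entries are hypothetical.

```python
import io
import struct

def write_records(stream, records):
    """Append each (key, value) pair as length-prefixed bytes."""
    for key, value in records:
        k = key.encode("utf-8")
        # Two big-endian 32-bit lengths, then the key bytes and value bytes.
        stream.write(struct.pack(">II", len(k), len(value)))
        stream.write(k)
        stream.write(value)

def read_records(stream):
    """Yield (key, value) pairs in the order they were written."""
    header_size = struct.calcsize(">II")
    while True:
        header = stream.read(header_size)
        if not header:
            return  # clean end of container
        if len(header) < header_size:
            raise ValueError("truncated record header")
        key_len, value_len = struct.unpack(">II", header)
        key = stream.read(key_len).decode("utf-8")
        value = stream.read(value_len)
        yield key, value

# Hypothetical small log files, keyed by timestamp.
small_logs = [
    ("2017-12-01T00:00:00Z", b"INFO server started\n"),
    ("2017-12-01T00:05:00Z", b"WARN disk usage at 91%\n"),
    ("2017-12-01T00:10:00Z", b"INFO nightly backup complete\n"),
]

container = io.BytesIO()  # stands in for one large file
write_records(container, small_logs)
container.seek(0)
restored = list(read_records(container))
```

Many small inputs end up in one sequential stream that can be read back record by record, which is the property that makes the packaged form easier for Hadoop to process than thousands of separate files.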

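Beyond packaging files into a manageable size for Hadoop, SequenceFiles support compression. With record compression, only the value of each record is compressed. With block compression, keys and values are collected into blocks and the keys and values of each block are compressed separately. The type of compression chosen determines which of the three SequenceFile formats is used: uncompressed, record-compressed or block-compressed.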
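The difference between compressing each value on its own and compressing whole blocks can be illustrated with Python's standard `zlib` module. This is a rough sketch under stated assumptions: the helper names, the separator bytes and the sample records are hypothetical, and Hadoop's real codecs and block layout differ.

```python
import zlib

def record_compress(records):
    """Compress each value on its own; keys stay uncompressed."""
    return [(key, zlib.compress(value)) for key, value in records]

def block_compress(records):
    """Collect keys and values into one block and compress each group separately."""
    keys = b"\n".join(key for key, _ in records)
    # Assumes values never contain a NUL byte, so it can act as a separator.
    values = b"\x00".join(value for _, value in records)
    return zlib.compress(keys), zlib.compress(values)

# Hypothetical records: 50 timestamp keys with similar small log bodies.
records = [
    (b"2017-12-01T00:%02d:00Z" % minute, b"INFO request handled in 12 ms\n" * 3)
    for minute in range(50)
]

raw_size = sum(len(k) + len(v) for k, v in records)
record_size = sum(len(k) + len(cv) for k, cv in record_compress(records))
block_keys, block_values = block_compress(records)
block_size = len(block_keys) + len(block_values)

print("raw:", raw_size, "record-compressed:", record_size, "block-compressed:", block_size)
```

For many small, similar records, compressing a whole block at once lets the compressor exploit repetition across records, so the block-compressed total is typically the smallest of the three.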

This was last updated in December 2017

