Avro (Apache Avro)

Contributor(s): Matthew Haughn

Apache Avro is a row-oriented object container storage format for Hadoop as well as a remote procedure call and data serialization framework. Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Avro is optimized for write operations and includes a wire format for communication between nodes.

Avro lets different nodes exchange data by pairing the serialized data with the data definition (schema) that describes it. Avro uses JavaScript Object Notation (JSON) to define data types and protocols. The data itself is serialized in a compact, efficient binary format. An Avro container file consists of a header followed by one or more file data blocks.
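As a rough illustration of a JSON schema paired with the binary format, the sketch below hand-encodes one record using only the Python standard library. The `User` schema, its field names, and the helper functions are hypothetical and are not part of any Avro library. The sketch assumes the encoding rules of the Avro specification: a long is written as a zigzag variable-length integer, a string as its byte length followed by UTF-8 bytes, and a record as its fields in schema order. Only the `string` and `long` types are handled.

```python
import json

# Hypothetical schema for illustration; Avro schemas are written in JSON.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "long"}
  ]
}
"""


def encode_long(n: int) -> bytes:
    """Zigzag-encode a signed 64-bit integer, then emit it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate the encoded fields in the order the schema declares them."""
    encoders = {"string": encode_string, "long": encode_long}
    out = bytearray()
    for field in schema["fields"]:
        encoder = encoders[field["type"]]  # only two types in this sketch
        out += encoder(record[field["name"]])
    return bytes(out)


schema = json.loads(SCHEMA_JSON)
encoded = encode_record(schema, {"name": "Ada", "age": 36})
print(encoded)  # b'\x06AdaH'
```

The encoded record carries no field names or type tags, which is why a reader needs the schema to interpret the bytes.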

The header is made up of:

  • 4 bytes: ASCII “Obj” followed by the byte value 1
  • File metadata, including the schema definition
  • A sync marker: 16 randomly generated bytes
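The header layout above can be sketched as a small writer and parser. This is a minimal standard-library illustration, not the official Avro library. The function names are hypothetical. It assumes the layout given in the Avro specification: the magic bytes are `O`, `b`, `j` followed by the byte 1, and the metadata is an Avro map from string keys to byte values, stored under keys such as `avro.schema` and `avro.codec`.

```python
import io
import json
import os

MAGIC = b"Obj\x01"  # ASCII 'O', 'b', 'j', then the byte 1
SYNC_SIZE = 16


def write_long(n: int) -> bytes:
    """Zigzag + varint encoding used for Avro longs."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(stream) -> int:
    """Decode one zigzag varint from a binary stream."""
    shift = 0
    result = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of header")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def write_bytes(data: bytes) -> bytes:
    return write_long(len(data)) + data


def read_bytes(stream) -> bytes:
    return stream.read(read_long(stream))


def build_header(schema: dict, sync: bytes) -> bytes:
    """Magic, then a metadata map written as a single block, then the sync marker."""
    meta = {
        "avro.schema": json.dumps(schema).encode("utf-8"),
        "avro.codec": b"null",  # "null" means the data blocks are uncompressed
    }
    out = bytearray(MAGIC)
    out += write_long(len(meta))
    for key, value in meta.items():
        out += write_bytes(key.encode("utf-8")) + write_bytes(value)
    out += write_long(0)  # a zero count ends the map
    out += sync
    return bytes(out)


def parse_header(data: bytes):
    """Return (metadata dict, sync marker) or raise ValueError on a bad header."""
    stream = io.BytesIO(data)
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an Avro container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:  # a negative count is followed by the block size in bytes
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    sync = stream.read(SYNC_SIZE)
    if len(sync) != SYNC_SIZE:
        raise ValueError("truncated sync marker")
    return meta, sync


header = build_header({"type": "string"}, os.urandom(SYNC_SIZE))
metadata, sync_marker = parse_header(header)
print(json.loads(metadata["avro.schema"]), len(sync_marker))
```

Because the sync marker is random and repeated after every data block, a reader can use it to find block boundaries when a large file is split for parallel processing.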

In addition to JSON, Avro includes its own interface description language (IDL), called Avro IDL, for defining data types and protocols. Avro IDL eases adoption by users accustomed to more traditional IDLs, whose syntax resembles C/C++.

Avro is a top-level project sponsored by the Apache Software Foundation (ASF).

This was last updated in January 2018
