Browse Definitions :
Definition

Apache Kafka

Contributor(s): Matthew Haughn

Apache Kafka is a distributed publish-subscribe messaging system that receives data from disparate source systems and makes the data available to target systems in real time. Kafka is written in Scala and Java and is often associated with real-time event stream processing for big data.

Like other message brokers systems, Kafka facilitates the asynchronous data exchange between processes, applications and servers. Unlike other messaging systems, however, Kafka has very low overhead because it does not track consumer behavior and delete messages that have been read. Instead, Kafka retains all messages for a set amount of time and makes the consumer responsible for tracking which messages have been read.

Kafka software runs on one or more servers and each node in a Kafka cluster is called a broker. Kafka uses Apache ZooKeeper to manage clusters; the broker's job is to help producer applications write data to topics and consumer applications read from topics. Topics are divided into partitions to make them more manageable and Kafka guarantees strong ordering for each partition. Because messages are written into a partition in a particular order and are read in the same order, each partition essentially becomes a commit log that can function as a single source of truth (SSoT) for a distributed system’s events.

Kafka’s code base, which was originally developed at LinkedIn to provide a mechanism for parallel load in Hadoop systems, became an open source project under the Apache Software Foundation in 2011. In 2014, the developers at LinkedIn who created Kafka started a company called Confluent to facilitate Kafka deployments and support enterprise-level Kafka-as-a-service products. Version 5.0 of the Confluent Platform, which was commercially released in 2018, improves the handling of application client failover for disaster recovery (DR) and reduces reliance on the Java programming language for data streaming analytics applications.

This was last updated in March 2019

Continue Reading About Apache Kafka

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.

-ADS BY GOOGLE

File Extensions and File Formats

Powered by:

SearchCompliance

  • PCI DSS (Payment Card Industry Data Security Standard)

    The Payment Card Industry Data Security Standard (PCI DSS) is a widely accepted set of policies and procedures intended to ...

  • risk management

    Risk management is the process of identifying, assessing and controlling threats to an organization's capital and earnings.

  • compliance framework

    A compliance framework is a structured set of guidelines that details an organization's processes for maintaining accordance with...

SearchSecurity

  • Trojan horse (computing)

    In computing, a Trojan horse is a program downloaded and installed on a computer that appears harmless, but is, in fact, ...

  • identity theft

    Identity theft, also known as identity fraud, is a crime in which an imposter obtains key pieces of personally identifiable ...

  • DNS over HTTPS (DoH)

    DNS over HTTPS (DoH) is a relatively new protocol that encrypts domain name system traffic by passing DNS queries through a ...

SearchHealthIT

  • telemedicine (telehealth)

    Telemedicine is the remote delivery of healthcare services, such as health assessments or consultations, over the ...

  • Project Nightingale

    Project Nightingale is a controversial partnership between Google and Ascension, the second largest health system in the United ...

  • medical practice management (MPM) software

    Medical practice management (MPM) software is a collection of computerized services used by healthcare professionals and ...

SearchDisasterRecovery

SearchStorage

  • M.2 SSD

    An M.2 SSD is a solid-state drive (SSD) that conforms to a computer industry specification and is used in internally mounted ...

  • kilobyte (KB or Kbyte)

    A kilobyte (KB or Kbyte) is a unit of measurement for computer memory or data storage used by mathematics and computer science ...

  • virtual memory

    Virtual memory is a memory management capability of an operating system (OS) that uses hardware and software to allow a computer ...

Close