
data janitor (data wrangler)

A data janitor is an IT employee who cleans up big data sources to prepare them for data analysts and data scientists. The role exists so that staff with high-level skills can spend their time where those skills are most effective, rather than on work that others could do.

By some estimates, data preparation can account for more than 80 percent of the time involved in data analysis. Data janitors, also known as data wranglers, perform the prep work that must be completed before more sophisticated processing and analysis are possible. A data janitor acquires, inspects, consolidates, cleans up and organizes disparate, disorganized data. This offloads work that more skilled IT staff would otherwise have to do before they could actually work with the data, so data analysts and data scientists can finish their work in much less time.
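The steps named above (inspect, consolidate, clean up, organize) can be illustrated with a small, stdlib-only Python sketch. Everything in it is hypothetical and chosen for illustration: the two sources `SOURCE_A` and `SOURCE_B`, their field names and date formats, the helpers `normalize` and `wrangle`, and the choice to treat the email address as the deduplication key. Real data preparation work is usually larger and tool-specific; this is a sketch of the idea, not a standard procedure.

```python
from datetime import datetime

# Two hypothetical, disorganized sources describing the same kind of record
# (customer sign-ups) with different field names, casing and date formats.
SOURCE_A = [
    {"Name": "  Alice Smith ", "Email": "ALICE@EXAMPLE.COM", "Signup": "2017-12-01"},
    {"Name": "Bob Jones", "Email": "bob@example.com", "Signup": "2017-11-15"},
]
SOURCE_B = [
    {"full_name": "alice smith", "email_addr": "alice@example.com", "joined": "01/12/2017"},
    {"full_name": "Carol White", "email_addr": "", "joined": "03/10/2017"},
]

# Assumed mapping from each source's column names to one common schema.
FIELD_MAPS = {
    "a": {"Name": "name", "Email": "email", "Signup": "signup"},
    "b": {"full_name": "name", "email_addr": "email", "joined": "signup"},
}
# Assumed date format used by each source.
DATE_FORMATS = {"a": "%Y-%m-%d", "b": "%d/%m/%Y"}


def normalize(record, source):
    """Rename fields and clean values so every record uses the same schema."""
    row = {FIELD_MAPS[source][key]: value for key, value in record.items()}
    row["name"] = " ".join(row["name"].split()).title()  # collapse whitespace, fix case
    row["email"] = row["email"].strip().lower() or None  # empty string -> missing
    # Convert each source's date format to ISO 8601 (YYYY-MM-DD).
    row["signup"] = datetime.strptime(row["signup"], DATE_FORMATS[source]).date().isoformat()
    return row


def wrangle(sources):
    """Consolidate sources, drop unusable rows and duplicates, sort the result."""
    seen = set()
    cleaned = []
    for source, records in sources.items():
        for record in records:
            row = normalize(record, source)
            if row["email"] is None:  # inspect: skip rows missing the key field
                continue
            if row["email"] in seen:  # consolidate: keep the first copy only
                continue
            seen.add(row["email"])
            cleaned.append(row)
    return sorted(cleaned, key=lambda r: r["signup"])  # organize by sign-up date


if __name__ == "__main__":
    for row in wrangle({"a": SOURCE_A, "b": SOURCE_B}):
        print(row)
```

Keeping the first copy of a duplicate and silently dropping incomplete rows are simplifying choices; depending on the project, a data wrangler might instead merge duplicate records or set incomplete rows aside for review.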

Before data janitors do their work, big data is not ready for complex analysis. Their preparation also readies data for use with tools such as Hadoop, Pig, Hive, Spark and MapReduce; with programming languages that include Structured Query Language (SQL), Python, Scala and Perl; and with statistical computing languages such as R.
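As one illustration of data being "ready" for SQL, the Python sketch below loads already-cleaned rows into an in-memory SQLite database using the standard `sqlite3` module and runs a query. SQLite stands in here for whatever SQL engine a team actually uses; the table name `customers`, its columns, the sample rows and the helper `load_and_query` are all hypothetical. The point is that consistent formats (such as ISO 8601 dates) make downstream queries simple.

```python
import sqlite3

# Hypothetical rows that have already been cleaned: one schema,
# consistent casing, ISO 8601 dates and no duplicate emails.
CLEAN_ROWS = [
    ("Bob Jones", "bob@example.com", "2017-11-15"),
    ("Alice Smith", "alice@example.com", "2017-12-01"),
]


def load_and_query(rows):
    """Load cleaned rows into an in-memory SQLite table and run a simple query."""
    conn = sqlite3.connect(":memory:")
    try:
        # The PRIMARY KEY on email makes the database reject any duplicate
        # that slipped through cleaning (raises sqlite3.IntegrityError).
        conn.execute(
            "CREATE TABLE customers (name TEXT, email TEXT PRIMARY KEY, signup TEXT)"
        )
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
        # ISO 8601 date strings compare correctly as text,
        # so a range filter works without further conversion.
        cur = conn.execute(
            "SELECT name FROM customers WHERE signup >= ? ORDER BY signup",
            ("2017-12-01",),
        )
        return [name for (name,) in cur.fetchall()]
    finally:
        conn.close()


if __name__ == "__main__":
    print(load_and_query(CLEAN_ROWS))
```

Had the dates been left in mixed formats such as `01/12/2017` and `2017-11-15`, the same text comparison would return wrong results, which is one concrete reason the preparation step comes first.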

As IT firms acquire and process more and more data, dividing the workload becomes increasingly important for delivering quality analysis on time. This painstaking preparation work is often performed by junior employees in the field of data analysis. Almost a third of business intelligence workers can be considered data janitors, at least for part of their work. The term data janitor is typically not a job title but a description of the task; an employee whose primary role is data preparation may be referred to as a data engineer.

This was last updated in December 2017
