data janitor (data wrangler)

Contributor(s): Matthew Haughn

A data janitor is an IT employee who cleans up big data sources to prepare them for data analysts and data scientists. The role exists so that staff with high-level skills can spend their time on work that requires those skills rather than on tasks that others could do.

It's estimated that data preparation can make up more than 80 percent of the time involved in data analysis. Data janitors, also known as data wranglers, perform the prep work that must be completed before more sophisticated processing and analysis are possible. A data janitor acquires, inspects, consolidates, cleans up and organizes disparate, disorganized data. Offloading this work, which more skilled IT staff would otherwise have to do before actually working with the data, lets data analysts and data scientists complete their analysis in much less time.
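
As a rough illustration of the acquire, inspect, consolidate, clean and organize steps, here is a minimal Python sketch. It assumes two hypothetical CSV exports with inconsistent formatting. The sample records, the field names and the functions `acquire`, `clean` and `prepare` are invented for this example and are not part of the definition; real data preparation is usually far messier.

```python
import csv
import io

# Hypothetical raw inputs: two small CSV exports with inconsistent formatting.
SOURCE_A = """name,email,age
 Alice ,ALICE@EXAMPLE.COM,34
Bob,bob@example.com,
"""
SOURCE_B = """name,email,age
alice,alice@example.com,34
Carol,carol@example.com,forty
Dan,dan@example.com,29
"""


def acquire(text):
    """Acquire: read one CSV source into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(row):
    """Inspect and clean one row; return None if it cannot be repaired."""
    name = row["name"].strip().title()
    email = row["email"].strip().lower()
    age_text = (row["age"] or "").strip()
    if not name or not email or not age_text.isdigit():
        return None  # missing field or non-numeric age: reject the row
    return {"name": name, "email": email, "age": int(age_text)}


def prepare(sources):
    """Consolidate all sources, drop duplicates by email, sort by name."""
    seen = {}
    rejected = 0
    for text in sources:
        for raw in acquire(text):
            row = clean(raw)
            if row is None:
                rejected += 1
            else:
                seen.setdefault(row["email"], row)  # keep first occurrence
    return sorted(seen.values(), key=lambda r: r["name"]), rejected


rows, rejected = prepare([SOURCE_A, SOURCE_B])
for row in rows:
    print(row)
print("rejected rows:", rejected)
```

The sketch keeps the first record seen for each email address and counts rejected rows rather than silently discarding them, so an analyst can judge how much of the raw data was unusable. Both choices are assumptions made for the example, not prescribed practice.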

Before data janitors do their work, big data is not ready for complex analysis. Their preparation also readies data for use with tools such as Hadoop, Pig, Hive, Spark and MapReduce, and programming languages that include structured query language (SQL), Python, Scala and Perl, as well as statistical computing languages such as R.

As IT firms acquire and process more and more data, dividing the workload is increasingly important for delivering quality analysis on time. Often, it is junior employees in the field of data analysis who perform this painstaking preparation work. Almost a third of business intelligence workers can be considered data janitors, at least as part of their jobs. The term data janitor is typically not a job title but a description of the task. An employee whose primary role is data preparation may be referred to as a data engineer.

This was last updated in December 2017


