
data janitor (data wrangler)

Contributor(s): Matthew Haughn

A data janitor is an IT employee who cleans up big data sources to prepare them for data analysts and data scientists. The role exists so that staff with high-level skills can spend their time where those skills are most valuable, rather than on work that others could do.

It's estimated that data preparation can account for more than 80 percent of the time involved in data analysis. Data janitors, also known as data wranglers, perform the prep work that must be completed before more sophisticated processing and analysis are possible. A data janitor acquires, inspects, consolidates, cleans up and organizes disparate, disorganized data. By taking on work that more highly skilled IT staff would otherwise have to do before they could start working with the data, data janitors let analysts and data scientists finish their work in much less time.
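The steps named above (acquire, inspect, consolidate, clean up and organize) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not a description of any particular team's workflow: the two sources `SOURCE_A` and `SOURCE_B`, their field names and date formats, and the helpers `inspect`, `normalize` and `wrangle` are all hypothetical, invented for this example.

```python
from datetime import datetime

# Hypothetical raw records from two disparate sources (illustrative data only).
SOURCE_A = [
    {"customer": "  Alice Smith ", "signup": "2017-03-05", "spend": "120.50"},
    {"customer": "BOB JONES", "signup": "2017-04-11", "spend": ""},
]
SOURCE_B = [
    {"name": "alice smith", "joined": "05/03/2017", "total": "120.5"},
    {"name": "Carol White", "joined": "22/06/2017", "total": "87"},
]

# Assumed mapping from each source's field names to one common schema.
FIELD_MAPS = {
    "a": {"customer": "name", "signup": "signup_date", "spend": "spend"},
    "b": {"name": "name", "joined": "signup_date", "total": "spend"},
}
# Assumed date format used by each source.
DATE_FORMATS = {"a": "%Y-%m-%d", "b": "%d/%m/%Y"}


def inspect(records):
    """Count empty or missing values per field."""
    missing = {}
    for rec in records:
        for key, value in rec.items():
            if value is None or str(value).strip() == "":
                missing[key] = missing.get(key, 0) + 1
    return missing


def normalize(record, source):
    """Rename fields, tidy names, convert dates to ISO 8601 and spend to float."""
    mapped = {FIELD_MAPS[source][k]: v for k, v in record.items()}
    name = " ".join(mapped["name"].split()).title()
    date = datetime.strptime(mapped["signup_date"].strip(), DATE_FORMATS[source]).date()
    spend_text = mapped["spend"].strip()
    spend = float(spend_text) if spend_text else None  # keep gaps explicit
    return {"name": name, "signup_date": date.isoformat(), "spend": spend}


def wrangle(sources):
    """Consolidate all sources, drop duplicates, and sort the result."""
    cleaned = [normalize(rec, src) for src, recs in sources.items() for rec in recs]
    seen = set()
    unique = []
    for rec in cleaned:
        key = (rec["name"], rec["signup_date"])
        if key not in seen:  # keep the first occurrence of each customer/date pair
            seen.add(key)
            unique.append(rec)
    return sorted(unique, key=lambda r: (r["signup_date"], r["name"]))


if __name__ == "__main__":
    print("Missing values in source A:", inspect(SOURCE_A))
    for row in wrangle({"a": SOURCE_A, "b": SOURCE_B}):
        print(row)
```

In this sketch, "Alice Smith" appears in both sources with different spacing, capitalization and date formats; only after normalization do the two records match, so the duplicate can be dropped. The missing spend for Bob Jones is kept as `None` rather than guessed, leaving that decision to the analyst.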

Before data janitors do their work, big data is not ready for complex analysis. Their preparation also readies data for use with tools such as Hadoop, Pig, Hive, Spark and MapReduce; programming languages that include structured query language (SQL), Python, Scala and Perl; and statistical computing languages such as R.
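As a small illustration of why that preparation matters for SQL, the sketch below loads a few already-cleaned rows into an in-memory SQLite database using Python's built-in `sqlite3` module and runs a grouped query. The table name `customers`, its columns and the sample rows are hypothetical. The query relies on every date having been normalized to the same ISO 8601 text format; with mixed formats, the `substr` grouping would silently produce wrong months.

```python
import sqlite3

# Hypothetical rows that have already been cleaned into one consistent schema.
CLEAN_ROWS = [
    ("Alice Smith", "2017-03-05", 120.5),
    ("Bob Jones", "2017-04-11", None),  # missing spend kept as NULL
    ("Carol White", "2017-06-22", 87.0),
]


def load_and_query(rows):
    """Load cleaned rows into SQLite and summarize sign-ups per month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE customers ("
            "name TEXT PRIMARY KEY, signup_date TEXT NOT NULL, spend REAL)"
        )
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
        # substr(..., 1, 7) yields 'YYYY-MM' only because dates are ISO 8601 text.
        return conn.execute(
            "SELECT substr(signup_date, 1, 7) AS month, COUNT(*), SUM(spend) "
            "FROM customers GROUP BY month ORDER BY month"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for month, signups, spend in load_and_query(CLEAN_ROWS):
        print(month, signups, spend)
```

The `PRIMARY KEY` and `NOT NULL` constraints are a design choice in this sketch: they make the database reject duplicate names or missing dates, so any cleanup step that was skipped shows up as an error at load time instead of as a misleading query result.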

As IT firms acquire and process more and more data, dividing the workload becomes increasingly important for delivering quality analysis on time. Often, junior employees in the field of data analysis perform this painstaking preparation work. Almost a third of business intelligence workers can be considered data janitors, at least for part of their jobs. The term data janitor is typically not a job title but rather a description of the task; an employee whose primary role is data preparation may be referred to as a data engineer.

This was last updated in December 2017

