information extraction (IE)

Contributor(s): Matthew Haughn

Information extraction (IE) is the automated retrieval of specific information related to a selected topic from a body or bodies of text.

Information extraction tools make it possible to pull information from text documents, databases, websites or multiple sources. IE may extract information from unstructured, semi-structured or structured, machine-readable text. Usually, however, IE is used in natural language processing (NLP) to extract structured data from unstructured text.
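To make "structured data from unstructured text" concrete, the following is a minimal sketch using only Python's standard `re` module. The sentence pattern, the company names and the `extract_acquisitions` helper are all assumptions invented for illustration; production IE systems rely on far more robust techniques than a single regular expression.

```python
import re

# Hypothetical pattern for sentences shaped like
# "<Buyer> acquired <Target> for $<amount> <million|billion>".
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z][\w&]*(?: [A-Z][\w&]*)*) acquired "
    r"(?P<target>[A-Z][\w&]*(?: [A-Z][\w&]*)*) for "
    r"\$(?P<amount>\d+(?:\.\d+)?) (?P<unit>million|billion)"
)

def extract_acquisitions(text):
    """Return one dict (a structured record) per matching sentence."""
    return [match.groupdict() for match in ACQUISITION.finditer(text)]

# Made-up example text; the companies are fictional.
text = ("Shares rose after Acme Corp acquired Widget Works for $2.5 billion. "
        "Analysts were surprised.")
records = extract_acquisitions(text)
print(records)
```

Each record is a plain dictionary with `buyer`, `target`, `amount` and `unit` keys, so it could be loaded into a database table or spreadsheet without further parsing.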

Information extraction depends on named entity recognition (NER), a subtask used to find the targeted information to extract. NER first classifies each entity into one of several categories, such as location (LOC), person (PER) or organization (ORG). Once the category is recognized, an information extraction utility extracts the named entity’s related information and constructs a machine-readable document from it, which algorithms can further process to extract meaning. IE finds meaning by way of other subtasks, including co-reference resolution, relationship extraction, language and vocabulary analysis and sometimes audio extraction.
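The recognize-then-extract flow described above can be sketched with a toy example. This is not how real NER works: it assumes a tiny hand-written lookup table (a hypothetical `GAZETTEER`) in place of a trained model, and the names, the sentence and both helper functions are invented for illustration. It only shows the shape of the pipeline: label entities as PER, LOC or ORG, then emit a machine-readable document.

```python
import json
import re

# Hypothetical gazetteer: a small lookup table standing in for a trained NER model.
GAZETTEER = {
    "Ada Lovelace": "PER",
    "London": "LOC",
    "Analytical Engines Ltd": "ORG",
}

def recognize_entities(text):
    """Find gazetteer entries in the text and label each with its category."""
    entities = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            entities.append({
                "text": name,
                "label": label,
                "start": match.start(),
                "end": match.end(),
            })
    # Report entities in the order they appear in the text.
    return sorted(entities, key=lambda entity: entity["start"])

def to_document(text):
    """Build a machine-readable (JSON) record of the recognized entities."""
    return json.dumps({"source": text, "entities": recognize_entities(text)}, indent=2)

# Made-up sentence; the organization is fictional.
sentence = "Ada Lovelace joined Analytical Engines Ltd in London."
entities = recognize_entities(sentence)
document = to_document(sentence)
print(document)
```

The JSON output is the kind of machine-readable document that later steps, such as relationship extraction, could take as input.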

IE dates back to the early days of natural language processing in the 1970s. JASPER, an IE system built for Reuters by Carnegie Mellon University, is an early example. Current efforts in multimedia document processing, such as automatic annotation and content recognition and extraction from images and video, could be seen as IE as well.

Because of the complexity of language, high-quality IE is a challenging task for artificial intelligence (AI) systems.

This was last updated in January 2018
