
robots.txt

Robots.txt is a file on a website that tells search engine crawlers which parts of the site their bot programs should not access. Robots.txt is a plaintext file, but it uses special directives and syntax that web crawlers understand. Though not officially standardized, robots.txt is generally honored by all major search engines.
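A minimal robots.txt illustrates the syntax: each record names a crawler (or all crawlers, with the wildcard *) followed by the paths it should not visit. The directory names below are illustrative examples, not paths from any particular site.

```
# Applies to all crawlers
User-agent: *
Disallow: /admin/
Disallow: /tmp/

# Applies only to Google's crawler
User-agent: Googlebot
Disallow: /ugc-comments/
```

An empty Disallow line (Disallow:) permits access to the entire site for the named crawler.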

Spider programs, such as Googlebot, index a website using the instructions set forth by the site's webmaster. A webmaster may have parts of the site that have not been optimized for search engines, and some pages might be prone to exploitation by spammers through, for example, link spam on a page that features user-generated content (UGC). Should webmasters wish to keep pages hidden from Google search, they can block them with a robots.txt file placed in the top-level folder of the site. Robots.txt is also known as the robots exclusion protocol. Preventing crawlers from indexing spammy content means those pages will not be considered when determining PageRank and placement in search engine results pages (SERPs).

The nofollow tag is another way to control web crawler behavior. The nofollow tag stops crawlers from counting the links on a page when determining PageRank, and webmasters can use it to avoid search engine optimization (SEO) penalties. To prevent Googlebot from following any links on a given page, the webmaster can include a nofollow meta tag in that page's HTML head (not in robots.txt, which only controls which URLs are crawled); to prevent the bot from following individual links, they can add rel="nofollow" to the links themselves.
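Both forms of nofollow can be sketched in HTML; the URL and anchor text here are placeholders:

```
<!-- Page-wide: in the <head>, tells crawlers not to follow any link on this page -->
<meta name="robots" content="nofollow">

<!-- Per link: discounts only this link, e.g. in user-generated content -->
<a href="http://example.com/" rel="nofollow">user-submitted link</a>
```

The page-wide meta tag affects every link on the page, while rel="nofollow" is the finer-grained tool typically applied to untrusted links such as UGC.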

This was last updated in June 2017

