robots.txt

Robots.txt is a plaintext file placed at the top level of a website that tells search engine crawlers which parts of the site their bot programs should not access. It uses a small set of special directives aimed at webcrawlers. Though not officially standardized, robots.txt is generally honored by all major search engines.
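As an illustration, a simple robots.txt file might look like the following (the directory names are hypothetical):

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /search/

# Additional rule for Googlebot only
User-agent: Googlebot
Disallow: /no-google/
```

Each `User-agent` line names the crawler a group of rules applies to, and each `Disallow` line names a path prefix that crawler should not fetch.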

Spider programs, such as Googlebot, index a website using instructions set forth by the site's webmaster. Sometimes a webmaster may have parts of a site that have not been optimized for search engines, or parts that are prone to exploitation by spammers, for example, through link spam on a page that features user-generated content (UGC). Should a webmaster wish to keep pages hidden from Google search, they can block the pages with a robots.txt file at the top-level folder of the site. Robots.txt is also known as "the robot exclusion protocol." Preventing crawlers from indexing spammy content means those pages will not be considered when determining PageRank and placement in search engine results pages (SERPs).
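A well-behaved crawler consults these rules before fetching a URL. A minimal sketch of that check, using Python's standard `urllib.robotparser` with a hypothetical rule set and example URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content blocking one directory for all crawlers
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)  # a real crawler would fetch /robots.txt first

# The crawler asks before fetching each URL
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/public/page.html"))   # True
```

Note that obeying robots.txt is voluntary: the file only advises crawlers, and a compliant bot is one that performs a check like this before every request.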

The nofollow tag is another way to control webcrawler behavior. Nofollow stops crawlers from counting links on a page when determining PageRank, and webmasters can use it to avoid search engine optimization (SEO) penalties. To prevent Googlebot from following any links on a given page, the webmaster can include a nofollow meta tag in that page's HTML head; to prevent the bot from following individual links, they can add rel="nofollow" to the links themselves.
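In markup, the two forms look like this (the page and link URL are hypothetical):

```
<!-- Page-level: ask crawlers not to follow any links on this page -->
<meta name="robots" content="nofollow">

<!-- Link-level: ask crawlers not to follow this one link -->
<a href="https://example.com/untrusted" rel="nofollow">user-submitted link</a>
```

The meta tag belongs in the page's head and applies to every link on the page, while the rel attribute applies only to the anchor that carries it, which makes it the usual choice for user-generated links.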

This was last updated in June 2017
