A URI (Uniform Resource Identifier) is a sequence of characters that identifies a logical or physical resource. Universal Resource Identifiers are specified in the Internet Engineering Task Force (IETF) Request for Comments (RFC) 3986 and are summarized and extended in documentation for the W3C’s Web Architecture, Architecture of the World Wide Web, Volume 1. According to the specifications, resources do not have to be accessible on the Internet. Examples of resources include electronic documents, elevator door sensors, XML namespaces, web pages and ID microchips for pets.
Uniform Resource Locator (URL) – this type of URI begins by stating which protocol should be used to locate and access the physical or logical resource on a network. If the resource is a web page, for example, the URI will begin with the protocol HTTP. If the resource is a file, the URI will begin with the protocol FTP or if the resource is an email address, the URI will begin with the protocol mailto. It is important to remember that URLs are not persistent. This means that if the resource’s location changes, the URL also needs to change to point to the resource’s new location.
Uniform Resource Name (URN) – this type of URI does not state which protocol should be used to locate and access the resource; it simply labels the resource with a persistent, location-independent unique identifier. A URN will identify the resource throughout its lifecycle and will never change. Each URN has three components: the label “urn,” a colon and a character string that serves as a unique identifier.
Every URL is also a URI, but not vice versa.
The generic form of any URI is scheme:[//[user:[email protected]]host[:port]][/]path[?query][#fragment]
Scheme: The scheme lays out the concrete syntax and any associated protocols for the URI. Schemes are case-insensitive and are followed by a colon. Ideally, URI schemes should be registered with the Internet Assigned Numbers Authority (IANA), although nonregistered schemes can also be used.
While the two slashes shown in the example above are required by some schemes, they are not required by all schemes, including authority components, which are described below.
Authority component: An authority component is made up of multiple parts: an optional authentication section, a host -- consisting of either a registered name or an IP address -- and an optional port number. The authentication section contains the username and password, which are separated by a colon and followed by the symbol for at (@). After the @ comes the hostname, which is in turn followed by a colon and then a port number. It is important to note that IPv4 addresses must be in dot-decimal notation, and IPv6 addresses must be enclosed in brackets.
The path, which contains data, is notated by a sequence of segments separated by slashes. The path must begin with a single slash if an authority part was present. It may also begin with a single slash even if there is no authority part, but it cannot begin with a double slash. Keep in mind that while this part of the syntax may closely resemble a particular file path, it does not always imply a relation to that file system path.
Query (optional): The query contains a string of nonhierarchical data. Although the syntax is not well-defined, it is most often a sequence of attribute value pairs separated by a delimiter, such as an ampersand or a semicolon. The query is separated from the preceding part by a question mark.
Fragment (optional): The fragment contains a fragment identifier that provides direction to a secondary resource. For example, if the primary resource is an HTML document, the fragment is often an ID attribute of a specific element of that document. If the fragment identifies a certain section of an article identified by the rest of the URI, a Web browser will scroll this particular element into view. The fragment is separated from the preceding part by a hash (#).
URI resolution and references
URI resolution is one of a few common operations performed on URIs that are also URLs. It involves determining the proper data access method and parameters needed to locate and retrieve the resource that the URI points to.
A URI-reference is used to determine common usage for a URI. A URI reference may take the form of a full URI, a specific portion of a full URI or an empty string. If there is a fragment identifier, it will identify some portion of the resource referred to by the rest of the URI.
A URI-reference can be a URI, but it can also be what is known as a relative reference. A URI is a relative reference if the URI-reference's prefix does not match the syntax of a scheme followed by its colon separator. In order to determine what components are present and whether the reference is relative, each of the five URI components are parsed for its subparts and their validation.