Tuesday, November 6, 2012

URI in Java

Before looking at java.net.URI, let us first understand what a URI is in general.

URI (Uniform Resource Identifier) in General

A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.
The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment.

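The following are two example URIs and their component parts (taken from RFC 3986, section 3):

   foo://example.com:8042/over/there?name=ferret#nose
   \_/   \______________/\_________/ \_________/ \__/
    |           |            |            |        |
 scheme     authority       path        query   fragment
    |   _____________________|__
   / \ /                        \
   urn:example:animal:ferret:nose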
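To see these components concretely, here is a small sketch that splits the two example URIs used in RFC 3986 (foo://example.com:8042/over/there?name=ferret#nose and urn:example:animal:ferret:nose) with the java.net.URI class covered later in this post. The class name UriComponentsDemo and the helper describe are made up for this illustration.

```java
import java.net.URI;

class UriComponentsDemo {
    // Formats the five generic components of a URI; absent ones show up as null.
    static String describe(String text) {
        URI uri = URI.create(text);
        return "scheme=" + uri.getScheme()
                + " authority=" + uri.getAuthority()
                + " path=" + uri.getPath()
                + " query=" + uri.getQuery()
                + " fragment=" + uri.getFragment();
    }

    public static void main(String[] args) {
        System.out.println(describe("foo://example.com:8042/over/there?name=ferret#nose"));
        System.out.println(describe("urn:example:animal:ferret:nose"));
    }
}
```

Note that java.net.URI reports no path for the urn example: it treats a scheme-specific part that does not begin with '/' as opaque, so everything after "urn:" is only available through getSchemeSpecificPart().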




URI, URL, and URN

A URI can be further classified as a locator, a name, or both.

The term "Uniform Resource Locator" (URL) refers to the subset of URIs that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) of that resource.

The term "Uniform Resource Name" (URN) refers to the subset of URIs that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable.

The following examples illustrate URIs that are in common use:

   ftp://ftp.is.co.za/rfc/rfc1808.txt
      -- ftp scheme for File Transfer Protocol services
   gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
      -- gopher scheme for Gopher and Gopher+ Protocol services
   http://www.math.uio.no/faq/compression-faq/part1.html
      -- http scheme for Hypertext Transfer Protocol services
   mailto:mduerst@ifi.unizh.ch
      -- mailto scheme for electronic mail addresses
   news:comp.infosystems.www.servers.unix
      -- news scheme for USENET news groups and articles
   telnet://melvyl.ucop.edu/
      -- telnet scheme for interactive services via the TELNET Protocol
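
In Java this distinction shows up as two classes: java.net.URI, which only identifies a resource, and java.net.URL, which also needs a protocol handler so the resource can be located. As a rough sketch (the helper name hasUrlForm and the urn:isbn example are assumptions made for this illustration), URI.toURL() succeeds only for absolute URIs whose scheme the running JDK has a handler for:

```java
import java.net.MalformedURLException;
import java.net.URI;

class LocatorOrNameDemo {
    // Returns true when the URI can be converted to a java.net.URL.
    static boolean hasUrlForm(String text) {
        try {
            URI.create(text).toURL();
            return true;
        } catch (MalformedURLException e) {
            // No protocol handler is known for this scheme.
            return false;
        } catch (IllegalArgumentException e) {
            // The URI is relative, so it cannot be a URL on its own.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasUrlForm("http://www.math.uio.no/faq/compression-faq/part1.html"));
        System.out.println(hasUrlForm("urn:isbn:0451450523")); // hypothetical URN example
        System.out.println(hasUrlForm("docs/guide/index.html"));
    }
}
```

This only tells you whether a URL object can be built. No network connection is made, so it says nothing about whether the resource actually exists.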

URI syntax and components

At the highest level a URI reference in string form has the syntax

 [scheme:]scheme-specific-part[#fragment]

Where square brackets [...] delineate optional components and the characters : and # stand for themselves.

A URI is either absolute or relative:
Absolute URI: a URI that specifies a scheme.
Relative URI: a URI that does not specify a scheme.

An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not subject to further parsing. Some examples of opaque URIs are:
mailto:java-net@java.sun.com
news:comp.lang.java

A hierarchical URI is either an absolute URI whose scheme-specific part begins with a slash character, or a relative URI, that is, a URI that does not specify a scheme.
Some examples of hierarchical URIs are:
http://java.sun.com/j2se/1.3/
docs/guide/collections/designfaq.html#28
../../../demo/jfc/SwingSet2/src/SwingSet2.java
file:///~/calendar
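
Both classifications can be checked at runtime with isAbsolute() and isOpaque(). The following is a minimal sketch that runs the example URIs from this section through those two methods; the class name UriKindDemo and the helper classify are made up for the illustration.

```java
import java.net.URI;

class UriKindDemo {
    // Labels a URI string as absolute/relative and opaque/hierarchical.
    static String classify(String text) {
        URI uri = URI.create(text);
        String scheme = uri.isAbsolute() ? "absolute" : "relative";
        String shape = uri.isOpaque() ? "opaque" : "hierarchical";
        return scheme + ", " + shape;
    }

    public static void main(String[] args) {
        String[] samples = {
            "mailto:java-net@java.sun.com",
            "news:comp.lang.java",
            "http://java.sun.com/j2se/1.3/",
            "docs/guide/collections/designfaq.html#28",
            "../../../demo/jfc/SwingSet2/src/SwingSet2.java",
            "file:///~/calendar"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + classify(sample));
        }
    }
}
```

The two mailto and news samples come out as absolute and opaque, the http and file samples as absolute and hierarchical, and the two scheme-less samples as relative and hierarchical.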

A hierarchical URI is subject to further parsing according to the syntax
[scheme:][//authority][path][?query][#fragment]

Where the characters :, /, ?, and # stand for themselves. The scheme-specific part of a hierarchical URI consists of the characters between the scheme and fragment components.
The authority component of a hierarchical URI is, if specified, either server-based or registry-based. A server-based authority parses according to the familiar syntax
[user-info@]host[:port]

Where the characters @ and : stand for themselves. Nearly all URI schemes currently in use are server-based. An authority component that does not parse in this way is considered to be registry-based.
The path component of a hierarchical URI is itself said to be absolute if it begins with a slash character ('/'); otherwise it is relative. The path of a hierarchical URI that is either absolute or specifies an authority is always absolute.
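
As an illustration of the server-based authority syntax, the sketch below pulls the user-info, host, and port out of a URI and checks whether its path is absolute. The URIs, the class name AuthorityDemo, and both helper methods are hypothetical and only serve this example.

```java
import java.net.URI;

class AuthorityDemo {
    // Breaks a server-based authority into [user-info@]host[:port].
    static String authorityParts(String text) {
        URI uri = URI.create(text);
        return "userInfo=" + uri.getUserInfo()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort();
    }

    // A path is absolute when it begins with a slash character.
    static boolean hasAbsolutePath(String text) {
        String path = URI.create(text).getPath();
        return path != null && path.startsWith("/");
    }

    public static void main(String[] args) {
        System.out.println(authorityParts("http://alice@www.example.com:8080/docs/index.html"));
        System.out.println(authorityParts("http://www.example.com/docs/"));
        System.out.println(hasAbsolutePath("http://www.example.com/docs/"));
        System.out.println(hasAbsolutePath("docs/guide/"));
    }
}
```

When the user-info is missing getUserInfo() returns null, and when no port is given getPort() returns -1.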

java.net.URI Explained


The class java.net.URI provides:
  • Constructors for creating URI instances from their components or by parsing their string forms.
  • Methods for accessing the various components of an instance.
  • Methods for normalizing, resolving, and relativizing URI instances.
Instances of this class are immutable.

Constructing URI instances


You can construct a URI instance using any of the constructors below.
URI(String str)
//Constructs a URI by parsing the given string.
URI(String scheme, String ssp, String fragment)  
//Constructs a URI from the given components.
URI(String scheme, String userInfo, String host, int port, String path, String query, String fragment)
//Constructs a hierarchical URI from the given components.
URI(String scheme, String host, String path, String fragment)
//Constructs a hierarchical URI from the given components.
URI(String scheme, String authority, String path, String query, String fragment)
//Constructs a hierarchical URI from the given components.
All of the above constructors throw the checked exception URISyntaxException if the given string, or the string assembled from the given components, cannot be parsed as a URI reference.
/*
 * Constructing a URI using the constructor
 * URI(String scheme, String host, String path, String fragment)
 * Needs: import java.net.URI; import java.net.URISyntaxException;
 */
try {
 // Pass the fragment without the leading '#'; the constructor adds it.
 URI uri1 = new URI("http", "www.example.com", "/abs.html", "2");
 System.out.println("Scheme : " + uri1.getScheme());       // http
 System.out.println("Authority : " + uri1.getAuthority()); // www.example.com
 System.out.println("Path : " + uri1.getPath());           // /abs.html
 System.out.println("Fragment : " + uri1.getFragment());   // 2
 System.out.println("To String : " + uri1.toString());     // http://www.example.com/abs.html#2
} catch (URISyntaxException e) {
 System.out.println("Invalid syntax exception: " + e.getMessage());
}

/*
 * Constructing a URI using the constructor
 * URI(String str)
 */
try {
 URI uri1 = new URI("http://www.example.com#2");
 System.out.println("Authority : " + uri1.getAuthority()); // www.example.com
 System.out.println("Fragment : " + uri1.getFragment());   // 2
} catch (URISyntaxException e) {
 System.out.println("Invalid syntax exception: " + e.getMessage());
}

You can also use the static method URI.create(String str) to construct a URI. It works like the constructor URI(String str) but does not throw the checked URISyntaxException; instead, a parse failure is wrapped in an unchecked IllegalArgumentException. This method is intended for situations where the given string is known to be a legal URI, for example for URI constants declared within a program, so a string that fails to parse is treated as a programming error.
/*
 * Constructing a URI using the static method
 * URI.create(String str)
 */
URI uri1 = URI.create("http://www.example.com#2");
System.out.println("Authority : " + uri1.getAuthority()); // www.example.com
System.out.println("Fragment : " + uri1.getFragment());   // 2
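
To make the difference between the two entry points visible, here is a small sketch that feeds the same strings to the constructor and to URI.create. The class name UriCreateDemo, the helper names, and the deliberately broken input (a URI containing a space) are assumptions chosen for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriCreateDemo {
    // Parses with the constructor, which forces the caller to handle the checked exception.
    static String parseChecked(String text) {
        try {
            return "ok: " + new URI(text);
        } catch (URISyntaxException e) {
            return "rejected: " + e.getReason();
        }
    }

    // Parses with URI.create, which wraps the same failure in an unchecked exception.
    static String parseUnchecked(String text) {
        try {
            return "ok: " + URI.create(text);
        } catch (IllegalArgumentException e) {
            return "rejected, cause: " + e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String bad = "http://www.example.com/a b"; // the space is not a legal URI character
        System.out.println(parseChecked("http://www.example.com#2"));
        System.out.println(parseChecked(bad));
        System.out.println(parseUnchecked(bad));
    }
}
```

A reasonable rule of thumb is to use the constructor for strings that come from users or files, and URI.create for literals you control.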

Operations on URI instances

1. Normalization
Normalization is the process of removing unnecessary "." and ".." segments from the path component of a hierarchical URI. Each "." segment is simply removed. A ".." segment is removed only if it is preceded by a non-".." segment. Normalization has no effect upon opaque URIs.
URI uri1 = URI.create("http://www.example.com/main/part/.././usr/../doc//as.html#2");
URI uri2 = uri1.normalize();
System.out.println(uri1);
//http://www.example.com/main/part/.././usr/../doc//as.html#2
System.out.println(uri2);
//http://www.example.com/main/doc/as.html#2
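
Two details of the rule above are easy to miss: leading ".." segments of a relative path have nothing to cancel against and therefore survive, and opaque URIs come back unchanged. The sketch below shows both; the inputs, the class name NormalizeDemo, and the helper normalize are hypothetical.

```java
import java.net.URI;

class NormalizeDemo {
    // Returns the string form of the normalized URI.
    static String normalize(String text) {
        return URI.create(text).normalize().toString();
    }

    public static void main(String[] args) {
        // "." is dropped and "b/.." cancels out, but the two leading ".." remain.
        System.out.println(normalize("../../a/./b/../c"));
        // A ".." preceded by an ordinary segment removes that segment.
        System.out.println(normalize("docs/./guide/../index.html"));
        // Opaque URI: the scheme-specific part is not treated as a path.
        System.out.println(normalize("urn:example:a/./b"));
    }
}
```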
2. Resolution
Resolution is the process of resolving one URI against another, base URI.
The result, for example, of resolving
guide/collections/designfaq.html#28    
against the base URI
http://java.sun.com/j2se/1.3/docs/
is the result URI
http://java.sun.com/j2se/1.3/docs/guide/collections/designfaq.html#28

URI uriBase = URI.create("http://java.sun.com/j2se/1.3/docs/");
URI uri = uriBase.resolve(URI.create("guide/collections/designfaq.html#28"));
System.out.println(uri);
//http://java.sun.com/j2se/1.3/docs/guide/collections/designfaq.html#28
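
Resolution behaves differently depending on the shape of the URI being resolved. As a sketch (the class name ResolveDemo and the helper resolve are made up here, and the /index.html and guide/ references are hypothetical), the following covers a reference with ".." segments, a reference with an absolute path, a base without a trailing slash, and a reference that is already absolute:

```java
import java.net.URI;

class ResolveDemo {
    // Resolves a reference against a base URI and returns the string form.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String docs = "http://java.sun.com/j2se/1.3/docs/";
        // ".." segments climb up from the directory of the base document.
        System.out.println(resolve(docs + "guide/collections/designfaq.html#28",
                "../../../demo/jfc/SwingSet2/src/SwingSet2.java"));
        // An absolute path replaces the whole path of the base.
        System.out.println(resolve(docs, "/index.html"));
        // Without a trailing slash, the last segment of the base is replaced.
        System.out.println(resolve("http://java.sun.com/j2se/1.3/docs", "guide/"));
        // A URI that is already absolute is returned as it is.
        System.out.println(resolve(docs, "mailto:java-net@java.sun.com"));
    }
}
```

The third case is a common source of surprises: whether the base ends with a slash decides if "docs" is kept or replaced.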
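3. Relativization

Relativization, finally, is the inverse of resolution: given a base URI and a URI located beneath it, it returns the relative URI that resolves back to the original against that base.
URI uriRelative = uriBase.relativize(uri); // "uri" is the result of the resolution example above
System.out.println(uriRelative);
//guide/collections/designfaq.html#28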
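
The sketch below checks the "inverse of resolution" claim with a round trip and also shows what happens when the path of the base is not a prefix of the other URI's path: relativize then simply returns the given URI. The class name RelativizeDemo, the helper names, and the j2se/1.4 URI are assumptions made for this illustration.

```java
import java.net.URI;

class RelativizeDemo {
    // Relativizes target against base and returns the string form.
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    // True when resolving the relativized URI against the base gives the target back.
    static boolean roundTrips(String base, String target) {
        URI b = URI.create(base);
        URI t = URI.create(target);
        return b.resolve(b.relativize(t)).equals(t);
    }

    public static void main(String[] args) {
        String base = "http://java.sun.com/j2se/1.3/docs/";
        String target = base + "guide/collections/designfaq.html#28";
        System.out.println(relativize(base, target));
        System.out.println(roundTrips(base, target));
        // The base path is not a prefix here, so the URI comes back unchanged.
        System.out.println(relativize(base, "http://java.sun.com/j2se/1.4/docs/"));
    }
}
```

Note that java.net.URI never produces "../" segments when relativizing; it either strips the base prefix or returns the given URI.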

