URLs, Domains & IP Addresses Explained

What Is A URL?

A URL (Uniform Resource Locator) is a pointer to (or the address of) a web page or website. You will also occasionally hear the more technical term URI (Uniform Resource Identifier) - but for our purposes, URL is the term to understand.

A URL is made up of two parts. The first part, known as the 'scheme', defines the protocol used to access the file. The second part contains the domain name of the server where the information is located, together with the file path and name, e.g.:

http://www.example.com/index.htm
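As a rough illustration, Python's standard urllib.parse module can split the example URL into those pieces. The variable names (scheme, domain, path) are simply labels chosen for this sketch.

```python
from urllib.parse import urlsplit

# Split the example URL from the text into its component parts.
parts = urlsplit("http://www.example.com/index.htm")

scheme = parts.scheme    # the protocol used to access the file
domain = parts.hostname  # the server where the information is located
path = parts.path        # the file path and name on that server

print(scheme)  # http
print(domain)  # www.example.com
print(path)    # /index.htm
```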

The most common scheme is HTTP (HyperText Transfer Protocol) or its secure variant, HTTPS. FTP (File Transfer Protocol) is another scheme, used to identify the location of files for uploading and downloading. Most commonly, you will hear FTP used as a verb (as in "FTP it up to your site") or as the name for a class of file transfer software.

So... What Is A Domain?

Domain names identify a particular website. For example, www.adobe.com is the main website for the company Adobe. Sometimes you will see a "www." at the start of a domain, and sometimes not. The "www" is simply a conventional prefix indicating that the site is on the World Wide Web. You can usually go to adobe.com (without the www) and get the same result, provided the site owner has set up both names. Having two addresses for the same site causes a number of issues, which are referred to by the term 'canonicalization' - but that's beyond our scope for now.

You will also see sub-domains – such as forums.adobe.com, where the www has been replaced by another word. (You would not normally use forums.www.adobe.com though – it would only work if the site owner had specifically set it up.) Wikipedia also uses sub-domains to denote the language of the site: en.wikipedia.org is the English site, whereas es.wikipedia.org is the Spanish version.
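To make the idea concrete, here is a minimal sketch that separates the sub-domain from the rest of a hostname. The function name split_subdomain is hypothetical, and it assumes the main domain is always the last two dot-separated words. That holds for the examples here, but not for names such as bbc.co.uk.

```python
def split_subdomain(hostname):
    """Split a hostname into (sub-domain, main domain).

    Assumption: the main domain is the last two dot-separated labels.
    This is true for adobe.com and wikipedia.org, but not for every domain.
    """
    labels = hostname.lower().split(".")
    return ".".join(labels[:-2]), ".".join(labels[-2:])

for host in ("www.adobe.com", "forums.adobe.com",
             "en.wikipedia.org", "es.wikipedia.org"):
    subdomain, main_domain = split_subdomain(host)
    print(f"{host}: sub-domain '{subdomain}' of {main_domain}")
```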

URLs In More Depth

The last part of the URL, after the domain name, is the path and filename of the file being referenced. So with

http://www.example.com/images/logo.gif

...the file is called 'logo.gif' and it is located in the 'images' sub-directory of the website.

Note: When you're linking to files and pages within your own website, it is usually best to make the links 'relative' rather than 'absolute'. This simply means that if you're referencing an image or page, the path you use is worked out from the current directory or the site's root directory - and doesn't include the scheme and domain name. The relative path of an 'index.htm' home-page would be just that - 'index.htm'. The absolute path would be 'http://www.example.com/index.htm'. Many web development environments, such as Dreamweaver, can take care of this for you automatically.
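If it helps to see how a relative link gets turned into a full address, the sketch below uses urljoin from Python's standard library, which follows the same resolution rules a browser applies. The page addresses used as starting points (including the 'blog/post.htm' page) are made up for illustration.

```python
from urllib.parse import urljoin

# Assumption: these are the pages that contain the links.
home_page = "http://www.example.com/index.htm"
blog_page = "http://www.example.com/blog/post.htm"

# A link relative to the current directory
from_home = urljoin(home_page, "images/logo.gif")
from_blog = urljoin(blog_page, "images/logo.gif")

# A link relative to the site's root directory (note the leading slash)
from_root = urljoin(blog_page, "/images/logo.gif")

print(from_home)  # http://www.example.com/images/logo.gif
print(from_blog)  # http://www.example.com/blog/images/logo.gif
print(from_root)  # http://www.example.com/images/logo.gif
```

Notice that the same relative link leads to a different file depending on which page it appears on, whereas the root-relative version always lands in the same place.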

You will also see many URLs that do not seem to have a file name or extension at the end, e.g.:

www.example.com/blog

...In this case, the web server looks within the 'blog' directory for a standard default document. This is usually index.htm, default.htm or index.php. It then serves that file.
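The lookup can be pictured with a small sketch like the one below. It is not how any particular web server is written: the function name, the order in which the names are tried, and the sample list of files are all assumptions made for illustration.

```python
# The default document names mentioned in the text.
# Assumption: they are tried in this order; real servers let you configure it.
DEFAULT_DOCUMENTS = ("index.htm", "default.htm", "index.php")

def pick_default_document(files_in_directory, defaults=DEFAULT_DOCUMENTS):
    """Return the first default document found in the directory, or None."""
    available = set(files_in_directory)
    for name in defaults:
        if name in available:
            return name
    return None

# Hypothetical contents of the 'blog' directory
blog_files = ["style.css", "index.php", "about.htm"]
print(pick_default_document(blog_files))  # index.php
```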

Sometimes this isn't the case though, as URL rewriting and redirection rules can be written into a server's configuration (you may hear of the exotic-sounding Apache 'mod_rewrite' module and .htaccess files). These can take over the URL and send the browser to whatever page the system's rules dictate. One example is Amazon's rather convoluted URLs. Take for example:

http://www.amazon.com/Art-War-Sun-Tzu/dp/1936594358/ref=sr_1_2?ie=UTF8&qid=1307625504&sr=8-2

The web server passes everything after amazon.com/ to its own software, which uses it to decide what to output and to record where you linked from and other sales information. It's not a static web-page in the traditional sense, but a page generated dynamically from their database of products and page template designs. The shorter URL below would obtain the same page, but without providing the tracking information:

http://www.amazon.com/Art-War-Sun-Tzu/dp/1936594358
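For the curious, the sketch below pulls the long URL apart and rebuilds the short one. Treating the 'ref=...' path segment and the query string as the removable tracking parts is an assumption based only on comparing the two URLs above, not on any documented Amazon rule.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

long_url = ("http://www.amazon.com/Art-War-Sun-Tzu/dp/1936594358/"
            "ref=sr_1_2?ie=UTF8&qid=1307625504&sr=8-2")

parts = urlsplit(long_url)

# The part after the '?' is a set of name=value pairs.
query = parse_qs(parts.query)
print(query)  # {'ie': ['UTF8'], 'qid': ['1307625504'], 'sr': ['8-2']}

# Assumption: any path segment starting with 'ref=' is tracking information.
segments = [s for s in parts.path.split("/") if not s.startswith("ref=")]

# Rebuild the URL with the trimmed path and no query string.
short_url = urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))
print(short_url)  # http://www.amazon.com/Art-War-Sun-Tzu/dp/1936594358
```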

Domain Extensions

Domains always end in an extension, known as a top-level domain (TLD). These can be generic or country-specific.

For example, some generic top-level domains include:

  • .com - indicates a commercial organisation or company.
  • .gov - indicates governmental entities and agencies.
  • .org - usually indicates non-profit organisations - although this isn't strictly adhered to.
  • .edu - indicates educational institutions such as schools, colleges and universities.

Some country-specific examples are:

  • .co.uk - United Kingdom (commercial sites under the .uk country code)
  • .co.in - India (commercial sites under the .in country code)
  • .ca - Canada
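Strictly speaking, the top-level domain is just the final dot-separated word of a name, so a .co.uk address belongs to the .uk top-level domain. The short sketch below picks out that final word. The function name is hypothetical, and example.co.in and example.ca are made-up hostnames used only for illustration.

```python
def top_level_domain(hostname):
    """Return the last dot-separated label of a hostname."""
    return hostname.lower().rstrip(".").rsplit(".", 1)[-1]

for host in ("www.adobe.com", "en.wikipedia.org", "bbc.co.uk",
             "example.co.in", "example.ca"):
    print(host, "->", "." + top_level_domain(host))
```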

A full list of all top-level domains can be found on Wikipedia at:

http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains

Many more domain extensions are coming online regularly, as the organisations that run them seek to make more money by opening up more options!

DNS (Domain Name System) & IP Addresses

Internally, on the Internet and inside servers, domain names are translated into IP addresses, which look like this:

212.58.244.27 (under IPv4)

...which is a lot harder to remember than BBC.co.uk!

IPv6 makes it even worse - an example would be 2620:0:2d0:200::10!
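To see the difference between the two formats, the sketch below feeds both example addresses to Python's standard ipaddress module. The variable names are just labels for this example.

```python
import ipaddress

v4 = ipaddress.ip_address("212.58.244.27")
v6 = ipaddress.ip_address("2620:0:2d0:200::10")

# An IPv4 address is 32 bits long; an IPv6 address is 128 bits long.
print(v4.version, v4.max_prefixlen)  # 4 32
print(v6.version, v6.max_prefixlen)  # 6 128

# The '::' in an IPv6 address is shorthand for a run of zeros.
v6_full = v6.exploded
print(v6_full)  # 2620:0000:02d0:0200:0000:0000:0000:0010
```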

This is why the Domain Name System works like a big 'phone book', translating the easy-to-remember domain names we know into the IP addresses required by computers. It does this using many distributed computers acting as 'nameservers'.
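You can ask the 'phone book' yourself. The sketch below is a minimal example using Python's standard socket module, which hands the question to whatever nameservers your computer is set up to use. The function name resolve is hypothetical.

```python
import socket

def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses reported for a hostname."""
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address string is the first item of sockaddr.
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in results})
```

Calling resolve("www.example.com") on a machine with internet access returns whichever addresses the nameservers currently report, so the answer can differ between machines and over time. If the name cannot be found, the call raises socket.gaierror.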

When you register a new domain, 'propagation' is the term for that registration seeping out across all the worldwide nameservers - until they can all find you. This is why it can take 24-48 hours for a new site to appear under its domain name after first registration - or after you move your site to another server. (Note: You could potentially view a new site immediately by using its IP address directly, as nameservers only update the pointer from the domain to the IP address; they don't actually move anything. It's similar to moving house and taking your phone number with you – you keep the same handle (your phone number), but it now points to a different box on the wall.)

IPv4 has now essentially run out of new IP addresses (all of the roughly 4.3 billion are gone!), which is why IPv6 is required. The last blocks of IPv4 addresses were handed out to the regional registries during 2011. There is a process called NAT (Network Address Translation) which lets many devices share a single public IP address, in theory multiplying the available IPs by up to 65,536 – and this is why the internet is still able to carry on uninterrupted, even with all the IPs used up. But the workings of NAT are too complicated to cover here – and not really relevant to most people.
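The numbers above come from simple powers of two, as the short calculation below shows. Linking the 65,536 figure to 16-bit port numbers is an interpretation offered here as the likely source of that multiplier. It is a theoretical ceiling rather than something achieved in practice.

```python
ipv4_bits = 32  # an IPv4 address is 32 bits long
port_bits = 16  # a port number is 16 bits long

ipv4_addresses = 2 ** ipv4_bits
ports_per_address = 2 ** port_bits

print(f"{ipv4_addresses:,}")     # 4,294,967,296
print(f"{ports_per_address:,}")  # 65,536
```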