World Wide Web | Part 2

Crawling the Web and the Deep Web

Recursive exploration of the Web, starting from a well-chosen set of resources, is the basic method programmed into search engine spiders (crawlers). In 2004, search engines indexed about 4 billion resources.
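The recursive exploration described above can be sketched as a breadth-first traversal of the link graph. This is a minimal, offline illustration under stated assumptions: `fetch_links` is a hypothetical stand-in for downloading a page and extracting its hyperlinks, and the `.example` URLs form a made-up toy graph. A real spider would also respect robots exclusion rules, rate limits and content types.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def crawl(seeds: Iterable[str],
          fetch_links: Callable[[str], Iterable[str]],
          max_pages: int = 1000) -> List[str]:
    """Explore the link graph breadth-first, starting from seed URLs.

    fetch_links is a hypothetical callback standing in for
    "download the page and return the URLs it links to".
    """
    seen: Set[str] = set()
    queue: deque = deque()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            queue.append(url)

    indexed: List[str] = []
    while queue and len(indexed) < max_pages:
        url = queue.popleft()
        indexed.append(url)              # "index" the resource
        for link in fetch_links(url):
            if link not in seen:         # visit each resource only once
                seen.add(link)
                queue.append(link)
    return indexed


# Made-up toy link graph: no page links to hidden.example.
graph = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://a.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
    "http://hidden.example/": [],
}

print(crawl(["http://a.example/"], lambda u: graph.get(u, [])))
# ['http://a.example/', 'http://b.example/', 'http://c.example/', 'http://d.example/']
```

The queue replaces literal recursion so that long chains of links cannot overflow the call stack, and the `seen` set stops the crawler from looping on pages that link back to each other. Note that `http://hidden.example/` is never reached: a resource that nothing links to stays invisible to this kind of exploration.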

The Deep Web, or Invisible Web, is the part of the Web that is not indexed and therefore cannot be found with general-purpose search engines. Studies indicate that the invisible Web makes up more than 99% of the Web. The Deep Web includes the following resources:

  • Resources inaccessible to the public, and therefore to robots, such as administration pages or paid content protected by a password;
  • Resources that are not served over communication protocols supported by the robots (robots often support only HTTP and HTTPS);
  • Resources whose data format is not supported by the robot;
  • Resources listed in a robots exclusion file (robots.txt);
  • Resources excluded by the robot because they are designed to exploit search engine optimization (spamdexing);
  • Resources excluded by the robot because they are deemed insufficiently relevant (e.g. a site containing millions of resources that no other site links to);
  • Resources whose links are created dynamically in response to visitors' queries.

These last resources usually come from databases and make up the largest part of the Deep Web.
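The robots exclusion file listed above is the `robots.txt` file at the root of a site. Python's standard `urllib.robotparser` module can evaluate such rules; the sketch below parses a made-up rule set offline instead of downloading one, and the host `www.example.com` and the agent name `ExampleBot` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content: every robot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())   # parse() takes the file's lines

print(rp.can_fetch("ExampleBot", "http://www.example.com/index.html"))           # True
print(rp.can_fetch("ExampleBot", "http://www.example.com/private/report.html"))  # False
```

A well-behaved spider skips every URL for which `can_fetch` returns `False`, so those pages never reach the index even though anyone with the address can still open them.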

Public Web Servers

Recursive exploration is not the only method used to index the Web and measure its size. The alternative is to measure the infrastructure connected to the Internet that hosts websites. Instead of following links, this method takes the domain names registered in the Domain Name System and tries to connect to every potential web server. This is, for example, the method used by the company Netcraft, which regularly publishes the results of its surveys measuring the popularity of HTTP servers. This measure concerns the use of Web technologies as much as the Web itself. It also makes it possible to find public sites that are not linked from the rest of the World Wide Web.
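The server-oriented approach can be sketched as follows. This only illustrates the general idea and is not Netcraft's actual methodology: the hypothetical `probe_server` sends an HTTP `HEAD` request to a host and reads the `Server` response header, and `tally` counts which server software answers. The list of host names is assumed to come from the registered domain names, and the header values in the example are invented so that it runs offline.

```python
import http.client
from collections import Counter
from typing import Iterable, Optional


def probe_server(host: str, timeout: float = 5.0) -> Optional[str]:
    """Send HEAD / to host over HTTP and return its Server header, if any."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().getheader("Server")
    except (OSError, http.client.HTTPException):
        return None          # nothing listening, timeout, or malformed reply
    finally:
        conn.close()


def tally(server_headers: Iterable[Optional[str]]) -> Counter:
    """Count server products, keeping only the name before any '/version'."""
    counts: Counter = Counter()
    for header in server_headers:
        if header:
            counts[header.split("/")[0].strip()] += 1
    return counts


# Offline example with invented header values (None = no web server found).
# In a live survey these would come from: [probe_server(h) for h in hosts]
sample = ["Apache/2.4.57 (Debian)", "nginx/1.24.0", "nginx", None,
          "Microsoft-IIS/10.0"]
print(tally(sample).most_common())
# [('nginx', 2), ('Apache', 1), ('Microsoft-IIS', 1)]
```

Because the probe starts from domain names rather than links, a server that no page links to is still counted, which is why this kind of survey can find sites that crawling misses.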

Intranets And Private Web

A Web accessible only on an intranet is private. It is completely separate from the public Web, except where the intranet is connected to the Internet and a link on the private Web points to a public Web resource. Links in the other direction, from the public Web into the intranet, are impossible, because by definition an intranet offers no public access.

A private Web can also exist on the Internet. In that case it is a virtual private Web, because the public cannot reach it by following links.

Web Archiving

The Web is constantly changing: resources are continually being created, modified and deleted. Several Web archiving initiatives aim to make it possible to see what a site contained on a given date. The Internet Archive project is one of them.
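As one concrete example, the Internet Archive's Wayback Machine serves archived copies at addresses of the form `https://web.archive.org/web/<timestamp>/<url>`, where the timestamp is written `YYYYMMDDhhmmss` and the service redirects to the capture closest to that date. A small helper to build such an address might look like this; the function name and target URL are placeholders, and whether a capture actually exists for that date is not checked.

```python
from datetime import datetime


def wayback_url(url: str, when: datetime) -> str:
    """Build a Wayback Machine address for the capture nearest to `when`."""
    return f"https://web.archive.org/web/{when:%Y%m%d%H%M%S}/{url}"


print(wayback_url("http://www.example.com/", datetime(2004, 1, 1)))
# https://web.archive.org/web/20040101000000/http://www.example.com/
```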

Web Resource Types

The various types of Web resources have quite distinct uses:

  • Resources that make up web pages: HTML, JPEG, PNG or GIF images, JavaScript, CSS, sounds, animations;
  • Resources accessible from a web page but consulted through a dedicated interface: streaming audio, streaming video;
  • Resources designed to be viewed separately: documents (PDF, PostScript, Word, etc.), text files, all types of images, music, video, files to download;
  • Resources belonging to systems distinct from the Web: Usenet, electronic mailboxes, local files.
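In practice, a resource's URI scheme is a quick hint about which of these systems it belongs to. The sketch below maps a few well-known schemes (`http`, `https`, `news`, `nntp`, `mailto`, `file`) to the systems named in the list; the mapping table and the function name `system_for` are our own simplification, not an exhaustive registry.

```python
from urllib.parse import urlsplit

# Simplified, hand-written mapping from URI scheme to the serving system.
SYSTEMS = {
    "http": "Web",
    "https": "Web",
    "news": "Usenet",
    "nntp": "Usenet",
    "mailto": "electronic mail",
    "file": "local file",
}


def system_for(uri: str) -> str:
    """Return the system a URI points into, judged by its scheme alone."""
    scheme = urlsplit(uri).scheme  # urlsplit already lower-cases the scheme
    return SYSTEMS.get(scheme, "other/unknown")


for uri in ("https://www.example.com/index.html",
            "news:comp.infosystems.www",
            "mailto:someone@example.com",
            "file:///home/user/notes.txt"):
    print(uri, "->", system_for(uri))
```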

HTML Documents

The HTML document is the main resource of a Web page: it contains the hyperlinks, holds the text and its structure, and links to the multimedia resources. An HTML document contains only text: the text the visitor reads, the text of the HTML markup, and possibly some scripting or style-sheet text.
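To see that hyperlinks are simply part of the HTML text, here is a small sketch that uses Python's standard `html.parser` module to pull the `href` of every `<a>` element out of a document. The class name `LinkExtractor`, the sample document and its URLs are made up for illustration.

```python
from html.parser import HTMLParser
from typing import List


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> element in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


doc = """<html><head><title>Demo</title></head>
<body><p>See the <a href="/about.html">about page</a> and
<a href="http://www.example.org/">another site</a>.</p></body></html>"""

parser = LinkExtractor()
parser.feed(doc)
print(parser.links)  # ['/about.html', 'http://www.example.org/']
```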

Displaying HTML documents is the main function of a Web browser. HTML leaves it to the browser to make the best use of the computer's capabilities when displaying resources. Typically, the font, line length, colors, etc. must be adapted to the output device (monitor, printer, etc.).

Multimedia Elements

Multimedia elements always come from resources independent of the HTML document. HTML documents contain links that point to multimedia resources, which may be scattered across the Internet. These linked multimedia elements are transferred automatically to present a Web page.
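The automatic transfer of multimedia elements can be illustrated by listing the URLs a browser would need to fetch for a page. This sketch uses `html.parser` again, plus `urllib.parse.urljoin` to resolve relative addresses against the page's own URL. It only looks at `<img>` and `<embed>` (resource in `src`) and `<object>` (resource in `data`), the elements traditionally used for images and plug-in content; the class name, page, base URL and CDN host are hypothetical, and real browsers consider many more cases.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

# Element -> attribute holding the media resource (simplified list).
MEDIA_ATTR = {"img": "src", "embed": "src", "object": "data"}


class MediaFinder(HTMLParser):
    """Collect absolute URLs of media a browser would fetch to render a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.resources: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_name = MEDIA_ATTR.get(tag)
        if attr_name is None:
            return                      # not a media element (e.g. <a>)
        value = dict(attrs).get(attr_name)
        if value:
            # Resolve relative paths against the page's own address.
            self.resources.append(urljoin(self.base_url, value))


page = """<html><body>
<img src="images/logo.png" alt="Logo">
<object data="/media/clip.mov" type="video/quicktime"></object>
<img src="http://cdn.example.net/photo.jpg" alt="Photo">
<a href="/next.html">Next page</a>
</body></html>"""

finder = MediaFinder("http://www.example.com/docs/index.html")
finder.feed(page)
for url in finder.resources:
    print(url)
# http://www.example.com/docs/images/logo.png
# http://www.example.com/media/clip.mov
# http://cdn.example.net/photo.jpg
```

The hyperlink to `/next.html` is not in the list: a browser follows it only when the visitor clicks, whereas embedded media is fetched as part of displaying the page.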

Only the use of images and small animations is standardized. Support for audio, video, three-dimensional spaces or other multimedia elements still relies on non-standardized technologies. Many browsers make it possible to add software modules (plug-ins) to extend their functionality, including support for non-standard media types.

Streams (audio, video) require a communication protocol that works differently from HTTP. This is one reason why this type of resource often requires a plug-in and is poorly integrated into Web pages.

Images In Web Pages

This section deals with the images embedded in Web pages.

The JPEG data format is suitable for natural images, mainly photographs.

The PNG data format is suitable for synthetic images (logos, graphics). It is also suitable for natural images, but only when image quality matters more than transfer time.

The GIF data format is suitable for small animations. For synthetic images, GIF is often preferred to PNG because of its former popularity. However, GIF suffers from some drawbacks, including a limited number of colors (at most 256 per image) and generally weaker compression. In addition, the use of GIF was controversial from 1994 to 2004 because Unisys asserted a patent covering its compression method (LZW).
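The three formats are easy to tell apart from their first bytes, which is how software can recognize an image regardless of its file name. The signatures below are the standard ones: JPEG data starts with `FF D8 FF`, PNG with the 8-byte sequence `89 50 4E 47 0D 0A 1A 0A`, and GIF with `GIF87a` or `GIF89a` (the later version added the extensions used for animation). The function name `image_format` is our own.

```python
def image_format(data: bytes) -> str:
    """Guess JPEG, PNG or GIF from the leading 'magic' bytes of image data."""
    if data.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "GIF"
    return "unknown"


print(image_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # PNG
print(image_format(b"GIF89a\x01\x00\x01\x00"))            # GIF
print(image_format(b"<html>"))                            # unknown
```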

Scripts

A scripting language allows the text of a program to be executed directly by software. On the Web, a script is executed by the Web browser and performs actions in response to the way the visitor uses the page being viewed. A script can be embedded in the HTML document or come from a linked resource. The first Web scripting language, JavaScript, was developed by Netscape. Microsoft then developed a competing variant known as JScript. Finally, the ECMAScript standard was proposed for the syntax, and the DOM standards for interfacing with documents.

Styles

The CSS language was developed to manage the detailed presentation of HTML documents. CSS text can be embedded in the HTML document or come from linked style-sheet resources. This separation allows information (contained in HTML documents) and presentation (contained in Cascading Style Sheets) to be managed separately. This is also called the “separation of content and form.”

Continued…
