Intricacies of Web Search and Inadequacies of Present Search Technologies - A Scientific Study

K. Satya Sai Prakash
Network Systems Laboratory
Department of Computer Science and Engineering
Indian Institute of Technology Madras, Chennai, India


The search engine is the de facto tool for obtaining any kind of information on the web; every Internet-savvy user has used one at some time or other. The motivating factors for this tutorial are:
- The relative difficulty of applying advanced operators in basic search
- The common web searcher's lack of awareness of "how to get what he/she wants by giving what he/she knows"
- The need to educate users about web search internals
- The difficulties and possibilities of shifting from keyword-centric search to multimedia-centric search, and
- An analysis of the Cost of Service (CoS) and Quality of Service (QoS) of the present crop of search engines.

From the user's perspective, getting relevant results for a keyword or phrase within milliseconds is an astounding feat. When a request is submitted, the search engine is assumed to "search the web for the required, relevant information" and to respond with a summary sheet ordered on some criterion (C). Google has certainly awed and inspired the web community. Yet Google and the other present-generation search engines still have a long road to tread before they satisfy all the needs of a web user. Some of the open questions are:
- How precise and small can the result set be?
- How can multimedia content be searched?
- How intuitive and user-friendly is the search engine interface?

The search engine perspective illuminates the underlying web indexing process, which has several concurrent and non-concurrent stages. Crawling and indexing are its vital phases. Present search engines typically have to crawl billions of web pages and index them on virtually every possible keyword. Ranking the pages and yielding them to the user is another challenge. It is also remarkable that search engines cope adequately with a growing, changing web and an increasing user base. Along the same lines, pressing questions are:
- How do search engines cope with the growing web?
- What is the crawl periodicity, or re-indexing period?
- What load-balancing strategies are adopted to respond to web searchers?
- What are the storage, communication and computation costs?
- What are the Return on Investment (ROI) strategies?
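As a rough illustration of the crawl, index and rank stages described above, the following Python sketch runs a breadth-first crawl over a small hypothetical in-memory link graph, builds an inverted index mapping each keyword to the pages that contain it, and orders matching pages by summed term frequency. The page set, the function names (`crawl`, `build_index`, `search`) and the use of term frequency as the ordering criterion C are assumptions made for illustration only; they are not taken from any particular search engine.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP; this toy graph stands in for that.
WEB = {
    "a.html": ("search engines crawl the web", ["b.html", "c.html"]),
    "b.html": ("indexing the web keeps pace with a growing web", ["c.html"]),
    "c.html": ("ranking web pages for the user", ["a.html"]),
}


def crawl(seed):
    """Breadth-first crawl from a seed URL; returns pages in visit order."""
    frontier = deque([seed])
    seen = {seed}
    order = []
    while frontier:
        url = frontier.popleft()
        if url not in WEB:  # broken link: nothing to fetch
            continue
        order.append(url)
        _, links = WEB[url]
        for link in links:
            if link not in seen:  # avoid re-crawling a page already queued
                seen.add(link)
                frontier.append(link)
    return order


def build_index(urls):
    """Inverted index: term -> {url: frequency of the term in that page}."""
    index = defaultdict(lambda: defaultdict(int))
    for url in urls:
        text, _ = WEB[url]
        for term in text.lower().split():
            index[term][url] += 1
    return index


def search(index, query):
    """Return pages containing every query term, ranked by summed term frequency."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    # AND semantics: keep only pages that contain all query terms.
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)
    scores = {url: sum(p[url] for p in postings) for url in candidates}
    # Descending score; ties broken by URL so the output is deterministic.
    return sorted(scores, key=lambda u: (-scores[u], u))


if __name__ == "__main__":
    pages = crawl("a.html")
    index = build_index(pages)
    print(search(index, "web"))
```

Here `search(index, "web")` returns `['b.html', 'a.html', 'c.html']`, since b.html mentions "web" twice. A production engine would replace each piece: real HTTP fetching with politeness rules instead of a dictionary lookup, compressed on-disk posting lists instead of nested dictionaries, and richer ranking signals instead of raw term frequency.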

After addressing these basic issues, the tutorial explores pertinent issues such as profiling (which has caused the "personalization - privacy" dilemma in e-Commerce environments), the indexing of various data formats and research issues in multimedia content analysis.

Overall, this tutorial gives insight into
a) Intricacies in present web search
b) Internals of present search technology and
c) Inadequacies of present technology

It also brings pertinent research issues to the audience's notice, such as
a) Freshness/recency maintenance with a growing web
b) Relevancy on a per-user basis

Organization and Structure
- Introduction
- Search Engine - System Perspective
  o Crawling and Crawl Strategies
  o Page Ranking and Subject Specific Ranking
  o Indexing and Retrieval
- Search Engine - User Perspective
- Intricacies
  o Basic Search
  o Advanced Search
  o Meta Search
  o Profiling (Personalization - Privacy dilemma)
  o Multimedia Search
- Present Search Technologies
  o Computation, Communication and Storage Requirements
  o Hardware and Software Internals
  o Protocols and Formats
- Inadequacies and Amendments
  o Coping with dynamic and growing Web
  o Spider Menace
  o Bandwidth Considerations
  o Clustering and Classification
  o Multimedia Content Analysis
  o CoS and QoS Analysis
- Conclusion

Mr. Sai Prakash is a doctoral student at the Indian Institute of Technology Madras, Chennai, India. His research area is "Search Engine Technologies".

He obtained his Master of Science in Mathematics and Computer Science in 1995 from Sri Sathya Sai Institute of Higher Learning, Prasanthinilayam, India, and then completed a Master of Technology in Computer Science at the same institute. He also obtained a Master of Business Administration (specialization in Finance) from Indira Gandhi National Open University, India.

After completing his master's degrees and before joining the doctoral program, he worked in academia and industry for a couple of years.

His academic and research interests include Knowledge-Based Systems, Intelligent Networks, Web Technologies, e-Commerce, and Mobile & Distributed Computing.

He has been a member of IEEE and ACM since 2000, and is a life member of CSI (Computer Society of India) and ISCA (Indian Science Congress Association) and a member of IADIS (International Association for the Development of Information Society).

