How a Search Engine Works
To understand how a search engine works, we must first know its structure. A search engine is a complex and sophisticated system of hardware (computers) and software programmed to collect, store, index, and finally display information about web pages from the Internet, using a set of algorithms.
The basic structure of a search engine can be broken down into three main parts:
- Spider or Robot
- Indexer
- Search Query and Result Interface
Spider or Robot
The spider or robot, also called a crawler (not to be confused with a 'meta crawler'), is a software 'agent' that runs automatically, crawling the web to find and fetch the information contained on billions of web pages. A spider crawls one web page after another by following the links found on those pages.
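The link-following behaviour can be illustrated with a minimal sketch. It assumes a tiny in-memory "web" (the PAGES dictionary and its example.test URLs are made up for illustration) instead of real HTTP requests, and the names LinkExtractor, extract_links and crawl are hypothetical, not taken from any real search engine.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real spider would fetch pages over HTTP.
PAGES = {
    "http://example.test/": '<a href="http://example.test/a">A</a> '
                            '<a href="http://example.test/b">B</a>',
    "http://example.test/a": '<a href="http://example.test/b">B</a>',
    "http://example.test/b": '<a href="http://example.test/">home</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def crawl(start_url, pages):
    """Visit pages breadth-first by following links; return URLs in visit order."""
    seen = {start_url}          # URLs already queued, so no page is visited twice
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:        # unknown page: nothing to fetch
            continue
        visited.append(url)
        for link in extract_links(html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling crawl("http://example.test/", PAGES) starts at the home page and reaches the other two pages only through the links it finds, which is the essence of how a spider discovers content.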
Because of the enormous number of web pages today, search engines usually employ multiple crawlers that work simultaneously. By using multiple crawlers, the system can download over 100 web pages per second. The downloaded web page information is then sent to a central repository, which is part of the Indexer.
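As a rough sketch of several crawlers working at the same time, the example below uses a thread pool from Python's standard library. The FAKE_WEB dictionary, the fetch_page function and the num_crawlers parameter are assumptions made for illustration; a real system would download pages over the network and would be far more elaborate.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the web: URL -> page content.
FAKE_WEB = {
    f"http://example.test/page{i}": f"content of page {i}" for i in range(8)
}


def fetch_page(url):
    """Pretend to download one page. A real crawler would make an HTTP request here."""
    return url, FAKE_WEB[url]


def crawl_in_parallel(urls, num_crawlers=4):
    """Download pages with several workers and collect them in one repository."""
    repository = {}
    with ThreadPoolExecutor(max_workers=num_crawlers) as pool:
        # pool.map runs fetch_page concurrently and yields results in input order.
        for url, content in pool.map(fetch_page, urls):
            repository[url] = content
    return repository
```

Here the repository dictionary plays the role of the central repository: every worker's downloads end up in one place, ready to be handed to the indexer.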
Indexer
The web page information collected by the crawlers is then sorted, indexed, and ranked according to its 'value' or 'importance', using algorithms developed by each search engine company. The data stored on the database computers is compressed to save disk space.
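One common way to index page text is an inverted index, which maps each word to the pages that contain it. The sketch below is only an illustration under that assumption: the text above does not say which index structure or compression scheme any particular search engine uses, and the function names, the JSON serialization and the use of zlib are choices made here for the example. Ranking by 'importance' is left out because those algorithms are specific to each company.

```python
import json
import re
import zlib
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: word -> sorted list of URLs containing that word."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}


def compress_index(index):
    """Serialize the index to JSON and compress it (zlib is an assumed choice)."""
    return zlib.compress(json.dumps(index, sort_keys=True).encode("utf-8"))


def decompress_index(blob):
    """Reverse compress_index and return the original index."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

With this structure, finding every page that mentions a word is a single dictionary lookup rather than a scan of all stored pages. Compression pays off on large indexes; on a tiny example the compressed form may not be smaller.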
Search Query and Result Interface
This is the part that we, the search engine users, use to find information on the web. The search terms or query we enter into the search box are sent to the central database of the indexer, which in turn retrieves data for the web pages that match our query. The results are presented in order of relevance, based on aspects such as keyword matching and page title or content description matching, depending on the design of the matching and ranking algorithm of the individual search engine.
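The matching and ordering step can be sketched as a simple scoring function. Everything here is an assumption for illustration: the page layout (url, title, body), the TITLE_WEIGHT and BODY_WEIGHT values, and the scoring rule itself. Real search engines use far more sophisticated and mostly unpublished ranking algorithms.

```python
import re

# Assumed weights: a query word found in the title counts more than one in the body.
TITLE_WEIGHT = 3
BODY_WEIGHT = 1


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, page):
    """Count query-word occurrences in the title and body, weighted by where they appear."""
    terms = set(tokenize(query))
    title_hits = sum(1 for word in tokenize(page["title"]) if word in terms)
    body_hits = sum(1 for word in tokenize(page["body"]) if word in terms)
    return TITLE_WEIGHT * title_hits + BODY_WEIGHT * body_hits


def search(query, pages):
    """Return URLs of matching pages, highest score first (ties broken by URL)."""
    scored = [(score(query, page), page["url"]) for page in pages]
    matches = [(s, url) for s, url in scored if s > 0]
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in matches]
```

Changing the two weights changes the order of the results for the same query, which is a small-scale picture of why different search engines return different result lists.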
Because different search engines use different ranking systems, they produce different results for the same search query. Fortunately, search engines today have generally developed excellent systems, so although they display different results, the relevance is usually very good.