When building a web application, it's often necessary to add search to the site. A variety of tools exist to make this easier; my personal favorite is Sphider, a PHP-based search engine. Whichever tool you choose, however, the basic principles are the same.
Most search engines are divided into two primary functions – indexing and searching. Indexing is where the search engine scans the website and organizes the content into an easy-to-find format, called an index. Searching is where the engine then quickly pulls the proper results from its index.
Thanks to Tim Berners-Lee's arachnophilia, the program that scans the "world wide web" is called a "spider," or "web crawler." The spider's job is to go through every page on the Internet and index it into a database. Major search engines like Google have entire warehouses full of computers whose only job is to download each page on the Internet and feed it into the index. Many websites are re-crawled on a weekly basis.
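The crawl itself is essentially a graph traversal: visit a page, record it, follow its links, and never visit the same page twice. Here is a minimal sketch of that loop; the `FAKE_WEB` dictionary is a stand-in for real HTTP fetching and HTML link extraction, which an actual crawler would do instead.

```python
from collections import deque

# A stand-in for the web: each "page" lists the pages it links to.
# A real crawler would fetch URLs over HTTP and parse links out of the HTML.
FAKE_WEB = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/"],
    "/blog/post-1": ["/blog"],
}

def crawl(start):
    """Breadth-first traversal of every reachable page, visiting each once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)           # this is where a real crawler would index the page
        for link in FAKE_WEB.get(page, []):
            if link not in seen:     # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return order

print(crawl("/"))  # ['/', '/about', '/blog', '/blog/post-1']
```

The `seen` set is what keeps the crawler from looping forever on sites whose pages link back to each other.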
Indexing a page in the database is similar to using the old-fashioned "card catalog" at the library. The web crawler takes every word on the page and records where each word can be found. For instance, if the word "crocodile" appears halfway down a web page, the computer adds the word "crocodile" to its index, along with a list of all the places where that word exists.
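This structure is commonly called an inverted index: instead of mapping pages to words, it maps each word to the pages (and positions) where it appears. A minimal sketch, with made-up page URLs and text:

```python
def build_index(pages):
    """Map each word to the (page, position) pairs where it appears."""
    index = {}
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index.setdefault(word, []).append((url, position))
    return index

# Hypothetical pages standing in for crawled content.
pages = {
    "/reptiles": "the crocodile is a large reptile",
    "/rivers":   "a crocodile may lurk in the river",
}
index = build_index(pages)
print(index["crocodile"])  # [('/reptiles', 1), ('/rivers', 1)]
```

Recording positions, not just page names, is what later lets the engine show snippets and rank pages where the query words appear close together.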
The index then comes into play during search. When the user enters a search query, the computer goes through its index and finds all the pages that contain those words. For example, when the user searches for the word "crocodile," the computer looks it up in the index and retrieves a list of all the pages that have that word. In our previous example, the computer sees that "crocodile" was halfway down a particular page of the site.
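Because the index was built ahead of time, the lookup itself is cheap: for a multi-word query, the engine fetches each word's page list and intersects them. A sketch, using a small hand-built index (in practice this would come from the indexing phase):

```python
# A toy inverted index: word -> set of pages containing it,
# assumed to have been built during the indexing phase.
INDEX = {
    "crocodile": {"/reptiles", "/rivers"},
    "large":     {"/reptiles"},
    "river":     {"/rivers"},
}

def search(query):
    """Return only the pages that contain every word in the query."""
    results = None
    for word in query.lower().split():
        pages = INDEX.get(word, set())
        results = pages if results is None else results & pages
    return results or set()

print(search("crocodile river"))  # {'/rivers'}
print(search("crocodile"))        # both pages match
```

The key point is that no page text is scanned at query time; the work was all done up front by the indexer.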
Once the search engine gets a list of all the pages with the word "crocodile," it then ranks those pages. The search engine's job is to make sure that the most relevant pages, the pages with the best content, are listed first. This is how Google became popular: its PageRank algorithm initially produced the best search results.
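Real ranking algorithms like PageRank weigh many signals, but the idea can be shown with the crudest possible score: pages where the query word appears more often rank higher. This is only an illustration, not how Google ranks pages.

```python
# Hypothetical page text; a real engine would score from its index, not raw text.
PAGE_TEXT = {
    "/reptiles": "the crocodile is a large crocodile of the reptile family",
    "/rivers":   "a crocodile may lurk in the river",
}

def rank(query_word):
    """Order matching pages by how often the word appears (a crude relevance score)."""
    scores = {url: text.lower().split().count(query_word)
              for url, text in PAGE_TEXT.items()}
    matches = [(url, n) for url, n in scores.items() if n > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

print(rank("crocodile"))  # [('/reptiles', 2), ('/rivers', 1)]
```

Swapping this scoring function for a better one (word proximity, link analysis, freshness) is exactly where search engines compete.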
There are many options for customizing a search engine. For instance, common words can be excluded from the index, different ranking algorithms can provide better results, and smart search engines can make intelligent decisions about what to show in each result description. Both Bing and Google are now trying to become even more intelligent by helping their users take actions, like finding airline tickets and solving math problems. Nevertheless, regardless of how advanced search engines become, the basic components that drive them, indexing and search, remain the same.
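The first customization mentioned above, excluding common words, is usually done with a "stop word" list applied while tokenizing page text. A minimal sketch with an assumed (and deliberately tiny) stop-word set:

```python
# A tiny assumed stop-word list; real engines ship curated lists per language.
STOP_WORDS = {"the", "a", "is", "in", "of", "and"}

def tokenize(text):
    """Split text into words, dropping common words not worth indexing."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(tokenize("The crocodile is in the river"))  # ['crocodile', 'river']
```

Dropping these words shrinks the index considerably, since words like "the" would otherwise appear on nearly every page.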
Written by Andrew Palczewski
About the Author
Andrew Palczewski is CEO of apHarmony, a Chicago software development company. He holds a Master's degree in Computer Engineering from the University of Illinois at Urbana-Champaign and has over ten years' experience in managing development of software projects.