Website search (SearchEngine)

I have developed a search engine that is optimized for crawling and searching websites. The search engine and the source code can be downloaded from codeplex.com.

Here is a short post about how to install and use the search engine. Hopefully I will have time to describe the search engine in more detail another time.

You need MS SQL Server to run the search engine.

Installing the Indexer (Crawler)

Installing the indexer is very easy. Download the Setup Website Search and install it.

After the installation you will have a new folder for the program under Programs.

If you have Visual Studio 2010 and do not want to install the program, you can download the source code and run the program from Visual Studio. The SearchDB.sql script is also included in the project.

Creating the database

The first thing you need to do is create the database. Open SearchDB.sql in SQL Server Management Studio and execute the script.

This will create the Search database.

You can change the name of the database by replacing [Search] with a new name.
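If you would rather not edit the script by hand, the rename can be scripted. The following is a minimal Python sketch and not part of the project. The function name rename_database and the new database name in the example are made up for illustration, and the sketch assumes the script always refers to the database with brackets, as [Search].

```python
def rename_database(sql_text, new_name, old_name="Search"):
    """Return sql_text with every [old_name] replaced by [new_name].

    Only the bracketed form is replaced, so the plain word "Search"
    in comments or column names is left alone.
    """
    if "[" in new_name or "]" in new_name:
        raise ValueError("database name must not contain square brackets")
    return sql_text.replace("[" + old_name + "]", "[" + new_name + "]")


# Small in-memory example; a real run would read SearchDB.sql instead.
sample = "CREATE DATABASE [Search]\nGO\nUSE [Search]\nGO\n"
print(rename_database(sample, "IntranetSearch"))
```

When applying this to the real file, check the file's encoding first, because scripts saved from SQL Server tools are often UTF-16. If you rename the database, remember to use the same name in the connection string.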

Indexing

Now you are ready to run the indexer program. Open the Website Search program.

Enter the URLs that you want to index in the Starting point box. You can create an index that combines two or more domains by entering one URL on each line.
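To make the one-URL-per-line format concrete, here is a small Python sketch of how such input could be split into start URLs and the set of hosts they cover. It is an illustration only and not the program's actual code. The name parse_starting_points and the example addresses are hypothetical, and requiring absolute http or https URLs is an assumption of the sketch.

```python
from urllib.parse import urlsplit


def parse_starting_points(text):
    """Split multi-line input into (start_urls, hosts).

    Blank lines are skipped. Every remaining line must be an absolute
    http or https URL, otherwise a ValueError is raised.
    """
    urls = []
    hosts = set()
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError("not an absolute http(s) URL: %r" % url)
        urls.append(url)
        hosts.add(parts.hostname)  # hostname is already lower-cased
    return urls, hosts


box_content = "http://example.com/\nhttps://docs.example.org/start\n"
start_urls, hosts = parse_starting_points(box_content)
print(start_urls)
print(sorted(hosts))
```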

The crawler starts when you click the Go button, and its status is shown in the bottom area of the program window.

When the crawler is finished you can see the result in the database.

If you check the “Save Html” checkbox, the indexer will also save the HTML of each page in the database.

The program creates a log file in the program folder. It also creates temporary files there when indexing PDF and Office files. Therefore you need to make sure that the current user has Edit access to the program folder: “C:\Program Files (x86)\SearchEngine\Website Search”. If the user does not have access, you will not be able to index these file types and no log will be created.
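One quick way to check the folder permission before starting a long crawl is to try creating a temporary file in the folder. The Python sketch below does that. The function name can_write_to is made up for illustration, and the check only tells you something if you run it as the same Windows user that runs the indexer.

```python
import tempfile


def can_write_to(folder):
    """Return True if a temporary file can be created in folder."""
    try:
        # The file is deleted again as soon as the with-block ends.
        with tempfile.TemporaryFile(dir=folder):
            pass
        return True
    except OSError:
        # Covers missing folders as well as permission errors.
        return False


program_folder = r"C:\Program Files (x86)\SearchEngine\Website Search"
print("writable:", can_write_to(program_folder))
```

Actually creating a file is more reliable than only inspecting permission flags, because it exercises the same operation the indexer needs.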

The program uses this default connection string: “Data Source=.\SQLEXPRESS;Initial Catalog=Search;Integrated Security=True”. You can change it in the “SearchProgram.exe.config” file in the program folder.
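The connection string is a list of key=value pairs separated by semicolons: Data Source names the SQL Server instance, Initial Catalog names the database, and Integrated Security=True means Windows authentication is used. As a rough sketch, the Python snippet below takes the default string apart and builds a modified one that you could paste into the config file. The helper names, the server name DBSERVER01 and the database name IntranetSearch are all hypothetical, and the simple parser assumes that no value contains a semicolon or quotes.

```python
def parse_connection_string(conn):
    """Split 'key=value;key=value' into a dict, keeping the key order."""
    pairs = {}
    for part in conn.split(";"):
        if not part.strip():
            continue  # ignore empty segments, e.g. after a trailing ';'
        key, _, value = part.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def build_connection_string(pairs):
    """Join a dict of settings back into a connection string."""
    return ";".join("%s=%s" % (key, value) for key, value in pairs.items())


default = r"Data Source=.\SQLEXPRESS;Initial Catalog=Search;Integrated Security=True"
settings = parse_connection_string(default)
settings["Data Source"] = "DBSERVER01"          # hypothetical server name
settings["Initial Catalog"] = "IntranetSearch"  # hypothetical database name
print(build_connection_string(settings))
```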

The program respects metatags:

  • Description metatag. The page gets its description from this tag if the tag is found.
  • Robots metatags. Noindex and nofollow metatags are respected.
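To show what reading these tags involves, here is a short Python sketch that extracts the description and the robots directives from a page using the standard html.parser module. It is an illustration of the idea and not the indexer's own code. The class name MetaTagReader and the sample page are made up, and how the indexer itself handles edge cases such as several robots tags is not known to this sketch.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects the description and the noindex/nofollow robots flags."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.noindex = False
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content") or ""
        if name == "description" and self.description is None:
            # Keep the first description found on the page.
            self.description = content.strip()
        elif name == "robots":
            # The content is a comma-separated list such as "noindex, nofollow".
            directives = {d.strip().lower() for d in content.split(",")}
            self.noindex = self.noindex or "noindex" in directives
            self.nofollow = self.nofollow or "nofollow" in directives


page = (
    '<html><head>'
    '<meta name="description" content="Product overview">'
    '<meta name="robots" content="NOINDEX, nofollow" />'
    '</head><body></body></html>'
)
reader = MetaTagReader()
reader.feed(page)
print(reader.description, reader.noindex, reader.nofollow)
```

A crawler following these rules would skip storing a page when noindex is set and would not queue the links on a page when nofollow is set.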

Searching

In my next post I will show how to search using the search engine. Until then, you can download the source code and look at the sample website, which contains two search samples.
