hawk enterprises portfolio
Current Projects
portland paranormal.com
xxk search
battlenow
bighawk casino
Hawk Enterprises News

Cheetah Search v.9 to be released in May

by Hawk on 05.03.08 9:59 pm

Cheetah Search, formerly xxk search, is set to be the premier PHP/MySQL-driven search website. I, Hawk Roberts of Hawk Enterprises, am spending a tremendous amount of time redeveloping xxk into Cheetah Search, a faster hybrid.

How are we making it faster?

Our shared hosting provider is currently very strict about CPU usage: no more than 1% sustained over a 24-hour period. Instead of using the shared hosting to fetch links, download pages, and parse keywords, we bought several high-speed lines here in Portland, Oregon, and around the United States.
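To put that limit in perspective, here is the arithmetic, assuming "1% sustained" means 1% of a single CPU averaged over the full 24 hours (the provider's exact accounting may differ):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_cpu_budget_seconds(limit_fraction):
    """CPU-seconds per day allowed by a sustained-usage limit.

    Assumes the limit is a fraction of one CPU averaged over a whole day.
    """
    return limit_fraction * SECONDS_PER_DAY


budget = daily_cpu_budget_seconds(0.01)
print(f"{budget:.0f} CPU-seconds per day, about {budget / 60:.1f} minutes")
```

Under that reading, the shared host allows roughly a quarter of an hour of CPU time per day, which does not go far when crawling and parsing pages.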

These lines are connected to several smaller computers, none of which is over 2 GHz in processing power. Combined, however, they contribute a great deal of computing power.

A more in-depth view of a typical datacenter

In a typical datacenter we have an 8 Mbps downstream pipeline (downstream is all we care about) hooked to three or more servers. These servers are nothing more than 300 MHz to 1 GHz machines, but they all carry 250 GB hard drives and are maxed out on RAM, which for some is only 384 MB.
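As a rough capacity check, the sketch below works out how much an 8 Mbps line can pull down per day and how long it would take to fill the disks behind it. It assumes decimal units, a line running flat out around the clock, three servers at 250 GB each, and no deletion or compression, so the results are upper bounds rather than measurements:

```python
MEGABIT = 1_000_000        # line speeds are quoted in decimal megabits
GIGABYTE = 1_000_000_000   # decimal gigabytes, as drive makers use
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(line_mbps):
    """Most bytes a downstream line can deliver in 24 hours at full speed."""
    return line_mbps * MEGABIT / 8 * SECONDS_PER_DAY


def days_to_fill(servers, disk_gb_each, line_mbps):
    """Days until every server's disk is full, assuming the line never idles
    and nothing downloaded is deleted or compressed."""
    return servers * disk_gb_each * GIGABYTE / bytes_per_day(line_mbps)


print(f"{bytes_per_day(8) / GIGABYTE:.1f} GB per day at most")
print(f"{days_to_fill(3, 250, 8):.1f} days to fill 3 x 250 GB")
```

In practice a crawler stores parsed content rather than every raw byte, so the real fill time would be longer.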

These machines are nothing more than console-only Debian boxes. I've installed PHP5, MySQL 4, and Apache 2 on all of them, along with a few utilities I like to use. Then every box gets loaded with the latest mechanized cheetah warrior bot.

What is this mechanized cheetah warrior bot?

Here at Hawk Enterprises we have been having lots of fun building search engines and crawler bots to roam the internet. Our Cheetah Warrior is just the latest in a line of search crawlers we have developed. This particular one works through the following series of steps:

  1. Request Alternate Instructions from the Alpha Cheetah server (discussed in a moment)
  2. Retrieve List of URLs to Process
  3. Process URL ‘http://xyz.com/file.ext’
    1. Connect to URL
    2. Store Header Information, Meta-Page Information
    3. Determine Proper Handling as HTML, Image, Script, etc.
    4. Parse out links and other media
    5. Send Links and Other Media to Alpha Cheetah
    6. Analyze Content for Keywords and Phrases
    7. Check Database for Existing Pages Linking to the URL and Adjust Rank Accordingly
    8. Store Content, Links, Media, Rank
    9. Generate HTML page and all associated Pages
    10. Push to staging server
  4. If More URLs Remain, Continue with Step 3; Otherwise
  5. Request More URLs from the Alpha Cheetah Server
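The steps above can be sketched as a single loop. This is not the Cheetah Warrior's actual PHP code. It is a minimal Python illustration in which `AlphaServer`, `PageParser`, `run_warrior`, the in-memory `db` and `staging` dicts, and the fake `PAGES` table are all hypothetical stand-ins for the real Alpha server, MySQL database, staging server, and HTTP requests. Keyword analysis is reduced to word counts and rank to a count of inbound links seen so far:

```python
import html
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects links, media URLs and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links, self.media, self.text = [], [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("img", "script") and attrs.get("src"):
            self.media.append(urljoin(self.base_url, attrs["src"]))

    def handle_data(self, data):
        self.text.append(data)


class AlphaServer:
    """Hypothetical in-memory stand-in for the Alpha Cheetah server."""

    def __init__(self, seeds):
        self.queue, self.seen = list(seeds), set(seeds)

    def instructions(self):
        return None  # step 1: this sketch has no alternate instructions

    def next_batch(self, n=10):
        batch, self.queue = self.queue[:n], self.queue[n:]
        return batch

    def report(self, urls):
        for url in urls:  # queue only URLs the Alpha has not seen yet
            if url not in self.seen:
                self.seen.add(url)
                self.queue.append(url)


def run_warrior(alpha, fetch, db, staging):
    alpha.instructions()                                  # step 1
    inbound = db.setdefault("inbound", Counter())
    while batch := alpha.next_batch():                    # steps 2, 4, 5
        for url in batch:                                 # step 3
            body, headers = fetch(url)                    # 3.1
            db.setdefault("headers", {})[url] = headers   # 3.2
            if "html" not in headers.get("Content-Type", ""):
                continue                                  # 3.3: only HTML is parsed here
            parser = PageParser(url)
            parser.feed(body)
            parser.close()                                # 3.4
            alpha.report(parser.links + parser.media)     # 3.5
            words = re.findall(r"[a-z0-9]+", " ".join(parser.text).lower())
            keywords = Counter(words).most_common(5)      # 3.6
            rank = inbound[url]                           # 3.7: known links pointing here
            for target in parser.links:
                inbound[target] += 1
            db.setdefault("pages", {})[url] = {           # 3.8
                "keywords": keywords, "links": parser.links,
                "media": parser.media, "rank": rank}
            summary = ", ".join(word for word, _ in keywords)
            staging[url] = (f"<html><body><h1>{html.escape(url)}</h1>"  # 3.9-3.10
                            f"<p>{html.escape(summary)}</p></body></html>")


# Usage with fake pages instead of real HTTP requests.
PAGES = {
    "http://xyz.com/": ('<a href="/a.html">A</a> cheetah search',
                        {"Content-Type": "text/html"}),
    "http://xyz.com/a.html": ('<a href="/">home</a><img src="logo.png"> cheetah cheetah',
                              {"Content-Type": "text/html"}),
    "http://xyz.com/logo.png": ("", {"Content-Type": "image/png"}),
}
db, staging = {}, {}
run_warrior(AlphaServer(["http://xyz.com/"]), lambda u: PAGES[u], db, staging)
print(sorted(staging))
```

A real deployment would share the rank data across all the boxes through the database rather than keep it per process.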

This is the basic process of the Cheetah Warrior Bot. The idea is that the bot can run on its own, pretty much autonomously. If at any point a Warrior Bot runs out of things to do, the Alpha server will always have a huge list of processed and unprocessed URLs for it.
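One way the Alpha server could guarantee that a bot always gets work is to hand out never-crawled URLs first and, once those run out, fall back to the least recently processed pages for a recrawl. The `Frontier` class below is a hypothetical sketch of that policy, not the Alpha server's real design:

```python
from collections import deque


class Frontier:
    """Hypothetical Alpha-side work list: never-crawled URLs first, then the
    least recently crawled ones, so a bot asking for work always gets some."""

    def __init__(self):
        self.pending = deque()    # discovered but not yet processed
        self.processed = deque()  # processed, oldest crawl first
        self.known = set()

    def add(self, url):
        """Record a newly discovered URL; duplicates are ignored."""
        if url not in self.known:
            self.known.add(url)
            self.pending.append(url)

    def next_batch(self, n):
        batch = []
        while len(batch) < n and self.pending:
            batch.append(self.pending.popleft())
        # Once new URLs run out, hand back the stalest pages for a recrawl.
        while len(batch) < n and self.processed:
            batch.append(self.processed.popleft())
        return batch

    def mark_done(self, url):
        """A bot reports a URL as processed; it joins the back of the recrawl line."""
        self.processed.append(url)


frontier = Frontier()
for u in ["http://xyz.com/1", "http://xyz.com/2", "http://xyz.com/1"]:
    frontier.add(u)
first = frontier.next_batch(5)
for u in first:
    frontier.mark_done(u)
frontier.add("http://xyz.com/3")
second = frontier.next_batch(2)
print(first, second)
```

The second batch mixes the one new URL with the oldest processed page, so the bot never comes back empty-handed as long as anything has been crawled.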

I also recognized the potential this offers for any type of distributed task I might need, so I added an alternative-instruction directive. This allows me to place a file on the Alpha box; all the other boxes periodically check it, pull down the new instructions, and assimilate them. It uses a hook structure similar to WordPress's. Unlike WordPress, however, it updates itself.
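The directive can be pictured as a versioned instruction file plus a hook registry. In the sketch below, `Hooks` loosely mirrors WordPress's `add_action`/`do_action` pattern, while the JSON file layout, the `version` field, and the `HANDLERS` table are assumptions made for illustration. A bot rebuilds its hooks only when the file's version number increases, which is one simple way to make it update itself:

```python
import json


class Hooks:
    """Minimal hook registry, loosely modeled on WordPress's add_action/do_action."""

    def __init__(self):
        self._actions = {}

    def add_action(self, name, fn):
        self._actions.setdefault(name, []).append(fn)

    def do_action(self, name, *args):
        return [fn(*args) for fn in self._actions.get(name, [])]


# Hypothetical handlers shipped with every bot. The instruction file can only
# wire these up by name; it never carries executable code.
HANDLERS = {
    "log_url": lambda url: f"visited {url}",
    "url_length": lambda url: len(url),
}


class InstructionPoller:
    """Applies the Alpha's instruction file whenever its version number rises."""

    def __init__(self):
        self.version = 0
        self.hooks = Hooks()

    def check(self, raw):
        doc = json.loads(raw)
        if doc["version"] <= self.version:
            return False  # nothing new since the last poll
        hooks = Hooks()  # rebuild: the file describes the full desired state
        for hook_name, handler_name in doc["actions"]:
            hooks.add_action(hook_name, HANDLERS[handler_name])
        self.hooks, self.version = hooks, doc["version"]
        return True


poller = InstructionPoller()
v1 = json.dumps({"version": 1, "actions": [["after_fetch", "log_url"]]})
print(poller.check(v1), poller.check(v1))
print(poller.hooks.do_action("after_fetch", "http://xyz.com/"))
```

Restricting the file to named, pre-installed handlers keeps a tampered instruction file from running arbitrary code on every box; shipping new code would need a separate, authenticated update path.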

Faster still.

Having all the servers process the pages, build them offsite, and upload them to a separate master server will improve speed dramatically. Instead of querying the database for each search, all the pages will be pre-generated.
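A minimal sketch of the pre-generation idea, assuming one static results page per keyword: an offline step turns crawled keyword data into HTML files, and the serving side answers a search by reading a file instead of querying the database. The function names, the one-file-per-keyword layout, and the input format are hypothetical:

```python
import html
import os
import tempfile
from collections import defaultdict


def build_static_results(pages, out_dir):
    """Offline step: write one pre-rendered results page per keyword.

    pages maps URL -> list of lowercase alphanumeric keywords.
    """
    index = defaultdict(list)
    for url, keywords in pages.items():
        for kw in keywords:
            index[kw].append(url)
    for kw, urls in index.items():
        items = "".join(f'<li><a href="{html.escape(u)}">{html.escape(u)}</a></li>'
                        for u in sorted(urls))
        with open(os.path.join(out_dir, f"{kw}.html"), "w", encoding="utf-8") as f:
            f.write(f"<html><body><h1>{html.escape(kw)}</h1><ul>{items}</ul></body></html>")
    return sorted(index)


def search(out_dir, keyword):
    """Serving step: no database query, just read the pre-generated file."""
    if not keyword.isalnum():
        return None  # rejects path tricks such as "../"
    path = os.path.join(out_dir, f"{keyword.lower()}.html")
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        return f.read()


out = tempfile.mkdtemp()
built = build_static_results({"http://xyz.com/": ["cheetah", "search"],
                              "http://xyz.com/a.html": ["cheetah"]}, out)
page = search(out, "Cheetah")
print(built)
```

The trade-off is that results are only as fresh as the last upload from the crawler boxes, in exchange for searches that cost a file read instead of a database query.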
