Setting up HttpArchive private instance

Create a MySQL instance for HttpArchive using the following schema

Download the source code from

  • http://code.google.com/p/httparchive/source/checkout
  • The source code contains two components:
    1. UI components: files directly under the checked-out home directory. They serve the HttpArchive web application.
    2. Batch processing components: files in the bulktest folder under the home directory. They submit the URLs to WPT and collect the results, such as the HAR, page load time, and other stats and trends.

Move the checked-out directory into your web server's folder

Customize the code for your instance

Make the changes in the following files in the checked-out code:

File : settings.inc

  • Update the WebPageTest (WPT) instance URL. If you have a private WPT instance, set this to its URL; otherwise set it to webpagetest.org
  • Update the MySQL database name, username, and password
  • Update the HttpArchive web application URLs to those of your local instance
  • Update the HttpArchive filesystem path
  • Set PrivateInstance to True

Optional: if you have a private instance of WebPageTest, you can change the default location value, IE8, to your own preferred location on the private WPT instance in all the files in the source code.

File : bulktest/bootstrap.inc

  • Update the locations (in place of IE8) and the WPT key

Once the setup is complete, you can see the application running at the following URLs:

HomePage : http://localhost/httparchive/index.php

Adding URL Page : http://localhost/httparchive/addsite.php

Approving the URL for crawling : http://localhost/httparchive/admin.php

The URLs are crawled in the background by a batch process made up of the following scripts:

  • Run “php batch_start.php” to kick off a new batch test.
  • Run “php batch_process.php” repeatedly to work through a single batch test.
  • Run “php statscompute.php” to generate the stats after batch_process is complete.

Basic documentation for the batch process is available in the code README: http://code.google.com/p/httparchive/source/browse/trunk/bulktest/README.txt
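The three scripts can also be chained from a single driver. The following is only a minimal Python sketch and is not part of HttpArchive: the function names, the `bulktest_dir` and `pause_seconds` parameters, and the default of 12 runs of batch_process.php are assumptions chosen for illustration. Only the three PHP script names come from the steps above.

```python
import subprocess
import time


def build_commands(process_runs=12):
    """Return the commands for one batch, in the order they should run.

    process_runs is an assumed count; the steps above only say that
    batch_process.php is run "repeatedly".
    """
    if process_runs < 1:
        raise ValueError("process_runs must be at least 1")
    commands = [["php", "batch_start.php"]]
    commands += [["php", "batch_process.php"] for _ in range(process_runs)]
    commands.append(["php", "statscompute.php"])
    return commands


def run_batch(bulktest_dir, process_runs=12, pause_seconds=3600):
    """Run each command from inside bulktest_dir, stopping at the first failure."""
    commands = build_commands(process_runs)
    for index, argv in enumerate(commands):
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run(argv, cwd=bulktest_dir, check=True)
        if index < len(commands) - 1:
            # Assumed pause so that WPT has time to finish tests between runs.
            time.sleep(pause_seconds)
```

Calling `run_batch("/path/to/httparchive/bulktest")` would run one full batch, where the path is a placeholder for wherever the checked-out code lives. A fixed run count is the simplest choice here; whether a batch is really finished after that many runs depends on the size of the URL list and on the WPT instance.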

I have set up the following cron jobs on my instance:

  1. Run batch_start.php every day at 00:00 hrs
    • crontab 0 0 * * * user httparchive/bulktest/batch_start.php > /home/user/logs/yhttparchive/batch_start_out_err.txt 2>&1
  2. Run batch_process.php every hour from 01:00 to 12:00 hrs, so that it never runs at 00:00 hrs
    • crontab 0 1,2,3,4,5,6,7,8,9,10,11,12 * * * user httparchive/bulktest/batch_process.php > /home/user/logs/yhttparchive/batch_process_out_err.txt 2>&1
  3. Run statscompute.php every day at 14:00 hrs to populate the stats table
    • crontab 0 14 * * * user httparchive/bulktest/statscompute.php > /home/user/logs/yhttparchive/stats_compute_out_err.txt 2>&1
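To double-check that the three schedules never fire in the same hour, the hour fields can be expanded and compared. The following is a small Python sketch under stated assumptions: `expand_hours`, `find_collisions`, and `SCHEDULE` are hypothetical names, the parser handles only `*`, single values, comma lists, and ranges (no step syntax such as `*/2`), and comparing hours alone is enough only because every entry above uses minute 0.

```python
def expand_hours(field):
    """Expand a cron hour field ("*", "5", "1-12", "1,2,3") into sorted hours."""
    hours = set()
    for part in field.split(","):
        if part == "*":
            hours.update(range(24))
        elif "-" in part:
            start, end = (int(value) for value in part.split("-"))
            hours.update(range(start, end + 1))
        else:
            hours.add(int(part))
    if any(hour < 0 or hour > 23 for hour in hours):
        raise ValueError("hour out of range in field: " + field)
    return sorted(hours)


# Hour fields copied from the three cron entries above.
SCHEDULE = {
    "batch_start.php": "0",
    "batch_process.php": "1,2,3,4,5,6,7,8,9,10,11,12",
    "statscompute.php": "14",
}


def find_collisions(schedule):
    """Return {hour: [scripts]} for every hour with more than one script."""
    seen = {}
    for script, field in schedule.items():
        for hour in expand_hours(field):
            seen.setdefault(hour, []).append(script)
    return {hour: scripts for hour, scripts in seen.items() if len(scripts) > 1}
```

For the entries above, `find_collisions(SCHEDULE)` returns an empty dict, and `expand_hours("1-12")` gives the same hours as the longer comma list.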

Database Tables Explanation:

  • urlschange : http://localhost/httparchive/addsite.php –> Newly added URLs go into the ‘urlschange’ table
  • urls : http://localhost/httparchive/admin.php –> The admin approves the newly added URLs, which then go into the ‘urls’ table
  • status : batch_start.php creates the entries in the ‘status’ table by reading from the ‘urls’ table
  • pages : batch_process.php creates the entries in the ‘pages’ table with data such as page load time, PageSpeed rank, and total requests. This table's data is displayed in ‘viewsite.php’
  • requests : batch_process.php creates the entries in this table with data about every request generated while loading the page (like the HAR content).
  • stats : statscompute.php creates the entries in this table. This table's data is used in the ‘trends.php’ page
  • There are other tables whose names are prefixed with mobile. These are for mobile URLs
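The table list above doubles as a troubleshooting map: when a page shows no data, the first empty table points at the script to look into. The following Python sketch encodes that idea. `TABLE_WRITERS` and `first_empty_table` are hypothetical names, the row counts are supplied by the caller (for example from `SELECT COUNT(*)` queries), and the mobile tables are left out.

```python
# Table name -> script that fills it, in pipeline order (from the list above).
TABLE_WRITERS = {
    "urlschange": "addsite.php",
    "urls": "admin.php",
    "status": "batch_start.php",
    "pages": "batch_process.php",
    "requests": "batch_process.php",
    "stats": "statscompute.php",
}


def first_empty_table(row_counts):
    """Return (table, script) for the first empty table in pipeline order.

    row_counts maps table names to row counts. Tables missing from
    row_counts are skipped, so a caller can leave out a table that is
    legitimately empty. Returns None when no checked table is empty.
    """
    for table, script in TABLE_WRITERS.items():
        if table in row_counts and row_counts[table] == 0:
            return table, script
    return None
```

For example, counts of `{"urls": 40, "status": 40, "pages": 0, "requests": 0, "stats": 0}` would point at the ‘pages’ table and batch_process.php, which suggests that the batch was started but not yet processed.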

3 Responses to Setting up HttpArchive private instance

  1. alfie says:

    Is it possible to share your private instance’s source code? I found that it is hard to get it running smoothly.

    • sundergs says:

      I haven’t forked the code. I made the changes manually, as listed in the blog. I am still figuring out the last few integrations (like showing the WebPageTest video frames and integrating harviewer). Let me know where you are having difficulty, and I will expand the article to cover those parts as well.

  2. Pingback: Setup your own HTTP Archive to track and query your site trends | Be better and faster
