Script descriptions

These scripts let you monitor more information than the usual hits and pages statistics. The first three scripts will scan their own logfiles (or the standard one if you're using an extended logfile format).

You'll be able to find out:

  • which browsers people use when visiting your Web site.
  • where visitors come from and where they enter your Web site.
  • which words people use in search engines to reach your pages.
  • which deleted pages people are asking for, or bad links in your HTML pages.

Two optional scripts will help you understand HOW people visit your Web site and HOW to optimize your HTML pages.


Cron-agent.pl
Purpose Agent log stats
Computes browser and OS statistics.
Frequency None. You can run the script whenever you want. I run it once a week, or sometimes daily.
Time taken From a few seconds to one or two minutes depending on the size of the log file.
Options
-a show all browsers
-b re-initialize everything
-c <file> load configuration file
-d <number> number of days to scan (Extended NCSA logfile only)
-g <graphics> select graphics output (several options available)
-l <language> select language output
-i <file> input logfile
-f scan only HTML files (Extended NCSA logfile only)
-x display default values for flag options
-t <toplist> display only toplist browsers
-z use compressed logfiles
-v display version
How it works It scans the logfile to extract the most commonly used browsers and operating systems.
Notes A graphic is only produced with a combined logfile (Extended Common Logfile or Extended IIS format). It shows you browser versions versus time. For each browser, you can see the percentage of each version.
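
The idea behind the tally is roughly the following. This is only a sketch, not the actual cron-agent.pl code: it assumes a combined logfile where the user agent is the last quoted field, and the browser/OS patterns are just a few examples.

  #!/usr/bin/perl
  # Sketch of the browser/OS tally (not the real cron-agent.pl).
  use strict;
  use warnings;

  my (%browser, %os);

  while (my $line = <>) {
      # In a combined logfile the user agent is the last "..." field.
      next unless $line =~ /"([^"]*)"\s*$/;
      my $agent = $1;

      # Order matters: MSIE agents also contain the word "Mozilla".
      my $name = $agent =~ /MSIE/    ? 'MSIE'
               : $agent =~ /Mozilla/ ? 'Netscape (Mozilla)'
               : $agent =~ /Lynx/    ? 'Lynx'
               :                       'Other';
      $browser{$name}++;

      my $system = $agent =~ /Win/       ? 'Windows'
                 : $agent =~ /Mac/       ? 'Macintosh'
                 : $agent =~ /X11|Linux/ ? 'Unix / X11'
                 :                         'Unknown';
      $os{$system}++;
  }

  print "Browsers:\n";
  printf "  %6d  %s\n", $browser{$_}, $_
      for sort { $browser{$b} <=> $browser{$a} } keys %browser;

  print "\nOperating systems:\n";
  printf "  %6d  %s\n", $os{$_}, $_
      for sort { $os{$b} <=> $os{$a} } keys %os;

The real script of course knows about many more browsers, can read agent-only or compressed logfiles, and produces graphics on top of the counters.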


Cron-refer.pl
Purpose Referer log stats
Computes which pages most visitors come from.
It can be very useful to know which sites have a link to your Web site.
Frequency None. You can run the script whenever you want. I run it once a week.
Time taken From a few seconds to one or two minutes depending on the size of the log file.
Options
-b re-initialize everything
-c <file> load configuration file
-l <language> select language output
-i <file> input referer logfile
-f include local references
-p <page> referer for this page
-t <toplist> display only toplist files
-x display default values for flag options
-z use compressed logfile
-v display version
How it works It scans the logfile to extract where the people accessing your pages come from. It outputs the most frequent sites and pages people come from, and where they arrive on your site. The most common words used in search engines are also computed.
Notes See how your site has been referenced across the Web and how to improve it with well-chosen words in your HTML code.
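
The referer extraction can be pictured like this. It's only a sketch, assuming a combined logfile where the referer is the next-to-last quoted field; the query parameters (q, query, p, search) are examples, not the script's actual list.

  #!/usr/bin/perl
  # Sketch of the referer tally (not the real cron-refer.pl).
  use strict;
  use warnings;

  my (%site, %word);

  while (my $line = <>) {
      # In a combined logfile the referer is the next-to-last "..." field.
      next unless $line =~ /"([^"]*)"\s+"[^"]*"\s*$/;
      my $ref = $1;
      next if $ref eq '-' or $ref eq '';

      # Count the referring site (host part of the referer URL).
      $site{lc $1}++ if $ref =~ m{^https?://([^/]+)}i;

      # Count words found in a search-engine query string.
      if ($ref =~ /[?&](?:q|query|p|search)=([^&]+)/i) {
          my $query = lc $1;
          $query =~ tr/+/ /;                            # '+' means space
          $query =~ s/%([0-9a-f]{2})/chr hex $1/gie;    # decode %xx
          $word{$_}++ for grep { length } split /\W+/, $query;
      }
  }

  print "Top referring sites:\n";
  my @sites = sort { $site{$b} <=> $site{$a} } keys %site;
  splice @sites, 10 if @sites > 10;                     # keep a toplist
  printf "  %6d  %s\n", $site{$_}, $_ for @sites;

  print "\nTop search words:\n";
  my @words = sort { $word{$b} <=> $word{$a} } keys %word;
  splice @words, 10 if @words > 10;
  printf "  %6d  %s\n", $word{$_}, $_ for @words;

In the real script the toplist size is controlled by the -t option, and a single page can be singled out with -p.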


Cron-error.pl
Purpose Error log stats
It displays the most common errors from your Web server.
A list of errors due to files not found is also produced, so you can check whether the files are really missing.
Frequency None. You can run the script whenever you want. I run it once a week.
Time taken From a few seconds to one or two minutes depending on the size of the error log file.
Options
-r <tildealias> substitute ~ with the path alias
-b re-initialize everything
-c <file> load configuration file
-i <file> input error logfile
-d <number> number of days to scan
-j <date> stats for this date only
-g <graphics> select graphics output
-l <language> select language output
-f 'file does not exist', HTML files only
-q <tri> 'file does not exist', matching string only
-k 'file does not exist', show referer page
-s <threshold> display threshold for 'file not found'
-t <toplist> display only toplist most found errors
-x display default values for flag options
-z use compressed logfile
-v display version
How it works It scans the error log file to extract the most common server errors. It also outputs the documents your server is unable to fulfill.
Notes You can add other error messages produced by your server to the code. Just make sure the message you add is not part of another one. A graphic is produced showing you the errors versus time.
The page the missing file request comes from is also printed.

You don't have to wait for errors to happen to fix wrong links in your pages: cron-url.pl can scan your document tree and tell you about missing files in your links, preventing the error log from growing too big.
Windows users can use cron-error if they use redirection files.
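
A rough picture of the error-log scan, assuming an Apache-style error_log; the message patterns below are only examples of the kind of strings you could add, not the script's real list.

  #!/usr/bin/perl
  # Sketch of the error-log scan (not the real cron-error.pl).
  use strict;
  use warnings;

  my (%reason, %missing);

  while (my $line = <>) {
      if ($line =~ /File does not exist:\s*(\S+)/) {
          $reason{'File does not exist'}++;
          $missing{$1}++;                 # remember which document is missing
      }
      elsif ($line =~ /(Permission denied|script not found|request timed out)/i) {
          $reason{lc $1}++;               # other messages you could look for
      }
      else {
          $reason{'other'}++;
      }
  }

  print "Most common errors:\n";
  printf "  %6d  %s\n", $reason{$_}, $_
      for sort { $reason{$b} <=> $reason{$a} } keys %reason;

  print "\nDocuments not found:\n";
  printf "  %6d  %s\n", $missing{$_}, $_
      for sort { $missing{$b} <=> $missing{$a} } keys %missing;

The real script goes further: it can restrict the list to HTML files (-f), show the referer page of a missing file (-k), and apply a display threshold (-s).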


Cron-session.pl
Purpose Session log stats
It computes how long people stay on your Web site by scanning the log file. The full session for each user is shown, with other bonus information.
Frequency None. You can run the script whenever you want.
Time taken From a few minutes to several hours depending on the size of the log file.
Options
-a include robot session
-c <file> load configuration file
-d <number> number of days to scan
-g <graphics> select graphics output
-l <language> select language output
-i <file> input logfile
-m update only robot detection
-t <min> session maximum length
-j <min> maximum time to read a page
-q <min> display sessions longer than this value
-r <date> ending date for stats
-s <date> starting date for stats
-x display default values for flag options
-z use compressed logfile
How it works It's very hard to know how long people stay on your Web site, as they can access a page, go to lunch, and make a second access two hours later. But people usually have a more or less long look at your site and only come back another day.
In the script, you have a maximum session length variable. If an access is made within this time, it's still the same session.
Another way is to set a time limit for reading a page: usually, people don't need more than one hour to read an HTML page!
Accesses from network spiders (robots) are removed (well, I try!).
Notes If visitors have dynamic IP addresses (from an Internet provider, for example), different people can come with the same IP address and it becomes very hard to identify user sessions!
The script will also output the average requests by hour and by day of the week.

This script does not yet support incremental mode.
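
The session-splitting idea can be sketched like this. Only the "maximum time to read a page" limit and a very crude robot filter are shown; the numbers, host names and robot list are made up, not the script's defaults.

  #!/usr/bin/perl
  # Sketch of the session-splitting logic (not the real cron-session.pl).
  use strict;
  use warnings;

  my $MAX_PAGE_TIME = 60 * 60;            # one hour to read a page (-j idea)
  my @robots        = qw(crawler spider bot);

  # host => request times in seconds, already extracted from the logfile
  # and sorted; these numbers are made up for the example.
  my %hits = (
      'pc.example.com'      => [ 0, 120, 300, 9000, 9100 ],
      'crawler.example.net' => [ 0, 10, 20, 30 ],
  );

  HOST: for my $host (sort keys %hits) {
      # Crude robot removal: skip hosts whose name looks like a spider.
      for my $r (@robots) { next HOST if $host =~ /$r/i }

      my @times = @{ $hits{$host} };
      my ($start, $last) = ($times[0], $times[0]);

      for my $t (@times[ 1 .. $#times ]) {
          if ($t - $last > $MAX_PAGE_TIME) {
              # Too long a gap: close this session and start a new one.
              printf "%s: session of %d seconds\n", $host, $last - $start;
              $start = $t;
          }
          $last = $t;
      }
      printf "%s: session of %d seconds\n", $host, $last - $start;
  }

The real script combines this with the maximum session length (-t), a date range (-r/-s) and much better robot detection (-m/-a).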


Cron-url.pl
Purpose Document stats
Computes what your Web site looks like. Do you have a multimedia, graphical and heavy site?
It will also translate the URL to the TITLE of the file and show you the most recent HTML files on your site.
A detailed server tree is also output.
Frequency None. You can run the script whenever you want. I run it once a week.
Time taken From a few minutes to one hour depending on the size of your Web site.
Options
-c <file> load configuration file
-d <nbdays> show files newer than nbdays days
-g <graphics> select graphics output
-l <language> select language output
-t <topten> show only toplist files
-x show default values
-v display version
How it works It scans your Web structure, counting the files and opening each one. Histograms showing how many links and images each document contains are produced, along with graphs showing the document size distribution.
A histogram shows the most recently updated files on your site.
A translation table is made between the URL of a document and its name (found inside the TITLE tag).
The structure (tree) of your site is also shown, with details about the HTML pages inside each part of the tree.
It also checks every link and reports missing files.
Notes Can be useful if you want to check that every HTML document has a TITLE tag and that it is unique. It can also show you whether you have heavy pages!
People can go directly to new HTML documents from the server tree or from the 'new documents' pages.
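
The document scan can be pictured as a walk of the Web tree, reading each HTML file to grab its TITLE and count links and images. A minimal sketch, assuming the pages live under a local directory (the path is made up), not the actual cron-url.pl code.

  #!/usr/bin/perl
  # Sketch of the document scan (not the real cron-url.pl).
  use strict;
  use warnings;
  use File::Find;

  my $webroot = '/var/www/html';          # made-up root of the Web tree

  find(sub {
      return unless -f $_ && /\.html?$/i;

      open my $fh, '<', $_ or return;
      my $doc = do { local $/; <$fh> };   # slurp the whole document
      close $fh;

      # URL-to-TITLE translation table entry.
      my ($title) = $doc =~ m{<title>\s*(.*?)\s*</title>}is;
      $title = '(no TITLE tag)' unless defined $title;

      # Count links and images to spot heavy pages.
      my $links  = () = $doc =~ /<a\s+[^>]*href\s*=/gi;
      my $images = () = $doc =~ /<img\b/gi;

      printf "%-40s  %-30s  %3d links  %3d images  %6d bytes\n",
             $File::Find::name, $title, $links, $images, -s $_;
  }, $webroot);

On top of this, the real script builds the histograms, the server tree, the broken-link report and the list of files newer than a given number of days (-d).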