|
These scripts let you monitor more information than the usual
hits and pages statistics. The first three scripts will scan their own
logfiles (or the standard one if you're using an extended logfile format).
You'll be able to find out:
- which browsers are used when people visit your Web site.
- where people come from and where they enter your Web site.
- which words people use in search engines to reach your pages.
- which deleted pages people ask for, or which bad links exist in your HTML pages.
Two optional scripts will help you understand
HOW people visit your Web site and HOW to optimize your HTML pages.
- Agent, referer and error logs
- Session and documents structures
Cron-agent.pl
|
Purpose
|
Agent log stats
Compute browsers/OS stats.
|
Frequency
|
None. You can run the script when you want. I run it once every week or sometimes daily.
|
Time taken
|
From a few seconds to one or two minutes depending on the size of the log file.
|
Options
|
-a | show all browsers |
-b | re-initialize everything |
-c <file> | load configuration file |
-d <number> | number of days to scan (Extended NCSA logfile only) |
-g <graphics> | different graphic options available |
-l <language> | select language output |
-i | input logfile |
-f | scan only HTML files (Extended NCSA logfile only) |
-x | display default values for flag options |
-t <toplist> | display only toplist browsers |
-z | use compressed logfiles |
-v | display version |
|
How it works
|
It scans the logfile to extract the most commonly used browsers and operating systems.
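As a rough idea of the kind of scan involved, here is a minimal sketch of my
own (not code taken from cron-agent.pl); the 'access_log' path and the browser
families are assumptions:

    #!/usr/bin/perl -w
    # Minimal sketch only (not the actual cron-agent.pl code), assuming a
    # combined/extended logfile where the user-agent is the last quoted field.
    use strict;

    my (%browser, %os);

    open my $log, '<', 'access_log' or die "access_log: $!";   # hypothetical path
    while (<$log>) {
        next unless /"([^"]*)"\s*$/;        # combined format: ... "referer" "agent"
        my $agent = $1;
        # crude family detection, only for illustration
        my $family = $agent =~ /MSIE/    ? 'MSIE'
                   : $agent =~ /Mozilla/ ? 'Netscape (Mozilla)'
                   : (split m{/}, $agent)[0] || 'Unknown';
        $browser{$family}++;
        $os{'Windows'}++ if $agent =~ /Win/;
        $os{'Mac'}++     if $agent =~ /Mac/;
        $os{'Unix'}++    if $agent =~ /X11|Linux|SunOS/;
    }
    close $log;

    printf "%-25s %6d\n", $_, $browser{$_}
        for sort { $browser{$b} <=> $browser{$a} } keys %browser;
    print "--\n";
    printf "%-25s %6d\n", $_, $os{$_}
        for sort { $os{$b} <=> $os{$a} } keys %os;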
|
Notes
|
A graphic is only produced with a combined logfile (Extended Common Logfile or Extended IIS format). It
shows you browser versions versus time. For each browser, you can see the
percentage of each version. |
Cron-refer.pl
|
Purpose
|
Referer log stats
Compute the pages most people come from.
It can be very useful to know which sites have a link to your Web site.
|
Frequency
|
None. You can run the script when you want. I run it once every week.
|
Time taken
|
From a few seconds to one or two minutes depending on the size of the log file.
|
Options
|
-b | re-initialize everything |
-c <file> | load configuration file |
-l <language> | select language output |
-i <file> | input referer logfile |
-f | include local references |
-p <page> | referer for this page |
-t <toplist> | display only toplist files |
-x | display default values for flag options |
-z | use compressed logfile |
-v | display version |
|
How it works
|
It scans the logfile to find out where the people accessing your pages
come from. It outputs the most frequent sites and pages people come from,
and where they arrive on your site. The most
common words used in search engines are also computed.
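The kind of extraction involved might look like the following sketch (my own
illustration, not code from cron-refer.pl; the logfile path and the q/query/p
search parameters are assumptions):

    #!/usr/bin/perl -w
    # Minimal sketch only (not the actual cron-refer.pl code), assuming a
    # combined/extended logfile ending with: "referer" "user-agent"
    use strict;

    my (%site, %word);

    open my $log, '<', 'access_log' or die "access_log: $!";   # hypothetical path
    while (<$log>) {
        next unless /"([^"]*)"\s+"[^"]*"\s*$/;
        my $referer = $1;
        next if $referer eq '-' or $referer eq '';
        my ($host) = $referer =~ m{^https?://([^/]+)};
        $site{$host}++ if defined $host;
        # query strings such as ?q=word1+word2 carry the search-engine words
        if ($referer =~ /[?&](?:q|query|p)=([^&]+)/) {
            my $terms = $1;
            $terms =~ tr/+/ /;
            $terms =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;        # undo URL encoding
            $word{lc $_}++ for split ' ', $terms;
        }
    }
    close $log;

    print "Referring sites:\n";
    printf "  %6d  %s\n", $site{$_}, $_ for sort { $site{$b} <=> $site{$a} } keys %site;
    print "Search words:\n";
    printf "  %6d  %s\n", $word{$_}, $_ for sort { $word{$b} <=> $word{$a} } keys %word;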
|
Notes
|
See how your site has been referenced across the Web and how to improve it
with well-chosen words in your HTML code.
|
Cron-error.pl
|
Purpose
|
Error log stats
It displays the most common errors from your Web server.
A list of errors due to files not found is also produced, so you can check
whether the files are really missing.
|
Frequency
|
None. You can run the script when you want. I run it once every week.
|
Time taken
|
From a few seconds to one or two minutes depending on the size of the error log file.
|
Options
|
-r <tildealias> | substitute ~ with the path alias |
-b | re-initialize everything |
-c <file> | load configuration file |
-i <file> | input error logfile |
-d <number> | number of days to scan |
-j <date> | stats for this date only |
-g <graphics> | select graphics output |
-l <language> | select language output |
-f | 'file does not exist', HTML files only |
-q <tri> | 'file does not exist', matching string only |
-k | 'file does not exist', show referer page |
-s <threshold> | display threshold for 'file not found' |
-t <toplist> | display only toplist most found errors |
-x | display default values for flag options |
-z | use compressed logfile |
-v | display version |
|
How it works
|
It scans the error log file to extract the most common server errors.
It also outputs the documents your server is unable to fulfill.
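For illustration, a minimal scan of this kind could look like the sketch below
(not the actual cron-error.pl code; the error_log path and message patterns
are assumptions):

    #!/usr/bin/perl -w
    # Minimal sketch only (not the actual cron-error.pl code), assuming an
    # Apache-style error_log where missing files show up as 'File does not exist'.
    use strict;

    my (%error, %missing);

    open my $log, '<', 'error_log' or die "error_log: $!";     # hypothetical path
    while (<$log>) {
        if (/File does not exist:\s*(\S+)/) {
            $error{'File does not exist'}++;
            $missing{$1}++;                     # candidate deleted page or bad link
        }
        elsif (/\]\s*([^\[\]]+?)\s*$/) {
            $error{$1}++;                       # whatever follows the [...] prefixes
        }
    }
    close $log;

    print "Most common errors:\n";
    printf "  %6d  %s\n", $error{$_}, $_ for sort { $error{$b} <=> $error{$a} } keys %error;
    print "Files not found (check whether they are really missing):\n";
    printf "  %6d  %s\n", $missing{$_}, $_ for sort { $missing{$b} <=> $missing{$a} } keys %missing;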
|
Notes
|
You can add other error messages produced by your server to the code.
But be sure the message you add is not part of another one (see the illustration below).
A graphic is produced showing you the errors versus time.
The referer page the missing file was requested from is also printed.
You don't have to wait for errors to happen to fix wrong links in your
pages: cron-url.pl is able to scan your documents tree and tell you about
missing files in your links, preventing the error log from growing too big.
Windows users can use cron-error if they use redirection files.
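As an illustration of the substring problem mentioned above (cron-error.pl may
store its messages differently, so this is only a hedged example of the pitfall):

    #!/usr/bin/perl -w
    # Illustration only (cron-error.pl may store its messages differently):
    # why a message you add must not be a substring of an existing one.
    use strict;

    my @messages = (
        'File does not exist',
        'does not exist',       # BAD: substring of the message above
    );

    my %count;
    my $line = '[error] File does not exist: /docs/old.html';
    for my $msg (@messages) {
        $count{$msg}++ if index($line, $msg) >= 0;
    }
    # this single log line has now been counted twice, once per overlapping message
    print "$_: $count{$_}\n" for sort keys %count;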
|
Cron-session.pl
|
Purpose
|
Session log stats
It computes how long people stay on your Web site by scanning the
log file. The full session for each user is shown, along with other bonus
information.
|
Frequency
|
None. You can run the script when you want.
|
Time taken
|
From a few minutes to several hours depending on the size of the
log file.
|
Options
|
-a | include robot sessions |
-c <file> | load configuration file |
-d <number> | number of days to scan |
-g <graphics> | select graphics output |
-l <language> | select language output |
-i <file> | input logfile |
-m | update only robot detection |
-t <min> | session maximum length |
-j <min> | maximum time to read a page |
-q <min> | display sessions longer than this value |
-r <date> | ending date for stats |
-s <date> | starting date for stats |
-x | display default values for flag options |
-z | use compressed logfile |
|
How it works
|
It's very hard to know how long people stay on your Web site, as they
can access a page, go to lunch and make a second access two
hours later. But people usually have a more or less long look
at your site and only come back another day.
In the script, you have a maximum session length variable.
If an access is made within this time, it's still the same session.
Another way is to set a time limit for reading a page. Usually,
people don't need more than one hour to read an HTML page!
Accesses from network spiders (robots) are removed (well, I try!)
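To make the idea concrete, here is a minimal sketch of such a sessionization
(my own illustration, not the cron-session.pl code; the 30-minute timeout and
the logfile path are assumptions, and robot filtering is left out):

    #!/usr/bin/perl -w
    # Minimal sketch only (not the actual cron-session.pl code): group hits per
    # host into sessions, closing a session after 30 idle minutes.
    use strict;
    use Time::Local;

    my $MAX_IDLE = 30 * 60;                     # assumed session timeout (seconds)
    my %month = (Jan=>0,Feb=>1,Mar=>2,Apr=>3,May=>4,Jun=>5,
                 Jul=>6,Aug=>7,Sep=>8,Oct=>9,Nov=>10,Dec=>11);
    my (%last_seen, %length, @sessions);

    open my $log, '<', 'access_log' or die "access_log: $!";   # hypothetical path
    while (<$log>) {
        # common/combined format: host - - [dd/Mon/yyyy:hh:mm:ss zone] "request" ...
        next unless m{^(\S+) \S+ \S+ \[(\d+)/(\w+)/(\d+):(\d+):(\d+):(\d+)};
        my ($host, $d, $mon, $y, $h, $m, $s) = ($1, $2, $3, $4, $5, $6, $7);
        my $t = timegm($s, $m, $h, $d, $month{$mon}, $y);

        if (!exists $last_seen{$host} or $t - $last_seen{$host} > $MAX_IDLE) {
            push @sessions, [$host, $length{$host}] if exists $last_seen{$host};
            $length{$host} = 0;                 # a new session starts here
        } else {
            $length{$host} += $t - $last_seen{$host};
        }
        $last_seen{$host} = $t;
    }
    close $log;
    push @sessions, [$_, $length{$_}] for keys %last_seen;      # still-open sessions

    printf "%-25s %4d min\n", $_->[0], $_->[1] / 60 for @sessions;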
|
Notes
|
If visitors have dynamic IP addresses (assigned by an Internet provider for example),
different people can come in with the same IP address and it becomes
very hard to tell user sessions apart!
The script will also output the average requests per hour and per
day of the week.
This script does not yet support incremental mode.
|
Cron-url.pl
|
Purpose
|
Documents stats
Compute what your Web site looks like. Do you have a multimedia,
graphical and heavy site?
It will also translate each URL to the TITLE of the file and show
you the most recent HTML files on your site.
A detailed server tree is also output.
|
Frequency
|
None. You can run the script when you want. I run it once a week.
|
Time taken
|
From a few minutes to one hour depending on the size of your Web site.
|
Options
|
-c <file> | load configuration file |
-d <nbdays> | show files newer than nbdays days |
-g <graphics> | select graphics output |
-l <language> | select language output |
-t <topten> | show only toplist files |
-x | show default values |
-v | display version |
|
How it works
|
It scans your Web structure, counting the files and opening each
one. Histograms showing how many links and images there are per document are
produced, along with graphs showing the document size
distribution.
A histogram shows the most recently updated files on your site.
A translation table is built between the URL of a document and
its name (found inside the TITLE tag).
The structure (tree) of your site is also shown, with details about the
HTML pages inside each part of the tree.
It also checks every link and reports missing files.
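A simplified version of such a tree scan might look like this (my own sketch,
not the cron-url.pl code; the document root is an assumption):

    #!/usr/bin/perl -w
    # Minimal sketch only (not the actual cron-url.pl code): walk a document
    # tree, record each TITLE and report local links pointing at missing files.
    use strict;
    use File::Find;

    my $webroot = '/var/www/html';              # hypothetical document root
    my (%title, @broken);

    find(sub {
        return unless /\.html?$/i && -f $_;
        my $file = $File::Find::name;
        open my $fh, '<', $_ or return;
        my $doc = do { local $/; <$fh> };       # slurp the whole document
        close $fh;

        ($title{$file}) = $doc =~ m{<title>\s*(.*?)\s*</title>}is;

        # check local links (relative or absolute paths) against the filesystem
        while ($doc =~ /<a\s[^>]*href\s*=\s*"([^"]+)"/gis) {
            (my $href = $1) =~ s/#.*//;         # drop in-page anchors
            next if $href eq '' or $href =~ m{^[a-z][a-z0-9+.-]*:}i;  # skip http:, mailto:, ...
            my $target = $href =~ m{^/} ? "$webroot$href" : "$File::Find::dir/$href";
            push @broken, "$file -> $href" unless -e $target;
        }
    }, $webroot);

    print "Documents without a TITLE:\n";
    print "  $_\n" for grep { !defined $title{$_} } sort keys %title;
    print "Broken links:\n";
    print "  $_\n" for @broken;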
|
Notes
|
It can be useful if you want to check that every HTML document has
a TITLE tag and that each title is unique.
It can also show you if you have heavy pages!
People can go directly to new HTML documents from the server tree or from
the 'new documents' pages.
|
|