Configuration
pwebstats creates a collection of HTML files and images in a set of directories under the output directory (see outdir below). At present, pwebstats produces statistics over daily, weekly or monthly periods. Note: the input logfile(s) must be split into separate daily, weekly or monthly files. A utility (log-splitter.pl) is included to help you do this.
- Edit the configuration file (./conf/pwebstats.conf) to reflect the location of the pwebstats distribution directory on your system, the location of the output directory, and your site-specific details.
- Some config details can also be given on the command line. Type ./pwebstats for a full list of options.
- Type the following command in the distribution directory to run pwebstats:
./pwebstats -c conf/pwebstats.conf
The output goes in the directory specified in the config file or on the command line.
- If you want page-specific stats generated, have a look at the file ./conf/pwebstats.pages, which is the configuration file for the page-based part of pwebstats. It is a series of directives, each made up of colon-separated fields in this order:
Perl pattern for the HTML collection (or just the path to an HTML file):Description of the file or collection:Level of indentation (for subsections):URL of the page itself (to create an active link)
Some examples are available.
I’ve made a copy of our local page-config file, and a copy of the pwebstats output for those pages, available so you can get an idea of what can be done with page-based stats.
See http://www.its.unimelb.edu.au/manuals/perl5/perlre.html for details on Perl regular expressions.
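To illustrate the page-config format, here is a minimal Python sketch (not pwebstats' own Perl code) that parses one directive and matches request paths against it. It assumes that the indentation level is an integer and that the pattern and description fields contain no colons. Splitting on at most three colons leaves the colon in an "http://" URL inside the last field. The names parse_page_line, match_pages and PageEntry are hypothetical. Python's re module is used in place of Perl regexes; the two agree on simple patterns like the one below but differ in places.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import List


@dataclass
class PageEntry:
    pattern: re.Pattern[str]  # compiled page/collection pattern
    description: str
    level: int                # indentation level (assumed to be an integer)
    url: str


def parse_page_line(line: str) -> PageEntry:
    # maxsplit=3 keeps any colons in the URL (e.g. "http://...") in the last field.
    # Assumes the pattern and description fields contain no colons themselves.
    pattern, description, level, url = line.rstrip("\n").split(":", 3)
    return PageEntry(re.compile(pattern), description, int(level), url)


def match_pages(entries: List[PageEntry], path: str) -> List[str]:
    """Return descriptions of all entries whose pattern matches the request path."""
    # search() mirrors Perl's unanchored =~ matching; anchor with ^ and $ as needed.
    return [e.description for e in entries if e.pattern.search(path)]


if __name__ == "__main__":
    entry = parse_page_line(r"^/courses/.*\.html$:Course pages:1:http://www.example.edu/courses/")
    print(entry.description, entry.level, entry.url)
    print(match_pages([entry], "/courses/intro.html"))
```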
The Configuration File
The configuration file (./conf/pwebstats.conf) controls the setup information needed for pwebstats to run, and the user-settable limits and variables.
Lines starting with a # are comments and are ignored, as are blank lines.
All other lines are of the form variable:setting (the colon is necessary).
Use full pathnames where pathnames are to be specified (no trailing ‘/’).
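The rules above (comments, blank lines, variable:setting) can be sketched as a small Python reader. This is an illustration only, not how pwebstats itself reads the file. Two behaviours are assumptions: it splits on the first colon only, so a setting may contain colons, and it strips surrounding whitespace from names and values. The function name parse_config is hypothetical.

```python
import io
from typing import Dict, TextIO


def parse_config(stream: TextIO) -> Dict[str, str]:
    """Read 'variable:setting' lines, skipping comments and blank lines."""
    settings: Dict[str, str] = {}
    for lineno, raw in enumerate(stream, start=1):
        line = raw.strip()
        # Blank lines and lines starting with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        if ":" not in line:
            raise ValueError(f"line {lineno}: expected variable:setting, got {line!r}")
        # Split on the first colon only; the setting itself may contain colons.
        name, value = line.split(":", 1)
        settings[name.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    sample = io.StringIO(
        "# site details\n"
        "server:www_main\n"
        "\n"
        "outdir:/usr/local/pwebstats/output\n"
    )
    print(parse_config(sample))
```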
Config Variables
- server
- Unique nickname for server – use only a-z, A-Z and ‘_’.
- Server_header
- Header for index page.
- logfile
- Location of log file (full pathname).
- logtype
- Type of logfile.
Acceptable values are: common (Common Log Format), squid, squid-emulated, ncsa-extended, and netscape-proxy. Defaults to common.
- outdir
- Directory location for the output of pwebstats (full pathname).
- templates
- Directory containing GIF templates (full pathname).
- interval
- Stats collection interval – can be daily, weekly, monthly, quarterly.
- verbose
- Verbose output – progress bar and other details when pwebstats is running (any value = on).
- fly_prog
- Location of ‘fly’ program (full pathname).
- page_config
- Location of page-based stats config file (full pathname).
- host_threshold
- Threshold for inclusion in all hosts list (default = 25).
- item_threshold
- Threshold for inclusion in all requests list (default = 25).
- domain_threshold
- Threshold for inclusion in all domains list (default = 5).
- protocol_threshold
- Threshold for inclusion in all protocols list (default = 25).
- local_patt
- Regular expression for local domain
e.g. local_patt:\.unimelb\.edu\.au$|^128\.250|\.mu\.oz\.au$
- exclude
- Regular expression for items to exclude from display in the request stats (they are still counted in the totals).
- complete_exclude_host
- Completely ignore accesses from this set of hostnames (| is the delimiter).
e.g. complete_exclude_host:foo1.users.bar.com|foo2.users.bar.com|foo3.users.bar.com
- complete_exclude_url_patt
- Completely ignore accesses to URLs matching this regular expression.
e.g. complete_exclude_url_patt:^/foo/bar/.*$|^/robots\.txt$
- complete_exclude_user
- Completely ignore accesses from this set of users (| is the delimiter).
e.g. complete_exclude_user:tom|dick|harry
- dns_lookup
- Convert IP numbers in the hostname field to fully-qualified domain names (any value = on).
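The following Python sketch shows one way the complete_exclude_* and local_patt settings could be applied to each log record. It is not pwebstats' Perl code. It assumes the host and user lists are literal names separated by '|', while complete_exclude_url_patt and local_patt are regular expressions searched anywhere in the field (as Perl's =~ does). The class name RequestFilter is hypothetical.

```python
import re
from typing import Dict, Optional, Set


class RequestFilter:
    """Hypothetical filter built from the complete_exclude_* and local_patt settings."""

    def __init__(self, settings: Dict[str, str]) -> None:
        # '|'-delimited lists are treated as sets of literal names (an assumption).
        self.excluded_hosts = self._split(settings.get("complete_exclude_host"))
        self.excluded_users = self._split(settings.get("complete_exclude_user"))
        url_patt = settings.get("complete_exclude_url_patt")
        self.excluded_urls = re.compile(url_patt) if url_patt else None
        local = settings.get("local_patt")
        self.local = re.compile(local) if local else None

    @staticmethod
    def _split(value: Optional[str]) -> Set[str]:
        return {item for item in value.split("|") if item} if value else set()

    def ignore(self, host: str, user: str, url: str) -> bool:
        """True if the access should be dropped completely (not even counted)."""
        if host in self.excluded_hosts or user in self.excluded_users:
            return True
        return bool(self.excluded_urls and self.excluded_urls.search(url))

    def is_local(self, host: str) -> bool:
        """True if the host matches the local-domain pattern."""
        return bool(self.local and self.local.search(host))
```

With local_patt set to the example above, a host such as www.its.unimelb.edu.au or 128.250.1.2 would be classed as local, and www.example.com would not.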
An example config file is available.
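The dns_lookup option converts IP numbers to hostnames. Reverse lookups are slow, so any implementation would likely cache them. The Python sketch below illustrates this under assumptions of our own, not pwebstats' actual behaviour. It only resolves fields that parse as IP addresses, keeps the IP number when the lookup fails, and caches each result. The resolver defaults to socket.gethostbyaddr but can be swapped out, which allows testing without network access. The class name HostNameCache is hypothetical.

```python
import ipaddress
import socket
from typing import Callable, Dict, List, Tuple

# Same shape as socket.gethostbyaddr: (hostname, aliaslist, ipaddrlist).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


class HostNameCache:
    def __init__(self, resolver: Resolver = socket.gethostbyaddr) -> None:
        self._resolver = resolver
        self._cache: Dict[str, str] = {}

    def name_for(self, host: str) -> str:
        try:
            ipaddress.ip_address(host)
        except ValueError:
            return host  # already a hostname; nothing to convert
        if host not in self._cache:
            try:
                self._cache[host] = self._resolver(host)[0]
            except OSError:  # socket.herror and socket.gaierror subclass OSError
                self._cache[host] = host  # keep the IP number if lookup fails
        return self._cache[host]
```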
Additionally, in a configuration file for a proxy server, the following directives are applicable:
- remote_host_threshold
- Threshold for inclusion in all remote hosts list (default = 25).
- exclude_reqs
- Exclude requests/accesses array – saves time and a lot of memory! (any value = on)
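This Python sketch shows how a *_threshold setting and the exclude pattern differ, as described above. A threshold limits which items appear in a list. An excluded item is hidden from display but still counted in the total. The comparison count >= threshold is an assumption, since the text does not say whether an item exactly at the threshold is included. The function name report_rows is hypothetical.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple


def report_rows(counts: Counter, threshold: int,
                exclude: Optional[str] = None) -> Tuple[int, List[Tuple[str, int]]]:
    """Return (total, rows): total counts everything, rows holds only displayable items."""
    total = sum(counts.values())  # excluded items still count towards the total
    pattern = re.compile(exclude) if exclude else None
    rows = [
        (item, n)
        for item, n in counts.most_common()  # busiest items first
        if n >= threshold and not (pattern and pattern.search(item))
    ]
    return total, rows


if __name__ == "__main__":
    hits = Counter({"/index.html": 40, "/logo.gif": 30, "/rare.html": 3})
    print(report_rows(hits, 25, r"\.gif$"))
```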
Auxiliary programs
The following extra programs and scripts are included in the pwebstats distribution, in the utilities directory.
- log-splitter.pl
- This will split an existing log file into weekly or monthly files for input to pwebstats. Type ./log-splitter.pl for usage information.
- rotatelogs.sh
- Handy utility for rolling over logfiles, restarting the server and general cleaning-up.
- ns-proxy-splitter.pl
- This will split a Netscape Proxy extended log file into CERN-style proxy and cache logs.
- run-up.sh
- Simple shell script to feed all your old weekly/monthly logs into pwebstats. If you just have one big logfile, run it through log-splitter.pl first.
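To show what splitting a log by period involves, here is a Python sketch. It is not log-splitter.pl, whose exact behaviour and options are documented by its own usage message. It reads the date from a Common Log Format timestamp such as [10/Oct/2000:13:55:36 -0700] and groups lines by day, month or week. It uses the logged local date and ignores the timezone offset. Weeks are ISO weeks starting on Monday, which is an assumption; log-splitter.pl may use different week boundaries. The names period_key and split_log are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List

# Day, month abbreviation and year from a CLF timestamp like [10/Oct/2000:13:55:36 -0700].
CLF_TIME = re.compile(r"\[(\d{2})/(\w{3})/(\d{4}):")
MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def period_key(line: str, interval: str) -> str:
    m = CLF_TIME.search(line)
    if not m:
        raise ValueError(f"no CLF timestamp in {line!r}")
    day, month, year = int(m.group(1)), MONTHS[m.group(2)], int(m.group(3))
    if interval == "daily":
        return f"{year:04d}-{month:02d}-{day:02d}"
    if interval == "monthly":
        return f"{year:04d}-{month:02d}"
    if interval == "weekly":
        # ISO weeks can straddle a year boundary, so use the ISO year too.
        iso_year, iso_week, _ = datetime(year, month, day).isocalendar()
        return f"{iso_year:04d}-W{iso_week:02d}"
    raise ValueError(f"unsupported interval {interval!r}")


def split_log(lines: Iterable[str], interval: str) -> Dict[str, List[str]]:
    """Group log lines by period; each group would become one input file."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        groups[period_key(line, interval)].append(line)
    return dict(groups)
```

Using the ISO year for weekly keys means that 30 Dec 2003 and 1 Jan 2004 land in the same week file (2004-W01), not in two partial weeks.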