README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------

The latest release of PCRE is always available from

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz

Please read the NEWS file if you are upgrading from a previous release.

PCRE has its own native API, but a set of "wrapper" functions that are based on
the POSIX API are also supplied in the library libpcreposix. Note that this
just provides a POSIX calling interface to PCRE: the regular expressions
themselves still follow Perl syntax and semantics. The header file
for the POSIX-style functions is called pcreposix.h. The official POSIX name is
regex.h, but I didn't want to risk possible problems with existing files of
that name by distributing it that way. To use it with an existing program that
uses the POSIX API, it will have to be renamed or pointed at by a link.

If you are using the POSIX interface to PCRE and there is already a POSIX regex
library installed on your system, you must take care when linking programs to
ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
up the "real" POSIX functions of the same name.
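
As a rough sketch of what such a program looks like, the following C fragment
uses only the POSIX calls regcomp(), regexec() and regfree(). It is written
against the system's regex.h so that it compiles anywhere. The assumption is
that, to run it on PCRE instead, you would include pcreposix.h in place of
regex.h and link with libpcreposix. The helper name find_first() is made up for
illustration, and whether every POSIX flag is accepted by pcreposix.h should be
checked against that header.

```c
#include <regex.h>   /* assumption: replace with "pcreposix.h" to use PCRE */
#include <stddef.h>

/*
 * Hypothetical helper: find the first match of pattern in subject.
 * Returns 0 on a match and stores the offsets in *start and *end (end is
 * one past the last matched character). Returns -1 if the pattern does
 * not compile, and 1 if there is no match.
 */
int find_first(const char *pattern, const char *subject, int *start, int *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, subject, 1, m, 0);
    if (rc == 0) {
        *start = (int)m[0].rm_so;
        *end = (int)m[0].rm_eo;
    }
    regfree(&re);
    return rc == 0 ? 0 : 1;
}
```

A pattern such as "[0-9]+" means the same thing in POSIX extended syntax and in
Perl syntax, so it is a safe first test when switching between the two
libraries.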


Contributions by users of PCRE
------------------------------

You can find contributions from PCRE users in the directory

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib

where there is also a README file giving brief descriptions of what they are.
Several of them provide support for compiling PCRE on various flavours of
Windows systems (I myself do not use Windows). Some are complete in themselves;
others are pointers to URLs containing relevant files.


Building PCRE on a Unix-like system
-----------------------------------

To build PCRE on a Unix-like system, first run the "configure" command from the
PCRE distribution directory, with your current directory set to the directory
where you want the files to be created. This command is a standard GNU
"autoconf" configuration script, for which generic instructions are supplied in
INSTALL.

Most commonly, people build PCRE within its own distribution directory. In this
case, on many systems, just running "./configure" is sufficient, but the usual
methods of changing standard defaults are available. For example,

  CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

specifies that the C compiler should be run with the flags '-O2 -Wall' instead
of the default, and that "make install" should install PCRE under /opt/local
instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE source
into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:

  cd /build/pcre/pcre-xxx
  /source/pcre/pcre-xxx/configure

There are some optional features that can be included or omitted from the PCRE
library. You can read more about them in the pcrebuild man page.

. If you want to make use of the support for UTF-8 character strings in PCRE,
  you must add --enable-utf8 to the "configure" command. Without it, the code
  for handling UTF-8 is not included in the library. (Even when included, it
  still has to be enabled by an option at run time.)

. You can build PCRE to recognize CR or NL as the newline character, instead
  of whatever your compiler uses for "\n", by adding --newline-is-cr or
  --newline-is-nl to the "configure" command, respectively. Only do this if you
  really understand what you are doing. On traditional Unix-like systems, the
  newline character is NL.

. When called via the POSIX interface, PCRE uses malloc() to get additional
  storage for processing capturing parentheses if there are more than 10 of
  them. You can increase this threshold by setting, for example,

  --with-posix-malloc-threshold=20

  on the "configure" command.

. PCRE has a counter which can be set to limit the amount of resources it uses.
  If the limit is exceeded during a match, the match fails. The default is ten
  million. You can change the default by setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre_exec() can supply their own value. There is discussion on the pcreapi
  man page.

. The default maximum compiled pattern size is around 64K. You can increase
  this by adding --with-link-size=3 to the "configure" command. You can
  increase it even more by setting --with-link-size=4, but this is unlikely
  ever to be necessary. If you build PCRE with an increased link size, test 2
  (and 5 if you are using UTF-8) will fail. Part of the output of these tests
  is a representation of the compiled pattern, and this changes with the link
  size.

. You can build PCRE so that its match() function does not call itself
  recursively. Instead, it uses blocks of data from the heap via special
  functions pcre_stack_malloc() and pcre_stack_free() to save data that would
  otherwise be saved on the stack. To build PCRE like this, use

  --disable-stack-for-recursion

  on the "configure" command. PCRE runs more slowly in this mode, but it may be
  necessary in environments with limited stack sizes.
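
The link-size option above is easier to picture with a little arithmetic.
Offsets inside a compiled pattern are stored in a fixed number of bytes, so the
largest offset that fits bounds the pattern size. Two bytes, the default, give
65536 values, which is the "around 64K" mentioned above. The sketch below
stores and reads back an offset in a given number of bytes. The function names
are hypothetical and the byte order (most significant byte first) is an
assumption made for illustration; PCRE's own code for this is in internal.h.

```c
#include <stddef.h>

/* Hypothetical helper: number of distinct offsets that fit in link_size
   bytes. Meaningful here for link_size 1 to 4. */
unsigned long long link_offset_range(int link_size)
{
    return 1ULL << (8 * link_size);
}

/* Store offset in link_size bytes starting at code[pos], most significant
   byte first. Bits that do not fit are silently lost. */
void put_link(unsigned char *code, size_t pos, int link_size,
              unsigned long offset)
{
    int i;
    for (i = link_size - 1; i >= 0; i--) {
        code[pos + i] = (unsigned char)(offset & 0xff);
        offset >>= 8;
    }
}

/* Read back an offset stored by put_link(). */
unsigned long get_link(const unsigned char *code, size_t pos, int link_size)
{
    unsigned long offset = 0;
    int i;
    for (i = 0; i < link_size; i++)
        offset = (offset << 8) | code[pos + i];
    return offset;
}
```

An offset of 70000 does not survive a round trip through two bytes, but it does
through three, which is the situation that --with-link-size=3 addresses.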

The "configure" script builds five files:

. libtool is a script that builds shared and/or static libraries.
. Makefile is built by copying Makefile.in and making substitutions.
. config.h is built by copying config.in and making substitutions.
. pcre-config is built by copying pcre-config.in and making substitutions.
. RunTest is a script for running tests.

Once "configure" has run, you can run "make". It builds two libraries called
libpcre and libpcreposix, a test program called pcretest, and the pcregrep
command. You can use "make install" to copy these, the public header files
pcre.h and pcreposix.h, and the man pages to appropriate live directories on
your system, in the normal way.

Running "make install" also installs the command pcre-config, which can be used
to recall information about the PCRE configuration and installation. For
example,

  pcre-config --version

prints the version number, and

  pcre-config --libs

outputs information about where the library is installed. This command can be
included in makefiles for programs that use PCRE, saving the programmer from
having to remember too many details.


Shared libraries on Unix-like systems
-------------------------------------

The default distribution builds PCRE as two shared libraries and two static
libraries, as long as the operating system supports shared libraries. Shared
library support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcretest and pcregrep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcregrep and pcretest are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the source directory still
use the uninstalled libraries.

To build PCRE using static libraries only, you must use --disable-shared when
configuring it. For example:

  ./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Cross-compiling on a Unix-like system
-------------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE for some other host. However, during the building
process, the dftables.c source file is compiled *and run* on the local host, in
order to generate the default character tables (the chartables.c file). It
therefore needs to be compiled with the local compiler, not the cross compiler.
You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD)
when calling the "configure" command. If they are not specified, they default
to the values of CC and CFLAGS.


Building on non-Unix systems
----------------------------

For a non-Unix system, read the comments in the file NON-UNIX-USE, though if
the system supports the use of "configure" and "make" you may be able to build
PCRE in the same way as for Unix systems.

PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
the details because I don't use those systems. It should be straightforward to
build PCRE on any system that has a Standard C compiler, because it uses only
Standard C functions.


Testing PCRE
------------

To test PCRE on a Unix system, run the RunTest script that is created by the
configuring process. (This can also be run by "make runtest", "make check", or
"make test".) For other systems, see the instructions in NON-UNIX-USE.

The script runs the pcretest test program (which is documented in its own man
page) on each of the testinput files (in the testdata directory) in turn,
and compares the output with the contents of the corresponding testoutput file.
A file called testtry is used to hold the output from pcretest. To run pcretest
on just one of the test files, give its number as an argument to RunTest, for
example:

  RunTest 2

The first file can also be fed directly into the perltest script to check that
Perl gives the same results. The only difference you should see is in the first
few lines, where the Perl version is given instead of the PCRE version.

The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
wrapper API. It also uses the debugging flag to check some of the internals of
pcre_compile().

If you build PCRE with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. Where the comparison test output contains [\x00-\x7f] the
test will contain [\x00-\xff], and similarly in some other cases. This is not a
bug in PCRE.

The third set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The tests make use of the "fr_FR" (French) locale. Before
running the test, the script checks for the presence of this locale by running
the "locale" command. If that command fails, or if it doesn't include "fr_FR"
in the list of available locales, the third test cannot be run, and a comment
is output to say why. If running this test produces instances of the error

  ** Failed to set locale "fr_FR"

in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.

The fourth test checks the UTF-8 support. It is not run automatically unless
PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
running "configure". This file can also be fed directly to the perltest script,
provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
commented in the script, can be used.)

The fifth and final file tests error handling with UTF-8 encoding, and internal
UTF-8 features of PCRE that are not relevant to Perl.


Character tables
----------------

PCRE uses four tables for manipulating and identifying characters. The final
argument of the pcre_compile() function is a pointer to a block of memory
containing the concatenated tables. A call to pcre_maketables() can be used to
generate a set of tables in the current locale. If the final argument for
pcre_compile() is passed as NULL, a set of default tables that is built into
the binary is used.

The source file called chartables.c contains the default set of tables. This is
not supplied in the distribution, but is built by the program dftables
(compiled from dftables.c), which uses the ANSI C character handling functions
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
sources. This means that the default C locale which is set for your system will
control the contents of these default tables. You can change the default tables
by editing chartables.c and then re-building PCRE. If you do this, you should
probably also edit Makefile to ensure that the file doesn't ever get
re-generated.

The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes.
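
A minimal sketch of how tables of this shape can be filled from the C character
functions follows. It mirrors the description above rather than PCRE's actual
source. The struct layout, the names build_tables() and in_map(), and the
choice of bit (c % 8) of byte (c / 8) for character c are all assumptions made
for illustration, as is counting '_' as a "word" character.

```c
#include <ctype.h>
#include <string.h>

/* Layout assumed for this sketch: two 256-byte tables, then three bit maps. */
struct sketch_tables {
    unsigned char lower[256];     /* lower-cased value of each character */
    unsigned char flip[256];      /* each character with its case flipped */
    unsigned char digit_map[32];  /* bit set for each digit */
    unsigned char word_map[32];   /* bit set for each "word" character */
    unsigned char space_map[32];  /* bit set for each white space character */
};

/* Set the bit for character c in a 32-byte (256-bit) map. */
static void set_bit(unsigned char *map, int c)
{
    map[c / 8] |= (unsigned char)(1 << (c % 8));
}

/* Return 1 if the bit for character c is set in a 32-byte map, else 0. */
int in_map(const unsigned char *map, int c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Fill all the tables using the current locale's character functions. */
void build_tables(struct sketch_tables *t)
{
    int c;
    memset(t, 0, sizeof *t);
    for (c = 0; c < 256; c++) {
        t->lower[c] = (unsigned char)tolower(c);
        t->flip[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
        if (isdigit(c)) set_bit(t->digit_map, c);
        if (isalnum(c) || c == '_') set_bit(t->word_map, c);
        if (isspace(c)) set_bit(t->space_map, c);
    }
}
```

Because the loop calls the locale-dependent functions directly, running it
after a call to setlocale() gives different tables, which is the effect the
previous paragraph describes for chartables.c.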

The final 256-byte table has bits indicating various character types, as
follows:

    1   white space character
    2   letter
    4   decimal digit
    8   hexadecimal digit
   16   alphanumeric or '_'
  128   regular expression metacharacter or binary zero

You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE to malfunction.
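
The bit values listed above can be combined per character as in the following
sketch. The constant names, the function name build_type_table(), and in
particular the set of metacharacters are assumptions made for illustration;
consult maketables.c for what PCRE really does.

```c
#include <ctype.h>
#include <string.h>

/* Bit values from the list above; the names are made up for this sketch. */
enum {
    T_SPACE  = 1,
    T_LETTER = 2,
    T_DIGIT  = 4,
    T_XDIGIT = 8,
    T_WORD   = 16,
    T_META   = 128
};

/* Assumed metacharacter set, for illustration only. */
static const char sketch_meta[] = "*+?{^.$|()[";

/* Fill a 256-byte table with the type bits of each character, using the
   current locale's character functions. */
void build_type_table(unsigned char types[256])
{
    int c;
    for (c = 0; c < 256; c++) {
        int x = 0;
        if (isspace(c)) x |= T_SPACE;
        if (isalpha(c)) x |= T_LETTER;
        if (isdigit(c)) x |= T_DIGIT;
        if (isxdigit(c)) x |= T_XDIGIT;
        if (isalnum(c) || c == '_') x |= T_WORD;
        /* Binary zero is flagged together with the metacharacters. */
        if (c == 0 || strchr(sketch_meta, c) != NULL) x |= T_META;
        types[c] = (unsigned char)x;
    }
}
```

In the C locale this gives 'a' the value 2 + 8 + 16 (letter, hexadecimal digit,
and alphanumeric), while '*' and binary zero carry only the 128 bit.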


Manifest
--------

The distribution should contain the following files:

(A) The actual source files of the PCRE library functions and their
    headers:

  dftables.c            auxiliary program for building chartables.c
  get.c                 )
  maketables.c          )
  study.c               ) source of
  pcre.c                )   the functions
  pcreposix.c           )
  printint.c            )
  pcre.in               "source" for the header for the external API; pcre.h
                          is built from this by "configure"
  pcreposix.h           header for the external POSIX wrapper API
  internal.h            header for internal use
  config.in             template for config.h, which is built by configure

(B) Auxiliary files:

  AUTHORS               information about the author of PCRE
  ChangeLog             log of changes to the code
  INSTALL               generic installation instructions
  LICENCE               conditions for the use of PCRE
  COPYING               the same, using GNU's standard name
  Makefile.in           template for Unix Makefile, which is built by configure
  NEWS                  important changes in this release
  NON-UNIX-USE          notes on building PCRE on non-Unix systems
  README                this file
  RunTest.in            template for a Unix shell script for running tests
  config.guess          ) files used by libtool,
  config.sub            )   used only when building a shared library
  configure             a configuring shell script (built by autoconf)
  configure.in          the autoconf input used to build configure
  doc/Tech.Notes        notes on the encoding
  doc/*.3               man page sources for the PCRE functions
  doc/*.1               man page sources for pcregrep and pcretest
  doc/html/*            HTML documentation
  doc/pcre.txt          plain text version of the man pages
  doc/pcretest.txt      plain text documentation of test program
  doc/perltest.txt      plain text documentation of Perl test program
  install-sh            a shell script for installing files
  ltmain.sh             file used to build a libtool script
  pcretest.c            comprehensive test program
  pcredemo.c            simple demonstration of coding calls to PCRE
  perltest              Perl test program
  pcregrep.c            source of a grep utility that uses PCRE
  pcre-config.in        source of script which retains PCRE information
  testdata/testinput1   test data, compatible with Perl
  testdata/testinput2   test data for error messages and non-Perl things
  testdata/testinput3   test data for locale-specific tests
  testdata/testinput4   test data for UTF-8 tests compatible with Perl
  testdata/testinput5   test data for other UTF-8 tests
  testdata/testoutput1  test results corresponding to testinput1
  testdata/testoutput2  test results corresponding to testinput2
  testdata/testoutput3  test results corresponding to testinput3
  testdata/testoutput4  test results corresponding to testinput4
  testdata/testoutput5  test results corresponding to testinput5

(C) Auxiliary files for Win32 DLL

  dll.mk
  pcre.def

(D) Auxiliary file for VPASCAL

  makevp.bat

Philip Hazel <ph10@cam.ac.uk>
December 2003