Full disclosure: heap overflow in H. Spencer’s regex library on 32 bit systems

Introduction

The following document describes a heap overflow vulnerability in Henry Spencer’s regex library, affecting 32 bit systems only. This library, or variations on and derivations of it, is used in such software as:

PHP
LLVM
MySQL server
Bionic libc

It also appears in various *BSD libc implementations, such as:

FreeBSD
NetBSD

The applications above are listed merely to show that they include the library. I have NOT tested any of them, so I cannot guarantee that they are vulnerable. The list is meant to show how widely the library has been disseminated. The vulnerability MAY therefore be exploitable outside of “laboratory setting” cases, and the danger it poses MAY extend deep into software stacks.

Exploiting the vulnerability requires significant control over the input to one of the library’s functions (the pattern passed to regcomp). It is unlikely to be triggered in a general programming context, because it requires a pattern string of ~683 megabytes. However, allocations of that size are certainly feasible in some contexts. A further limit on the feasibility of an attack is that the attacker controls the data written past the end of the heap buffer only to a certain extent. Memory cannot be modified arbitrarily.

Technical description

The source code excerpts that follow are taken from https://codeload.github.com/garyhouston/rxspencer/tar.gz/alpha3.8.g5 (as referenced on http://www.arglist.com/regex/).

The vulnerability lies in the regcomp function:

int /* 0 success, otherwise REG_something */
regcomp(preg, pattern, cflags)
regex_t *preg;
const char *pattern;
int cflags;
{

This function compiles the regular expression that ‘const char *pattern’ supplies in string form.

The vulnerable code:

len = strlen((char *)pattern);
...
...
p->ssize = len/(size_t)2*(size_t)3 + (size_t)1; /* ugh */
p->strip = (sop *)malloc(p->ssize * sizeof(sop));

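When ‘len’ is large enough, the size computation above grows beyond what a 32 bit size_t can hold. That computation halves the value, multiplies it by 3, adds 1, and finally multiplies by sizeof(sop), which is 4 here. The result wraps around, so malloc receives a far smaller size than the library intends.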

Formally, the smallest value of ‘len’ that causes an overflow is:

(0x100000000 / 4 - 1) / 3 * 2 = 0x2AAAAAAA

Substituting this value back into the size computation:

(0x2AAAAAAA / 2 * 3 + 1) * 4 = 0x100000000

This value is too large for a 32 bit register to hold, so it wraps around to:

0x100000000 & 0xFFFFFFFF = 0x00000000

The smallest ‘len’ value that passes a non-zero size to malloc is 0x2AAAAAAC. A value of 0x2AAAAAAB still yields 0, because len/2 truncates to the same 0x15555555:

((0x2AAAAAAC / 2 * 3 + 1) * 4) & 0xFFFFFFFF = 0x0000000C

This corresponds to a pattern of 0x2AAAAAAC = 715,827,884 bytes, or about 683 megabytes (0x2AAAAAAC / 1024 / 1024 ≈ 682.7).

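‘p->ssize’ itself, however, does not overflow. It holds the number of elements the library believes malloc allocated. For len = 0x2AAAAAAC this is 0x2AAAAAAC / 2 * 3 + 1 = 0x40000003, whereas the 12 bytes actually allocated hold only 3 four-byte elements. ‘p->ssize’ is therefore an unreliable indicator of the size of the allocated buffer, yet it is the value the library checks before writing to the strip: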

/* deal with undersized strip */
if (p->slen >= p->ssize)
    enlarge(p, (p->ssize+1) / 2 * 3); /* +50% */

I discovered this vulnerability only recently, so my research into its actual exploitability has been limited. At present I am mainly concerned with pointing it out rather than exploiting it. That said, the heap memory that p->strip points to is written mainly by the doemit function:

doemit(p, op, opnd)
register struct parse *p;
sop op;
size_t opnd;
{
    /* avoid making error situations worse */
    if (p->error != 0)
        return;

    /* deal with oversize operands ("can't happen", more or less) */
    assert(opnd < 1<<OPSHIFT);

    /* deal with undersized strip */
    if (p->slen >= p->ssize)
        enlarge(p, (p->ssize+1) / 2 * 3); /* +50% */
    assert(p->slen < p->ssize);

    /* finally, it's all reduced to the easy case */
    p->strip[p->slen++] = SOP(op, opnd);
}

A simple grep for invocations of doemit() (via the EMIT macro) in regcomp.c:

#define EMIT(op, sopnd) doemit(p, (sop)(op), (size_t)(sopnd))
EMIT(OEND, 0);
EMIT(OEND, 0);
EMIT(OOR2, 0); /* offset is very wrong */
EMIT(OLPAREN, subno);
EMIT(ORPAREN, subno);
EMIT(OBOL, 0);
EMIT(OEOL, 0);
EMIT(OANY, 0);
EMIT(OOR2, 0); /* offset very wrong... */
EMIT(OBOL, 0);
EMIT(OEOL, 0);
EMIT(OANY, 0);
EMIT(OLPAREN, subno);
EMIT(ORPAREN, subno);
EMIT(OBACK_, i);
EMIT(O_BACK, i);
EMIT(OBOW, 0);
EMIT(OEOW, 0);
EMIT(OANYOF, freezeset(p, cs));
EMIT(OCHAR, (unsigned char)ch);
EMIT(OOR2, 0);
EMIT(OOR2, 0); /* offset very wrong... */
EMIT(op, opnd); /* do checks, ensure space */

where (regex2.h):

#define OPSHIFT (26)
#define SOP(op, opnd) ((op)|(opnd))
#define OEND (1<<OPSHIFT) /* endmarker - */
#define OCHAR (2<<OPSHIFT) /* character unsigned char */
#define OBOL (3<<OPSHIFT) /* left anchor - */
#define OEOL (4<<OPSHIFT) /* right anchor - */
#define OANY (5<<OPSHIFT) /* . - */
#define OANYOF (6<<OPSHIFT) /* [...] set number */
#define OBACK_ (7<<OPSHIFT) /* begin d paren number */
#define O_BACK (8<<OPSHIFT) /* end d paren number */
#define OPLUS_ (9<<OPSHIFT) /* + prefix fwd to suffix */
#define O_PLUS (10<<OPSHIFT) /* + suffix back to prefix */
#define OQUEST_ (11<<OPSHIFT) /* ? prefix fwd to suffix */
#define O_QUEST (12<<OPSHIFT) /* ? suffix back to prefix */
#define OLPAREN (13<<OPSHIFT) /* ( fwd to ) */
#define ORPAREN (14<<OPSHIFT) /* ) back to ( */
#define OCH_ (15<<OPSHIFT) /* begin choice fwd to OOR2 */
#define OOR1 (16<<OPSHIFT) /* | pt. 1 back to OOR1 or OCH_ */
#define OOR2 (17<<OPSHIFT) /* | pt. 2 fwd to OOR2 or O_CH */
#define O_CH (18<<OPSHIFT) /* end choice back to OOR1 */
#define OBOW (19<<OPSHIFT) /* begin word - */
#define OEOW (20<<OPSHIFT) /* end word - */

doemit ORs the two arguments of EMIT together (SOP(op, opnd)) and writes the result to p->strip. As a result, someone exploiting this has only limited control over which values are written. The upper bits of each word always hold one of the opcodes listed above, and only the operand in the lower bits varies.
