Ruby vulnerability: heap corruption in string.c tr_trans() due to undersized buffer

Response by Ruby team: “severe but usual bug, not a vulnerability.”
Fixed in

Configure Ruby with AddressSanitizer (ASan):

mkdir install; CFLAGS="-fsanitize=address" ./configure \
  --disable-install-doc --disable-install-rdoc --disable-install-capi \
  --prefix=`realpath ./install` && make -j4 && make install

Then execute:

$ ./ruby -e '"a".encode("utf-32").tr("b".encode("utf-32"),
==17122==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x602000014a98 at pc 0x7ff04065cf01 bp 0x7ffdfe7629b0 sp 0x7ffdfe7629a8
WRITE of size 4 at 0x602000014a98 thread T0

The actual corruption occurs here (string.c, line 6196):

TERM_FILL(t, rb_enc_mbminlen(enc));
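A generic sketch of this bug class, assuming that the terminator written at t is rb_enc_mbminlen(enc) bytes wide (4 bytes here, consistent with the "WRITE of size 4" in the ASan report). If the buffer was sized for fewer terminator bytes than that, the trailing zero bytes land outside the allocation. term_fill_overrun() is a hypothetical helper, not code from string.c, and it says nothing about how tr_trans() actually sizes its buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: models a terminator write of `termlen` zero bytes
 * at offset `len` into a buffer of `capacity` bytes, and returns how many
 * of those bytes would land past the end of the allocation (0 = in bounds). */
size_t term_fill_overrun(size_t capacity, size_t len, size_t termlen)
{
    assert(termlen >= 1);            /* a terminator is at least one byte */
    size_t end = len + termlen;      /* one past the last byte written */
    return end > capacity ? end - capacity : 0;
}
```

For example, a buffer sized for len + 1 bytes is overrun by 3 bytes when termlen is 4, while a buffer sized for len + 4 bytes is not overrun at all.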

Ruby vulnerability: heap corruption in DateTime.strftime() on 32 bit for certain format strings

Response by Ruby team: “severe but usual bug, not a vulnerability.”
Fixed in

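When a very high precision is passed to the date_strftime_with_tmx()
function, the following check (in the STRFTIME macro in date_strftime.c)
does not work as expected if s >= 0x80000000: on a 32 bit system,
s + precision can wrap around past 0xFFFFFFFF to a small address, so the
comparison no longer detects that the output would exceed the buffer.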

        if (start + maxsize < s + precision) {          \
            errno = ERANGE;                 \
            return 0;                       \
        }
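To make the wraparound concrete, here is a small C model of the check above (a sketch, not code from date_strftime.c). 32 bit addresses are held in uint32_t so that additions wrap modulo 2^32, and the function name strftime_check_rejects() is hypothetical. Strictly speaking, forming an out-of-range pointer is undefined behaviour in C; the model only shows what the compiled comparison typically ends up computing on a 32 bit system.

```c
#include <assert.h>
#include <stdint.h>

/* Models `if (start + maxsize < s + precision)` with 32-bit addresses held
 * in uint32_t, so both sums wrap modulo 2^32. Returns 1 when the check
 * fires (errno = ERANGE), 0 when it lets the write through. */
int strftime_check_rejects(uint32_t start, uint32_t maxsize,
                           uint32_t s, uint32_t precision)
{
    assert(s >= start);               /* s points into the output buffer */
    uint32_t buf_end = start + maxsize;
    uint32_t needed  = s + precision; /* wraps once s + precision > 0xFFFFFFFF */
    return buf_end < needed;
}
```

With a 1024 byte buffer at 0x08000000, a precision of 2147483647 (0x7FFFFFFF) is rejected and the function returns 1. With the same buffer at 0x90000000, the sum wraps to 0x0FFFFFFF, which is smaller than the buffer end, so the check lets the write through and the function returns 0.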

This code causes a crash on my 32 bit system:

require 'date'"%2147483647c")

64 bit is probably not affected (technically possible, but

Ruby vulnerability: StringIO strio_getline() may divulge arbitrary process memory

Originally reported privately to Ruby on 4 Jun 2016
Testing was done on Ruby 2.3.1 in a 32 bit VM
Ruby has expressly allowed me to talk publicly about this issue while a fix is being prepared

The problem is this line in ext/stringio/stringio.c strio_getline():

1002     if (limit > 0 && s + limit < e) {
1003     e = rb_enc_right_char_head(s, s + limit, e, get_enc(ptr));
1004     }

This works as intended as long as the sum of s (pointer) and limit
(long) doesn’t overflow. So if on a 32 bit system ‘s’ happens to be
0xBF000000, and limit is 0x7FFFFFFF, the sum of both values is
0x3EFFFFFF, which is a completely unrelated address. Because 0x3EFFFFFF
is smaller than ‘e’, the check on line 1002 passes and ‘e’ is set via
rb_enc_right_char_head() to this unrelated address. From there, there
are several paths to be chosen from, depending on the first parameter
to the function (‘str’).

  1005      if (NIL_P(str)) {
  1008      else if ((n = RSTRING_LEN(str)) == 0) {
  1024      else if (n == 1) {
  1030      else {
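Before following these paths, the arithmetic on line 1002 can be modelled in C with 32 bit addresses held in uint32_t. This is a sketch with hypothetical names (model_limit_check, getline_limit_result), and it assumes a single-byte encoding, where rb_enc_right_char_head() returns s + limit unchanged. It shows that after the wrap, e - s computed modulo 2^32 is still the huge value limit.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit address model of the check on line 1002:
 *     if (limit > 0 && s + limit < e)
 * Addresses are uint32_t, so s + limit wraps modulo 2^32. */
typedef struct {
    int      taken;  /* did the check pass? */
    uint32_t new_e;  /* value e is moved to (single-byte encoding assumed,
                        so the char head is s + limit itself) */
    uint32_t span;   /* e - s afterwards, as seen by the later code */
} getline_limit_result;

getline_limit_result model_limit_check(uint32_t s, uint32_t e, int32_t limit)
{
    assert(s <= e);                       /* s..e is the unread part */
    getline_limit_result r = { 0, e, e - s };
    uint32_t end = s + (uint32_t)limit;   /* wraps for large s + limit */
    if (limit > 0 && end < e) {
        r.taken = 1;
        r.new_e = end;
        r.span  = end - s;                /* modulo 2^32: equals limit */
    }
    return r;
}
```

For s = 0xBF000000 and limit = 0x7FFFFFFF the model reports that the check is taken, that e becomes 0x3EFFFFFF, and that e - s is still 0x7FFFFFFF, which is why the later n < e - s test on line 1031 succeeds. The same happens for s = 0xFFFF0000 with a limit of only 0x10000, where e wraps to 0x00000000.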

All these paths eventually call strio_substr(). A wrong ‘pos’ parameter
to this function is not possible because it was checked earlier:

   996      if (ptr->pos >= (n = RSTRING_LEN(ptr->string))) {
   997      return Qnil;
   998      }

A wrong len parameter, on the other hand, doesn’t matter, because
strio_substr() corrects it itself:

    98  static VALUE
    99  strio_substr(struct StringIO *ptr, long pos, long len)
   100  {
   101      VALUE str = ptr->string;
   102      rb_encoding *enc = get_enc(ptr);
   103      long rlen = RSTRING_LEN(str) - pos;
   105      if (len > rlen) len = rlen;
   106      if (len < 0) len = 0;
   107      if (len == 0) return rb_str_new(0,0);
   108      return rb_enc_str_new(RSTRING_PTR(str)+pos, len, enc);
   109  }
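The clamping in strio_substr() can be modelled as a small pure function. substr_copy_len() is a hypothetical helper mirroring lines 103 to 107 above, under the assumption that pos is non-negative. It shows that whatever len the caller passes, the copied range stays inside the string.

```c
#include <assert.h>

/* Model of the clamping in strio_substr() (lines 103-107): returns how
 * many bytes are actually copied for a given string length, pos and
 * requested len. pos < string_len is guaranteed by lines 996-998;
 * pos >= 0 is an assumption of this sketch. */
long substr_copy_len(long string_len, long pos, long len)
{
    assert(pos >= 0 && pos < string_len);
    long rlen = string_len - pos;    /* bytes available after pos */
    if (len > rlen) len = rlen;      /* too long: cut at end of string */
    if (len < 0) len = 0;            /* negative: copy nothing */
    return len;
}
```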

As for the first path (str is nil, line 1005), it will call
strio_substr() with an invalid len value, which doesn’t matter because
strio_substr() corrects it:

  1005      if (NIL_P(str)) {
  1006      str = strio_substr(ptr, ptr->pos, e - s);
  1007      }

Within the second path (str is an empty string, line 1008), there is
the risk of an OOB read, because this routine’s logic assumes that
‘e’ denotes the end of the buffer. In practice ‘p’ will never become
‘e’: before that happens, either 1) a null pointer dereference occurs
(once it reads at address 0x00000000), or 2) p reaches an invalid
memory page without having found a \n character. In theory an attacker
could use this mishap to find the \n character at various places in
memory (by adjusting the ‘limit’ variable), but that is usually not
very useful. (How an attacker can learn the address at which the \n
character is found will become clear later.)

  1009      p = s;
  1010      while (*p == '\n') {
  1011          if (++p == e) {
  1012          return Qnil;
  1013          }
  1014      }
  1015      s = p;
  1016      while ((p = memchr(p, '\n', e - p)) && (p != e)) {
  1017          if (*++p == '\n') {
  1018          e = p + 1;
  1019          break;
  1020          }
  1021      }
  1022      str = strio_substr(ptr, s - RSTRING_PTR(ptr->string), e - s);

The third path (str is 1 byte long, line 1024) is similar to the
second path, except that memchr() searches for the single character in str:

  1025      if ((p = memchr(s, RSTRING_PTR(str)[0], e - s)) != 0) {
  1026          e = p + 1;
  1027      }
  1028      str = strio_substr(ptr, ptr->pos, e - s);

The fourth path is entered if str is 2 or more bytes long (line
1030). The first condition is always true if a very high ‘limit’ value
is chosen (the premise of this vulnerability):

  1031      if (n < e - s) {

The first subpath is never taken in this case, since e - s is far larger than 1024:

  1032          if (e - s < 1024) {

So the second subpath is entered. This can be used to search for an
arbitrary string str across the entire virtual address space:

  1040          else {
  1041          long skip[1 << CHAR_BIT], pos;
  1042          p = RSTRING_PTR(str);
  1043          bm_init_skip(skip, p, n);
  1044          if ((pos = bm_search(p, n, s, e - s, skip)) >= 0) {
  1045              e = s + pos + n;
  1046          }
  1047          }

After any of these paths has been traversed, the attacker can read the
pos attribute to get the relative location of the string that has been
found somewhere in memory:

  1051      ptr->pos = e - RSTRING_PTR(ptr->string);

By subtracting this current pos from the previous pos, the attacker
can learn the position of the string that was searched for relative to
the base string.
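A small model of line 1051, again with 32 bit addresses held in uint32_t: pos is the modular difference between e and the base address, so the difference between two pos values depends only on the two e values, and the unknown base address cancels out. model_pos() and pos_delta() are hypothetical helpers; converting an out-of-range uint32_t to int32_t is implementation-defined in C, and the sketch assumes the usual two's complement wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit model of line 1051: ptr->pos = e - RSTRING_PTR(ptr->string).
 * `base` is the address of the underlying string, which the attacker
 * never sees; only the resulting pos value is observable. The cast to
 * int32_t assumes two's complement wraparound for values > INT32_MAX. */
int32_t model_pos(uint32_t base, uint32_t e)
{
    assert(base != 0);               /* a real string buffer is never NULL */
    return (int32_t)(e - base);      /* modular difference, as on 32 bit */
}

/* Difference a - b of two observed pos values, again modulo 2^32.
 * Since both are relative to the same base, base cancels out. */
int32_t pos_delta(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a - (uint32_t)b);
}
```

Note that in this model a match lying below the base string shows up as a negative pos.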

My hypothesis is that if the attacker can control the ‘limit’ variable
as well as the string that is searched for, and can invoke
strio_getline() an arbitrary number of times, they can make Ruby
divulge arbitrary information such as private keys (if they are loaded
in memory): search for BEGIN PGP PRIVATE KEY BLOCK, then adjust the
limit parameter in combination with all alphanumeric characters to
deduce the entire base64-encoded private key.
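The hypothesised key recovery amounts to growing a known prefix one character at a time. The toy model below is an illustration only and makes strong simplifying assumptions: find_offset() stands in for the StringIO primitive (in the scenario above the offset would come from pos, and ‘limit’ would bound the search), it runs over a local buffer instead of process memory, and recover_after_marker() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the StringIO primitive (hypothetical): offset of the
 * first occurrence of needle in the simulated memory image, or -1. */
static long find_offset(const char *mem, long mem_len,
                        const char *needle, long n)
{
    for (long i = 0; i + n <= mem_len; i++)
        if (memcmp(mem + i, needle, (size_t)n) == 0)
            return i;
    return -1;
}

/* Grows the text that follows `marker` one character at a time: a
 * candidate from `alphabet` is kept if marker + recovered + candidate
 * is still found at the marker's own offset. Returns the number of
 * characters recovered (written NUL-terminated to out), or -1 if the
 * marker is not present at all. */
long recover_after_marker(const char *mem, long mem_len, const char *marker,
                          const char *alphabet, char *out, long out_cap)
{
    char probe[256];
    long mlen = (long)strlen(marker);
    assert(mlen > 0 && mlen < (long)sizeof probe && out_cap > 0);

    long anchor = find_offset(mem, mem_len, marker, mlen);
    if (anchor < 0)
        return -1;
    memcpy(probe, marker, (size_t)mlen);

    long got = 0;
    while (got + 1 < out_cap && mlen + got + 1 < (long)sizeof probe) {
        const char *c;
        for (c = alphabet; *c != '\0'; c++) {
            probe[mlen + got] = *c;            /* try the next candidate */
            if (find_offset(mem, mem_len, probe, mlen + got + 1) == anchor)
                break;
        }
        if (*c == '\0')
            break;                             /* no candidate extends it */
        out[got++] = *c;
    }
    out[got] = '\0';
    return got;
}
```

Given a buffer containing "BEGIN KEY:abc123\n", the marker "BEGIN KEY:" and a lowercase alphanumeric alphabet, it recovers "abc123" and stops at the newline, which is not in the alphabet.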

Note that a pointer address can naturally be very high (on 32 bit
anyway), such as 0xFFFF0000. In that event, a limit of 0x10000 is
enough to overflow this computation, since 0xFFFF0000 + 0x10000 wraps
around to 0x00000000:

1002     if (limit > 0 && s + limit < e) {

Here is code that can be used to trigger the vulnerability.

require "stringio"
s =
x = s.gets('xxx', 0x7FFFFFF0)

The vulnerability is more likely to trigger on 32 bit than on 64 bit,
since on 32 bit the base string is more likely to be allocated in the
upper half of the virtual address space (0x80000000 or above, like
0xBF000000 in my initial example) than on 64 bit (where it would need
to be allocated at 0x8000000000000000 or above). I did all of my
testing on 32 bit.