Location: Hyderabad, India

4/23/2005

Buffer overflow bug explained

What is a buffer overflow and how does it happen? Why does it seem like almost every virus, spyware or hacking program takes advantage of this flaw? How did this one problem get so out of control that it cannot seem to be fixed? For the many of you who don't know the answers to these questions, this article will try to answer them as plainly as possible.

To understand the idea behind the trick called buffer overflow, we must first present some basic rules of program execution and of the hardware a typical computer might be using. For the sake of simplicity we will use a very basic and primitive layout.

Memory

All programs are loaded into, and run from, specified sections of memory. Each individual byte of memory is a single placeholder, similar to a square on a chess board. To keep track of each square's location, the squares are labeled vertically and horizontally, just like the letters and numbers on a chess board. However, instead of the traditional letters for columns and numbers for rows, computer memory uses the same mix of digits and letters for both, and together the rows and columns make up the amount of memory your system handles at one time. To give you a better picture, imagine a chess board of 16x16 squares, or 256 bytes of memory, with both the horizontal and the vertical edge labeled 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F. These sixteen symbols conveniently happen to be the digits of the hexadecimal system which computer processors use today. Each of the 256 chess board squares is a location called an "address".

Value and location

Addresses and values are often confused, because the same characters are also used to represent the data which is placed in memory (on the chess board). If our chess board represents locations, then chess pieces with characters on them are the data values. The confusion arises when a value is stored at an address written with the same characters. For example: 6 stored at address 6, or 2B stored at address 2B.
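The chess board picture can be tried out in a few lines of code. The following is only a minimal sketch in Python, chosen here purely for illustration; the names `memory` and `square_label` are made up for this example and do not come from any real system.

```python
# A toy 16x16 "chess board" memory: 256 one-byte squares.
memory = bytearray(256)  # every square starts out holding the value 0x00


def square_label(address):
    """Return the (row, column) hex digits that name a square on the board."""
    row, column = divmod(address, 16)
    return f"{row:X}", f"{column:X}"


# Store the value 0x2B at the address 0x2B: same characters, different roles.
memory[0x2B] = 0x2B
# Store an unrelated value at address 0x06 to show the two are independent.
memory[0x06] = 0xF0

print(square_label(0x2B))  # row and column of address 0x2B
print(hex(memory[0x2B]))   # the value stored at that address
print(hex(memory[0x06]))   # address 0x06 holds 0xF0, not 6
```

The index inside the square brackets is always the address, and whatever comes back is always the value, even when the two happen to be written with the same characters.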
In any case, it is important to keep in mind the difference between what is stored (the value) and where it is stored (the address location). To keep the two apart, imagine our chess board as a flat 2D surface with a location address written on each square, and the chess pieces as 3D cubes with a data value written on each of the six sides.

Program execution

Program execution is what all hackers and virus writers are after. It is the sequential reading of data stored in memory, which is passed to the central processor and executed one byte and one address at a time. Picture following a horizontal line of 3D data cubes on our chess board and picking up each one in sequence, reading the value on the cube and then the address on the board beneath it. Reading runs from left to right and wraps around to the start of the next line down, just like reading text.

Once execution starts, when you double-click on a program or trigger an event in a running program, it follows a precisely specified path which was paved when the program was written. Once an attacker takes over, a new path can be paved to perform new actions, often unfriendly and damaging.

The execution path can have many branches, which steer program execution in different ways depending on what is and is not happening at a given moment in a task. Because many of those tasks repeat exactly, a function which jumps out of the path can be used to save space, reusing the same path over and over until another condition is met.

The call

The "call" function, which jumps out of sequential program execution to follow another path located somewhere else in memory, is the problematic one here. The lyrics of a song work the same way: the chorus is repeated often but written only once. Once the jump is made and program execution begins in another place, there must be a way to remember where the old path left off.
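The song comparison can be turned into a toy program. This is only a sketch in Python under simplifying assumptions: the instruction names SING, CALL, RETURN and END are invented for the example and are not real processor instructions.

```python
def run(program):
    """Execute a toy program and return the lines it 'sings', in order."""
    output = []
    return_addresses = []  # remembers where to resume after each call
    position = 0           # index of the instruction being executed
    while True:
        operation, argument = program[position]
        if operation == "SING":
            output.append(argument)
            position += 1
        elif operation == "CALL":
            return_addresses.append(position + 1)  # the step right after the call
            position = argument                    # jump to the other path
        elif operation == "RETURN":
            position = return_addresses.pop()      # go back to the saved step
        elif operation == "END":
            return output
        else:
            raise ValueError(f"unknown operation: {operation}")


song = [
    ("SING", "verse 1"),  # 0
    ("CALL", 5),          # 1: jump to the chorus
    ("SING", "verse 2"),  # 2
    ("CALL", 5),          # 3: jump to the chorus again
    ("END", None),        # 4
    ("SING", "chorus"),   # 5: written once, used twice
    ("RETURN", None),     # 6
]

print(run(song))  # ['verse 1', 'chorus', 'verse 2', 'chorus']
```

The list `return_addresses` plays the part of the memory that records where to resume. Without it, the program would have no way of knowing whether the chorus should be followed by verse 2 or by the end of the song.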
It is much like remembering which verse to go back to after the chorus is over. A buffer, often called a "stack", is set aside to store the address location immediately after a "call" function, and that entry is released after the program returns. However, when the "call" function finishes, the actual data values are not erased from the stack; instead, something called a "stack pointer" keeps track of where the next return address is located. Inevitably, the external paths can also lead to other external paths which the programmer might want to reach with another "call" function before the program gets a chance to return. For this reason, the stack buffer is usually made large enough to accept many "call" functions one immediately after another. Unfortunately, as you are about to find out, this buffer is not infinite.

The overflow

As you can already guess, the overflow happens when the stack buffer runs out of room to store all of those address locations for the many "call" functions being executed. In real programs the stack also holds a function's working data, and the classic attack overfills one of those fixed-size data buffers with oversized input rather than with extra calls; the principle of spilling past the reserved space is the same. The attacker, who overflows the stack buffer on purpose, uses a section of the program which allows this multiple "call" mechanism as part of its normal function, and which the programmer and designer of the program have in many ways overlooked. A close look into the workings of a poorly written and un-debugged application can reveal many ways of doing this.

But the overflow by itself will not let the attacker take over program execution and form a new path. Because of space limitations and the constant compaction of code by software engineers, it is often the case that the end of the stack buffer is where another program execution path begins. As the buffer overfills, the value of the next return address eventually lands on top of that path and changes the original code which was written there.
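The spill can be simulated safely on the toy 256-byte memory. This is a sketch of the simplified model used in this article only: the layout constants `STACK_START`, `STACK_SIZE` and `CODE_START` are hypothetical, and real stacks differ in detail (for instance, they commonly grow toward lower addresses).

```python
memory = bytearray(256)

STACK_START = 0x10                     # hypothetical: stack occupies 0x10..0x17
STACK_SIZE = 8                         # room for eight return addresses
CODE_START = STACK_START + STACK_SIZE  # another code path begins at 0x18

# Fill the neighbouring code path with recognisable "original" bytes.
memory[CODE_START:CODE_START + 4] = bytes([0xAA] * 4)

# Ten nested calls, each storing one return address, with NO limit check.
stack_pointer = STACK_START
for return_address in range(0x40, 0x4A):
    memory[stack_pointer] = return_address
    stack_pointer += 1

overwritten = bytes(memory[CODE_START:CODE_START + 4])
print(overwritten.hex())  # 4849aaaa: two code bytes replaced by addresses
```

The ninth and tenth return addresses had nowhere left to go inside the stack, so they landed on the first two bytes of the neighbouring code. Nothing in the loop noticed or complained, which is exactly the problem.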
Obviously, that code path will no longer function the same way, and if execution ever reaches that point again it will follow the new path paved by the attacker who forced the code onto the stack buffer. This is where the confusion between value and location comes into play: the location addresses being stored on the stack buffer can also be carefully crafted values which, in sequence, make up a new path of code execution. Because the values are not erased from the "stack", they form new code and a new path for the next time execution reaches the spilled end of the stack buffer.

From this point, a new path can be formed with more code injections into memory. The attacker has control over execution and has the same access to the computer (or other hardware) as the program that was taken over. Whatever the user was able to do, the attacker can do as well, and sometimes even more.

The bug fix

There have been many patches and many attempts to fix this problem on every hardware system imaginable. Most of those fixes only stop the current trick for overflowing the buffer, and a new way to overflow it in exactly the same manner can quickly be found. The proper way to fix the problem is to include a check at every place where data is written to a stack buffer and make sure it is not overfilling. However, programmers get absorbed in their projects, are sometimes inexperienced, and forget to include those checks in some places.

Luckily, there have been improvements and actual fixes of the "call" and other similar functions at their core. Programming languages such as Java, Perl and Python, and the newer .NET languages such as C# and J#, have compilers and runtimes which perform these checks automatically and largely eliminate the problem in code written in them. The programmer doesn't have to remember to put a check in every place; instead, the language takes care of the problem in every spot.
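Both kinds of check can be sketched briefly. The class `CheckedStack` below is a made-up illustration of a hand-written check, not code from any real product, and the last few lines show the check that Python itself performs on every list access.

```python
class CheckedStack:
    """A fixed-size stack that refuses to write past its own end."""

    def __init__(self, capacity):
        self.slots = [0] * capacity
        self.count = 0

    def push(self, return_address):
        # The hand-written check: verify there is room before writing.
        if self.count >= len(self.slots):
            raise OverflowError("stack is full; refusing to overwrite neighbours")
        self.slots[self.count] = return_address
        self.count += 1


stack = CheckedStack(capacity=8)
rejected = []
for return_address in range(0x40, 0x4A):  # ten pushes into eight slots
    try:
        stack.push(return_address)
    except OverflowError:
        rejected.append(return_address)

print(stack.count, [hex(address) for address in rejected])  # 8 ['0x48', '0x49']

# The language-level check: Python refuses an out-of-range write by itself.
try:
    stack.slots[8] = 0x48
    language_caught_it = False
except IndexError:
    language_caught_it = True

print(language_caught_it)  # True
```

The hand-written check only protects the one place where the programmer remembered to put it. The language-level check applies to every list access in the program, at the price of a little extra work on each one.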
This simple fix has often been proposed, but because it slows down program execution it wasn't widely adopted until recently, now that processors are much faster and can handle the extra work.

Another fix for this problem, though perhaps only an attempt at best, was the introduction of Enhanced Virus Protection in AMD's 64-bit processors. With the release of Service Pack 2 for Windows XP, a bit can be turned on which does not allow values in the stack buffer to be executed. Intel answered with its own version of the NX (No eXecute) bit, which it calls XD (Execute Disable). While this fixes the attacks which took advantage of the stack to execute code, it still allows the stack to be overflowed. The code to be executed could be forced to overflow further, past the stack buffer, and still be executed from there.

Nevertheless, this hardware-based solution is a solid improvement over the old buffer overflow security, and with the help of new programming languages which fix the problem at its core, in time we should never hear of another buffer overflow hack again.
