# Transaction Correlation Monitor Tool

## 🎯 Purpose

This tool monitors **Espresso testnet** and **Caff Node** simultaneously to find correlations between transactions in different formats:
- **Espresso**: TX~ format (base64)
- **Caff Node**: 0x format (hex)

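The two hash formats can be told apart by their prefix alone. The following is a minimal sketch, not the tool's actual code: `classifyTxHash` is a hypothetical helper that only inspects the shape of the string and does not convert one format into the other.

```javascript
// Hypothetical helper: classify a transaction hash by its textual format.
// It checks the prefix and character set only; no decoding is attempted.
function classifyTxHash(hash) {
  if (typeof hash !== 'string') return 'unknown';
  if (hash.startsWith('TX~') && hash.length > 3) return 'espresso';
  if (/^0x[0-9a-fA-F]+$/.test(hash)) return 'caff';
  return 'unknown';
}

console.log(classifyTxHash('TX~abc123'));   // espresso
console.log(classifyTxHash('0x025cdd51'));  // caff
```
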
## 🚀 Quick Start

```bash
# Monitor for 10 minutes (default)
node correlate-tx-monitor.js

# Monitor for 20 minutes
node correlate-tx-monitor.js 20

# Monitor for 30 minutes (to catch more activity)
node correlate-tx-monitor.js 30
```

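The commands above pass the duration in minutes as the first argument, defaulting to 10. As an illustration of how such an argument might be read, here is a hedged sketch; `parseDurationMinutes` is a hypothetical name, and the real script may validate its input differently.

```javascript
// Hypothetical sketch of reading the duration argument (minutes, default 10).
// argv is expected in the shape of process.argv: [node, script, ...args].
function parseDurationMinutes(argv, defaultMinutes = 10) {
  const raw = argv[2];
  if (raw === undefined) return defaultMinutes;
  const minutes = Number(raw);
  if (!Number.isFinite(minutes) || minutes <= 0) {
    throw new Error(`Invalid duration "${raw}": expected a positive number of minutes`);
  }
  return minutes;
}

console.log(parseDurationMinutes(['node', 'correlate-tx-monitor.js']));        // 10
console.log(parseDurationMinutes(['node', 'correlate-tx-monitor.js', '20']));  // 20
```

In a real script the call would be `parseDurationMinutes(process.argv)`.
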
## 📊 What It Does

### Data Collection

1. **Fetches current block heights** from both networks
2. **Calculates block range** for the time window (e.g., last 10 minutes)
3. **Monitors Espresso** RARI namespace (1380012617) for TX~ transactions
4. **Monitors Caff Node** Rari chain for 0x transactions
5. **Correlates transactions** within 5s-3min delay tolerance

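Step 2 can be sketched as simple arithmetic over an average block time. This is an assumption about how the range might be estimated, not a description of the script: `estimateBlockRange` and its `avgBlockTimeSeconds` parameter are hypothetical, and the 12-second value below is purely illustrative.

```javascript
// Hypothetical sketch: estimate the block range covering the last N minutes.
// avgBlockTimeSeconds is an assumed input; the real tool may derive the range differently.
function estimateBlockRange(currentHeight, windowMinutes, avgBlockTimeSeconds) {
  const blocksInWindow = Math.ceil((windowMinutes * 60) / avgBlockTimeSeconds);
  const startBlock = Math.max(0, currentHeight - blocksInWindow);
  return { startBlock, endBlock: currentHeight, blocksInWindow };
}

const range = estimateBlockRange(5721976, 10, 12);
console.log(range.startBlock, range.endBlock, range.blocksInWindow); // 5721926 5721976 50
```

A fixed average only approximates the window. On a chain with irregular block times, comparing each block's timestamp against the start of the window would be more exact.
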
### Correlation Algorithm

**Matches transactions based on:**
- ✅ Timestamp proximity (5s - 3min window)
- ✅ Transaction index in block
- ✅ Transaction size similarity
- ✅ Namespace correctness
- ✅ Block transaction count

**Confidence scoring:**
- **High (≥80%)**: Very likely match - timestamp <1min, index matches
- **Medium (60-79%)**: Probable match - timestamp <3min, close index
- **Low (40-59%)**: Possible match - within time window

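To make the scoring concrete, here is one way the factors above could be combined. Only the 5s-3min window and the High/Medium/Low thresholds come from this README; the point weights, the field names (`timestamp`, `index`, `size`) and the function name `scoreCorrelation` are assumptions for illustration. Namespace and block transaction count are left out of this sketch.

```javascript
// Hypothetical scoring sketch. The weights are assumptions chosen for illustration;
// only the delay window and the confidence thresholds are taken from the README.
const DELAY_MIN = 5;    // seconds
const DELAY_MAX = 180;  // seconds

function scoreCorrelation(espressoTx, caffTx) {
  // Absolute difference: this sketch does not assume which network reports first
  const delay = Math.abs(caffTx.timestamp - espressoTx.timestamp);
  if (delay < DELAY_MIN || delay > DELAY_MAX) return { score: 0, label: 'none' };

  let score = 40;                                  // inside the time window
  if (delay < 60) score += 20;                     // timestamps under 1 minute apart
  const indexGap = Math.abs(espressoTx.index - caffTx.index);
  if (indexGap === 0) score += 25;                 // same index in block
  else if (indexGap === 1) score += 10;            // close index
  const sizeRatio = Math.min(espressoTx.size, caffTx.size) /
                    Math.max(espressoTx.size, caffTx.size);
  if (sizeRatio >= 0.8) score += 15;               // similar size

  const label = score >= 80 ? 'high' : score >= 60 ? 'medium' : 'low';
  return { score, label };
}

console.log(scoreCorrelation(
  { timestamp: 1000, index: 0, size: 140 },
  { timestamp: 1030, index: 0, size: 133 },
)); // { score: 100, label: 'high' }
```

With these weights, a pair that only falls inside the time window scores 40 (low), which lines up with the lower bound of the Low band.
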
## 📁 Output Files

Every run generates **four files**, plus a fifth (the correlations CSV) when correlations are found:

### 1. JSON Report (Complete Data)
**File**: `correlation-full-{timestamp}.json`

Contains:
- Metadata (timestamp, duration, config)
- Summary statistics
- **ALL Espresso transactions** (with timestamps, hashes, blocks)
- **ALL Caff Node transactions** (with timestamps, hashes, from/to, value)
- **ALL correlations found** (with confidence scores)

### 2. Espresso Transactions CSV
**File**: `espresso-transactions-{timestamp}.csv`

Columns:
```
Hash, Block, Index, Namespace, Timestamp, Timestamp_ISO, Size
```

### 3. Caff Node Transactions CSV
**File**: `caff-transactions-{timestamp}.csv`

Columns:
```
Hash, Block, Index, Timestamp, Timestamp_ISO, From, To, Value, Gas, Size
```

**Example row:**
```csv
"0x025cdd51...",1416393,0,1761360628,"2025-10-25 02:50:28","0x000...a4b05","0x000...a4b05","0x0",0,133
```

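The example row quotes string fields and leaves numeric fields bare, and its `Timestamp_ISO` value is the UTC rendering of the Unix timestamp. A row in that shape could be produced roughly as follows. This is a sketch under assumptions: the property names on `tx` and the helper names are hypothetical, and only the column order is taken from the list above.

```javascript
// Hypothetical sketch of formatting one Caff Node transaction as a CSV row.
// The field names on `tx` are assumptions; the column order follows the README.
function toIsoTimestamp(unixSeconds) {
  // "2025-10-25T02:50:28.000Z" -> "2025-10-25 02:50:28" (UTC)
  return new Date(unixSeconds * 1000).toISOString().replace('T', ' ').slice(0, 19);
}

function quote(value) {
  // Wrap in double quotes and double any embedded quotes
  return `"${String(value).replace(/"/g, '""')}"`;
}

function caffTxToCsvRow(tx) {
  return [
    quote(tx.hash), tx.block, tx.index, tx.timestamp,
    quote(toIsoTimestamp(tx.timestamp)),
    quote(tx.from), quote(tx.to ?? ''), quote(tx.value), tx.gas, tx.size,
  ].join(',');
}

console.log(toIsoTimestamp(1761360628)); // 2025-10-25 02:50:28
```
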
### 4. Correlations CSV (if correlations found)
**File**: `correlations-{timestamp}.csv`

Columns:
```
Espresso_TX, Espresso_Block, Espresso_Index, Espresso_Time,
Caff_TX, Caff_Block, Caff_Index, Caff_Time,
Time_Diff_Seconds, Confidence, Confidence_Percent
```

### 5. Human-Readable Text Report
**File**: `correlation-report-{timestamp}.txt`

Contains:
- Configuration summary
- Data collection summary
- Confidence distribution
- Time delay statistics
- Top 10 correlations

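The confidence distribution, time delay statistics and top-10 list in the text report can all be derived from the list of correlations. The sketch below shows one possible derivation; `summarizeCorrelations` and the property names `timeDiffSeconds` and `confidencePercent` are hypothetical stand-ins for whatever the script uses internally.

```javascript
// Hypothetical sketch: summarise correlations for a text report.
// Each correlation is assumed to look like { timeDiffSeconds, confidencePercent }.
function summarizeCorrelations(correlations) {
  const distribution = { high: 0, medium: 0, low: 0 };
  for (const c of correlations) {
    if (c.confidencePercent >= 80) distribution.high += 1;
    else if (c.confidencePercent >= 60) distribution.medium += 1;
    else distribution.low += 1;
  }

  // Sort a copy of the delays numerically to read off min, max and median
  const delays = correlations.map(c => c.timeDiffSeconds).sort((a, b) => a - b);
  let delayStats = null;
  if (delays.length > 0) {
    const mid = Math.floor(delays.length / 2);
    delayStats = {
      min: delays[0],
      max: delays[delays.length - 1],
      mean: delays.reduce((sum, d) => sum + d, 0) / delays.length,
      median: delays.length % 2 ? delays[mid] : (delays[mid - 1] + delays[mid]) / 2,
    };
  }

  // Highest-confidence correlations first, without mutating the input
  const top10 = [...correlations]
    .sort((a, b) => b.confidencePercent - a.confidencePercent)
    .slice(0, 10);

  return { distribution, delayStats, top10 };
}
```

When no correlations exist, as in the run reported in this README, `delayStats` is `null` and the distribution counts are all zero.
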
## 📈 Current Results (Oct 27, 2025)

### Run 1: 10-minute window
```
Duration: 10 minutes
Espresso blocks: #5721926 - #5721976 (50 blocks)
Caff Node blocks: #1416393 - #1416443 (50 blocks)

Results:
✅ Caff Node: 102 transactions collected
❌ Espresso: 0 RARI transactions found
⚠️  No correlations (no Espresso activity in this window)
```

**Observation**: Caff Node returned about 2 transactions per block (102 transactions across the sampled range), but the RARI namespace on Espresso had no transactions in the sampled blocks.

## 🔍 Analysis of Collected Data

### Caff Node Transaction Patterns

From the collected 102 transactions:

**Transaction Types:**
1. **System transactions** (majority)
   - From/To: `0x00000000000000000000000000000000000a4b05`
   - Value: 0
   - Gas: 0
   - Size: 133-165 bytes
   - Pattern: Every block has these

2. **User transactions** (occasional)
   - From: Real addresses
   - To: Contracts or null (deployments)
   - Gas: 53,458 - 3,496,663
   - Example: Blocks 1416400 and 1416401

**Timestamp Distribution:**
```
Oct 25, 02:50 - Block 1416393
Oct 25, 03:41 - Block 1416394
Oct 25, 13:42 - Block 1416395
Oct 25, 23:43 - Block 1416396
Oct 26, 09:44 - Block 1416397
Oct 26, 19:45 - Block 1416398
Oct 27, 05:46 - Block 1416399
Oct 27, 15:04 - Block 1416400-1416443
```

**Block time**: roughly 10 hours between blocks from 1416394 to 1416399 (very slow over the last few days). The sampled Caff Node block range therefore spans about two and a half days (Oct 25 to Oct 27), not the 10-minute window the run was configured for.

## 💡 Recommendations

### To Find Correlations

1. **Run for a longer duration** (30-60 minutes)
   ```bash
   node correlate-tx-monitor.js 60
   ```

2. **Check when RARI is active**
   - Monitor the Espresso explorer for RARI namespace activity
   - Run the tool when you see activity

3. **Analyze historical data**
   - Check blocks when both networks had activity
   - Look at known transaction pairs (see documentation)

### To Improve Correlation

1. **Adjust delay window** if needed
   - Current: 5s - 3min
   - Can be modified in the script

2. **Add more matching factors**
   - Transaction sender address
   - Transaction value
   - Contract called

3. **Implement caching**
   - Store known correlations
   - Build correlation database

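The caching idea could start as an in-memory map that is serialised to JSON between runs. The class below is a sketch of that idea only; `CorrelationCache` and its methods are hypothetical and do not exist in the current script.

```javascript
// Hypothetical in-memory cache of known correlations, keyed by Espresso hash.
class CorrelationCache {
  constructor(entries = []) {
    // entries: array of [espressoHash, { caffHash, confidencePercent }] pairs
    this.byEspressoHash = new Map(entries);
  }

  // Keep only the highest-confidence match seen for each Espresso transaction
  remember(espressoHash, caffHash, confidencePercent) {
    const existing = this.byEspressoHash.get(espressoHash);
    if (!existing || confidencePercent > existing.confidencePercent) {
      this.byEspressoHash.set(espressoHash, { caffHash, confidencePercent });
    }
  }

  lookup(espressoHash) {
    return this.byEspressoHash.get(espressoHash) ?? null;
  }

  // Called by JSON.stringify, so the cache can be written to disk between runs
  toJSON() {
    return [...this.byEspressoHash.entries()];
  }
}

const cache = new CorrelationCache();
cache.remember('TX~example', '0xabc', 85);
const restored = new CorrelationCache(JSON.parse(JSON.stringify(cache)));
console.log(restored.lookup('TX~example')); // { caffHash: '0xabc', confidencePercent: 85 }
```

Keying by Espresso hash is a design choice for this sketch; a cache keyed by Caff Node hash, or by both, would work the same way.
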
## 📊 How to Analyze the CSV Files

### In Excel/Google Sheets

1. **Open Caff Node CSV**
   - Sort by Timestamp to see chronological order
   - Filter by From/To to see user transactions
   - Look for patterns in transaction timing

2. **Open Espresso CSV** (when data available)
   - Sort by Block height
   - Compare timestamps with Caff Node
   - Match by proximity

3. **Manual correlation**
   - Look for transactions within 5s-3min
   - Match by transaction index
   - Verify with block timing

### In Python/Pandas

```python
import glob

import pandas as pd

# pandas does not expand wildcards, so resolve each pattern with glob and
# take the last file in sorted order (assumes at least one file matches)
caff = pd.read_csv(sorted(glob.glob('caff-transactions-*.csv'))[-1])
espresso = pd.read_csv(sorted(glob.glob('espresso-transactions-*.csv'))[-1])

# Convert Unix timestamps (seconds) to datetimes
caff['timestamp'] = pd.to_datetime(caff['Timestamp'], unit='s')
espresso['timestamp'] = pd.to_datetime(espresso['Timestamp'], unit='s')

# For each Caff Node transaction, find the closest Espresso transaction
# and keep the pair if they are less than 3 minutes apart
matches = []
for _, caff_tx in caff.iterrows():
    time_diff = (espresso['timestamp'] - caff_tx['timestamp']).abs()
    if time_diff.empty:
        continue  # no Espresso transactions were collected
    nearest = time_diff.idxmin()
    if time_diff[nearest] < pd.Timedelta(minutes=3):
        matches.append({
            'caff_tx': caff_tx['Hash'],
            'espresso_tx': espresso.loc[nearest, 'Hash'],
            'time_diff': time_diff[nearest].total_seconds(),
        })

print(f"Found {len(matches)} potential matches")
```

## 🔧 Configuration

Edit the script to adjust:

```javascript
const DELAY_MIN = 5;      // Minimum delay (seconds)
const DELAY_MAX = 180;    // Maximum delay (seconds)
const RARI_NAMESPACE = 1380012617; // Namespace to monitor
const POLL_INTERVAL = 5000; // Poll interval (ms)
```

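When tuning `POLL_INTERVAL`, it can help to estimate how many polling rounds a run will make. The arithmetic below assumes one poll per interval and ignores the time the requests themselves take; `estimatePollCount` is a hypothetical helper, not part of the script.

```javascript
const POLL_INTERVAL = 5000; // ms, same value as in the configuration above

// Hypothetical helper: how many polls fit into a run of the given length,
// assuming one poll per interval and no time spent on the requests themselves.
function estimatePollCount(durationMinutes, pollIntervalMs = POLL_INTERVAL) {
  return Math.floor((durationMinutes * 60 * 1000) / pollIntervalMs);
}

console.log(estimatePollCount(10));        // 120
console.log(estimatePollCount(10, 10000)); // 60
```
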
## 🐛 Troubleshooting

### No Espresso transactions found

**Possible cause:** The RARI namespace was not active during the monitored time window.

**Solution:**
1. Check the Espresso explorer for recent RARI activity
2. Run the tool when RARI is active, or try a longer monitoring duration
3. Check historical data

### API rate limiting

**Symptom**: Slow data collection or timeouts

**Solution**: Increase the delay between requests in the script
```javascript
await new Promise(resolve => setTimeout(resolve, 500)); // Increase from 100ms
```

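If a fixed delay is not enough, retrying failed requests with a growing delay is a common alternative. The wrapper below is a sketch of that technique (exponential backoff), not something the script currently does; `withBackoff`, `sleep` and the option names are hypothetical.

```javascript
// Hypothetical wrapper: retry an async request with exponential backoff.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function withBackoff(request, { retries = 4, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await request();
    } catch (err) {
      if (attempt >= retries) throw err;          // give up after the last retry
      await sleep(baseDelayMs * 2 ** attempt);    // 500ms, 1s, 2s, 4s with the defaults
    }
  }
}

// Usage (fetchBlock is a placeholder for whatever request function the script uses):
//   const block = await withBackoff(() => fetchBlock(height));
```
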
### Large file sizes

**Symptom**: JSON/CSV files are huge

**Solution**: Monitor shorter periods or filter transactions in post-processing

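One post-processing filter that follows from the analysis in this README is dropping the system transactions, which make up the majority of Caff Node data. The sketch below assumes each transaction object exposes `from` and `to` properties; those names and the function `keepUserTransactions` are hypothetical, while the system address is the one listed under Transaction Types.

```javascript
// Hypothetical post-processing step: drop system transactions, i.e. those
// whose sender and recipient are both the system address.
const SYSTEM_ADDRESS = '0x00000000000000000000000000000000000a4b05';

function keepUserTransactions(transactions) {
  return transactions.filter(tx =>
    String(tx.from).toLowerCase() !== SYSTEM_ADDRESS ||
    String(tx.to).toLowerCase() !== SYSTEM_ADDRESS);
}

const sample = [
  { hash: '0x01', from: SYSTEM_ADDRESS, to: SYSTEM_ADDRESS },
  { hash: '0x02', from: '0x1111111111111111111111111111111111111111', to: null },
];
console.log(keepUserTransactions(sample).map(tx => tx.hash)); // [ '0x02' ]
```
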
## 📝 Example Use Cases

### 1. Real-time Monitoring
```bash
# Watch for new correlations
node correlate-tx-monitor.js 10
# Check files every 10 minutes
```

### 2. Historical Analysis
```bash
# Collect data for 1 hour
node correlate-tx-monitor.js 60
# Analyze CSV files in spreadsheet
```

### 3. Pattern Discovery
```bash
# Multiple runs at different times
node correlate-tx-monitor.js 20  # Morning
node correlate-tx-monitor.js 20  # Afternoon
node correlate-tx-monitor.js 20  # Evening
# Compare patterns
```

## 🎯 Next Steps

1. **Capture active period**: Run when RARI namespace is active
2. **Analyze patterns**: Study transaction timing and delays
3. **Build correlation database**: Store proven matches
4. **Improve algorithm**: Add more matching factors
5. **Automate**: Run continuously and alert on high-confidence matches

## 📚 Related Documentation

- `CAFF_NODE_INTEGRATION_PLAN.md` - Full integration strategy
- `TX_CORRELATION_STRATEGY.md` - Correlation algorithms in detail
- `TEST_REPORT.md` - API testing results

## 🚀 Success Metrics

**When the tool finds correlations:**
- ✅ High confidence (≥80%): Ready to use
- ✅ Medium confidence (60-79%): Review manually
- ⚠️ Low confidence (40-59%): Needs improvement

**Target**: Find 10+ high-confidence correlations to validate the approach

---

**Tool Version**: 1.0  
**Last Updated**: Oct 27, 2025  
**Status**: ✅ Working, waiting for RARI activity