- Author: MASON Joey (https://x.com/JoeyJoeMA)
The Complete Guide to Reverse Engineering & Open-Source Replication in the AI Era
A Universal Methodology Using Open Computer Use as a Case Study
A hands-on summary from replicating Codex Computer Use as open source, applicable to all similar "analyze → reverse-engineer → replicate → open-source" projects.
Table of Contents
- Project Background & Problem Definition
- Full Technology Stack Overview
- Phase 1: Reconnaissance & Information Gathering
- Phase 2: Static Analysis & Reverse Engineering
- Phase 3: Dynamic Analysis & Traffic Interception
- Phase 4: Breaking Process Signature Restrictions
- Phase 5: Core Feature Implementation
- Phase 6: Comparative Validation & Eval Feedback Loop
- Phase 7: Productization & Release
- Phase 8: Conquering Visual Details (Mouse Animation Reverse Engineering)
- Universal Methodology Summary
- Complete SOP for Future Similar Projects
1. Project Background & Problem Definition
1.1 What Is Computer Use
Computer Use is a technology that enables AI Agents to autonomously control a computer's interface (mouse, keyboard, applications) to complete tasks. There are traditionally two implementation approaches:
- Connectors mode: Directly calls application APIs such as Gmail and Slack — no UI manipulation
- GUI mode: Simulates mouse and keyboard events to directly operate on screen
OpenAI Codex's Innovation: Background Computer Use — a non-preemptive operation model. All previous solutions required "occupying" the screen, preventing the user from simultaneously using the computer. Codex's approach allows the AI to operate in the background without the user ever noticing.
1.2 Breaking Down the Objective
Before starting, the team clearly defined what needed to be replicated:
Original Goal: Replicate the "non-preemptive" core capability of Codex Computer Use
Broken down into:
├── Functional Goals (must complete)
│ ├── Background interaction with UI
│ ├── Screenshot capture for multimodal inference
│ └── Expose MCP service externally (9 tools)
└── Experience Goals (bonus)
├── Smooth mouse animation effects
├── Permission request floating window
└── One-click install/publish
Key Decision: Features first, experience second — advance in phases, don't let visual details block the main flow.
2. Full Technology Stack Overview
2.1 Analysis Toolchain
| Tool | Purpose | Principle |
|---|---|---|
| file / strings | Binary basics | Reads ELF/Mach-O headers and printable strings |
| class-dump / nm | Swift/ObjC symbol export | Extracts class names and method names from the binary symbol table |
| Hopper Disassembler | Assembly-level reverse engineering | Decompiles machine code into pseudocode |
| mitmdump | HTTPS man-in-the-middle traffic capture | Proxy intercept, TLS traffic decryption |
| mitmproxy / mitmweb | Traffic capture visualization | Interactive console / web UI on top of mitmdump |
| Codex AI | Assists binary analysis | Multimodal understanding of decompilation results |
2.2 Development Stack
| Layer | Technology | Purpose |
|---|---|---|
| Core implementation | Swift | macOS native, accesses AX API |
| Service wrapper | MCP (Model Context Protocol) | Provides standardized tool interfaces externally |
| Signature bypass | Go | Write CLI to borrow Codex.app's signature |
| Package management & release | npm / Node.js | npm i -g open-computer-use |
| Animation algorithm | Swift + Bezier curves | Mouse movement path calculation |
| Multimedia processing | ffmpeg / ImageMagick | Video frame extraction, image processing |
2.3 AI Auxiliary Tool Usage
Codex (Primary)
├── Analyze binary files
├── Open multiple parallel sessions for division of labor
├── Traffic capture configuration and execution
└── Code generation and validation
Grok
└── Mining technical clues from Twitter/X
Claude / GPT-4o
└── Multimodal analysis (screenshots, video frame understanding)
3. Phase 1: Reconnaissance & Information Gathering
3.1 Goal: Find an Analyzable Entry Point
Before reverse-engineering any closed system, the very first step is always finding the physical files.
Steps
# 1. Find the location of Codex App's Computer Use plugin
ls -la ~/.codex/plugins/cache/openai-bundled/computer-use/
# 2. View directory structure
find ~/.codex/plugins/cache/openai-bundled/computer-use/ -type f | head -50
# 3. Check file size to estimate analysis workload
du -sh ~/.codex/plugins/cache/openai-bundled/computer-use/1.0.750/"Codex Computer Use.app"
# → 26.5MB, manageable workload
# 4. View .app internal structure (macOS app bundle is essentially a directory)
ls -la "Codex Computer Use.app/Contents/"
# → MacOS/ (executables), Frameworks/, Resources/
Key Findings
- The entire feature is a standalone macOS App Bundle, 26.5MB
- The core executable is SkyComputerUseClient
- It exposes services externally via the MCP protocol (meaning the interface is standardized)
Why This Step Matters
Finding the physical files = finding the "battlefield." Without files, all subsequent analysis is empty talk. File size determines analysis cost; file type determines analysis method.
3.2 Gathering Public Information
Before diving into analysis, collect all the available "free intelligence":
Public information sources:
├── OpenAI official blog (feature intros, screenshots)
├── Twitter/X posts from Software.inc team members
├── App Store / release notes
├── MCP protocol specification docs (modelcontextprotocol.io)
└── Related open-source projects on GitHub
Actual operation: Ask Grok to search tweets from Software.inc and Ari, then infer technical keywords from the comments.
4. Phase 2: Static Analysis & Reverse Engineering
4.1 Extracting the Symbol Table (Swift Binary)
A Swift-compiled binary contains a wealth of symbol information — you can get class structure without ever executing the program.
# Extract all symbols (function names, class names, protocols)
nm "Codex Computer Use.app/Contents/MacOS/SkyComputerUseClient" | grep -i "computer\|screen\|mouse\|cursor"
# Use class-dump to extract ObjC/Swift class definitions
class-dump "Codex Computer Use.app/Contents/MacOS/SkyComputerUseClient" > symbols.txt
# strings extracts readable strings (discovers API endpoints, error messages, etc.)
strings "Codex Computer Use.app/Contents/MacOS/SkyComputerUseClient" | grep -E "mcp|tool|accessibility|screen"
Key Findings
Key clues extracted from the symbol table:
- AXUIElement (macOS Accessibility API related)
- CGScreenCapture (screenshot API)
- SkyComputerUseClient (main class name)
- MCP Tool related: 9 tool names in total
- osascript (AppleScript fallback)
4.2 AI-Assisted Analysis of Decompilation Results
Directly feed decompiled screenshots or text to Codex/Claude:
Prompt template:
"This is a code snippet decompiled from [binary name]. Please analyze:
1. What is the core functionality of this code?
2. What are the key data structures and interface definitions?
3. How does it interact with [target system]?
Please organize the analysis into a document."
Discovered Architecture:
SkyComputerUseClient
├── MCP Server Layer (external: 9 standardized tools)
│ ├── tool_1: screenshot
│ ├── tool_2: click
│ ├── tool_3: type_text
│ ├── tool_4: scroll
│ ├── tool_5: find_element
│ ├── tool_6: get_ui_tree
│ ├── tool_7: run_applescript
│ ├── tool_8: key_press
│ └── tool_9: wait
└── Core Interaction Layer (underlying: three control methods)
├── AX API (preferred, background control)
├── osascript (fallback when AX fails)
└── CGEvent (mouse events, last resort)
4.3 Core Principle: macOS Accessibility API
This is the technical foundation of the entire "non-preemptive" approach — it must be thoroughly understood.
What Is the Accessibility API (AX API)?
To support visually impaired users, macOS provides an interface called AXUIElement that allows programmatic reading and manipulation of all UI elements.
// Core capabilities of the AX API
import ApplicationServices
// 1. Get the AX tree (UI element tree) of an application
let appRef = AXUIElementCreateApplication(pid)
// 2. Find a specific UI element (by title, role, etc.)
var value: CFTypeRef?
AXUIElementCopyAttributeValue(appRef, kAXWindowsAttribute as CFString, &value)
// 3. Click a button in the background (no need for the window to be in the foreground!)
AXUIElementPerformAction(buttonRef, kAXPressAction as CFString)
// 4. Type text in the background
AXUIElementSetAttributeValue(fieldRef, kAXValueAttribute as CFString, "Hello" as CFTypeRef)
Key Feature: All AX API operations do not require the target window to be in the foreground — this is the fundamental reason for "non-preemptive" operation.
Three-Level Fallback Strategy: AX → osascript → CGEvent
func performClick(element: AXUIElement?) {
// First priority: AX API (most precise, works in background)
if let element = element {
let result = AXUIElementPerformAction(element, kAXPressAction as CFString)
if result == .success { return }
}
// Second priority: Apple Script (good compatibility)
let script = "tell application \"Safari\" to click button 1 of window 1"
NSAppleScript(source: script)?.executeAndReturnError(nil)
// Last resort: CGEvent mouse simulation (will occupy screen)
let event = CGEvent(mouseEventSource: nil, mouseType: .leftMouseDown,
mouseCursorPosition: point, mouseButton: .left)
event?.post(tap: .cghidEventTap)
}
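The fallback chain above boils down to "try each control method in priority order, stop at the first success." A minimal Python sketch of that pattern (the strategy names and the `perform_with_fallback` helper are illustrative, not from the original binary):

```python
from typing import Callable, Optional

# Each strategy returns True on success; failure is False or an exception.
Strategy = Callable[[], bool]

def perform_with_fallback(strategies: list[tuple[str, Strategy]]) -> Optional[str]:
    """Try each control method in priority order; return the name of the one that worked."""
    for name, strategy in strategies:
        try:
            if strategy():
                return name
        except Exception:
            continue  # fall through to the next method
    return None

# Mirror of the AX → osascript → CGEvent priority order
used = perform_with_fallback([
    ("ax_api", lambda: False),    # e.g. AXUIElementPerformAction did not succeed
    ("osascript", lambda: True),  # AppleScript fallback succeeds
    ("cg_event", lambda: True),   # never reached
])
print(used)  # → osascript
```

The same shape works regardless of what the individual strategies do, which keeps the "last resort occupies the screen" case isolated at the end of the list.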
5. Phase 3: Dynamic Analysis & Traffic Interception
5.1 Why Traffic Capture Is Necessary
Static analysis tells you "what exists," but the precise parameter definitions of tools (JSON Schema) can only be obtained from actual calls. Manually rewriting parameter definitions makes it nearly impossible to achieve 100% strict alignment.
Goal: Directly capture from Codex's actual network requests:
- Complete system prompt
- Precise JSON Schema definitions for all 9 tools
- Actual request/response formats
5.2 Configuring mitmdump for HTTPS Man-in-the-Middle Capture
Installation & Basic Configuration
# Install mitmproxy (includes mitmdump)
pip install mitmproxy
# or
brew install mitmproxy
# Start mitmdump, listening on port 8080
mitmdump -p 8080 --save-stream-file capture.mitm
# View real-time traffic (visual)
mitmweb -p 8080
Install mitmproxy Root Certificate (Trust HTTPS Decryption)
# 1. After starting mitmproxy, visit mitm.it in your browser
# 2. Download the certificate for your platform
# 3. Trust the certificate in macOS Keychain
# Or install via command line
security add-trusted-cert -d -r trustRoot -k ~/Library/Keychains/login.keychain ~/.mitmproxy/mitmproxy-ca-cert.pem
Configure Codex to Route Through the Proxy
# Method 1: System proxy (recommended)
# System Preferences → Network → Advanced → Proxies → HTTP/HTTPS Proxy → 127.0.0.1:8080
# Method 2: Environment variables
export HTTP_PROXY=http://127.0.0.1:8080
export HTTPS_PROXY=http://127.0.0.1:8080
# Method 3: Let Codex configure it itself (recursive)
# Simply tell Codex: "Please configure mitmdump and start traffic capture"
5.3 Recursive Capture: Let Codex Call Itself While Being Captured
This is the most elegant operation of the entire process:
User → Tell Codex: "Call your computer-use plugin to execute a screenshot task, while capturing traffic with mitmdump"
↓
Codex configures the proxy environment
↓
Codex calls its own computer-use MCP tool
↓
mitmdump captures all HTTPS requests
↓
User obtains the complete tools definition and system prompt
Captured Data Structure (Example)
{
"model": "gpt-4o",
"tools": [
{
"name": "screenshot",
"description": "Capture a screenshot of the current screen state...",
"input_schema": {
"type": "object",
"properties": {
"app_name": {
"type": "string",
"description": "Target application name"
}
}
}
}
// ... precise definitions for the remaining 8 tools
],
"system": "You are a computer use agent capable of..."
}
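Once a payload like this has been captured, extracting the per-tool schemas for the replica is mechanical. A sketch in Python (field names follow the example above; a real capture may differ):

```python
import json

# Abridged capture in the shape shown above
captured = """
{
  "model": "gpt-4o",
  "tools": [
    {
      "name": "screenshot",
      "description": "Capture a screenshot of the current screen state...",
      "input_schema": {
        "type": "object",
        "properties": {
          "app_name": {"type": "string", "description": "Target application name"}
        }
      }
    }
  ],
  "system": "You are a computer use agent capable of..."
}
"""

payload = json.loads(captured)
# Index tool schemas by name so the replica can emit byte-compatible definitions
schemas = {t["name"]: t["input_schema"] for t in payload["tools"]}
print(sorted(schemas))                # → ['screenshot']
print(schemas["screenshot"]["type"])  # → object
```

Keeping the extracted schemas in a checked-in JSON file makes the later comparative-validation phase a simple equality check rather than a manual review.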
5.4 Writing a mitmdump Filter Script
# filter.py - only capture computer-use related requests
from mitmproxy import http
import json

count = 0

def request(flow: http.HTTPFlow) -> None:
    global count
    if "computer-use" in flow.request.pretty_url or \
       "api.openai.com" in flow.request.pretty_url:
        # Save request body
        if flow.request.content:
            try:
                data = json.loads(flow.request.content)
            except ValueError:
                return  # skip non-JSON bodies
            # Number the output files so later requests don't overwrite earlier ones
            count += 1
            with open(f"captured_tools_{count}.json", "w") as f:
                json.dump(data, f, indent=2)
            print(f"[+] Captured request to {flow.request.pretty_url}")

# Run: mitmdump -s filter.py -p 8080
6. Phase 4: Breaking Process Signature Restrictions
6.1 Problem Discovery
# Attempt to connect to the official plugin directly using an MCP Client
npx @modelcontextprotocol/inspector "Codex Computer Use.app/Contents/MacOS/SkyComputerUseClient"
# Result: process crashes immediately
# Error: Process terminated with signal 9 (SIGKILL)
6.2 Diagnosing the Cause
# View process crash logs
log show --predicate 'process == "SkyComputerUseClient"' --last 5m
# Analyze crash report
cat ~/Library/Logs/DiagnosticReports/SkyComputerUseClient*.crash | grep -A 20 "Exception"
# Key finding: the process uses SecCodeCopyGuestWithAttributes to verify the parent process signature
# Only a parent process signed by Codex.app can launch it
6.3 Solution: Signature Inheritance Proxy
Approach: Write a Go program that runs within Codex.app's process context, thereby inheriting the correct signature.
// launcher.go
package main
import (
"fmt"
"os"
"os/exec"
"path/filepath"
)
func main() {
// Find the path to SkyComputerUseClient
pluginPath := filepath.Join(
os.Getenv("HOME"),
".codex/plugins/cache/openai-bundled/computer-use/1.0.750",
"Codex Computer Use.app/Contents/MacOS/SkyComputerUseClient",
)
// Launch process, inheriting the signature context of the current process
cmd := exec.Command(pluginPath)
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
if err := cmd.Run(); err != nil {
fmt.Fprintf(os.Stderr, "Failed to launch: %v\n", err)
os.Exit(1)
}
}
# Compile
go build -o codex-launcher launcher.go
# Key: use codesign to let the launcher borrow Codex.app's signature
# (or call from within the Codex.app process to directly inherit)
# Actual operation: have Codex itself execute this launcher
# Since Codex itself is a validly signed process, child processes forked from it inherit the signature
6.4 Verification of Success
# Successfully call the official MCP via CLI
echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | ./codex-launcher
# Expected output: the complete list of 9 tools
{
"result": {
"tools": [
{"name": "screenshot", ...},
// ...
]
}
}
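The `echo ... | ./codex-launcher` pipeline above is just newline-delimited JSON-RPC 2.0 over stdio. A small Python sketch of the framing (the `make_request` and `tool_names` helpers are illustrative, and the reply below is abridged to two of the nine tools):

```python
import json
from typing import Optional

def make_request(method: str, req_id: int, params: Optional[dict] = None) -> str:
    """One newline-delimited JSON-RPC 2.0 request, as piped into the launcher."""
    msg = {"jsonrpc": "2.0", "method": method, "id": req_id}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

def tool_names(response_line: str) -> list[str]:
    """Extract the advertised tool names from a tools/list response line."""
    response = json.loads(response_line)
    return [t["name"] for t in response["result"]["tools"]]

req = make_request("tools/list", 1)
print(req)  # → {"jsonrpc": "2.0", "method": "tools/list", "id": 1}

reply = '{"jsonrpc": "2.0", "id": 1, "result": {"tools": [{"name": "screenshot"}, {"name": "click"}]}}'
print(tool_names(reply))  # → ['screenshot', 'click']
```

Wrapping the framing in helpers like these also makes it easy to script the full "official vs. replica" comparison later.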
7. Phase 5: Core Feature Implementation
7.1 Project Structure (Starting from harness-template)
open-computer-use/
├── Sources/
│ └── ComputerUse/
│ ├── main.swift # Entry point
│ ├── MCPServer.swift # MCP protocol layer
│ ├── AccessibilityEngine.swift # AX API core
│ ├── ScreenCapture.swift # Screenshot module
│ ├── AppleScriptFallback.swift # Fallback solution
│ └── Tools/ # 9 MCP tool implementations
│ ├── ScreenshotTool.swift
│ ├── ClickTool.swift
│ ├── TypeTextTool.swift
│ └── ...
├── Package.swift
├── install.sh # One-click install script
└── docs/ # Analysis docs continuously output by AI (LLM Wiki)
├── architecture.md
├── ax-api-reference.md
└── mcp-tools-spec.md
7.2 MCP Server Implementation
MCP (Model Context Protocol) is a standardized tool interface protocol proposed by Anthropic. AI models use MCP to call external tools.
// MCPServer.swift
import Foundation
struct MCPServer {
// Tool registry
let tools: [String: MCPTool] = [
"screenshot": ScreenshotTool(),
"click": ClickTool(),
"type_text": TypeTextTool(),
"scroll": ScrollTool(),
"find_element": FindElementTool(),
"get_ui_tree": GetUITreeTool(),
"run_applescript": RunAppleScriptTool(),
"key_press": KeyPressTool(),
"wait": WaitTool(),
]
// Handle JSON-RPC requests
func handle(request: JSONRPCRequest) -> JSONRPCResponse {
switch request.method {
case "tools/list":
return listTools()
case "tools/call":
return callTool(request)
default:
return errorResponse("Unknown method")
}
}
// Communicate with MCP Client via stdio
func run() {
while let line = readLine() {
guard let data = line.data(using: .utf8),
let request = try? JSONDecoder().decode(JSONRPCRequest.self, from: data) else {
continue
}
let response = handle(request: request)
let responseJSON = try! JSONEncoder().encode(response)
print(String(data: responseJSON, encoding: .utf8)!)
}
}
}
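The registry-plus-dispatch loop above ports to any language. A minimal Python analogue of the same `tools/list` / `tools/call` handling (the toy two-tool registry and error codes are illustrative, not the real nine-tool server):

```python
# Toy registry standing in for the nine real tools
TOOLS = {
    "screenshot": lambda args: {"status": "ok", "tool": "screenshot"},
    "click": lambda args: {"status": "ok", "tool": "click"},
}

def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC request, mirroring MCPServer.handle."""
    if request.get("method") == "tools/list":
        result = {"tools": [{"name": n} for n in sorted(TOOLS)]}
    elif request.get("method") == "tools/call":
        params = request.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return {"jsonrpc": "2.0", "id": request.get("id"),
                    "error": {"code": -32602, "message": "Unknown tool"}}
        result = tool(params.get("arguments", {}))
    else:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Unknown method"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}

resp = handle({"jsonrpc": "2.0", "method": "tools/list", "id": 1})
print([t["name"] for t in resp["result"]["tools"]])  # → ['click', 'screenshot']
```

Keeping dispatch this thin means each tool implementation stays independently testable, which matters once four parallel AI sessions are writing them.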
7.3 Core Tool Implementation Examples
// ScreenshotTool.swift
import AppKit
struct ScreenshotTool: MCPTool {
var name = "screenshot"
var description = "Capture a screenshot of the current screen or a specific application window"
var inputSchema: JSONSchema = [
"type": "object",
"properties": [
"app_name": ["type": "string", "description": "Target app name (optional)"]
]
]
func execute(input: [String: Any]) async throws -> MCPToolResult {
let appName = input["app_name"] as? String
// Screenshot logic
let screenshot: NSImage
if let app = appName {
screenshot = try captureApp(named: app) // Capture specific app
} else {
screenshot = try captureScreen() // Capture full screen
}
// Convert to PNG base64 and return (re-encode the TIFF data so the mimeType below is accurate)
let png = screenshot.tiffRepresentation
    .flatMap { NSBitmapImageRep(data: $0)?.representation(using: .png, properties: [:]) }
let base64 = png?.base64EncodedString() ?? ""
return MCPToolResult(
content: [["type": "image", "data": base64, "mimeType": "image/png"]]
)
}
}
// AccessibilityEngine.swift - Core background control implementation
import ApplicationServices
class AccessibilityEngine {
// Find UI element
func findElement(inApp pid: pid_t, matching query: ElementQuery) -> AXUIElement? {
let appRef = AXUIElementCreateApplication(pid)
return searchUITree(root: appRef, query: query)
}
// Recursively search UI tree
private func searchUITree(root: AXUIElement, query: ElementQuery) -> AXUIElement? {
var children: CFTypeRef?
AXUIElementCopyAttributeValue(root, kAXChildrenAttribute as CFString, &children)
guard let childArray = children as? [AXUIElement] else { return nil }
for child in childArray {
if matches(element: child, query: query) { return child }
if let found = searchUITree(root: child, query: query) { return found }
}
return nil
}
// Background click (no need for window in foreground!)
func click(element: AXUIElement) throws {
let result = AXUIElementPerformAction(element, kAXPressAction as CFString)
guard result == .success else {
throw AccessibilityError.actionFailed(result)
}
}
// Background text input
func typeText(_ text: String, into element: AXUIElement) throws {
let result = AXUIElementSetAttributeValue(
element,
kAXValueAttribute as CFString,
text as CFTypeRef
)
guard result == .success else {
throw AccessibilityError.setValueFailed(result)
}
}
}
8. Phase 6: Comparative Validation & Eval Feedback Loop
8.1 Designing the Comparative Validation Framework
With the ability to "call the official version," establish a rigorous comparative validation system:
# Test script: run the same task on both official and open-source versions
#!/bin/bash
TASK="Take a screenshot of the current page in Safari browser"
echo "=== Official Version ==="
echo "$TASK" | codex --use-plugin=computer-use 2>&1 | tee official_output.json
echo "=== Open Source Version ==="
echo "$TASK" | codex --mcp-server=open-computer-use 2>&1 | tee opensource_output.json
# Compare differences
diff official_output.json opensource_output.json
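A raw `diff` registers ordering and whitespace differences as failures, so it helps to compare the structured tool inventories instead. A hedged Python sketch (the `diff_tools` helper and its report shape are illustrative):

```python
def diff_tools(official: list[dict], replica: list[dict]) -> dict:
    """Compare two tool inventories by name and input_schema, ignoring ordering."""
    off = {t["name"]: t.get("input_schema") for t in official}
    rep = {t["name"]: t.get("input_schema") for t in replica}
    return {
        "missing": sorted(set(off) - set(rep)),  # tools the replica lacks
        "extra": sorted(set(rep) - set(off)),    # tools only the replica has
        "schema_mismatch": sorted(
            n for n in set(off) & set(rep) if off[n] != rep[n]
        ),
    }

official = [{"name": "screenshot", "input_schema": {"type": "object"}},
            {"name": "click", "input_schema": {"type": "object"}}]
replica = [{"name": "screenshot", "input_schema": {"type": "object"}}]

print(diff_tools(official, replica))
# → {'missing': ['click'], 'extra': [], 'schema_mismatch': []}
```

An empty report across all three keys is a crisp pass/fail signal for the P0 "strict alignment" goal from Phase 3.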
8.2 Multi-Dimensional Comparison Metrics
Comparison dimensions:
├── Functionality (P0)
│ ├── Tool call success rate
│ ├── Screenshot quality (resolution, completeness)
│ ├── UI element location accuracy
│ └── Text input accuracy
├── Performance (P1)
│ ├── Response latency
│ ├── Memory usage
│ └── CPU usage
└── Stability (P2)
├── Long-run stability
└── Exception recovery capability
8.3 Dog Fooding: Using Your Own Product for Testing
In the final stage, use open-computer-use itself to develop open-computer-use. This is both a test and the most realistic form of validation:
Tasks executed using open-computer-use:
- Open Xcode and modify a file
- Run tests in Terminal
- Take screenshots to verify UI changes
- Submit a Git commit
9. Phase 7: Productization & Release
9.1 Permission Request UI (Reference from Software.inc Approach)
macOS requires explicit requests for two permissions:
- Accessibility (required for the AX API)
- Screen Recording (required for screenshots)
// PermissionWindow.swift - Draggable floating window
import AppKit
class PermissionFloatingWindow: NSPanel {
init() {
super.init(
contentRect: NSRect(x: 0, y: 0, width: 320, height: 200),
styleMask: [.titled, .closable, .miniaturizable, .utilityWindow],
backing: .buffered,
defer: false
)
// Float above all other windows
self.level = .floating
self.isMovableByWindowBackground = true // Drag anywhere to move
// Check permission status and guide user to enable
setupPermissionChecks()
}
func setupPermissionChecks() {
// Check Accessibility permission
let axEnabled = AXIsProcessTrusted()
// Check Screen Recording permission
let screenEnabled = CGPreflightScreenCaptureAccess()
// Update UI based on status
updateUI(ax: axEnabled, screen: screenEnabled)
}
@objc func openAccessibilitySettings() {
NSWorkspace.shared.open(URL(string: "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility")!)
}
}
9.2 Publishing to npm
// package.json
{
"name": "open-computer-use",
"version": "1.0.0",
"description": "Open-source alternative to Codex Computer Use",
"bin": {
"open-computer-use": "./bin/open-computer-use.js"
},
"scripts": {
"postinstall": "node scripts/install.js"
}
}
// bin/open-computer-use.js
#!/usr/bin/env node
const { execSync } = require('child_process');
const path = require('path');
// Find the Swift-compiled binary
const binaryPath = path.join(__dirname, '../bin/ComputerUse');
// Start the MCP service directly
execSync(binaryPath, { stdio: 'inherit' });
// scripts/install.js - automatically compile Swift code during installation
const { execSync } = require('child_process');
const path = require('path');
// Build from the package root, not from scripts/
execSync('swift build -c release', { cwd: path.join(__dirname, '..') });
# Publish
npm publish
# User installation
npm install -g open-computer-use
# Add to Codex MCP config (one-click command)
open-computer-use install-to-codex
9.3 One-Click Integration with Codex MCP
// Automatically modify ~/.codex/config.json
const configPath = path.join(os.homedir(), '.codex', 'config.json');
const config = JSON.parse(fs.readFileSync(configPath, 'utf8'));
config.mcpServers = config.mcpServers || {};
config.mcpServers['open-computer-use'] = {
command: 'open-computer-use',
args: []
};
fs.writeFileSync(configPath, JSON.stringify(config, null, 2));
console.log('✅ open-computer-use has been added to Codex MCP config');
9.4 Logo Design (Fully AI-Generated)
Prompt to Codex:
"Design a logo for a project called open-computer-use.
Theme: AI controlling a computer in the background, mouse cursor is the core element.
Style: clean, modern, geeky.
Format: SVG, provide multiple options to choose from."
→ AI outputs multiple SVGs
→ Convert formats using ffmpeg/ImageMagick
→ AI self-reviews (send screenshots back to AI for confirmation)
→ Select final design
10. Phase 8: Conquering Visual Details (Mouse Animation Reverse Engineering)
This is the most hardcore part, demonstrating how visual effects can also be reverse-engineered.
10.1 Video Frame Analysis
# Download the demo video from the Software.inc author
# Use ffmpeg to extract frames (30 frames per second)
ffmpeg -i demo_video.mp4 -vf fps=30 frames/frame_%04d.png
# Ask Codex to analyze key frames
# Find the mouse position change sequence, reverse-engineer the motion curve
10.2 Keyword Identification & Paper Search
Keywords extracted from Twitter comments: "calculates natural and aesthetic motion paths"
Ask AI to search relevant materials:
├── Bezier Curve mouse paths
├── Fitts' Law — human mouse movement patterns
├── Critically Damped Spring Animation
├── Related paper: "Natural Mouse Trajectory Simulation"
└── Open-source implementations: human-cursor, naturalmouser
10.3 Binary Reverse Engineering to Extract the Algorithm
When AI and papers aren't precise enough, reverse-engineer the binary directly:
# Use Hopper/IDA to decompile mouse animation-related functions
# Search for function name keywords
strings SkyComputerUseClient | grep -i "cursor\|animate\|bezier\|easing"
# Locate the core function in the decompiler
# Ask AI to analyze the assembly/pseudocode and reconstruct the algorithm
AI Prompt:
"This is a mouse animation function decompiled from SkyComputerUseClient.
Please analyze the algorithm principle and re-implement it in Swift:
[paste the decompiled pseudocode]"
10.4 Mouse Path Algorithm Implementation
// CursorAnimator.swift
import CoreGraphics
import QuartzCore
class NaturalCursorAnimator {
// Cubic Bezier curve path
func generatePath(from start: CGPoint, to end: CGPoint) -> [CGPoint] {
// Generate natural control points between start and end
let midX = (start.x + end.x) / 2
let midY = (start.y + end.y) / 2
// Add random offset to simulate hand tremor
let offset = CGFloat.random(in: -20...20)
let control1 = CGPoint(x: midX + offset, y: start.y + offset * 0.5)
let control2 = CGPoint(x: midX - offset, y: end.y - offset * 0.5)
// Sample path points along the Bezier curve
return sampleBezierCurve(
p0: start, p1: control1, p2: control2, p3: end,
steps: Int(distance(start, end) / 5) // More sample points for longer distances
)
}
// Velocity curve: accelerate then decelerate (mimics human movement)
func easeInOutCubic(_ t: CGFloat) -> CGFloat {
if t < 0.5 {
return 4 * t * t * t
} else {
let f = 2 * t - 2
return 0.5 * f * f * f + 1
}
}
// Move virtual cursor along the path (doesn't affect real mouse)
func animateCursor(along path: [CGPoint], completion: @escaping () -> Void) {
var index = 0
Timer.scheduledTimer(withTimeInterval: 1.0/60.0, repeats: true) { timer in
guard index < path.count else {
timer.invalidate()
completion()
return
}
let t = CGFloat(index) / CGFloat(path.count)
let easedT = self.easeInOutCubic(t)
// Update virtual cursor position (drawn on overlay window)
self.updateVirtualCursor(to: path[index], opacity: easedT > 0.9 ? 1 - (easedT - 0.9) * 10 : 1)
index += 1
}
}
}
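The geometry above is easy to sanity-check numerically. A Python port of the Bezier sampling and easing (the random hand-tremor offset is omitted here so the output is deterministic; helper names are illustrative):

```python
def cubic_bezier(p0, p1, p2, p3, t: float) -> tuple[float, float]:
    """Standard cubic Bezier point: B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)

def ease_in_out_cubic(t: float) -> float:
    """Accelerate-then-decelerate profile, matching the Swift easeInOutCubic."""
    if t < 0.5:
        return 4 * t**3
    f = 2 * t - 2
    return 0.5 * f**3 + 1

def sample_path(start, end, steps: int) -> list[tuple[float, float]]:
    """Sample the curve at eased time steps (control-point jitter omitted for determinism)."""
    mid_x = (start[0] + end[0]) / 2
    c1 = (mid_x, start[1])
    c2 = (mid_x, end[1])
    return [cubic_bezier(start, c1, c2, end, ease_in_out_cubic(i / steps))
            for i in range(steps + 1)]

path = sample_path((0.0, 0.0), (100.0, 100.0), steps=10)
print(path[0], path[-1])  # → (0.0, 0.0) (100.0, 100.0)
```

Because easing is applied to the parameter rather than the coordinates, the sampled points cluster near both endpoints, which is what produces the accelerate/decelerate feel when they are replayed at a fixed frame rate.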
11. Universal Methodology Summary
11.1 The Meta-Framework for Problem Solving
┌─────────────────────────────────────────────────────┐
│ Problem Definition │
│ What (what to do) + Why (why it's feasible) │
└─────────────────────┬───────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Information Gathering │
│ Public intelligence + Static analysis + │
│ Dynamic traffic capture │
└─────────────────────┬───────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Decomposition & Parallelism │
│ Split large problems into independent modules, │
│ advance in parallel with multiple AI sessions │
└─────────────────────┬───────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Implementation & Validation │
│ MVP → Comparative Validation → Dog Fooding │
└─────────────────────┬───────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Productization & Release │
│ Package → Publish → Record → Open Source │
└─────────────────────────────────────────────────────┘
11.2 AI Assistance Principles
Providing AI with context is the core human responsibility.
The upper limit of AI capability = the quality of context you provide. When AI gets stuck, the problem isn't that AI isn't capable — it's that the right context is missing.
What AI is good at:
✅ Analyzing existing code/binaries/docs
✅ Processing multiple independent tasks in parallel
✅ Generating code to precise specifications
✅ Rapidly implementing algorithms when reference materials are available
✅ Multimodal analysis of images/videos
What humans need to do:
✅ Decide what to do and what not to do
✅ Collect and provide critical context
✅ Make judgment calls among multiple options
✅ Discover AI blind spots and fill in missing information
✅ Maintain overall direction and pace
11.3 Troubleshooting Path When Stuck
Problem: Implementation doesn't match the original
↓
Step 1: Is there more precise reference material? (traffic capture, reverse engineering, papers)
↓
Step 2: Is the context provided to AI specific enough?
↓
Step 3: Can automated comparative validation be established?
↓
Step 4: Can the binary be directly reverse-engineered to obtain the real algorithm?
12. Complete SOP for Future Similar Projects
Standard operating procedure applicable to all "analyze closed systems, replicate open-source" projects.
Phase 0: Preparation (30 minutes)
# 1. Create new project from harness-template
gh repo create my-project --template your-org/harness-template
cd my-project
# 2. Create docs/ folder, let AI continuously accumulate documentation
mkdir docs
# 3. Clearly decompose objectives
cat > docs/goal.md << EOF
## Objective
Replicate: [Target system name]
## Feature Breakdown
- P0 (must have):
- P1 (important):
- P2 (optional):
## Success Criteria
- [ ] Functional comparative validation passed
- [ ] Performance benchmarks met
- [ ] Releasable state
EOF
Phase 1: Information Gathering (1–2 hours)
# 1. Find target files
find / -name "*[target]*" 2>/dev/null
# 2. Basic analysis
file target_binary
strings target_binary | tee docs/strings.txt
nm target_binary | tee docs/symbols.txt
# 3. Have AI analyze and document in docs/
# Prompt: "Analyze these symbols, infer system architecture, write to docs/architecture.md"
# 4. Collect public intelligence
# - Official blog / documentation
# - GitHub Issues / PR
# - Twitter/X related discussions
# - Academic papers
Phase 2: Dynamic Analysis (1–2 hours)
# Configure traffic capture
pip install mitmproxy
mitmdump -p 8080 --save-stream-file capture.mitm &
# Configure proxy, trigger target features
export HTTPS_PROXY=http://127.0.0.1:8080
# Have AI analyze captured traffic
# Prompt: "Analyze capture.mitm, extract all API interface definitions, write to docs/api-spec.md"
Phase 3: Core Implementation (4–6 hours)
# Open multiple parallel AI sessions
# Session A: Implement core features
# Session B: Implement auxiliary tools
# Session C: Handle edge cases and error handling
# Session D: Write tests
# Each session's context must include:
# - docs/architecture.md (system architecture)
# - docs/api-spec.md (interface definitions)
# - Specific requirements for the current module
Phase 4: Validation Feedback Loop (1–2 hours)
# Establish comparative tests
./scripts/compare.sh "test task description" official open-source
# Dog Fooding
# Use your own tool for development work, discover real-world issues
# Fix → Validate → Iterate
Phase 5: Release (1 hour)
# Package
npm init && npm publish
# or
go build && goreleaser release
# Record a demo video
# Ask AI to recommend royalty-free music sites, download license-free music
ffmpeg -i screen_record.mov -i music.mp3 -shortest output.mp4
# Open Source
gh repo create open-[target-name] --public
git push
Closing Thoughts
"What the AI era changes is only the method of solving problems — but the Geek spirit, the drive to solve problems, remains constant."