# PII Detection API

RESTful API for secure document PII detection with multi-tenant thread management.

---

## 🚀 **Quick Start**

### **Base URL**
```
http://localhost/redact/api/v1/
```

Endpoint paths in this document are written from the `/api/v1/` segment onward. Prepend the host and install path (here `http://localhost/redact`) to form the full URL.

### **Authentication**
All endpoints (except `thread_create`) require:
- `thread_id` - Unique thread identifier
- `private_key` - Secret key for authentication

---

## 📡 **API Endpoints**

### **1. Create Thread**

Create a new thread and receive authentication credentials.

**Endpoint:** `POST /api/v1/thread_create.php`

**Request:**
```json
{
  "metadata": {
    "user_id": "12345",
    "app_name": "MyApp",
    "description": "Legal document review"
  }
}
```

**Response:**
```json
{
  "success": true,
  "thread_id": "thread_abc123...",
  "private_key": "key_xyz789...",
  "created_at": "2025-12-18 14:30:00",
  "message": "Thread created successfully. Keep the private_key secure!"
}
```

**Status Codes:**
- `201` - Created
- `400` - Bad request
- `500` - Internal error

⚠️ **Important:** Store both `thread_id` and `private_key` securely!

---

### **2. Process File**

Process a document and detect PII.

**Endpoint:** `POST /api/v1/process_file.php`

**Request:**
```json
{
  "thread_id": "thread_abc123...",
  "private_key": "key_xyz789...",
  "file_data": "base64_encoded_file_content",
  "file_name": "document.pdf"
}
```

**Response:**
```json
{
  "success": true,
  "thread_id": "thread_abc123...",
  "processing_time": "12345ms",
  "total_pages": 3,
  "total_pii_instances": 156,
  "comprehend_calls": 45,
  "optimization_rate": 65.2,
  "cache": {
    "before": {
      "cached_words": 15
    },
    "after": {
      "cached_words": 18
    },
    "words_learned": 3
  },
  "pages": [
    {
      "page_number": 1,
      "pii_blocks": [...],
      "word_blocks": [...],
      "image_data": "base64_image"
    }
  ]
}
```

**Status Codes:**
- `200` - Success
- `400` - Bad request (invalid file, missing params)
- `401` - Unauthorized (invalid credentials)
- `500` - Processing error
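
As a minimal sketch of reading the response above in Python, the function below tallies the `pii_blocks` reported for each page. The name `pii_per_page` is illustrative. It assumes every entry in `pages` carries `page_number` and a `pii_blocks` list, as the sample shows. The layout of an individual block is not documented here, so blocks are only counted, not inspected.

```python
def pii_per_page(result: dict) -> dict[int, int]:
    """Map each page number to the number of PII blocks reported on it."""
    counts = {}
    for page in result.get("pages", []):
        counts[page["page_number"]] = len(page.get("pii_blocks", []))
    return counts


# Hand-written sample shaped like the documented response (not real API output).
sample = {
    "success": True,
    "pages": [
        {"page_number": 1, "pii_blocks": [{}, {}], "word_blocks": []},
        {"page_number": 2, "pii_blocks": [], "word_blocks": []},
    ],
}
print(pii_per_page(sample))  # {1: 2, 2: 0}
```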

**Limits:**
- Max file size: 10MB
- Supported formats: PDF, JPG, PNG
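
These limits can be checked on the client before any bytes are encoded or sent. The sketch below is one way to do that. The helper names are illustrative. It assumes the 10 MB limit applies to the raw (pre-Base64) file and that the format is judged by file extension; this document does not say how the server checks either.

```python
import base64
from pathlib import Path

MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB, per the limit above
# ASSUMPTION: only these extensions are accepted; ".jpeg" is not listed above.
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".png"}


def validate_upload(file_name: str, content: bytes) -> None:
    """Raise ValueError if the file would break a documented limit."""
    ext = Path(file_name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    if len(content) > MAX_FILE_SIZE:
        raise ValueError(f"file is {len(content)} bytes; limit is {MAX_FILE_SIZE}")


def build_file_fields(file_name: str, content: bytes) -> dict:
    """Validate, then return the file_data / file_name request fields."""
    validate_upload(file_name, content)
    return {
        "file_data": base64.b64encode(content).decode("ascii"),
        "file_name": file_name,
    }
```

The returned dictionary can be merged with `thread_id` and `private_key` to form the `process_file.php` request body.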

---

### **3. Get Thread Info**

Get information about a thread.

**Endpoint:** `GET /api/v1/thread_info.php`

**Parameters:**
- `thread_id` - Thread identifier
- `private_key` - Authentication key

**Example:**
```
GET /api/v1/thread_info.php?thread_id=thread_abc123&private_key=key_xyz789
```
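
When building this URL in code, it is safer to let a library encode the query string than to concatenate it by hand. The Python sketch below uses the standard library's `urlencode`. `BASE_URL` and `thread_info_url` are illustrative names. Whether generated keys ever contain reserved characters is not stated here; encoding is harmless either way.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost/redact/api/v1"


def thread_info_url(thread_id: str, private_key: str) -> str:
    """Build the thread_info.php URL with a percent-encoded query string."""
    query = urlencode({"thread_id": thread_id, "private_key": private_key})
    return f"{BASE_URL}/thread_info.php?{query}"


print(thread_info_url("thread_abc123", "key_xyz789"))
```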

**Response:**
```json
{
  "success": true,
  "thread": {
    "thread_id": "thread_abc123...",
    "created_at": "2025-12-18 14:30:00",
    "last_activity": "2025-12-18 15:45:00",
    "document_count": 3,
    "total_pii_found": 156,
    "total_api_calls": 45,
    "metadata": {...}
  }
}
```

**Status Codes:**
- `200` - Success
- `401` - Unauthorized
- `404` - Thread not found
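
The Important Notes state that threads expire after 30 days of inactivity, and `last_activity` makes it possible to estimate how long a thread has left. The sketch below assumes the timestamp uses the `YYYY-MM-DD HH:MM:SS` format shown in the sample and that it is in UTC. The timezone is not stated in this document, so treat the result as approximate.

```python
from datetime import datetime, timedelta, timezone

INACTIVITY_LIMIT = timedelta(days=30)


def time_until_expiry(last_activity: str, now: datetime) -> timedelta:
    """Return the time left before the thread expires (negative if already past)."""
    # ASSUMPTION: the timestamp is UTC; the API does not document a timezone.
    last = datetime.strptime(last_activity, "%Y-%m-%d %H:%M:%S")
    last = last.replace(tzinfo=timezone.utc)
    return last + INACTIVITY_LIMIT - now


now = datetime(2025, 12, 28, 15, 45, tzinfo=timezone.utc)
print(time_until_expiry("2025-12-18 15:45:00", now).days)  # 20
```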

---

### **4. Delete Thread**

Delete a thread and all associated data.

**Endpoint:** `DELETE /api/v1/thread_delete.php` (`POST` is also accepted)

**Request:**
```json
{
  "thread_id": "thread_abc123...",
  "private_key": "key_xyz789..."
}
```

**Response:**
```json
{
  "success": true,
  "thread_id": "thread_abc123...",
  "message": "Thread and all associated data deleted successfully"
}
```

**Status Codes:**
- `200` - Deleted
- `401` - Unauthorized
- `404` - Thread not found
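
One way to make sure a thread is always deleted, even when processing fails, is to tie its lifetime to a Python context manager. This is a sketch, not part of the API: `pii_thread` and the `send` callable are hypothetical, and `send(method, endpoint, body)` stands in for whatever HTTP client you use. The fake transport below only records calls so the example runs without a server.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# Hypothetical transport: (http_method, endpoint_file, json_body) -> parsed JSON.
Send = Callable[[str, str, dict], dict]


@contextmanager
def pii_thread(send: Send, metadata: dict) -> Iterator[dict]:
    """Create a thread, yield its credentials, and always delete it afterwards."""
    created = send("POST", "thread_create.php", {"metadata": metadata})
    creds = {
        "thread_id": created["thread_id"],
        "private_key": created["private_key"],
    }
    try:
        yield creds
    finally:
        # Runs even if the body of the `with` block raised.
        send("DELETE", "thread_delete.php", creds)


calls = []


def fake_send(method: str, endpoint: str, body: dict) -> dict:
    """Stand-in transport that records calls and returns canned credentials."""
    calls.append((method, endpoint))
    return {"success": True, "thread_id": "thread_demo", "private_key": "key_demo"}


with pii_thread(fake_send, {"app_name": "MyApp"}) as creds:
    fake_send("POST", "process_file.php", {**creds, "file_name": "document.pdf"})

print(calls)
```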

---

## 🔒 **Security**

### **Authentication**

Each thread has a unique `private_key`. Every request except `thread_create` must include it together with the `thread_id`:

```javascript
{
  "thread_id": "thread_abc123...",
  "private_key": "key_xyz789..."  // Required!
}
```

### **Best Practices**

1. ✅ **Store credentials securely** (database, encrypted storage)
2. ✅ **Use HTTPS** in production
3. ✅ **Validate file types** before encoding
4. ✅ **Handle errors** gracefully
5. ✅ **Delete threads** when done
6. ❌ **Never expose private_key** in client-side code
7. ❌ **Never commit credentials** to version control
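
As one illustration of keeping credentials out of source code, the sketch below reads them from environment variables. The variable names `PII_THREAD_ID` and `PII_PRIVATE_KEY` are made up for this example and are not defined by the API. A secrets manager or encrypted store would serve the same purpose.

```python
import os


def load_credentials(env=os.environ) -> dict:
    """Read thread credentials from environment variables, failing loudly if absent."""
    # PII_THREAD_ID / PII_PRIVATE_KEY are illustrative names, not part of the API.
    try:
        return {
            "thread_id": env["PII_THREAD_ID"],
            "private_key": env["PII_PRIVATE_KEY"],
        }
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable {missing}") from None
```

Passing a plain dictionary as `env` makes the function easy to exercise in tests without touching the real environment.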

### **CORS**

By default the API allows cross-origin requests from any origin (`Access-Control-Allow-Origin: *`). To restrict access to your own domain, update `api/config.php`:

```php
header('Access-Control-Allow-Origin: https://yourdomain.com');
```

---

## 📝 **Usage Examples**

### **JavaScript/Node.js**

```javascript
// 1. Create thread
const createResponse = await fetch('http://localhost/redact/api/v1/thread_create.php', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    metadata: {
      user_id: '12345',
      app_name: 'MyApp'
    }
  })
});

const { thread_id, private_key } = await createResponse.json();

// Keep these in memory or on your server. Avoid localStorage for private_key:
// any script running on the page can read it.

// 2. Process file
const fileInput = document.getElementById('fileInput');
const file = fileInput.files[0];

// Convert to base64. Wrapping FileReader in a Promise lets us await the result,
// so the thread is not deleted (step 3) before processing finishes.
const base64Data = await new Promise((resolve, reject) => {
  const reader = new FileReader();
  reader.onload = () => resolve(reader.result.split(',')[1]); // Remove data:...;base64,
  reader.onerror = () => reject(reader.error);
  reader.readAsDataURL(file);
});

const processResponse = await fetch('http://localhost/redact/api/v1/process_file.php', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    thread_id: thread_id,
    private_key: private_key,
    file_data: base64Data,
    file_name: file.name
  })
});

const result = await processResponse.json();

if (result.success) {
  console.log('PII found:', result.total_pii_instances);
  console.log('Processing time:', result.processing_time);
  console.log('Optimization rate:', result.optimization_rate + '%');
}

// 3. Delete thread when done
await fetch('http://localhost/redact/api/v1/thread_delete.php', {
  method: 'DELETE',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    thread_id: thread_id,
    private_key: private_key
  })
});
```

---

### **Python**

```python
import requests
import base64

# 1. Create thread
response = requests.post('http://localhost/redact/api/v1/thread_create.php', json={
    'metadata': {
        'user_id': '12345',
        'app_name': 'PythonApp'
    }
})

data = response.json()
thread_id = data['thread_id']
private_key = data['private_key']

# 2. Process file
with open('document.pdf', 'rb') as f:
    file_data = base64.b64encode(f.read()).decode('utf-8')

response = requests.post('http://localhost/redact/api/v1/process_file.php', json={
    'thread_id': thread_id,
    'private_key': private_key,
    'file_data': file_data,
    'file_name': 'document.pdf'
})

result = response.json()

if result['success']:
    print(f"PII found: {result['total_pii_instances']}")
    print(f"Processing time: {result['processing_time']}")
    print(f"Optimization rate: {result['optimization_rate']}%")

# 3. Delete thread
requests.delete('http://localhost/redact/api/v1/thread_delete.php', json={
    'thread_id': thread_id,
    'private_key': private_key
})
```

---

### **PHP**

```php
<?php

// 1. Create thread
$ch = curl_init('http://localhost/redact/api/v1/thread_create.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
    'metadata' => [
        'user_id' => '12345',
        'app_name' => 'PHPApp'
    ]
]));
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);

$response = curl_exec($ch);
$data = json_decode($response, true);
$threadId = $data['thread_id'];
$privateKey = $data['private_key'];

// 2. Process file
$fileData = base64_encode(file_get_contents('document.pdf'));

curl_setopt($ch, CURLOPT_URL, 'http://localhost/redact/api/v1/process_file.php');
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
    'thread_id' => $threadId,
    'private_key' => $privateKey,
    'file_data' => $fileData,
    'file_name' => 'document.pdf'
]));

$response = curl_exec($ch);
$result = json_decode($response, true);

if ($result['success']) {
    echo "PII found: {$result['total_pii_instances']}\n";
    echo "Processing time: {$result['processing_time']}\n";
}

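// 3. Delete thread when done
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'DELETE');
curl_setopt($ch, CURLOPT_URL, 'http://localhost/redact/api/v1/thread_delete.php');
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
    'thread_id' => $threadId,
    'private_key' => $privateKey
]));
curl_exec($ch);

curl_close($ch);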
?>
```

---

## 🧪 **Testing**

### **Test with cURL**

```bash
# 1. Create thread
curl -X POST http://localhost/redact/api/v1/thread_create.php \
  -H "Content-Type: application/json" \
  -d '{"metadata":{"user_id":"test123"}}'

# Response: {"thread_id":"thread_xxx","private_key":"key_yyy",...}

# 2. Process file (base64-encoded). "-w 0" disables line wrapping in GNU base64;
#    on macOS use "base64 -i document.pdf" instead.
curl -X POST http://localhost/redact/api/v1/process_file.php \
  -H "Content-Type: application/json" \
  -d '{
    "thread_id":"thread_xxx",
    "private_key":"key_yyy",
    "file_data":"'$(base64 -w 0 document.pdf)'",
    "file_name":"document.pdf"
  }'

# 3. Get thread info
curl "http://localhost/redact/api/v1/thread_info.php?thread_id=thread_xxx&private_key=key_yyy"

# 4. Delete thread
curl -X DELETE http://localhost/redact/api/v1/thread_delete.php \
  -H "Content-Type: application/json" \
  -d '{"thread_id":"thread_xxx","private_key":"key_yyy"}'
```

---

## 📊 **Response Format**

### **Success Response**

```json
{
  "success": true,
  "...": "endpoint-specific data"
}
```

### **Error Response**

```json
{
  "success": false,
  "error": "Error message"
}
```

### **Common Status Codes**

| Code | Meaning |
|------|---------|
| `200` | Success |
| `201` | Created |
| `400` | Bad request |
| `401` | Unauthorized |
| `404` | Not found |
| `405` | Method not allowed |
| `500` | Internal error |
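
A client can fold the status code and the `success` / `error` envelope into a single check. The sketch below is one possible shape for that. `ApiError` and `unwrap` are illustrative names, and the fallback text `"unknown error"` is an assumption for responses that lack an `error` field.

```python
class ApiError(Exception):
    """Raised when the API reports a failure."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def unwrap(status: int, payload: dict) -> dict:
    """Return the payload on success; raise ApiError otherwise."""
    if 200 <= status < 300 and payload.get("success") is True:
        return payload
    raise ApiError(status, payload.get("error", "unknown error"))
```

With the `requests` library this would be called as `unwrap(response.status_code, response.json())`.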

---

## 🔧 **Configuration**

Edit `api/config.php`:

```php
define('API_MAX_FILE_SIZE', 10 * 1024 * 1024); // 10MB
define('API_RATE_LIMIT', 100); // requests per hour

// CORS
header('Access-Control-Allow-Origin: *'); // Change in production!
```
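
Given the limit of 100 requests per hour, a client may want to pace itself rather than run into the limit. The sketch below is a client-side sliding-window throttle. It is an assumption-laden illustration: this document does not say whether the server counts per thread or per IP, whether it uses a fixed or sliding window, or what it returns when the limit is exceeded. `HourlyThrottle` is a hypothetical helper, not part of the API.

```python
import time
from collections import deque


class HourlyThrottle:
    """Client-side limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget requests that have fallen out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Call right after sending a request."""
        self.sent.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, send the request, then call `record()`. The injectable `clock` exists so the class can be tested without waiting.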

---

## 📈 **Performance**

| Metric | Value |
|--------|-------|
| Max file size | 10MB |
| Processing time | 10-60s, depending on page count |
| Cache optimization | 60-80% fewer API calls |
| Concurrent threads | Unlimited |

---

## ⚠️ **Important Notes**

1. **Private keys are sensitive** - Store securely, never expose
2. **Threads expire after 30 days** of inactivity
3. **Base64 encoding increases size** by ~33%
4. **Large files take time** - implement progress indicators
5. **HTTPS recommended** for production
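
The ~33% figure in note 3 follows from how Base64 works: every 3 input bytes become 4 output characters, with the last group padded. The short Python sketch below computes the encoded size for a file at the 10 MB limit. Whether the server applies its limit to the raw file or to the encoded payload is not stated in this document.

```python
import base64


def base64_size(raw_bytes: int) -> int:
    """Length in characters of the padded Base64 encoding of `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)


ten_mb = 10 * 1024 * 1024
print(base64_size(ten_mb))                     # 13981016
print(round(base64_size(ten_mb) / ten_mb, 3))  # 1.333

# Cross-check the formula against the real encoder on a small input.
assert base64_size(10) == len(base64.b64encode(b"x" * 10))
```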

---

## 📚 **Additional Resources**

- **Thread Management:** `../src/classes/README_THREADS.md`
- **Quick Start:** `../src/classes/QUICK_START_THREADS.md`
- **Class Documentation:** `../src/classes/USAGE.md`

---

**API Version:** 1.0  
**Last Updated:** December 18, 2025

🎉 **Ready for integration!**
