FileGuard provides comprehensive file validation to ensure only legitimate, safe files are stored.
Validation Pipeline
Extension Check
Verify file extension is in context’s allowed_extensions
Size Check
Verify file size is within context’s max_file_size_mb
Blank File Detection
Check for meaningful content (if reject_blank_files enabled)
Corrupt File Detection
Verify file integrity (if reject_corrupt_files enabled)
Virus Scanning
Scan with ClamAV (if scan_for_viruses enabled)
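The ordering above can be sketched as a short function that runs the two mandatory checks first and then each optional check only when its flag is enabled. This is a minimal illustration, not FileGuard's implementation: the `Context` class, the `validate` function, the extension and size error strings, and the choice to stop at the first failure are all assumptions; only the setting names come from this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Context:
    # Field names mirror the settings on this page; the class itself is hypothetical.
    allowed_extensions: set[str]
    max_file_size_mb: float
    reject_blank_files: bool = True
    reject_corrupt_files: bool = True
    scan_for_viruses: bool = True


# A check returns an error message, or None when the file passes.
Check = Callable[[str, bytes], Optional[str]]


def validate(filename: str, data: bytes, ctx: Context,
             is_blank: Check, is_corrupt: Check, scan: Check) -> list[str]:
    """Run the checks in pipeline order and stop at the first failure (assumed)."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ctx.allowed_extensions:
        return [f"Extension not allowed: {ext or '(none)'}"]  # hypothetical message
    if len(data) > ctx.max_file_size_mb * 1024 * 1024:
        return ["File exceeds maximum size"]  # hypothetical message
    optional_steps = [
        (ctx.reject_blank_files, is_blank),
        (ctx.reject_corrupt_files, is_corrupt),
        (ctx.scan_for_viruses, scan),
    ]
    for enabled, check in optional_steps:
        if enabled:
            error = check(filename, data)
            if error:
                return [error]
    return []
```

Passing the three optional checks in as callables keeps the ordering logic separate from the format-specific work, so each step can be swapped or stubbed in tests.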
Blank File Detection
When reject_blank_files is enabled, FileGuard detects and rejects files with no meaningful content.
| File Type | What’s Considered “Blank” |
|---|---|
| PDF | No text (3+ letter words) and no embedded images |
| Images | Single color, or smaller than 2x2 pixels |
| Excel | No data rows in any worksheet |
| CSV | No data rows (or only headers) |
| Text | No words (3+ letters) |
| JSON | Empty {}, [], null, or no meaningful values |
| XML | Only empty root element, no children or text |
| DOCX | No paragraphs or tables with content |
| ZIP | Empty archive (no files) |
| MP3/MP4 | Zero duration |
| Other | 0-byte files only |
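To make the text, JSON, and CSV rows concrete, here is a minimal sketch using only the Python standard library. The function names are hypothetical, and the interpretation of "meaningful values" (non-empty strings, numbers, and booleans count; `null` and empty containers do not) is an assumption rather than FileGuard's documented rule. The CSV helper also assumes the first non-empty row is the header.

```python
import csv
import io
import json
import re

WORD = re.compile(r"[A-Za-z]{3,}")  # a "word" is a run of 3+ letters


def text_is_blank(data: bytes) -> bool:
    """True when the text contains no word of three or more letters."""
    return WORD.search(data.decode("utf-8", errors="ignore")) is None


def _has_meaningful_value(value) -> bool:
    # Assumption: recurse into containers; null and empty strings carry no content.
    if isinstance(value, dict):
        return any(_has_meaningful_value(v) for v in value.values())
    if isinstance(value, list):
        return any(_has_meaningful_value(v) for v in value)
    if isinstance(value, str):
        return value.strip() != ""
    return value is not None  # numbers and booleans count as content


def json_is_blank(data: bytes) -> bool:
    """True for {}, [], null, or structures holding only empty values."""
    return not _has_meaningful_value(json.loads(data))


def csv_is_blank(data: bytes) -> bool:
    """True when there is nothing beyond a header row."""
    reader = csv.reader(io.StringIO(data.decode("utf-8")))
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    return len(rows) <= 1
```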
Example Error Messages
{
"errors": [
"File appears to be blank/empty: PDF has no readable content"
]
}
{
"errors": [
"File appears to be blank/empty: Image contains no meaningful content (single color)"
]
}
Corrupt File Detection
When reject_corrupt_files is enabled, FileGuard validates file integrity.
| File Type | Integrity Checks |
|---|---|
| PDF | Valid PDF structure, parseable |
| Images | Valid image data, decodable |
| Excel | Valid ZIP-based format |
| CSV | Valid UTF-8, parseable |
| JSON | Valid JSON syntax |
| XML | Valid XML structure |
| DOCX | Valid Word format |
| ZIP | Archive integrity, CRC check |
| MP3 | Valid audio frames |
| MP4 | Valid container structure |
| All | File signature (magic bytes) matches extension |
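As an illustration of the JSON and ZIP rows, the sketch below performs both checks with the Python standard library. The function names and the "Invalid ZIP archive" wording are hypothetical; the "Invalid JSON format" prefix follows the example error on this page. `zipfile.ZipFile.testzip()` reads every member and verifies its CRC, which matches the "Archive integrity, CRC check" description, though FileGuard may do this differently.

```python
import io
import json
import zipfile
from typing import Optional


def json_error(data: bytes) -> Optional[str]:
    """Return an error message if the bytes are not valid JSON, else None."""
    try:
        json.loads(data)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        return f"Invalid JSON format: {exc}"
    return None


def zip_error(data: bytes) -> Optional[str]:
    """Return an error message if the archive is unreadable or a CRC fails."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            bad_member = archive.testzip()  # first member with a bad CRC, or None
    except zipfile.BadZipFile as exc:
        return f"Invalid ZIP archive: {exc}"  # hypothetical message
    if bad_member is not None:
        return f"Invalid ZIP archive: CRC mismatch in {bad_member}"
    return None
```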
Magic Bytes Validation
FileGuard checks that file signatures match declared extensions:
| Extension | Expected Magic Bytes |
|---|---|
| PDF | %PDF |
| JPEG | \xFF\xD8\xFF |
| PNG | \x89PNG |
| ZIP/DOCX/XLSX | PK |
| MP3 | ID3 or \xFF\xFB |
| MP4 | ftyp (at byte offset 4) |
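A signature check of this kind can be sketched as a lookup table of byte prefixes. The table below copies the values listed above; the `SIGNATURES` name, the `signature_matches` function, and the decision to accept extensions with no known signature are assumptions for illustration. Note that in MP4 files the `ftyp` marker follows a 4-byte box size, so it is compared at offset 4 rather than offset 0.

```python
# (offset, magic) pairs per extension; a file matches if any pair matches.
SIGNATURES = {
    "pdf": [(0, b"%PDF")],
    "jpg": [(0, b"\xff\xd8\xff")],
    "jpeg": [(0, b"\xff\xd8\xff")],
    "png": [(0, b"\x89PNG")],
    "zip": [(0, b"PK")],
    "docx": [(0, b"PK")],
    "xlsx": [(0, b"PK")],
    "mp3": [(0, b"ID3"), (0, b"\xff\xfb")],
    "mp4": [(4, b"ftyp")],  # ftyp follows the 4-byte box size
}


def signature_matches(extension: str, data: bytes) -> bool:
    """Compare the leading bytes of a file against its declared extension."""
    candidates = SIGNATURES.get(extension.lower().lstrip("."))
    if candidates is None:
        # Assumption: extensions without a listed signature are not checked here.
        return True
    return any(
        data[offset:offset + len(magic)] == magic
        for offset, magic in candidates
    )
```

For example, `signature_matches("pdf", b"PK\x03\x04...")` is false, which is the situation behind the "file signature does not match" error: a ZIP-based file renamed to `.pdf`.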
Example Error Messages
{
"errors": [
"File is corrupt or unreadable: Invalid PDF file: file signature does not match"
]
}
{
"errors": [
"File is corrupt or unreadable: Invalid JSON format: Expecting property name"
]
}
Virus Scanning (ClamAV)
When scan_for_viruses is enabled, files are scanned using ClamAV.
How It Works
- The file passes all other validations
- The file content is sent to the ClamAV daemon
- If a threat is detected, the upload is rejected immediately
- Scan results are recorded in the API call logs
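One way to send file content to a ClamAV daemon is clamd's `INSTREAM` command: the client sends the command, then length-prefixed chunks (4-byte big-endian length followed by the data), then a zero-length chunk, and reads back a reply such as `stream: OK` or `stream: <signature> FOUND`. The sketch below shows that framing. Whether FileGuard uses `INSTREAM`, and the host, port, and function names used here, are assumptions; 3310 is clamd's conventional TCP port.

```python
import socket
import struct
from typing import Iterator, Optional


def instream_frames(data: bytes, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the byte frames for one clamd INSTREAM request."""
    yield b"zINSTREAM\0"  # 'z' prefix: command and reply are NUL-terminated
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        yield struct.pack("!I", len(chunk)) + chunk
    yield struct.pack("!I", 0)  # zero-length chunk ends the stream


def parse_reply(reply: bytes) -> Optional[str]:
    """Return the signature name if a threat was found, None if clean."""
    text = reply.rstrip(b"\0\n").decode("utf-8", errors="replace")
    if text.endswith(" FOUND"):
        return text.removeprefix("stream: ").removesuffix(" FOUND")
    if text.endswith("OK"):
        return None
    # Anything else (for example an ERROR reply) is treated as a failure.
    raise RuntimeError(f"Unexpected clamd reply: {text}")


def scan(data: bytes, host: str = "127.0.0.1", port: int = 3310) -> Optional[str]:
    """Scan bytes with a clamd daemon assumed to be listening on host:port."""
    with socket.create_connection((host, port), timeout=30) as conn:
        for frame in instream_frames(data):
            conn.sendall(frame)
        reply = conn.recv(4096)  # replies are short; a single read is assumed
    return parse_reply(reply)
```

Raising on an unrecognized reply, rather than treating it as clean, means a scanner error cannot be mistaken for a passing scan.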
Example Error Messages
{
"errors": [
"File contains malware: Eicar-Test-Signature"
]
}
{
"errors": [
"File contains malware: Win.Trojan.Generic-12345"
]
}
Testing Virus Scanning
Use the EICAR test file, a standard 68-byte string that ClamAV and most other antivirus engines detect by design:
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
The EICAR test file is harmless but will trigger virus detection.
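A small helper can write the test string to disk so it can be uploaded like any other file. The `write_eicar` name and the `eicar.txt` filename are arbitrary choices for this sketch. Writing the bytes from a raw literal avoids accidental escaping of the backslash, and the file must contain exactly these 68 bytes to be recognized. Local antivirus software may quarantine the file as soon as it is written.

```python
from pathlib import Path

# Raw bytes literal so the backslash is kept as-is.
EICAR = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"


def write_eicar(directory: str) -> Path:
    """Write the EICAR test string to <directory>/eicar.txt and return the path."""
    path = Path(directory) / "eicar.txt"
    path.write_bytes(EICAR)
    return path
```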
Supported File Types
| Category | Extensions | Library |
|---|---|---|
| Documents | pdf, docx | pypdf, python-docx |
| Images | jpg, jpeg, png, gif, webp | Pillow |
| Spreadsheets | xlsx, csv | openpyxl, built-in |
| Data | json, xml | Built-in |
| Archives | zip | Built-in |
| Media | mp3, mp4 | mutagen |
| Text | txt | Built-in |
Disabling Validation
Disable validation for specific use cases:
{
"context_key": "raw_uploads",
"reject_blank_files": false,
"reject_corrupt_files": false,
"scan_for_viruses": false
}
Disabling validation reduces security. Only do this for specific use cases like encrypted files or proprietary formats.
Fail-Safe Design
FileGuard uses a fail-safe approach:
- If validation cannot be performed (e.g., missing library), the file is rejected
- This ensures no potentially harmful files slip through due to errors
- Errors are logged for debugging
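The behavior described above amounts to treating any failure inside a validator as a rejection. A minimal sketch, assuming a hypothetical `run_fail_closed` wrapper and a hypothetical error message (neither is FileGuard's actual code):

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("fileguard.sketch")  # hypothetical logger name


def run_fail_closed(check: Callable[[bytes], Optional[str]],
                    data: bytes) -> Optional[str]:
    """Run a check; if the check itself fails, reject the file instead of passing it."""
    try:
        return check(data)
    except Exception as exc:  # includes ImportError from a missing library
        logger.exception("Validation could not be performed")
        return f"Validation could not be performed: {type(exc).__name__}"
```

The key design choice is that the `except` branch returns an error rather than `None`: an exception can never be read as "file is fine".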