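[Intermediate to Advanced] Accelerating Digital Forensics with Regular Expressions | A Practical Guide to Log Analysis, Malware Detection, and KAPE Integration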

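In a digital forensics investigation, how quickly you can pick anomalous traces out of a huge volume of logs and files largely determines the speed and accuracy of the whole investigation. Regular expressions (regex) are one of the most effective tools for that job. This article covers Windows event logs, syslog, and file system artifacts, and walks through regex patterns that are ready for field use together with concrete ways to combine them with KAPE, EZTools, PowerShell, and Python.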

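1. Where regular expressions fit in forensics
   1-1. Why regular expressions are needed
   1-2. Main engines used in forensics and their syntax differences
2. Regex syntax cheat sheet (forensics edition)
3. Regex analysis of Windows event logs
4. Regex analysis of syslog and web server logs
   4-1. Detecting anomalies in syslog (Linux/WSL)
   4-2. Detecting C2 traffic and scans in Apache / Nginx logs
5. Using regular expressions to detect malware and suspicious files
6. Integrating regex scripts into a custom KAPE Module
   6-1. Wrapping a regex analysis script as a Module
   6-2. Formatting output for Timeline Explorer
7. Using regular expressions in Splunk SPL
8. Managing and reusing a regex library
   8-1. Managing patterns centrally in YAML
Summary: Design regular expressions as the "filter layer" of forensics
References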

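1. Where regular expressions fit in forensics

1-1. Why regular expressions are needed

The CSVs produced by KAPE's !EZParser and EZTools are well structured and easy to work with, but efficient investigation is still hard without regular expressions in situations such as the following.

Extracting only the logon failures that follow a specific pattern from several hundred thousand event log lines
Detecting suspicious URLs that indicate C2 (command and control) traffic in syslog or Apache logs
Finding traces of Living-off-the-Land (LotL) attacks in file names, paths, and registry values
Picking up Base64-encoded strings used by malware droppers from binaries and logs

A regular expression is an abstraction of a search condition. Its main strength is that it can describe a pattern, for example "any executable under a Temp directory", that a plain keyword search cannot express.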

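1-2. Main engines used in forensics and their syntax differences

The same regular expression can behave slightly differently depending on the tool. The table below lists the main engines and what to watch for.

Tool / environment	Regex engine	Notes
PowerShell (Select-String)	.NET Regex	Supports lookahead and lookbehind, including variable-length lookbehind. Pass the pattern via `-Pattern`
Python (re / regex modules)	Python re (Perl-style syntax, not PCRE itself)	Lookbehind must be fixed-width in `re`; the third-party `regex` module is more permissive. Flags such as `re.IGNORECASE` are handy
grep / egrep (Linux/WSL)	POSIX BRE (grep) / ERE (egrep, grep -E)	GNU grep switches to Perl-compatible syntax with the `-P` flag
Timeline Explorer (search box)	.NET Regex	Regular expressions can be entered directly in column filters
Splunk (SPL)	PCRE	`rex` extracts fields, `regex` filters events
Elasticsearch (Kibana)	Lucene Regex	Some syntax, such as lookaround, is not supported

This article focuses on PowerShell and Python and also shows how the same ideas apply in Splunk.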
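
The engine differences are easy to check in practice. The following is a minimal sketch, assuming CPython's standard `re` module; the two patterns and the helper names are made up for illustration. It shows that a lookbehind whose alternatives differ in width (accepted by .NET) is rejected by `re`, and one way to express the same intent with an anchored lookahead that both engines accept.

```python
import re

# Lookbehind with alternatives of different widths: fine in .NET, rejected by Python's re.
VARIABLE_LOOKBEHIND = r"(?i)(?<!\\system32\\|\\winsxs\\)svchost\.exe$"

# Portable rewrite: an anchored negative lookahead that rules out the trusted locations.
PORTABLE = r"(?i)^(?!.*\\(?:system32|winsxs)\\svchost\.exe$).*\\svchost\.exe$"


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def is_suspicious_svchost(path: str) -> bool:
    """True when the path ends in svchost.exe outside the listed directories."""
    return re.search(PORTABLE, path) is not None


if __name__ == "__main__":
    print(compiles(VARIABLE_LOOKBEHIND), compiles(PORTABLE))
    print(is_suspicious_svchost(r"C:\Users\bob\AppData\Local\Temp\svchost.exe"))
```

When a pattern has to be shared between PowerShell and Python, writing it in the more restrictive dialect from the start avoids maintaining two variants.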

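2. Regex syntax cheat sheet (forensics edition)

This section assumes you already know the basics and concentrates on patterns that come up frequently in forensic investigations.

2-1. Extracting IP addresses and URLs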

# Extract IPv4 addresses (strict version)
\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b

# Exclude private (RFC 1918) ranges to keep external traffic only
\b(?!10\.|172\.(?:1[6-9]|2\d|3[01])\.|192\.168\.)(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b

# Extract URLs (http/https/ftp)
(?:https?|ftp)://[\w\-\.]+(?::\d+)?(?:/[\w\-\./\?=%&+#]*)?

# Extract Base64-encoded strings (40 characters or more)
[A-Za-z0-9+/]{40,}={0,2}
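
The lookahead above only excludes the three RFC 1918 ranges, so loopback, link-local, and other reserved addresses still pass. As a minimal sketch, assuming Python is available for post-processing, a loose regex can find candidates and the standard `ipaddress` module can decide which ones are globally routable. The function name `extract_external_ips` is hypothetical.

```python
import ipaddress
import re

# Loose candidate pattern; validation is delegated to the ipaddress module.
IPV4_CANDIDATE = re.compile(r"(?<!\d)(?<!\d\.)\d{1,3}(?:\.\d{1,3}){3}(?!\d|\.\d)")


def extract_external_ips(text: str) -> list:
    """Return unique, globally routable IPv4 addresses in order of appearance."""
    seen = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            addr = ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # not a valid address, e.g. 999.1.1.1
        if addr.is_global and candidate not in seen:
            seen.append(candidate)
    return seen


if __name__ == "__main__":
    sample = "Failed logon from 10.0.0.5, 8.8.8.8 and 192.168.1.20; also 127.0.0.1."
    print(extract_external_ips(sample))
```

Keeping the regex loose and validating afterwards trades a little speed for a pattern that is much easier to read and audit.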

2-2. Windows path and file name patterns

# Executables under TEMP/TMP and AppData (typical staging locations in LotL attacks)
(?i)[A-Za-z]:\\(?:Users\\[^\\]+\\AppData\\(?:Local(?:\\Temp)?|Roaming)|Windows\\Temp|Temp)\\[^\\]+\.(?:exe|dll|bat|ps1|vbs|hta|cmd)

# powershell.exe launched from outside System32 (masquerading detection)
(?i)(?<!\\System32\\WindowsPowerShell\\v1\.0\\)powershell\.exe(?:\s+[^>|]+)*

2-3. Timestamp patterns

# Windows event log timestamp format
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z?

# syslog timestamp format (RFC 3164)
(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}

# Extract a specific time range only (example: off-hours, 00:00 to 05:59)
T(?:0[0-5]):\d{2}:\d{2}
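
The time-range pattern matches the hour as written in the log. Timestamps ending in `Z` are UTC, so "off-hours" for the organization usually has to be evaluated after converting to local time. The following is a minimal sketch under the assumption that the hosts operate in UTC+9; the function name and the default window are illustrative, not part of any tool.

```python
import re
from datetime import datetime, timedelta, timezone

# Same shape as the event log timestamp pattern, with date and time captured.
EVTX_TS = re.compile(r"(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2})(?:\.\d+)?Z?")

# Assumption: the investigated environment uses UTC+9 as local time.
LOCAL_TZ = timezone(timedelta(hours=9))


def is_off_hours(line: str, start_hour: int = 0, end_hour: int = 6, tz=LOCAL_TZ) -> bool:
    """True if the first timestamp in the line falls in [start_hour, end_hour) local time."""
    m = EVTX_TS.search(line)
    if not m:
        return False
    utc = datetime.strptime(f"{m.group(1)} {m.group(2)}", "%Y-%m-%d %H:%M:%S")
    local = utc.replace(tzinfo=timezone.utc).astimezone(tz)
    return start_hour <= local.hour < end_hour


if __name__ == "__main__":
    print(is_off_hours("2024-03-01T16:30:00.1234567Z,4624,LogonType=3"))
```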

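3. Regex analysis of Windows event logs

3-1. Fast filtering with PowerShell + Select-String

Running PowerShell's Select-String against the CSV produced by KAPE's EvtxECmd module (or against text exported from a raw .evtx file) can usually filter several hundred thousand log lines in seconds.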

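# Brute-force detection: external IPs with more than 10 EventID 4625 (logon failure) records
$logPath = "D:\mout\EventLogs\Security_4625.csv"
$pattern = '\b(?!10\.|172\.(?:1[6-9]|2\d|3[01])\.|192\.168\.)(?:\d{1,3}\.){3}\d{1,3}\b'

# Select-String drops the header row (it contains no IP address),
# so re-attach it before handing the lines to ConvertFrom-Csv
$header = Get-Content -Path $logPath -TotalCount 1
$hits   = Select-String -Path $logPath -Pattern $pattern |
            Select-Object -ExpandProperty Line

# Failures from external IPs only (private ranges excluded)
@($header) + @($hits) |
  ConvertFrom-Csv |
  Group-Object IpAddress |
  Where-Object { $_.Count -gt 10 } |
  Sort-Object Count -Descending |
  Format-Table Name, Count -AutoSize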

# Extract suspicious child processes from EventID 4688 (process creation)
# Flags cmd.exe / powershell.exe and similar hosts whose parent process is an Office application
$csv = Import-Csv "D:\mout\EventLogs\Security_4688.csv"
$csv | Where-Object {
    $_.NewProcessName -match '(?i)(cmd|powershell|wscript|cscript|mshta)\.exe$' -and
    $_.ParentProcessName -match '(?i)(winword|excel|powerpnt|outlook)\.exe$'
} | Select-Object TimeCreated, SubjectUserName, ParentProcessName, NewProcessName, CommandLine

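3-2. Key EventIDs and what to match with regex

EventID	Meaning	What a regex should look for
4624	Logon success	LogonType=3 (network) combined with an off-hours timestamp
4625	Logon failure	Many failures from an external IP address in a short time (brute force)
4688	Process creation	LOLBin names and suspicious argument patterns (such as -EncodedCommand)
4698	Scheduled task created	TaskName that looks random or meaningless
7045	Service installed	ImagePath under TEMP or pointing to a UNC path
4720	User account created	Created off-hours, or created by someone other than the usual administrators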

3-3. Detecting and decoding encoded PowerShell commands

The -EncodedCommand option (or an abbreviation such as -enc or -e), which attackers use frequently, runs a command passed as Base64. The short script below extracts it from the CommandLine field of EventID 4688 and decodes it.

# Extract the Base64 argument of -EncodedCommand and decode it
$csv = Import-Csv "D:\mout\EventLogs\Security_4688.csv"

# PowerShell accepts unambiguous prefixes of the parameter name (-e, -ec, -en, -enc, ...)
$pattern = '(?i)(?<=\s)-e(?:c|n[a-z]*)?\s+([A-Za-z0-9+/=]{20,})'

$encoded = $csv | ForEach-Object {
    if ($_.CommandLine -match $pattern) {
        $b64 = $Matches[1]
        try {
            # The payload is UTF-16LE text encoded as Base64
            $decoded = [System.Text.Encoding]::Unicode.GetString([System.Convert]::FromBase64String($b64))
        } catch {
            $decoded = '<invalid Base64>'
        }
        [PSCustomObject]@{
            TimeCreated = $_.TimeCreated
            UserName    = $_.SubjectUserName
            Encoded     = $b64
            Decoded     = $decoded
        }
    }
}
$encoded | Export-Csv "D:\analysis\decoded_powershell.csv" -NoTypeInformation -Encoding UTF8
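
The same extraction can be done in Python when the CSV is processed outside Windows. This is a minimal sketch, not a drop-in replacement: the pattern is an assumption that accepts abbreviated forms of the parameter (`-e`, `-ec`, `-enc`, ...), and `decode_encoded_command` is a hypothetical helper name. PowerShell expects the payload to be UTF-16LE text, which is why that codec is used.

```python
import base64
import binascii
import re

# Assumed pattern: -e, -ec, or -en... followed by a Base64 token of 20+ characters.
ENC_CMD = re.compile(r"(?i)(?<=\s)-e(?:c|n[a-z]*)?\s+([A-Za-z0-9+/=]{20,})")


def decode_encoded_command(command_line: str):
    """Return the decoded script text, or None if nothing decodable is found."""
    m = ENC_CMD.search(command_line)
    if not m:
        return None
    try:
        raw = base64.b64decode(m.group(1), validate=True)
    except (binascii.Error, ValueError):
        return None  # looked like Base64 but was not valid
    return raw.decode("utf-16-le", errors="replace")


if __name__ == "__main__":
    payload = base64.b64encode("Write-Output 'hello'".encode("utf-16-le")).decode()
    print(decode_encoded_command(f"powershell.exe -NoP -enc {payload}"))
```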

4. Regex analysis of syslog and web server logs

4-1. Detecting anomalies in syslog (Linux/WSL)

When investigating syslog files on a Linux server (such as /var/log/auth.log), combining grep -P (Perl-compatible regular expressions) with awk gives fast results.

# SSH brute-force detection: source IPs with 10 or more failed logins in the log
grep 'Failed password for' /var/log/auth.log | \
  grep -oP 'from \K\d{1,3}(?:\.\d{1,3}){3}' | \
  sort | uniq -c | sort -rn | \
  awk '$1 >= 10 {print $2, ":", $1, "failures"}'

# Suspicious sudo use: non-root users running commands as root, with timestamps
grep -P 'sudo:\s+(?!root\b)\S+ : .*USER=root ; COMMAND=' /var/log/auth.log | \
  grep -oP '^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}.*\S'
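
The grep pipeline counts failures over the whole file. To flag only bursts, such as 10 failures from one source within 60 seconds, a sliding window per IP is needed. The following is a minimal sketch under a few assumptions: lines are in chronological order, the log uses RFC 3164 timestamps (which carry no year, so one is supplied as a parameter), and `find_bursts` is a hypothetical name.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

FAILED = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s.*"
    r"Failed password for .* from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})\b"
)


def find_bursts(lines, threshold=10, window=timedelta(seconds=60), year=2024):
    """Return {ip: time at which `threshold` failures were first seen within `window`}."""
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = {}
    for line in lines:
        m = FAILED.search(line)
        if not m:
            continue
        ip = m.group("ip")
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        q = recent[ip]
        q.append(ts)
        while ts - q[0] > window:  # drop failures older than the window
            q.popleft()
        if len(q) >= threshold and ip not in flagged:
            flagged[ip] = ts
    return flagged


if __name__ == "__main__":
    with open("/var/log/auth.log", errors="replace") as f:
        for ip, when in find_bursts(f).items():
            print(ip, when)
```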

4-2. Detecting C2 traffic and scans in Apache / Nginx logs

# Web access log analysis script in Python
import re
import sys
from collections import defaultdict

# Common/Combined Log Format: ident and user may be "-" or a name, size may be "-"
LOG_PATTERN = re.compile(
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+\S+\s+\S+\s+'
    r'\[(?P<time>[^\]]+)\]\s+'
    r'"(?P<method>\w+)\s+(?P<path>\S+)\s+[^"]+"\s+'
    r'(?P<status>\d{3})\s+(?P<size>\d+|-)'
)

# Patterns to detect
SUSPICIOUS_PATTERNS = {
    "SQL injection attempt":   re.compile(r"(?i)(?:union\s+select|or\s+1=1|'--|\bxp_cmdshell\b)"),
    "Path traversal attempt":  re.compile(r"(?i)(?:\.\.\/|\.\.\\|%2e%2e%2f|%252e)"),
    "Web shell attempt":       re.compile(r"(?i)\.(php|asp|aspx|jsp|cgi)\b.*(?:cmd|exec|system|shell)"),
    "Scanner user agent":      re.compile(r"(?i)(?:nikto|sqlmap|nmap|masscan|zgrab|dirbuster)"),
    "Suspicious encoding":     re.compile(r"(?i)%(?:00|0d|0a|27|3c|3e|7c|60)"),
}

hits = defaultdict(list)

with open(sys.argv[1], "r", errors="replace") as f:
    for line in f:
        m = LOG_PATTERN.search(line)
        if not m:
            continue
        path = m.group("path")
        ip   = m.group("ip")
        time = m.group("time")
        for label, pat in SUSPICIOUS_PATTERNS.items():
            if pat.search(line):
                hits[label].append({"ip": ip, "time": time, "path": path})

for label, entries in hits.items():
    print(f"\n=== {label} ({len(entries)} hits) ===")
    for e in entries[:5]:  # show the first 5 entries
        print(f"  {e['time']} | {e['ip']} | {e['path']}")
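
The script matches each pattern against the raw log line, so a payload that is percent-encoded twice (for example `%252e%252e%252f`) only hits if that exact encoding is listed. One hedged way to reduce such gaps is to decode the request path a bounded number of times before matching. The sketch below uses the standard `urllib.parse.unquote`; `normalize_path` and `has_traversal` are hypothetical helper names.

```python
import re
from urllib.parse import unquote

TRAVERSAL = re.compile(r"\.\.[/\\]")


def normalize_path(path: str, max_rounds: int = 3) -> str:
    """Percent-decode repeatedly until the value stops changing (bounded)."""
    for _ in range(max_rounds):
        decoded = unquote(path)
        if decoded == path:
            break
        path = decoded
    return path


def has_traversal(path: str) -> bool:
    """True if the decoded path contains a ../ or ..\\ sequence."""
    return TRAVERSAL.search(normalize_path(path)) is not None


if __name__ == "__main__":
    print(has_traversal("/static/%252e%252e%252f%252e%252e%252fetc/passwd"))
```

Keeping the raw path in the output alongside the decoded one preserves the evidence exactly as it was logged.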

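5. Using regular expressions to detect malware and suspicious files

5-1. Detecting anomalous file names and paths

Malware often uses file names that imitate legitimate files. The following are typical masquerading patterns and regular expressions that detect them.

# PowerShell: extract suspicious executable names from the Prefetch CSVs
$csv = Get-ChildItem "D:\mout\Prefetch\*.csv" | ForEach-Object { Import-Csv $_.FullName }
$suspiciousPatterns = @(
    '(?i)^[a-z0-9]{8}\.exe$',               # 8 random-looking characters (common for droppers)
    '(?i)(?<!\\system32\\)svchost\.exe$',   # svchost.exe outside system32 (meaningful only when the value includes the path)
    '(?i)(?:tmp|temp)\d{3,}\.exe',           # temp + digits
    '(?i)[a-z]{1,3}\d{4,}\.exe',             # a few letters + a long number
    '(?i)(?:update|install|setup)\d+\.exe',  # fake installer
    '(?i)\.exe\.(?:txt|pdf|docx?)$'          # double extension
)

# PECmd stores the executable name in ExecutableName; SourceFilename is the path of the .pf file
$combined = "(" + ($suspiciousPatterns -join "|") + ")"
$csv | Where-Object {
    $_.ExecutableName -match $combined
} | Select-Object ExecutableName, SourceFilename, LastRun, RunCount, Volume0Name |
  Export-Csv "D:\analysis\suspicious_prefetch.csv" -NoTypeInformation -Encoding UTF8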
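
The 8-character pattern also matches legitimate names such as explorer.exe and services.exe, so it tends to need a second filter. The following is a minimal sketch of one possible heuristic, which is an assumption rather than an established rule: treat the name as random-looking only if it mixes at least two digits with at least two letters. The thresholds and the name `looks_random` are illustrative.

```python
import re

EIGHT_CHAR_EXE = re.compile(r"(?i)^([a-z0-9]{8})\.exe$")


def looks_random(filename: str, min_digits: int = 2, min_letters: int = 2) -> bool:
    """Heuristic: an 8-character .exe stem that mixes digits and letters."""
    m = EIGHT_CHAR_EXE.match(filename)
    if not m:
        return False
    stem = m.group(1)
    digits = sum(ch.isdigit() for ch in stem)
    letters = len(stem) - digits
    return digits >= min_digits and letters >= min_letters


if __name__ == "__main__":
    for name in ("a7f3k9q2.exe", "EXPLORER.EXE", "services.exe"):
        print(name, looks_random(name))
```

A heuristic like this still misses all-letter random names, so it works best as a ranking aid rather than a hard filter.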

5-2. Detecting registry persistence mechanisms

This step applies regular expressions to the registry CSV produced by KAPE + RECmd to find anomalous autostart entries.

# Python: detect suspicious autostart entries in the Run key CSV
import pandas as pd
import re

df = pd.read_csv("D:\\mout\\Registry\\RunKeys.csv")

SUSPICIOUS_RUN = re.compile(
    r"""
    (?:
        # Executables under TEMP/TMP and AppData
        [a-z]:\\(?:users\\[^\\]+\\appdata\\(?:local(?:\\temp)?|roaming)|windows\\temp|temp)\\[^\\]+\.(?:exe|dll|bat|ps1|vbs|hta)
        |
        # Suspicious DLL loaded through rundll32
        rundll32\.exe\s+[a-z]:\\(?!windows)
        |
        # regsvr32 scriptlet abuse (Squiblydoo)
        regsvr32.*\/[siu].*(?:http|\\\\)
        |
        # Script execution through mshta
        mshta(?:\.exe)?\s+(?:http|vbscript|javascript)
        |
        # wscript/cscript running a script under TEMP or AppData
        (?:wscript|cscript)\.exe\s+[a-z]:\\(?:users\\[^\\]+\\appdata|windows\\temp)
        |
        # Base64-encoded command
        -(?:en(?:c(?:odedcommand)?)?)?\s+[A-Za-z0-9+/]{30,}
    )
    """, re.IGNORECASE | re.VERBOSE
)

hits = df[df["ValueData"].str.contains(SUSPICIOUS_RUN, na=False, regex=True)]
print(f"Detections: {len(hits)}")
print(hits[["HivePath", "KeyPath", "ValueName", "ValueData"]].to_string())

5-3. Extracting IoCs from memory dumps and binary files

This pattern extracts strings from a binary with the strings command (or an equivalent string-extraction routine written in Python) and then detects IoCs with regular expressions.

import re
import subprocess
import sys

# Extract printable strings with the strings command (8 characters or longer)
result = subprocess.run(["strings", "-n", "8", sys.argv[1]],
                        capture_output=True, text=True, errors="replace")
strings_output = result.stdout

# Dictionary of IoC patterns
IOC_PATTERNS = {
    "IPv4":          re.compile(r'\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b'),
    "URL":           re.compile(r'https?://[\w\-\.]{4,}(?:/[\w\-\./?=%&+#]*)?'),
    "Domain":        re.compile(r'(?i)\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+(?:com|net|org|ru|cn|xyz|top|pw|tk)\b'),
    "Email address": re.compile(r'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}'),
    "Mutex candidate": re.compile(r'\b[A-Z][a-z]{2,8}[A-Z][a-z]{2,8}[0-9]{2,6}\b'),
    "Base64 string": re.compile(r'[A-Za-z0-9+/]{40,}={0,2}'),
    "Registry Run":  re.compile(r'(?i)SOFTWARE\\(?:Microsoft\\Windows\\CurrentVersion\\)?Run(?:Once)?'),
}

print(f"Target: {sys.argv[1]}\n")
for label, pat in IOC_PATTERNS.items():
    # De-duplicate and sort so that the output is stable between runs
    matches = sorted(set(pat.findall(strings_output)))
    if matches:
        print(f"[{label}] {len(matches)} hits")
        for m in matches[:10]:
            print(f"  {m}")
        if len(matches) > 10:
            print(f"  ... and {len(matches)-10} more")
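
By default, `strings` looks for single-byte characters, while Windows binaries and memory dumps store many strings as UTF-16LE. Where the `strings` binary is not available, or both encodings are wanted in one pass, the extraction step can be approximated in pure Python. This is a minimal sketch limited to printable ASCII characters; `extract_strings` is a hypothetical helper, not a library function.

```python
import re


def extract_strings(data: bytes, min_len: int = 8) -> list:
    """Return printable ASCII strings, then UTF-16LE strings, of at least min_len characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16LE text in the ASCII range is each printable byte followed by a zero byte.
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.decode("ascii") for m in ascii_re.findall(data)]
    found += [m.decode("utf-16-le") for m in utf16_re.findall(data)]
    return found


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        for s in extract_strings(f.read()):
            print(s)
```

The returned list can be joined with newlines and fed to the same IoC pattern dictionary in place of the `strings` output.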

6. Integrating regex scripts into a custom KAPE Module

6-1. Wrapping a regex analysis script as a Module

By wrapping the PowerShell and Python scripts from sections 3 to 5 as a KAPE Module, you can build an end-to-end workflow: collection (Target) → parsing (!EZParser) → regex analysis (custom Module). The Module below is saved as RegexTriage_Suspicious.mkape; KAPE takes the Module name from the file name.

Description: Triage Module that applies regular expressions to EZParser CSV output and flags suspicious traces
Category: Triage
Author: YourName
Version: 1.0
Id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
ExportFormat: csv
Processors:
    -
        Executable: powershell.exe
        CommandLine: -ExecutionPolicy Bypass -File "%kapeDirectory%\Modules\Scripts\Invoke-RegexTriage.ps1" -InputDir "%sourceDirectory%" -OutputDir "%destinationDirectory%\RegexHits"
        ExportFormat: csv

If Invoke-RegexTriage.ps1 implements the Prefetch, event log, and registry regex checks together, a single press of the Execute! button in KAPE runs everything from collection through extraction of suspicious traces.

6-2. Formatting output for Timeline Explorer

To review regex hits chronologically in Timeline Explorer, put a timestamp in a uniform format (yyyy-MM-dd HH:mm:ss) in the first column of the CSV.

# PowerShell: convert analysis results to a Timeline Explorer-friendly CSV
$results = @()  # array of regex hit objects (filled by the earlier analysis steps)

$results | Select-Object `
    @{N="Timestamp"; E={ $_.TimeCreated -replace 'T',' ' -replace '\.\d+Z?$','' }},
    @{N="Source";    E={ "RegexTriage" }},
    @{N="Category";  E={ $_.DetectionType }},
    @{N="Detail";    E={ $_.CommandLine }},
    @{N="Host";      E={ $env:COMPUTERNAME }} |
  Export-Csv "D:\analysis\timeline_regex_hits.csv" -NoTypeInformation -Encoding UTF8

7. Using regular expressions in Splunk SPL

After KAPE output has been ingested into Splunk, the SPL rex and regex commands allow more flexible analysis.

7-1. Field extraction with the rex command

index=forensics sourcetype=kape_ezparser
| rex field=CommandLine "(?i)-enc(?:odedcommand)?\s+(?P<b64_cmd>[A-Za-z0-9+/]{20,}={0,2})"
| where isnotnull(b64_cmd)
| table _time, Computer, SubjectUserName, CommandLine, b64_cmd

7-2. Anomaly detection with statistics

index=forensics EventID=4625
| rex field=IpAddress "(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})"
| where NOT match(src_ip, "^(?:10\.|172\.(?:1[6-9]|2\d|3[01])\.|192\.168\.)")
| stats count AS failures, earliest(_time) AS first_seen, latest(_time) AS last_seen by src_ip
| where failures > 20
| eval duration_min=round((last_seen - first_seen)/60, 1)
| sort -failures

7-3. Detecting LotL binaries with the regex command

index=forensics EventID=4688
| regex CommandLine="(?i)(?:certutil.*-decode|regsvr32.*/[sS].*scrobj|mshta.*(?:vbscript|javascript)|rundll32.*javascript:)"
| table _time, Computer, SubjectUserName, ParentProcessName, NewProcessName, CommandLine

8. Managing and reusing a regex library

8-1. Managing patterns centrally in YAML

Useful patterns accumulate with every investigation. Keeping them in a YAML file that each script loads turns them into a library the whole team can reuse and improve.

# forensic_patterns.yaml
patterns:
  encoded_powershell:
    regex: '(?i)-enc(?:odedcommand)?\s+([A-Za-z0-9+/]{20,}={0,2})'
    description: "Detects Base64-encoded PowerShell commands"
    mitre_technique: "T1059.001"
    severity: high

  lolbin_execution:
    regex: '(?i)(?:certutil.*-decode|regsvr32.*/[sS].*scrobj|mshta.*(?:vbscript|javascript))'
    description: "Suspicious use of Living-off-the-Land binaries"
    mitre_technique: "T1218"
    severity: high

  temp_executable:
    regex: '(?i)[A-Za-z]:\\(?:users\\[^\\]+\\appdata\\(?:local(?:\\temp)?|roaming)|windows\\temp|temp)\\[^\\]+\.(?:exe|dll|bat|ps1)'
    description: "Executable launched from a TEMP or AppData location"
    mitre_technique: "T1036"
    severity: medium

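# Example: loading the YAML pattern library in Python
import yaml
import re

# Read as UTF-8 explicitly so that non-ASCII descriptions load on any platform
with open("forensic_patterns.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)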

patterns = {
    name: {
        "compiled": re.compile(p["regex"], re.IGNORECASE),
        "description": p["description"],
        "severity": p["severity"],
        "mitre": p.get("mitre_technique", "")
    }
    for name, p in config["patterns"].items()
}

def scan_line(line):
    hits = []
    for name, p in patterns.items():
        if p["compiled"].search(line):
            hits.append({"pattern": name, "severity": p["severity"], "mitre": p["mitre"]})
    return hits

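Keeping this library in a Git repository and adding or updating patterns whenever a new attack technique is confirmed steadily improves the organization's forensic response capability. Linking each pattern to a MITRE ATT&CK technique ID also pays off when writing reports.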
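
A shared pattern library is easier to maintain when each pattern ships with a few known-good and known-bad samples that are checked on every change. The following is a minimal sketch of such a check. The `should_match` and `should_not_match` keys are a hypothetical extension, not part of the YAML shown earlier, and the dictionary literal stands in for what `yaml.safe_load()` would return so that the example runs without extra packages.

```python
import re

# Stand-in for the loaded YAML; the sample lists are a hypothetical extension.
LIBRARY = {
    "encoded_powershell": {
        "regex": r"(?i)-enc(?:odedcommand)?\s+([A-Za-z0-9+/]{20,}={0,2})",
        "should_match": ["powershell.exe -enc SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQA"],
        "should_not_match": ["powershell.exe -ExecutionPolicy Bypass -File run.ps1"],
    },
    "lolbin_execution": {
        "regex": r"(?i)(?:certutil.*-decode|mshta.*(?:vbscript|javascript))",
        "should_match": ["certutil.exe -decode in.txt out.exe"],
        "should_not_match": ["certutil.exe -hashfile out.exe SHA256"],
    },
}


def validate_library(library: dict) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, entry in library.items():
        try:
            compiled = re.compile(entry["regex"])
        except re.error as exc:
            problems.append(f"{name}: does not compile ({exc})")
            continue
        for sample in entry.get("should_match", []):
            if not compiled.search(sample):
                problems.append(f"{name}: missed expected match {sample!r}")
        for sample in entry.get("should_not_match", []):
            if compiled.search(sample):
                problems.append(f"{name}: false positive on {sample!r}")
    return problems


if __name__ == "__main__":
    for problem in validate_library(LIBRARY):
        print(problem)
```

Running a check like this in the repository's CI would catch a pattern edit that silently stops matching a sample it used to catch.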

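Summary: Design regular expressions as the "filter layer" of forensics

The key points of this article are as follows.

Understand the engines: write patterns with the differences between PowerShell (.NET), Python (re), Splunk (PCRE), and grep (POSIX BRE/ERE) in mind
Event log analysis: know which fields matter for each EventID, and use regular expressions to extract brute force, LotL, and encoded commands efficiently
syslog and web logs: scan with a dictionary of patterns in a Python script to detect C2 traffic, scans, and injection attempts automatically
Malware detection: apply regular expressions to file names, registry values, and binary strings to extract IoCs broadly
KAPE integration: wrap regex scripts as a custom Module to automate everything from collection to triage
Pattern library: manage regex patterns with YAML + Git, link them to MITRE ATT&CK, and share and evolve them as a team

Regular expressions are only a filter layer, not a cure-all. Attackers may use obfuscation that slips through the gaps in a pattern. Even so, using regular expressions as the first net that efficiently catches or rules out known attack patterns, and passing what remains to deeper analysis, is a realistic and effective layered design for forensic investigations.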