TruffleHog

TruffleHog is an open-source secret-scanning tool that searches filesystems, git repositories, and archives for common secret patterns (API keys, tokens, passwords, private keys, etc.). It runs multiple detectors and can emit structured JSON for automated processing.
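One line of TruffleHog's JSON output might look roughly like the sketch below (a hedged illustration: the exact field set varies by TruffleHog version, source type, and detector; the sample values are invented):

```python
import json

# Illustrative shape of a single --json detection line from a filesystem
# scan (field names as seen in recent v3 releases; values are made up).
line = json.dumps({
    "SourceMetadata": {"Data": {"Filesystem": {"file": "etc/app.conf", "line": 12}}},
    "DetectorName": "AWS",
    "DecoderName": "PLAIN",
    "Verified": False,
    "Raw": "AKIA................",
})

detection = json.loads(line)
meta = detection["SourceMetadata"]["Data"]["Filesystem"]
print(detection["DetectorName"], meta["file"], meta["line"])
```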

Integration

TruffleHog is integrated as an analysis tool under jobs/trufflehog. The tool folder contains a small Python script (trufflehog.py) and a Dockerfile (Dockerfile_trufflehog) that:

  • Installs TruffleHog (via the official install script) into the container.
  • Registers a task handler using the project's tool_registry decorator so the worker listens on a Redis queue for tasks (the queue name is derived from the tool name, e.g. queue_trufflehog).
  • Runs as a WorkerManager process which pulls payloads from Redis and invokes the worker handler for each task.

The worker declares dependencies=["binwalk"], so it expects an earlier tool to have extracted the image filesystem into an extracted directory that the scanner can operate on.
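The registration described above could be sketched as follows. This is a minimal stand-in, not the project's actual tool_registry implementation: the decorator signature, the handler signature, and the HANDLERS table are assumptions made for illustration; only the tool name, the queue_trufflehog naming convention, and the binwalk dependency come from the text.

```python
# Hypothetical stand-in for the project's tool_registry decorator.
HANDLERS = {}

def tool_registry(name, dependencies=()):
    """Register a handler under queue_<name>, recording its dependencies."""
    def decorator(func):
        HANDLERS[f"queue_{name}"] = {"func": func, "dependencies": list(dependencies)}
        return func
    return decorator

@tool_registry("trufflehog", dependencies=["binwalk"])
def handle_task(payload):
    # The real worker would locate the extracted dir, run TruffleHog,
    # and persist results; here we just echo the payload.
    return f"scanning {payload['image_path']}"

entry = HANDLERS["queue_trufflehog"]
print(entry["dependencies"])
print(entry["func"]({"image_path": "/data/fw.bin"}))
```

A WorkerManager process would then look up the handler for its queue and invoke it once per payload pulled from Redis.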

Worker behavior

When a task arrives on the TruffleHog queue the worker code (jobs/trufflehog/trufflehog.py) performs the following actions:

  1. Resolve the provided image path and locate the extracted filesystem directory (the worker expects an extracted directory next to the image path produced by earlier tools such as binwalk).
  2. Invoke the TruffleHog binary in filesystem mode with --json to produce structured output. The worker runs TruffleHog as a subprocess and enforces a timeout to avoid indefinite scans.
  3. Append human-readable stdout/stderr to the job's output.txt so operators and the frontend can inspect raw tool output.
  4. Read TruffleHog's line-delimited JSON output, parse each line, and filter for detection objects (the implementation treats objects containing DetectorName as detections).
  5. For each detection, extract metadata (filesystem file path, line number, detector/decoder names, raw secret values when present) and call DBConnector.insert_trufflehog_result(...) to persist the findings.
  6. Write the structured detections to a file named trufflehog-output.json next to the output file.
  7. Push a result/status message back to Redis on queue_return describing the job, tool, image and final status (success or failure) so the executor can update job state.
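The core of steps 1-4 can be sketched like this. It is a simplified approximation of jobs/trufflehog/trufflehog.py, not the real code: the function names and the exact location of the extracted directory are assumptions, and the output.txt/JSON/database/Redis steps are omitted.

```python
import json
import subprocess
from pathlib import Path

def parse_detections(stdout_text: str):
    """Keep only JSON lines that look like detections (contain DetectorName)."""
    detections = []
    for line in stdout_text.splitlines():
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any interleaved non-JSON log lines
        if "DetectorName" in obj:
            detections.append(obj)
    return detections

def scan_extracted(image_path: str, timeout: int = 600):
    """Run TruffleHog over the extracted/ dir next to the image (sketch)."""
    extracted = Path(image_path).parent / "extracted"  # assumed layout
    proc = subprocess.run(
        ["trufflehog", "filesystem", str(extracted), "--json"],
        capture_output=True, text=True, timeout=timeout,
    )
    return parse_detections(proc.stdout)
```

The timeout passed to subprocess.run is what bounds the scan; if it expires, subprocess raises TimeoutExpired, which the real worker would report as a failure.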

Outputs

The worker produces several artifacts that other components can consume:

  • output.txt (appended): a human-readable log containing TruffleHog's stdout and stderr.
  • trufflehog-output.json: a JSON file with the structured detection objects, including the secrets and the file paths where they were found.
  • Database rows via DBConnector.insert_trufflehog_result(...): a parsed, queryable representation of each detection (image id, file path, line number, detector/decoder names, and raw secret fields when available).
  • Redis return message on queue_return: a small JSON payload describing the job/tool/image and whether the run finished with success or failure, so orchestration can continue.
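The return message could be built as in the sketch below. The field names are assumptions inferred from the description (job, tool, image, status); only the queue name queue_return and the success/failure statuses come from the text.

```python
import json

def build_return_message(job_id: str, image_id: str, status: str) -> str:
    """Build the status payload pushed to queue_return (field names assumed)."""
    assert status in ("success", "failure")
    return json.dumps({
        "job": job_id,
        "tool": "trufflehog",
        "image": image_id,
        "status": status,
    })

# With a real redis-py client this would be pushed for the executor, e.g.:
#   redis_client.rpush("queue_return", build_return_message(...))
print(build_return_message("job-1", "img-7", "success"))
```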