How Sidekick Analyzes Binaries

Sidekick is not a chatbot that answers questions about assembly. It is an agent that classifies your request, builds a working model of the program, changes the binary database as its understanding develops, and records persistent findings. Understanding how it plans and executes makes it much easier to steer.

This tutorial explains the agentic architecture behind Sidekick: the specialist modes it draws on, the persistent structures it maintains, and the decision points that shape how it handles a given request.

Two categories of action

Every action Sidekick takes is one of two kinds:

Attention — reading, querying, searching, modeling. The database does not change.
Transformation — writing names, types, comments, structures. The database changes.

Sidekick always attends before transforming. It will not rename a function without first reading its code. It will not act on a reference you provide (a function, address, or variable) before verifying it exists in the database. If a reference does not resolve, it reports the mismatch and stops rather than guessing a likely substitute.
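The attend-before-transform rule can be pictured as an ordering constraint on a database wrapper. The following is a minimal conceptual sketch; `Database`, `read`, and `rename` are hypothetical stand-ins, not Sidekick or Binary Ninja APIs. Reads record which addresses have been examined, renames are refused for addresses that were never read, and an unknown address raises an error instead of falling back to a guessed substitute.

```python
from __future__ import annotations


class Database:
    """Hypothetical toy database: address -> symbol name."""

    def __init__(self, names: dict[int, str]) -> None:
        self.names = dict(names)
        self.attended: set[int] = set()  # addresses that have been read

    # Attention: never changes self.names, only records what was examined.
    def read(self, addr: int) -> str:
        if addr not in self.names:
            # Report the mismatch; do not substitute a "likely" address.
            raise LookupError(f"no symbol at {addr:#x}; refusing to guess a substitute")
        self.attended.add(addr)
        return self.names[addr]

    # Transformation: changes self.names, but only for addresses already read.
    def rename(self, addr: int, new_name: str) -> None:
        if addr not in self.attended:
            raise PermissionError(f"{addr:#x} must be read before it is renamed")
        self.names[addr] = new_name
```

In Sidekick the ordering comes from how the agent plans its work rather than from a wrapper like this; the sketch only shows the constraint.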

The three places understanding lives

Sidekick keeps its understanding in three different structures, each with a different lifespan and purpose.

The binary database

Binary Ninja's representation of the binary: functions, basic blocks, IL instructions, data variables, types, strings, symbols, comments. Dense, concrete, and closed — it contains only what is in the binary itself.

Sidekick's transformations — new names, type definitions, comments, structure layouts — land here, applied atomically and grouped into transactions visible in the Transaction Log.

The behavioral model

A sparse, open-world description of what the program does: component roles, data flows, environmental interactions (OS services, protocols, file formats), dependencies, and uncertainties. Every element is anchored to concrete addresses in the database.

Unlike the database, the behavioral model can incorporate knowledge from outside the binary — what a library does, how a protocol works, what a format specifies. It persists across turns and sessions at the workspace level, so Sidekick can return to it rather than rebuilding it from scratch.
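One way to picture the behavioral model is as a list of typed elements, each carrying a description (which may draw on outside knowledge) and at least one database anchor. This is a hypothetical data-structure sketch, not Sidekick's internal representation; `ModelElement`, `BehavioralModel`, and the `kind` strings are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModelElement:
    kind: str          # hypothetical labels: "component", "data_flow", "environment", ...
    description: str   # may include knowledge from outside the binary
    anchors: list[int]  # concrete addresses in the database

    def __post_init__(self) -> None:
        # Every element must be tied to at least one concrete address.
        if not self.anchors:
            raise ValueError("behavioral-model elements must be anchored to an address")


@dataclass
class BehavioralModel:
    elements: list[ModelElement] = field(default_factory=list)

    def add(self, element: ModelElement) -> None:
        self.elements.append(element)

    def at(self, addr: int) -> list[ModelElement]:
        """Return every element anchored at addr."""
        return [e for e in self.elements if addr in e.anchors]
```

The anchor requirement is what keeps an open-world description checkable: any claim in the model can be traced back to code you can inspect.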

The Notebook

Where Sidekick tracks goals and outcomes across turns. An entry represents a goal being pursued. Within an entry:

Tasks — forward-looking steps Sidekick intends to do (tasks are plans, never backfilled as progress logs).
Outcomes — persistent findings anchored to evidence. Three kinds:
Finding — an anchored fact ("the function at 0x401500 decrypts config using XOR with key 0x3A")
Artifact — a concrete deliverable (a YARA rule, a report, a PoC)
Blocker — an external obstacle halting progress (missing symbols, unavailable runtime)
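The entry structure described above maps naturally onto a few small types. This is a hypothetical sketch for orientation only; `NotebookEntry`, `Outcome`, `OutcomeKind`, `Status`, and `open_blockers` are illustrative names, not a Sidekick API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class OutcomeKind(Enum):
    FINDING = "finding"    # anchored fact
    ARTIFACT = "artifact"  # concrete deliverable (YARA rule, report, PoC)
    BLOCKER = "blocker"    # external obstacle halting progress


class Status(Enum):
    DRAFT = "draft"
    VERIFIED = "verified"
    REJECTED = "rejected"


@dataclass
class Outcome:
    kind: OutcomeKind
    text: str
    evidence: list[int]  # addresses the outcome is anchored to
    status: Status = Status.DRAFT


@dataclass
class NotebookEntry:
    goal: str
    tasks: list[str] = field(default_factory=list)  # forward-looking plan only
    outcomes: list[Outcome] = field(default_factory=list)

    def open_blockers(self) -> list[Outcome]:
        """Blockers that have not been rejected."""
        return [
            o for o in self.outcomes
            if o.kind is OutcomeKind.BLOCKER and o.status is not Status.REJECTED
        ]
```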

Outcomes carry a status (draft, verified, or rejected) and are re-checked automatically when their evidence changes. If an outcome is invalidated, the outcomes that depend on it are revisited too.
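The revalidation cascade is essentially a graph traversal: start from the outcomes whose evidence touched a changed address, then follow "depends on" edges in reverse. The sketch below assumes a hypothetical representation (outcome IDs mapped to evidence addresses and to the IDs they depend on); it illustrates the idea and is not Sidekick's implementation.

```python
from __future__ import annotations

from collections import deque


def outcomes_to_recheck(
    evidence: dict[str, set[int]],
    depends_on: dict[str, set[str]],
    changed: set[int],
) -> set[str]:
    """Outcomes whose evidence changed, plus everything that transitively depends on them."""
    # Invert the dependency edges: outcome -> outcomes that rely on it.
    dependents: dict[str, set[str]] = {}
    for oid, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(oid)

    # Seed with outcomes anchored to a changed address, then walk breadth-first.
    queue = deque(oid for oid, addrs in evidence.items() if addrs & changed)
    seen: set[str] = set(queue)
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

For example, if a YARA artifact depends on a finding about the decryption algorithm, which in turn depends on a finding about the key at 0x401500, a change at 0x401500 marks all three for re-checking.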

Sidekick creates notebook entries only when work spans multiple turns with a nameable goal. Quick lookups don't open an entry.

How Sidekick classifies a request

The first thing Sidekick does with your message is decide what kind of work it is. The four top-level categories:

Transform — improve the binary's representation: rename, retype, annotate, restructure. Default when you point at code and ask for cleanup.
Investigate — answer a question using evidence from the binary: find, explain, trace.
Reference — a documentation lookup against Binary Ninja or Sidekick's own docs.
Script — write a script or automation to do something novel. Sidekick uses this category only when you explicitly ask for it.

The classification informs how much Sidekick delegates, whether it opens a notebook entry, and which skills it loads. For a broad investigation ("reverse engineer the crypto subsystem"), expect a notebook entry and multi-turn execution. For a targeted lookup ("what does decrypt_string take as arguments?"), expect a direct answer.

You can steer this explicitly: "just answer the question" pulls Sidekick toward quick action; "investigate systematically" pushes it toward a tracked effort.
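The effect of classification and steering can be summarized as a small decision table. This is a toy sketch: Sidekick's actual classification is made by the agent, and the `broad` flag, the `steer` strings, and the `Plan` fields are hypothetical simplifications of the behavior described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    TRANSFORM = "transform"
    INVESTIGATE = "investigate"
    REFERENCE = "reference"
    SCRIPT = "script"


@dataclass(frozen=True)
class Plan:
    open_notebook_entry: bool
    multi_turn: bool


def plan_for(category: Category, broad: bool, steer: str | None = None) -> Plan:
    """Toy mapping from a classified request to how it is executed."""
    # Explicit steering from the user overrides the default judgment of scope.
    if steer == "just answer the question":
        broad = False
    elif steer == "investigate systematically":
        broad = True
    # Documentation lookups never need a tracked effort.
    if category is Category.REFERENCE:
        return Plan(open_notebook_entry=False, multi_turn=False)
    # A notebook entry accompanies work that spans multiple turns.
    return Plan(open_notebook_entry=broad, multi_turn=broad)
```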

Specialist modes

Sidekick retains analytical judgment itself and delegates mechanical work to specialist modes. Each specialist has a narrow scope and a bounded posture: it does one kind of thing well and stops cleanly when it is done. When Sidekick delegates, you see a named tool call in chat with a structured result.

Research — multi-hop searches across the binary: graph walks through unnamed code, semantic pattern queries, finding everything that fits a structural or behavioral shape. Pure attention; never modifies the database, never interprets what it finds. Returns an address-keyed report with the evidence that made each result a match.

Transform — code recovery at scale. Renames default identifiers (sub_*, var_*, arg_*), reconstructs struct layouts, corrects function signatures, identifies and documents deobfuscation patterns (XOR decryption, control-flow flattening, API hashing). Reads before writing — always. Returns a change report with the basis for each rename and an explicit "not changed" section that says what was skipped and why.

Repair Analysis — fixes broken local analysis: wrong function boundaries, bad imported signatures, unresolved indirect calls, broken cross-references, misclassified code/data. Sidekick invokes this when downstream work is blocked by a substrate failure. The repair stays bounded — it fixes the local root cause rather than fanning out into unrelated regions.

Modeling — builds the behavioral model described above. Delegated when the model would be too large to construct in chat or when the analysis will span many turns. Each invocation extends the existing model rather than rebuilding it, marking new and updated elements explicitly.
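"Extend rather than rebuild" amounts to a merge that keeps every existing element and reports what a pass added or changed. The sketch below assumes a hypothetical flat representation (element IDs such as `"component@0x401000"` mapped to descriptions); `extend_model` is an illustrative name, not part of Sidekick.

```python
from __future__ import annotations


def extend_model(
    existing: dict[str, str], discovered: dict[str, str]
) -> tuple[dict[str, str], dict[str, set[str]]]:
    """Merge one modeling pass into the existing model instead of rebuilding it.

    Returns the merged model and a summary of new and updated element IDs.
    """
    merged = dict(existing)  # elements not rediscovered in this pass are kept
    changes: dict[str, set[str]] = {"new": set(), "updated": set()}
    for key, description in discovered.items():
        if key not in existing:
            changes["new"].add(key)
        elif existing[key] != description:
            changes["updated"].add(key)
        merged[key] = description
    return merged, changes
```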

Debugger — drives Binary Ninja's debugger to observe runtime behavior: breakpoints at targeted addresses, register and memory inspection, steered execution to reach an observation goal, and extraction of runtime data (decrypted buffers, resolved API tables). Mutations are kept to a minimum, because each one can change the behavior being observed, and every mutation has a stated observational goal.

Automation — authors a reusable Library item. The artifact may be a Python script, a local agent, a skill, or a small bundle of supporting files. Use this when an analysis should be repeatable, parameterized, or iterated across multiple binaries. The Automation mode validates what it writes (Python entry points run through a type check) but does not execute scripts itself.

Sidekick delegates only when it gains one of three benefits: context isolation (the specialist absorbs a lot of reads the root does not need to keep), specialization (the specialist's posture differs materially from general-purpose handling), or parallelism (independent subtasks run concurrently). Otherwise Sidekick handles work inline — a single targeted query does not need a research delegation.
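The delegation rule reads as a disjunction of the three benefits. This is a hedged sketch: the `Subtask` fields and the `read_budget` threshold are hypothetical proxies for judgments the agent actually makes, not parameters Sidekick exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    expected_reads: int          # how much context the work would pull in
    needs_special_posture: bool  # e.g. the debugger's minimal-mutation discipline
    independent_siblings: int    # other subtasks that could run concurrently


def should_delegate(task: Subtask, read_budget: int = 20) -> bool:
    """Delegate only when at least one of the three benefits applies."""
    context_isolation = task.expected_reads > read_budget
    specialization = task.needs_special_posture
    parallelism = task.independent_siblings > 0
    return context_isolation or specialization or parallelism
```

A single targeted query (one read, no special posture, nothing to run alongside it) returns `False`, which matches the inline handling described above.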

Resumable sessions

Some specialists maintain session state and return a thread ID that you can reference to continue the session:

Debugger sessions hold breakpoints, process state, and observations.
Automation threads hold in-progress Library item authoring — you can refine a script or agent incrementally.

Reference the thread in a follow-up ("continue the debugger session and capture what happens after the bind call") and Sidekick resumes that work rather than starting fresh.
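Resuming by thread ID behaves like a lookup in a registry of live sessions: a known ID returns the existing state, and anything else starts fresh. The sketch below is hypothetical; `SessionRegistry` and `DebuggerSession` are illustrative names, and real debugger sessions also hold process state that this toy omits.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class DebuggerSession:
    breakpoints: set[int] = field(default_factory=set)
    observations: list[str] = field(default_factory=list)


class SessionRegistry:
    def __init__(self) -> None:
        self._sessions: dict[str, DebuggerSession] = {}

    def start(self) -> str:
        thread_id = uuid.uuid4().hex
        self._sessions[thread_id] = DebuggerSession()
        return thread_id

    def resume(self, thread_id: str | None) -> tuple[str, DebuggerSession]:
        # Reuse the existing state when the thread is known; otherwise start fresh.
        if thread_id in self._sessions:
            return thread_id, self._sessions[thread_id]
        new_id = self.start()
        return new_id, self._sessions[new_id]
```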

Suggest operations

Alongside Chat, Sidekick exposes single-shot operations at the cursor through the Suggest menu. Each one fixes goal, scope, and skill in advance, then runs a minimal harness over the local region:

Repair Analysis — detects and fixes broken local analysis around the cursor. Does not add names, types, or comments beyond what is needed to make the substrate trustworthy.
Suggest Names — recovers meaningful names for default-named identifiers: the function itself, its variables, parameters, default-named callees, and referenced data. Skips uncertain names rather than guessing.
Suggest Types — recovers structure definitions, field names, and applies types. Infers structures from pointer arithmetic, allocation patterns, and library function parameters.
Suggest Comments — adds comments for non-obvious code patterns. Only comments genuinely non-obvious things — over-commenting is worse than under-commenting.

Before applying names, types, or comments, the Suggest operations remediate obviously broken local analysis first rather than recovering on top of bad substrate. You can also invoke Repair Analysis directly when only the substrate needs fixing.
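The repair-first ordering can be sketched as a small pipeline. Here `substrate_ok`, `repair`, and `recover` are hypothetical callables standing in for the checks and passes Sidekick runs; a region is just a dict in this toy.

```python
from __future__ import annotations

from typing import Callable


def run_suggest(
    region: dict,
    substrate_ok: Callable[[dict], bool],
    repair: Callable[[dict], dict],
    recover: Callable[[dict], dict],
) -> dict:
    """Single-shot Suggest flow: repair broken substrate first, then recover."""
    if not substrate_ok(region):
        region = repair(region)  # fix only what is needed locally
        if not substrate_ok(region):
            raise RuntimeError("substrate still broken; not recovering on top of it")
    return recover(region)
```

Refusing to continue when repair fails reflects the rule above: names or types layered over a wrong function boundary would only have to be redone.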

A related affordance, Recover Import Types, fills in signatures for imported symbols whose types are unresolved. It checks curated type libraries first, then infers signatures from known APIs (POSIX libc, Win32, OpenSSL, zlib, etc.) by validating against call-site usage. Recovered types are applied at a confidence level that lets a later-loaded curated type library still override them.
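The lookup order and the override rule can be illustrated with numeric confidence values. This is a sketch under stated assumptions: the confidence constants are illustrative, "validating against call-site usage" is reduced to an argument-count check, and `recover_import_type` and `apply_type` are hypothetical helpers, not Binary Ninja API calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

CURATED_CONFIDENCE = 255   # illustrative: highest confidence
INFERRED_CONFIDENCE = 128  # illustrative: lower, so curated types can still override


@dataclass(frozen=True)
class TypedSymbol:
    signature: str
    confidence: int


def recover_import_type(
    name: str,
    curated: dict[str, str],
    known_apis: dict[str, tuple[str, int]],
    observed_arg_counts: list[int],
) -> Optional[TypedSymbol]:
    # 1. Prefer a curated type library.
    if name in curated:
        return TypedSymbol(curated[name], CURATED_CONFIDENCE)
    # 2. Fall back to a known API, but only if every call site agrees with its arity.
    if name in known_apis:
        signature, arity = known_apis[name]
        if observed_arg_counts and all(n == arity for n in observed_arg_counts):
            return TypedSymbol(signature, INFERRED_CONFIDENCE)
    return None  # leave the import unresolved rather than guess


def apply_type(current: Optional[TypedSymbol], incoming: TypedSymbol) -> TypedSymbol:
    # A higher-confidence type (e.g. from a curated library loaded later) wins.
    if current is None or incoming.confidence > current.confidence:
        return incoming
    return current
```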

The attention-transformation cycle¶

Within any non-trivial task, Sidekick works in a disciplined loop:

Attend — read the relevant code, resolve names, find callers and callees, check for substrate problems.
Assess dependencies — identify default-named entities, missing types, unresolved references that block the goal.
Reason — form conclusions about what the code does and what the next transformation should be.
Transform — apply database changes (names, types, comments, structures).
Record — add outcomes to the notebook when they pass three tests: reuse (will a future turn reference this?), anchor (does it tie to an address or URI?), non-redundancy (is it already carried by a database change or chat reply?).
Continuation check — before ending the turn, ask whether anything is known but not yet persisted, and whether the goal is actually satisfied.
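The three recording tests in the Record step combine as a simple conjunction. In this hypothetical sketch, the booleans on `Candidate` stand in for judgments Sidekick makes itself; the names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    text: str
    anchor: Optional[str]    # address or URI, e.g. "0x401500"
    referenced_later: bool   # reuse: will a future turn need it?
    carried_elsewhere: bool  # already expressed by a database change or chat reply?


def should_record(c: Candidate) -> bool:
    """Record an outcome only if it passes the reuse, anchor, and non-redundancy tests."""
    reuse = c.referenced_later
    anchored = bool(c.anchor)
    non_redundant = not c.carried_elsewhere
    return reuse and anchored and non_redundant
```

A rename, for instance, fails non-redundancy: the new name in the database already carries that knowledge.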

Sidekick does not reason over goal-relevant code while it still carries default sub_* names. It renames that code first, then reasons over the named version. Renaming first makes the later reasoning more accurate and, when summarized back to you, easier to follow.

Skills: reusable playbooks¶

A skill is a reusable procedure keyed to a goal and an artifact kind — "hunt for use-after-free vulnerabilities in a C++ binary," "triage ransomware," "recover types in a stripped Go binary." Skills live in a curated catalog and carry checklists, tactical patterns, and decision points specific to that combination.

Sidekick maintains a small set of enabled skills for your workspace and imports a matching one when your goal activates it. You influence this by being specific about what you are analyzing ("this looks like ransomware", "I'm hunting for use-after-free", "this is a firmware image from a router"). The cue shifts which skills are enabled and imported, which in turn shapes how every specialist approaches the work.
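How a cue in your message shifts the enabled set can be approximated with simple phrase matching. This is a toy: the catalog entries, cue phrases, and `limit` are hypothetical, and Sidekick's real selection is made by the agent rather than by substring search. The sketch does reflect two stated properties, a small enabled set and a preference for keeping skills that are already enabled.

```python
from __future__ import annotations

# Hypothetical catalog: skill name -> cue phrases that activate it.
CATALOG = {
    "triage-ransomware": {"ransomware"},
    "hunt-uaf-cpp": {"use-after-free", "uaf"},
    "recover-go-types": {"go binary", "golang"},
    "router-firmware": {"firmware", "router"},
}


def update_enabled(enabled: list[str], message: str, limit: int = 3) -> list[str]:
    """Enable skills cued by the message while keeping existing ones where possible."""
    text = message.lower()
    matched = [name for name, cues in CATALOG.items() if any(cue in text for cue in cues)]
    result = list(enabled)  # prefer stability: start from what is already enabled
    for name in matched:
        if name not in result:
            result.append(name)
    # Over the limit: drop the oldest skills that this message did not cue.
    while len(result) > limit:
        stale = next((n for n in result if n not in matched), None)
        if stale is None:
            break
        result.remove(stale)
    return result
```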

The same skill can run inside Chat or inside a specialist: skills are portable procedures, and specialists are the execution harnesses that host them.

Continuity across turns¶

Sidekick tries to feel like one continuous analyst, not a fresh conversation every time.

Briefing — each new chat opens with a short orientation summary of the current analysis state: active investigations, blockers, recent evidence, open questions. You do not have to re-explain context.
Notebook — outcomes carry across turns and sessions. Validation runs in the background, so a finding that gets contradicted by later evidence transitions to rejected rather than silently staying true.
Behavioral model — persists at the workspace level. Each modeling pass extends the existing model.
Enabled skills — adjust as the analysis focus shifts, but prefer stability over churn.

When Sidekick has the context it needs, it uses it. When it doesn't (new workspace, fresh binary, first conversation), it builds it.

The Transaction Log¶

Every database change is a transaction. The Transaction Log lists them, shows what each one modified, and lets you undo them. Transaction entries also appear in chat next to the agent action that produced them, so you can audit exactly what a Suggest operation or Transform pass did.
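A transaction in this sense groups related edits, records the before and after values, and can be reverted as a unit. The following is a minimal conceptual sketch restricted to renames; `TransactionLog` and `Transaction` are hypothetical names, and this is not how Binary Ninja's undo system is exposed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Transaction:
    label: str  # e.g. "Suggest Names @ 0x401200"
    before: dict[int, str] = field(default_factory=dict)
    after: dict[int, str] = field(default_factory=dict)


class TransactionLog:
    def __init__(self, names: dict[int, str]) -> None:
        self.names = names
        self.entries: list[Transaction] = []

    def apply(self, label: str, renames: dict[int, str]) -> Transaction:
        """Apply a group of renames atomically and record what changed."""
        tx = Transaction(label)
        for addr, new in renames.items():
            # self.names[addr] raises KeyError for an unknown address before anything is written.
            tx.before[addr] = self.names[addr]
            tx.after[addr] = new
        self.names.update(renames)
        self.entries.append(tx)
        return tx

    def undo_last(self) -> Transaction:
        """Revert the most recent transaction as a unit."""
        tx = self.entries.pop()
        self.names.update(tx.before)
        return tx
```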

Working with Sidekick effectively¶

The Chat sidebar is the full agent — classification, delegation, notebook, modeling, continuation. Use Chat when:

You want to understand what a function or subsystem does
You are investigating a specific behavior or vulnerability class
You want to ask follow-up questions and steer the analysis
You need Sidekick to cross-reference findings across multiple functions

Suggest menu items are fixed-scope, single-shot. Use them when:

You want quick coverage of the function you are currently looking at
You want to apply names, types, or comments without opening a full investigation
The substrate around the cursor looks broken and you want to fix it before doing other recovery work

References and @-mentions¶

Sidekick resolves references you provide before acting on them. Refer to functions by name or address:

"What does sub_401200 do?"
"Trace the call chain from 0x401000 to any network I/O."
"Rename everything reachable from parse_config."

Default-named functions are valid references. If the reference does not resolve, Sidekick reports the mismatch rather than guessing a similar name.
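Exact-match resolution with no fuzzy fallback can be sketched as follows. `resolve_reference` is a hypothetical helper over a toy symbol table; it returns `None` on a miss so the caller can report the mismatch instead of acting on a near neighbor.

```python
from __future__ import annotations

from typing import Optional


def resolve_reference(ref: str, symbols: dict[int, str]) -> Optional[int]:
    """Resolve a function reference by exact name or hex address; never fuzzy-match."""
    ref = ref.strip()
    # Addresses: accept hex such as "0x401000".
    if ref.lower().startswith("0x"):
        try:
            addr = int(ref, 16)
        except ValueError:
            return None
        return addr if addr in symbols else None
    # Names: exact match only; default names such as sub_401200 are valid.
    for addr, name in symbols.items():
        if name == ref:
            return addr
    return None
```

A typo such as `parse_confg` resolves to `None` rather than to `parse_config`, which mirrors the "report, don't guess" behavior.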

Steering scope¶

You can guide the quick-action vs. tracked-effort decision:

For a broad investigation, expect a notebook entry and multi-turn execution: "reverse engineer the crypto subsystem end-to-end."
For a targeted lookup, expect a direct answer: "what's the third argument to send_packet?"
Mid-investigation, narrow or widen: "focus only on the decryption routine, skip everything else," or "expand this to cover all callers of process_message."

Continuing sessions¶

When Sidekick returns a thread ID for a debugger or automation session, reference it in follow-ups rather than starting over:

"Continue the debugger session — let the network call complete and capture the response."
"Update the automation to also handle indirect calls."

Combining specialists¶

The most effective pattern is often: let Sidekick do an initial code-recovery pass (Transform) on the region you care about, fix substrate problems it surfaces (Repair Analysis), then use Chat to reason about what the cleaned-up code reveals. The specialists handle mechanical recovery; Chat handles interpretation.

Tip

Tell Sidekick what you're looking at. "This is a Linux malware sample, likely a cryptominer" or "I'm reversing a custom parser for a proprietary file format" steers skill selection and scopes the analysis far more effectively than a generic "analyze this binary."