ThinkPad firmware analysis · open toolchain · long-form writeup

ThinkPads From the Inside: A Reproducible Path
From Archived BIOS to Named SoC Pads

A reverse-engineering writeup on turning Lenovo's published BIOS archive into something structured enough to drive coreboot ports, Hackintosh skeletons, GPIO security audits, and CVE-level firmware diffs — without owning the hardware, and without folklore.

Estimated read: ~35 minutes · 18 sections · covers Intel, AMD and (stub) Qualcomm

00 At a glance

Where the project stands before you read the long version.

Diverse-sample ACPI extraction: 12 / 12
Intel pipeline steps: 21 / 21
AMD pipeline steps: 18 / 18
BIOS-payload filter: 9 / 9
SoC vendors supported: 3
CVEs mapped: 15
Lenovo advisories: 20
Models in CVE→model index: 18
8051 EC opcode coverage: ~99%
OEM BIOSes w/ embedded SPD: 0 / 8

01 Prologue: why archived ThinkPad firmware is a treasure trove

The peculiar fact that a manufacturer hands you, for free, the exact bits that drive every machine it has ever sold.

The ThinkPad is one of the few mass-market PC product lines that has accumulated a parallel open-source culture around it: long-running coreboot ports for a handful of Sandy Bridge and Ivy Bridge classics, Linux quirks tables for every model since the T20, Hackintosh ports for the GoBook generation, a quietly enormous body of forum knowledge about exactly which kernel option calms which Lenovo BIOS bug. The reasons are not mysterious — ThinkPads ship with field-replaceable parts, decent keyboards, and BIOSes that the firmware team actually maintains for years — but the consequence is that a lot of independent engineering value lives downstream of Lenovo's "Drivers & Software" pages.

That archive is the part most people forget about. Lenovo Support publishes every BIOS update for every supported model, every driver bundle, every flash utility, organized by machine type and in chronological order. For a model still under support you get the complete historical chain (every microcode bump, every Intel ME revision, every SMM hardening, every CVE fix), downloadable as a string of .exe packages. For models that have rolled out of support the binaries remain mirrored and discoverable. The same site indexes tens of thousands of binaries that, taken together, are a near-comprehensive record of how Lenovo built each board.

Inside each BIOS package is the part that matters: the device tree the OS sees (ACPI), the firmware volumes (PEI, DXE, SMM), the embedded controller firmware, the Intel FSP remnants, the microcode files, the VBT, the flash descriptor on some models. Everything a coreboot porter needs to bring a new board up; everything a security auditor needs to map a CVE to a binary fix; everything a Hackintosh ports project needs to drive a sensible config.plist. The bits are public. They have been public the whole time.

What is not public is a reproducible pipeline that says: "give me a model number and a BIOS version, and you get back structured JSON describing every device, every GPIO it touches by name, every firmware-internal change since the previous revision, every CVE the changelog mentions, every privacy-sensitive GPIO and its lock posture, every SMI-routable input, every UEFI module that grew or shrank between the last two updates, every signal of where a fix landed in the binary." That is the gap this project sets out to close.

The output of the pipeline is not a beauty-contest reverse engineering of one board. It is a coverage statement: across the archive, the toolchain produces structured data on every model whose firmware can be carved — and as the archive grows, the model count grows with it. Bug-for-bug, the per-model output may still need a human pass before it can drive a real port; but the cost of preparing that human pass drops by an order of magnitude when you start from a clean auto-generated baseline.

What this writeup is not. It is not a beginner's tour of UEFI internals (there are excellent ones — Beyond BIOS from Intel Press, the EDK2 docs, and the Phrack 66 PixieDust series cover the basics far better). It is a focused field report on the specific problems you hit when you try to do this at fleet scale, what worked, what surprised, and what the firmware-only path cannot answer no matter how clever you get.

02 Background: three primers in twelve paragraphs

ACPI / DSDT, Intel's GPIO architecture, and the GpioLib indirection — just enough to read the rest.

A. ACPI, DSDT, SSDTs

The Differentiated System Description Table (DSDT) is the central ACPI table. Firmware compiles it from ASL (ACPI Source Language) into AML (ACPI Machine Language) at build time, and the OS interprets the AML at runtime to discover devices, power states, GPIO connections, EC commands, and thermal zones. Every ACPI-capable OS (Linux, Windows, macOS, the BSDs) reads the same DSDT. The DSDT is therefore the most reliable single description of what hardware is present on the board: more reliable than PCI enumeration (because it includes non-PCI devices such as the EC, GPIO-attached buttons and sensors), and more reliable than the DMI/SMBIOS tables (which only carry strings).

Secondary System Description Tables (SSDTs) compile separately and load at runtime to extend or override pieces of the DSDT. Boards with multiple variants typically ship a generic DSDT plus per-variant SSDTs that patch in the right chunks; AcpiPlatform decides which SSDTs to load based on hardware identifiers it reads at boot. A modern ThinkPad commonly ships ten SSDTs alongside the DSDT.

Every device with a hardware GPIO connection has a _CRS (Current Resource Settings) method that returns a serialized list of resource descriptors. For GPIO that descriptor is either a GpioIo (the device can drive or read the pin) or a GpioInt (the pin is wired to the device as an interrupt source). Each carries a controller path (e.g. \_SB.PCI0.GPI0), a pin number, an edge/level/polarity selector, and a pull configuration.

On AMD and Qualcomm ThinkPads, the pin number in GpioIo/GpioInt is a direct integer that maps straight onto an AGPIO or GPIO pad. On Intel it is often a method call — and that is where the trouble starts.

B. The Intel GPIO architecture

Each Intel PCH (from Skylake / Sunrise Point onwards) carries one or more GPIO controllers (community controllers), each owning a number of groups (banks of about 24 pins each). Pins inside a group are referred to as pads: GPP_A22 means "General Purpose Programmable, group A, pad 22". Each pad has two 32-bit configuration registers stored in fixed PCH-memory-mapped space: PADCFG_DW0 and PADCFG_DW1.

PADCFG_DW0 (per pad, 32 bits)
  31..30  PADRSTCFG       (pad reset config)
  29      RXPADSTSEL      (RX pad state select)
  28      RXRAW1          (force raw 1 on RX)
  27      reserved
  26..25  RXEVCFG         (edge / level select)
  24      PREGFRXSEL      (route raw into glitch filter)
  23      RXINV           (invert RX)
  22..21  RXTXENCFG       (later generations; reserved on Sunrise Point)
  20      GPIROUTIOXAPIC  (route to IOxAPIC)
  19      GPIROUTSCI      (route to SCI)
  18      GPIROUTSMI      (route to SMI)
  17      GPIROUTNMI      (route to NMI)
  16..13  reserved
  12..10  PMODE           (pad mode: 0 = GPIO, nonzero = native funcs)
  9       RXDIS           (RX disable)
  8       TXDIS           (TX disable)
  7..2    reserved
  1       GPIORXSTATE
  0       GPIOTXSTATE

PADCFG_DW1
  13..10  TERM            (termination / pull strength)
  ...
  PADCFG[lock] is held in a separate per-group LOCK register, NOT in DW0/DW1.

The decode of every field above comes from Intel 100-Series PCH Datasheet Volume 2 (document 332691), and from the per-generation datasheets that follow it. Pinning the decode to the official document matters: intelp2m in coreboot is downstream and occasionally lags the register layout, and the security report needs an authoritative source for what "locked" means before it can classify a pad as genuinely safe or attacker-controllable.
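A per-field decode of PADCFG_DW0 can be sketched in a few lines of Python. The bit positions are the Sunrise Point ones as coreboot's GPIO headers define them (GPIROUTSMI at bit 18, PMODE at bits 12..10, RXDIS and TXDIS at bits 9 and 8); treat them as assumptions to confirm against the datasheet for your PCH generation. The names DW0_FIELDS and decode_dw0 are illustrative, not the toolkit's.

```python
# Field name -> (lowest bit, width) within PADCFG_DW0.
# Assumed Sunrise Point positions; verify per PCH generation.
DW0_FIELDS = {
    "PADRSTCFG":      (30, 2),
    "RXPADSTSEL":     (29, 1),
    "RXRAW1":         (28, 1),
    "RXEVCFG":        (25, 2),
    "PREGFRXSEL":     (24, 1),
    "RXINV":          (23, 1),
    "GPIROUTIOXAPIC": (20, 1),
    "GPIROUTSCI":     (19, 1),
    "GPIROUTSMI":     (18, 1),
    "GPIROUTNMI":     (17, 1),
    "PMODE":          (10, 3),
    "RXDIS":          (9, 1),
    "TXDIS":          (8, 1),
    "GPIORXSTATE":    (1, 1),
    "GPIOTXSTATE":    (0, 1),
}


def decode_dw0(value: int) -> dict:
    """Split a 32-bit PADCFG_DW0 value into named fields."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("PADCFG_DW0 must fit in 32 bits")
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in DW0_FIELDS.items()}


# Synthetic value: SMI routing enabled, pad in native function 1.
fields = decode_dw0((1 << 18) | (1 << 10))
print(fields["GPIROUTSMI"], fields["PMODE"])   # 1 1
```

Keeping the layout in a table rather than in hand-written masks makes it cheap to carry one table per PCH generation and to diff them when a new datasheet appears.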

C. The GpioLib indirection problem

Inside the DSDT, an Intel ThinkPad does not bake the pad name into the _CRS. It writes something like:

// DSDT excerpt (lightly redacted for readability)
Device (FPNT) {
  Name (_HID, "VFS5011")
  Method (_CRS, 0, NotSerialized) {
    Return (ResourceTemplate () {
      SpiSerialBusV2 (..., 0x00000000, ...) { ... }
      GpioInt (Level, ActiveLow, Shared, PullDefault, 0,
               "\\_SB.PCI0.GPI0", 0, ResourceConsumer)
        { GNUM(GFPI) }
    })
  }
}

GNUM is a method elsewhere in the DSDT that takes a GNVS field name (here GFPI, the fingerprint-interrupt field) and returns an integer. GNVS is the Global NVS region — an ACPI-NVS memory range whose layout is declared by the firmware and whose values are populated at boot. It is RAM, not flash; nothing in the firmware image carries the runtime contents of GNVS directly.

The values written into GNVS come from a UEFI driver called AcpiPlatform that runs in DXE. AcpiPlatform reads the board identity from hardware (typically EC straps or an EEPROM), picks the right pad set for the current board variant, and stores each pad as an Intel GpioLib GPIO_PAD immediate at the GNVS offset corresponding to its field name. The encoded form of that immediate is:

GPIO_PAD = 0xCC_GG_NNNN
  CC    chipset id (per-SoC, e.g. 0x01 = SPT-LP, 0x07 = CNL-LP, 0x0B = TGL-LP)
  GG    group / bank (A = 0x00, B = 0x01, … per-SoC mapping)
  NNNN  pad index within the group (0..N-1)

So the static information needed to resolve a single device to a named pad is distributed across three pieces of the firmware: the DSDT (which GNVS field feeds which device), AcpiPlatform (which GPIO_PAD immediate gets written into that field), and PlatformInit's PADCFG table (which mode, direction and lock state apply to that pad). Recovering the answer means joining all three.
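As a minimal sketch of the 0xCC_GG_NNNN encoding, the following Python splits an immediate into its three parts and builds a pad name. The chipset table holds only the three ids quoted above, and the simple A, B, C lettering of groups is an assumption; real SoCs have their own group-to-letter mapping. GpioPad and decode_gpio_pad are hypothetical names.

```python
from typing import NamedTuple

# Chipset ids quoted in the text; the real mapping is per-SoC and longer.
CHIPSETS = {0x01: "SPT-LP", 0x07: "CNL-LP", 0x0B: "TGL-LP"}


class GpioPad(NamedTuple):
    chipset: str
    group: int
    pad: int

    @property
    def name(self) -> str:
        # Assumes groups are lettered A, B, C... in index order.
        return f"GPP_{chr(ord('A') + self.group)}{self.pad}"


def decode_gpio_pad(imm: int) -> GpioPad:
    """Split a 0xCC_GG_NNNN immediate into chipset, group and pad index."""
    cc = (imm >> 24) & 0xFF
    gg = (imm >> 16) & 0xFF
    nnnn = imm & 0xFFFF
    return GpioPad(CHIPSETS.get(cc, f"unknown(0x{cc:02X})"), gg, nnnn)


pad = decode_gpio_pad(0x01000016)
print(pad.chipset, pad.name)   # SPT-LP GPP_A22
```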

03 The Lenovo BIOS corpus, in shape

What is actually on those Lenovo download pages, organized for the resolver.

Every supported ThinkPad model on Lenovo's support site has a "Drivers & Software" page with a downloads tree organized by category (BIOS, Audio, Chipset, LAN, etc.). A BIOS update is distributed as an .exe file — an InnoSetup-packaged WinFlash installer of roughly 6–12 MB, occasionally larger when carrying microcode or ME firmware updates. The same site indexes the installer under a stable URL of the form:

by_mt/<MachineType>/drivers/<DocId>/files/<id>w.exe

The MachineType is the 4-character Lenovo MT (e.g. 20Q5); the DocId is the Lenovo support article id; the id encodes the BIOS revision (e.g. r0buj26ww = build 26ww). The naming convention is stable enough that an aggressive scraper can mirror every BIOS package for every published model with a few thousand HTTP requests, and a surprising amount of metadata (release date, supported OS, change summary) sits on the article pages adjacent to each download.

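Inside a BIOS .exe the InnoSetup payload contains the WinFlash flasher (WinFlash.exe), the firmware image itself (an .fl1 file, or .FL2/.CAP on some packages), a readme.txt changelog, and one or more *.PAT files.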

Why driver bundles are the first problem

Lenovo ships driver bundles — audio, GPU, wifi, fingerprint — using the same <id>w.exe naming convention as BIOS updates. They are not BIOS updates; they don't carry an .fl1. A naive coverage harness that tries to acpi_extract every <id>w.exe in a model's download tree will fail noisily on every driver bundle, which is most of them.

The cheap fix is to look at the payload first, without extracting it. Each .exe can be probed with innoextract --list (or with 7-zip in list mode for non-InnoSetup wrappers), and the listing alone tells you whether a .FL1/.FL2/.CAP file is present. Packages that pass the check are extracted normally; packages that fail are classified as a non-BIOS payload and reported separately, so that coverage numbers reflect what was actually attempted.

Sidenote on size heuristics. The naive heuristic "a BIOS payload is whatever is bigger than 4 MB" works most of the time and fails in three cases worth knowing about: ME firmware updates (look like a BIOS but the .fl1 is much smaller), GPU VBIOS hotfixes (occasionally large enough to trip a size-only test), and combo packages that wrap a BIOS + driver bundle in one installer. The listing-based check handles all three correctly because it looks at the actual payload structure.
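The listing-based check reduces to a small predicate over the text that innoextract --list prints. The sketch below assumes only that each payload file name appears somewhere on a listing line; it does not depend on the exact listing format. is_bios_listing and list_package are hypothetical names, and the sample listing is made up.

```python
import re
import subprocess

# Extensions the text treats as evidence of a BIOS payload.
BIOS_EXT = re.compile(r"\.(fl1|fl2|cap)\b", re.IGNORECASE)


def is_bios_listing(listing: str) -> bool:
    """True if any line of an archive listing names a .FL1/.FL2/.CAP file."""
    return any(BIOS_EXT.search(line) for line in listing.splitlines())


def list_package(path: str) -> str:
    """Return the innoextract listing for a package without extracting it."""
    result = subprocess.run(["innoextract", "--list", path],
                            capture_output=True, text=True, check=False)
    return result.stdout


# Made-up listings standing in for real innoextract output.
bios_like = ' - "app\\WinFlash.exe"\n - "app\\image.FL1"\n'
driver_like = ' - "app\\audio_setup.exe"\n - "app\\readme.txt"\n'
print(is_bios_listing(bios_like))     # True
print(is_bios_listing(driver_like))   # False
```

A package that fails the predicate would be recorded as a non-BIOS payload rather than as an extraction failure, which keeps the coverage numbers honest.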

04 Extraction: from .exe to ACPI tables

Each step is reversible and well-understood; the trouble is in handling all of them at once.

Once a BIOS-class package has been identified, the work is to walk it down to ACPI tables. The path looks like this:

<id>w.exe                      (InnoSetup WinFlash installer)
   |
   |  innoextract              (Inno format, falls back to 7z then binwalk)
   v
WinFlash.exe + image.fl1 + readme.txt + *.PAT    (extracted payload)
   |
   |  carve at first _FVH      (skips installer header; lands at firmware-volume start)
   v
raw UEFI flash image, FFSv1/v2/v3 fileset
   |
   |  uefiextract              (primary; UEFITool CLI, LongSoft/UEFITool)
   |  uefi-firmware-parser     (fallback for cleaner ACPI carve on known-good Intel images)
   |  nested-FFSv2 carve       (fallback for AMD/Phoenix wrapper-FV layouts)
   v
file tree by GUID, including AcpiTableStorage FFS (GUID 7E374E25-...)
   |
   |  parse FFS sections; pull out raw ACPI tables; dedup by SHA-256
   v
*.aml                          (DSDT, SSDT1..N, FACP, RSDP, ...)
   |
   |  iasl -d                  (ACPICA disassembler)
   v
*.dsl                          (human-readable ASL, the analysis-ready form)

No part of that path is novel in isolation. The interesting work is in the cases where the path breaks — and in the diversity of those cases across the archive.
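The "carve at first _FVH" step is easy to get subtly wrong, because the _FVH signature is not the first field of a firmware-volume header: it sits 0x28 bytes in, after the zero vector, the file-system GUID and the volume length. A minimal sketch, with hypothetical function names and a synthetic image:

```python
FVH_SIGNATURE = b"_FVH"
FVH_SIG_OFFSET = 0x28   # signature position inside EFI_FIRMWARE_VOLUME_HEADER


def first_fv_offset(blob: bytes) -> int:
    """Offset of the first firmware-volume header in blob, or -1 if none."""
    pos = blob.find(FVH_SIGNATURE)
    while pos != -1:
        if pos >= FVH_SIG_OFFSET:
            return pos - FVH_SIG_OFFSET
        # A hit too close to the start cannot be a real header; keep looking.
        pos = blob.find(FVH_SIGNATURE, pos + 1)
    return -1


def carve(blob: bytes) -> bytes:
    """Drop everything before the first firmware volume."""
    start = first_fv_offset(blob)
    if start < 0:
        raise ValueError("no _FVH signature found")
    return blob[start:]


# Synthetic image: 100 bytes of installer header, then a fake FV header.
image = b"\x00" * 100 + b"\xAA" * FVH_SIG_OFFSET + b"_FVH" + b"\x00" * 16
print(first_fv_offset(image))   # 100
```

A production carve would also validate the header checksum and declared length before trusting the hit; this sketch only locates the candidate start.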

Why uefiextract is the primary FV extractor

The Lenovo archive spans Phoenix, AMI and Insyde flavored BIOSes, sometimes with FFSv1, FFSv2 and FFSv3 files within the same image; with LZMA, Tiano-EFI compression, LZMA-F86 variants; with Phoenix's wrapper-FV layout where the inner FVs are themselves compressed inside an outer FV. UEFITool's uefiextract is the union of all of those handlers in one tool. Promoting it from "fallback when uefi-firmware-parser fails" to primary turned a 7-of-12 extraction rate on the diverse sample into 12 of 12, with uefi-firmware-parser kept as a fallback (it sometimes produces a cleaner ACPI carve on known-good Intel images), and a nested-FFSv2 carve handling the AMD / Phoenix wrapper-FV layout as a last resort.

The AcpiTableStorage FFS

EDK2-derived firmware uses a well-known FFS file GUID for the firmware's ACPI table store: EFI_ACPI_TABLE_STORAGE_FILE_GUID = 7E374E25-8E01-4FEE-87F2-390C23C606CD. A BIOS that follows the convention carries the compiled DSDT, SSDTs and supporting tables inside a single FFS file of that GUID, with each table as a raw section. Dedup-by-hash is necessary because some BIOSes carry a duplicate copy of the AcpiTableStorage FFS as a fallback (for example, when a recovery FV is allowed to override the main one).

The output of this step is a directory of .aml files plus the ASL decompilation; nothing else in the pipeline needs to touch the raw firmware image, and downstream tools work entirely off the ASL plus the per-FV file tree that uefiextract left behind.
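The dedup step can be sketched as follows. Every ACPI table starts with a 4-byte signature followed by a 4-byte little-endian length, which is enough to label a raw section; duplicates are then collapsed by SHA-256. The helper names and the toy tables are illustrative only.

```python
import hashlib
import struct


def table_info(raw: bytes) -> tuple:
    """Read signature and declared length from a standard ACPI table header."""
    signature, length = struct.unpack_from("<4sI", raw, 0)
    return signature.decode("ascii"), length


def dedup_tables(tables: list) -> dict:
    """Keep one copy of each table, keyed by the SHA-256 of its bytes."""
    unique = {}
    for raw in tables:
        unique.setdefault(hashlib.sha256(raw).hexdigest(), raw)
    return unique


def fake_table(sig: bytes, body: bytes) -> bytes:
    # Toy table: signature, little-endian length, then payload (no real header).
    return sig + struct.pack("<I", 8 + len(body)) + body


dsdt = fake_table(b"DSDT", b"\x01\x02")
ssdt = fake_table(b"SSDT", b"\x03")
unique = dedup_tables([dsdt, ssdt, dsdt])   # main FV plus a duplicate copy
print(len(unique), sorted(table_info(t)[0] for t in unique.values()))
# 2 ['DSDT', 'SSDT']
```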

05 The Intel GPIO resolution problem

Why the DSDT, on its own, is necessary but not sufficient on Intel.

On AMD ThinkPads (and on the small number of Qualcomm-based ones) the DSDT is the whole answer. A GpioInt resource for the fingerprint reader carries the AGPIO pin number directly:

// AMD-style: pin number is a literal
GpioInt (Edge, ActiveLow, Exclusive, PullUp, ...,
         "\\_SB.GPIO", 0, ResourceConsumer) { 0x002B }   // AGPIO 43

On Intel ThinkPads, the same descriptor passes through GNUM(FIELDNAME):

// Intel-style: pin number is a method call
GpioInt (Level, ActiveLow, Shared, PullDefault, ...,
         "\\_SB.PCI0.GPI0", 0, ResourceConsumer) { GNUM(GFPI) }

GNUM reads a field out of GNVS (the ACPI-NVS region) and decodes it into a pin number relative to the named controller. The decoded value is an Intel GpioLib GPIO_PAD immediate that AcpiPlatform wrote at boot. Until you find that write, the DSDT alone tells you only that the fingerprint interrupt is "the pin GNVS field GFPI says it is" — which is not a pin number.

The static information required to recover the answer lives in three places:

  1. The DSDT tells you which device's _CRS references which GNVS field. From this alone you get the (field, device) join — e.g. field GFPI → device FPNT.
  2. AcpiPlatform, the DXE driver, contains the runtime writes that populate GNVS. Statically extracting those writes from the PE32+ image gives you a list of (field, GPIO_PAD immediate). The same field is sometimes written multiple times under different conditional branches (board-variant switches), which is the hard half of the problem.
  3. PlatformInit's PADCFG table tells you, for each pad on the board, the configured mode, direction, pull, and lock state. Once you have a candidate pad for a device, you cross-check PADCFG to confirm the pad is configured for the device's role — for example, an SPI-CS pin must be in native function 1 with TX enabled.

Joining the three sources is mechanically straightforward but practically fiddly. AcpiPlatform is compiled x86 (or x86-64) code; the writes to GNVS are typically mov [GnvsBase + offset], imm32 sequences, sometimes preceded by a conditional that selects between multiple imm32 candidates. gpio_resolve.py walks the disassembly, collects every candidate immediate per field, and emits them with their guard conditions. The PADCFG cross-check then resolves the candidate set to a single pad per device.
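To make the store pattern concrete, here is a deliberately naive byte-pattern scan for one encoding of it: C7 87 followed by a 32-bit displacement and a 32-bit immediate, which is mov dword [rdi + disp32], imm32 on x86-64. This is a sketch only. It assumes the GNVS base is in rdi, it ignores guard conditions, and a raw byte scan can hit false positives inside other instructions, which is why a real resolver walks a proper disassembly instead. All names are hypothetical.

```python
import struct

# C7 /0 with ModRM 0x87: mov dword [rdi + disp32], imm32.
OPCODE = b"\xC7\x87"
INSN_LEN = 10   # 2 opcode/ModRM bytes + 4 displacement + 4 immediate


def find_gnvs_stores(code: bytes) -> list:
    """Return (gnvs_offset, immediate) for each matching store in code."""
    stores = []
    pos = code.find(OPCODE)
    while pos != -1 and pos + INSN_LEN <= len(code):
        disp, imm = struct.unpack_from("<II", code, pos + 2)
        stores.append((disp, imm))
        pos = code.find(OPCODE, pos + INSN_LEN)
    return stores


def store(disp: int, imm: int) -> bytes:
    # Assemble one store by hand so the example needs no assembler.
    return OPCODE + struct.pack("<II", disp, imm)


code = b"\x90" + store(0x10A, 0x01000016) + store(0x10E, 0x01000017) + b"\xC3"
for offset, imm in find_gnvs_stores(code):
    print(f"GNVS+0x{offset:X} <- 0x{imm:08X}")
```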

For board-invariant pins — a fingerprint sensor wired the same way across every variant of a model — the candidate set is a single immediate and the resolution is exact. For board-variant pins — a GNSS or BT antenna routed differently depending on factory-installed radio — the candidate set carries several, and only the live board can tell you which one is yours. The toolkit emits the per-variant table directly so that follow-up work (or a collect_gpio.sh capture on hardware) can resolve it.

06 A walkthrough: fingerprint reader on ThinkPad 13 Skylake

End-to-end resolution on a real BIOS, from the carved DSDT to the named GPP pad.

The ThinkPad 13 (1st gen, Skylake-U) is a useful walkthrough target: it is single-SoC (Sunrise Point-LP, INT344B), its DSDT is small enough to read end-to-end, and its fingerprint reader is board-invariant. The BIOS used here is r0buj26ww, the November 2022 update; the input was an 8.1 MB .exe downloaded from Lenovo Support.

Step 1: identify the device in the DSDT

Carve ACPI tables, decompile, grep for the fingerprint device. The relevant fragment looks like:

Device (FPNT) {
  Name (_HID, "VFS5011")
  Method (_STA, 0, NotSerialized) { Return (0x0F) }
  Method (_CRS, 0, NotSerialized) {
    Return (ResourceTemplate () {
      SpiSerialBusV2 (0x0001, PolarityLow, FourWireMode, 8,
                     ControllerInitiated, 0x007A1200, ClockPolarityLow,
                     ClockPhaseFirst, "\\_SB.PCI0.SPI1",
                     0, ResourceConsumer, , Exclusive, )
      GpioInt (Level, ActiveLow, Shared, PullDefault, 0,
               "\\_SB.PCI0.GPI0", 0, ResourceConsumer, , )
        { GNUM(GFPI) }
    })
  }
}

Two GPIOs touch this device: an SPI chip-select that is implicit in the SpiSerialBusV2 resource (pin owned by the SPI controller, not declared here), and the interrupt line that arrives as GpioInt with pin number GNUM(GFPI). The chip-select pin is declared elsewhere; the interrupt pin's identity depends on the GNVS field GFPI.

Step 2: find the AcpiPlatform writes to GNVS

AcpiPlatform.efi sits in the firmware volume tree under its own GUID. The disassembly contains, near the end of InstallAcpiPlatform, a sequence of stores to the GNVS region:

; ... earlier code computes GnvsBase into rdi
mov     dword [rdi + 0x10A], 0x01000016   ; GFPI = SPT-LP, group A, pad 0x16 (22)
mov     dword [rdi + 0x10E], 0x01000017   ; GFPS = SPT-LP, group A, pad 0x17 (23, same group)
mov     dword [rdi + 0x112], 0x01000040   ; GPLI = SPT-LP, group A, pad 0x40 (64)
...

The offsets 0x10A, 0x10E, 0x112 correspond to the GNVS field declarations the DSDT emits in its OperationRegion (GNVS, ...) Field blocks. Joining offsets to field names yields:

GFPI -> 0x01000016
GFPS -> 0x01000017
GPLI -> 0x01000040
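The offset-to-name join can be sketched by walking a Field block the way ASL lays it out: each entry advances a bit cursor by its width, and Offset() jumps to an absolute byte position. The layout list below is a hypothetical slice of the GNVS Field block, chosen to match the three offsets above; field_offsets is an illustrative name.

```python
def field_offsets(layout: list) -> dict:
    """Map byte offset -> field name for a list of (name, value) entries.

    ("Offset", n) jumps to absolute byte n, as ASL's Offset() does; any
    other entry is (field_name, width_in_bits). Only byte-aligned fields
    are recorded.
    """
    offsets = {}
    bit = 0
    for name, value in layout:
        if name == "Offset":
            bit = value * 8
            continue
        if bit % 8 == 0:
            offsets[bit // 8] = name
        bit += value
    return offsets


# Hypothetical slice of the GNVS Field block.
layout = [("Offset", 0x10A), ("GFPI", 32), ("GFPS", 32), ("GPLI", 32)]
# Stores recovered from AcpiPlatform, as (gnvs_offset, immediate).
stores = [(0x10A, 0x01000016), (0x10E, 0x01000017), (0x112, 0x01000040)]

names = field_offsets(layout)
resolved = {names[off]: imm for off, imm in stores if off in names}
for field, imm in resolved.items():
    print(f"{field} -> 0x{imm:08X}")
```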

Step 3: decode the immediate

Decoding 0x01000016 as CC=0x01, GG=0x00, NNNN=0x0016 gives chipset SPT-LP, group A, pad index 22 — or GPP_A22 in coreboot nomenclature. The interrupt sibling, GFPS = 0x01000017, is GPP_A23.

Step 4: PADCFG cross-check

PlatformInit's PADCFG table is a sequence of 12-byte triples, { GpioPad, PADCFG_DW0, PADCFG_DW1 }. Pulling out the entry for GPP_A22:

GPP_A22:
  PADCFG_DW0 = 0x44000300
  PADCFG_DW1 = 0x00003000
  Decoded:
    PMODE   = 1  (native function 1: SPI1_CS#)
    RXDIS   = 0  TXDIS = 0
    GPIROUT* = none
    TERM    = NoPullPad
    LOCKED  = no (held in group LOCK register; checked separately)

The native function is SPI1_CS, which matches the device's role as the chip-select of an SPI fingerprint reader; the role check passes. The interrupt sibling GPP_A23 is configured as a GPIO input with IOxAPIC routing, which matches its role as the fingerprint interrupt line.
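Reading the table of 12-byte triples is a one-liner with struct. A minimal sketch, assuming little-endian 32-bit fields in the order { GpioPad, PADCFG_DW0, PADCFG_DW1 }; the entry values are placeholders rather than data from a real image, and parse_padcfg_table is a hypothetical name.

```python
import struct

ENTRY = struct.Struct("<III")   # GpioPad, PADCFG_DW0, PADCFG_DW1


def parse_padcfg_table(blob: bytes) -> dict:
    """Map GPIO_PAD immediate -> (DW0, DW1) from a packed table of triples."""
    usable = len(blob) - len(blob) % ENTRY.size   # drop a trailing partial entry
    return {pad: (dw0, dw1)
            for pad, dw0, dw1 in ENTRY.iter_unpack(blob[:usable])}


# Two made-up entries; the DW0/DW1 values are placeholders.
blob = (ENTRY.pack(0x01000016, 0x11111111, 0x22222222)
        + ENTRY.pack(0x01000017, 0x33333333, 0x44444444))

table = parse_padcfg_table(blob)
dw0, dw1 = table[0x01000016]
print(f"DW0=0x{dw0:08X} DW1=0x{dw1:08X}")   # DW0=0x11111111 DW1=0x22222222
```

Keying the result by the GPIO_PAD immediate means the value recovered from AcpiPlatform can be used directly as the lookup key, with no intermediate name mapping.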

ThinkPad 13 Skylake: fingerprint reader resolves to GPP_A22 (CS) / GPP_A23 (INT)

Resolution exact (board-invariant, no candidate ambiguity); PADCFG mode and direction consistent with role; cross-verified against the DSDT SpiSerialBusV2+GpioInt pair.

sample: r0buj26ww.exe · chipset: SPT-LP (INT344B) · tool chain: acpi_extract → gpio_resolve → gpio_padmap

07 PADCFG decode: not coreboot folklore

Why the toolkit reads Intel 332691 rather than chasing community pad-config tables.

intelp2m (in coreboot) is the most popular community tool for decoding PADCFG into PAD_CFG_* macros. It is good, well-maintained, and sufficient for most coreboot work. It is also downstream — lagging the Intel datasheets by some number of months whenever a new PCH generation ships, and occasionally simplifying the decode (collapsing rare flag combinations to the closest-fit macro). For a coreboot port that is fine. For a security report that needs to be precise about what "locked" means and what the interrupt routing actually targets, it is not.

The toolkit decodes PADCFG directly from Intel 100-Series PCH Datasheet Volume 2 (332691, the Volume 2 that covers the GPIO controller architecture), with per-SoC supplements for Cannon Lake (332687), Tiger Lake (633331), Alder Lake (645549), and the Wildcat/Sunrise Point/Kaby Lake siblings where they diverge. The decode is structured per-field rather than pattern-matched-to-macros, so when the security check needs to ask "is GPIROUTSMI set in this pad's PADCFG_DW0, and is the group LOCK register for group A also clear?" it can do that directly.

The fields that matter for security analysis

Field | Bits | What it tells you
GPIROUTSMI | DW0[18] | If set, this pin can trigger an SMI when its edge condition fires. Combined with RXEVCFG, this is the SMI attack surface for the pad.
GPIROUTNMI | DW0[17] | NMI routing. Less common; when present, often used for chassis-intrusion or watchdog inputs.
GPIROUTSCI | DW0[19] | SCI (System Control Interrupt) routing. Used for wake events, lid switches, hot-plug detection.
GPIROUTIOXAPIC | DW0[20] | IOxAPIC routing: this pin shows up as a normal device interrupt to the OS.
PMODE | DW0[12:10] | 0 = GPIO, nonzero = native functions. SPI, I2C, UART pins live here at native modes.
RXDIS / TXDIS | DW0[9:8] | Input/output disable. An RXDIS=1 pad cannot be sampled by software no matter the mode.
Group LOCK / LOCKTX | per-group register | Once set, prevents further writes to PADCFG (or its TX state) until reset.

The combination of GPIROUTSMI=1 with the group LOCK clear is the canonical "real attack surface" signal: the pin will fire an SMI on its configured edge, and the firmware did not lock the configuration, so privileged software (or a malicious DXE driver in a future boot) could rewrite the edge condition and re-route it. The GPIO security report sorts pads by exactly this combination, with privacy device GPIOs (camera kill, mic mute, fingerprint power, TPM provisioning) called out separately because they have their own spoofability story regardless of SMI routing.
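The sorting rule can be expressed as a small classifier over already-decoded pad state. This is a sketch of the rule as described, not the toolkit's code: PadState, risk_class and the bucket names are hypothetical, and the lock bit is modelled as unknown when no live capture is available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PadState:
    name: str
    smi_routed: bool          # GPIROUTSMI
    rx_disabled: bool         # RXDIS
    locked: Optional[bool]    # group LOCK bit; None without a live capture


def risk_class(pad: PadState) -> str:
    """Bucket a pad by SMI routing and lock state."""
    if not pad.smi_routed or pad.rx_disabled:
        return "not-smi-input"
    if pad.locked is None:
        return "smi-input-lock-unknown"
    return "smi-input-locked" if pad.locked else "smi-input-unlocked"


# Illustrative pads, not taken from a real board.
pads = [
    PadState("GPP_A1", smi_routed=True, rx_disabled=False, locked=False),
    PadState("GPP_A2", smi_routed=True, rx_disabled=False, locked=True),
    PadState("GPP_A3", smi_routed=False, rx_disabled=False, locked=None),
]
for pad in pads:
    print(pad.name, risk_class(pad))
```

Keeping "lock unknown" as its own bucket avoids reporting a firmware-only analysis as if the lock state had been observed.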

08 Native-mode elimination

Solving for board-variant pins without the live board, by ruling out everything else.

When AcpiPlatform writes multiple candidate immediates into the same GNVS field, the writes are guarded by a switch on a board identifier read at boot. The disassembly looks like:

cmp     al, 0x01           ; board_info->variant == 1?
jne     .v2
mov     dword [rdi + 0x118], 0x01000045   ; GBTI = GPP_A69 on variant 1
jmp     .end
.v2:
cmp     al, 0x02
jne     .v3
mov     dword [rdi + 0x118], 0x01000047   ; GBTI = GPP_A71 on variant 2
jmp     .end
.v3:
mov     dword [rdi + 0x118], 0x0102000C   ; GBTI = GPP_C12 on variant 3
.end:

Without the live board's variant byte, all three candidates remain. The static elimination trick is to look at the PADCFG table for each candidate and check which ones could possibly serve the device's role. If GPP_A71 is configured as native function 2 in the PADCFG table (let's say I2C2_SDA), it cannot also be the Bluetooth host-wake input that GBTI is feeding, because it is reserved for I2C. Variant 2 is eliminated. The candidate set can only shrink: sometimes to one, sometimes to two, and occasionally it stays at three, but it is never larger than the union of all variants.

The same trick narrows board ID itself when the per-variant tables are disjoint: if exactly one variant's pad set is consistent with the live PADCFG, that's the variant. gpio_resolve emits per-variant candidate sets and a confidence per variant, and the report at the top-level says "this device is GPP_A69 on board variant 1, GPP_C12 on variant 3, and undefined on variant 2". That is the firmware's contribution; the rest needs the live board.
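The elimination step can be sketched as a filter over the per-variant candidates. The sketch assumes the device needs its pad in plain GPIO mode (PMODE == 0), as the Bluetooth host-wake input in the example does; a device that needs a native function would test for that mode instead. The PMODE values below are illustrative and eliminate is a hypothetical name.

```python
def eliminate(candidates: dict, pmode: dict) -> dict:
    """Drop variants whose candidate pad is configured in a native mode.

    candidates maps board variant -> pad name; pmode maps pad name -> PMODE
    from the PADCFG table. Pads missing from pmode are kept, so that missing
    data never eliminates a candidate.
    """
    return {variant: pad for variant, pad in candidates.items()
            if pmode.get(pad, 0) == 0}


# Candidates from the GBTI example; PMODE values are illustrative.
candidates = {1: "GPP_A69", 2: "GPP_A71", 3: "GPP_C12"}
pmode = {"GPP_A69": 0, "GPP_A71": 2, "GPP_C12": 0}

print(eliminate(candidates, pmode))   # {1: 'GPP_A69', 3: 'GPP_C12'}
```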

The EC firmware reads board ID from hardware straps. That is why the EC firmware is not, by itself, enough to pin down the variant: the bytes that decide it never enter the image at all. They live in physical pull-up/pull-down resistors on specific GPIO straps, sampled at first power-on by the EC and stored in EC SRAM. Only a live machine knows.

09 Vendor matrix: Intel, AMD, Qualcomm

The same pipeline, three quite different GPIO models.

The vendor of the GPIO controller in the DSDT is the dispatch key. The toolkit's vendor.py looks at the _HID of the GPIO controller device and picks the resolution path; the rest of the pipeline runs identically regardless of vendor, except that the Intel-only resolver is skipped on AMD and Qualcomm images (their DSDTs already contain literal pad numbers).

Vendor | Controller _HID | GPIO model | Resolution path | Pad naming | Status
Intel | INT34xx / INT33FF | GNVS-indirected, board-variant gated | full pipeline: AcpiPlatform immediates + PADCFG native-mode elimination | GPP_<bank><n> | 21/21 on TP13 Skylake
AMD | AMD0030 / AMDI0030 | AGPIO index direct in DSDT (no GNVS, no switch) | gpio_report.py alone resolves it | AGPIO<n> | 18/18 on A275
Qualcomm | QCOM… | direct in DSDT (like AMD) | gpio_report.py | GPIO<n> | stub; validate on X13s sample
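The dispatch on controller _HID can be sketched as a short pattern table. The patterns follow the _HID column of the matrix; Qualcomm is matched by prefix only because the full id is not given, and the QCOM0000 sample below is a placeholder rather than a real id. This is an illustration of the idea, not the contents of vendor.py.

```python
import re

# (vendor, pattern) pairs derived from the _HID column of the matrix.
VENDOR_PATTERNS = [
    ("intel", re.compile(r"^(INT34[0-9A-F]{2}|INT33FF)$")),
    ("amd", re.compile(r"^(AMD0030|AMDI0030)$")),
    ("qualcomm", re.compile(r"^QCOM")),
]


def vendor_for_hid(hid: str) -> str:
    """Pick the resolution path from a GPIO controller _HID string."""
    normalized = hid.strip().upper()
    for vendor, pattern in VENDOR_PATTERNS:
        if pattern.match(normalized):
            return vendor
    return "unknown"


# QCOM0000 is a placeholder id; PNP0C09 is the ACPI embedded-controller id.
for hid in ("INT344B", "AMDI0030", "QCOM0000", "PNP0C09"):
    print(hid, "->", vendor_for_hid(hid))
```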

acpi_extract.py handles both the Intel layout (a single FV with a standard AcpiTableStorage FFS at the top level) and the AMD / Phoenix nested-FFSv2 layout (a wrapper FV holding compressed inner FVs, with the real AcpiTableStorage one level deeper). The nested case took several iterations to get right because Phoenix's compression markers vary across versions; the toolkit handles the three variants observed so far in the archive.

An optional AMD extension decodes the FCH (Fusion Controller Hub) GPIO control register per pin from the AMD Platform Programming Reference, which gives full pad-config documentation comparable to the Intel PADCFG decode. It is not needed for device→pad resolution (because the DSDT already carries the pin number on AMD), but it is needed for the security report's pad-lock analysis on AMD boards.

10 The Embedded Controller: an ITE 8051 with a custom SFR map

A second CPU, hidden in plain sight, with its own firmware and its own attack surface.

Almost every ThinkPad ships with an embedded controller (EC), a small 8-bit microcontroller on the LPC (or eSPI) bus that handles power button press, battery gas-gauging, fan PWM, charge control, lid switch, keyboard hotkeys, thermal sensor readout, and the keyboard backlight. The EC's firmware is independent of the BIOS proper and runs continuously from S5 onwards; the host CPU talks to it through a small region of LPC-mapped RAM (the EC-RAM) and a command/status register pair.

On ThinkPads, the EC is overwhelmingly an ITE part — an 8051-family MCU (MCS-51 instruction set), executing a vendor firmware image of roughly 64–128 KB. On the ThinkPad 13 Skylake the EC is an ITE 8051 v14.4, ~111 KB, reset vector at 0x0070, carved cleanly from a known region of the .fl1. AMD ThinkPads sometimes ship a Renesas part instead, with a different instruction set; the toolkit's MCS-51 disassembler does not handle those yet.

What the EC firmware contains

Working from the disassembly, the EC firmware breaks down into:

SFR-aware disassembly

The 8051 addresses its Special Function Registers through a separate direct-address space (0x80–0xFF), distinct from the upper internal RAM that is reached by indirect addressing. The base SFR map is standardised; ITE adds its own custom SFRs on top, in roughly the same range, for the host-interface registers (EC-RAM, command/status, SMI control) and the GPIO ports. The toolkit's MCS-51 disassembler annotates both: a mov A, P3 becomes mov A, P3 ; GPIO port 3; a mov A, 0x9E becomes mov A, EC_CMD ; ITE host command reg. With those annotations, finding every host-interface site in the firmware is a grep instead of a multi-day reverse-engineering effort. Coverage on the MCS-51 instruction set is ~99% with named annotations on most Lenovo/ITE SFR ranges.
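The annotation pass can be sketched as a lookup over direct-address operands. The standard entries below are a subset of the base MCS-51 SFR map; the single vendor entry (0x9E as EC_CMD) is the one named in the text, and everything else about the ITE map is left out. Immediate operands, written with a leading #, are deliberately skipped. The function and table names are hypothetical.

```python
import re

# Subset of the standard MCS-51 SFR map.
STANDARD_SFRS = {0x80: "P0", 0x81: "SP", 0x82: "DPL", 0x83: "DPH",
                 0x90: "P1", 0xA0: "P2", 0xB0: "P3", 0xD0: "PSW",
                 0xE0: "ACC", 0xF0: "B"}
# The one vendor register named in the text: address -> (name, note).
VENDOR_SFRS = {0x9E: ("EC_CMD", "ITE host command reg")}

# A two-digit hex operand that is not an immediate (#0x..) or part of a word.
OPERAND = re.compile(r"(?<![#\w])0x([0-9A-Fa-f]{2})\b")


def annotate(line: str) -> str:
    """Replace a direct SFR address in one disassembly line with its name."""
    match = OPERAND.search(line)
    if not match:
        return line
    addr = int(match.group(1), 16)
    head, tail = line[:match.start()], line[match.end():]
    if addr in VENDOR_SFRS:
        name, note = VENDOR_SFRS[addr]
        return f"{head}{name}{tail} ; {note}"
    if addr in STANDARD_SFRS:
        return f"{head}{STANDARD_SFRS[addr]}{tail}"
    return line


print(annotate("mov A, 0x9E"))    # mov A, EC_CMD ; ITE host command reg
print(annotate("mov A, 0xB0"))    # mov A, P3
print(annotate("mov A, #0x9E"))   # mov A, #0x9E
```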

Why the EC matters for the security report: the EC drives several of the "privacy" indicators (camera kill, mic mute) and is involved in firmware update flows (SMM → EC → SPI). A correct attack-surface report needs to account for the EC's role in mediating those signals; a sufficiently capable attacker that can address the EC has paths into the host that are not always closed by SMM hardening alone.

11 Security: GPIO attack surface and spoofable privacy pins

Two distinct vulnerability classes, each derivable from the same per-pad PADCFG data.

SMI / NMI routable inputs

A pin configured as input with GPIROUTSMI=1 will trigger an SMI when its configured edge condition (rising, falling, level high, level low) is met. SMIs vector into SMM, which runs at the highest CPU privilege level and is therefore an interesting target: a primitive that lets an attacker influence the flow into SMM is the first step in many BIOS-level privilege escalations.

The first-order filter is "input + SMI routed". Not every such pin is reachable by a software attacker — some live on internal traces only, some are physically inaccessible to a non-root user, some are configured as level-high with an external pull that prevents firing in practice. The report flags candidates and leaves the physical-access part for a human pass; that is the right scope for a static pipeline.

The much more interesting signal is the lock state. The Intel PCH has a per-group LOCK register that, once set, prevents further writes to PADCFG until reset. Firmware sets the LOCK on critical pads after configuring them, so that later code (including a malicious DXE driver) cannot rewrite the routing. An unlocked SMI-routable input is qualitatively different from a locked one: the unlocked case lets a future code path re-aim the SMI to a chosen edge or invert the polarity, both of which can convert a benign signal into an attacker primitive. The toolkit cross-references PADCFG against the group LOCK register state captured in a collect_gpio.sh run from a live ThinkPad and reports unlocked + SMI-routable as the genuine risk class.

Software-controlled privacy GPIOs

A second class of report findings is the privacy / security device GPIO: a pin that controls something a user trusts visually (a camera kill switch, a microphone mute state, a TPM "physical presence" line, a fingerprint reader power line) but that can be written to from software. The trust assumption behind a privacy LED is that a lit LED means the camera is on and a dark LED means the camera is off — that the indicator and the underlying device share fate. If the indicator is driven by a software-controlled GPIO, that fate-sharing is software-mediated, and software can lie about it.

Hard-wired privacy switches (a physical slide that opens/closes a circuit before the camera's power line) do not have this problem and are the right answer to it. Most ThinkPads with a "ThinkShutter" sliding cover are in this category. ThinkPads that use software mute or software camera-disable, with no hardware interlock, are in the soft-privacy category and the report flags them as such.

Privacy GPIO posture is a per-model property derivable statically

gpio_security.py identifies which devices in the DSDT carry a software-controlled GPIO matching the privacy / kill-switch heuristic (camera, mic, fingerprint, TPM provisioning). With a live capture, the report adds the lock-state field. The result is a per-model posture summary: which privacy indicators on this ThinkPad model are spoofable from privileged software.

tool: gpio_security.py · live-capture input: collect_gpio.sh

12Security: CVE intelligence and module-level diff

From "the BIOS readme says CVE-2022-xxxxx is fixed" to "here is the UEFI module that grew".

Lenovo's BIOS readme files include a section that, on most models, lists the security updates included in the new revision: CVE IDs, Lenovo advisory IDs (LEN-12345), the subsystems touched (SMM, BootGuard, TXT, microcode, TPM), and sometimes a short text description. The format drifts gently over the years and across product lines, so the readme miner is intentionally schema-light: it looks for anything that matches a CVE pattern (CVE-\d{4}-\d+), a Lenovo advisory pattern (LEN-\d+), and the standard subsystem keywords (SMM, SMI, TPM, microcode, BootGuard, TXT, flash, secure boot), and then groups them per BIOS revision.
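A schema-light miner of this kind might look like the sketch below. The three patterns and the keyword list come from the description above; the `mine_readme` name and the assumption that the readme has already been split into `(revision, text)` pairs are illustrative only.

```python
import re
from collections import defaultdict

CVE_RE = re.compile(r"CVE-\d{4}-\d+")
LEN_RE = re.compile(r"LEN-\d+")
KEYWORDS = ("SMM", "SMI", "TPM", "microcode", "BootGuard", "TXT",
            "flash", "secure boot")
KEYWORD_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def mine_readme(sections):
    """Group CVE ids, advisory ids and subsystem keywords per BIOS revision.

    sections: iterable of (bios_revision, text) pairs; how the readme is
    split into revisions is outside this sketch.
    """
    canonical = {k.lower(): k for k in KEYWORDS}
    out = defaultdict(
        lambda: {"cves": set(), "advisories": set(), "subsystems": set()}
    )
    for revision, text in sections:
        entry = out[revision]
        entry["cves"].update(CVE_RE.findall(text))
        entry["advisories"].update(LEN_RE.findall(text))
        entry["subsystems"].update(
            canonical[hit.lower()] for hit in KEYWORD_RE.findall(text)
        )
    return dict(out)
```

Case-insensitive keyword matching is a deliberate trade-off here: it tolerates format drift, but short tokens such as TXT or flash can also match file names and flashing instructions, so the hits are candidates rather than facts.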

In the archive sample so far, the miner surfaces 15 unique CVEs and 20 Lenovo advisories across 18 models. Spectre, MDS and Foreshadow fixes appear repeatedly because they were mitigated incrementally across many microcode and SMM revisions, and the dataset gives a clean fleet view of who was patched when. The output is a per-model inventory and a fleet CVE→model index that lets you ask "which models, on which BIOS versions, fixed CVE-2022-xxxxx?".
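The fleet index is the per-model inventory turned inside out. A minimal sketch, assuming the inventory is a nested mapping of model to BIOS version to CVE ids (the shape and the `build_cve_index` name are illustrative, not the toolkit's format):

```python
from collections import defaultdict


def build_cve_index(inventory):
    """Invert {model: {bios_version: {cve, ...}}} into {cve: [(model, version)]}."""
    index = defaultdict(set)
    for model, revisions in inventory.items():
        for version, cves in revisions.items():
            for cve in cves:
                index[cve].add((model, version))
    # Sorted lists make the report stable from run to run.
    return {cve: sorted(hits) for cve, hits in index.items()}
```

Answering "which models fixed this CVE, and where?" is then a single dictionary lookup.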

Module-level diff: where did the fix land?

The natural next question is where in the binary the fix landed. The module-diff tool answers it. Two BIOS images are run through uefiextract, every File in the resulting tree is hashed by its body contents, and the trees are diffed: which modules changed, which were added or removed, and how much each one grew or shrank. A "module" here is a UEFI File keyed by its UI Section name when available (e.g. FlashUtilitySmm, TcgPei) and by its GUID otherwise.
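The diff step itself is small once both trees are loaded. The sketch below assumes each image has already been reduced to a mapping from module key to body bytes; walking the uefiextract dump to build that mapping is left out, and `module_key` and `diff_modules` are hypothetical names.

```python
import hashlib


def module_key(ui_name, guid):
    """Key a UEFI File by its UI Section name when present, else by GUID."""
    return ui_name or guid


def diff_modules(old, new):
    """Compare two {module key: body bytes} mappings.

    Returns added and removed keys plus, for every module whose body hash
    differs, the size delta in bytes (positive means the module grew).
    """
    def digest(body):
        return hashlib.sha256(body).hexdigest()

    changed = {}
    for key in old.keys() & new.keys():
        if digest(old[key]) != digest(new[key]):
            changed[key] = len(new[key]) - len(old[key])
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": changed,
    }
```

Hashing the body rather than the whole file keeps header-only differences, such as a recomputed checksum, from showing up as changes.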

On the ThinkPad 13 transition r0buj24ww → r0buj26ww, the module diff reports 66 of 546 modules changed, with a small number of SMM-class modules growing noticeably: FlashUtilitySmm (+1.1 KB), SystemSecureFlashSleepTrapSmm (+0.4 KB), TcgPei (+0.7 KB). Each of those changes correlates well with a security claim in the new BIOS readme: an SMI handler fix, a TPM self-test fix.

The secdiff capstone

fw_secdiff.py is the auto-correlator. It takes the per-revision readme miner output and the module diff, and for each readme security claim it proposes the changed module(s) most likely to implement the fix — based on name keywords (an SMI fix tends to land in a module with Smm or SMI in the name), size delta (a fix usually grows the module), and co-occurrence across multiple revisions (a module that consistently grows alongside a particular subsystem's fixes is a strong candidate). The output is, per readme claim, a ranked list of likely-fix modules with the per-module size delta and the keyword match.
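The three signals combine naturally into a score. The sketch below is one way to do it; the keyword table, the weights and the `rank_modules` name are assumptions chosen for illustration and are not taken from fw_secdiff.py.

```python
# Hypothetical subsystem -> name-fragment table (lowercase substrings).
SUBSYSTEM_KEYWORDS = {
    "SMM": ("smm", "smi"),
    "TPM": ("tcg", "tpm"),
}


def rank_modules(subsystem, changed, cooccurrence=None):
    """Rank changed modules as likely homes for a fix in `subsystem`.

    changed:      {module name: size delta in bytes} from the module diff
    cooccurrence: {module name: number of earlier revisions in which the
                   module grew alongside a fix for the same subsystem}
    Returns (score, name, delta, keyword_hit) tuples, best candidate first.
    """
    cooccurrence = cooccurrence or {}
    fragments = SUBSYSTEM_KEYWORDS.get(subsystem, ())
    ranked = []
    for name, delta in changed.items():
        keyword_hit = any(f in name.lower() for f in fragments)
        score = 0.0
        score += 2.0 if keyword_hit else 0.0      # name matches the subsystem
        score += 1.0 if delta > 0 else 0.0        # fixes usually grow a module
        score += 0.5 * cooccurrence.get(name, 0)  # history across revisions
        if score > 0:
            ranked.append((score, name, delta, keyword_hit))
    ranked.sort(key=lambda row: (-row[0], row[1]))
    return ranked
```

Keeping the keyword hit and the size delta in each row lets a reviewer see why a module was proposed, not just that it was.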

r0buj24ww → r0buj26ww: TPM and SMM fixes correlate to TcgPei and the SMI modules

Readme claims a TPM self-test improvement → only TcgPei contains Tcg in its name and grew across the diff. Readme claims an SMM hardening → FlashUtilitySmm and SystemSecureFlashSleepTrapSmm grew and match the keyword Smm. Both attributions confirmed by hand-decompilation of the changed PE32+ images.

tools: fw_moddiff + fw_secdiff · samples: r0buj24ww, r0buj26ww

13Coreboot porting from the archived BIOS

How much of a coreboot board port can be generated from the firmware image alone.

A new coreboot board port traditionally starts with a working board, an inteltool capture, and a few days of careful hand-translation: walk the PCI tree, transcribe each device into devicetree.cb, decode the live GPIO PADCFG and translate to PAD_CFG_* macros, write a flash descriptor map, transcribe board straps. The toolkit's claim is that the archived BIOS, on its own, contains enough of that information to skip the inteltool step entirely and produce most of the boilerplate.
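The last translation step, from a decoded pad to a coreboot macro line, is mechanical. A minimal sketch, assuming the PADCFG words have already been decoded into the hypothetical `DecodedPad` record below; it covers only the basic PAD_CFG_NF, PAD_CFG_GPO, PAD_CFG_GPI and PAD_NC macros and ignores the interrupt-routing variants a real gpio.h also needs.

```python
from dataclasses import dataclass


@dataclass
class DecodedPad:
    name: str        # pad name as coreboot spells it, e.g. "GPP_A7"
    mode: int        # 0 = GPIO, 1..n = native function number
    direction: str   # "in", "out" or "none" (both buffers disabled)
    pull: str        # coreboot pull token, e.g. "NONE", "UP_20K"
    reset: str       # coreboot reset token, e.g. "DEEP", "PLTRST"
    tx_state: int = 0


def to_macro(pad):
    """Render one pad as a gpio.h table entry."""
    if pad.mode != 0:
        return f"PAD_CFG_NF({pad.name}, {pad.pull}, {pad.reset}, NF{pad.mode}),"
    if pad.direction == "out":
        return f"PAD_CFG_GPO({pad.name}, {pad.tx_state}, {pad.reset}),"
    if pad.direction == "in":
        return f"PAD_CFG_GPI({pad.name}, {pad.pull}, {pad.reset}),"
    return f"PAD_NC({pad.name}, {pad.pull}),"
```

For example, `to_macro(DecodedPad("GPP_A7", 0, "in", "NONE", "DEEP"))` yields `PAD_CFG_GPI(GPP_A7, NONE, DEEP),`.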

What can be auto-generated

coreboot artifact | Source in the OEM BIOS | Toolkit step | Confidence
devicetree.cb | DSDT PCI tree + ACPI device list | gen_devicetree.py | high
gpio.h (PAD_CFG_*) | PlatformInit PADCFG table | coreboot_gpio.py | high
board.fmd | Intel flash descriptor at offset 0x10 | flash_fmd.py | high
microcode files (.PAT) | BIOS package + carved 0x800-aligned blobs | blob_extract.py | high
VBT (iGPU) | FV section, known GUID | blob_extract.py | high
FSP binary | Integrated, often compressed, often partial | fsp_upd.py | low; use Intel reference FSP
Memory-down SPD | Verified absent in OEM BIOSes | mrc_spd.py | N/A; read live
Board strap mapping | EC firmware reads it at boot, not in image | - | requires hardware
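The board.fmd row rests on reading the region table out of the flash descriptor. A hedged sketch of that read, using the descriptor signature at offset 0x10 and the FLMAP0/FLREG layout as commonly documented for Intel descriptors; field widths vary slightly between chipset generations, and `parse_flash_regions` is an illustrative name rather than the code in flash_fmd.py.

```python
import struct

FD_SIGNATURE = 0x0FF0A55A
REGION_NAMES = ["descriptor", "bios", "me", "gbe", "pdr"]


def parse_flash_regions(image):
    """Return {region name: (offset, size)} from an Intel flash descriptor."""
    (signature,) = struct.unpack_from("<I", image, 0x10)
    if signature != FD_SIGNATURE:
        raise ValueError("no Intel flash descriptor signature at 0x10")

    # FLMAP0 follows the signature; bits 23:16 hold the region base >> 4.
    (flmap0,) = struct.unpack_from("<I", image, 0x14)
    frba = ((flmap0 >> 16) & 0xFF) << 4

    regions = {}
    for i, name in enumerate(REGION_NAMES):
        (flreg,) = struct.unpack_from("<I", image, frba + 4 * i)
        base = (flreg & 0x7FFF) << 12                    # 4 KiB units
        limit = (((flreg >> 16) & 0x7FFF) << 12) | 0xFFF
        if limit >= base:   # unused regions are encoded with base > limit
            regions[name] = (base, limit - base + 1)
    return regions
```

Each `(offset, size)` pair corresponds to one section of the flash map; naming and nesting the sections the way coreboot expects is a separate formatting step.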

The FSP reality

The Intel Firmware Support Package (FSP) is the binary blob that initializes memory, the CPU complex, and the PCH on Intel platforms. coreboot consumes it as a binary input; without it, a port cannot boot. The naive assumption is that the OEM BIOS embeds the FSP at a known offset, ready to be carved out.

The reality, validated across the sample, is more nuanced. OEM Lenovo BIOSes integrate the FSP into their PEI flow rather than carrying it as a standalone binary, and the standalone FspUpdRegion is frequently compressed inside another FFS file. A clean carve is possible on some images and not others; fsp_upd.py was tightened after confirming that the false-positive rate on naive carving was high, and it now honestly reports "FSP UPD region not present" when the integration is too aggressive to recover. For a real coreboot port, the practical answer is to use Intel's reference FSP for the matching SoC.

The memory-down SPD finding

A coreboot port for a memory-down board (one with soldered DRAM, no DIMM slot) needs a JEDEC SPD byte stream describing the DRAM's geometry, timing, and refresh parameters. The natural place to look is the OEM BIOS, which already has the same information — it must, in order to bring up memory. mrc_spd.py is a structural probe that scans every flash payload for a candidate SPD byte sequence and validates each candidate using JEDEC's CRC-16, with type and revision filtering to reject coincidental hits. The CRC implementation was end-to-end verified by reproducing the stored CRC of known-good coreboot DDR4 SPDs exactly.
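The validation step can be sketched as follows. The CRC is the unreflected CRC-16 with polynomial 0x1021 and a zero initial value; for DDR4 the base block's checksum covers bytes 0 to 125 and is stored little-endian in bytes 126 and 127. The function names are illustrative, and the sketch checks only the DDR4 base block, not the second block or other DRAM types.

```python
DDR4_DEVICE_TYPE = 0x0C   # SPD byte 2 value for DDR4 SDRAM


def crc16_jedec(data):
    """CRC-16, polynomial 0x1021, initial value 0, MSB first, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if (crc & 0x8000) else (crc << 1)
            crc &= 0xFFFF
    return crc


def is_valid_ddr4_spd(block):
    """Check the DRAM type byte and the base-block CRC of a candidate SPD."""
    if len(block) < 128 or block[2] != DDR4_DEVICE_TYPE:
        return False
    stored = block[126] | (block[127] << 8)
    return crc16_jedec(block[:126]) == stored
```

Filtering on the type byte before the CRC is what keeps all-zero or erased flash from being reported as an SPD.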

Across 8 diverse OEM ThinkPad BIOSes, the probe found zero embedded SPDs. Lenovo's MRC keeps memory-down configuration in proprietary PEI policy structures rather than as a flash SPD image, so a memory-down coreboot port must read the SPD from the live board with decode-dimms or use Intel reference values for the matching DRAM part. The detector still produces a useful geometry decode when a real SPD is present (early test images, third-party BIOSes, coreboot snapshots in the same archive).

FSP and SPD are the firmware-only path's hard limits

Two pieces of a coreboot port cannot be reliably extracted from an OEM BIOS image: the FSP binary (integrated and compressed) and a memory-down SPD (kept in proprietary PEI policy, not embedded as a flash image). For both, the toolkit reports honestly and points at the right alternate source. Everything else in the table above is recoverable directly from the archive.

tools: fsp_upd.py + mrc_spd.py · sample: 8 OEM BIOSes across Intel + AMD

14Hackintosh porting: an OpenCore skeleton from the BIOS

macOS does not run on a ThinkPad out of the box. The boilerplate it needs to get close is, however, derivable.

The Hackintosh community has converged on OpenCore as the bootloader and on a standard set of patches to make a generic Intel laptop boot a recent macOS: ACPI SSDTs for EC, USB, RTC, brightness, and sleep/wake; kernel extensions (kexts) for audio, ethernet, wifi, trackpad, and iGPU; an SMBIOS impersonation of a similar real Mac so the kernel takes the right code paths. Every Hackintosh port starts from the same boilerplate, and that boilerplate is what the toolkit auto-generates.

The SSDTs

gen_ssdt.py emits the standard set of patch SSDTs from facts in the decompiled DSDT:

The output is iasl-clean ASL: the SSDTs compile without warnings under stock ACPICA, which matters because Hackintosh ports historically ship hand-edited SSDTs that accumulate iasl warnings over years, and untangling those is a real time sink.

The kext map and the iGPU framebuffer

kext_map.py walks the device inventory (PCI vendor/device IDs + ACPI HIDs) and emits the kext set: Lilu and WhateverGreen as base, IntelMausi for e1000-class NICs, AppleALC with a codec layout id derived from the audio device's subsystem ID, VoodooPS2Controller for the trackpad, and so on. igpu_fb.py picks a WhateverGreen ig-platform-id based on the iGPU's PCI device id and CPU generation, and emits a connector layout (an internal eDP panel plus two external DP / HDMI connectors, matching the typical ThinkPad chassis).

SMBIOS impersonation

The kernel's behavior changes based on the SMBIOS product name (which Mac it thinks it is running on). For a ThinkPad of a given CPU generation and form factor, the right impersonation is the matching mobile Mac of the same era — a quad-core Coffee Lake ThinkPad maps to MacBookPro15,2, a dual-core Whiskey Lake to MacBookPro15,4, etc. smbios_pick.py encodes the CPU family + segment matrix and picks a sensible default; gen_opencore.py folds the choice into the PlatformInfo section of the generated config.plist.

Hackintosh support is CPU-gen-gated, and the skeleton is just a skeleton

The generator handles Skylake through Comet Lake well; Ice Lake and Tiger Lake partially (the iGPU layouts shift); Alder Lake and AMD are not Hackintosh candidates and the toolkit declines them. Beyond the skeleton, real Hackintosh bring-up still needs human work for audio layout, trackpad calibration, sleep stability, and the persistent NVRAM. The skeleton's value is that it skips the mechanical day-one work; it does not skip the bring-up week.

tool: gen_opencore.py · cpu coverage: SKL..CML good, ICL/TGL partial

15Coverage harness: making the "handles all firmwares" claim measurable

A claim scales to a fleet only if it is continuously measured.

The temptation when building an extraction pipeline is to test it on a few representative images, declare it good, and move on. That shortcut gets punished the first time someone runs the pipeline on a new generation, a new vendor, or a legacy non-UEFI BIOS, and the pipeline silently produces nothing useful. batch_extract.py exists to make that failure mode loud and structured: every BIOS in a sweep gets classified by outcome, every failure carries a reason code, and new failure classes accumulate at the top of the report until they get handled.

The classifications, in order of frequency:

The current sweep produces 12 of 12 extractions on the diverse sample (multi-vendor, multi-generation) and 9 of 9 on the BIOS-payload-classification check. As the archive grows the sweep grows with it; new failure classes surface naturally, and the toolkit hardens against them one at a time.
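The reporting side of such a harness can be sketched briefly. The `Outcome` record, the reason codes in `KNOWN_REASONS` and the `summarize` function are hypothetical and stand in for whatever batch_extract.py actually records; the point is the ordering rule, which puts unrecognized failure classes ahead of known ones.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical reason codes the pipeline already knows how to explain.
KNOWN_REASONS = {"not-uefi", "no-bios-payload"}


@dataclass
class Outcome:
    bios: str
    status: str        # "ok" or "failed"
    reason: str = ""   # reason code, set for failures only


def summarize(outcomes):
    """Build a sweep summary with new failure classes listed first."""
    failures = Counter(o.reason for o in outcomes if o.status != "ok")
    ordered = sorted(
        failures.items(),
        # Unknown reasons sort first, then most frequent, then by name.
        key=lambda item: (item[0] in KNOWN_REASONS, -item[1], item[0]),
    )
    return {
        "extracted": sum(1 for o in outcomes if o.status == "ok"),
        "total": len(outcomes),
        "failures": ordered,
    }
```

A sweep that reports "12 of 12" is then just the case where the failure list is empty.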

16Live ground truth: closing the loop with collect_gpio.sh

The last mile, on hardware, with a POSIX-sh script and no special tools.

Some pad data is not in the firmware image. The resolved gpio → consumer map only exists on a running machine, because board-variant selection runs at boot from hardware straps; the runtime PADCFG with the actual LOCK bits set is also only available live. collect_gpio.sh is a short POSIX-sh script that runs as root on any live ThinkPad and produces a self-describing tarball, with no dependencies beyond coreutils, dmidecode and an optional python3 for the NVS region dump.

File in the tarball | Source on the live system | What it gives the resolver
gpio.txt | /sys/kernel/debug/gpio | resolved gpio → consumer map (the primary ground truth)
pinctrl/<ctrl>/* | /sys/kernel/debug/pinctrl | per-pad config + gpio-ranges (gpio# → pad mapping)
acpi/DSDT.aml, SSDT* | /sys/firmware/acpi/tables | cross-check against the archived BIOS tables
acpi_nvs_*.bin | /dev/mem (per /proc/iomem) | GNVS region with runtime values (the resolver's missing half)
acpi_devices.txt | /sys/bus/acpi/devices | full ACPI device list (HID → path)
dmi_id.txt | /sys/class/dmi/id | model, MTM, BIOS version (identity only; redact serial before sharing)
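On the consuming side, gpio.txt has to be turned back into a gpio → consumer map. The sketch below parses the per-line layout that /sys/kernel/debug/gpio commonly prints, `gpio-N (name |consumer) in|out hi|lo`; debugfs output is not a stable interface and differs between kernel versions, so the pattern is an assumption, and `parse_gpio_debug` is an illustrative name.

```python
import re

# Assumed line shape: " gpio-338 (            |power      ) in  hi ..."
LINE_RE = re.compile(
    r"^\s*gpio-(\d+)\s+\(\s*([^|]*?)\s*\|\s*([^)]*?)\s*\)\s+(in|out)\s+(hi|lo)"
)


def parse_gpio_debug(text):
    """Return {gpio number: {name, consumer, direction, level}}.

    Lines that do not match the assumed layout (chip headers, blank lines,
    unfamiliar kernel formats) and lines without a consumer are skipped.
    """
    consumers = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match or not match.group(3):
            continue
        number, name, consumer, direction, level = match.groups()
        consumers[int(number)] = {
            "name": name,
            "consumer": consumer,
            "direction": direction,
            "level": level,
        }
    return consumers
```

Skipping unmatched lines quietly suits a capture that is meant to degrade gracefully; a stricter resolver could count them and report the number.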

On a Whiskey Lake ThinkPad with the CNL-LP PCH (GPIO controller HID INT34BB), a capture exercises every step of the Intel resolver: the chipset id CC at the top byte of every GPIO_PAD matches the runtime; the GNVS values written by AcpiPlatform match the values visible in acpi_nvs_*.bin; the PADCFG mode/direction in the runtime pinctrl matches the static decode. Comet Lake, Tiger Lake, Alder Lake and AMD platforms work identically; only the controller HID changes. The script degrades gracefully on older kernels and on machines without a pinctrl driver, recording what it could not capture and continuing.

The shape of the capture matters more than any single machine's bytes: the same script run on a hundred ThinkPads would produce a hundred per-board posture reports, indexable by model and BIOS version, that together describe the security posture of an entire fleet. The toolkit is structured to consume the captures in aggregate, not just one at a time.

17Honest boundaries

Where the firmware-only path stops and the physical board starts.

18What's next

The interesting unfinished work, in rough order of leverage.

The project is, in the end, a small bet that the value in the Lenovo firmware archive is mostly latent — that the bytes are out there, the tools to look at them are out there, but the glue that holds the pipeline together and turns one model's BIOS into useful answers about that model has not been written. Writing that glue, carefully and reproducibly, is what the codebase is for. The findings above are what came out of doing it.

19Contact & contributing

Where the code lives, where to file bugs, and how to reach the maintainer.

Repository

Source: codeberg.org/tetdrad0n/thinkpad-fw-analysis
Issues: codeberg.org/tetdrad0n/thinkpad-fw-analysis/issues
Pull requests: codeberg.org/tetdrad0n/thinkpad-fw-analysis/pulls
Maintainer profile: codeberg.org/tetdrad0n

Direct contact

Email: tetdrad0n@proton.me
Telegram: @tetdrad0n
Tox (uTox): 2032774D78DD625E94814247FB454846B41F320A98A24125D84107D88A6A5C19E3565D6AC07D

Contributing

Contributions are welcome. The highest-leverage open areas are listed in "What's next": new chipset-family entries (Alder/Raptor Lake on Intel; Rembrandt/Phoenix on AMD), Renesas EC disassembly, a Qualcomm validation against a real X13s capture, and ground-truth tarballs from collect_gpio.sh runs on hardware not yet in the coverage matrix.

Bug reports are useful at any level of detail; if you have a failing batch_extract.py run on a particular BIOS, attach the package URL or the model + MT + BIOS revision, and the classifier's output. New failure classes are how the pipeline hardens.

Pull requests should target main. Keep the no-folklore rule: PADCFG and register decodes from the official Intel datasheet (332691 and the per-SoC successors), UEFI / ACPI references from the published specifications, AMD work from the PPR rather than community write-ups.