MHCXGraph.core.pdb_graph_builder

class MHCXGraph.core.pdb_graph_builder.AtomBundle[source]

Bases: TypedDict

ca_cb_map: dict[str, NodeCoords]
node_centroids: pandas.DataFrame
raw_df: pandas.DataFrame
class MHCXGraph.core.pdb_graph_builder.BuiltGraph(graph: nx.Graph, residue_index: list[tuple[str, Residue]], residue_centroids: np.ndarray, water_index: list[tuple[str, Residue]] = <factory>, water_centroids: np.ndarray | None = None, distance_matrix: np.ndarray | None = None, raw_pdb_df: pd.DataFrame | None = None, node_centroids: pd.DataFrame | None = None, dssp_df: pd.DataFrame | None = None)[source]

Bases: object

Container returned by PDBGraphBuilder.

graph

Constructed graph with node/edge attributes.

Type:

networkx.Graph

residue_index

Node id to residue pairing for the amino acid residues that form the basis of the main distance computation.

Type:

list of tuple[str, Bio.PDB.Residue]

residue_centroids

Centroids for residue_index in the same order, shape (N, 3).

Type:

numpy.ndarray

water_index

Node id to residue pairing for water residues (if included).

Type:

list of tuple[str, Bio.PDB.Residue]

water_centroids

Water centroids in the same order as water_index, shape (W, 3).

Type:

numpy.ndarray or None

distance_matrix

Pairwise centroid distance matrix for residues in residue_index.

Type:

numpy.ndarray or None

raw_pdb_df

Atom-level table used to derive centroids and CA/CB coordinates.

Type:

pandas.DataFrame or None

node_centroids

DataFrame indexed by node_id with centroid coordinates x_coord, y_coord, z_coord.

Type:

pandas.DataFrame or None

dssp_df

DSSP summary with an added “rsa” column, aligned to graph nodes.

Type:

pandas.DataFrame or None

distance_matrix: np.ndarray | None = None
dssp_df: pd.DataFrame | None = None
graph: nx.Graph
node_centroids: pd.DataFrame | None = None
raw_pdb_df: pd.DataFrame | None = None
residue_centroids: np.ndarray
residue_index: list[tuple[str, Residue]]
water_centroids: np.ndarray | None = None
water_index: list[tuple[str, Residue]]
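The relationship between residue_centroids (shape (N, 3)) and distance_matrix can be illustrated with a minimal NumPy sketch. The helper name pairwise_distances and the Euclidean metric are assumptions made for illustration; the source does not state how PDBGraphBuilder computes the matrix internally.

```python
import numpy as np


def pairwise_distances(centroids: np.ndarray) -> np.ndarray:
    """Return the (N, N) Euclidean distance matrix for an (N, 3) array.

    Hypothetical helper, not part of MHCXGraph.
    """
    # Broadcasting builds an (N, N, 3) array of coordinate differences.
    diff = centroids[:, None, :] - centroids[None, :, :]
    # The norm over the last axis collapses it to (N, N) distances.
    return np.linalg.norm(diff, axis=-1)


# Three made-up centroids, in the same order as a residue_index would be.
centroids = np.array(
    [
        [0.0, 0.0, 0.0],
        [3.0, 4.0, 0.0],
        [0.0, 0.0, 2.0],
    ]
)
dist = pairwise_distances(centroids)
```

Row i and column j of such a matrix refer to entries i and j of the index list, which is why the index and the centroid array must keep the same order.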
class MHCXGraph.core.pdb_graph_builder.FixedPDBIO(*args: Any, **kwargs: Any)[source]

Bases: PDBIO

Overrides PDBIO to fix the left-aligned element bug in Biopython 1.86. Intercepts the generated string and right-aligns the element symbol in columns 77-78.
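The kind of fix described here can be sketched on a single record line. This is only an illustration of right-aligning columns 77-78 (string indices 76 to 78) of an ATOM/HETATM line; the function name right_align_element is hypothetical and the actual FixedPDBIO implementation may differ.

```python
def right_align_element(line: str) -> str:
    """Right-align the two-character field in columns 77-78 of a PDB record.

    Hypothetical helper, not the FixedPDBIO code itself.
    """
    if not line.startswith(("ATOM", "HETATM")):
        return line  # other record types are passed through unchanged
    body = line.rstrip("\n")
    newline = line[len(body):]
    # Pad to the full 80-column record so the slices below always exist.
    body = body.ljust(80)
    element = body[76:78].strip()
    return body[:76] + element.rjust(2) + body[78:] + newline


# A made-up ATOM record whose element "C" is left-aligned in columns 77-78.
bad_line = (
    "ATOM      1  CA  ALA A   1      11.104   6.134  -6.504  1.00  0.00".ljust(76)
    + "C   "
)
fixed_line = right_align_element(bad_line)
```

Two-letter symbols such as FE already fill both columns and are left as they are.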

class MHCXGraph.core.pdb_graph_builder.NodeCoords[source]

Bases: TypedDict

ca_coord: tuple[float, float, float]
cb_coord: tuple[float, float, float]
cb_is_virtual: bool
class MHCXGraph.core.pdb_graph_builder.PDBGraphBuilder(pdb_path: str, config: GraphConfig | None = None)[source]

Bases: object

Build a structural graph from a PDB/mmCIF file.

Parameters:
  • pdb_path (str) – Path to the structure file.

  • config (GraphConfig, optional) – Graph construction options.

Notes

The distance matrix and the node labels are exported to .pmhc_tmp/ with filenames <stem>_distmat.npy and <stem>_residue_labels.txt.

build_graph() → BuiltGraph[source]

Run the full pipeline: load → select chains → ASA/RSA → distances → graph.

Returns:

Graph object and associated tables/arrays.

Return type:

BuiltGraph

Notes

  • The residue–residue distance matrix and node labels are saved under .pmhc_tmp/<stem>_distmat.npy and .pmhc_tmp/<stem>_residue_labels.txt.

  • Waters are added as nodes with rsa=1.0 and connected to nearby residues.
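The exported files can be read back for inspection with a short sketch like the following. The helper name load_exports is hypothetical, and the layout of the labels file (one label per line, in matrix row order) is an assumption that the source does not confirm.

```python
from pathlib import Path

import numpy as np


def load_exports(stem: str, tmp_dir: str = ".pmhc_tmp"):
    """Load <stem>_distmat.npy and <stem>_residue_labels.txt from tmp_dir.

    Hypothetical helper, not part of MHCXGraph.
    """
    base = Path(tmp_dir)
    distmat = np.load(base / f"{stem}_distmat.npy")
    # Assumption: the labels file holds one node label per line.
    labels = (base / f"{stem}_residue_labels.txt").read_text().splitlines()
    if distmat.shape != (len(labels), len(labels)):
        raise ValueError(
            f"matrix shape {distmat.shape} does not match {len(labels)} labels"
        )
    return distmat, labels
```

The shape check guards against pairing a matrix with labels from a different run, which is the main risk when files share a temporary directory.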

load() → None[source]

Load the structure into memory.

Raises:

Exception – If parsing fails.

structure: Structure | None
static to_graphml(G: Graph, path: str) → None[source]

Export the graph to GraphML.

Parameters:
  • G (networkx.Graph) – Graph to export.

  • path (str) – Output path.

static to_json(G: Graph, path: str) → None[source]

Export the graph to JSON (nodes/edges with attributes).

Parameters:
  • G (networkx.Graph) – Graph to export.

  • path (str) – Output path.
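A nodes/edges JSON layout of this kind might look like the sketch below. The key names ("nodes", "edges", "id", "source", "target"), the node id strings, and the "distance" edge attribute are all assumptions chosen for illustration; the schema actually written by to_json is not documented here.

```python
import json

import networkx as nx


def graph_to_json_dict(G: nx.Graph) -> dict:
    """Flatten a graph into plain lists of node and edge records.

    Hypothetical helper, not the to_json implementation.
    """
    return {
        "nodes": [{"id": n, **attrs} for n, attrs in G.nodes(data=True)],
        "edges": [
            {"source": u, "target": v, **attrs}
            for u, v, attrs in G.edges(data=True)
        ],
    }


# Tiny made-up graph with illustrative node ids and attributes.
G = nx.Graph()
G.add_node("A:ALA:1", rsa=0.42)
G.add_node("A:GLY:2", rsa=0.10)
G.add_edge("A:ALA:1", "A:GLY:2", distance=3.8)

text = json.dumps(graph_to_json_dict(G), indent=2)
```

Attribute values must be JSON-serializable for json.dumps to succeed, so NumPy scalars or arrays stored on nodes would need converting first.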

class MHCXGraph.core.pdb_graph_builder.ResidueInfo[source]

Bases: TypedDict

canonical_aminoacid_residues: list[tuple[str, Bio.PDB.Residue.Residue, Literal['canonical_aminoacid', 'noncanonical_aminoacid', 'ligand', 'water'], ndarray]]
ligands: list[tuple[str, Bio.PDB.Residue.Residue, Literal['canonical_aminoacid', 'noncanonical_aminoacid', 'ligand', 'water'], ndarray]]
noncanonical_aminoacid_residues: list[tuple[str, Bio.PDB.Residue.Residue, Literal['canonical_aminoacid', 'noncanonical_aminoacid', 'ligand', 'water'], ndarray]]
waters: list[tuple[str, Bio.PDB.Residue.Residue, Literal['canonical_aminoacid', 'noncanonical_aminoacid', 'ligand', 'water'], ndarray]]
class MHCXGraph.core.pdb_graph_builder.StructureDict[source]

Bases: TypedDict

chains: dict
chains_obj: list[Bio.PDB.Chain.Chain]
residues: ResidueInfo
MHCXGraph.core.pdb_graph_builder.capture_c_stderr(logger)[source]

Capture stderr output from C libraries and redirect it to logging.

Parameters:

logger (logging.Logger) – Logger instance used to record intercepted messages.

Yields:

None – Context manager that temporarily redirects the C-level standard error stream.

Notes

This utility is primarily used to intercept verbose output from FreeSASA and forward it to the project logging system.
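Output written by C code bypasses sys.stderr, so it has to be captured at the file descriptor level. The following is a minimal sketch of that general technique using os.dup2; the name capture_fd2 and the choice of the WARNING level are assumptions, and the real capture_c_stderr may be implemented differently.

```python
import contextlib
import logging
import os
import sys
import tempfile


@contextlib.contextmanager
def capture_fd2(logger: logging.Logger):
    """Redirect file descriptor 2 to a temp file, then log what was written.

    Hypothetical sketch, not the MHCXGraph implementation.
    """
    sys.stderr.flush()
    saved_fd = os.dup(2)  # keep a copy of the original stderr descriptor
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 2)  # C-level writes to fd 2 now land in tmp
        try:
            yield
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)  # restore the original stderr
            os.close(saved_fd)
            tmp.seek(0)
            for line in tmp.read().decode(errors="replace").splitlines():
                if line.strip():
                    logger.warning(line)
```

A typical use wraps the noisy call in `with capture_fd2(logger): ...`, after which each captured line appears as one log record.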

MHCXGraph.core.pdb_graph_builder.check_res_inconsistencies(res_tuples)[source]

Validate consistency between node identifiers and residue objects.

Parameters:

res_tuples (list[tuple]) – List containing tuples of the form (node_id, residue_object, kind, centroid).

Returns:

inconsistencies – List of formatted messages describing mismatches between the residue information encoded in the node identifier and the values stored in the Bio.PDB residue object.

Return type:

list[str]

MHCXGraph.core.pdb_graph_builder.log = <VerboseLoggerAdapter MHCXGraph (WARNING)>

Structural graph builder for pMHC using Bio.PDB and NetworkX.

Features

  • Residue and water centroid computation.

  • ASA via Shrake–Rupley and RSA normalization (Tien et al. tables via Bio.PDB).

  • Optional RSA via DSSP with fallback for non-canonicals.

  • Distance-based graph construction (residue–residue and residue–water).

  • Distance matrix and label export for reproducibility/debugging.

Requirements

  • biopython >= 1.79

  • networkx

  • numpy

  • pandas

Classes

AtomBundle

BuiltGraph(graph, residue_index, ...)

Container returned by PDBGraphBuilder.

FixedPDBIO(*args, **kwargs)

Overrides PDBIO to fix the left-aligned element bug in Biopython 1.86.

NodeCoords

PDBGraphBuilder(pdb_path[, config])

Build a structural graph from a PDB/mmCIF file.

ResidueInfo

StructureDict

Functions

capture_c_stderr(logger)

Capture stderr output from C libraries and redirect it to logging.

check_res_inconsistencies(res_tuples)

Validate consistency between node identifiers and residue objects.