MHCXGraph.core.subgraphs

exception MHCXGraph.core.subgraphs.ProteinGraphConfigurationError[source]

Bases: RuntimeError

Raised when required graph/node annotations are missing.

MHCXGraph.core.subgraphs.compute_distmat(pdb_df: pandas.DataFrame) ndarray[source]

Compute Euclidean distance matrix between nodes.

When a node_id appears in multiple rows, their coordinates are averaged before distances are computed.

Parameters:

pdb_df (pandas.DataFrame) – Must contain the columns 'node_id', 'x_coord', 'y_coord', and 'z_coord'.

Returns:

Distance matrix of shape (N, N), with rows and columns ordered by the first occurrence of each node_id.

Return type:

np.ndarray
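The averaging-then-distance behaviour described above can be illustrated with a minimal standalone sketch. It uses only the standard library and plain dictionaries in place of a DataFrame. The helper name `distmat_sketch`, the node ID format, and the sample rows are hypothetical, and this is not the library's implementation.

```python
import math


def distmat_sketch(rows):
    """Average coordinates per node_id, then compute pairwise Euclidean distances.

    `rows` is a list of dicts with the keys node_id, x_coord, y_coord, z_coord
    (a stand-in for pdb_df rows; hypothetical helper, not the library code).
    """
    sums = {}  # node_id -> [sum_x, sum_y, sum_z, count]; dicts keep insertion order
    for row in rows:
        acc = sums.setdefault(row["node_id"], [0.0, 0.0, 0.0, 0])
        acc[0] += row["x_coord"]
        acc[1] += row["y_coord"]
        acc[2] += row["z_coord"]
        acc[3] += 1

    order = list(sums)  # order of first occurrence of each node_id
    centers = [tuple(s / sums[n][3] for s in sums[n][:3]) for n in order]
    dist = [[math.dist(a, b) for b in centers] for a in centers]
    return order, dist


# Hypothetical rows: the first node_id occurs twice and is averaged to (0, 0, 1).
rows = [
    {"node_id": "A:ALA:1", "x_coord": 0.0, "y_coord": 0.0, "z_coord": 0.0},
    {"node_id": "A:GLY:2", "x_coord": 3.0, "y_coord": 4.0, "z_coord": 0.0},
    {"node_id": "A:ALA:1", "x_coord": 0.0, "y_coord": 0.0, "z_coord": 2.0},
]
order, dist = distmat_sketch(rows)
```

The returned `order` list records which node each row and column of the matrix refers to, which is why first-occurrence ordering matters.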

MHCXGraph.core.subgraphs.extract_interface_subgraph(g: Graph, interface_list: list[str] | None = None, chain_list: list[str] | None = None, filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) Graph | list[str] | None[source]

Select nodes at chain-chain interfaces.

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str

MHCXGraph.core.subgraphs.extract_k_hop_subgraph(g: Graph, central_node: str, k: int, k_only: bool = False, filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) Graph | list[str] | None[source]

Select nodes by k-hop neighborhood.

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str
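As an illustration of k-hop selection, the following minimal sketch runs a breadth-first search over a plain adjacency dictionary. The helper `k_hop_nodes` is hypothetical and standalone. Whether the library includes the central node itself in the result is not stated here, so its inclusion below (for `k_only=False`) is an assumption.

```python
from collections import deque


def k_hop_nodes(adj, central_node, k, k_only=False):
    """Return nodes within k hops of central_node, or exactly k hops if k_only.

    `adj` maps each node to an iterable of neighbours (hypothetical stand-in
    for a graph object; not the library's implementation).
    """
    depth = {central_node: 0}
    queue = deque([central_node])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for nbr in adj.get(node, ()):
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    if k_only:
        return sorted(n for n, d in depth.items() if d == k)
    return sorted(depth)


# A simple path graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
within_two = k_hop_nodes(adj, "a", 2)
exactly_two = k_hop_nodes(adj, "a", 2, k_only=True)
```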

MHCXGraph.core.subgraphs.extract_subgraph(g: Graph, node_list: list[str] | None = None, sequence_positions: list[int] | None = None, chains: list[str] | None = None, residue_types: list[str] | None = None, atom_types: list[str] | None = None, bond_types: list[str] | None = None, centre_point: ndarray | tuple[float, float, float] | None = None, radius: float | None = None, ss_elements: list[str] | None = None, rsa_threshold: float | None = None, asa_threshold: float | None = None, k_hop_central_node: str | None = None, k_hops: int | None = None, k_only: bool | None = None, filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) Graph | list[str][source]

Aggregate subgraph selector with a unified API.

Parameters:
  • g (nx.Graph) – Input graph.

  • node_list (list of str, optional) – Explicit nodes to include.

  • sequence_positions (list of int, optional) – Residue numbers to include.

  • chains (list of str, optional) – Chain IDs to include.

  • residue_types (list of str, optional) – Residue names to include.

  • atom_types (list of str, optional) – Atom types to include.

  • bond_types (list of str, optional) – Edge kinds whose incident nodes to include.

  • centre_point (array-like, optional) – Center for point-radius selection.

  • radius (float, optional) – Radius for point-radius selection.

  • ss_elements (list of str, optional) – Secondary structure labels to include.

  • rsa_threshold (float, optional) – Minimum RSA to include.

  • asa_threshold (float, optional) – Minimum ASA to include.

  • k_hop_central_node (str, optional) – Node ID for k-hop selection.

  • k_hops (int, optional) – Number of hops for k-hop selection.

  • k_only (bool, optional) – If True, include only nodes exactly k hops from the central node; otherwise include all nodes within k hops.

  • filter_dataframe – See extract_subgraph_from_node_list().

  • update_coords – See extract_subgraph_from_node_list().

  • recompute_distmat – See extract_subgraph_from_node_list().

  • inverse – See extract_subgraph_from_node_list().

  • return_node_list – See extract_subgraph_from_node_list().

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str

MHCXGraph.core.subgraphs.extract_subgraph_by_bond_type(g: Graph, bond_types: list[str] | set[str], filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) Graph | list[str] | None[source]

Select nodes incident to edges of specified kinds.

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str
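Selecting nodes "incident to edges of specified kinds" can be sketched as follows. This is a standalone illustration under stated assumptions: edges are given as `(u, v, kinds)` tuples, each edge carries a set of kind labels, and the label names are made up for the example. The attribute name and label vocabulary actually used by the library are not documented here.

```python
def nodes_by_edge_kind(edges, bond_types):
    """Return nodes touching at least one edge whose kinds overlap bond_types.

    `edges` is an iterable of (u, v, kinds) tuples, where `kinds` is a set of
    labels (hypothetical representation; not the library's implementation).
    """
    wanted = set(bond_types)
    selected = set()
    for u, v, kinds in edges:
        if wanted & set(kinds):
            selected.update((u, v))
    return sorted(selected)


# Hypothetical edge kinds, for illustration only.
edges = [
    ("n1", "n2", {"peptide_bond"}),
    ("n2", "n3", {"hbond"}),
    ("n3", "n4", {"peptide_bond", "hbond"}),
]
hbond_nodes = nodes_by_edge_kind(edges, ["hbond"])
```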

MHCXGraph.core.subgraphs.extract_subgraph_by_sequence_position(g: Graph, sequence_positions: list[int], filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) Graph | list[str] | None[source]

Select nodes by residue index.

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str

MHCXGraph.core.subgraphs.extract_subgraph_from_atom_types(g: Graph, atom_types: list[str], filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) Graph | list[str] | None[source]

Select nodes by atom type.

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str

MHCXGraph.core.subgraphs.extract_subgraph_from_chains(g: Graph, chains: list[str] | set[str], filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) Graph | list[str] | None[source]

Select nodes by chain IDs.

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str

MHCXGraph.core.subgraphs.extract_subgraph_from_node_list(g: Graph, node_list: list[str] | None, filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) Graph | list[str][source]

Build a subgraph from an explicit node list.

Parameters:
  • g (nx.Graph) – Input graph.

  • node_list (list of str or None) – Nodes to keep. If None, returns g.

  • filter_dataframe (bool, default=True) – Filter graph-level DataFrames to subgraph nodes.

  • update_coords (bool, default=True) – Rebuild graph[‘coords’] from node attributes.

  • recompute_distmat (bool, default=False) – Recompute graph[‘dist_mat’] from pdb_df if available.

  • inverse (bool, default=False) – If True, keep the complement of node_list.

  • return_node_list (bool, default=False) – If True, return the resolved node list instead of a subgraph.

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str
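The interaction between node_list and inverse can be sketched with a small standalone helper. `resolve_node_list` is a hypothetical name, and the sketch only mirrors the documented semantics (None keeps everything; inverse keeps the complement). It is not the library's implementation.

```python
def resolve_node_list(all_nodes, node_list, inverse=False):
    """Resolve which nodes to keep, preserving the order of all_nodes.

    Hypothetical helper mirroring the documented node_list/inverse semantics.
    """
    if node_list is None:
        return list(all_nodes)  # nothing to filter: keep every node
    keep = set(node_list)
    if inverse:
        return [n for n in all_nodes if n not in keep]
    return [n for n in all_nodes if n in keep]


all_nodes = ["n1", "n2", "n3", "n4"]
kept = resolve_node_list(all_nodes, ["n2", "n4"])
complement = resolve_node_list(all_nodes, ["n2", "n4"], inverse=True)
```

A list resolved this way corresponds to what return_node_list=True is documented to return in place of a subgraph.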

MHCXGraph.core.subgraphs.extract_subgraph_from_point(g: Graph, centre_point: ndarray | tuple[float, float, float], radius: float, filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) Graph | list[str] | None[source]

Select nodes within a sphere.

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str
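Point-radius selection amounts to a distance test against the sphere's centre. The sketch below is standalone and uses a plain mapping of node IDs to coordinates. Treating the boundary as inclusive (`<=`) is an assumption, since the documentation does not say how nodes exactly on the sphere are handled.

```python
import math


def nodes_within_radius(coords, centre_point, radius):
    """Return node IDs whose coordinates lie within `radius` of `centre_point`.

    `coords` maps node ID -> (x, y, z). Hypothetical helper; the inclusive
    boundary is an assumption, not documented library behaviour.
    """
    return [n for n, xyz in coords.items() if math.dist(xyz, centre_point) <= radius]


coords = {"n1": (0.0, 0.0, 0.0), "n2": (3.0, 4.0, 0.0), "n3": (10.0, 0.0, 0.0)}
inside = nodes_within_radius(coords, (0.0, 0.0, 0.0), 5.0)
```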

MHCXGraph.core.subgraphs.extract_subgraph_from_residue_types(g: Graph, residue_types: list[str] | set[str], filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) Graph | list[str] | None[source]

Select nodes by residue name.

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str

MHCXGraph.core.subgraphs.extract_subgraph_from_secondary_structure(g: Graph, ss_elements: list[str], inverse: bool = False, filter_dataframe: bool = True, recompute_distmat: bool = False, update_coords: bool = True, return_node_list: bool = False) Graph | list[str] | None[source]

Select nodes by secondary structure label.

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str

Raises:

ProteinGraphConfigurationError – If any node lacks the ‘ss’ attribute.
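The validate-then-select pattern implied by the Raises entry can be sketched as below. The exception class is a local stand-in for ProteinGraphConfigurationError, the helper name is hypothetical, and the single-letter labels are assumed DSSP-style codes used only for illustration.

```python
class MissingAnnotationError(RuntimeError):
    """Local stand-in for ProteinGraphConfigurationError in this sketch."""


def nodes_by_ss(node_attrs, ss_elements):
    """Return nodes whose 'ss' label is in ss_elements.

    Raises MissingAnnotationError if any node lacks 'ss'. Hypothetical helper;
    not the library's implementation.
    """
    missing = [n for n, attrs in node_attrs.items() if "ss" not in attrs]
    if missing:
        raise MissingAnnotationError(f"nodes without 'ss': {missing}")
    wanted = set(ss_elements)
    return [n for n, attrs in node_attrs.items() if attrs["ss"] in wanted]


# Assumed DSSP-style labels: "H" for helix, "E" for strand.
node_attrs = {"r1": {"ss": "H"}, "r2": {"ss": "E"}, "r3": {"ss": "H"}}
helix_nodes = nodes_by_ss(node_attrs, ["H"])
```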

MHCXGraph.core.subgraphs.extract_surface_subgraph_asa(g: Graph, asa_threshold: float, inverse: bool = False, filter_dataframe: bool = True, recompute_distmat: bool = False, update_coords: bool = True, return_node_list: bool = False) Graph | list[str] | None[source]

Select nodes by absolute solvent accessibility (ASA).

Parameters:
  • g (nx.Graph) – Input graph. Nodes are expected to carry ‘asa’ (float, in Å^2).

  • asa_threshold (float) – Minimum ASA to include.

  • inverse (bool, default=False) – If True, select nodes with ASA below the threshold instead.

  • filter_dataframe (bool, default=True) – Filter graph-level DataFrames to subgraph nodes.

  • recompute_distmat (bool, default=False) – Recompute graph[‘dist_mat’] from pdb_df if available.

  • update_coords (bool, default=True) – Rebuild graph[‘coords’] from node attributes.

  • return_node_list (bool, default=False) – If True, return the resolved node list instead of a subgraph.

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str

MHCXGraph.core.subgraphs.extract_surface_subgraph_rsa(g: Graph, rsa_threshold: float = 0.2, inverse: bool = False, filter_dataframe: bool = True, recompute_distmat: bool = False, update_coords: bool = True, return_node_list: bool = False, *, treat_water_as_surface: bool = True, unknown_policy: str = 'skip', unknown_value: float | None = None) Graph | list[str] | None[source]

Select nodes by relative solvent accessibility (RSA).

Parameters:
  • g (nx.Graph) – Input graph. Nodes may carry ‘rsa’ in [0, 1].

  • rsa_threshold (float, default=0.2) – Minimum RSA to include.

  • inverse (bool, default=False) – If True, select nodes with RSA below the threshold instead.

  • filter_dataframe (bool, default=True) – Filter graph-level DataFrames to subgraph nodes.

  • recompute_distmat (bool, default=False) – Recompute graph[‘dist_mat’] from pdb_df if available.

  • update_coords (bool, default=True) – Rebuild graph[‘coords’] from node attributes.

  • return_node_list (bool, default=False) – If True, return the resolved node list instead of a subgraph.

  • treat_water_as_surface (bool, default=True) – If True, nodes with residue name typical of water (e.g. HOH/WAT/DOD/TIP3) are treated as RSA=1.0 when ‘rsa’ is missing.

  • unknown_policy ({'skip', 'value', 'error'}, default='skip') – Behavior for non-water nodes missing 'rsa':

      - 'skip': ignore the node (do not include it, do not raise);
      - 'value': use unknown_value as the RSA;
      - 'error': raise ProteinGraphConfigurationError.

  • unknown_value (float, optional) – RSA value to use when unknown_policy='value'.

Returns:

Subgraph or node list.

Return type:

nx.Graph or list of str

Raises:

ProteinGraphConfigurationError – If unknown_policy='error' and a node lacks 'rsa'.
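The handling of missing 'rsa' values described above can be sketched as a standalone function over a plain attribute mapping. The helper name, the use of RuntimeError in place of ProteinGraphConfigurationError, the `residue_name` key, and the inclusive comparison (`>=`) are assumptions made for illustration. This is not the library's implementation.

```python
WATER_NAMES = {"HOH", "WAT", "DOD", "TIP3"}


def surface_nodes_rsa(node_attrs, rsa_threshold=0.2, inverse=False,
                      treat_water_as_surface=True,
                      unknown_policy="skip", unknown_value=None):
    """Select nodes by RSA, resolving missing values per the documented policies."""
    selected = []
    for node, attrs in node_attrs.items():
        rsa = attrs.get("rsa")
        if rsa is None:
            if treat_water_as_surface and attrs.get("residue_name") in WATER_NAMES:
                rsa = 1.0  # water without 'rsa' counts as fully exposed
            elif unknown_policy == "skip":
                continue  # neither include nor raise
            elif unknown_policy == "value":
                if unknown_value is None:
                    raise ValueError("unknown_value is required for policy 'value'")
                rsa = unknown_value
            elif unknown_policy == "error":
                # Stand-in for ProteinGraphConfigurationError (a RuntimeError subclass).
                raise RuntimeError(f"node {node!r} lacks 'rsa'")
            else:
                raise ValueError(f"unknown policy: {unknown_policy!r}")
        keep = rsa < rsa_threshold if inverse else rsa >= rsa_threshold
        if keep:
            selected.append(node)
    return selected


# Hypothetical node attributes: two residues, one water, one ligand without 'rsa'.
nodes = {
    "r1": {"rsa": 0.5, "residue_name": "ALA"},
    "r2": {"rsa": 0.1, "residue_name": "LEU"},
    "w1": {"residue_name": "HOH"},
    "x1": {"residue_name": "LIG"},
}
surface = surface_nodes_rsa(nodes)
```

With the defaults, the water node is kept as surface and the unannotated ligand is silently skipped. Switching unknown_policy to 'error' turns that silent skip into an explicit failure.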

MHCXGraph.core.subgraphs.log = <VerboseLoggerAdapter MHCXGraph (WARNING)>

Subgraph utilities for protein structure graphs.

Assumptions

  • Nodes represent residues or atoms and may carry: ‘chain_id’ (or ‘chain’), ‘residue_number’ (or ‘resseq’), ‘residue_name’ (or ‘resname’), and coordinates in ‘coords’ or ‘centroid’.

  • Graph-level metadata (G.graph) may include: ‘pdb_df’, ‘raw_pdb_df’, ‘rgroup_df’, ‘coords’, ‘distance_matrix’, ‘dssp_df’, ‘residue_labels’, ‘water_labels’, ‘water_positions’.

The functions below select subsets by chain, residue type, spatial radius, secondary structure, RSA, edge kind, k-hop, etc., and propagate/update relevant graph metadata to the returned subgraph.
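Because node attributes may appear under alternative keys, a selector has to look up each attribute with a fallback. The sketch below shows one way to do this for chain-based selection. The helper names are hypothetical, and the precedence order ('chain_id' before 'chain') is an assumption.

```python
def first_present(attrs, *keys, default=None):
    """Return the value of the first key found in attrs, else default."""
    for key in keys:
        if key in attrs:
            return attrs[key]
    return default


def nodes_in_chains(node_attrs, chains):
    """Select nodes whose chain ID (under either key) is in `chains`.

    Hypothetical helper; not the library's implementation.
    """
    wanted = set(chains)
    return [
        node for node, attrs in node_attrs.items()
        if first_present(attrs, "chain_id", "chain") in wanted
    ]


# Mixed attribute naming, as the assumptions above allow.
node_attrs = {
    "n1": {"chain_id": "A", "residue_name": "ALA"},
    "n2": {"chain": "B", "resname": "GLY"},
    "n3": {"chain": "A", "resname": "SER"},
}
chain_a = nodes_in_chains(node_attrs, ["A"])
```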

Functions

compute_distmat(pdb_df)

Compute Euclidean distance matrix between nodes.

extract_interface_subgraph(g[, ...])

Select nodes at chain-chain interfaces.

extract_k_hop_subgraph(g, central_node, k[, ...])

Select nodes by k-hop neighborhood.

extract_subgraph(g[, node_list, ...])

Aggregate subgraph selector with a unified API.

extract_subgraph_by_bond_type(g, bond_types)

Select nodes incident to edges of specified kinds.

extract_subgraph_by_sequence_position(g, ...)

Select nodes by residue index.

extract_subgraph_from_atom_types(g, atom_types)

Select nodes by atom type.

extract_subgraph_from_chains(g, chains[, ...])

Select nodes by chain IDs.

extract_subgraph_from_node_list(g, node_list)

Build a subgraph from an explicit node list.

extract_subgraph_from_point(g, centre_point, ...)

Select nodes within a sphere.

extract_subgraph_from_residue_types(g, ...)

Select nodes by residue name.

extract_subgraph_from_secondary_structure(g, ...)

Select nodes by secondary structure label.

extract_surface_subgraph_asa(g, asa_threshold)

Select nodes by absolute solvent accessibility (ASA).

extract_surface_subgraph_rsa(g[, ...])

Select nodes by relative solvent accessibility (RSA).

Exceptions

ProteinGraphConfigurationError

Raised when required graph/node annotations are missing.