MHCXGraph.core.subgraphs
- exception MHCXGraph.core.subgraphs.ProteinGraphConfigurationError
Bases: RuntimeError
Raised when required graph/node annotations are missing.
- MHCXGraph.core.subgraphs.compute_distmat(pdb_df: pandas.DataFrame) → ndarray
Compute Euclidean distance matrix between nodes.
Multiple rows per node_id are averaged first.
- Parameters:
pdb_df (pandas.DataFrame) – Must contain: [‘node_id’, ‘x_coord’, ‘y_coord’, ‘z_coord’].
- Returns:
Distance matrix (N, N) in the order of first occurrence of each node_id.
- Return type:
np.ndarray
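The averaging and ordering rules described above can be sketched in plain Python. This is an illustration only, not the library's implementation: it takes a list of row dictionaries instead of a DataFrame, and the function name distmat_sketch and the node ID strings are hypothetical.

```python
import math


def distmat_sketch(rows):
    """Return (node order, distance matrix) for rows carrying
    node_id, x_coord, y_coord and z_coord."""
    # node_id -> [sum_x, sum_y, sum_z, count]; dicts preserve insertion
    # order, which yields the order of first occurrence of each node_id.
    sums = {}
    for r in rows:
        acc = sums.setdefault(r["node_id"], [0.0, 0.0, 0.0, 0])
        acc[0] += r["x_coord"]
        acc[1] += r["y_coord"]
        acc[2] += r["z_coord"]
        acc[3] += 1
    order = list(sums)
    # Average multiple rows per node before measuring distances.
    centers = [(sx / n, sy / n, sz / n) for sx, sy, sz, n in sums.values()]
    mat = [[math.dist(a, b) for b in centers] for a in centers]
    return order, mat


# Hypothetical node IDs; the first node has two rows that are averaged.
rows = [
    {"node_id": "A:GLY:1", "x_coord": 0.0, "y_coord": 0.0, "z_coord": 0.0},
    {"node_id": "A:ALA:2", "x_coord": 3.0, "y_coord": 4.0, "z_coord": 0.0},
    {"node_id": "A:GLY:1", "x_coord": 0.0, "y_coord": 0.0, "z_coord": 2.0},
]
order, mat = distmat_sketch(rows)
```

Here the two rows of the first node average to (0, 0, 1), so its distance to the second node is the square root of 26.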
- MHCXGraph.core.subgraphs.extract_interface_subgraph(g: Graph, interface_list: list[str] | None = None, chain_list: list[str] | None = None, filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) → Graph | list[str] | None
Select nodes at chain-chain interfaces.
- Parameters:
g (nx.Graph) – Input graph.
interface_list (list of str, optional) – Allowed chain pair labels (e.g., [“AB”, “BC”]). If None, any pairwise inter-chain contact qualifies.
chain_list (list of str, optional) – Restrict to interactions among these chains.
filter_dataframe – See extract_subgraph_from_node_list().
update_coords – See extract_subgraph_from_node_list().
recompute_distmat – See extract_subgraph_from_node_list().
inverse – See extract_subgraph_from_node_list().
return_node_list – See extract_subgraph_from_node_list().
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str] | None
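One possible reading of the interface selection is sketched below on plain Python data. Several points are assumptions rather than documented behavior: an inter-chain contact is taken to be any edge joining nodes of different chains, pair labels are treated as order-insensitive (so “AB” also matches a B-A contact), and chain IDs are single characters. The function name and node IDs are hypothetical.

```python
def interface_nodes(chain_of, edges, interface_list=None, chain_list=None):
    """chain_of maps node -> chain ID; edges is a list of (u, v) pairs."""
    # Assumption: "AB" and "BA" describe the same chain pair.
    allowed = None
    if interface_list is not None:
        allowed = {frozenset(label) for label in interface_list}
    chains = None if chain_list is None else set(chain_list)

    selected, seen = [], set()
    for u, v in edges:
        cu, cv = chain_of[u], chain_of[v]
        if cu == cv:
            continue  # intra-chain edge, not an interface contact
        if chains is not None and not (cu in chains and cv in chains):
            continue
        if allowed is not None and frozenset((cu, cv)) not in allowed:
            continue
        for n in (u, v):
            if n not in seen:
                seen.add(n)
                selected.append(n)
    return selected


chain_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
edges = [("a1", "a2"), ("a2", "b1"), ("b1", "c1")]
any_interface = interface_nodes(chain_of, edges)
ab_only = interface_nodes(chain_of, edges, interface_list=["AB"])
bc_chains = interface_nodes(chain_of, edges, chain_list=["B", "C"])
```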
- MHCXGraph.core.subgraphs.extract_k_hop_subgraph(g: Graph, central_node: str, k: int, k_only: bool = False, filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) → Graph | list[str] | None
Select nodes by k-hop neighborhood.
- Parameters:
g (nx.Graph) – Input graph.
central_node (str) – Center node ID.
k (int) – Number of hops.
k_only (bool, default=False) – If True, include exactly k-hop nodes; otherwise include all <= k.
filter_dataframe – See extract_subgraph_from_node_list().
update_coords – See extract_subgraph_from_node_list().
recompute_distmat – See extract_subgraph_from_node_list().
inverse – See extract_subgraph_from_node_list().
return_node_list – See extract_subgraph_from_node_list().
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str] | None
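The difference between k_only=True and the default can be illustrated with a breadth-first search over a plain adjacency dictionary. This is a minimal sketch, not the library's code; it assumes the central node itself counts as being 0 hops away and is therefore included when k_only is False. The function name k_hop_nodes is hypothetical.

```python
from collections import deque


def k_hop_nodes(adj, central_node, k, k_only=False):
    """adj maps each node to an iterable of its neighbours."""
    dist = {central_node: 0}
    queue = deque([central_node])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    if k_only:
        # Only nodes whose shortest path from the centre is exactly k.
        return [n for n, d in dist.items() if d == k]
    return list(dist)


# Path graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
within_two = k_hop_nodes(adj, "a", 2)
exactly_two = k_hop_nodes(adj, "a", 2, k_only=True)
```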
- MHCXGraph.core.subgraphs.extract_subgraph(g: Graph, node_list: list[str] | None = None, sequence_positions: list[int] | None = None, chains: list[str] | None = None, residue_types: list[str] | None = None, atom_types: list[str] | None = None, bond_types: list[str] | None = None, centre_point: ndarray | tuple[float, float, float] | None = None, radius: float | None = None, ss_elements: list[str] | None = None, rsa_threshold: float | None = None, asa_threshold: float | None = None, k_hop_central_node: str | None = None, k_hops: int | None = None, k_only: bool | None = None, filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) → Graph | list[str]
Aggregate subgraph selector with a unified API.
- Parameters:
g (nx.Graph) – Input graph.
node_list (list of str, optional) – Explicit nodes to include.
sequence_positions (list of int, optional) – Residue numbers to include.
residue_types (list of str, optional) – Residue names to include.
bond_types (list of str, optional) – Edge kinds whose incident nodes to include.
centre_point (array-like, optional) – Center for point-radius selection.
radius (float, optional) – Radius for point-radius selection.
ss_elements (list of str, optional) – Secondary structure labels to include.
rsa_threshold (float, optional) – Minimum RSA to include.
k_hop_central_node (str, optional) – Node ID for k-hop selection.
k_hops (int, optional) – Number of hops for k-hop selection.
k_only (bool, optional) – If True, include exactly k-hop nodes; else all <= k.
filter_dataframe – See extract_subgraph_from_node_list().
update_coords – See extract_subgraph_from_node_list().
recompute_distmat – See extract_subgraph_from_node_list().
inverse – See extract_subgraph_from_node_list().
return_node_list – See extract_subgraph_from_node_list().
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str]
- MHCXGraph.core.subgraphs.extract_subgraph_by_bond_type(g: Graph, bond_types: list[str] | set[str], filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) → Graph | list[str] | None
Select nodes incident to edges of specified kinds.
- Parameters:
g (nx.Graph) – Input graph.
filter_dataframe – See extract_subgraph_from_node_list().
update_coords – See extract_subgraph_from_node_list().
recompute_distmat – See extract_subgraph_from_node_list().
inverse – See extract_subgraph_from_node_list().
return_node_list – See extract_subgraph_from_node_list().
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str] | None
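Selecting nodes "incident to edges of specified kinds" can be sketched as follows. The edge representation is an assumption made for illustration: each edge is a (u, v, kinds) tuple where kinds is a set of labels, and the labels themselves are hypothetical. How the library stores edge kinds is not stated on this page.

```python
def nodes_by_edge_kind(edges, bond_types):
    """Return nodes touching at least one edge whose kinds intersect
    bond_types, in order of first appearance."""
    wanted = set(bond_types)
    selected, seen = [], set()
    for u, v, kinds in edges:
        if wanted & set(kinds):
            for n in (u, v):
                if n not in seen:
                    seen.add(n)
                    selected.append(n)
    return selected


# Hypothetical edge kind labels.
edges = [
    ("a", "b", {"peptide"}),
    ("b", "c", {"hbond"}),
    ("c", "d", {"hbond", "ionic"}),
]
hbond_nodes = nodes_by_edge_kind(edges, ["hbond"])
```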
- MHCXGraph.core.subgraphs.extract_subgraph_by_sequence_position(g: Graph, sequence_positions: list[int], filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) → Graph | list[str] | None
Select nodes by residue index.
- Parameters:
g (nx.Graph) – Input graph.
sequence_positions (list of int) – Residue numbers to include.
filter_dataframe – See extract_subgraph_from_node_list().
update_coords – See extract_subgraph_from_node_list().
recompute_distmat – See extract_subgraph_from_node_list().
inverse – See extract_subgraph_from_node_list().
return_node_list – See extract_subgraph_from_node_list().
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str] | None
- MHCXGraph.core.subgraphs.extract_subgraph_from_atom_types(g: Graph, atom_types: list[str], filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) → Graph | list[str] | None
Select nodes by atom type.
- Parameters:
g (nx.Graph) – Input graph.
filter_dataframe – See extract_subgraph_from_node_list().
update_coords – See extract_subgraph_from_node_list().
recompute_distmat – See extract_subgraph_from_node_list().
inverse – See extract_subgraph_from_node_list().
return_node_list – See extract_subgraph_from_node_list().
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str] | None
- MHCXGraph.core.subgraphs.extract_subgraph_from_chains(g: Graph, chains: list[str] | set[str], filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) → Graph | list[str] | None
Select nodes by chain IDs.
- Parameters:
g (nx.Graph) – Input graph.
filter_dataframe – See extract_subgraph_from_node_list().
update_coords – See extract_subgraph_from_node_list().
recompute_distmat – See extract_subgraph_from_node_list().
inverse – See extract_subgraph_from_node_list().
return_node_list – See extract_subgraph_from_node_list().
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str] | None
- MHCXGraph.core.subgraphs.extract_subgraph_from_node_list(g: Graph, node_list: list[str] | None, filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) → Graph | list[str]
Build a subgraph from an explicit node list.
- Parameters:
g (nx.Graph) – Input graph.
node_list (list of str or None) – Nodes to keep. If None, returns g.
filter_dataframe (bool, default=True) – Filter graph-level DataFrames to subgraph nodes.
update_coords (bool, default=True) – Rebuild graph[‘coords’] from node attributes.
recompute_distmat (bool, default=False) – Recompute graph[‘dist_mat’] from pdb_df if available.
inverse (bool, default=False) – If True, keep the complement of node_list.
return_node_list (bool, default=False) – If True, return the resolved node list instead of a subgraph.
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str]
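The node-resolution step that inverse and return_node_list refer to can be sketched on plain lists. This is an illustration under stated assumptions, not the library's code: the function name resolve_node_list is hypothetical, node order follows the graph's own order, and a node_list of None is treated as "keep everything" regardless of inverse.

```python
def resolve_node_list(all_nodes, node_list, inverse=False):
    """Return the nodes that would end up in the subgraph."""
    if node_list is None:
        # Documented behavior for None is to return the input graph,
        # i.e. every node is kept.
        return list(all_nodes)
    wanted = set(node_list)
    if inverse:
        # Complement: keep every node that was NOT requested.
        return [n for n in all_nodes if n not in wanted]
    return [n for n in all_nodes if n in wanted]


all_nodes = ["a", "b", "c", "d"]
kept = resolve_node_list(all_nodes, ["b", "d"])
complement = resolve_node_list(all_nodes, ["b", "d"], inverse=True)
```

With return_node_list=True the extraction functions return a list of this kind instead of building the subgraph, which is convenient for combining several selections before a single extraction.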
- MHCXGraph.core.subgraphs.extract_subgraph_from_point(g: Graph, centre_point: ndarray | tuple[float, float, float], radius: float, filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) → Graph | list[str] | None
Select nodes within a sphere.
- Parameters:
g (nx.Graph) – Input graph.
centre_point (array-like of shape (3,)) – Sphere center.
radius (float) – Sphere radius.
filter_dataframe – See extract_subgraph_from_node_list().
update_coords – See extract_subgraph_from_node_list().
recompute_distmat – See extract_subgraph_from_node_list().
inverse – See extract_subgraph_from_node_list().
return_node_list – See extract_subgraph_from_node_list().
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str] | None
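The point-radius test amounts to a Euclidean distance check per node. The sketch below works on a plain dictionary of coordinates; the function name nodes_in_sphere is hypothetical, and treating the sphere boundary as inclusive is an assumption that this page does not confirm.

```python
import math


def nodes_in_sphere(coords, centre_point, radius):
    """coords maps node -> (x, y, z); returns nodes inside the sphere."""
    # Assumption: nodes exactly on the boundary are included (<=).
    return [
        node
        for node, xyz in coords.items()
        if math.dist(xyz, centre_point) <= radius
    ]


coords = {"a": (0.0, 0.0, 0.0), "b": (3.0, 4.0, 0.0), "c": (6.0, 8.0, 0.0)}
near_origin = nodes_in_sphere(coords, (0.0, 0.0, 0.0), 5.5)
```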
- MHCXGraph.core.subgraphs.extract_subgraph_from_residue_types(g: Graph, residue_types: list[str] | set[str], filter_dataframe: bool = True, update_coords: bool = True, recompute_distmat: bool = False, inverse: bool = False, return_node_list: bool = False) → Graph | list[str] | None
Select nodes by residue name.
- Parameters:
g (nx.Graph) – Input graph.
residue_types (list of str) – Allowed residue names (3-letter).
filter_dataframe – See extract_subgraph_from_node_list().
update_coords – See extract_subgraph_from_node_list().
recompute_distmat – See extract_subgraph_from_node_list().
inverse – See extract_subgraph_from_node_list().
return_node_list – See extract_subgraph_from_node_list().
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str] | None
- MHCXGraph.core.subgraphs.extract_subgraph_from_secondary_structure(g: Graph, ss_elements: list[str], inverse: bool = False, filter_dataframe: bool = True, recompute_distmat: bool = False, update_coords: bool = True, return_node_list: bool = False) → Graph | list[str] | None
Select nodes by secondary structure label.
- Parameters:
g (nx.Graph) – Input graph. Nodes must carry ‘ss’.
ss_elements (list of str) – Allowed secondary structure labels.
inverse (bool, default=False) – If True, exclude ss_elements.
filter_dataframe – See extract_subgraph_from_node_list().
recompute_distmat – See extract_subgraph_from_node_list().
update_coords – See extract_subgraph_from_node_list().
return_node_list – See extract_subgraph_from_node_list().
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str] | None
- Raises:
ProteinGraphConfigurationError – If any node lacks the ‘ss’ attribute.
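The validate-then-filter behavior described here can be sketched on a dictionary of node attributes. This is not the library's code: MissingAnnotationError is a stand-in for ProteinGraphConfigurationError, the function name nodes_by_ss is hypothetical, and the single-letter labels are hypothetical DSSP-style codes.

```python
class MissingAnnotationError(RuntimeError):
    """Stand-in for ProteinGraphConfigurationError in this sketch."""


def nodes_by_ss(node_attrs, ss_elements, inverse=False):
    """node_attrs maps node -> attribute dict that must contain 'ss'."""
    # Fail fast if any node lacks the annotation, as documented above.
    missing = [n for n, attrs in node_attrs.items() if "ss" not in attrs]
    if missing:
        raise MissingAnnotationError(f"nodes without 'ss': {missing}")
    allowed = set(ss_elements)
    # With inverse=True the membership test is flipped.
    return [n for n, attrs in node_attrs.items()
            if (attrs["ss"] in allowed) != inverse]


nodes = {"a": {"ss": "H"}, "b": {"ss": "E"}, "c": {"ss": "H"}}
helix_nodes = nodes_by_ss(nodes, ["H"])
non_helix_nodes = nodes_by_ss(nodes, ["H"], inverse=True)

try:
    nodes_by_ss({"a": {}}, ["H"])
    raised = False
except MissingAnnotationError:
    raised = True
```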
- MHCXGraph.core.subgraphs.extract_surface_subgraph_asa(g: Graph, asa_threshold: float, inverse: bool = False, filter_dataframe: bool = True, recompute_distmat: bool = False, update_coords: bool = True, return_node_list: bool = False) → Graph | list[str] | None
Select nodes by absolute solvent accessibility (ASA).
- Parameters:
g (nx.Graph) – Input graph. Nodes are expected to carry ‘asa’ (float, in Å^2).
asa_threshold (float) – Minimum ASA to include.
inverse (bool, default=False) – If True, include ASA < threshold.
filter_dataframe (bool, default=True) – Filter graph-level DataFrames to subgraph nodes.
recompute_distmat (bool, default=False) – Recompute graph[‘dist_mat’] from pdb_df if available.
update_coords (bool, default=True) – Rebuild graph[‘coords’] from node attributes.
return_node_list (bool, default=False) – If True, return the resolved node list instead of a subgraph.
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str] | None
- MHCXGraph.core.subgraphs.extract_surface_subgraph_rsa(g: Graph, rsa_threshold: float = 0.2, inverse: bool = False, filter_dataframe: bool = True, recompute_distmat: bool = False, update_coords: bool = True, return_node_list: bool = False, *, treat_water_as_surface: bool = True, unknown_policy: str = 'skip', unknown_value: float | None = None) → Graph | list[str] | None
Select nodes by relative solvent accessibility (RSA).
- Parameters:
g (nx.Graph) – Input graph. Nodes may carry ‘rsa’ in [0, 1].
rsa_threshold (float, default=0.2) – Minimum RSA to include.
inverse (bool, default=False) – If True, include RSA < threshold.
filter_dataframe (bool, default=True) – Filter graph-level DataFrames to subgraph nodes.
recompute_distmat (bool, default=False) – Recompute graph[‘dist_mat’] from pdb_df if available.
update_coords (bool, default=True) – Rebuild graph[‘coords’] from node attributes.
return_node_list (bool, default=False) – If True, return the resolved node list instead of a subgraph.
treat_water_as_surface (bool, default=True) – If True, nodes with residue name typical of water (e.g. HOH/WAT/DOD/TIP3) are treated as RSA=1.0 when ‘rsa’ is missing.
unknown_policy ({'skip', 'value', 'error'}, default='skip') – Behavior for nodes missing ‘rsa’ that are not water:
  ‘skip’: ignore the node (do not include, do not raise);
  ‘value’: use unknown_value as RSA;
  ‘error’: raise ProteinGraphConfigurationError.
unknown_value (float, optional) – RSA value to use when unknown_policy=‘value’.
- Returns:
Subgraph or node list.
- Return type:
Graph | list[str] | None
- Raises:
ProteinGraphConfigurationError – If unknown_policy=‘error’ and a node lacks ‘rsa’.
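How the threshold, the water rule, and unknown_policy interact can be sketched on a dictionary of node attributes. This is a reading of the parameter descriptions above, not the library's implementation. Assumptions: the residue name lives under 'residue_name', skipped nodes are left out even when inverse=True, and MissingAnnotationError stands in for ProteinGraphConfigurationError.

```python
WATER_NAMES = {"HOH", "WAT", "DOD", "TIP3"}


class MissingAnnotationError(RuntimeError):
    """Stand-in for ProteinGraphConfigurationError in this sketch."""


def surface_nodes_rsa(node_attrs, rsa_threshold=0.2, inverse=False, *,
                      treat_water_as_surface=True, unknown_policy="skip",
                      unknown_value=None):
    selected = []
    for node, attrs in node_attrs.items():
        rsa = attrs.get("rsa")
        if rsa is None:
            is_water = attrs.get("residue_name") in WATER_NAMES
            if treat_water_as_surface and is_water:
                rsa = 1.0  # water without 'rsa' counts as fully exposed
            elif unknown_policy == "skip":
                continue
            elif unknown_policy == "value":
                if unknown_value is None:
                    raise ValueError("unknown_value is required")
                rsa = unknown_value
            elif unknown_policy == "error":
                raise MissingAnnotationError(f"node {node!r} lacks 'rsa'")
            else:
                raise ValueError(f"bad unknown_policy: {unknown_policy!r}")
        # Default keeps RSA >= threshold; inverse keeps RSA < threshold.
        if (rsa >= rsa_threshold) != inverse:
            selected.append(node)
    return selected


nodes = {
    "a": {"rsa": 0.5, "residue_name": "LYS"},
    "b": {"rsa": 0.05, "residue_name": "LEU"},
    "w": {"residue_name": "HOH"},
    "x": {"residue_name": "UNK"},
}
surface = surface_nodes_rsa(nodes)
buried = surface_nodes_rsa(nodes, inverse=True)
with_value = surface_nodes_rsa(nodes, unknown_policy="value",
                               unknown_value=1.0)
```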
- MHCXGraph.core.subgraphs.log = <VerboseLoggerAdapter MHCXGraph (WARNING)>
Subgraph utilities for protein structure graphs.
Assumptions
Nodes represent residues or atoms and may carry: ‘chain_id’ (or ‘chain’), ‘residue_number’ (or ‘resseq’), ‘residue_name’ (or ‘resname’), and coordinates in ‘coords’ or ‘centroid’.
Graph-level metadata (G.graph) may include: ‘pdb_df’, ‘raw_pdb_df’, ‘rgroup_df’, ‘coords’, ‘distance_matrix’, ‘dssp_df’, ‘residue_labels’, ‘water_labels’, ‘water_positions’.
The functions below select subsets by chain, residue type, spatial radius, secondary structure, RSA, edge kind, k-hop, etc., and propagate/update relevant graph metadata to the returned subgraph.
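Functions
compute_distmat() – Compute Euclidean distance matrix between nodes.
extract_interface_subgraph() – Select nodes at chain-chain interfaces.
extract_k_hop_subgraph() – Select nodes by k-hop neighborhood.
extract_subgraph() – Aggregate subgraph selector with a unified API.
extract_subgraph_by_bond_type() – Select nodes incident to edges of specified kinds.
extract_subgraph_by_sequence_position() – Select nodes by residue index.
extract_subgraph_from_atom_types() – Select nodes by atom type.
extract_subgraph_from_chains() – Select nodes by chain IDs.
extract_subgraph_from_node_list() – Build a subgraph from an explicit node list.
extract_subgraph_from_point() – Select nodes within a sphere.
extract_subgraph_from_residue_types() – Select nodes by residue name.
extract_subgraph_from_secondary_structure() – Select nodes by secondary structure label.
extract_surface_subgraph_asa() – Select nodes by absolute solvent accessibility (ASA).
extract_surface_subgraph_rsa() – Select nodes by relative solvent accessibility (RSA).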
Exceptions
ProteinGraphConfigurationError – Raised when required graph/node annotations are missing.