MHCXGraph.utils.tools

MHCXGraph.utils.tools.association_product(graphs_data: list, config: dict) → dict[str, list] | None

Compute the cross-protein association product.

This function orchestrates the full multi-protein association pipeline including triad detection, chunked combination, graph construction, and frame generation.

Parameters:
  • graphs_data (list) – List of graph metadata structures.

  • config (dict) – Association configuration dictionary.

Returns:

result – Dictionary containing associated graphs or None if no associations were produced.

Return type:

dict or None

MHCXGraph.utils.tools.build_graph_from_cross_combos(cross_combos) → set[tuple[tuple[str, ...], tuple[str, ...]]]

Build graph edges from cross-protein triad combinations.

Parameters:

cross_combos (dict) – Cross-protein triad combination structure produced by cross_protein_triads.

Returns:

edges – Set of edges between associated residue nodes across proteins.

Return type:

set[tuple]
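As a rough illustration of how such edges could be derived, the sketch below assumes that each combination holds one triad (three residue labels) per protein, that an associated node is the tuple of residues occupying the same triad position in every protein, and that the three nodes of a combination are pairwise connected. The function name `edges_from_combos` and the input layout are hypothetical, not taken from the library.

```python
from itertools import combinations


def edges_from_combos(cross_combos):
    """Collect edges between associated nodes (hypothetical input layout).

    cross_combos maps a token to a list of combos; each combo is a tuple
    with one triad per protein, and each triad is a tuple of three labels.
    """
    edges = set()
    for combos in cross_combos.values():
        for combo in combos:
            # zip(*combo) groups the residues at the same triad position,
            # giving one cross-protein node per position.
            nodes = sorted(zip(*combo))
            # Connect the three nodes of the triad pairwise.
            edges.update(combinations(nodes, 2))
    return edges


combo = (
    ("A:ARG:65", "A:TYR:99", "C:LEU:5"),   # triad in protein 0
    ("A:ARG:66", "A:TYR:100", "C:LEU:5"),  # matching triad in protein 1
)
edges = edges_from_combos({"token": [combo]})
```

Each edge here is a pair of nodes, and each node is a tuple of residue labels, which matches the shape of the documented return annotation.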

MHCXGraph.utils.tools.build_threshold_vector(nodes, maps, threshold_cfg)

Return the upper-triangular threshold vector instead of a full KxK matrix.
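The saving from a packed representation can be seen in a small sketch. It assumes the vector stores the strict upper triangle (i < j) in row-major order, which would be consistent with the `fill_diag` argument of `sym_from_packed_float`, but the actual layout used by the library is not documented here. The helper names `pack_upper` and `unpack_symmetric` are hypothetical.

```python
def pack_upper(matrix):
    """Flatten the strict upper triangle (i < j) of a square matrix, row by row."""
    k = len(matrix)
    return [matrix[i][j] for i in range(k) for j in range(i + 1, k)]


def unpack_symmetric(k, packed, fill_diag=float("nan")):
    """Rebuild a symmetric k x k matrix from a strict-upper-triangle vector."""
    if len(packed) != k * (k - 1) // 2:
        raise ValueError("packed length must be k * (k - 1) / 2")
    full = [[fill_diag] * k for _ in range(k)]
    values = iter(packed)
    for i in range(k):
        for j in range(i + 1, k):
            # Mirror each stored value across the diagonal.
            full[i][j] = full[j][i] = next(values)
    return full


thresholds = [
    [0.0, 2.0, 3.0],
    [2.0, 0.0, 4.0],
    [3.0, 4.0, 0.0],
]
packed = pack_upper(thresholds)  # 3 values instead of 9
restored = unpack_symmetric(3, packed, fill_diag=0.0)
```

For K nodes the packed vector holds K(K-1)/2 entries rather than K², roughly halving memory for large K.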

MHCXGraph.utils.tools.convert_edges_to_residues(edges: set[frozenset], maps: dict) → tuple[list, list, list]

Convert edge representations from node indices to residue labels.

Parameters:
  • edges (set[frozenset]) – Set of edges represented as frozensets of node indices.

  • maps (dict) – Mapping structure containing:

    • residue_maps_unique (dict[int, tuple]) – Mapping from node index to residue tuple.

    • possible_nodes (dict) – Mapping from node index to node tuple.

Returns:

  • original_edges (list) – Original edge objects as provided in the input set.

  • edges_indices (list[tuple]) – Edge representation using tuples of node indices.

  • converted_edges (list[tuple]) – Edge representation where node indices are converted to residue labels of the form "CHAIN:RESNAME:RESNUM".
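The three return values can be illustrated with a simplified sketch. It assumes each node index maps to a single `(chain, resname, resnum)` tuple and that every edge joins two distinct nodes; the real `maps` structure is richer than the plain dictionary used here, and the names `edges_to_labels` and `label_to_tuple` are hypothetical.

```python
def edges_to_labels(edges, residue_map):
    """Return (original_edges, edges_indices, converted_edges) for a set of edges."""
    original, indices, labels = [], [], []
    # Sort for a deterministic order; frozensets themselves are unordered.
    for edge in sorted(edges, key=sorted):
        u, v = sorted(edge)
        original.append(edge)
        indices.append((u, v))
        labels.append(
            tuple(":".join(str(part) for part in residue_map[n]) for n in (u, v))
        )
    return original, indices, labels


def label_to_tuple(label):
    """Split a "CHAIN:RESNAME:RESNUM" label back into its typed parts."""
    chain, resname, resnum = label.split(":")
    return chain, resname, int(resnum)


residue_map = {0: ("A", "ARG", 65), 1: ("A", "TYR", 99), 2: ("C", "LEU", 5)}
edges = {frozenset({0, 1}), frozenset({1, 2})}
original, indices, labels = edges_to_labels(edges, residue_map)
```

With this input, `indices` holds the node-index pairs and `labels` holds the same edges written as residue labels, so the two lists stay aligned position by position.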

MHCXGraph.utils.tools.create_coherent_matrices(nodes, matrices: dict, maps: dict, threshold: float | dict = 3.0)

Compute coherence matrices across proteins using a memory-efficient streaming approach.

Parameters:
  • nodes (list) – Node index lists representing aligned nodes across proteins.

  • matrices (dict) – Dictionary containing distance matrices.

  • maps (dict) – Residue mapping dictionary.

  • threshold (float or dict, default=3.0) – Distance difference threshold used to determine coherence.

Returns:

  • new_matrices (dict) – Dictionary containing coherence masks and standard deviation matrices.

  • maps (dict) – Updated node mapping dictionary.
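One plausible reading of the coherence test is sketched below: for every pair of aligned nodes, the corresponding residue-residue distance is looked up in each protein, and the pair counts as coherent when the spread of those distances stays within the threshold. The max-minus-min criterion, the use of the population standard deviation, and the name `coherence_for_pairs` are assumptions made for illustration; the library's own criterion may differ.

```python
from statistics import pstdev


def coherence_for_pairs(distance_matrices, aligned_nodes, threshold=3.0):
    """Yield (a, b, coherent, std) for each pair of aligned nodes.

    aligned_nodes[a][p] is the residue index of aligned node a in protein p.
    Yielding one pair at a time avoids holding a K x K x P array in memory.
    """
    k = len(aligned_nodes)
    for a in range(k):
        for b in range(a + 1, k):
            dists = [
                dm[aligned_nodes[a][p]][aligned_nodes[b][p]]
                for p, dm in enumerate(distance_matrices)
            ]
            coherent = (max(dists) - min(dists)) <= threshold
            yield a, b, coherent, pstdev(dists)


dm0 = [[0.0, 5.0, 9.0], [5.0, 0.0, 6.0], [9.0, 6.0, 0.0]]
dm1 = [[0.0, 6.0, 14.0], [6.0, 0.0, 7.0], [14.0, 7.0, 0.0]]
aligned = [(0, 0), (1, 1), (2, 2)]
rows = list(coherence_for_pairs([dm0, dm1], aligned, threshold=3.0))
```

In this toy input the pair (0, 2) differs by 5 Å between the two proteins and is therefore rejected, while the other two pairs pass.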

MHCXGraph.utils.tools.create_graph(edges_dict: dict, typeEdge: str = 'edges_indices', comp_id=0, *, edge_std_matrix: ndarray | None = None, node_index_map: dict[Any, int] | None = None)

Construct NetworkX graphs from frame edge definitions.

Parameters:
  • edges_dict (dict) – Frame dictionary containing edge definitions.

  • typeEdge (str, default="edges_indices") – Key specifying which edge representation to use.

  • comp_id (int, default=0) – Component identifier used for logging.

  • edge_std_matrix (ndarray, optional) – Matrix of edge standard deviations used for visualization.

  • node_index_map (dict, optional) – Mapping between node identifiers and matrix indices.

Returns:

graphs – List of constructed NetworkX graphs.

Return type:

list[networkx.Graph]

MHCXGraph.utils.tools.cross_protein_triads(step_idx, chunk_idx, triads_per_protein, diff, check_distances=True)

Generate cross-protein combinations of compatible triads.

Parameters:
  • step_idx (int) – Current hierarchical association step.

  • chunk_idx (int) – Index of the chunk being processed.

  • triads_per_protein (list[dict]) – List of triad dictionaries for each protein.

  • diff (float) – Maximum allowed distance difference across proteins.

  • check_distances (bool, default=True) – If True, distance bounds are used to filter candidate triad combinations.

Returns:

cross – Dictionary describing cross-protein triad combinations.

Return type:

dict
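The combination step can be pictured with the following sketch. It assumes each protein contributes a dictionary from triad token to a list of `(residues, distances)` instances, that only tokens present in every protein are combined, and that a combination is kept when each of the three triad distances varies by at most `diff` across proteins. The data layout and the names `cross_triads` and `_within_diff` are hypothetical.

```python
from itertools import product


def _within_diff(combo, diff):
    """True if every corresponding triad distance varies by at most diff."""
    for dists in zip(*(distances for _, distances in combo)):
        if max(dists) - min(dists) > diff:
            return False
    return True


def cross_triads(triads_per_protein, diff, check_distances=True):
    """Combine triads that share a token across all proteins."""
    shared = set(triads_per_protein[0])
    for triads in triads_per_protein[1:]:
        shared &= set(triads)

    cross = {}
    for token in sorted(shared):
        kept = []
        # One instance per protein, in protein order.
        for combo in product(*(triads[token] for triads in triads_per_protein)):
            if check_distances and not _within_diff(combo, diff):
                continue
            kept.append(tuple(residues for residues, _ in combo))
        if kept:
            cross[token] = kept
    return cross


p0 = {"ARG-TYR-LEU": [(("A:ARG:65", "A:TYR:99", "C:LEU:5"), (5.0, 6.0, 9.0))]}
p1 = {
    "ARG-TYR-LEU": [
        (("A:ARG:66", "A:TYR:100", "C:LEU:5"), (5.5, 6.5, 9.5)),
        (("A:ARG:10", "A:TYR:20", "C:LEU:1"), (5.0, 6.0, 14.0)),
    ],
    "GLY-GLY-GLY": [(("A:GLY:1", "A:GLY:2", "A:GLY:3"), (4.0, 4.0, 4.0))],
}
filtered = cross_triads([p0, p1], diff=2.0)
unfiltered = cross_triads([p0, p1], diff=2.0, check_distances=False)
```

Disabling `check_distances` keeps every product of same-token triads, which shows how much the distance filter prunes before graph construction.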

MHCXGraph.utils.tools.execute_step(step_idx: int, graph_collection, max_chunks: int, current_filtered_cross_combos, graphs_data, global_state, residue_tracker)

Execute a single hierarchical association step.

Parameters:
  • step_idx (int) – Current step index.

  • graph_collection (dict) – Graph collection produced during preprocessing.

  • max_chunks (int) – Maximum chunk size used for hierarchical grouping.

  • current_filtered_cross_combos (list) – Cross-combo results from the previous step.

  • graphs_data (list) – Graph metadata structures.

  • global_state (dict) – Shared global state containing matrices, maps, and configuration parameters.

  • residue_tracker (ResidueTracker, optional) – Tracking object used for debugging and provenance logging.

Returns:

  • filtered_cross_combos (list) – Filtered cross combinations for the next step.

  • step_graphs (list) – Graphs produced during the step.

MHCXGraph.utils.tools.filter_maps_by_nodes(data: dict, matrices_dict: dict, distance_threshold: float = 10.0) → tuple[dict, dict]

Filter contact and RSA maps according to graph nodes.

Parameters:
  • data (dict) – Input data containing contact maps, RSA values, residue maps, and node lists for each protein.

  • matrices_dict (dict) – Dictionary used to store derived matrices produced during preprocessing.

  • distance_threshold (float, default=10.0) – Maximum allowed contact distance for adjacency.

Returns:

  • matrices_dict (dict) – Updated dictionary containing pruned and thresholded matrices.

  • maps (dict) – Mapping structure describing residue indices and filtered residue maps.
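Pruning and thresholding can be sketched for a single protein as follows. The sketch assumes the contact map is a square matrix of distances, that adjacency means a distance at or below the threshold, and that the diagonal is excluded; whether the library uses an inclusive bound is not stated. The name `prune_and_threshold` is hypothetical.

```python
def prune_and_threshold(contact_map, keep, distance_threshold=10.0):
    """Restrict a distance matrix to the kept residues and derive adjacency."""
    # Keep only the rows and columns of residues that are graph nodes.
    pruned = [[contact_map[i][j] for j in keep] for i in keep]
    # Two distinct residues are adjacent if they lie within the threshold.
    adjacency = [
        [i != j and d <= distance_threshold for j, d in enumerate(row)]
        for i, row in enumerate(pruned)
    ]
    # Map new (pruned) indices back to the original residue indices.
    index_map = dict(enumerate(keep))
    return pruned, adjacency, index_map


contact_map = [
    [0.0, 4.0, 8.0, 12.0],
    [4.0, 0.0, 5.0, 9.0],
    [8.0, 5.0, 0.0, 11.0],
    [12.0, 9.0, 11.0, 0.0],
]
pruned, adjacency, index_map = prune_and_threshold(contact_map, keep=[0, 2, 3])
```

The returned index map plays the role of the residue mapping: it records which original residue each row of the pruned matrix corresponds to.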

MHCXGraph.utils.tools.find_class(classes: dict[str, dict[str, float]], value: float)

Find class intervals that contain a numeric value.

Parameters:
  • classes (dict[str, dict[str, float]]) – Dictionary defining class intervals with keys "low" and "high".

  • value (float) – Value to evaluate against interval definitions.

Returns:

class_name – Name of the matching class, a list of classes if multiple intervals match, or None if no interval contains the value.

Return type:

str or list[str] or None
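The three possible return shapes are easiest to see in a small sketch of the documented behavior. It assumes both interval bounds are inclusive, which the description does not specify; `find_class_sketch` and the example class names are hypothetical.

```python
def find_class_sketch(classes, value):
    """Return the matching class name, a list of names, or None."""
    matches = [
        name
        for name, bounds in classes.items()
        if bounds["low"] <= value <= bounds["high"]  # inclusive bounds (assumed)
    ]
    if not matches:
        return None
    return matches[0] if len(matches) == 1 else matches


rsa_classes = {
    "buried": {"low": 0.0, "high": 0.2},
    "intermediate": {"low": 0.15, "high": 0.5},
    "exposed": {"low": 0.5, "high": 1.0},
}
single = find_class_sketch(rsa_classes, 0.1)
overlap = find_class_sketch(rsa_classes, 0.18)
outside = find_class_sketch(rsa_classes, 1.5)
```

Callers therefore need to handle a string, a list, and None, for example by normalizing the result to a list before iterating.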

MHCXGraph.utils.tools.find_triads(graph_data, classes, config, checks, protein_index, tracker: ResidueTracker | None = None)

Identify residue triads within a protein interaction graph.

Parameters:
  • graph_data (dict) – Graph metadata containing the graph object, contact map, RSA values, and residue mappings.

  • classes (dict) – Classification dictionaries defining bins for residues, distances, or solvent accessibility.

  • config (dict) – Association configuration controlling thresholds, discretization, and filtering rules.

  • checks (dict) – Dictionary controlling optional filters such as RSA checks.

  • protein_index (int) – Index of the protein currently being processed.

  • tracker (ResidueTracker, optional) – Tracking object used for debugging and provenance recording of triad generation.

Returns:

triads – Dictionary mapping triad tokens to metadata including counts and absolute triad instances.

Return type:

dict

MHCXGraph.utils.tools.generate_frames(component_graph, matrices, maps, len_component, chunk_id, step, config, debug=False, debug_every=5000, nodes=None, steps_end=False, residue_tracker: ResidueTracker | None = None)

Generate coherent structural frames from a component graph.

Frames correspond to coherent subgraphs satisfying distance and adjacency constraints.

Parameters:
  • component_graph (networkx.Graph) – Graph component under analysis.

  • matrices (dict) – Coherence matrices and adjacency matrices.

  • maps (dict) – Residue mapping dictionary.

  • len_component (int) – Number of nodes in the component.

  • chunk_id (int) – Chunk identifier.

  • step (int) – Association step index.

  • config (dict) – Association configuration.

  • debug (bool, default=False) – Enable debug logging.

  • debug_every (int, default=5000) – Interval for progress logging during search.

  • nodes (list, optional) – Node ordering corresponding to matrix indices.

  • steps_end (bool, default=False) – If True, perform final frame filtering.

  • residue_tracker (ResidueTracker, optional) – Tracking object used for recording accepted frames.

Returns:

  • frames (dict) – Dictionary describing generated frames.

  • union_graph (dict) – Graph representation combining all accepted frames.

MHCXGraph.utils.tools.get_memory_usage_mb()

Return the resident set size (RSS) memory usage in MB if psutil is available; otherwise return None.
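A minimal sketch of this optional-dependency pattern is shown below. It assumes that "MB" means 1024 × 1024 bytes and that the measurement is for the current process; the function name `memory_usage_mb` is hypothetical.

```python
def memory_usage_mb():
    """Return the current process RSS in MB, or None if psutil is missing."""
    try:
        import psutil  # optional dependency
    except ImportError:
        return None
    # memory_info().rss is reported in bytes.
    return psutil.Process().memory_info().rss / (1024 * 1024)


usage = memory_usage_mb()
```

Because the result may be None, callers should check for it before formatting or comparing the value.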

MHCXGraph.utils.tools.parse_node(node: str) → tuple[str, str, int]

MHCXGraph.utils.tools.process_chunk(step_idx, chunk_idx, chunk_triads, global_state, residue_tracker)

Process a chunk of triads during hierarchical association.

Parameters:
  • step_idx (int) – Current association step.

  • chunk_idx (int) – Index of the chunk being processed.

  • chunk_triads (list) – Triad groups contained within the chunk.

  • global_state (dict) – Global state containing matrices, maps, and configuration.

  • residue_tracker (ResidueTracker, optional) – Tracking object used for recording intermediate states.

Returns:

  • rebuilt_combos (dict or list or None) – Reconstructed combinations used in the next step.

  • final_graphs (list) – Graphs generated from the processed chunk.

MHCXGraph.utils.tools.rebuild_cross_combos(cross_combos: dict[dict, list[tuple[tuple, ...]]], graph_nodes)

Reconstruct cross-combo structures after graph pruning.

Parameters:
  • cross_combos (dict) – Original cross-combination dictionary.

  • graph_nodes (iterable) – Nodes currently present in the graph.

Returns:

new_combos – Filtered cross-combination dictionary containing only combinations consistent with the remaining graph nodes.

Return type:

dict
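The filtering idea can be sketched under the assumption that each combination holds one triad per protein and that a graph node is the tuple of residues sharing a triad position across proteins. A combination then survives only if all three of its nodes are still in the graph. This layout and the name `rebuild_sketch` are hypothetical.

```python
def rebuild_sketch(cross_combos, graph_nodes):
    """Keep only combos whose associated nodes all remain in the graph."""
    remaining = set(graph_nodes)
    new_combos = {}
    for token, combos in cross_combos.items():
        kept = [
            combo
            for combo in combos
            # zip(*combo) yields one cross-protein node per triad position.
            if all(node in remaining for node in zip(*combo))
        ]
        if kept:  # drop tokens left with no combos
            new_combos[token] = kept
    return new_combos


combo_ok = (
    ("A:ARG:65", "A:TYR:99", "C:LEU:5"),
    ("A:ARG:66", "A:TYR:100", "C:LEU:5"),
)
combo_pruned = (
    ("A:ARG:65", "A:TYR:7", "C:LEU:5"),
    ("A:ARG:66", "A:TYR:8", "C:LEU:5"),
)
graph_nodes = list(zip(*combo_ok))  # the nodes that survived pruning
rebuilt = rebuild_sketch(
    {"t1": [combo_ok, combo_pruned], "t2": [combo_pruned]}, graph_nodes
)
```

Tokens whose combinations are all removed disappear from the result, so the output dictionary can be smaller than the input in both keys and values.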

MHCXGraph.utils.tools.residue_to_tuple(res)

MHCXGraph.utils.tools.sym_from_packed_bool(k: int, packed: ndarray) → ndarray

MHCXGraph.utils.tools.sym_from_packed_float(k: int, packed: ndarray, fill_diag: float = nan) → ndarray

MHCXGraph.utils.tools.triad_chirality_with_cb(ca_a: ndarray, ca_b: ndarray, ca_c: ndarray, cb_a: ndarray, cb_b: ndarray, cb_c: ndarray, *, weights: tuple[float, float, float] | None = None, outward_normal: ndarray | None = None, majority_only: bool = True) → dict[str, Any]

Compute the chirality of a residue triad using Cα and Cβ atoms.

The method estimates a pose-invariant but mirror-sensitive chirality signature based on side-chain orientation relative to the triangle defined by three Cα atoms.

Parameters:
  • ca_a (ndarray of shape (3,)) – Cartesian coordinates of the Cα atom of the first residue.

  • ca_b (ndarray of shape (3,)) – Cartesian coordinates of the Cα atom of the second residue.

  • ca_c (ndarray of shape (3,)) – Cartesian coordinates of the Cα atom of the third residue.

  • cb_a (ndarray of shape (3,)) – Cartesian coordinates of the Cβ atom of the first residue.

  • cb_b (ndarray of shape (3,)) – Cartesian coordinates of the Cβ atom of the second residue.

  • cb_c (ndarray of shape (3,)) – Cartesian coordinates of the Cβ atom of the third residue.

  • weights (tuple[float, float, float], optional) – Optional per-residue weights applied when averaging side-chain direction vectors.

  • outward_normal (ndarray of shape (3,), optional) – Reference outward direction used to orient side-chain vectors.

  • majority_only (bool, default=True) – If True, only side chains consistent with the majority orientation relative to the triangle normal contribute to the final direction.

Returns:

result – Dictionary containing chirality information, including handedness bit, score, side-chain consistency, and intermediate geometric vectors.

Return type:

dict
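The geometric idea can be sketched in plain Python. The triangle normal is taken as (Cα_b − Cα_a) × (Cα_c − Cα_a), the side-chain direction as the weighted sum of the unit Cα→Cβ vectors, and the handedness bit as the sign of their dot product. This simplified version ignores `outward_normal` and `majority_only`, returns only two of the documented result fields, and uses the hypothetical name `triad_handedness`; it is not the library's implementation.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def _unit(u):
    norm = math.sqrt(_dot(u, u))
    if norm == 0:
        raise ValueError("zero-length vector")
    return tuple(a / norm for a in u)


def triad_handedness(ca, cb, weights=(1.0, 1.0, 1.0)):
    """Return a handedness bit and score for three (Cα, Cβ) pairs."""
    # Normal of the Cα triangle; its sign depends on the a, b, c ordering.
    normal = _unit(_cross(_sub(ca[1], ca[0]), _sub(ca[2], ca[0])))
    # Unit side-chain directions, Cα -> Cβ, one per residue.
    side = [_unit(_sub(b, a)) for a, b in zip(ca, cb)]
    mean = tuple(sum(w * s[i] for w, s in zip(weights, side)) for i in range(3))
    score = _dot(normal, _unit(mean))
    return {"bit": 1 if score >= 0 else 0, "score": score}


ca = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
cb_up = [(x, y, z + 1.0) for x, y, z in ca]    # side chains above the plane
cb_down = [(x, y, z - 1.0) for x, y, z in ca]  # mirror image through the plane
up = triad_handedness(ca, cb_up)
down = triad_handedness(ca, cb_down)
```

Rotating or translating all six atoms together leaves the score unchanged, while reflecting them flips its sign, which is the pose-invariant but mirror-sensitive property described above.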

MHCXGraph.utils.tools.value_to_class(value: float, bin_width: float, threshold: float, diff_threshold: float, inverse: bool = False, upper_bound: float = 100.0, close_tolerance: float = 0.1) → int | list[int] | None

Assign a numeric value to one or more discretized bins.

Parameters:
  • value (float) – Numeric value to classify.

  • bin_width (float) – Width of each bin interval.

  • threshold (float) – Boundary separating lower and upper classification domains.

  • inverse (bool, default=False) – If True, classification occurs in the range [threshold, upper_bound].

  • upper_bound (float, default=100.0) – Maximum allowed value in inverse classification mode.

  • close_tolerance (float, default=0.1) – Absolute tolerance used to detect values close to bin centers.

Returns:

classes – Bin index or indices representing the classification result. Returns None if the value lies outside the allowed domain.

Return type:

int or list[int] or None

Functions

association_product(graphs_data, config)

Compute the cross-protein association product.

build_graph_from_cross_combos(cross_combos)

Build graph edges from cross-protein triad combinations.

build_threshold_vector(nodes, maps, ...)

Return the upper-triangular threshold vector instead of a full KxK matrix.

convert_edges_to_residues(edges, maps)

Convert edge representations from node indices to residue labels.

create_coherent_matrices(nodes, matrices, maps)

Compute coherence matrices across proteins using a memory-efficient streaming approach.

create_graph(edges_dict[, typeEdge, ...])

Construct NetworkX graphs from frame edge definitions.

cross_protein_triads(step_idx, chunk_idx, ...)

Generate cross-protein combinations of compatible triads.

execute_step(step_idx, graph_collection, ...)

Execute a single hierarchical association step.

filter_maps_by_nodes(data, matrices_dict[, ...])

Filter contact and RSA maps according to graph nodes.

find_class(classes, value)

Find class intervals that contain a numeric value.

find_triads(graph_data, classes, config, ...)

Identify residue triads within a protein interaction graph.

generate_frames(component_graph, matrices, ...)

Generate coherent structural frames from a component graph.

get_memory_usage_mb()

Return the resident set size (RSS) memory usage in MB if psutil is available.

parse_node(node)

process_chunk(step_idx, chunk_idx, ...)

Process a chunk of triads during hierarchical association.

rebuild_cross_combos(cross_combos, graph_nodes)

Reconstruct cross-combo structures after graph pruning.

residue_to_tuple(res)

sym_from_packed_bool(k, packed)

sym_from_packed_float(k, packed[, fill_diag])

triad_chirality_with_cb(ca_a, ca_b, ca_c, ...)

Compute the chirality of a residue triad using Cα and Cβ atoms.

value_to_class(value, bin_width, threshold, ...)

Assign a numeric value to one or more discretized bins.