PAX2GRAPHML python package documentation

pax2graphml.pax_import

This module contains functions to manipulate BIOPAX and GRAPHML files.

pax_import.annotation_dict(alias_file)[source]

Create a dictionary from an annotation json file:

Parameters

alias_file -- annotation json file

Returns

annotation dictionary

Return type

dict
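
As a rough illustration of what loading such a file amounts to, the sketch below reads a JSON file into a dict using only the standard library. It is not the package's implementation: the helper name load_annotations and the file layout (entity identifier mapped to a dict of annotation fields) are assumptions made for this example.

```python
import json
import os
import tempfile


def load_annotations(alias_file):
    """Read a JSON annotation file and return its content as a dict.

    Hypothetical stand-in for illustration; the real file layout may differ.
    """
    with open(alias_file, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


# Build a tiny example file (assumed layout: entity id -> annotation fields).
example = {"entity_1": {"name": "ATP"}, "entity_2": {"name": "glucose"}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(example, tmp)

annotations = load_annotations(tmp.name)
os.remove(tmp.name)
```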

pax_import.biopax_filter(biopax_file, datasources, output_file='output.owl')[source]

Remove datasources from a BIOPAX file. The process uses PAXTOOLS, needs a lot of memory, and can be slow for large BIOPAX files:

Parameters
  • biopax_file -- input BIOPAX file

  • datasources -- list of datasources to exclude

  • output_file -- output BIOPAX file

Returns

void

Return type

None

pax_import.biopax_merge(biopax_list, output_file='output.owl')[source]

Merge multiple BIOPAX (RDF/XML) files. The process uses PAXTOOLS, needs a lot of memory, and can be slow for large BIOPAX files:

Parameters
  • biopax_list -- a list of input BIOPAX files to be merged

  • output_file -- output BIOPAX file

Returns

void

Return type

None

pax_import.biopax_to_reaction_graph(biopax_file, graphml_file, black_list=None, control_mode=2)[source]

Generate a reaction graph with binary interactions as a GRAPHML file from a BIOPAX file. The BIOPAX file is filtered, keeping only the regulation part (metabolism and genes). The process uses PAXTOOLS, needs a lot of memory, and can be slow for large BIOPAX files:

Parameters
  • biopax_file -- input BIOPAX file

  • graphml_file -- output GRAPHML reaction file

  • black_list -- entity black_list (e.g. hubs, h2o...)

  • control_mode -- control mechanism model representation; use the default (2) for the compressed control representation, or 1 for the extended representation with entity duplication

Returns

void

Return type

None
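
A possible way to chain this conversion with the loading and summary functions documented further down is sketched here. The function and module names come from the signatures on this page; the wrapper name build_reaction_graph and the file names in the usage comment are hypothetical, and the import style (from pax2graphml import ...) is an assumption about how the package is laid out.

```python
def build_reaction_graph(biopax_file, graphml_file, black_list=None):
    """Convert a BIOPAX file to a reaction GRAPHML file and summarise it.

    The import is done lazily so that this sketch can be defined
    without pax2graphml and PAXTOOLS being installed.
    """
    from pax2graphml import pax_import, graph_explore

    # control_mode=2 is the documented default (compressed control representation).
    pax_import.biopax_to_reaction_graph(
        biopax_file, graphml_file, black_list=black_list, control_mode=2
    )
    graph = graph_explore.load_graphml(graphml_file, directed=True)
    return graph_explore.summary(graph)


# Hypothetical usage (file names are placeholders):
# print(build_reaction_graph("pathways.owl", "reactions.graphml"))
```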

pax_import.influence_graph(input_graph, output_graph, output_image)[source]

Generate an influence graph as a graphml file from a checked reaction graphml file:

Parameters
  • input_graph -- input graphml file containing the raw reaction graph

  • output_graph -- output graphml file containing the checked reaction graph

Returns

void

Return type

None

pax_import.influence_subgraph(input_graph, output_graph, output_image, min_node, max_node)[source]

Generate an influence graph as a graphml file from a checked reaction graphml file. The graph is generated from one connected component:

Parameters
  • input_graph -- input graphml file containing the raw reaction graph

  • output_graph -- output graphml file containing the checked reaction graph

  • output_image -- output png file for graph visualization

  • min_node -- minimum node count of the connected component

  • max_node -- maximum node count of the connected component

Returns

void

Return type

None

pax_import.join_annotation(g, alias_file, annot_field, dest_field, default_val, property_type='string')[source]

Populate a new node property with values extracted from an annotation json file:

Parameters
  • alias_file -- json annotation file

  • annot_field -- field in json data dictionary to be processed

  • dest_field -- new property name

  • default_val -- property default value for None

  • property_type -- new property type

Returns

annotation dictionary

Return type

dict

pax_import.name_alias(biopax_file, output_file='entities_aliases.json', opt='--uri-ids')[source]

Generate a json file with annotations extracted from a BIOPAX file. The process uses PAXTOOLS:

Parameters
  • biopax_file -- input BIOPAX file

  • output_file -- output json file

  • opt -- output generation options

Returns

void

Return type

None

pax_import.prepare_spaim(input_graph, output_graph, output_image, checkInvertP=True)[source]

Generate a checked reaction graph from a raw reaction graph:

Parameters
  • input_graph -- input graphml file containing the raw reaction graph

  • output_graph -- output graphml file containing the checked reaction graph

Returns

void

Return type

None

pax_import.reaction_to_influence_graph(reaction_graph)[source]

Generate an influence graph from a checked reaction graph:

Parameters

reaction_graph -- checked reaction graph

Returns

influence graph

Return type

graph object

pax2graphml.extract

This module contains functions to extract subgraphs and connected components from a graphml graph, and to assemble such graphs.

extract.connected_component_by_annotation(g, targ, annot_keys, add_void=False, extend_to_cc=True)[source]

Extract a subgraph built by merging a subset of connected components of the original graph. Each connected component contains nodes with properties matching provided values:

Parameters
  • g -- a graph

  • targ -- a list of values that must match the values of the properties defined by annot_keys

  • annot_keys -- a list of properties

  • add_void -- a boolean defining whether void values are kept. Void values have the string value ""

  • extend_to_cc -- if True, add all members of every connected component in which at least one node matches a property value defined in targ

Returns

a subgraph

Return type

graph

extract.define_boolean_filter(gr, att, val, usecase=True)[source]

Create a boolean filter property:

Parameters
  • gr -- a graph

  • att -- an existing property name

  • val -- a value for the property att. Each node with this property value will be selected by the filter. Optionally, val can be a list of values

  • usecase -- if False and val is a string, matching ignores upper/lower case

Returns

a filter as a dictionary (node index, True/False)

Return type

dict
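
The filter described above can be pictured as a plain dict keyed by node index. The following is a minimal sketch under that assumption, with node properties held in a dict rather than a graph object; boolean_filter and apply_filter are hypothetical helpers, not the package functions.

```python
def boolean_filter(node_values, val, usecase=True):
    """Return {node index: True/False}, True where the node value matches val.

    node_values: dict mapping node index -> property value (a stand-in for a
    graph property). val may be a single value or a list of values.
    """
    wanted = val if isinstance(val, list) else [val]

    def normalise(value):
        # With usecase=False, strings are compared case-insensitively.
        if not usecase and isinstance(value, str):
            return value.lower()
        return value

    wanted_set = {normalise(v) for v in wanted}
    return {idx: normalise(value) in wanted_set for idx, value in node_values.items()}


def apply_filter(node_values, vfilter):
    """Keep only the nodes whose filter entry is True."""
    return {idx: value for idx, value in node_values.items() if vfilter.get(idx)}


providers = {0: "Reactome", 1: "kegg", 2: "REACTOME", 3: "PID"}
strict = boolean_filter(providers, "Reactome")
loose = boolean_filter(providers, ["reactome", "pid"], usecase=False)
kept = apply_filter(providers, loose)
```

Separating filter construction from filter application mirrors the split between define_boolean_filter and filter_from_boolean_filter: the same filter can be inspected or combined before a subgraph is built.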

extract.filter_by_node_attribute(gr, att, val)[source]

Create a subgraph where the nodes match a property value:

Parameters
  • gr -- a graph

  • att -- an existing property name

  • val -- a value for the property att. Each node with this property value will be selected by the filter

Returns

a subgraph

Return type

graph

extract.filter_from_boolean_filter(gr, vfilter)[source]

Create a subgraph using a filter:

Parameters
  • gr -- a graph

  • vfilter -- a filter as a dictionary (node index, True/False)

Returns

a subgraph

Return type

graph

extract.largest_connected_component(g, directed=False)[source]

Select the largest connected component:

Parameters

g -- a graph

Returns

a subgraph

Return type

graph
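
For readers unfamiliar with the notion, the sketch below shows one common way to find the largest connected component of an undirected graph with a breadth-first search. It works on a plain adjacency dict, which is an assumption made for the example; the package operates on graph objects and may proceed differently.

```python
from collections import deque


def connected_components(adjacency):
    """Return the connected components of an undirected graph as sets of nodes.

    adjacency: dict mapping each node to an iterable of its neighbours.
    """
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def largest_component(adjacency):
    """Return the node set of the largest connected component."""
    return max(connected_components(adjacency), key=len)


graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b"],  # component of size 3
    "d": ["e"], "e": ["d"],                    # component of size 2
    "f": [],                                   # isolated node
}
largest = largest_component(graph)
```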

extract.merge_graph(gr1, gr2, properties_list, add_void=False, caseSensitive=True)[source]

Merge two graphs. All nodes of both graphs that share the same values for a list of properties are merged:

Parameters
  • gr1 -- first graph

  • gr2 -- second graph

  • properties_list -- a list of node properties

  • add_void -- if True, nodes with void values are merged

  • caseSensitive -- if True, the value match is case sensitive

Returns

the merged graph

Return type

graph

extract.merge_node_by_property(gr1, properties_list, add_void=False, caseSensitive=True)[source]

Merge all nodes of a graph that share the same values for a list of properties:

Parameters
  • gr1 -- a graph

  • properties_list -- a list of node properties

  • caseSensitive -- if True, the value match is case sensitive

Returns

the modified graph

Return type

graph
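
The grouping step behind such a merge can be sketched as follows. This is only an illustration under stated assumptions: nodes are represented as a dict of property dicts, the helper name group_nodes_by_properties is hypothetical, and treating None and "" as void values is a guess based on the parameter descriptions above.

```python
from collections import defaultdict


def group_nodes_by_properties(nodes, properties_list, add_void=False, case_sensitive=True):
    """Group node ids that share the same values for every listed property.

    nodes: dict mapping node id -> dict of properties (a stand-in for a graph).
    Returns a list of groups (lists of node ids) that would be merged.
    """
    groups = defaultdict(list)
    for node_id, props in nodes.items():
        key = []
        for name in properties_list:
            value = props.get(name)
            if isinstance(value, str) and not case_sensitive:
                value = value.lower()
            key.append(value)
        # Nodes with a void (None or "") key value are grouped only if add_void is set.
        if not add_void and any(v in (None, "") for v in key):
            continue
        groups[tuple(key)].append(node_id)
    return [ids for ids in groups.values() if len(ids) > 1]


nodes = {
    1: {"name": "ATP", "type": "chemical"},
    2: {"name": "atp", "type": "chemical"},
    3: {"name": "ADP", "type": "chemical"},
    4: {"name": "", "type": "chemical"},
    5: {"name": "", "type": "chemical"},
}
strict_groups = group_nodes_by_properties(nodes, ["name", "type"])
loose_groups = group_nodes_by_properties(nodes, ["name", "type"], case_sensitive=False)
void_groups = group_nodes_by_properties(nodes, ["name", "type"], add_void=True)
```

Each returned group would then be collapsed into a single node, with edges and properties carried over as described for merge_nodes.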

extract.merge_nodes(gr, first_node, second_node, remove_node=True)[source]

Merge two nodes of a graph, preserving edges and properties:

Parameters
  • gr -- a graph

  • first_node -- first node to be merged

  • second_node -- second node to be merged

  • remove_node -- if True, remove first_node

Returns

the modified graph

Return type

graph

extract.remove_largest_cc(g, directed=False)[source]

Remove the largest connected component:

Parameters

g -- a graph

Returns

a subgraph

Return type

graph

extract.sub_graph_by_value(g, targets, annot_keys, add_void=False, void_symbol=None)[source]

Extract a subgraph list. Each graph contains nodes with properties matching provided values:

Parameters
  • g -- a graph

  • targets -- a list of values that must match the values of the properties defined by annot_keys

  • annot_keys -- a list of properties

  • add_void -- a boolean defining whether void values are kept

  • void_symbol -- void_symbol represents the void value symbol, used if add_void is true

Returns

a list of dictionaries in which the key "subgraph" holds the subgraph

Return type

list

extract.sub_graph_filter(g, iteration_count, central_node, direction=<built-in function all>, node_limit=None, neighbour_count=None)[source]

Define a subgraph filter according to parameters:

Parameters
  • g -- a graph

  • iteration_count -- number of iterations

  • central_node -- selected node id

  • direction -- edge direction (all,in,out)

  • node_limit -- maximum node number

  • neighbour_count -- number of neighbours

Returns

the filter

Return type

dict

extract.subgraph_by_direction(g, iteration_count, chosen_node_id=None, direction='all', node_limit=None, neighbour_count=None)[source]

Extract a subgraph according to parameters:

Parameters
  • g -- a graph

  • iteration_count -- number of iterations

  • chosen_node_id -- selected node id

  • direction -- edge direction (all,in,out)

  • node_limit -- maximum node number

  • neighbour_count -- number of neighbours

Returns

the subgraph

Return type

graph
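
One plausible reading of these parameters is an iterative neighbourhood expansion around the chosen node, following edges in the requested direction. The sketch below illustrates that reading on a plain edge list; it is an assumption about the behaviour, not the package's algorithm, and it leaves out neighbour_count because its exact meaning is not specified here. The helper name neighbourhood is hypothetical.

```python
def neighbourhood(edges, start, iteration_count, direction="all", node_limit=None):
    """Collect nodes reachable from start in at most iteration_count hops.

    edges: list of (source, target) pairs of a directed graph.
    direction: "out" follows edges forwards, "in" backwards, "all" both ways.
    node_limit: optional cap on the number of selected nodes.
    """
    if direction not in ("all", "in", "out"):
        raise ValueError("direction must be 'all', 'in' or 'out'")
    selected = {start}
    frontier = {start}
    for _ in range(iteration_count):
        next_frontier = set()
        for source, target in edges:
            if direction in ("out", "all") and source in frontier:
                next_frontier.add(target)
            if direction in ("in", "all") and target in frontier:
                next_frontier.add(source)
        next_frontier -= selected
        if not next_frontier:
            break
        # Sorted so that the node_limit cut-off is deterministic.
        for node in sorted(next_frontier):
            if node_limit is not None and len(selected) >= node_limit:
                return selected
            selected.add(node)
        frontier = next_frontier
    return selected


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "b")]
downstream = neighbourhood(edges, "b", 1, direction="out")
upstream = neighbourhood(edges, "b", 1, direction="in")
two_hops = neighbourhood(edges, "b", 2, direction="all")
limited = neighbourhood(edges, "b", 2, direction="all", node_limit=3)
```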

extract.subgraph_by_node(input_graph, output_graph, nodeid, direction='in', neighbour_count=3)[source]

Extract a connected component holding a node specified by node id:

Parameters
  • input_graph -- input graphml file

  • output_graph -- output graphml file

  • nodeid -- selected node id

  • direction -- edge direction (all,in,out)

  • neighbour_count -- number of neighbours

Returns

the subgraph

Return type

graph

extract.subgraphs_by_datasource(g, add_void=False, void_symbol='')[source]

Extract a subgraph list. Each graph contains nodes with datasource/provider matching input values:

Parameters
  • g -- a graph

  • add_void -- a boolean defining if we keep void values

  • void_symbol -- void_symbol represents the void value symbol, used if add_void is true

Returns

a list of dictionaries in which the key "subgraph" holds the subgraph

Return type

list

pax2graphml.properties

This module contains functions to manipulate edge and node properties.

properties.annot_edge_from_file(g, annot_file, map_key, new_prop, new_prop_type='string', delimiter=',')[source]

Populate the edges with a new property. The values of the property are extracted from a tabular file:

Parameters
  • g -- a graph

  • annot_file -- the tabular annotation file

  • map_key -- the edge property holding the primary key that must be present as a named column in the file. 'index' references the edge index (from 0)

  • new_prop -- the new property to be created that must be present as a named column in the file

  • new_prop_type -- type of the new property ('string','int', 'float', 'long','bool')

  • delimiter -- tabular file delimiter

Return type

void

properties.annot_edge_to_file(g, output_prop_file, key_prop, annot_prop, defval=None, excluded_keys=[None, ''], delimiter=',')[source]

Export two properties to a tabular file. The first property acts as a key to identify the edge, the second as an additional annotation attribute. The uniqueness of the key is not tested:

Parameters
  • g -- a graph

  • output_prop_file -- the tabular output file

  • key_prop -- the key edge property that will be present as a named column in the file. 'index' references the edge index (from 0)

  • annot_prop -- the additional property to be exported as a named column in the file

  • defval -- value to replace None values in output

  • excluded_keys -- list of key_prop values that will be excluded

  • delimiter -- tabular file delimiter

Return type

void

properties.annot_node_from_file(g, annot_file, map_key, new_prop, new_prop_type='string', delimiter=',')[source]

Populate the nodes with a new property. The values of the property are extracted from a tabular file:

Parameters
  • g -- a graph

  • annot_file -- the tabular annotation file

  • map_key -- the node property holding the primary key that must be present as a named column in the file. 'index' references the node index (from 0)

  • new_prop -- the new property to be created that must be present as a named column in the file

  • new_prop_type -- type of the new property ('string','int', 'float', 'long','bool')

  • delimiter -- tabular file delimiter

Return type

void
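
The tabular annotation step can be approximated with the standard csv module, as in this sketch. It assumes a header row naming the key and value columns, represents nodes as plain dicts, and does not cover the special 'index' key; the helper names and the rule used to parse 'bool' values are assumptions made for the example.

```python
import csv
import io

# Assumed mapping from the documented type names to Python converters.
CASTS = {"string": str, "int": int, "float": float, "long": int,
         "bool": lambda text: text.strip().lower() in ("1", "true", "yes")}


def read_annotation_column(handle, map_key, new_prop, new_prop_type="string", delimiter=","):
    """Return {key value: typed annotation value} from a delimited file with a header row."""
    cast = CASTS[new_prop_type]
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row[map_key]: cast(row[new_prop]) for row in reader}


def annotate_nodes(nodes, mapping, map_key, new_prop):
    """Set new_prop on each node whose map_key value appears in mapping; return the count."""
    count = 0
    for props in nodes:
        key = props.get(map_key)
        if key in mapping:
            props[new_prop] = mapping[key]
            count += 1
    return count


table = io.StringIO("uri,score\nn1,0.5\nn2,2\n")
nodes = [{"uri": "n1"}, {"uri": "n2"}, {"uri": "n3"}]
mapping = read_annotation_column(table, "uri", "score", new_prop_type="float")
annotated = annotate_nodes(nodes, mapping, "uri", "score")
```

Nodes whose key is absent from the file (n3 here) are left without the new property in this sketch; how the package fills such gaps is not stated on this page.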

properties.annot_node_to_file(g, output_prop_file, key_prop, annot_prop, defval=None, excluded_keys=[None, ''], delimiter=',')[source]

Export two properties to a tabular file. The first property acts as a key to identify the node, the second as an additional annotation attribute. The uniqueness of the key is not tested:

Parameters
  • g -- a graph

  • output_prop_file -- the tabular output file

  • key_prop -- the key node property that will be present as a named column in the file. 'index' references the node index (from 0)

  • annot_prop -- the additional property to be exported as a named column in the file

  • defval -- value to replace None values in output

  • excluded_keys -- list of key_prop values that will be excluded

  • delimiter -- tabular file delimiter

Return type

void

properties.change_property_type(g, property_name, property_type, entity='node')[source]

Change a node or edge property type:

Parameters
  • g -- a graph

  • property_name -- the name of the property to be affected

  • property_type -- the new primitive property type (string, int, bool, float, double); if None, the initial property is replaced

  • entity -- defines whether the property relates to nodes or edges

Returns

void

properties.client_annot_impl(prot, conf=None)[source]

Configure a mart for Uniprot to GO annotation:

Parameters
  • prot -- list of Uniprot gene symbols

  • conf -- the configuration dictionary

Returns

a dictionary of annotations

Return type

dict

properties.copy_edge_properties(g, source_edge, target_edge)[source]

Copy all properties of a source edge to a target edge:

Parameters
  • g -- a graph

  • source_edge -- source edge

  • target_edge -- target edge

Return type

void

properties.copy_node_properties(g, sourceNode, targetNode)[source]

Copy all properties of a source node to a target node:

Parameters
  • g -- a graph

  • sourceNode -- source node

  • targetNode -- target node

Return type

void

properties.count_edges_by_values(gr, att)[source]

Count edges for each value of an input property:

Parameters
  • gr -- a graph

  • att -- an existing property name

Returns

a dictionary (property value/count)

Return type

dict

properties.count_nodes_by_values(gr, att)[source]

Count nodes for each value of an input property:

Parameters
  • gr -- a graph

  • att -- an existing property name

Returns

a dictionary (property value/count)

Return type

dict
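
The returned dictionary has the shape of a value histogram. A minimal sketch of that idea, with nodes represented as plain dicts (an assumption for the example) and a hypothetical helper name:

```python
from collections import Counter


def count_by_value(entities, att):
    """Count entities (dicts of properties) for each value of property att."""
    return dict(Counter(props.get(att) for props in entities))


nodes = [{"type": "reaction"}, {"type": "chemical"}, {"type": "chemical"}, {}]
counts = count_by_value(nodes, "type")
```

In this sketch, entities lacking the property are counted under None; whether the package does the same is not documented here.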

properties.create_property_from_map(g, annot_map, primary_key, new_property, case_sensitive=False)[source]

Create a new node property from a dictionary:

Parameters
  • g -- a graph

  • annot_map -- a dictionary

  • primary_key -- the primary key property (e.g. uri, uniprot...)

  • new_property -- the new property name. The expected type is 'object'

  • case_sensitive -- define if the primary key mapping is case sensitive or not

Returns

void

properties.defaultNodeValue(gr, prop, default_val)[source]

Assign a user-defined value to a node property when it is None or equal to "":

Parameters
  • gr -- a graph

  • prop -- an existing property name

  • default_val -- the value to be used to replace None and "" string

Returns

void

Return type

None

properties.default_edge_value(gr, prop, default_val)[source]

Assign a user-defined value to an edge property when it is None or equal to "":

Parameters
  • gr -- a graph

  • prop -- an existing property name

  • default_val -- the value to be used to replace None and "" string

Returns

void

Return type

None

properties.define_biomart_server(url, mart_name)[source]

Define a biomart server:

Parameters
  • url -- the url of the biomart server

  • mart_name -- the mart name

Returns

mart

Return type

mart Object

properties.describe_properties(g, name=None)[source]

Return a description of node and edge properties with names and types:

Parameters
  • g -- a graph

  • name -- property name (optional). If None, all properties are described

Returns

a description of edge and node properties

Return type

string

properties.edge_property_values(g, annot_key)[source]

Return a list of unique values corresponding to an existing edge property:

Parameters
  • g -- a graph

  • annot_key -- an existing property name

Returns

a list of edge property values

Return type

list

properties.ensembl_api(in_list, conf=None, chunck_size=50)[source]

Configure a mart for any annotation:

Parameters
  • in_list -- list of input identifiers

  • conf -- the configuration dictionary

  • chunck_size -- the number of inputs submitted in each request

Returns

a dictionary of annotations

Return type

dict

properties.is_unique(g, key_prop, exclude_void=True)[source]

Evaluate if a property contains one unique value for each node:

Parameters
  • g -- a graph

  • key_prop -- the key node property to be evaluated

  • exclude_void -- defines whether None values are excluded

Return type

boolean

properties.list_to_string_property(g, string_prop, new_property=None, sep=';', entity='node')[source]

Convert a property containing a list to a concatenated string property, for each node or edge:

Parameters
  • g -- a graph

  • string_prop -- initial property

  • new_property -- new property name; if None, the initial property is replaced

  • sep -- string separator used in the string property

  • entity -- defines whether the properties relate to nodes or edges

Returns

void

properties.node_property_values(g, annot_key)[source]

Return a list of unique values corresponding to an existing node property:

Parameters
  • g -- a graph

  • annot_key -- an existing property name

Returns

a list of node property values

Return type

list

properties.populate_color(g, colors=None)[source]

Define the node colors:

Parameters
  • g -- a graph

  • color -- the target property name

  • colors -- optional dictionary of existing values (dict keys) associated with the new values (dict values)

Returns

count of modified nodes

properties.populate_shape(g, shapes=None)[source]

Define the node shapes:

Parameters
  • g -- a graph

  • color -- the target property name

  • shapes -- optional dictionary of existing values (dict keys) associated with the new values (dict values)

Returns

count of modified nodes

properties.property_values(g, annot_key)[source]

Alias of node_property_values

properties.replace_property_values(g, prop_name, map_values, entity_type='node')[source]

Replace the values of a property by the specified values:

Parameters
  • g -- a graph

  • prop_name -- the target property name

  • map_values -- dictionary of existing values (dict keys) associated with the new values (dict values)

  • entity_type -- related entity type, "node" for node, "edge" for edge

Returns

count of modified entities
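
The replacement logic amounts to a dictionary lookup per entity, as the following sketch suggests. Entities are plain dicts here, which is an assumption for the example, and replace_values is a hypothetical helper; the s and p codes are taken from the spaim edge property mentioned elsewhere on this page.

```python
def replace_values(entities, prop_name, map_values):
    """Replace prop_name values found as keys in map_values; return the number of changes."""
    modified = 0
    for props in entities:
        current = props.get(prop_name)
        if current in map_values:
            props[prop_name] = map_values[current]
            modified += 1
    return modified


edges = [{"spaim": "s"}, {"spaim": "p"}, {"spaim": "a"}, {"spaim": "s"}]
labels = {"s": "substrate", "p": "product"}
changed = replace_values(edges, "spaim", labels)
```

Values without an entry in the mapping (the "a" edge above) are left untouched and do not count as modified.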

properties.string_to_list_property(g, string_prop, new_property=None, sep=';', entity='node')[source]

Convert a string property to a property containing a list, for each node or edge:

Parameters
  • g -- a graph

  • string_prop -- initial property

  • new_property -- new property name; if None, the initial property is replaced

  • sep -- string separator used in the string property

  • entity -- defines whether the properties relate to nodes or edges

Returns

void
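
The two conversions (string to list here, list to string in list_to_string_property) are inverse operations around a separator. A small sketch of the round trip follows, again on plain dicts rather than graph properties; the helper names are hypothetical, and mapping void strings to an empty list is an assumption.

```python
def string_to_list(value, sep=";"):
    """Split a separator-delimited string into a list; void values give an empty list."""
    if value is None or value == "":
        return []
    return value.split(sep)


def list_to_string(values, sep=";"):
    """Join a list of values into one separator-delimited string."""
    return sep.join(str(v) for v in values)


def convert_property(entities, prop, converter, new_property=None, sep=";"):
    """Apply converter to prop on every entity, writing to new_property or in place."""
    target = prop if new_property is None else new_property
    for props in entities:
        props[target] = converter(props.get(prop), sep)


nodes = [{"provider": "reactome;kegg"}, {"provider": ""}]
# Write the list form to a new property, then convert it back in place with another separator.
convert_property(nodes, "provider", string_to_list, new_property="provider_list")
as_lists = [list(props["provider_list"]) for props in nodes]
convert_property(nodes, "provider_list", list_to_string, sep="|")
```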

properties.uniprot_to_go(protein_list, conf=None, chunck_size=50)[source]

Configure a mart for Uniprot to GO annotation:

Parameters
  • protein_list -- list of Uniprot gene symbols

  • conf -- the configuration dictionary

  • chunck_size -- the number of inputs submitted in each request

Returns

a dictionary of annotations

Return type

dict

pax2graphml.graph_explore

This module contains functions to read and write graphml and to compute topological statistics on graphs.

graph_explore.color_edges(g)[source]

Generate an edge color property that differentiates the edge semantics (substrate, product, activator, inhibitor, modulator) using the spaim edge property (s, p, a, i, m):

Parameters

g -- a graph instance

Returns

void

Return type

None

graph_explore.color_nodes(g)[source]

Generate a node color property that differentiates the node entities (reaction, chemical):

Parameters

g -- a graph instance

Returns

void

Return type

None

graph_explore.compute_betweenness(g)[source]

Compute the graph Betweenness:

Parameters

g -- a graph instance

Returns

dictionary holding the metrics data

Return type

dict

graph_explore.compute_closeness(g)[source]

Compute the graph Closeness:

Parameters

g -- a graph instance

Returns

dictionary holding the metrics data

Return type

dict

graph_explore.compute_graph_metrics(g)[source]

Compute multiple topological graph metrics (degree distribution, betweenness, pagerank, closeness):

Parameters

g -- a graph instance

Returns

dictionary holding the metrics data

Return type

dict

graph_explore.compute_page_rank(g)[source]

Compute the graph PageRank:

Parameters

g -- a graph instance

Returns

dictionary holding the metrics data

Return type

dict

graph_explore.degree_distribution(g)[source]

Generate the distribution of degrees of the nodes of the graph:

Parameters

g -- a graph instance

Returns

distribution of the degrees of the nodes

Return type

DataFrame object
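
A degree distribution counts how many nodes have each degree. The sketch below computes one from an undirected edge list with the standard library and returns a plain dict rather than a DataFrame; the input format and this local helper are assumptions made for illustration, not the package's code.

```python
from collections import Counter


def degree_distribution(edges, nodes=()):
    """Return {degree: number of nodes with that degree} for an undirected edge list.

    nodes lists extra (possibly isolated) nodes, which get degree 0.
    """
    degree = Counter({node: 0 for node in nodes})
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return dict(sorted(Counter(degree.values()).items()))


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
distribution = degree_distribution(edges, nodes=["e"])
```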

graph_explore.describe_graph(g)[source]

Return a string describing the graph, with all edges and nodes and their property values:

Parameters

g -- a graph

Return type

string

graph_explore.graphml_xml_string(graphml_file, ids=1, entity='node')[source]

Return an extract of the XML content of the graphml file:

Parameters
  • graphml_file -- graphml file path

  • ids -- an integer or list of integers corresponding to the id attribute values of the selected entities

  • entity -- "edge" or "node"; defines which entity type is selected

Returns

an XML string

Return type

string

graph_explore.largest_cc_degree_dist(g)[source]

Generate the distribution of degrees of the nodes of the largest connected component:

Parameters

g -- a graph instance

Returns

distribution of the degrees of the nodes

Return type

DataFrame object

graph_explore.load_graphml(graphml_file, directed=True)[source]

Return a graph instance from a GRAPHML file:

Parameters
  • graphml_file -- a graphml file

  • directed -- a boolean that defines if the edges of the graph are oriented

Returns

graph

Return type

graph object

graph_explore.save_graphml(g, graphml_file, friendly=False)[source]

Save a graph instance as a graphml file:

Parameters
  • g -- a graph instance

  • graphml_file -- graphml output file path

Returns

void

Return type

None

graph_explore.save_image(g, image_file, size=3000, conf=None)[source]

Generate an image from a graph instance :

Parameters
  • g -- a graph instance

  • image_file -- png file path

  • size -- image size

  • conf -- image configuration dictionary with nodelabel and edgelabel keys

Returns

void

Return type

None

graph_explore.save_yed_graphml(g, graphmlOutFile)[source]

Save a graphml file enriched with graphics, for display in the yEd editor:

Parameters
  • g -- a graph instance

  • graphmlOutFile -- a graphml file

Returns

void

Return type

None

graph_explore.summary(g)[source]

Return a string with the node count and edge count of the graph:

Parameters

g -- a graph

Return type

string

pax2graphml.utils

This module contains utility functions related to graph and file manipulation, package management and execution.

utils.cc_by_node_count(g, min, max)[source]

select a subgraph from a graph, using the minimum and maximum node count of each connected component as a filter

Parameters
  • g -- a graph

  • min -- minimum node count of each connected component

  • max -- maximum node count of each connected component

Returns

a subgraph

Return type

graph
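
Filtering components by size can be sketched as a traversal that keeps a component only when its node count falls within the bounds. The example below uses an adjacency dict and treats both bounds as inclusive; both points, and the helper name, are assumptions rather than documented behaviour.

```python
def components_in_size_range(adjacency, min_nodes, max_nodes):
    """Return the nodes of every connected component whose size is in [min_nodes, max_nodes].

    adjacency: dict mapping each node to a list of neighbours (undirected graph).
    """
    seen = set()
    kept = set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collecting one connected component.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(n for n in adjacency[node] if n not in component)
        seen |= component
        if min_nodes <= len(component) <= max_nodes:
            kept |= component
    return kept


graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b"],  # component of size 3
    "d": ["e"], "e": ["d"],                    # component of size 2
    "f": [],                                   # isolated node
}
mid_sized = components_in_size_range(graph, 2, 3)
singletons = components_in_size_range(graph, 1, 1)
```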

utils.color_range_hexa(color_number=20)[source]

utils.count_edge(n, mode='all')[source]

compute the edge count of a selected node

Parameters
  • n -- graph node

  • mode -- count mode; one of "all", "in", "out"

Returns

the edges count

Return type

int

utils.data_path()[source]

return the data folder path with example datasets

Returns

a string representing the data folder path containing example data files like BIOPAX

utils.defineXmx(xmx)[source]

redefine the xmx java parameter for large BIOPAX file processing

utils.edge_description(g, e)[source]

return a string giving all details of an edge, including source and target descriptions

Returns

a string

utils.edge_list(g)[source]

return a simple list of all edges of a graph (without iterator)

Returns

a list of edges

utils.edge_to_string(gh, e, sep='\n')[source]

return a string representing all the properties values from an edge

Returns

a string

utils.friendly_format_graphml(graphml_file, usetemp=False)[source]

modify a graphml file in place so that the property data keys are more human-readable

Parameters

graphml_file -- a graphml file following the graph-tool generation rules

utils.node_list(g)[source]

return a simple list of all nodes of a graph (without iterator)

Returns

a list of nodes

utils.node_shape_to_color(code, colors)[source]

convert biopax type numeric code as defined in shape node property to yEd compatible shape name

utils.node_to_string(gh, n, sep='\n')[source]

return a string representing all the properties values from a node

Returns

a string

utils.resource_path()[source]

return the resources path

Returns

a string representing the resource path containing additional files like template and jar

utils.spaim_edge_label(code)[source]

convert spaim code as defined in spaim edge property to human readable labels

utils.to_string(gh, n, sep='\n')[source]

alias of node_to_string

Returns

a string