PAX2GRAPHML Python package documentation
pax2graphml.pax_import
This module contains functions to manipulate BIOPAX and GRAPHML files.
-
pax_import.annotation_dict(alias_file)
Create a dictionary from an annotation JSON file.
- Parameters
alias_file -- annotation json file
- Returns
annotation dictionary
- Return type
dict
-
pax_import.biopax_filter(biopax_file, datasources, output_file='output.owl')
Remove data sources from a BIOPAX file. The process uses PAXTOOLS; it requires a lot of memory and can be slow for large BIOPAX files.
- Parameters
biopax_file -- input BIOPAX file
datasources -- list of datasources to exclude
output_file -- output BIOPAX file
- Returns
void
- Return type
None
-
pax_import.biopax_merge(biopax_list, output_file='output.owl')
Merge multiple BIOPAX (RDF/XML) files. The process uses PAXTOOLS; it requires a lot of memory and can be slow for large BIOPAX files.
- Parameters
biopax_list -- a list of input BIOPAX files to be merged
output_file -- output BIOPAX file
- Returns
void
- Return type
None
-
pax_import.biopax_to_reaction_graph(biopax_file, graphml_file, black_list=None, control_mode=2)
Generate a reaction graph with binary interactions, as a GRAPHML file, from a BIOPAX file. The BIOPAX file is filtered to keep only the regulation part (metabolism and genes). The process uses PAXTOOLS; it requires a lot of memory and can be slow for large BIOPAX files.
- Parameters
biopax_file -- input BIOPAX file
graphml_file -- output GRAPHML reaction file
black_list -- entity black list (e.g. hubs, h2o...)
control_mode -- control mechanism representation: use the default (2) for a compressed control representation, or 1 for an extended representation with entity duplication
- Returns
void
- Return type
None
-
pax_import.influence_graph(input_graph, output_graph, output_image)
Generate an influence graph as a GRAPHML file from a checked reaction GRAPHML file.
- Parameters
input_graph -- input graphml file containing the checked reaction graph
output_graph -- output graphml file containing the influence graph
output_image -- output png file for graph visualization
- Returns
void
- Return type
None
-
pax_import.influence_subgraph(input_graph, output_graph, output_image, min_node, max_node)
Generate an influence graph as a GRAPHML file from a checked reaction GRAPHML file. The graph is generated from one connected component.
- Parameters
input_graph -- input graphml file containing the checked reaction graph
output_graph -- output graphml file containing the influence graph
output_image -- output png file for graph visualization
min_node -- minimum node count of the connected component
max_node -- maximum node count of the connected component
- Returns
void
- Return type
None
-
pax_import.join_annotation(g, alias_file, annot_field, dest_field, default_val, property_type='string')
Populate a new node property with values extracted from an annotation JSON file.
- Parameters
g -- a graph
alias_file -- JSON annotation file
annot_field -- field in json data dictionary to be processed
dest_field -- new property name
default_val -- property default value for None
property_type -- new property type
- Returns
annotation dictionary
- Return type
dict
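The lookup-with-default logic behind annotation_dict and join_annotation can be illustrated with a minimal pure-Python sketch. It does not call pax2graphml: the JSON layout ({node key: {field: value}}), the helper name join_annotation_sketch and the use of a plain dict instead of a graph property are assumptions made for illustration only.

```python
import json
import os
import tempfile


def join_annotation_sketch(node_keys, annotations, annot_field, default_val):
    """Hypothetical helper: map each node key to one annotation field.

    node_keys   -- node identifiers used as keys in the annotation data
    annotations -- dictionary loaded from the annotation JSON file
    annot_field -- field of each annotation record to extract
    default_val -- value used when the record or the field is missing or None
    """
    values = {}
    for key in node_keys:
        record = annotations.get(key) or {}
        value = record.get(annot_field)
        values[key] = default_val if value is None else value
    return values


# Assumed JSON layout: {node key: {field: value, ...}, ...}
data = {"uri:1": {"name": "ATP"}, "uri:2": {"name": None}}
path = os.path.join(tempfile.mkdtemp(), "aliases.json")
with open(path, "w") as fh:
    json.dump(data, fh)

with open(path) as fh:
    annotations = json.load(fh)  # plays the role of annotation_dict(path)

names = join_annotation_sketch(["uri:1", "uri:2", "uri:3"], annotations, "name", "unknown")
print(names)  # {'uri:1': 'ATP', 'uri:2': 'unknown', 'uri:3': 'unknown'}
```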
-
pax_import.name_alias(biopax_file, output_file='entities_aliases.json', opt='--uri-ids')
Generate a JSON file with annotations extracted from a BIOPAX file. The process uses PAXTOOLS.
- Parameters
biopax_file -- input BIOPAX file
output_file -- output json file
opt -- output generation options
- Returns
void
- Return type
None
-
pax_import.prepare_spaim(input_graph, output_graph, output_image, checkInvertP=True)
Generate a checked reaction graph from a raw reaction graph.
- Parameters
input_graph -- input graphml file containing the raw reaction graph
output_graph -- output graphml file containing the checked reaction graph
output_image -- output png file for graph visualization
- Returns
void
- Return type
None
pax2graphml.extract
This module contains functions to extract graphs and connected components from a GRAPHML graph, and to assemble such graphs.
-
extract.connected_component_by_annotation(g, targ, annot_keys, add_void=False, extend_to_cc=True)
Extract a subgraph built by merging a subset of the connected components of the original graph. Each selected connected component contains nodes whose properties match the provided values.
- Parameters
g -- a graph
targ -- a list of values that must match the values of the properties defined by annot_keys
annot_keys -- a list of properties
add_void -- a boolean defining whether void values are kept. Void values have the string value ""
extend_to_cc -- if True, add all members of a connected component in which at least one node matches a property value defined in targ
- Returns
a subgraph
- Return type
graph
-
extract.define_boolean_filter(gr, att, val, usecase=True)
Create a boolean filter property.
- Parameters
gr -- a graph
att -- an existing property name
val -- a value for the property att. Each node with this property value is selected by the filter. Optionally, val can be a list of values
usecase -- if False and val is a string, the match is case-insensitive
- Returns
a filter as a dictionary (node index, True/False)
- Return type
dict
-
extract.filter_by_node_attribute(gr, att, val)
Create a subgraph whose nodes match a property value.
- Parameters
gr -- a graph
att -- an existing property name
val -- a value for the property att. Each node with this property value will be selected by the filter
- Returns
a subgraph
- Return type
graph
-
extract.filter_from_boolean_filter(gr, vfilter)
Create a subgraph using a filter.
- Parameters
gr -- a graph
vfilter -- a filter as a dictionary (node index, True/False)
- Returns
a subgraph
- Return type
graph
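The boolean filter used by define_boolean_filter and filter_from_boolean_filter is a mapping from node index to True/False. The following minimal sketch reproduces that behavior in pure Python; nodes are modeled as a plain dict {node index: {property: value}} rather than a graph object, and both helper names are hypothetical.

```python
def define_boolean_filter_sketch(node_props, att, val, usecase=True):
    """Hypothetical sketch of define_boolean_filter.

    node_props -- dict {node index: {property name: value}}
    att        -- property name to test
    val        -- a value, or a list of values, that selects a node
    usecase    -- if False, string comparisons ignore case
    """
    targets = val if isinstance(val, list) else [val]

    def norm(v):
        # Lower-case strings only when case-insensitive matching is requested
        return v.lower() if (not usecase and isinstance(v, str)) else v

    wanted = [norm(t) for t in targets]
    return {idx: norm(props.get(att)) in wanted for idx, props in node_props.items()}


def filter_from_boolean_filter_sketch(node_props, vfilter):
    """Keep only the nodes whose filter value is True."""
    return {idx: props for idx, props in node_props.items() if vfilter.get(idx, False)}


nodes = {0: {"type": "Protein"}, 1: {"type": "SmallMolecule"}, 2: {"type": "protein"}}
vfilter = define_boolean_filter_sketch(nodes, "type", "protein", usecase=False)
print(vfilter)  # {0: True, 1: False, 2: True}
print(sorted(filter_from_boolean_filter_sketch(nodes, vfilter)))  # [0, 2]
```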
-
extract.largest_connected_component(g, directed=False)
Select the largest connected component.
- Parameters
g -- a graph
- Returns
a subgraph
- Return type
graph
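largest_connected_component, remove_largest_cc and utils.cc_by_node_count all start by splitting the graph into connected components. The sketch below shows that step with a breadth-first search over a plain adjacency dict. Edges are treated as undirected, which is one reading of the directed=False default; the helper names and the dict representation are assumptions, not the package's code.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of adj, ignoring edge direction.

    adj -- dict {node: iterable of successor nodes}
    """
    undirected = {n: set() for n in adj}
    for n, neighbours in adj.items():
        for m in neighbours:
            undirected.setdefault(n, set()).add(m)
            undirected.setdefault(m, set()).add(n)
    seen, components = set(), []
    for start in undirected:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nxt in undirected[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    comp.add(nxt)
                    queue.append(nxt)
        components.append(comp)
    return components


def largest_cc(adj):
    return max(connected_components(adj), key=len)


def remove_largest_cc_sketch(adj):
    all_nodes = set().union(*connected_components(adj))
    return all_nodes - largest_cc(adj)


def cc_by_node_count_sketch(adj, min_count, max_count):
    """Nodes of the components whose size lies in [min_count, max_count]."""
    keep = set()
    for comp in connected_components(adj):
        if min_count <= len(comp) <= max_count:
            keep |= comp
    return keep


adj = {1: [2], 2: [3], 4: [5], 6: []}
print(largest_cc(adj))                   # {1, 2, 3}
print(cc_by_node_count_sketch(adj, 1, 2))  # {4, 5, 6}
```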
-
extract.merge_graph(gr1, gr2, properties_list, add_void=False, caseSensitive=True)
Merge two graphs. All nodes of both graphs that share the same values for a list of properties are merged.
- Parameters
gr1 -- first graph
gr2 -- second graph
properties_list -- a list of node properties
add_void -- if True, nodes with void values are merged
caseSensitive -- if True, the value match is case sensitive
- Returns
the merged graph
- Return type
graph
-
extract.merge_node_by_property(gr1, properties_list, add_void=False, caseSensitive=True)
Merge all nodes of a graph that share the same values for a list of properties.
- Parameters
gr1 -- a graph
properties_list -- a list of node properties
add_void -- if True, nodes with void values are merged
caseSensitive -- if True, the value match is case sensitive
- Returns
the modified graph
- Return type
graph
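Merging nodes by property amounts to choosing one representative per distinct value tuple and redirecting edges to it. A minimal sketch of that idea follows, assuming nodes as a dict {node id: {property: value}} and edges as (source, target) tuples; the helper name is hypothetical, and unlike merge_nodes it does not combine the properties of the merged nodes.

```python
def merge_node_by_property_sketch(nodes, edges, properties_list,
                                  add_void=False, case_sensitive=True):
    """Hypothetical sketch: collapse nodes sharing the same property values.

    Returns (kept nodes, rewired edges). The first node seen with a given
    value tuple is kept; later ones are merged into it.
    """
    def norm(v):
        return v.lower() if (not case_sensitive and isinstance(v, str)) else v

    keeper, alias = {}, {}
    for nid, props in nodes.items():
        values = tuple(norm(props.get(p)) for p in properties_list)
        if not add_void and any(v in (None, "") for v in values):
            alias[nid] = nid  # nodes with a void value stay separate
            continue
        alias[nid] = keeper.setdefault(values, nid)
    merged = {nid: props for nid, props in nodes.items() if alias[nid] == nid}
    rewired = sorted({(alias[s], alias[t]) for s, t in edges})
    return merged, rewired


nodes = {0: {"name": "ATP"}, 1: {"name": "atp"}, 2: {"name": "ADP"}, 3: {"name": ""}}
edges = [(0, 2), (1, 2), (3, 1)]
merged, rewired = merge_node_by_property_sketch(nodes, edges, ["name"], case_sensitive=False)
print(sorted(merged), rewired)  # [0, 2, 3] [(0, 2), (3, 0)]
```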
-
extract.merge_nodes(gr, first_node, second_node, remove_node=True)
Merge two nodes of a graph, preserving edges and properties.
- Parameters
gr -- a graph
first_node -- first node to be merged
second_node -- second node to be merged
remove_node -- if True, remove first_node
- Returns
the modified graph
- Return type
graph
-
extract.remove_largest_cc(g, directed=False)
Remove the largest connected component.
- Parameters
g -- a graph
- Returns
a subgraph
- Return type
graph
-
extract.sub_graph_by_value(g, targets, annot_keys, add_void=False, void_symbol=None)
Extract a list of subgraphs. Each subgraph contains nodes whose properties match the provided values.
- Parameters
g -- a graph
targets -- a list of values that must match the values of the properties defined by annot_keys
annot_keys -- a list of properties
add_void -- a boolean defining whether void values are kept
void_symbol -- the symbol representing a void value, used if add_void is True
- Returns
a list of dictionaries in which the key "subgraph" holds the subgraph
- Return type
list
-
extract.sub_graph_filter(g, iteration_count, central_node, direction=<built-in function all>, node_limit=None, neighbour_count=None)
Define a subgraph filter according to the parameters.
- Parameters
g -- a graph
iteration_count -- number of iterations
central_node -- selected node id
direction -- edge direction (all,in,out)
node_limit -- maximum node number
neighbour_count -- number of neighbours
- Returns
the filter
- Return type
dict
-
extract.subgraph_by_direction(g, iteration_count, chosen_node_id=None, direction='all', node_limit=None, neighbour_count=None)
Extract a subgraph according to the parameters.
- Parameters
g -- a graph
iteration_count -- number of iterations
chosen_node_id -- selected node id
direction -- edge direction (all,in,out)
node_limit -- maximum node number
neighbour_count -- number of neighbours
- Returns
the subgraph
- Return type
graph
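Expanding a subgraph around a chosen node for a number of iterations in a given direction is a bounded breadth-first search. The sketch below illustrates that reading of iteration_count, direction and node_limit on a plain edge list; neighbour_count is not modeled, and the helper name and representation are assumptions rather than the package's implementation.

```python
from collections import deque


def subgraph_by_direction_sketch(edges, iteration_count, start,
                                 direction="all", node_limit=None):
    """Hypothetical sketch: nodes reachable from start in at most
    iteration_count hops.

    edges      -- list of (source, target) tuples of a directed graph
    direction  -- "out" follows source->target, "in" follows target->source,
                  "all" follows both
    node_limit -- optional cap on the number of selected nodes
    """
    out_nb, in_nb = {}, {}
    for s, t in edges:
        out_nb.setdefault(s, []).append(t)
        in_nb.setdefault(t, []).append(s)
    selected = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == iteration_count:
            continue  # do not expand beyond the requested number of hops
        neighbours = []
        if direction in ("out", "all"):
            neighbours += out_nb.get(node, [])
        if direction in ("in", "all"):
            neighbours += in_nb.get(node, [])
        for nxt in neighbours:
            if nxt in selected:
                continue
            if node_limit is not None and len(selected) >= node_limit:
                return selected
            selected.add(nxt)
            frontier.append((nxt, depth + 1))
    return selected


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "a")]
print(sorted(subgraph_by_direction_sketch(edges, 2, "a", "out")))  # ['a', 'b', 'c']
```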
-
extract.subgraph_by_node(input_graph, output_graph, nodeid, direction='in', neighbour_count=3)
Extract a connected component holding the node specified by its node id.
- Parameters
input_graph -- input graphml file
output_graph -- output graphml file
nodeid -- selected node id
direction -- edge direction (all,in,out)
neighbour_count -- number of neighbours
- Returns
the subgraph
- Return type
graph
-
extract.subgraphs_by_datasource(g, add_void=False, void_symbol='')
Extract a list of subgraphs. Each subgraph contains the nodes whose datasource/provider matches a given value.
- Parameters
g -- a graph
add_void -- a boolean defining whether void values are kept
void_symbol -- the symbol representing a void value, used if add_void is True
- Returns
a list of dictionaries in which the key "subgraph" holds the subgraph
- Return type
list
pax2graphml.properties
This module contains functions to manipulate edge and node properties.
-
properties.annot_edge_from_file(g, annot_file, map_key, new_prop, new_prop_type='string', delimiter=',')
Populate the edges with a new property. The values of the property are extracted from a tabular file.
- Parameters
g -- a graph
annot_file -- the tabular annotation file
map_key -- the edge property holding the primary key, which must be present as a named column in the file. 'index' references the edge index (from 0)
new_prop -- the new property to be created that must be present as a named column in the file
new_prop_type -- type of the new property ('string','int', 'float', 'long','bool')
delimiter -- tabular file delimiter
- Return type
void
-
properties.annot_edge_to_file(g, output_prop_file, key_prop, annot_prop, defval=None, excluded_keys=[None, ''], delimiter=',')
Export two properties to a tabular file. The first property acts as a key to identify the edge, the second as an additional annotation attribute. The uniqueness of the key is not checked.
- Parameters
g -- a graph
output_prop_file -- the tabular output file
key_prop -- the key edge property that will be present as a named column in the file. 'index' references the edge index (from 0)
annot_prop -- the additional property to be exported as a named column in the file
defval -- value to replace None values in output
excluded_keys -- list of key_prop values that will be excluded
delimiter -- tabular file delimiter
- Return type
void
-
properties.annot_node_from_file(g, annot_file, map_key, new_prop, new_prop_type='string', delimiter=',')
Populate the nodes with a new property. The values of the property are extracted from a tabular file.
- Parameters
g -- a graph
annot_file -- the tabular annotation file
map_key -- the node property holding the primary key that must be present as a named column in the file. 'index' references the node index (from 0)
new_prop -- the new property to be created that must be present as a named column in the file
new_prop_type -- type of the new property ('string','int', 'float', 'long','bool')
delimiter -- tabular file delimiter
- Return type
void
-
properties.annot_node_to_file(g, output_prop_file, key_prop, annot_prop, defval=None, excluded_keys=[None, ''], delimiter=',')
Export two properties to a tabular file. The first property acts as a key to identify the node, the second as an additional annotation attribute. The uniqueness of the key is not checked.
- Parameters
g -- a graph
output_prop_file -- the tabular output file
key_prop -- the key node property that will be present as a named column in the file. 'index' references the node index (from 0)
annot_prop -- the additional property to be exported as a named column in the file
defval -- value to replace None values in output
excluded_keys -- list of key_prop values that will be excluded
delimiter -- tabular file delimiter
- Return type
void
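The tabular round trip performed by annot_node_to_file and annot_node_from_file can be sketched with the standard csv module. In this minimal sketch nodes are a list of property dicts (list position = node index), the text is kept in memory instead of a file, and a cast callable stands in for new_prop_type; all of these are simplifying assumptions and the helper names are hypothetical.

```python
import csv
import io


def annot_node_to_csv(nodes, key_prop, annot_prop, defval=None,
                      excluded_keys=(None, ""), delimiter=","):
    """Hypothetical sketch: write a key column and an annotation column."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter)
    writer.writerow([key_prop, annot_prop])
    for idx, props in enumerate(nodes):
        key = idx if key_prop == "index" else props.get(key_prop)
        if key in excluded_keys:
            continue  # skip excluded keys such as None or ""
        value = props.get(annot_prop)
        writer.writerow([key, defval if value is None else value])
    return buf.getvalue()


def annot_node_from_csv(nodes, text, map_key, new_prop, cast=str, delimiter=","):
    """Hypothetical sketch: add new_prop to nodes whose map_key appears in the table."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    table = {row[map_key]: row[new_prop] for row in reader}
    for idx, props in enumerate(nodes):
        key = str(idx) if map_key == "index" else props.get(map_key)
        if key in table:
            props[new_prop] = cast(table[key])
    return nodes


nodes = [{"uri": "u1", "name": "ATP"}, {"uri": "u2", "name": None}, {"uri": "", "name": "H2O"}]
print(annot_node_to_csv(nodes, "uri", "name", defval="NA").splitlines())
# ['uri,name', 'u1,ATP', 'u2,NA']
annot_node_from_csv(nodes, "uri,score\nu1,0.5\n", "uri", "score", cast=float)
print(nodes[0]["score"])  # 0.5
```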
-
properties.change_property_type(g, property_name, property_type, entity='node')
Change the type of a node or edge property.
- Parameters
g -- a graph
property_name -- the name of the property to be affected
property_type -- the new primitive property type (string, int, bool, float, double); if None, the initial property is replaced
entity -- defines whether the property relates to nodes or edges
- Returns
void
-
properties.client_annot_impl(prot, conf=None)
Configure a mart for Uniprot to GO annotation.
- Parameters
prot -- list of Uniprot gene symbols
conf -- the configuration dictionary
- Returns
a dictionary of annotations
- Return type
dict
-
properties.copy_edge_properties(g, source_edge, target_edge)
Copy all properties of a source edge to a target edge.
- Parameters
g -- a graph
source_edge -- source edge
target_edge -- target edge
- Return type
void
-
properties.copy_node_properties(g, sourceNode, targetNode)
Copy all properties of a source node to a target node.
- Parameters
g -- a graph
sourceNode -- source node
targetNode -- target node
- Return type
void
-
properties.count_edges_by_values(gr, att)
Count edges for each value of an input property.
- Parameters
gr -- a graph
att -- an existing property name
- Returns
a dictionary (property value/count)
- Return type
dict
-
properties.count_nodes_by_values(gr, att)
Count nodes for each value of an input property.
- Parameters
gr -- a graph
att -- an existing property name
- Returns
a dictionary (property value/count)
- Return type
dict
-
properties.create_property_from_map(g, annot_map, primary_key, new_property, case_sensitive=False)
Create a new node property from a dictionary.
- Parameters
g -- a graph
annot_map -- a dictionary
primary_key -- the primary key property (e.g. uri, uniprot...)
new_property -- the new property name. The expected type is 'object'
case_sensitive -- define if the primary key mapping is case sensitive or not
- Returns
void
-
properties.defaultNodeValue(gr, prop, default_val)
Assign a user-defined value to a node property when it is None or equal to "".
- Parameters
gr -- a graph
prop -- an existing property name
default_val -- the value to be used to replace None and "" string
- Returns
void
- Return type
None
-
properties.default_edge_value(gr, prop, default_val)
Assign a user-defined value to an edge property when it is None or equal to "".
- Parameters
gr -- a graph
prop -- an existing property name
default_val -- the value to be used to replace None and "" string
- Returns
void
- Return type
None
-
properties.define_biomart_server(url, mart_name)
Define a BioMart server.
- Parameters
url -- the URL of the BioMart server
mart_name -- the mart name
- Returns
mart
- Return type
mart Object
-
properties.describe_properties(g, name=None)
Return a description of node and edge properties with their names and types.
- Parameters
g -- a graph
name -- property name (optional). If None, all properties are described
- Returns
a description of edge and node properties
- Return type
string
-
properties.edge_property_values(g, annot_key)
Return a list of unique values of an existing edge property.
- Parameters
g -- a graph
annot_key -- an existing property name
- Returns
a list of edge property values
- Return type
list
-
properties.ensembl_api(in_list, conf=None, chunck_size=50)
Configure a mart for any annotation.
- Parameters
in_list -- list of input identifiers
conf -- the configuration dictionary
chunck_size -- the size of each chunk of inputs submitted at a time
- Returns
a dictionary of annotations
- Return type
dict
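The chunck_size parameter suggests that identifiers are sent to the mart in batches. A minimal sketch of that batching pattern follows; query is a hypothetical stand-in for one BioMart request and is not part of pax2graphml, and the way results are merged is an assumption.

```python
def chunks(items, chunk_size=50):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def annotate_in_chunks(identifiers, query, chunk_size=50):
    """Hypothetical sketch: call query once per chunk and merge the results.

    query -- a callable taking a list of identifiers and returning a dict of
             annotations; it stands in for one request to a BioMart server.
    """
    annotations = {}
    for chunk in chunks(identifiers, chunk_size):
        annotations.update(query(chunk))
    return annotations


calls = []


def fake_query(chunk):
    calls.append(len(chunk))  # record the batch size of each request
    return {identifier: identifier.lower() for identifier in chunk}


ids = ["P%05d" % i for i in range(120)]
result = annotate_in_chunks(ids, fake_query)
print(calls)        # [50, 50, 20]
print(len(result))  # 120
```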
-
properties.is_unique(g, key_prop, exclude_void=True)
Evaluate whether a property holds a unique value for each node.
- Parameters
g -- a graph
key_prop -- the key node property to be evaluated
exclude_void -- defines whether None values are excluded from the evaluation
- Return type
boolean
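The uniqueness test reduces to comparing the number of values with the number of distinct values. In this minimal sketch the key values are given as a plain list (one per node), and "void" is assumed to mean None or "", by analogy with the excluded_keys=[None, ''] default used elsewhere in this module; the helper name is hypothetical.

```python
def is_unique_sketch(values, exclude_void=True):
    """Hypothetical sketch of is_unique on a list of key values (one per node).

    exclude_void -- if True, None and "" are ignored before testing uniqueness
    """
    if exclude_void:
        values = [v for v in values if v not in (None, "")]
    return len(values) == len(set(values))


print(is_unique_sketch(["P1", "P2", None, None]))         # True
print(is_unique_sketch(["P1", "P2", None, None], False))  # False
print(is_unique_sketch(["P1", "P1"]))                     # False
```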
-
properties.list_to_string_property(g, string_prop, new_property=None, sep=';', entity='node')
Convert a property containing a list into a concatenated string property, for each node or edge.
- Parameters
g -- a graph
string_prop -- initial property
new_property -- new property name; if None, the initial property is replaced
sep -- string separator used in the string property
entity -- defines whether the property relates to nodes or edges
- Returns
void
-
properties.node_property_values(g, annot_key)
Return a list of unique values of an existing node property.
- Parameters
g -- a graph
annot_key -- an existing property name
- Returns
a list of node property values
- Return type
list
-
properties.populate_color(g, colors=None)
Define the node colors.
- Parameters
g -- a graph
color -- the target property name
colors -- optional dictionary mapping existing values (dict keys) to new values (dict values)
- Returns
count of modified nodes
-
properties.populate_shape(g, shapes=None)
Define the node shapes.
- Parameters
g -- a graph
color -- the target property name
shapes -- optional dictionary mapping existing values (dict keys) to new values (dict values)
- Returns
count of modified nodes
-
properties.replace_property_values(g, prop_name, map_values, entity_type='node')
Replace the values of a property with the specified values.
- Parameters
g -- a graph
prop_name -- the target property name
map_values -- dictionary mapping existing values (dict keys) to new values (dict values)
entity_type -- related entity type, "node" for node, "edge" for edge
- Returns
count of modified entities
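The replacement described above can be sketched as a dictionary lookup over the entities, counting those that change. This minimal sketch models nodes or edges as a list of property dicts; the helper name and the sample "spaim" values are illustrative assumptions.

```python
def replace_property_values_sketch(entities, prop_name, map_values):
    """Hypothetical sketch: entities is a list of property dicts (nodes or edges).

    Values found as keys of map_values are replaced by the associated value;
    the number of modified entities is returned.
    """
    count = 0
    for props in entities:
        old = props.get(prop_name)
        if old in map_values:
            props[prop_name] = map_values[old]
            count += 1
    return count


edges = [{"spaim": "s"}, {"spaim": "p"}, {"spaim": "a"}]
n = replace_property_values_sketch(edges, "spaim", {"s": "substrate", "p": "product"})
print(n)                            # 2
print([e["spaim"] for e in edges])  # ['substrate', 'product', 'a']
```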
-
properties.string_to_list_property(g, string_prop, new_property=None, sep=';', entity='node')
Convert a string property into a property containing a list, for each node or edge.
- Parameters
g -- a graph
string_prop -- initial property
new_property -- new property name; if None, the initial property is replaced
sep -- string separator used in the string property
entity -- defines whether the property relates to nodes or edges
- Returns
void
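string_to_list_property and list_to_string_property are inverse conversions around the sep separator. The minimal sketch below shows both directions on plain lists of values; treating None and "" as an empty list (and an empty list as "") is an assumption, and the helper names are hypothetical.

```python
def string_to_list_values(values, sep=";"):
    """Split each string value on sep; None and "" become empty lists (assumption)."""
    return [v.split(sep) if v else [] for v in values]


def list_to_string_values(values, sep=";"):
    """Join each list value with sep, converting items to strings."""
    return [sep.join(str(item) for item in v) if v else "" for v in values]


raw = ["GO:0001;GO:0002", "GO:0003", ""]
lists = string_to_list_values(raw)
print(lists)                                # [['GO:0001', 'GO:0002'], ['GO:0003'], []]
print(list_to_string_values(lists) == raw)  # True
```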
-
properties.uniprot_to_go(protein_list, conf=None, chunck_size=50)
Configure a mart for Uniprot to GO annotation.
- Parameters
protein_list -- list of Uniprot gene symbols
conf -- the configuration dictionary
chunck_size -- the size of each chunk of inputs to be submitted in one time
- Returns
a dictionary of annotations
- Return type
dict
pax2graphml.graph_explore
This module contains functions to read and write GRAPHML files and to compute topological statistics on graphs.
-
graph_explore.color_edges(g)
Generate an edge color property that differentiates the edge semantics (substrate, product, activator, inhibitor, modulator) using the spaim edge property (s, p, a, i, m).
- Parameters
g -- a graph instance
- Returns
void
- Return type
None
-
graph_explore.color_nodes(g)
Generate a node color property that differentiates the node entities (reaction, chemical).
- Parameters
g -- a graph instance
- Returns
void
- Return type
None
-
graph_explore.compute_betweenness(g)
Compute the graph betweenness.
- Parameters
g -- a graph instance
- Returns
dictionary holding the metrics data
- Return type
dict
-
graph_explore.compute_closeness(g)
Compute the graph closeness.
- Parameters
g -- a graph instance
- Returns
dictionary holding the metrics data
- Return type
dict
-
graph_explore.compute_graph_metrics(g)
Compute multiple topological graph metrics (degree distribution, betweenness, PageRank, closeness).
- Parameters
g -- a graph instance
- Returns
dictionary holding the metrics data
- Return type
dict
-
graph_explore.compute_page_rank(g)
Compute the graph PageRank.
- Parameters
g -- a graph instance
- Returns
dictionary holding the metrics data
- Return type
dict
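PageRank is usually computed by power iteration: each node redistributes its score along its out-edges, damped by a teleportation term. The sketch below is a generic textbook version on a plain edge list, with mass from nodes without out-edges spread uniformly; it is not necessarily how compute_page_rank is implemented, and damping=0.85 is the conventional default rather than a documented parameter.

```python
def page_rank(nodes, edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on a directed edge list (generic sketch)."""
    out_deg = {n: 0 for n in nodes}
    incoming = {n: [] for n in nodes}
    for s, t in edges:
        out_deg[s] += 1
        incoming[t].append(s)
    count = len(nodes)
    rank = {v: 1.0 / count for v in nodes}
    for _ in range(iterations):
        # Score held by dangling nodes (no out-edges) is shared by all nodes
        dangling = sum(rank[v] for v in nodes if out_deg[v] == 0)
        new = {}
        for v in nodes:
            inflow = sum(rank[u] / out_deg[u] for u in incoming[v])
            new[v] = (1 - damping) / count + damping * (inflow + dangling / count)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


star = page_rank(["a", "b", "c"], [("b", "a"), ("c", "a")])
print(max(star, key=star.get))  # a
```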
-
graph_explore.degree_distribution(g)
Generate the degree distribution of the nodes of the graph.
- Parameters
g -- a graph instance
- Returns
distribution of the degrees of the nodes
- Return type
DataFrame object
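A degree distribution counts how many nodes have each degree. This minimal sketch uses the total degree (in-degree plus out-degree) on a plain edge list and returns a dict instead of a DataFrame to avoid a pandas dependency; which degree the package actually uses is an assumption. The result could be wrapped with pandas.DataFrame(list(dist.items()), columns=["degree", "count"]) if a table is needed.

```python
from collections import Counter


def degree_distribution_sketch(nodes, edges):
    """Hypothetical sketch: {total degree: number of nodes with that degree}.

    nodes -- iterable of node ids; edges -- list of (source, target) tuples.
    """
    degree = {n: 0 for n in nodes}
    for s, t in edges:
        degree[s] += 1
        degree[t] += 1
    return dict(sorted(Counter(degree.values()).items()))


dist = degree_distribution_sketch(["a", "b", "c", "d"], [("a", "b"), ("a", "c"), ("b", "c")])
print(dist)  # {0: 1, 2: 3}
```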
-
graph_explore.describe_graph(g)
Return a string describing the graph, with all edges and nodes and their property values.
- Parameters
g -- a graph
- Return type
string
-
graph_explore.graphml_xml_string(graphml_file, ids=1, entity='node')
Return an extract of the XML content of the GRAPHML file.
- Parameters
graphml_file -- graphml file path
ids -- an integer or list of integers corresponding to the id attribute values of the selected entities
entity -- "edge" or "node", defining which entity is selected
- Returns
an XML string
- Return type
string
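Selecting elements by id from a GRAPHML document can be sketched with xml.etree.ElementTree. This minimal sketch works on an XML string rather than a file path and assumes ids are written as "n<k>" for nodes and "e<k>" for edges, as in the sample; that prefix convention and the helper name are assumptions, not documented behavior of graphml_xml_string.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def graphml_xml_string_sketch(graphml_text, ids=1, entity="node"):
    """Hypothetical sketch: return the serialized XML of the selected elements."""
    if isinstance(ids, int):
        ids = [ids]
    prefix = "n" if entity == "node" else "e"  # assumed id convention
    wanted = {prefix + str(i) for i in ids}
    root = ET.fromstring(graphml_text)
    parts = []
    for elem in root.iter("{%s}%s" % (GRAPHML_NS, entity)):
        if elem.get("id") in wanted:
            parts.append(ET.tostring(elem, encoding="unicode").strip())
    return "\n".join(parts)


sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="n0"/>
    <node id="n1"/>
    <edge id="e0" source="n0" target="n1"/>
  </graph>
</graphml>"""

print(graphml_xml_string_sketch(sample, ids=1))
print(graphml_xml_string_sketch(sample, ids=[0], entity="edge"))
```

ElementTree serializes the default namespace with a generated prefix (such as ns0), so the extract is equivalent XML but not byte-identical to the source file.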
-
graph_explore.largest_cc_degree_dist(g)
Generate the degree distribution of the nodes of the largest connected component.
- Parameters
g -- a graph instance
- Returns
distribution of the degrees of the nodes
- Return type
DataFrame object
-
graph_explore.load_graphml(graphml_file, directed=True)
Return a graph instance from a GRAPHML file.
- Parameters
graphml_file -- a graphml file
directed -- a boolean that defines if the edges of the graph are oriented
- Returns
graph
- Return type
graph object
-
graph_explore.save_graphml(g, graphml_file, friendly=False)
Save a graph instance as a GRAPHML file.
- Parameters
g -- a graph instance
graphml_file -- graphml output file path
- Returns
void
- Return type
None
-
graph_explore.save_image(g, image_file, size=3000, conf=None)
Generate an image from a graph instance.
- Parameters
g -- a graph instance
image_file -- png file path
size -- image size
conf -- image configuration dictionary with nodelabel and edgelabel keys
- Returns
void
- Return type
None
pax2graphml.utils
This module contains utility functions related to graph and file manipulation, package management and execution.
-
utils.cc_by_node_count(g, min, max)
Select a subgraph from a graph, using the minimum and maximum node count of each connected component as a filter.
- Parameters
g -- a graph
min -- minimum node count of each connected component
max -- maximum node count of each connected component
- Returns
a subgraph
- Return type
graph
-
utils.count_edge(n, mode='all')
Compute the number of edges of a selected node.
- Parameters
n -- graph node
mode -- count mode; values: "all", "in", "out"
- Returns
the edges count
- Return type
int
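The three count modes correspond to out-degree, in-degree and their sum. This minimal sketch takes the node and a plain list of (source, target) tuples instead of a graph node object; that representation and the helper name are assumptions for illustration.

```python
def count_edge_sketch(node, edges, mode="all"):
    """Hypothetical sketch: count the edges of node in a (source, target) list."""
    out_count = sum(1 for s, _ in edges if s == node)
    in_count = sum(1 for _, t in edges if t == node)
    if mode == "out":
        return out_count
    if mode == "in":
        return in_count
    if mode == "all":
        return in_count + out_count
    raise ValueError("mode must be 'all', 'in' or 'out'")


edges = [("a", "b"), ("c", "a"), ("a", "d")]
print(count_edge_sketch("a", edges), count_edge_sketch("a", edges, "in"),
      count_edge_sketch("a", edges, "out"))  # 3 1 2
```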
-
utils.data_path()
Return the path of the data folder containing example datasets.
- Returns
a string representing the data folder path containing example data files like BIOPAX
-
utils.edge_description(g, e)
Return a string giving all details of an edge, including the description of its source and target.
- Returns
a string
-
utils.edge_list(g)
Return a simple list of all edges of a graph (without an iterator).
- Returns
a list of edges
-
utils.edge_to_string(gh, e, sep='\n')
Return a string representing all the property values of an edge.
- Returns
a string
-
utils.friendly_format_graphml(graphml_file, usetemp=False)
Modify a GRAPHML file in place so that its property data keys are more human-readable.
- Parameters
graphml_file -- a graphml file following the graph-tool generation rules
-
utils.node_list(g)
Return a simple list of all nodes of a graph (without an iterator).
- Returns
a list of nodes
-
utils.node_shape_to_color(code, colors)
Convert a BIOPAX type numeric code, as defined in the shape node property, into a yEd-compatible shape name.
-
utils.node_to_string(gh, n, sep='\n')
Return a string representing all the property values of a node.
- Returns
a string
-
utils.resource_path()
Return the resources path.
- Returns
a string representing the resource path containing additional files like template and jar