Optimize the Queries & The Object-to-Triples Mapper¶
Several functions are available for querying datasets with patterns. This section shows how to:
- Automatically populate a collection of object-oriented model instances.
- Minimize memory usage and query time for large datasets.
- Work with iterators.
from biopax_explorer.pattern.rack import Rack
from biopax_explorer.pattern.pattern import PatternExecutor, Pattern
from biopax_explorer.query import EntityNode
from biopax_explorer.biopax import *
import json
Constants for the database (use "localhost" instead of "db" in the URL if necessary)¶
datasetN = "netpath"
datasetP = "panther"
db = "http://db:3030" # with local triple store deployed with docker-compose
#db = "https://rdf-ds.genouest.org" # using an online default triple store
r = Rack()
peN = PatternExecutor(db,datasetN) # create a Pattern executor for a dataset
#peN.verbose() # display logs
peP = PatternExecutor(db,datasetP) # create a Pattern executor for a dataset
#peP.verbose() # display logs
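# Build a simple pattern: proteins whose displayName contains "FRK",
# connected to their entity reference.
def simplePattern():
    prot = EntityNode("P", Protein())
    entityReference = EntityNode("E", EntityReference())
    prot.connectedWith(entityReference, "entityReference")
    prot.whereAttribute("displayName", "FRK", "CONTAINS")
    # assemble the pattern from its nodes
    # (assumed usage of the Pattern class imported above)
    p = Pattern()
    p.define(prot, entityReference)
    return p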
A Pattern can be executed to extract only entity references, or all the attributes and relations of each entity¶
#p = r.inComplexWith() # a pattern from the rack
by_reference=True: We force the pattern result set to hold only references¶
In the rack, each pattern has its own default mode (by_reference True or False).
p = simplePattern()
res_P2 = peN.executePattern(p, by_reference=True, max_count=3)
Extract entity references only¶
Because of the 'by_reference' mode, the results are not instances of the BIOPAX model classes but of the class 'PK', which has two attributes: pk, the primary key (URI), and cls, the class name. The 'max_count' parameter is useful when you design a new query pattern: it gives a quick preview of the results by returning only the first rows. The meta_label attribute identifies an entity by the label it was given in the Pattern.
print("first results (uris and class name only: for better memory management and faster results)")
for entitylist in res_P2:
    for entity in entitylist:
        print("uri:%s, cls: %s, meta_label :%s, __class__:%s" % (entity.pk, entity.cls, entity.meta_label, entity.__class__))
first results (uris and class name only: for better memory management and faster results)
__row__
uri:http://pathwaycommons.org/pc12/#Protein_fe9aa3569b03eb22514796f21db8dea3, cls: Protein, meta_label :P, __class__:<class 'rdfobj.mapper.PK'>
uri:http://identifiers.org/uniprot/P42685, cls: ProteinReference, meta_label :E, __class__:<class 'rdfobj.mapper.PK'>
__row__
uri:http://pathwaycommons.org/pc12/#Protein_e327253efbb4440eb6664fc6e69627c4, cls: Protein, meta_label :P, __class__:<class 'rdfobj.mapper.PK'>
uri:http://identifiers.org/uniprot/P42685, cls: ProteinReference, meta_label :E, __class__:<class 'rdfobj.mapper.PK'>
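The shape of a by_reference result can be explored without a triple store. The sketch below is a toy model, not the library's code: `RefStub` is a hypothetical stand-in for the `PK` class, carrying only the three attributes printed above (pk, cls, meta_label), and `select_refs` is a helper introduced purely for illustration. It shows how rows of lightweight references might be filtered before any entity is populated.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RefStub:
    """Hypothetical stand-in for the library's PK class (assumption)."""
    pk: str          # primary key (URI)
    cls: str         # class name
    meta_label: str  # label given to the node in the pattern


def select_refs(rows: Iterable[List[RefStub]], cls: str, meta_label: str) -> List[RefStub]:
    """Keep the references matching both the class name and the pattern label."""
    return [ref for row in rows for ref in row
            if ref.cls == cls and ref.meta_label == meta_label]


# Two rows shaped like the output above (placeholder URIs, not real data).
rows = [
    [RefStub("urn:example:Protein_1", "Protein", "P"),
     RefStub("urn:example:P42685", "ProteinReference", "E")],
    [RefStub("urn:example:Protein_2", "Protein", "P"),
     RefStub("urn:example:P42685", "ProteinReference", "E")],
]

proteins = select_refs(rows, "Protein", "P")
print([ref.pk for ref in proteins])
```

Because each reference holds only a few short strings, this kind of filtering stays cheap even when the result set is large.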
The executePattern function most often returns a list of lists of PK() instances that only hold URIs (PK) and class names. To obtain fully object-oriented entities, we use the fill([PK]) method. We define the selected entities with attribute values.
To extract the fully populated entities, the level parameter defines the extended neighborhood as follows:¶
level=1: only selected entities are included.
level=2: entities in direct relation with the selected entities are populated.
level=3: entities at one more level are added (this option is costly).
When using fetchEntities, all entities defined in the pattern are returned in the same row.
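To make the meaning of the level parameter concrete, here is a toy model of neighborhood expansion on a plain dictionary graph. It is only a sketch of the idea described above, not how the library implements it: the `neighborhood` function and the node names are hypothetical, and each level beyond 1 is assumed to follow outgoing relations one step further.

```python
from typing import Dict, List, Set


def neighborhood(graph: Dict[str, List[str]], selected: List[str], level: int) -> Set[str]:
    """Return the nodes reached from `selected` after (level - 1) expansion steps."""
    if level < 1:
        raise ValueError("level must be >= 1")
    reached = set(selected)
    frontier = set(selected)
    for _ in range(level - 1):
        # follow the outgoing relations of the nodes added in the previous step
        frontier = {n for node in frontier for n in graph.get(node, [])} - reached
        reached |= frontier
    return reached


# Toy relation graph (hypothetical names).
graph = {
    "protein": ["reference", "location"],
    "reference": ["xref", "organism"],
}

print(sorted(neighborhood(graph, ["protein"], level=1)))
print(sorted(neighborhood(graph, ["protein"], level=2)))
print(sorted(neighborhood(graph, ["protein"], level=3)))
```

The set of reached nodes grows with each level, which is why higher levels cost more time and memory.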
p = simplePattern()
res_P2 = peN.fetchEntities(p, level=1, max_count=3)
print("first results (object oriented ). instances from the BIOPAX model")
for entitylist in res_P2:
    for entity in entitylist:
        print("pk(uri):%s, class: %s, meta_label: %s" % (entity.pk, entity.__class__, entity.meta_label))
for entitylist in res_P2:
    for entity in entitylist:
        print(entity.to_json())  # a method to display the entity details using json serialization
# first extract entity references only,
# for a quick query and memory optimization
resultref = peN.executePattern(p, by_reference=True, max_count=3)
selected = []  # references kept for later population
for reflist in resultref:
    for eref in reflist:
        if eref.cls == "Protein" and eref.meta_label == "P":
            # then populate only the selected entities (for example with fill([PK]))
            selected.append(eref)
first results (object oriented ). instances from the BIOPAX model
#-------------------------------------------------------------
__row__
#------------------------
pk(uri):http://pathwaycommons.org/pc12/#Protein_fe9aa3569b03eb22514796f21db8dea3, class: <class 'biopax.protein.Protein'>, meta_label: P
#------------------------
pk(uri):http://identifiers.org/uniprot/P42685, class: <class 'biopax.proteinreference.ProteinReference'>, meta_label: E
__row__
#------------------------
pk(uri):http://pathwaycommons.org/pc12/#Protein_e327253efbb4440eb6664fc6e69627c4, class: <class 'biopax.protein.Protein'>, meta_label: P
#------------------------
pk(uri):http://identifiers.org/uniprot/P42685, class: <class 'biopax.proteinreference.ProteinReference'>, meta_label: E
#-------------------------------------------------------------
__row__json
#------------------------
{ "uri": "http://pathwaycommons.org/pc12/#Protein_fe9aa3569b03eb22514796f21db8dea3", "dataSource": { "__uri__": "http://pathwaycommons.org/pc12/#Provenance_0cfa90c3ddb627e2e7f3af3d5bd9497d", "comment": null, "xref": null, "displayName": null, "name": null, "standardName": null }, "evidence": null, "xref": null, "availability": null, "comment": "REPLACED http://pathwaycommons.org/pc12/Protein_fe9aa3569b03eb22514796f21db8dea3", "displayName": "FRK", "name": "FRK__9606", "standardName": null, "cellularLocation": null, "feature": null, "memberPhysicalEntity": null, "notFeature": null, "entityReference": [ { "__class__": "EntityReference", "uri": "http://identifiers.org/uniprot/P42685" } ], "__class__": "Protein" }
#------------------------
{ "uri": "http://identifiers.org/uniprot/P42685", "comment": "REPLACED http://www.phosphosite.org/phosphosite.owl#po_1963", "entityFeature": { "__uri__": "http://pathwaycommons.org/pc12/#ModificationFeature_d45be1a02e6f260d835e1d7ce7ab48c2", "comment": null, "evidence": null, "featureLocation": null, "featureLocationType": null, "memberFeature": null }, "entityReferenceType": null, "evidence": null, "memberEntityReference": null, "xref": { "__class__": "Xref", "uri": "http://pathwaycommons.org/pc12/#UnificationXref_uniprot_knowledgebase_P42685" }, "displayName": "FRK_HUMAN", "name": "RAK", "standardName": "Tyrosine-protein kinase FRK", "organism": { "__class__": "BioSource", "uri": "http://identifiers.org/taxonomy/9606" }, "sequence": null, "__class__": "ProteinReference" }
__row__json
#------------------------
{ "uri": "http://pathwaycommons.org/pc12/#Protein_e327253efbb4440eb6664fc6e69627c4", "dataSource": { "__uri__": "http://pathwaycommons.org/pc12/#Provenance_0cfa90c3ddb627e2e7f3af3d5bd9497d", "comment": null, "xref": null, "displayName": null, "name": null, "standardName": null }, "evidence": null, "xref": null, "availability": null, "comment": "REPLACED http://pathwaycommons.org/pc12/Protein_e327253efbb4440eb6664fc6e69627c4", "displayName": "FRK", "name": "FRK__Phosphorylation", "standardName": null, "cellularLocation": { "__class__": "CellularLocationVocabulary", "uri": "http://pathwaycommons.org/pc12/#CellularLocationVocabulary_dc68fffeee0259e0d3bd7a3f6d0cc067" }, "feature": [ { "__class__": "EntityFeature", "uri": "http://pathwaycommons.org/pc12/#ModificationFeature_1157fef5ecac6dc59c2edbc1d9370ddb" }, { "__class__": "EntityFeature", "uri": "http://pathwaycommons.org/pc12/#ModificationFeature_3288185f8cb17b9ce299453361bb19b8" } ], "memberPhysicalEntity": null, "notFeature": null, "entityReference": [ { "__class__": "EntityReference", "uri": "http://identifiers.org/uniprot/P42685" } ], "__class__": "Protein" }
#------------------------
{ "uri": "http://identifiers.org/uniprot/P42685", "comment": "REPLACED http://www.phosphosite.org/phosphosite.owl#po_1963", "entityFeature": { "__uri__": "http://pathwaycommons.org/pc12/#ModificationFeature_d45be1a02e6f260d835e1d7ce7ab48c2", "comment": null, "evidence": null, "featureLocation": null, "featureLocationType": null, "memberFeature": null }, "entityReferenceType": null, "evidence": null, "memberEntityReference": null, "xref": { "__class__": "Xref", "uri": "http://pathwaycommons.org/pc12/#UnificationXref_uniprot_knowledgebase_P42685" }, "displayName": "FRK_HUMAN", "name": "RAK", "standardName": "Tyrosine-protein kinase FRK", "organism": { "__class__": "BioSource", "uri": "http://identifiers.org/taxonomy/9606" }, "sequence": null, "__class__": "ProteinReference" }
Using meta_label to select the desired entities¶
This is particularly useful when a result row of a Pattern query contains several entities of the same class.
p = simplePattern()
res_P2 = peN.fetchEntities(p, level=1, max_count=1)
print("#----we can now process the related Protein 'P' selected within each row not only by its class name but as well by its meta_label--------------------")
for entitylist in res_P2:
    for entity in entitylist:
        if entity.meta_label in ["P"]:
            print(entity.to_json())
#----we can now process the related Protein 'P' selected within each row not only by its class name but as well by its meta_label--------------------
__row__json
{ "uri": "http://pathwaycommons.org/pc12/#Protein_fe9aa3569b03eb22514796f21db8dea3", "dataSource": { "__uri__": "http://pathwaycommons.org/pc12/#Provenance_0cfa90c3ddb627e2e7f3af3d5bd9497d", "comment": null, "xref": null, "displayName": null, "name": null, "standardName": null }, "evidence": null, "xref": null, "availability": null, "comment": "REPLACED http://pathwaycommons.org/pc12/Protein_fe9aa3569b03eb22514796f21db8dea3", "displayName": "FRK", "name": "FRK__9606", "standardName": null, "cellularLocation": null, "feature": null, "memberPhysicalEntity": null, "notFeature": null, "entityReference": [ { "__class__": "EntityReference", "uri": "http://identifiers.org/uniprot/P42685" } ], "__class__": "Protein" }
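When a row holds several entities of the same class, indexing the row by label can be more convenient than testing each entity in a loop. The sketch below assumes that labels are unique within a row; `index_by_label` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for populated entities, exposing only the pk and meta_label attributes used above.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def index_by_label(row: Iterable) -> Dict[str, object]:
    """Map each pattern label to the entity carrying it within one result row."""
    return {entity.meta_label: entity for entity in row}


# Hypothetical row: two entities of the same class, told apart by their labels.
row = [
    SimpleNamespace(pk="urn:example:Protein_a", meta_label="P1"),
    SimpleNamespace(pk="urn:example:Protein_b", meta_label="P2"),
]

by_label = index_by_label(row)
print(by_label["P2"].pk)
```

If two entities in a row shared a label, the later one would overwrite the earlier one in the dictionary, so this shortcut only fits patterns whose nodes are labelled distinctly.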