Optimize the Queries & the Object-to-Triples Mapper

Several functions are available for querying datasets with patterns. This notebook shows how to:

  • Automatically populate a collection of object-oriented model instances.
  • Minimize memory usage and query time for large datasets.
  • Work with iterators.
Imports
In [1]:
from biopax_explorer.pattern.rack import Rack
from biopax_explorer.pattern.pattern import PatternExecutor, Pattern
from biopax_explorer.query import EntityNode
from biopax_explorer.biopax import *
import json
Database constants (switch the host between "db" and "localhost" if necessary)
In [2]:
datasetN = "netpath"
datasetP = "panther"
db = "http://db:3030" # with local triple store deployed with docker-compose
#db = "https://rdf-ds.genouest.org" # using an online default triple store



r = Rack()
peN = PatternExecutor(db,datasetN) # create a Pattern executor for a dataset
#peN.verbose() # display logs
peP = PatternExecutor(db,datasetP) # create a Pattern executor for a dataset
#peP.verbose() # display logs

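# Define a simple pattern: a Protein "P" whose displayName contains "FRK",
# connected through its entityReference property to an EntityReference "E".

def simplePattern():
  p = Pattern()
  prot = EntityNode("P", Protein())                       # node labelled "P"
  entityReference = EntityNode("E", EntityReference())    # node labelled "E"
  prot.connectedWith(entityReference, "entityReference")  # relation between the two nodes
  prot.whereAttribute("displayName", "FRK", "CONTAINS")   # filter on the protein name
  p.define(prot, entityReference)
  return p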
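The constants cell hard-codes the host name, which only resolves inside the docker-compose network. A minimal sketch, assuming you would rather pick the endpoint from the environment: the variable name `BIOPAX_DB_URL` and the helper `resolve_endpoint` are hypothetical (not part of biopax_explorer); the returned string would simply be passed to `PatternExecutor(db, dataset)` as in the cell above.

```python
import os
from typing import Mapping, Optional

# "db" resolves only inside the docker-compose network; from the host machine,
# "localhost" is the usual alternative (assuming port 3030 is published).
DEFAULT_ENDPOINT = "http://db:3030"


def resolve_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the triple store URL; the hypothetical BIOPAX_DB_URL variable overrides the default."""
    if env is None:
        env = os.environ
    return env.get("BIOPAX_DB_URL", DEFAULT_ENDPOINT)


# e.g. start the notebook with BIOPAX_DB_URL=http://localhost:3030 outside docker-compose
db = resolve_endpoint()
print(db)
```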

A Pattern can be executed to extract either entity references only, or all the attributes and relations of each entity

In [3]:
 

#p = r.inComplexWith() # a pattern from the rack
p=simplePattern()

by_reference=True: force the pattern result set to hold only references

This is the default mode in the rack, where each pattern has its own default setting (by_reference=True or False).

In [4]:
res_P2 = peN.executePattern(p,by_reference=True, max_count=3)

Extract entity references only

Because of the 'by_reference' mode, the results are not instances of the BIOPAX model classes but of the class 'PK', which has two attributes: pk, the primary key (URI), and cls, the class name.

The 'max_count' parameter is important when you design a new query Pattern: it gives a quick preview of the results by returning only the first rows.

The meta_label attribute lets you retrieve an entity by the label it was given within the Pattern.
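Because a by_reference preview only carries pk, cls and meta_label, it can be summarized cheaply before anything is populated. A minimal sketch, assuming only those three attributes: `summarize_preview` is a hypothetical helper, and `SimpleNamespace` objects stand in for `rdfobj.mapper.PK` so the snippet runs without a triple store (the URIs are copied from the output of the next cell).

```python
from collections import Counter
from types import SimpleNamespace


def summarize_preview(rows):
    """Count (meta_label, cls) pairs over all rows of a by_reference result."""
    return Counter((ref.meta_label, ref.cls) for row in rows for ref in row)


def ref(pk, cls, label):
    """Hypothetical stand-in for rdfobj.mapper.PK."""
    return SimpleNamespace(pk=pk, cls=cls, meta_label=label)


preview = [
    [ref("http://pathwaycommons.org/pc12/#Protein_fe9aa3569b03eb22514796f21db8dea3", "Protein", "P"),
     ref("http://identifiers.org/uniprot/P42685", "ProteinReference", "E")],
    [ref("http://pathwaycommons.org/pc12/#Protein_e327253efbb4440eb6664fc6e69627c4", "Protein", "P"),
     ref("http://identifiers.org/uniprot/P42685", "ProteinReference", "E")],
]
print(summarize_preview(preview))
```

With a live executor, the same helper could be applied directly to the list returned by `executePattern(..., by_reference=True)`.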

In [5]:
print("first  results (uris and class name only: for better memory management and faster results)")
 
for entitylist in res_P2:
  print("__row__")
  for entity in entitylist:
   print("uri:%s,  cls: %s, meta_label :%s, __class__:%s" %(entity.pk, entity.cls, entity.meta_label, entity.__class__))
 
first  results (uris and class name only: for better memory management and faster results)
__row__
uri:http://pathwaycommons.org/pc12/#Protein_fe9aa3569b03eb22514796f21db8dea3,  cls: Protein, meta_label :P, __class__:<class 'rdfobj.mapper.PK'>
uri:http://identifiers.org/uniprot/P42685,  cls: ProteinReference, meta_label :E, __class__:<class 'rdfobj.mapper.PK'>
__row__
uri:http://pathwaycommons.org/pc12/#Protein_e327253efbb4440eb6664fc6e69627c4,  cls: Protein, meta_label :P, __class__:<class 'rdfobj.mapper.PK'>
uri:http://identifiers.org/uniprot/P42685,  cls: ProteinReference, meta_label :E, __class__:<class 'rdfobj.mapper.PK'>

The executePattern function most often returns a list of lists of PK() instances, which hold only a URI (pk) and a class name (cls). To obtain fully populated object-oriented entities, use the fill([PK]) method, which populates the selected entities with their attribute values.
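Before calling fill, the references can be narrowed down to one meta_label and deduplicated: in the output above, the same ProteinReference (P42685) appears in every row. Whether fill deduplicates on its own is not stated here, so this is a precaution. A minimal sketch: `select_refs` is a hypothetical helper and `SimpleNamespace` objects stand in for `rdfobj.mapper.PK`; the fill call is left commented out because it needs a running triple store.

```python
from types import SimpleNamespace


def select_refs(rows, meta_label, cls=None):
    """Return the references carrying meta_label (and cls, if given), without duplicate URIs."""
    seen = set()
    selected = []
    for row in rows:
        for ref in row:
            if ref.meta_label != meta_label or (cls is not None and ref.cls != cls):
                continue
            if ref.pk not in seen:  # keep the first occurrence of each URI
                seen.add(ref.pk)
                selected.append(ref)
    return selected


# Hypothetical stand-ins for rdfobj.mapper.PK (only pk, cls and meta_label are used).
rows = [
    [SimpleNamespace(pk="http://pathwaycommons.org/pc12/#Protein_fe9aa3569b03eb22514796f21db8dea3", cls="Protein", meta_label="P"),
     SimpleNamespace(pk="http://identifiers.org/uniprot/P42685", cls="ProteinReference", meta_label="E")],
    [SimpleNamespace(pk="http://pathwaycommons.org/pc12/#Protein_e327253efbb4440eb6664fc6e69627c4", cls="Protein", meta_label="P"),
     SimpleNamespace(pk="http://identifiers.org/uniprot/P42685", cls="ProteinReference", meta_label="E")],
]
refs_E = select_refs(rows, "E", "ProteinReference")
print(len(refs_E))  # 1: the shared ProteinReference is kept once
# With a live executor, the selection would then be populated, e.g.:
# entities = peN.fill(refs_E, level=1)
```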

To extract fully populated entities, the level parameter defines how far the populated neighborhood extends:

level=1: only selected entities are included.
level=2: entities in direct relation with the selected entities are populated.
level=3: entities one relation further away are also populated (this option is costly).

When using fetchEntities, all entities defined in the pattern are returned in the same row.
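One way to picture the level parameter is a breadth-first expansion from the selected entities, where level=k reaches everything within k-1 relations. The sketch below only illustrates that reading on a toy graph built from relations visible in the JSON output of the next cell (dataSource, entityReference, xref, organism); it is not the library's implementation, and `neighborhood` and the shortened node names are hypothetical.

```python
from collections import deque


def neighborhood(graph, selected, level):
    """Return the nodes reached from `selected` within level-1 hops (breadth-first)."""
    depth = {node: 0 for node in selected}
    queue = deque(selected)
    while queue:
        node = queue.popleft()
        if depth[node] >= level - 1:
            continue  # maximum depth reached: do not expand this node
        for neighbour in graph.get(node, []):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return set(depth)


# Toy graph with shortened names instead of full URIs.
graph = {
    "Protein_fe9a": ["Provenance_0cfa", "P42685"],          # dataSource, entityReference
    "P42685": ["UnificationXref_P42685", "taxonomy_9606"],  # xref, organism
}
for lvl in (1, 2, 3):
    print(lvl, sorted(neighborhood(graph, ["Protein_fe9a"], lvl)))
```

Each extra level adds a whole frontier of related entities, which is consistent with the note that level=3 is costly on real datasets.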

In [6]:
p = simplePattern()
res_P2 = peN.fetchEntities(p, level=1, max_count=3)

print("first  results  (object oriented ). instances from the BIOPAX model")
print("#-------------------------------------------------------------") 
for entitylist in res_P2:
  print("__row__")
  for entity in entitylist:
   print("#------------------------")
   print("pk(uri):%s, class: %s, meta_label: %s" %(entity.pk, entity.__class__, entity.meta_label))
print("#-------------------------------------------------------------")
for entitylist in res_P2:
  print("__row__json")
  for entity in entitylist:
   print("#------------------------")
   print( entity.to_json())  # a method to display the entity details using json serialization


###############################################################
# Alternative: first extract entity references only
# (quick query and memory optimization)
resultref = peN.executePattern(p, by_reference=True, max_count=3)
reflist = []
for row in resultref:  # each row is a list of PK references
  for eref in row:
    if eref.cls == "Protein" and eref.meta_label == "P":
      reflist.append(eref)
# then populate only the selected entities
result = peN.fill(reflist, level=1)
first  results  (object oriented ). instances from the BIOPAX model
#-------------------------------------------------------------
__row__
#------------------------
pk(uri):http://pathwaycommons.org/pc12/#Protein_fe9aa3569b03eb22514796f21db8dea3, class: <class 'biopax.protein.Protein'>, meta_label: P
#------------------------
pk(uri):http://identifiers.org/uniprot/P42685, class: <class 'biopax.proteinreference.ProteinReference'>, meta_label: E
__row__
#------------------------
pk(uri):http://pathwaycommons.org/pc12/#Protein_e327253efbb4440eb6664fc6e69627c4, class: <class 'biopax.protein.Protein'>, meta_label: P
#------------------------
pk(uri):http://identifiers.org/uniprot/P42685, class: <class 'biopax.proteinreference.ProteinReference'>, meta_label: E
#-------------------------------------------------------------
__row__json
#------------------------
{
  "uri": "http://pathwaycommons.org/pc12/#Protein_fe9aa3569b03eb22514796f21db8dea3",
  "dataSource": {
    "__uri__": "http://pathwaycommons.org/pc12/#Provenance_0cfa90c3ddb627e2e7f3af3d5bd9497d",
    "comment": null,
    "xref": null,
    "displayName": null,
    "name": null,
    "standardName": null
  },
  "evidence": null,
  "xref": null,
  "availability": null,
  "comment": "REPLACED http://pathwaycommons.org/pc12/Protein_fe9aa3569b03eb22514796f21db8dea3",
  "displayName": "FRK",
  "name": "FRK__9606",
  "standardName": null,
  "cellularLocation": null,
  "feature": null,
  "memberPhysicalEntity": null,
  "notFeature": null,
  "entityReference": [
    {
      "__class__": "EntityReference",
      "uri": "http://identifiers.org/uniprot/P42685"
    }
  ],
  "__class__": "Protein"
}
#------------------------
{
  "uri": "http://identifiers.org/uniprot/P42685",
  "comment": "REPLACED http://www.phosphosite.org/phosphosite.owl#po_1963",
  "entityFeature": {
    "__uri__": "http://pathwaycommons.org/pc12/#ModificationFeature_d45be1a02e6f260d835e1d7ce7ab48c2",
    "comment": null,
    "evidence": null,
    "featureLocation": null,
    "featureLocationType": null,
    "memberFeature": null
  },
  "entityReferenceType": null,
  "evidence": null,
  "memberEntityReference": null,
  "xref": {
    "__class__": "Xref",
    "uri": "http://pathwaycommons.org/pc12/#UnificationXref_uniprot_knowledgebase_P42685"
  },
  "displayName": "FRK_HUMAN",
  "name": "RAK",
  "standardName": "Tyrosine-protein kinase FRK",
  "organism": {
    "__class__": "BioSource",
    "uri": "http://identifiers.org/taxonomy/9606"
  },
  "sequence": null,
  "__class__": "ProteinReference"
}
__row__json
#------------------------
{
  "uri": "http://pathwaycommons.org/pc12/#Protein_e327253efbb4440eb6664fc6e69627c4",
  "dataSource": {
    "__uri__": "http://pathwaycommons.org/pc12/#Provenance_0cfa90c3ddb627e2e7f3af3d5bd9497d",
    "comment": null,
    "xref": null,
    "displayName": null,
    "name": null,
    "standardName": null
  },
  "evidence": null,
  "xref": null,
  "availability": null,
  "comment": "REPLACED http://pathwaycommons.org/pc12/Protein_e327253efbb4440eb6664fc6e69627c4",
  "displayName": "FRK",
  "name": "FRK__Phosphorylation",
  "standardName": null,
  "cellularLocation": {
    "__class__": "CellularLocationVocabulary",
    "uri": "http://pathwaycommons.org/pc12/#CellularLocationVocabulary_dc68fffeee0259e0d3bd7a3f6d0cc067"
  },
  "feature": [
    {
      "__class__": "EntityFeature",
      "uri": "http://pathwaycommons.org/pc12/#ModificationFeature_1157fef5ecac6dc59c2edbc1d9370ddb"
    },
    {
      "__class__": "EntityFeature",
      "uri": "http://pathwaycommons.org/pc12/#ModificationFeature_3288185f8cb17b9ce299453361bb19b8"
    }
  ],
  "memberPhysicalEntity": null,
  "notFeature": null,
  "entityReference": [
    {
      "__class__": "EntityReference",
      "uri": "http://identifiers.org/uniprot/P42685"
    }
  ],
  "__class__": "Protein"
}
#------------------------
{
  "uri": "http://identifiers.org/uniprot/P42685",
  "comment": "REPLACED http://www.phosphosite.org/phosphosite.owl#po_1963",
  "entityFeature": {
    "__uri__": "http://pathwaycommons.org/pc12/#ModificationFeature_d45be1a02e6f260d835e1d7ce7ab48c2",
    "comment": null,
    "evidence": null,
    "featureLocation": null,
    "featureLocationType": null,
    "memberFeature": null
  },
  "entityReferenceType": null,
  "evidence": null,
  "memberEntityReference": null,
  "xref": {
    "__class__": "Xref",
    "uri": "http://pathwaycommons.org/pc12/#UnificationXref_uniprot_knowledgebase_P42685"
  },
  "displayName": "FRK_HUMAN",
  "name": "RAK",
  "standardName": "Tyrosine-protein kinase FRK",
  "organism": {
    "__class__": "BioSource",
    "uri": "http://identifiers.org/taxonomy/9606"
  },
  "sequence": null,
  "__class__": "ProteinReference"
}

Using meta_label to select the desired entities

This is particularly useful when a result row of a Pattern query contains several entities of the same class.
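When two nodes of a pattern share a class, indexing each row by meta_label makes the selection explicit. A minimal sketch, assuming meta_labels are unique within a row: `by_label` is a hypothetical helper, and the row below (two Protein nodes labelled "P1" and "P2", with placeholder URIs) is invented for illustration rather than taken from a real query.

```python
from types import SimpleNamespace


def by_label(row):
    """Index one result row by meta_label; labels are assumed unique within a row."""
    index = {}
    for entity in row:
        if entity.meta_label in index:
            raise ValueError(f"duplicate meta_label in row: {entity.meta_label}")
        index[entity.meta_label] = entity
    return index


# Hypothetical row from a pattern with two Protein nodes; the pks are placeholders.
row = [
    SimpleNamespace(pk="urn:example:protein-a", cls="Protein", meta_label="P1"),
    SimpleNamespace(pk="urn:example:protein-b", cls="Protein", meta_label="P2"),
]
entities = by_label(row)
print(entities["P2"].pk)  # the class alone could not tell the two proteins apart
```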

In [7]:
p = simplePattern()


res_P2 = peN.fetchEntities(p, level=1, max_count=1)
print("#----we can now process the related Protein 'P' selected within each row not only by its class name but as well by its meta_label--------------------")

for entitylist in res_P2:  
  print("__row__json")
  for entity in entitylist:
      if entity.meta_label in ["P"]:
        print( entity.to_json())  
#----we can now process the related Protein 'P' selected within each row not only by its class name but as well by its meta_label--------------------
__row__json
{
  "uri": "http://pathwaycommons.org/pc12/#Protein_fe9aa3569b03eb22514796f21db8dea3",
  "dataSource": {
    "__uri__": "http://pathwaycommons.org/pc12/#Provenance_0cfa90c3ddb627e2e7f3af3d5bd9497d",
    "comment": null,
    "xref": null,
    "displayName": null,
    "name": null,
    "standardName": null
  },
  "evidence": null,
  "xref": null,
  "availability": null,
  "comment": "REPLACED http://pathwaycommons.org/pc12/Protein_fe9aa3569b03eb22514796f21db8dea3",
  "displayName": "FRK",
  "name": "FRK__9606",
  "standardName": null,
  "cellularLocation": null,
  "feature": null,
  "memberPhysicalEntity": null,
  "notFeature": null,
  "entityReference": [
    {
      "__class__": "EntityReference",
      "uri": "http://identifiers.org/uniprot/P42685"
    }
  ],
  "__class__": "Protein"
}