Above is what I tweeted ... and then people asked for the link... So I had to write the recipe up in a blog post. I hope this answers the questions.
The result is a web user interface (and a REST endpoint) for running SPARQL queries against the SNOMED CT ontology.
The ingredients:
- SNOMED CT RF2 files - We have a license and receive the CD version.
- Java 8 <https://www.oracle.com/java/>
- Git <https://git-scm.com/downloads>
- Apache Maven <https://maven.apache.org/download.cgi>
- IHTSDO snomed-publish <https://github.com/IHTSDO/snomed-publish>
- Apache Jena & Fuseki <https://jena.apache.org/>
- Patience or fast machines ;-)
The machine:
- I used a VM with 2.8 GHz / 4 GB RAM - I guess 8 GB is better, but this worked for me
Directions:
1. Preparations
The IHTSDO snomed-publish package contains a component called the rdfs-exporter that can convert the RF2 format into triples. The triples file is needed to populate the triple store (TDB) that Fuseki uses.
- Make sure you have Java 8 installed
- Get your hands on the RF2 files from SNOMED CT (I used the "snapshot" version)
- Delta contains SNOMED CT changes since the last release
- Full contains full history of SNOMED CT since 2002
- Snapshot contains the latest version of SNOMED CT
- Download and install Git
- I like the command-line tools, but you could also go for a GUI client such as SourceTree <https://www.sourcetreeapp.com/>
- Download and install Maven
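Before you start building, it is worth a quick sanity check that all three tools are on your PATH (the exact version output will of course differ per machine):
> java -version
> git --version
> mvn -version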
2. Build IHTSDO snomed-publish
Some background information:
https://github.com/IHTSDO/snomed-publish/blob/master/config/README.md
https://github.com/IHTSDO/snomed-publish/tree/master/client/rdfs-export-main
Some packages don't compile, and I don't know why. We only need rdfs-export-main, but it depends on some of the other components. Luckily Maven has the "-fn" (fail never) option, which makes Maven continue instead of stopping when a module fails.
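If you have not cloned the repository yet, something like this should do (the URL is the one from the ingredients list):
> git clone https://github.com/IHTSDO/snomed-publish.git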
Open a cmd box and go to the snomed-publish folder and run:
> mvn clean install -fn
You will see a lot of output and some errors. Just ignore them.
Now go to the client/rdfs-export folder and run:
> mvn install
The result will be the rdfs-export.jar file in the client/rdfs-export-main/target folder.
3. Now we need to convert the RF2 files into triples
Go to the RF2 folder and run (fill in the path to the jar):
> java -Xms4000m -jar /rdfs-export.jar -c sct2_Concept_Snapshot_INT_20150731.txt -d sct2_Description_Snapshot_INT_20150731.txt -t sct2_Relationship_Snapshot_INT_20150731.txt -if RF2 -of N3 -o sct.n3
The output will look something like this:
4. Create the TDB triple store
The N3 file can supposedly be used directly by the Jena tdbloader, but in my case rdfparse still seemed necessary:
> rdfparse sct.n3 > sct-pure.n3
Create the triple database from the cleaned-up SCT N3 file:
> tdbloader --loc=d:\work\TDB sct-pure.n3
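Before firing up Fuseki you can check that the load actually worked with Jena's tdbquery tool, for example with a simple triple count (the count itself will depend on your release; this query is just an illustration):
> tdbquery --loc=d:\work\TDB "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"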
5. Start Fuseki
> fuseki-server --loc=d:\work\TDB /sct
N.B.
The default Shiro config of Fuseki prevents it from showing any datasets when it is accessed from anywhere other than localhost. Simply editing run/shiro.ini and changing "/$/** = localhostFilter" to "/$/** = anon" does the trick.
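For reference, the relevant bit of run/shiro.ini then ends up looking something like this (a sketch; the comments and surrounding lines in your file may differ):
[urls]
# was: /$/** = localhostFilter
/$/** = anon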
Now you can browse to localhost:3030 and you should see something like this:
Click on "query" and have fun sparql-ing SNOMED CT!
Some SPARQL examples:
Get properties and values of a specific class, "Procedure (procedure)":
PREFIX sct: <...>   # use the SNOMED CT namespace found in the generated sct.n3
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?property ?obj
WHERE {
  sct:71388002 ?property ?obj .
}
Check if a class is a kind of "Procedure (procedure)" regardless of path length.
SELECT ?property ?obj
WHERE {
  sct:250404007 ?property ?obj .
  sct:250404007 rdfs:subClassOf+ sct:71388002 .
}
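If you only want a yes/no answer instead of the properties, the same subclass check can also be written as an ASK query (using the same prefixes as above):
ASK {
  sct:250404007 rdfs:subClassOf+ sct:71388002 .
}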