
Exploring Oz Curriculum - Dev log 2#

See also: exploring-australian-curriculum

Tasks#

  • Download and import v9 of the Australian Curriculum
  • Identify components of v9 curriculum
  • Explore notions of ontology, RDF, and graph databases
  • What ontologies re: mathematical concepts exist?
  • Design how to link with memex

v9 curriculum - download and import#

Download#

Provided in HTML, RDF/XML, JSON-LD, and via a SPARQL endpoint. The HTML version is just a page with a description and a link to the RDF.

Organised into the following categories each with individual downloads for separate values:

  • Learning Areas
  • General Capability
  • Cross-curriculum priorities

Questions

  1. Which version to use: RDF or JSON-LD?

    RDF appears to be the core version and Python has RDFLib. sqlite-utils can import JSON, but doesn't appear to have anything explicit for JSON-LD, suggesting a bit more work, which may be required for RDF anyway.

  2. How to get those into SQLite?

  3. Other options for querying/interacting with the data?
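On the SQLite question, one low-effort option is a single three-column table of triples, rather than one table per vocabulary term. The following is a minimal sketch using only the standard library. The table name `triples`, the helper names, and the `example.org` subjects are made up for illustration; real rows would come from whatever parses the downloaded file.

```python
import sqlite3

ASN = "http://purl.org/ASN/schema/core/"

# Made-up sample triples; real ones would come from parsing the downloaded file.
SAMPLE_TRIPLES = [
    ("http://example.org/item/1", ASN + "statementLabel", "Content Description"),
    ("http://example.org/item/1", ASN + "statementNotation", "AC9M7N01"),
    ("http://example.org/item/2", ASN + "statementLabel", "Elaboration"),
]


def load_triples(conn, triples):
    """Create the (hypothetical) triples table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples "
        "(subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()


def objects_for(conn, predicate):
    """Return every object stored for the given predicate, sorted."""
    rows = conn.execute(
        "SELECT object FROM triples WHERE predicate = ? ORDER BY object",
        (predicate,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
load_triples(conn, SAMPLE_TRIPLES)
labels = objects_for(conn, ASN + "statementLabel")
print(labels)
```

This keeps everything queryable with plain SQL, at the cost of pushing all the curriculum structure into the queries.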

Experiment with Mathematics learning area#

Download RDF and JSON files

RDF initial explorations not great. Wondering if JSON LD might be more recent/accessible.

Going to have to learn more about rdf

Explore use of rdf-to-sqlite

rdf-to-sqlite - Python module to convert an RDF file into SQLite database tables (inspired by datasette related work)

Does require identification of the RDF serialisation format. This suggests that RDF/XML is the most common and what is used by the given file

rdf-to-sqlite v9.db ../data/v9/MAT.rdf --format xml --context https://schema.org/docs/jsonldcontext.jsonld
Generated v9.db (33604 rows in 26 tables)

That's a lot of rows and tables. Suggesting a possible limitation. Illustrated in the following table. Not a lot of curriculum specific structure visible in the tables.

Part of the challenge will be disentangling the complexity of the vocabulary etc. and transforming that into something specific. Putting the context back into data, reducing the reusability. reusability-paradox

Digging deeper#

RDF triples consist of subject, predicate, object. A subject is unique and may have multiple predicates. Write a script that groups all the predicates for a specific subject.

Hash keyed on subject value, with members object and predicate
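A rough sketch of that hash, using plain tuples to stand in for triples so it runs without rdflib. The `example.org` subjects are made up; with rdflib the same loop should work over the triples yielded by iterating a parsed graph.

```python
from collections import defaultdict

ASN = "http://purl.org/ASN/schema/core/"

# Made-up sample triples (subject, predicate, object).
triples = [
    ("http://example.org/item/1", ASN + "statementLabel", "Content Description"),
    ("http://example.org/item/1", ASN + "statementNotation", "AC9M7N01"),
    ("http://example.org/item/2", ASN + "statementLabel", "Elaboration"),
]


def group_by_subject(triples):
    """Return {subject: [(predicate, object), ...]}, preserving input order."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[subject].append((predicate, obj))
    return dict(grouped)


grouped = group_by_subject(triples)
for subject, pairs in grouped.items():
    print(subject)
    for predicate, obj in pairs:
        print(f"    {predicate} -> {obj}")
```

A list of pairs is used instead of a nested dict because a subject may repeat the same predicate with different objects.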

ontospy#

ontospy - Python module with various options to convert RDF graph into representations. None appear to work well with the MAT.rdf file.

Starting again - 3 Dec#

  • Summarise RDF and Python
  • Learn how to use RDFLib

Summary#

rdf.py - is able to parse the MAT.rdf file using rdflib and do some basic exploration. No understanding on my part of what's happening.

Learning - early explorations#

Serialize a graph - turn this into the dumpGraph function

rdf-basics

The graph consists of

  • subject
  • predicate
  • object

Gephi Visualisation#

Apparently gephi will import/visualise RDF graphs. -- didn't work

construct{ ?x ?r ?y } where { ?x ?r ?y }

Mapping out content#

Manual digging in

Content description (e.g. label AC9M7N01)

  • found in tags <statementNotation> - which is a property
  • the value is in <dcterms:title>
  • dcterms:description also useful for expanded information
  • also <nominalYearLevel> grade level
  • <statementLabel> - has the value "Content Description"

Uses <hasChild> and <isChildOf> to form the hierarchy - not a standard for RDF?
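If those two properties do carry the hierarchy, a parent-to-children map can be built from either direction. A minimal sketch, assuming predicates have already been shortened to their local names and that the links form a tree with no cycles. The node ids are placeholders, not real curriculum items.

```python
from collections import defaultdict

# Placeholder triples; predicates are shortened to local names (an assumption).
triples = [
    ("root", "hasChild", "child-1"),
    ("root", "hasChild", "child-2"),
    ("child-1", "hasChild", "grandchild-1"),
    ("grandchild-1", "isChildOf", "child-1"),  # same link, stated from the child side
]


def build_children(triples):
    """Map each parent to its children, merging hasChild and isChildOf links."""
    children = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == "hasChild":
            parent, child = subject, obj
        elif predicate == "isChildOf":
            parent, child = obj, subject
        else:
            continue  # not a hierarchy predicate
        if child not in children[parent]:
            children[parent].append(child)
    return dict(children)


def render(node, children, depth=0):
    """Return the subtree under node as indented lines (assumes no cycles)."""
    lines = ["  " * depth + node]
    for child in children.get(node, []):
        lines.extend(render(child, children, depth + 1))
    return lines


children = build_children(triples)
tree_lines = render("root", children)
print("\n".join(tree_lines))
```

Printing the tree from a candidate root is one way to check which statementLabel type actually sits at the top.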

statementLabel values

  • 1 Learning Area

    1 learning area (maths) should be the parent of all?

  • 1 Subject

    1 subject (mathematics) should be the parent of all? What's the relationship between learning area and subject?

  • 11 Level

    The year level - perhaps the next parent level?

  • 11 Achievement Standard

    1 achievement standard per year (Prep to Year 10)

  • 185 Achievement Standard Component

  • 240 Content Description

    Individual content descriptors - should be children of??

  • 996 Elaboration

    Each content description may have multiple elaborations

  • 63 Strand

    That seems to be quite a lot - what is this?

In the RDF, statementLabel is a predicate whose object is the literal (e.g. "Content Description"). The actual curriculum item is the subject, identified by its unique identifier, and it in turn has other predicates.

rdflib.term.URIRef('http://vocabulary.curriculum.edu.au/MRAC/2023/07/LA/MAT/4353387d-39f5-4222-bf3b-357193bb0221') ----- predicate rdflib.term.URIRef('http://purl.org/ASN/schema/core/statementLabel') ----- object rdflib.term.Literal('Content Description', lang='en-au')
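Given triples shaped like the one above, counts per statementLabel value can be tallied by filtering on the predicate. This is a sketch over plain tuples with made-up subjects, not the script that produced the counts listed earlier.

```python
from collections import Counter

ASN = "http://purl.org/ASN/schema/core/"
STATEMENT_LABEL = ASN + "statementLabel"

# Made-up sample triples; only the predicate URI is taken from the real data.
triples = [
    ("http://example.org/item/1", STATEMENT_LABEL, "Content Description"),
    ("http://example.org/item/2", STATEMENT_LABEL, "Elaboration"),
    ("http://example.org/item/3", STATEMENT_LABEL, "Elaboration"),
    ("http://example.org/item/1", ASN + "statementNotation", "AC9M7N01"),
]


def count_labels(triples):
    """Count how many subjects carry each statementLabel value."""
    return Counter(obj for _, predicate, obj in triples if predicate == STATEMENT_LABEL)


label_counts = count_labels(triples)
for label, count in label_counts.most_common():
    print(count, label)
```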

each rdf:Description#

Seems related to a particular subject and then specifies a list of properties?

Each subject has a unique identifier (a long hyphenated hex number) and various common properties that belong to different "groupings" (ontologies???)

  • rdf

      • rdf:about - the identifier
      • rdf:type - the type of thing it is - can have multiple

        Question: What are the values for this?

  • Dublin Core

      • dcterms:title - the title of the thing
      • dcterms:educationLevel - using Oz Curriculum vocab link
      • dcterms:isPartOf -
      • dcterms:modified -

  • ?? unresolved

      • authorityStatus
      • skillEmbodied
      • statementNotation
      • statementLabel
      • nominalYearLevel

The groupings (may) belong to namespaces included at the beginning of the RDF file
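To check which grouping each property falls into, the properties of each rdf:Description can be bucketed by namespace URI. A minimal sketch with the standard library's ElementTree on a made-up RDF/XML fragment. The `asn` prefix, the sample values, and the helper names are assumptions, and the code only handles plain rdf:Description blocks with one block per subject.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up fragment in the general shape of an RDF/XML file; not taken from MAT.rdf.
SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:asn="http://purl.org/ASN/schema/core/">
  <rdf:Description rdf:about="http://example.org/item/1">
    <dcterms:title xml:lang="en-au">Example title</dcterms:title>
    <asn:statementNotation>AC9M7N01</asn:statementNotation>
    <asn:statementLabel xml:lang="en-au">Content Description</asn:statementLabel>
  </rdf:Description>
</rdf:RDF>
"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def describe(xml_text):
    """Return {subject: {namespace: [(property, value), ...]}}."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        by_namespace = {}
        for prop in desc:
            namespace, local = split_tag(prop.tag)
            # Prefer a linked resource; otherwise fall back to the literal text.
            value = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            by_namespace.setdefault(namespace, []).append((local, value))
        result[subject] = by_namespace
    return result


described = describe(SAMPLE)
for subject, groups in described.items():
    print(subject)
    for namespace, props in groups.items():
        print("   ", namespace, [name for name, _ in props])
```

Running something like this over the real file should show which namespace the "unresolved" properties actually come from.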

Namespace   Description
RDF         RDF vocabulary
RDFS        RDF Schema vocabulary
skos        Simple Knowledge Organization System
skosxl      SKOS eXtension for Labels
owl         Web Ontology Language
dc          Dublin Core
dcterms     Dublin Core Terms
xsd         XML Schema
tags        Tags Ontology
cycAnnot    Cyc Annotations
foaf        Friend of a Friend
csw         CSW Ontology
dbpedia     DBpedia
freebase    Freebase
opencyc     OpenCyc
cyc         Cyc
ctag        Common Tag