001 exploring oz dev log

001 - Exploring Australian Curriculum - Dev log#

Cleaning data to sqlite-utils and Datasette#

Following the cleaning data tutorial from DataSette to convert the Excel spreadsheet provided into a database. The spreadsheet contains multiple sheets.

Sheet/table	Description
CD Elb-table	Includes the details on the content descriptors - the core data
CD CCP tagging	Link content descriptors to cross-curriculum priorities
CD GC tagging	Link content descriptors to general capabilities
AS-table	The achievement standards
Copyright table	Copyright information, including note that Oz Curriculum is released under Creative Commons

VSCode - network of tools helpful

With the CSV files added to the repo, I was able to use VSCode to view the files. The plugins integrated into VSCode are helpful. The assemblage of tools being helpful, but not immediately available to others.

sqlite-utils to create database#

To create initial version of the database

sqlite-utils insert oz_curriculum.db content_descriptors \
  "../data/F-10 CD Elb-Table 1.csv" --csv -d

All the fields of the CSV become TEXT columns. sqlite-utils can be used to query and examine the table, but time to use it in Datasette.

sqlite-utils also offers ways to transform the data and the database.

Initial Datasette visualisation of Australian Curriculum (click to see larger)

Datasette and immediate exploration#

Datasette automatically identifies facets through which to examine the data through the lens of Learning Area, Pathway, Sequence, level, Topic, Depth Study, and Elective.

For example, learning area helps identify the 9 learning areas in the data and the number of rows for each. Each of these learning areas (e.g. Mathematics) can be drilled down. For example, drilling down on Mathematics and Datasette breaks down the 447 rows into year levels - year 7 having the most with 51!

Learning area	# rows
Languages 12,612
Humanities and Social Sciences 1,513
English	788
Science	698
The Arts	684
Technologies	468
Mathematics	447
Health and Physical Education	426
Work Studies	175

Datasette plugins add other visualisation options - e.g. map location data.

sqlite-utils for tidying up the database#

Exploration reveals some questions, more detail in the data and its structure

Should the content descriptors table be split further?#

Initially, I thought I wouldn't do this, but when I tried to make CdCode the primary key it failed, because...

Use Datasettes ability to run custom SQL and identify that some CdCodes are appear in multiple rows. Yes, because some content descriptors have multiple elaborations (ACADAM001)

The data appears to be v8.4 - not 9

Sadly, the Excel file (based on the content descriptor codes) is for the v8.4, not v9.

Ended up suggesting a need for a fair bit of splitting. Leading to a shell script to automate the process of combining (perhaps inefficiently) multiple sqlite-utils commands to produce a more correct database with the following schema.

CREATE TABLE "content_descriptors" (
   [id] INTEGER,
   [CdCode] TEXT PRIMARY KEY,
   [ContentDesc] TEXT
);
CREATE TABLE "elaborations" (
   [CdCode] TEXT REFERENCES [content_descriptors]([CdCode]),
   [Elaboration] TEXT
);
CREATE TABLE "learning_areas" (
   [LearningArea] TEXT,
   [Subject] TEXT,
   [Pathway] TEXT,
   [Sequence] TEXT,
   [Level] TEXT,
   [Strand] TEXT,
   [Substrand] TEXT,
   [Topic] TEXT,
   [Depth Study] TEXT,
   [Elective] TEXT,
   [CdCode] TEXT REFERENCES [content_descriptors]([CdCode])
);
CREATE TABLE "achievement_standards" (
   [LearningArea] TEXT,
   [Subject] TEXT,
   [Pathway] TEXT,
   [Sequence] TEXT,
   [Level] TEXT,
   [AchStd] TEXT
);
CREATE TABLE "general_capabilities" (
   [LearningArea] TEXT,
   [Subject] TEXT,
   [Pathway] TEXT,
   [Sequence] TEXT,
   [Level] TEXT,
   [Strand] TEXT,
   [Substrand] TEXT,
   [CdCode] TEXT REFERENCES [content_descriptors]([CdCode]),
   [ContentDesc] TEXT,
   [GC] TEXT,
   [Element] TEXT,
   [Subelement] TEXT
);
CREATE TABLE "cross_curriculum_priorities" (
   [LearningArea] TEXT,
   [Subject] TEXT,
   [Pathway] TEXT,
   [Sequence] TEXT,
   [Level] TEXT,
   [Strand] TEXT,
   [Substrand] TEXT,
   [CdCode] TEXT REFERENCES [content_descriptors]([CdCode]),
   [ContentDesc] TEXT
   [CCP] TEXT
);

There is a JSON API built into Datasette. Once deployed you could make calls. Appears to be table based.

There is a graphql plugin...nice. Install the plugin via pip, restart datasette and the interface is modified to include option to use GraphiQL to interact with

And there are various options for publishing and deploying and DataSette cloud is designed to make that easier.