Hello, Quarto

Markdown

Markdown is an easy to read and write text format:

It’s plain text so works well with version control
It can be rendered into HTML, PDF, and more
Learn more at: https://quarto.org/docs/authoring/

Code Cell

Here is a Python code cell.

Install the VS Code Python Extension to enable running this cell interactively.

import os
os.cpu_count()

Equation

Use LaTeX to write equations:

\[ \chi' = \sum_{i=1}^n k_i s_i^2 \]

Observable JS

You can insert Observable code snippets in your Quarto document.

Representing data

With Observable you can easily draw data representations, such as this histogram:

height = 250
numbers = Float64Array.from({length: 1000}, d3.randomNormal())
import {chart as histogram} with {numbers as data, height} from "@d3/histogram@261"

histogram

Manipulating data

First, load the data with the FileAttachment method. We’ll be using a dataset from the New Zealand’s official data agency, specifically the Occupied and unoccupaied dwellings census:

Correctly format your data!

Right now, integer columns are formatted as string, they should converted to a number. For the sake of simplicity, we’ll be applying the {typed: true} option when parsing the CSV files.

data = FileAttachment("../data/Data8278.csv").csv({typed: true})
// Print the data columns
data.columns

We can represent the data based on its columns. We can have a look a its data as a Table by invoking the Inputs.table method. Also, as Tables are view-compatible, we can store it in a viewof variable for further usage:

// Create the view-compatible object. This will automatically render it
viewof data_table = Inputs.table(data)

Calling the previous viewof object will show the info related to the selected rows in the table:

// Call the viewof object and select some rows from the previous table
data_table

Right now the available fields aren’t very descriptive, they are just IDs of some other lookup tables. We can use join operations to overwrite them.

First load in separate tables each lookup table:

dwell_rec_type_lookup = FileAttachment("../data/DimenLookupDwellRecType8278.csv").csv({typed: true})

// Load the DwellStatus lookup table
dwell_status_lookup = FileAttachment("../data/DimenLookupDwellStatus8278.csv").csv({typed: true})

// Load the Area lookup Table
area_lookup = FileAttachment("../data/DimenLookupArea8278.csv").csv({typed: true})

Check its contents:

Dwell Rec Type lookup:

viewof dwell_rec_type_lookup_table = Inputs.table(dwell_rec_type_lookup)

Dwell Status lookup:

viewof dwell_status_lookup_table = Inputs.table(dwell_status_lookup)

Area lookup:

viewof area_lookup_table = Inputs.table(area_lookup)

As explained in the official ObservableHQ page, we can use a more performant form of join:

// create the join function
function join(lookupTable, mainTable, lookupKey, mainKey, select) {
    var l = lookupTable.length,
        m = mainTable.length,
        lookupIndex = [],
        output = [];
    for (var i = 0; i < l; i++) { // loop through the lookupTable items
        var row = lookupTable[i];
        lookupIndex[row[lookupKey]] = row; // create an index for lookup table
    }

    for (var j = 0; j < m; j++) { // loop through the mainTable items
        var y = mainTable[j];
        var x = lookupIndex[y[mainKey]]; // get corresponding row from lookupTable
        output.push(select(y, x)); // select only the columns you need
    }
    return output;
}

Let’s try it with a simple example first, as shown in the page:

articles = [{
    "id": 1,
    "name": "vacuum cleaner",
    "weight": 9.9,
    "price": 89.9,
    "brand_id": 2
}, {
    "id": 2,
    "name": "washing machine",
    "weight": 540,
    "price": 230,
    "brand_id": 1
}, {
    "id": 3,
    "name": "hair dryer",
    "weight": 1.2,
    "price": 24.99,
    "brand_id": 2
}, {
    "id": 4,
    "name": "super fast laptop",
    "weight": 400,
    "price": 899.9,
    "brand_id": 3
}]

// Create a brands lookup table
brands = [{
    "id": 1,
    "name": "SuperKitchen"
}, {
    "id": 2,
    "name": "HomeSweetHome"
}]

// Call the join function
join(brands, articles, "id", "brand_id", function(article, brand) {
    return {
        id: article.id,
        name: article.name,
        weight: article.weight,
        price: article.price,
        brand: (brand !== undefined) ? brand.name : null
    };
})

Now that we know that the function works, we join each lookup table with the main data table one at a time:

first_join_table = join(dwell_rec_type_lookup, data, "Code", "DwellRecType", function(item, dwell_rec_type) {
    return {
        Year: item.Year,
        DwellRecType: item.DwellRecType,
        DwellRec: dwell_rec_type.Description,
        DwellStatus: item.DwellStatus,
        Area: item.Area,
        Count: item.Count
    };
})

// Join Dwell Status with first join
status_join_table = join(dwell_status_lookup, first_join_table, "Code", "DwellStatus", function(item, dwell_status) {
    return {
        Year: item.Year,
        DwellRecType: item.DwellRecType,
        DwellRec: item.DwellRec,
        DwellStatus: item.DwellStatus,
        DwellStatusDesc: dwell_status.Description,
        Area: item.Area,
        Count: item.Count
    };
})

// Join Area description with second join
area_join_table = join(area_lookup, status_join_table, "Code", "Area", function(item, area) {
    return {
        Year: item.Year,
        DwellRecType: item.DwellRecType,
        DwellRec: item.DwellRec,
        DwellStatus: item.DwellStatus,
        DwellStatusDesc: item.DwellStatusDesc,
        Area: item.Area,
        AreaDesc: area.Description,
        Count: item.Count
    };
})

With that done, we can show the resulting table:

Inputs.table(area_join_table)

Data needs to be further manipulated to be plottable. First we’ll start by getting the unique values of some of the columns:

unique_areas = [...new Set(area_join_table
    .flatMap((entry) => entry.AreaDesc))]

// Get the unique Dwell Recs
unique_dwell_recs = [...new Set(area_join_table
    .flatMap((entry) => entry.DwellRec))]

After that, define some inputs to filter what results will be shown in the graph:

Output cell disabled due to high list of areas

viewof area_select = Inputs.checkbox(
    unique_areas,
    {
        label: "Area:"
    }
)

// Define a checkbox input for the Dwell Recs
viewof dwell_rec_select = Inputs.checkbox(
    unique_dwell_recs,
    {
        label: "Dwell Rec:"
    }
)

// Write the filtering function
filtered = data.filter(function(entry) {
    return area_select.includes(entry.AreaDesc) && dwell_rec_select.includes(entry.DwellRec);
})

TODO: Do some more data engineering on data

Right now there’s no simple way to represent the data as is, some aggregation/grouping should be done before

Check official ObservableHQ documentation to finish this example