Add all posts.

author Frederik Vanrenterghem <frederik@vanrenterghem.biz>

Sat, 25 Nov 2023 12:15:44 +0000 (20:15 +0800)

committer Frederik Vanrenterghem <frederik@vanrenterghem.biz>

Sat, 25 Nov 2023 12:15:44 +0000 (20:15 +0800)
diff --git a/posts/AUC_and_economics_of_predictive_modelling.mdwn b/posts/AUC_and_economics_of_predictive_modelling.mdwn

new file mode 100644 (file)

index 0000000..268e048
--- /dev/null
+++ b/posts/AUC_and_economics_of_predictive_modelling.mdwn
@@ -0,0 +1,41 @@
+[[!meta date="2017-01-11 04:12:52 +1300"]]
+[[!tag R analysis economics forecasting]]
+The strength of a predictive, machine-learning model is often evaluated by quoting the area under the curve or AUC (or similarly the Gini coefficient). This AUC represents the area under the ROC curve, which shows the trade-off between false positives and true positives for different cutoff values. Cutoff values enable the use of a regression model for classification purposes, by marking the value below and above which either of the classifier values is predicted. Models with a higher AUC (or a higher Gini coefficient) are considered better.
+
+This oversimplifies the challenge facing any real-world model builder. The diagonal line from (0,0) to (1,1) is a first hint at that. Representing a model randomly guessing, this model with an AUC of .5 is effectively worth nothing. Now assume a model with the same AUC, but for a certain range of cutoffs its curve veers above the diagonal, and for another it veers below it.
+
+Such a model may very well have some practical use. This can be determined by introducing an **indifference line** to the ROC analysis. The upper-left area of the ROC space left by that line is where the model makes economic sense to use.
+
+The slope of the line (s) is defined mathematically as follows:
+
+slope s = (ratio negative * (utility TN - utility FP)) / (ratio positive * (utility TP - utility FN))
+
+This with *ratio negative* the base rate of negative outcomes, *utility TN* the economic value of identifying a true negative, and so on.
+
+Many such lines can be drawn on any square space - the left-most one crossing either (0,0) or (1,1) is the one we care about.
+
+This line represents combinations of true positive rates and false positive rates that have the same utility to the user. In the event of equal classes and equal utilities, this line is the diagonal of the random model.
+
+[[!img pics/indifference-line.png size="300x300" alt="ROC space plot with indifference line."]]
+
+An optimal and viable cutoff is the point where the left-most line parallel to the indifference line is tangent to the ROC curve.
+
+The code to create a graphic like the one above is shown below. Of note is the call to `coord_fixed`, which ensures the plot is actually a square as intended.
+
+[[!format r """
+library(ggplot2)
+library(dplyr)
+r.p <- nrow(filter(dataset, y == 'Positive')) / nrow(dataset)
+r.n <- 1- r.p
+uFP <- -10 
+uFN <- -2
+uTP <- 20
+uTN <- 0
+s <- (r.n * (uTN - uFP)) / (r.p * (uTP - uFN)) # equals .4 
+ROC.plot + # start from a previous plot with the ROC space
+  coord_fixed() + # Fix aspect ratio - allows to convert slope to angle and also better for plotted data
+  geom_abline(intercept = ifelse(s < 1, 1-s, 0), slope = s, colour = "blue") + 
+  annotate("text", x = 0.05, y = ifelse(s < 1, 1 - s -.01, 0), angle = atan(s) * 180/pi, label = "Indifference line", hjust = 0, colour = "blue")
+"""]]
+
+[Reference article](http://fulltext.study/article/3468080/A-principled-approach-to-setting-optimal-diagnostic-thresholds-where-ROC-and-indifference-curves-meet)
diff --git a/posts/AUC_and_economics_of_predictive_modelling.org b/posts/AUC_and_economics_of_predictive_modelling.org

new file mode 100644 (file)

index 0000000..720360a
--- /dev/null
+++ b/posts/AUC_and_economics_of_predictive_modelling.org
@@ -0,0 +1,44 @@
+#+date: <2017-01-11 04:12:52 +1300>
+#+filetags: R analysis economics forecasting
+
+The strength of a predictive, machine-learning model is often evaluated by quoting the area under the curve or AUC (or similarly the Gini coefficient). This AUC represents the area under the ROC curve, which shows the trade-off between false positives and true positives for different cutoff values. Cutoff values enable the use of a regression model for classification purposes, by marking the value below and above which either of the classifier values is predicted. Models with a higher AUC (or a higher Gini coefficient) are considered better.
+
+This oversimplifies the challenge facing any real-world model builder. The diagonal line from (0,0) to (1,1) is a first hint at that. Representing a model randomly guessing, this model with an AUC of .5 is effectively worth nothing. Now assume a model with the same AUC, but for a certain range of cutoffs its curve veers above the diagonal, and for another it veers below it.
+
+Such a model may very well have some practical use. This can be determined by introducing an **indifference line** to the ROC analysis. The upper-left area of the ROC space left by that line is where the model makes economic sense to use.
+
+The slope of the line (s) is defined mathematically as follows:
+
+slope s = (ratio negative * (utility TN - utility FP)) / (ratio positive * (utility TP - utility FN))
+
+This with *ratio negative* the base rate of negative outcomes, *utility TN* the economic value of identifying a true negative, and so on.
+
+Many such lines can be drawn on any square space - the left-most one crossing either (0,0) or (1,1) is the one we care about.
+
+This line represents combinations of true positive rates and false positive rates that have the same utility to the user. In the event of equal classes and equal utilities, this line is the diagonal of the random model.
+
+#+caption: ROC space plot with indifference line.
+#+attr_html: :width 300
+[[file:assets/indifference-line.png]]
+
+An optimal and viable cutoff is the point where the left-most line parallel to the indifference line is tangent to the ROC curve.
+
+The code to create a graphic like the one above is shown below. Of note is the call to `coord_fixed`, which ensures the plot is actually a square as intended.
+
+#+begin_src R
+library(ggplot2)
+library(dplyr)
+r.p <- nrow(filter(dataset, y == 'Positive')) / nrow(dataset)
+r.n <- 1- r.p
+uFP <- -10 
+uFN <- -2
+uTP <- 20
+uTN <- 0
+s <- (r.n * (uTN - uFP)) / (r.p * (uTP - uFN)) # equals .4 
+ROC.plot + # start from a previous plot with the ROC space
+  coord_fixed() + # Fix aspect ratio - allows to convert slope to angle and also better for plotted data
+  geom_abline(intercept = ifelse(s < 1, 1-s, 0), slope = s, colour = "blue") + 
+  annotate("text", x = 0.05, y = ifelse(s < 1, 1 - s -.01, 0), angle = atan(s) * 180/pi, label = "Indifference line", hjust = 0, colour = "blue")
+#+end_src
+
+[[http://fulltext.study/article/3468080/A-principled-approach-to-setting-optimal-diagnostic-thresholds-where-ROC-and-indifference-curves-meet][Reference article]]
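The slope formula in the AUC post above can be checked numerically: two points on any line of slope `s` in ROC space should have the same expected utility. The following is a minimal sketch in base R, not part of the commit. It assumes a hypothetical base rate `r.p` of 0.5 (the post derives it from its own `dataset`), reuses the post's utility values, and introduces the illustrative helper names `indifference_slope` and `expected_utility`.

```r
# Utilities as used in the post; the base rate r.p is a hypothetical value.
r.p <- 0.5
uFP <- -10
uFN <- -2
uTP <- 20
uTN <- 0

# Slope of the indifference line, as defined in the post.
indifference_slope <- function(r.p, uTP, uFN, uTN, uFP) {
  r.n <- 1 - r.p
  (r.n * (uTN - uFP)) / (r.p * (uTP - uFN))
}

# Expected utility per scored case at an operating point (fpr, tpr).
expected_utility <- function(fpr, tpr, r.p, uTP, uFN, uTN, uFP) {
  r.n <- 1 - r.p
  r.p * (tpr * uTP + (1 - tpr) * uFN) +
    r.n * (fpr * uFP + (1 - fpr) * uTN)
}

s <- indifference_slope(r.p, uTP, uFN, uTN, uFP)

# Two operating points on one line of slope s through (0.2, 0.3).
fpr <- c(0.2, 0.6)
tpr <- 0.3 + s * (fpr - 0.2)
u <- expected_utility(fpr, tpr, r.p, uTP, uFN, uTN, uFP)
print(u)
```

Both entries of `u` coincide, which is what makes the line an indifference line; moving up and to the left of it raises the expected utility.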
diff --git a/posts/Bluetooth.mdwn b/posts/Bluetooth.mdwn

new file mode 100644 (file)

index 0000000..227b013
--- /dev/null
+++ b/posts/Bluetooth.mdwn
@@ -0,0 +1,2 @@
+[[!meta date="2015-03-27 01:24:28 +1300"]]
+Wondering which bluetooth chipsets have free firmware. Debian non-free contains Broadcom firmware, but are more ndiswrapper tricks needed for others?
diff --git a/posts/Bluetooth.org b/posts/Bluetooth.org

new file mode 100644 (file)

index 0000000..6e08502
--- /dev/null
+++ b/posts/Bluetooth.org
@@ -0,0 +1,6 @@
+#+date: 2015-03-27 01:24:28 +1300
+#+title: Bluetooth
+#+filetags: muzings
+
+Wondering which bluetooth chipsets have free firmware. Debian non-free contains Broadcom firmware,
+but are more ndiswrapper tricks needed for others?
diff --git a/posts/Bring_Back_Blogging.mdwn b/posts/Bring_Back_Blogging.mdwn

new file mode 100644 (file)

index 0000000..73894c9
--- /dev/null
+++ b/posts/Bring_Back_Blogging.mdwn
@@ -0,0 +1,11 @@
+[[!meta date="2023-01-02 19:35:28 +0800"]]
+[[!opengraph2 ogimage="https://indieweb.org/File:indiewebcamp-logo-lockup-color@3x.png"]]
+[[!tag indieweb blogging open_web]]
+
+After setting a personal objective late last year of reaching ['level 2' on the indieweb](https://indiewebify.me/send-webmentions/), with the ability to send [WebMentions](http://webmention.org/), I stumbled on [Bring Back Blogging](https://bringback.blog/) yesterday. I love how some people are dedicated to keeping a decentralised internet available and of relevance.
+
+Over the past month, I created an [OpenGraph plugin for IkiWiki](http://git.vanrenterghem.biz/git.ikiwiki.info.git/blob/6c546c8f3182668c6d21d578b789674894f18c39:/IkiWiki/Plugin/opengraph2.pm) which allows adding customized [Open Graph](http://ogp.me) tags to posts on my blog. This in combination with standard IkiWiki blog support seems equivalent to an [h-entry](http://microformats.org/wiki/h-entry), which was a level 1 feature.
+
+I've also added an [h-card](https://microformats.org/wiki/h-card) to the [About](https://www.vanrenterghem.biz/About/index.shtml) section of this site, thereby completing the first part of the level 2 requirements.
+
+Yet to do are the WebMentions. There were some [conversations about that on the IkiWiki discussion forum](https://ikiwiki.info/todo/pingback_support/) years ago, but no code was checked in to complete the feature. At its simplest, it appears to be a matter of including another site's blog post in yours and sending a ping to that site to inform it about this. Having the ability to receive them seems quite a bit harder.
diff --git a/posts/Bring_Back_Blogging.org b/posts/Bring_Back_Blogging.org

new file mode 100644 (file)

index 0000000..b44ca84
--- /dev/null
+++ b/posts/Bring_Back_Blogging.org
@@ -0,0 +1,11 @@
+#+date: <2023-01-02 19:35:28 +0800>
+#+opengraph2: ogimage="https://indieweb.org/File:indiewebcamp-logo-lockup-color@3x.png"
+#+filetags: indieweb blogging open_web
+
+After setting a personal objective late last year of reaching [[https://indiewebify.me/send-webmentions/]['level 2' on the indieweb]], with the ability to send [[http://webmention.org/][Webmentions]], I stumbled on [[https://bringback.blog/][Bring Back Blogging]] yesterday. I love how some people are dedicated to keeping a decentralised internet available and of relevance.
+
+Over the past month, I created an [[http://git.vanrenterghem.biz/git.ikiwiki.info.git/blob/6c546c8f3182668c6d21d578b789674894f18c39:/IkiWiki/Plugin/opengraph2.pm][OpenGraph plugin for IkiWiki]] which allows adding customized [[http://ogp.me][Open Graph]] tags to posts on my blog. This in combination with standard IkiWiki blog support seems equivalent to an [[http://microformats.org/wiki/h-entry][h-entry]], which was a level 1 feature.
+
+I've also added an [[https://microformats.org/wiki/h-card][h-card]] to the [[https://www.vanrenterghem.biz/About/index.shtml][About]] section of this site, thereby completing the first part of the level 2 requirements.
+
+Yet to do are the WebMentions. There were some [[https://ikiwiki.info/todo/pingback_support/][conversations about that on the IkiWiki discussion forum]] years ago, but no code was checked in to complete the feature. At its simplest, it appears to be a matter of including another site's blog post in yours and sending a ping to that site to inform it about this. Having the ability to receive them seems quite a bit harder.
diff --git a/posts/Debian_on_A20.mdwn b/posts/Debian_on_A20.mdwn

new file mode 100644 (file)

index 0000000..0c6ac02
--- /dev/null
+++ b/posts/Debian_on_A20.mdwn
@@ -0,0 +1,3 @@
+[[!meta date="2015-03-17 04:08:26 +1300"]]
+[Installing Debian on A20 development board](http://www.vanrenterghem.biz/Linux/Installing_Debian_on_Lime2.shtml)
+
diff --git a/posts/Debian_on_A20.org b/posts/Debian_on_A20.org

new file mode 100644 (file)

index 0000000..269ec6a
--- /dev/null
+++ b/posts/Debian_on_A20.org
@@ -0,0 +1,5 @@
+#+date: 2015-03-17 04:08:26 +1300
+#+title: Debian on A20
+
+[[http://www.vanrenterghem.biz/Linux/Installing_Debian_on_Lime2.shtml][Installing
+Debian on A20 development board]]
diff --git a/posts/Fearless_analysis.mdwn b/posts/Fearless_analysis.mdwn

new file mode 100644 (file)

index 0000000..5c30303
--- /dev/null
+++ b/posts/Fearless_analysis.mdwn
@@ -0,0 +1,8 @@
+[[!meta date="2015-08-22 01:47:57 +1200"]]
+['There is no silver bullet': Isis, al-Qaida and the myths of terrorism](http://www.theguardian.com/world/2015/aug/19/isis-al-qaida-myths-terrorism-war-mistakes-9-11)
+
+> The west’s response to 9/11 was the catastrophic ‘war on terror’. Have we learned from our mistakes with
+> al-Qaida, or is history repeating itself with Isis?
+
+As Boyd said: Observe, orient, decide, act. Seems that second step is forgotten these days.
+
diff --git a/posts/Fearless_analysis.org b/posts/Fearless_analysis.org

new file mode 100644 (file)

index 0000000..b20e244
--- /dev/null
+++ b/posts/Fearless_analysis.org
@@ -0,0 +1,12 @@
+#+date: <2015-08-22 01:47:57 +1200>
+#+title: Fearless analysis
+
+[[http://www.theguardian.com/world/2015/aug/19/isis-al-qaida-myths-terrorism-war-mistakes-9-11]['There is no silver bullet': Isis, al-Qaida and the myths of terrorism]]
+
+#+BEGIN_QUOTE
+The west’s response to 9/11 was the catastrophic ‘war on terror’. Have we learned from our mistakes with
+al-Qaida, or is history repeating itself with Isis?
+#+END_QUOTE
+
+As Boyd said: Observe, orient, decide, act. Seems that second step is forgotten these days.
+
diff --git a/posts/FedEx_marries_TNT.mdwn b/posts/FedEx_marries_TNT.mdwn

new file mode 100644 (file)

index 0000000..05a6670
--- /dev/null
+++ b/posts/FedEx_marries_TNT.mdwn
@@ -0,0 +1,7 @@
+[[!meta date="2015-04-07 20:05:21 +1200"]]
+Well, that one has been in the works for a while! Buying a solid European network for two-thirds of what UPS was willing to pay for it a few years ago sounds like a great deal.
+
+[FedEx to buy TNT to expand Europe deliveries](http://www.reuters.com/article/2015/04/07/us-tnt-express-m-a-fedex-idUSKBN0MY06G20150407)
+
+> AMSTERDAM (Reuters) - FedEx Corp <FDX.N> is seeking to buy Dutch package delivery firm TNT Express 
+> for an agreed 4.4 billion euros ($4.8 billion), aiming to succeed where United Parcel Service <UPS.N>
diff --git a/posts/FedEx_marries_TNT.org b/posts/FedEx_marries_TNT.org

new file mode 100644 (file)

index 0000000..8a01e5a
--- /dev/null
+++ b/posts/FedEx_marries_TNT.org
@@ -0,0 +1,15 @@
+#+date: 2015-04-07 20:05:21 +1200
+#+title: FedEx marries TNT.
+
+Well, that one has been in the works for a while! Buying a solid European network for two-thirds of
+what UPS was willing to pay for it a few years ago sounds like a great deal.
+
+[[http://www.reuters.com/article/2015/04/07/us-tnt-express-m-a-fedex-idUSKBN0MY06G20150407][FedEx
+to buy TNT to expand Europe deliveries]]
+
+#+begin_quote
+AMSTERDAM (Reuters) - FedEx Corp <FDX.N> is seeking to buy Dutch package
+delivery firm TNT Express for an agreed 4.4 billion euros ($4.8
+billion), aiming to succeed where United Parcel Service <UPS.N>
+
+#+end_quote
diff --git a/posts/Fibonacci_golden_spiral.mdwn b/posts/Fibonacci_golden_spiral.mdwn

new file mode 100644 (file)

index 0000000..3aecd5d
--- /dev/null
+++ b/posts/Fibonacci_golden_spiral.mdwn
@@ -0,0 +1,81 @@
+[[!meta date="2019-09-16 22:03:03 +0800"]]\r
+[[!tag mathematics R visualisation]]\r
+\r
+# What\r
+\r
+After having read the first part of a Rcpp tutorial which compared native R vs C++ implementations of a Fibonacci sequence generator, I resorted to drawing the so-called Golden Spiral using R.\r
+\r
+\r
+# Details\r
+\r
+Libraries used in this example are the following\r
+\r
+    library(ggplot2)\r
+    library(plotrix)\r
+\r
+In polar coordinates, this special instance of a logarithmic spiral's functional representation can be simplified to r(t) = e<sup>(0.30635\*t)</sup>\r
+For every quarter turn, the corresponding point on the spiral is a factor of phi further from the origin (r is this distance), with phi the golden ratio - the same one obtained from dividing any 2 sufficiently big successive numbers on a Fibonacci sequence, which is how the golden ratio, the golden spiral and Fibonacci sequences are linked concepts!\r
+\r
+    polar_golden_spiral <- function(theta) exp(0.30635*theta)\r
+\r
+Let's do 2 full circles. First, I create a sequence of angle values theta. Since 2 \* PI is the equivalent of a circle in polar coordinates, we need to have distances from origin for values between 0 and 4 \* PI.\r
+\r
+    seq_theta <- seq(0,4*pi,by=0.05)\r
+    \r
+    dist_from_origin <- sapply(seq_theta,polar_golden_spiral)\r
+\r
+Plotting the function using `coord_polar` in ggplot2 does not work as intended. Unexpectedly, the x axis keeps extending instead of circling back once a full circle is reached. Turns out `coord_polar` might not really be intended to plot elements in polar vector format.\r
+\r
+    ggplot(data.frame(x = seq_theta, y = dist_from_origin), aes(x,y)) +\r
+        geom_point() +\r
+        coord_polar(theta="x")\r
+\r
+[[failed attempt plotting golden spiral|/pics/golden_spiral-coord_polar-fail.png]]\r
+\r
+To ensure what I was trying to do is possible, I employ a specialised plot function instead\r
+\r
+    plotrix::radial.plot(dist_from_origin, seq_theta,rp.type="s", point.col = "blue")\r
+\r
+[[Plotrix golden spiral|/pics/golden_spiral-plotrix.png]]\r
+\r
+With that established and the original objective of the exercise achieved, it still would be nice to be able to accomplish this using ggplot2. To do so, the created sequence above needs to be converted to cartesian coordinates.\r
+The rectangular function equivalent of the golden spiral function r(t) defined above is a(t) = (r(t) cos(t), r(t) sin(t))\r
+It's not too hard to come up with a hack to convert one to the other.\r
+\r
+    cartesian_golden_spiral <- function(theta) {\r
+        a <- polar_golden_spiral(theta)*cos(theta)\r
+        b <- polar_golden_spiral(theta)*sin(theta)\r
+        c(a,b)\r
+    }\r
+\r
+Next, I apply that function to the same series of angles from above and stitch the resulting coordinates into a data frame. Note that I'm enclosing the first expression in parentheses, which prints it immediately; this is useful when the script is run interactively.\r
+\r
+    (serie <- sapply(seq_theta,cartesian_golden_spiral))\r
+    df <- data.frame(t(serie))\r
+\r
+\r
+# Result\r
+\r
+With everything now ready in the right coordinate system, it's now only a matter of setting some options to make the output look acceptable.\r
+\r
+    ggplot(df, aes(x=X1,y=X2)) +\r
+        geom_path(color="blue") +\r
+        theme(panel.grid.minor = element_blank(),\r
+         axis.text.x = element_blank(),\r
+         axis.text.y = element_blank()) +\r
+        scale_y_continuous(breaks = seq(-20,20,by=10)) +\r
+        scale_x_continuous(breaks = seq(-20,50,by=10)) +\r
+        coord_fixed() +\r
+        labs(title = "Golden spiral",\r
+        subtitle = "Another view on the Fibonacci sequence",\r
+        caption = "Maths from https://www.intmath.com/blog/mathematics/golden-spiral-6512\nCode errors mine.",\r
+        x = "",\r
+        y = "")\r
+\r
+[[ggplot2 version of Golden Spiral|/pics/golden_spiral-ggplot-coord-fixed.png]]\r
+\r
+\r
+# Note on how this post was written.\r
+\r
+After a long hiatus, I set about using emacs, org-mode and ESS together to create this post. All code is part of an .org file, and gets exported to markdown using the orgmode conversion - C-c C-e m m.\r
+\r
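The constant 0.30635 used in `polar_golden_spiral` in the post above follows from its own description: the radius grows by a factor of phi per quarter turn, so the growth rate is log(phi) / (pi / 2). The following is a minimal sketch in base R, not part of the commit, that checks this and the link to Fibonacci ratios. The names `b`, `fib` and `fib_ratio` are introduced here for illustration only.

```r
# Golden ratio.
phi <- (1 + sqrt(5)) / 2

# Growth rate such that r(theta + pi/2) = phi * r(theta).
b <- log(phi) / (pi / 2)
print(round(b, 5))

polar_golden_spiral <- function(theta) exp(b * theta)

# Ratio of distances from the origin one quarter turn apart.
ratio <- polar_golden_spiral(pi / 2) / polar_golden_spiral(0)

# Ratio of two successive, sufficiently large Fibonacci numbers.
fib <- c(1, 1)
for (i in 3:30) fib[i] <- fib[i - 1] + fib[i - 2]
fib_ratio <- fib[30] / fib[29]

print(c(phi = phi, quarter_turn = ratio, fibonacci = fib_ratio))
```

All three printed values agree to many decimal places, which is the relationship between the golden ratio, the golden spiral and the Fibonacci sequence that the post describes.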
diff --git a/posts/Fibonacci_golden_spiral.org b/posts/Fibonacci_golden_spiral.org

new file mode 100644 (file)

index 0000000..c4eeed8
--- /dev/null
+++ b/posts/Fibonacci_golden_spiral.org
@@ -0,0 +1,359 @@
+#+TITLE: Creating a golden spiral in R\r
+#+AUTHOR: Frederik Vanrenterghem\r
+#+LANGUAGE: en\r
+#+PROPERTY: session *R* \r
+#+PROPERTY: cache yes \r
+#+PROPERTY: results graphics \r
+#+PROPERTY: exports both \r
+#+PROPERTY: tangle yes \r
+#+OPTIONS: toc:nil  \r
+#+date: <2019-09-16 22:03:03>\r
+\r
+* What \r
+After having read the first part of a Rcpp tutorial which compared native R vs C++ implementations of a Fibonacci sequence generator, I resorted to drawing the so-called Golden Spiral using R.\r
+\r
+* Details\r
+Libraries used in this example are the following\r
+\r
+#+BEGIN_SRC R :session R\r
+library(ggplot2)\r
+library(plotrix)\r
+#+END_SRC\r
+\r
+#+RESULTS:\r
+| plotrix   |\r
+| ggplot2   |\r
+| stats     |\r
+| graphics  |\r
+| grDevices |\r
+| utils     |\r
+| datasets  |\r
+| methods   |\r
+| base      |\r
+\r
+In polar coordinates, this special instance of a logarithmic spiral's functional representation can be simplified to r(t) = e^(0.30635*t)\r
+For every quarter turn, the corresponding point on the spiral is a factor of phi further from the origin (r is this distance), with phi the golden ratio - the same one obtained from dividing any 2 sufficiently big successive numbers on a Fibonacci sequence, which is how the golden ratio, the golden spiral and Fibonacci sequences are linked concepts!\r
+#+BEGIN_SRC R :session R\r
+polar_golden_spiral <- function(theta) exp(0.30635*theta)\r
+#+END_SRC\r
+\r
+#+RESULTS:\r
+\r
+Let's do 2 full circles. First, I create a sequence of angle values theta. Since 2 * PI is the equivalent of a circle in polar coordinates, we need to have distances from origin for values between 0 and 4 * PI.\r
+#+BEGIN_SRC R :session R\r
+seq_theta <- seq(0,4*pi,by=0.05)\r
+\r
+dist_from_origin <- sapply(seq_theta,polar_golden_spiral)\r
+#+END_SRC\r
+\r
+#+RESULTS:\r
+|                1 |\r
+| 1.01543541418402 |\r
+| 1.03110908037907 |\r
+| 1.04702467610363 |\r
+| 1.06318593564018 |\r
+| 1.07959665091141 |\r
+| 1.09626067236991 |\r
+| 1.11318190990159 |\r
+| 1.13036433374308 |\r
+| 1.14781197541325 |\r
+| 1.16552892865913 |\r
+| 1.18351935041644 |\r
+| 1.20178746178492 |\r
+| 1.22033754901874 |\r
+| 1.23917396453215 |\r
+| 1.25830112792076 |\r
+| 1.27772352699844 |\r
+| 1.29744571885033 |\r
+| 1.31747233090207 |\r
+| 1.33780806200553 |\r
+| 1.35845768354131 |\r
+| 1.37942604053823 |\r
+| 1.40071805281016 |\r
+| 1.42233871611032 |\r
+| 1.44429310330345 |\r
+| 1.46658636555607 |\r
+| 1.48922373354506 |\r
+|   1.512210518685 |\r
+| 1.53555211437434 |\r
+| 1.55925399726085 |\r
+| 1.58332172852666 |\r
+| 1.60776095519303 |\r
+| 1.63257741144533 |\r
+| 1.65777691997847 |\r
+| 1.68336539336305 |\r
+| 1.70934883543265 |\r
+| 1.73573334269253 |\r
+|    1.76252510575 |\r
+| 1.78973041076699 |\r
+| 1.81735564093491 |\r
+| 1.84540727797241 |\r
+| 1.87389190364612 |\r
+| 1.90281620131498 |\r
+| 1.93218695749834 |\r
+| 1.96201106346829 |\r
+| 1.99229551686655 |\r
+| 2.02304742334636 |\r
+| 2.05427399823962 |\r
+| 2.08598256824992 |\r
+|  2.1181805731715 |\r
+| 2.15087556763495 |\r
+| 2.18407522287968 |\r
+| 2.21778732855389 |\r
+| 2.25201979454219 |\r
+| 2.28678065282156 |\r
+| 2.32207805934587 |\r
+|  2.3579202959595 |\r
+| 2.39431577234054 |\r
+| 2.43127302797395 |\r
+| 2.46880073415517 |\r
+| 2.50690769602466 |\r
+| 2.54560285463391 |\r
+| 2.58489528904321 |\r
+| 2.62479421845192 |\r
+| 2.66530900436155 |\r
+| 2.70644915277227 |\r
+|  2.7482243164133 |\r
+| 2.79064429700773 |\r
+| 2.83371904757232 |\r
+| 2.87745867475275 |\r
+| 2.92187344119496 |\r
+|  2.9669737679531 |\r
+| 3.01277023693458 |\r
+| 3.05927359338295 |\r
+| 3.10649474839905 |\r
+| 3.15444478150108 |\r
+| 3.20313494322417 |\r
+| 3.25257665776014 |\r
+| 3.30278152563795 |\r
+|  3.3537613264455 |\r
+| 3.40552802159354 |\r
+| 3.45809375712212 |\r
+| 3.51147086655048 |\r
+| 3.56567187377081 |\r
+| 3.62070949598677 |\r
+| 3.67659664669734 |\r
+|  3.7333464387267 |\r
+| 3.79097218730088 |\r
+| 3.84948741317197 |\r
+| 3.90890584579046 |\r
+| 3.96924142652657 |\r
+| 4.03050831194138 |\r
+| 4.09272087710833 |\r
+| 4.15589371898609 |\r
+| 4.22004165984341 |\r
+| 4.28517975073691 |\r
+| 4.35132327504251 |\r
+| 4.41848775204137 |\r
+| 4.48668894056115 |\r
+| 4.55594284267357 |\r
+| 4.62626570744896 |\r
+| 4.69767403476877 |\r
+| 4.77018457919694 |\r
+| 4.84381435391107 |\r
+| 4.91858063469419 |\r
+|  4.9945009639882 |\r
+| 5.07159315500985 |\r
+| 5.14987529593027 |\r
+| 5.22936575411901 |\r
+| 5.31008318045357 |\r
+| 5.39204651369547 |\r
+| 5.47527498493387 |\r
+| 5.55978812209773 |\r
+|  5.6456057545377 |\r
+| 5.73274801767868 |\r
+| 5.82123535774417 |\r
+| 5.91108853655362 |\r
+| 6.00232863639374 |\r
+| 6.09497706496509 |\r
+| 6.18905556040493 |\r
+| 6.28458619638769 |\r
+| 6.38159138730412 |\r
+| 6.48009389352033 |\r
+| 6.58011682671816 |\r
+|  6.6816836553178 |\r
+| 6.78481820998423 |\r
+| 6.88954468921862 |\r
+| 6.99588766503603 |\r
+| 7.10387208873074 |\r
+|  7.2135232967306 |\r
+| 7.32486701654172 |\r
+| 7.43792937278492 |\r
+| 7.55273689332534 |\r
+| 7.66931651549675 |\r
+| 7.78769559242179 |\r
+| 7.90790189942989 |\r
+|  8.0299636405742 |\r
+| 8.15390945524908 |\r
+| 8.27976842490986 |\r
+| 8.40757007989611 |\r
+| 8.53734440636049 |\r
+|  8.6691218533043 |\r
+| 8.80293333972179 |\r
+| 8.93881026185472 |\r
+| 9.07678450055882 |\r
+| 9.21688842878404 |\r
+| 9.35915491917023 |\r
+| 9.50361735176004 |\r
+|  9.6503096218309 |\r
+| 9.79926614784789 |\r
+| 9.95052187953938 |\r
+| 10.1041123060972 |\r
+| 10.2600734645037 |\r
+| 10.4184419479868 |\r
+| 10.5792549146061 |\r
+| 10.7425500959714 |\r
+| 10.9083658060953 |\r
+| 11.0767409503832 |\r
+| 11.2477150347615 |\r
+| 11.4213281749469 |\r
+| 11.5976211058588 |\r
+| 11.7766351911771 |\r
+|  11.958412433047 |\r
+| 12.1429954819344 |\r
+| 12.3304276466328 |\r
+| 12.5207529044246 |\r
+| 12.7140159114002 |\r
+| 12.9102620129349 |\r
+| 13.1095372543288 |\r
+| 13.3118883916102 |\r
+| 13.5173629025061 |\r
+|  13.726008997582 |\r
+| 13.9378756315533 |\r
+| 14.1530125147717 |\r
+| 14.3714701248888 |\r
+| 14.5932997186998 |\r
+| 14.8185533441694 |\r
+| 15.0472838526447 |\r
+| 15.2795449112548 |\r
+| 15.5153910155034 |\r
+| 15.7548775020547 |\r
+| 15.9980605617174 |\r
+| 16.2449972526286 |\r
+| 16.4957455136412 |\r
+| 16.7503641779184 |\r
+| 17.0089129867378 |\r
+|  17.271452603508 |\r
+| 17.5380446280028 |\r
+| 17.8087516108139 |\r
+| 18.0836370680272 |\r
+| 18.3627654961257 |\r
+| 18.6462023871224 |\r
+| 18.9340142439267 |\r
+| 19.2262685959479 |\r
+| 19.5230340149396 |\r
+| 19.8243801310889 |\r
+| 20.1303776493537 |\r
+| 20.4410983660522 |\r
+| 20.7566151857085 |\r
+| 21.0770021381583 |\r
+| 21.4023343959182 |\r
+| 21.7326882918241 |\r
+| 22.0681413369407 |\r
+| 22.4087722387478 |\r
+| 22.7546609196083 |\r
+| 23.1058885355194 |\r
+|  23.462537495155 |\r
+| 23.8246914792008 |\r
+| 24.1924354599887 |\r
+| 24.5658557214339 |\r
+| 24.9450398792791 |\r
+| 25.3300769016527 |\r
+| 25.7210571299428 |\r
+| 26.1180722999943 |\r
+| 26.5212155636329 |\r
+| 26.9305815105213 |\r
+| 27.3462661903527 |\r
+| 27.7683671353873 |\r
+| 28.1969833833359 |\r
+| 28.6322155005976 |\r
+| 29.0741656058555 |\r
+| 29.5229373940367 |\r
+| 29.9786361606425 |\r
+| 30.4413688264541 |\r
+|  30.911243962619 |\r
+| 31.3883718161253 |\r
+| 31.8728643356692 |\r
+| 32.3648351979214 |\r
+| 32.8643998341988 |\r
+|  33.371675457549 |\r
+| 33.8867810902509 |\r
+| 34.4098375917422 |\r
+| 34.9409676869756 |\r
+| 35.4802959952146 |\r
+| 36.0279490592724 |\r
+|  36.584055375203 |\r
+| 37.1487454224504 |\r
+| 37.7221516944627 |\r
+| 38.3044087297792 |\r
+| 38.8956531435973 |\r
+| 39.4960236598267 |\r
+|  40.105661143638 |\r
+| 40.7247086345141 |\r
+| 41.3533113798114 |\r
+| 41.9916168688395 |\r
+| 42.6397748674667 |\r
+| 43.2979374532595 |\r
+| 43.9662590511644 |\r
+|  44.644896469741 |\r
+| 45.3340089379542 |\r
+| 46.0337581425336 |\r
+| 46.7443082659106 |\r
+\r
+Plotting the function using coord_polar in ggplot2 does not work as intended. Unexpectedly, the x axis keeps extending instead of circling back once a full circle is reached. Turns out coord_polar might not really be intended to plot elements in polar vector format.\r
+#+BEGIN_SRC R :session R :results output graphics :file ../pics/golden_spiral-coord_polar-fail.png :exports both\r
+ggplot(data.frame(x = seq_theta, y = dist_from_origin), aes(x,y)) +\r
+    geom_point() +\r
+    coord_polar(theta="x")\r
+#+END_SRC\r
+\r
+#+RESULTS:\r
+[[file:../pics/golden_spiral-coord_polar-fail.png]]\r
+\r
+To ensure what I was trying to do is possible, I employ a specialised plot function instead\r
+#+BEGIN_SRC R :session R :results output graphics :file ../pics/golden_spiral-plotrix.png :exports both\r
+plotrix::radial.plot(dist_from_origin, seq_theta,rp.type="s", point.col = "blue")\r
+#+END_SRC\r
+\r
+With that established and the original objective of the exercise achieved, it still would be nice to be able to accomplish this using ggplot2. To do so, the created sequence above needs to be converted to cartesian coordinates.\r
+The rectangular function equivalent of the golden spiral function r(t) defined above is a(t) = (r(t) cos(t), r(t) sin(t))\r
+It's not too hard to come up with a hack to convert one to the other.\r
+#+BEGIN_SRC R :session R\r
+cartesian_golden_spiral <- function(theta) {\r
+    a <- polar_golden_spiral(theta)*cos(theta)\r
+    b <- polar_golden_spiral(theta)*sin(theta)\r
+    c(a,b)\r
+}\r
+#+END_SRC\r
+\r
+#+RESULTS:\r
+\r
+Next, I apply that function to the same series of angles from above and stitch the resulting coordinates into a data frame. Note that I'm enclosing the first expression in parentheses, which prints it immediately; this is useful when the script is run interactively.\r
+#+BEGIN_SRC R :session R :exports code\r
+(serie <- sapply(seq_theta,cartesian_golden_spiral))\r
+df <- data.frame(t(serie))\r
+#+END_SRC\r
+\r
+#+RESULTS:\r
+: TRUE\r
+\r
+* Result\r
+With everything now ready in the right coordinate system, it's now only a matter of setting some options to make the output look acceptable.\r
+#+BEGIN_SRC R :session R :results output graphics :file ../pics/golden_spiral-ggplot-coord-fixed.png :width 800 :height 800 :exports both\r
+ggplot(df, aes(x=X1,y=X2)) +\r
+    geom_path(color="blue") +\r
+    theme(panel.grid.minor = element_blank(),\r
+          axis.text.x = element_blank(),\r
+          axis.text.y = element_blank()) +\r
+    scale_y_continuous(breaks = seq(-20,20,by=10)) +\r
+    scale_x_continuous(breaks = seq(-20,50,by=10)) +\r
+    coord_fixed() +\r
+    labs(title = "Golden spiral",\r
+         subtitle = "Another view on the Fibonacci sequence",\r
+         caption = "Maths from https://www.intmath.com/blog/mathematics/golden-spiral-6512\nCode errors mine.",\r
+         x = "",\r
+         y = "")\r
+#+END_SRC\r
+\r
+* Note on how this post was written.\r
+After a long hiatus, I set about using emacs, org-mode and ESS together to create this post. All code is part of an .org file, and gets exported to markdown using the orgmode conversion - C-c C-e m m.\r
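The polar-to-cartesian conversion in the post above goes through `sapply` and a transpose. Because `exp`, `cos` and `sin` are vectorised in R, the same data frame can be built directly. The following is a minimal sketch in base R, not part of the commit, comparing the two. The names `df_vec` and `df_post` are introduced here for illustration; the column names `X1` and `X2` are chosen to match what `data.frame(t(serie))` produces.

```r
polar_golden_spiral <- function(theta) exp(0.30635 * theta)
seq_theta <- seq(0, 4 * pi, by = 0.05)

# Vectorised alternative: compute all radii at once, then both coordinates.
r <- polar_golden_spiral(seq_theta)
df_vec <- data.frame(X1 = r * cos(seq_theta), X2 = r * sin(seq_theta))

# The post's approach, reproduced for comparison.
cartesian_golden_spiral <- function(theta) {
  a <- polar_golden_spiral(theta) * cos(theta)
  b <- polar_golden_spiral(theta) * sin(theta)
  c(a, b)
}
df_post <- data.frame(t(sapply(seq_theta, cartesian_golden_spiral)))

print(all.equal(df_vec, df_post))
print(nrow(df_vec))
```

Either data frame can be passed to the final `ggplot` call unchanged, since both expose the columns `X1` and `X2`.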
diff --git a/posts/Half_Time_Oranges.mdwn b/posts/Half_Time_Oranges.mdwn

new file mode 100644 (file)

index 0000000..b00adda
--- /dev/null
+++ b/posts/Half_Time_Oranges.mdwn
@@ -0,0 +1,8 @@
+[[!meta date="2023-08-25 21:36:00 +0800"]]
+[[!opengraph2 ogimage="https://www.vanrenterghem.biz/blog/pics/half-time-oranges.png"]]
+[[!tag Australia]]
+
+Moving countries means learning about a new culture. Even after 10 years in Australia, I am still discovering quirky things people raised here don't think twice about. One of these is the tradition of cutting up oranges for kids playing sports to eat during their half-time. It's not just any fruit - it simply always is oranges. It doesn't have to be oranges, except that it does, as this is just what people expect. Such a lovely tradition! Tomorrow is already the last competition day of netball in WA for the 2023 season, and my first as the parent responsible for bringing the oranges.
+
+[[!img /pics/half-time-oranges.png alt = "Bag of oranges labelled half-time oranges" class = "img-fluid"]]
+
diff --git a/posts/Half_Time_Oranges.org b/posts/Half_Time_Oranges.org

new file mode 100644 (file)

index 0000000..99b33a2
--- /dev/null
+++ b/posts/Half_Time_Oranges.org
@@ -0,0 +1,11 @@
+#+date: <2023-08-25 21:36:00 +0800>
+#+opengraph2: ogimage="https://www.vanrenterghem.biz/blog/pics/half-time-oranges.png"
+#+filetags: Australia
+#+title: Half Time Oranges
+
+Moving countries means learning about a new culture. Even after 10 years in Australia, I am still discovering quirky things people raised here don't think twice about. One of these is the tradition of cutting up oranges for kids playing sports to eat during their half-time. It's not just any fruit - it is simply always oranges. It doesn't have to be oranges, except that it does, as this is just what people expect. Such a lovely tradition! Tomorrow is already the last competition day of netball in WA for the 2023 season, and my first as the parent responsible for bringing the oranges.
+
+#+caption: Bag of oranges labelled half-time oranges
+#+attr_html: :class img-fluid :alt Bag of oranges labelled half-time oranges
+[[file:assets/half-time-oranges.png]]
+
diff --git a/posts/I_bought_a_balance_board.mdwn b/posts/I_bought_a_balance_board.mdwn

new file mode 100644 (file)

index 0000000..7035080
--- /dev/null
+++ b/posts/I_bought_a_balance_board.mdwn
@@ -0,0 +1,14 @@
+[[!meta date="2023-05-27 16:44:00 +0800"]]
+[[!opengraph2 ogimage="https://www.vanrenterghem.biz/blog/pics/balance-board.png"]]
+[[!tag boardriding]]
+
+With summer having come to an end over here in Western Australia, the wind-filled afternoons are also behind us for a few months. While the prevailing easterlies that now reign make for glassy ocean conditions, they're generally not strong enough for wing surfing. Most autumn days don't bring enough swell to go out surfing around the area where I live either. This prompted me to try out a new activity - balance boarding.
+
+I bought the board online from a Sunshine Coast family business, [Barefoot & Salty](https://barefootandsalty.com.au). It came with 2 cork rollers, 5cm and 8cm in diameter. They were really nice to deal with, and included a handwritten thank-you note in the shipment. A lovely touch.
+
+Just a few days in, I can already see a lot of progress, and it's proving to be a really fun activity for which I don't even need to leave the house.
+
+Keep on riding :)
+
+[[!img /pics/balance-board.png alt="Barefoot & Salty XL Surf balance board" class="img-fluid"]]
+
diff --git a/posts/I_bought_a_balance_board.org b/posts/I_bought_a_balance_board.org

new file mode 100644 (file)

index 0000000..27c34af
--- /dev/null
+++ b/posts/I_bought_a_balance_board.org
@@ -0,0 +1,27 @@
+#+date: 2023-05-27 16:44:00 +0800
+#+opengraph2: ogimage="https://www.vanrenterghem.biz/blog/pics/balance-board.png"
+#+filetags: boardriding
+#+title: I bought a balance board.
+
+With summer having come to an end over here in Western Australia, the
+wind-filled afternoons are also behind us for a few months. While the
+prevailing easterlies that now reign make for glassy ocean conditions,
+they're generally not strong enough for wing surfing. Most autumn days
+don't bring enough swell to go out surfing around the area where I live
+either. This prompted me to try out a new activity - balance boarding.
+
+I bought the board online from a Sunshine Coast family business,
+[[https://barefootandsalty.com.au][Barefoot & Salty]]. It came with 2
+cork rollers, 5cm and 8cm in diameter. They were really nice to deal
+with, and included a handwritten thank-you note in the shipment. A
+lovely touch.
+
+Just a few days in, I can already see a lot of progress, and it's
+proving to be a really fun activity for which I don't even need to leave
+the house.
+
+Keep on riding :)
+
+#+CAPTION: Barefoot & Salty XL Surf balance board
+#+ATTR_HTML: :class img-fluid :alt Barefoot & Salty XL Surf balance board
+[[file:assets/balance-board.png]]
diff --git a/posts/Implementing_Webmention_on_my_blog.mdwn b/posts/Implementing_Webmention_on_my_blog.mdwn

new file mode 100644 (file)

index 0000000..d7f9048
--- /dev/null
+++ b/posts/Implementing_Webmention_on_my_blog.mdwn
@@ -0,0 +1,84 @@
+[[!meta date="2023-05-14 20:32:00 +0800"]]
+[[!opengraph2 ogimage="https://indieweb.org/File:indiewebcamp-logo-lockup-color@3x.png"]]
+[[!tag indieweb blogging open_web]]
+
+Following on from my last [[post on joining the indieweb|Bring_Back_Blogging]]...
+
+Back in February, I implemented Webmentions on my website. I took a roll-my-own approach, borrowing from [an idea by superkuh](http://superkuh.com/blog/2020-01-10-1.html). It's a semi-automated solution which listens for webmentions using nginx. When (if) one is received, an email is generated that tells me about this, allowing me to validate it's a genuine comment.
+
+Technically, nginx logs the body of POST requests in its logfile.
+
+In the main configuration file `/etc/nginx/nginx.conf`, I've added
+
+[[!format  sh """
+# Defined for Webmention logging support of www.vanrenterghem.biz
+log_format postdata '$time_local,$remote_addr,"$http_user_agent",$request_body';
+"""]]
+
+In the configuration for www.vanrenterghem.biz, the following lines enable logging webmention requests:
+
+[[!format  sh """
+# use proxying to self to get the HTTP post variables.
+    # https://stackoverflow.com/questions/4939382/logging-post-data-from-request-body
+    location = /webmention {
+       limit_req zone=webmention;
+       client_max_body_size 7k;
+       if ($request_method = POST) {
+               access_log /var/log/nginx/postdata.log postdata;
+               proxy_pass $scheme://www.vanrenterghem.biz/logsink;
+               break;
+       }
+       return 204 $scheme://$host/serviceup.html;
+    }
+    location /logsink {
+       #return 200;
+       # use 204 instead of 200 so no 0 byte file is sent to browsers from HTTP forms.
+       return 204;
+    }
+"""]]
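
To check the endpoint by hand, a Webmention request could be simulated with curl. This is a minimal sketch and not part of the original setup: `send_webmention` is a hypothetical helper name, and the source and target URLs in the example call are placeholders. Per the Webmention specification, the request is a form-encoded POST carrying a `source` and a `target` parameter. With the configuration above, a POST should be answered with a 204 and its body should show up in `/var/log/nginx/postdata.log`.

```sh
#!/bin/sh
# Hypothetical helper: POST a Webmention (form-encoded source and target)
# to the given endpoint and print the HTTP status code that came back.
send_webmention() {
    endpoint=$1
    source_url=$2
    target_url=$3
    curl --silent --output /dev/null --write-out '%{http_code}\n' \
        --data-urlencode "source=${source_url}" \
        --data-urlencode "target=${target_url}" \
        "$endpoint"
}

# Example call (placeholder URLs):
# send_webmention https://www.vanrenterghem.biz/webmention \
#     https://example.org/a-post-linking-here \
#     https://www.vanrenterghem.biz/blog/
```

Printing only the status code keeps the check easy to script; dropping `--output /dev/null` would show the (empty) response body as well.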
+
+Before the `server` section in there, I'm reducing the risk of abuse by rate limiting requests:
+
+[[!format  sh """
+limit_req_zone  $server_name  zone=webmention:1m   rate=2r/s;
+"""]]
+
+The logfile is being monitored by a service calling a shell script:
+
+[[!format sh """
+#!/bin/sh
+# Service starts on boot in /etc/systemd/system/webmention.service
+TO=my@email.address
+WEBMENTION_LOG=/var/log/nginx/postdata.log
+inotifywait -s -m $WEBMENTION_LOG --format "%e %w%f" --event modify|
+while read CHANGED;
+do
+    echo "$CHANGED"
+    logger "Webmention received"
+    tail -n1 $WEBMENTION_LOG | /usr/bin/mail -a'From: niihau webmention service <webmaster@email.address>' -s 'Webmention received' $TO
+done
+"""]]
+
+This uses `inotifywait`, which is part of [inotify-tools](https://github.com/inotify-tools/inotify-tools). Unfortunately, `logrotate` will remove the log file on a regular basis, which is done in 3 steps. The first 2 steps each result in a MODIFY event, before a DELETE event is created. That results in 2 emails being sent every time logs are rotated if using the above script. I've not tried to ignore these yet - checking for `logrotate` running at the time an event is triggered could be a solution.
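
The `logrotate` check mentioned above could look roughly like this. It is a rough sketch under a few assumptions: `pgrep` (from procps) is installed, `is_running` is a hypothetical helper name, and `logrotate` is still running by the time the loop gets around to handling the event, which is not guaranteed.

```sh
#!/bin/sh
# Hypothetical helper: succeeds when a process with exactly this name exists.
# Assumes pgrep is installed.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# In the loop of the script above, the mail step could then be guarded:
#
#   if is_running logrotate; then
#       logger "Log rotation in progress, ignoring event"
#   else
#       tail -n1 "$WEBMENTION_LOG" | /usr/bin/mail ... "$TO"
#   fi
```

Because the events are read from a pipe, there is a window in which `logrotate` may already have exited, so this would reduce the spurious emails rather than rule them out.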
+
+The systemd service is defined in `/etc/systemd/system/webmention.service`:
+
+[[!format  sh """
+[Unit]
+Description=Service to monitor nginx log for webmentions
+After=multi-user.target
+
+[Service]
+ExecStart=/root/webmention_service.sh
+
+[Install]
+WantedBy=multi-user.target
+"""]]
+
+Announcing I'm accepting webmentions is as simple as putting the endpoint in the header of the blog:
+
+[[!format  html """
+<link rel="webmention" href="https://www.vanrenterghem.biz/webmention">
+"""]]
+
+Clearly federating conversation as the final level of joining the indieweb is quite a bit more complicated than achieving 'level 2' status on [indiewebify.me](https://indiewebify.me/).
diff --git a/posts/Implementing_Webmention_on_my_blog.org b/posts/Implementing_Webmention_on_my_blog.org

new file mode 100644 (file)

index 0000000..991c9b3
--- /dev/null
+++ b/posts/Implementing_Webmention_on_my_blog.org
@@ -0,0 +1,85 @@
+#+date: 2023-05-14 20:32:00 +0800
+#+opengraph2: ogimage="https://indieweb.org/File:indiewebcamp-logo-lockup-color@3x.png"
+#+filetags: indieweb blogging open_web
+#+title: Implementing Webmention on my blog
+
+Following on from my last [[Bring_Back_Blogging][post on joining the indieweb]]...
+
+Back in February, I implemented Webmentions on my website. I took a roll-my-own approach, borrowing from [[http://superkuh.com/blog/2020-01-10-1.html][an idea by superkuh]]. It's a semi-automated solution which listens for webmentions using nginx. When (if) one is received, an email is generated that tells me about this, allowing me to validate it's a genuine comment.
+
+Technically, nginx logs the body of POST requests in its logfile.
+
+In the main configuration file =/etc/nginx/nginx.conf=, I've added
+
+#+BEGIN_SRC sh
+# Defined for Webmention logging support of www.vanrenterghem.biz
+log_format postdata '$time_local,$remote_addr,"$http_user_agent",$request_body';
+#+END_SRC
+
+In the configuration for www.vanrenterghem.biz, the following lines enable logging webmention requests:
+
+#+BEGIN_SRC sh
+# use proxying to self to get the HTTP post variables.
+    # https://stackoverflow.com/questions/4939382/logging-post-data-from-request-body
+    location = /webmention {
+       limit_req zone=webmention;
+       client_max_body_size 7k;
+       if ($request_method = POST) {
+               access_log /var/log/nginx/postdata.log postdata;
+               proxy_pass $scheme://www.vanrenterghem.biz/logsink;
+               break;
+       }
+       return 204 $scheme://$host/serviceup.html;
+    }
+    location /logsink {
+       #return 200;
+       # use 204 instead of 200 so no 0 byte file is sent to browsers from HTTP forms.
+       return 204;
+    }
+#+END_SRC
+
+Before the =server= section in there, I'm reducing the risk of abuse by rate limiting requests:
+
+#+BEGIN_SRC sh
+limit_req_zone  $server_name  zone=webmention:1m   rate=2r/s;
+#+END_SRC
+
+The logfile is being monitored by a service calling a shell script:
+
+#+BEGIN_SRC sh
+#!/bin/sh
+# Service starts on boot in /etc/systemd/system/webmention.service
+TO=my@email.address
+WEBMENTION_LOG=/var/log/nginx/postdata.log
+inotifywait -s -m $WEBMENTION_LOG --format "%e %w%f" --event modify|
+while read CHANGED;
+do
+    echo "$CHANGED"
+    logger "Webmention received"
+    tail -n1 $WEBMENTION_LOG | /usr/bin/mail -a'From: niihau webmention service <webmaster@email.address>' -s 'Webmention received' $TO
+done
+#+END_SRC
+
+This uses =inotifywait=, which is part of [[https://github.com/inotify-tools/inotify-tools][inotify-tools]]. Unfortunately, =logrotate= will remove the log file on a regular basis, which is done in 3 steps. The first 2 steps each result in a MODIFY event, before a DELETE event is created. That results in 2 emails being sent every time logs are rotated if using the above script. I've not tried to ignore these yet - checking for =logrotate= running at the time an event is triggered could be a solution.
+
+The systemd service is defined in =/etc/systemd/system/webmention.service=:
+
+#+BEGIN_SRC sh
+[Unit]
+Description=Service to monitor nginx log for webmentions
+After=multi-user.target
+
+[Service]
+ExecStart=/root/webmention_service.sh
+
+[Install]
+WantedBy=multi-user.target
+#+END_SRC
+
+Announcing I'm accepting webmentions is as simple as putting the endpoint in the header of the blog:
+
+#+BEGIN_SRC html
+<link rel="webmention" href="https://www.vanrenterghem.biz/webmention">
+#+END_SRC
+
+Clearly federating conversation as the final level of joining the indieweb is quite a bit more complicated than achieving 'level 2' status on [[https://indiewebify.me/][indiewebify.me]].
diff --git a/posts/In_the_pines.mdwn b/posts/In_the_pines.mdwn

new file mode 100644 (file)

index 0000000..6a9aa45
--- /dev/null
+++ b/posts/In_the_pines.mdwn
@@ -0,0 +1,2 @@
+[[!meta date="2015-11-09 02:20:04 +1300"]]
+Unplugged in New York. Sidetracked - The Triffids. Back to start in one hop. `#inthepines`
diff --git a/posts/In_the_pines.org b/posts/In_the_pines.org

new file mode 100644 (file)

index 0000000..70969f7
--- /dev/null
+++ b/posts/In_the_pines.org
@@ -0,0 +1,7 @@
+#+date: 2015-11-09 02:20:04 +1300
+#+title: In the pines
+#+filetags: musings
+
+Unplugged in New York.
+Sidetracked - The Triffids. 
+Back to start in one hop. =#inthepines=
diff --git a/posts/Innovate_WA_2023.mdwn b/posts/Innovate_WA_2023.mdwn

new file mode 100644 (file)

index 0000000..aa81615
--- /dev/null
+++ b/posts/Innovate_WA_2023.mdwn
@@ -0,0 +1,17 @@
+[[!meta date="2023-02-15 17:04:00 +0800"]]
+
+[[!tag management innovation leadership]]
+
+Public Sector Network's Innovate WA conference today started with a poll amongst the attendees, asking for our biggest goal or aspiration for the public sector in Western Australia. Overwhelmingly, collaboration came out as the main opportunity for contributors and decision makers in the sector. Closely linked was the desire to better share data between government departments and functions. In his opening address, WA Minister for Innovation Stephen Dawson touched on that, mentioning the State Government is planning to introduce legislation later this year around privacy and responsible data sharing. This will be the first time WA government agencies and state-owned enterprises will be subject to privacy laws, and it is hoped the legislation will at the same time encourage data sharing that results in better outcomes for citizens of the state.
+
+[[poll results|/pics/Innovate_WA_2023.png]]
+
+Greg Italiano, the state government's CIO, gave an update on the digital transformation of the WA government. Delivering a digital identity has been a key milestone so far - no easy task given the many arms of government at state and federal level that were involved. He acknowledged the Service WA app doesn't offer a compelling range of services to deal with government so far though - finding your best deal for refueling and notices on shark detections probably don't top the list of needs for many.
+
+With an ability to digitally authenticate as yourself, many options now do exist for future enhancements. This was also the viewpoint of Hans Jayatissa, who spoke about the steps the Danish government has taken over the last 20 years that have brought the country to be the world leader in digital government.
+
+All the focus on 'digital first' should not result in 'digital only' though, which came out strongly in a review of the blueprint to ensure digital inclusivity in government. To that end, it was great to hear several actions have been planned to help specific groups in society at risk of being left behind: people in regional communities, older people, Aboriginal communities, people from different cultural and linguistic backgrounds all have a range of reasons why they may be digitally excluded: lacking education on how to use IT, no access to devices like internet-connected phones or computers, lacking budgets to pay for connectivity, etc. A human-centered approach to digital inclusivity clearly brought out what to work on to ensure everyone will have a decent opportunity to access an increasingly digital world.
+
+Giselle Rowe and Danielle Giles in their respective talks provided excellent advice on how to foster innovation in an organisation. The former listed out a range of factors for success - an understanding of 'what is in it for me', small steps, supportive leadership, innovation ambassadors all are enablers for changing business processes. The latter convinced the conference room of the power of simple 2-minute exercises like creating a portrait of the person you are working with without taking your eyes off them, or continuously asking 'who are you' as a variant of the '5 why technique' demonstrating how you need multiple attempts at answering the same question to get to a root truth.
+
+Great to see such a focus on enabling and supporting innovation, not only to streamline access to government services, but also in the wider economy to help Australia keep shining on the world stage!
diff --git a/posts/Innovate_WA_2023.org b/posts/Innovate_WA_2023.org

new file mode 100644 (file)

index 0000000..e2d8c6c
--- /dev/null
+++ b/posts/Innovate_WA_2023.org
@@ -0,0 +1,18 @@
+#+date: 2023-02-15 17:04:00 +0800
+#+title: Innovate WA 2023
+#+filetags: management innovation leadership
+
+Public Sector Network's Innovate WA conference today started with a poll amongst the attendees, asking for our biggest goal or aspiration for the public sector in Western Australia. Overwhelmingly, collaboration came out as the main opportunity for contributors and decision makers in the sector. Closely linked was the desire to better share data between government departments and functions. In his opening address, WA Minister for Innovation Stephen Dawson touched on that, mentioning the State Government is planning to introduce legislation later this year around privacy and responsible data sharing. This will be the first time WA government agencies and state-owned enterprises will be subject to privacy laws, and it is hoped the legislation will at the same time encourage data sharing that results in better outcomes for citizens of the state.
+
+#+caption: poll results
+[[file:assets/Innovate_WA_2023.png]]
+
+Greg Italiano, the state government's CIO, gave an update on the digital transformation of the WA government. Delivering a digital identity has been a key milestone so far - no easy task given the many arms of government at state and federal level that were involved. He acknowledged the Service WA app doesn't offer a compelling range of services to deal with government so far though - finding your best deal for refueling and notices on shark detections probably don't top the list of needs for many.
+
+With an ability to digitally authenticate as yourself, many options now do exist for future enhancements. This was also the viewpoint of Hans Jayatissa, who spoke about the steps the Danish government has taken over the last 20 years that have brought the country to be the world leader in digital government.
+
+All the focus on 'digital first' should not result in 'digital only' though, which came out strongly in a review of the blueprint to ensure digital inclusivity in government. To that end, it was great to hear several actions have been planned to help specific groups in society at risk of being left behind: people in regional communities, older people, Aboriginal communities, people from different cultural and linguistic backgrounds all have a range of reasons why they may be digitally excluded: lacking education on how to use IT, no access to devices like internet-connected phones or computers, lacking budgets to pay for connectivity, etc. A human-centered approach to digital inclusivity clearly brought out what to work on to ensure everyone will have a decent opportunity to access an increasingly digital world.
+
+Giselle Rowe and Danielle Giles in their respective talks provided excellent advice on how to foster innovation in an organisation. The former listed out a range of factors for success - an understanding of 'what is in it for me', small steps, supportive leadership, innovation ambassadors all are enablers for changing business processes. The latter convinced the conference room of the power of simple 2-minute exercises like creating a portrait of the person you are working with without taking your eyes off them, or continuously asking 'who are you' as a variant of the '5 why technique' demonstrating how you need multiple attempts at answering the same question to get to a root truth.
+
+Great to see such a focus on enabling and supporting innovation, not only to streamline access to government services, but also in the wider economy to help Australia keep shining on the world stage!
diff --git a/posts/Magna_Carta.mdwn b/posts/Magna_Carta.mdwn

new file mode 100644 (file)

index 0000000..1f5f7ce
--- /dev/null
+++ b/posts/Magna_Carta.mdwn
@@ -0,0 +1,2 @@
+[[!meta date="2015-06-15 12:59:30 +1200"]]
+The spirit of Magna Carta could serve us well today.
diff --git a/posts/Magna_Carta.org b/posts/Magna_Carta.org

new file mode 100644 (file)

index 0000000..547764e
--- /dev/null
+++ b/posts/Magna_Carta.org
@@ -0,0 +1,5 @@
+#+date: 2015-06-15 12:59:30 +1200
+#+title: Magna Carta
+#+filetags: musings
+
+The spirit of Magna Carta could serve us well today.
diff --git a/posts/NYC_taxi_calendar_fun.mdwn b/posts/NYC_taxi_calendar_fun.mdwn

new file mode 100644 (file)

index 0000000..3f1393c
--- /dev/null
+++ b/posts/NYC_taxi_calendar_fun.mdwn
@@ -0,0 +1,53 @@
+[[!meta date="2017-11-30 21:36:05 +0800"]]
+[[!tag R analysis visualisation sqlite3 sql]]
+
+The Internet Archive contains a [[dataset from the NYC Taxi and Limousine Commission|https://archive.org/details/nycTaxiTripData2013]], obtained under a FOIA request. It includes a listing of each taxi ride in 2013, its number of passengers, distance covered, start and stop locations and more.
+
+The dataset is a whopping 3.9 GB compressed, or shy of 30 GB uncompressed. As such, it is quite unwieldy in R.
+
+As I was interested in summarised data for my first analysis, I decided to load the CSV files in a SQLite database, query it using SQL and store the resulting output as a CSV file again - far smaller though, as I only needed 2 columns for each day of the 1 year of data.
+
+The process went as follows.
+
+First extract the CSV file from the 7z compressed archive.
+
+[[!format sh """
+7z e ../trip_data.7z trip_data_1.csv
+"""]]
+
+and the same for the other months. (As I was running low on disk space, I had to do 2 months at a time only.) Next, import it in a SQLite db. 
+
+[[!format sh """
+printf '.mode csv\n.import trip_data_1.csv trips2013\n' | sqlite3 trips2013.db
+"""]]
+
+Unfortunately, the header row uses ", " as its separator, so the imported column names start with a space. This does not happen when importing in the sqlite3 command line - tbd why. As a result, those column names need to be quoted in the query below.
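
One way to avoid the quoting would be to normalise the header row before importing. The sketch below assumes that only the first line of each CSV uses ", " as its separator; `fix_header` and the `_clean` file name are made up for this example. With this applied, the column names would no longer carry the leading space that the query further down relies on, so that query would need adjusting to match.

```sh
#!/bin/sh
# Hypothetical helper: print a CSV with the spaces after the commas
# removed from the header (first) line only; data rows are left untouched.
fix_header() {
    sed '1s/, */,/g' "$1"
}

# Example: clean a monthly file before handing it to sqlite3.
# fix_header trip_data_1.csv > trip_data_1_clean.csv
```

Restricting the substitution to line 1 keeps any ", " that might occur inside the data rows intact, at the cost of writing a second copy of each file.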
+
+Repeat this import for all the months - as mentioned, I did 2 at a time.
+
+Save the output we need in temporary csv files:
+
+[[!format sh """
+sqlite3 -header -csv trips2013.db 'select DATE(" pickup_datetime"), count(" passenger_count") AS rides, sum(" passenger_count") AS passengers from trips2013 GROUP BY DATE(" pickup_datetime");' > 01-02.csv
+"""]]
+
+Remove the extracted CSV files and the database, then repeat for the next months:
+
+[[!format sh """
+rm trip_data_?.csv
+rm trips2013.db
+"""]]
+
+Next, I moved on to the actual analysis work in R.
+
+Looking at the number of trips per day on a calendar heatmap reveals something odd - the first week of August has very few rides compared to any other week. While it's known people in NY tend to leave the city in August, this drop is odd.
+
+[[Calendar heatmap of trips|/pics/NYCtaxitripsNbrPlot.png]]
+
+Deciding to ignore August altogether, and zooming in on the occupancy rate of the taxis rather than the absolute number of rides, reveals an interesting insight - people travel together far more on weekends and on public holidays!
+
+[[Occupancy heatmap|/pics/NYCtaxioccupancyPlot.png]]
+
+Just looking at the calendar heatmap, it's possible to determine 1 Jan 2013 was a Tuesday and point out Memorial Day as the last Monday of May, Labor Day in September, Thanksgiving Day and even Black Friday at the end of November, and of course the silly season at the end of the year!
+
+The dataset contains even more interesting information in its geo-location columns, I imagine!
diff --git a/posts/NYC_taxi_calendar_fun.org b/posts/NYC_taxi_calendar_fun.org

new file mode 100644 (file)

index 0000000..ef0f2c4
--- /dev/null
+++ b/posts/NYC_taxi_calendar_fun.org
@@ -0,0 +1,56 @@
+#+date: 2017-11-30 21:36:05 +0800
+#+filetags: R analysis visualisation sqlite3 sql
+#+title: NYC taxi calendar fun
+
+The Internet Archive contains a [[https://archive.org/details/nycTaxiTripData2013][dataset from the NYC Taxi and Limousine Commission]], obtained under a FOIA request. It includes a listing of each taxi ride in 2013, its number of passengers, distance covered, start and stop locations and more.
+
+The dataset is a whopping 3.9 GB compressed, or shy of 30 GB uncompressed. As such, it is quite unwieldy in R.
+
+As I was interested in summarised data for my first analysis, I decided to load the CSV files in a SQLite database, query it using SQL and store the resulting output as a CSV file again - far smaller though, as I only needed 2 columns for each day of the 1 year of data.
+
+The process went as follows.
+
+First extract the CSV file from the 7z compressed archive.
+
+#+BEGIN_SRC sh
+7z e ../trip_data.7z trip_data_1.csv
+#+END_SRC
+
+and the same for the other months. (As I was running low on disk space, I had to do 2 months at a time only.) Next, import it in a SQLite db. 
+
+#+BEGIN_SRC sh
+printf '.mode csv\n.import trip_data_1.csv trips2013\n' | sqlite3 trips2013.db
+#+END_SRC
+
+Unfortunately, the header row uses ", " as its separator, so the imported column names start with a space. This does not happen when importing in the sqlite3 command line - tbd why. As a result, those column names need to be quoted in the query below.
+
+Repeat this import for all the months - as mentioned, I did 2 at a time.
+
+Save the output we need in temporary csv files:
+
+#+BEGIN_SRC sh
+sqlite3 -header -csv trips2013.db 'select DATE(" pickup_datetime"), count(" passenger_count") AS rides, sum(" passenger_count") AS passengers from trips2013 GROUP BY DATE(" pickup_datetime");' > 01-02.csv
+#+END_SRC
+
+Remove the extracted CSV files and the database, then repeat for the next months:
+
+#+BEGIN_SRC sh
+rm trip_data_?.csv
+rm trips2013.db
+#+END_SRC
+
+Next, I moved on to the actual analysis work in R.
+
+Looking at the number of trips per day on a calendar heatmap reveals something odd - the first week of August has very few rides compared to any other week. While it's known people in NY tend to leave the city in August, this drop is odd.
+
+#+caption: Calendar heatmap of trips
+[[file:assets/NYCtaxitripsNbrPlot.png]]
+
+Deciding to ignore August altogether, and zooming in on the occupancy rate of the taxis rather than the absolute number of rides, reveals an interesting insight - people travel together far more on weekends and on public holidays!
+
+#+caption: Occupancy heatmap
+[[file:assets/NYCtaxioccupancyPlot.png]]
+
+Just looking at the calendar heatmap, it's possible to determine 1 Jan 2013 was a Tuesday and point out Memorial Day as the last Monday of May, Labor Day in September, Thanksgiving Day and even Black Friday at the end of November, and of course the silly season at the end of the year!
+
+The dataset contains even more interesting information in its geo-location columns, I imagine!
diff --git a/posts/Nobel_prize_winner_having_fun.mdwn b/posts/Nobel_prize_winner_having_fun.mdwn

new file mode 100644 (file)

index 0000000..9c1b34c
--- /dev/null
+++ b/posts/Nobel_prize_winner_having_fun.mdwn
@@ -0,0 +1,5 @@
+[[!meta date="2018-10-12 20:00:01 +0800"]]
+[[!tag people leadership Python]]
+Paul Romer may well be the first Nobel prize winner using Jupyter notebooks in his scientific workflow. On his blog, he explains his reasoning.
+
+My key takeaway from [[the article|https://paulromer.net/jupyter-mathematica-and-the-future-of-the-research-paper/]]: he's having fun.
diff --git a/posts/Nobel_prize_winner_having_fun.org b/posts/Nobel_prize_winner_having_fun.org

new file mode 100644 (file)

index 0000000..c64fc44
--- /dev/null
+++ b/posts/Nobel_prize_winner_having_fun.org
@@ -0,0 +1,8 @@
+#+date: 2018-10-12 20:00:01 +0800
+#+filetags: people leadership Python 
+
+Paul Romer may well be the first Nobel prize winner using Jupyter notebooks in his scientific workflow. On his blog, he explains
+his reasoning.
+
+My key takeaway from [[https://paulromer.net/jupyter-mathematica-and-the-future-of-the-research-paper/][the article]]:
+he's having fun.
diff --git a/posts/Perth_solar_exposure_over_year.mdwn b/posts/Perth_solar_exposure_over_year.mdwn

new file mode 100644 (file)

index 0000000..04802bb
--- /dev/null
+++ b/posts/Perth_solar_exposure_over_year.mdwn
@@ -0,0 +1,32 @@
+[[!meta date="2023-06-11 13:27:13 +0800"]]
+[[!opengraph2 ogimage="https://www.vanrenterghem.biz/blog/pics/solarExposure.png"]]
+[[!tag R analysis weather visualisation]]
+
+Perth, Western Australia is a sunny place, as any local will
+confirm. Combined with subsidised buyback tariffs for electricity
+returned into the grid, this has resulted in many local households now
+having an array of solar panels on their roof.
+
+What most will intuitively understand is the production of these
+panels varying over the year. That's a combination of the differences
+in average cloud cover on the one hand, and on the other hand the
+amount of energy that falls down on the panel from the sun varying
+over the year due to the tilted axis of the earth.
+
+Combined, this results in significantly different levels of energy
+available on our roofs throughout the year.
+
+[[!img /pics/solarExposure.png alt = "Perth solar exposure variation" class="img-fluid"]]
+
+__Table:__ Average solar exposure per m<sup>2</sup> in Kings Park, Perth Jan 2017
+to Jun 2023.
+
+[[!table header="row" data="""
+__Month__|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec
+---|---|---|---|---|---|---|---|---|---|---|---|---
+__MJ/m<sup>2</sup>__|949.152|741.432|642.187|484.428|362.085|280.690|297.863|411.215|548.154|718.831|831.845|958.651
+"""]]
+
+Right now in June, we're at the low point for the year in expected
+yield from a solar panel, with just about a third of the energy being
+generated that we can expect to get in Dec.
diff --git a/posts/Perth_solar_exposure_over_year.org b/posts/Perth_solar_exposure_over_year.org

new file mode 100644 (file)

index 0000000..7e282fc
--- /dev/null
+++ b/posts/Perth_solar_exposure_over_year.org
@@ -0,0 +1,34 @@
+#+date: 2023-06-11 13:27:13 +0800
+#+opengraph2: ogimage="https://www.vanrenterghem.biz/blog/pics/solarExposure.png"
+#+filetags: R analysis weather visualisation
+#+title: Perth solar exposure over year
+
+Perth, Western Australia is a sunny place, as any local will
+confirm. Combined with subsidised buyback tariffs for electricity
+returned into the grid, this has resulted in many local households now
+having an array of solar panels on their roof.
+
+What most will intuitively understand is the production of these
+panels varying over the year. That's a combination of the differences
+in average cloud cover on the one hand, and on the other hand the
+amount of energy that falls down on the panel from the sun varying
+over the year due to the tilted axis of the earth.
+
+Combined, this results in significantly different levels of energy
+available on our roofs throughout the year.
+
+#+caption: Perth solar exposure variation
+#+ATTR_HTML: :class img-fluid :alt Perth solar exposure variation
+[[file:assets/solarExposure.png]]
+
+*Table:* Average solar exposure per m^{2} in Kings Park, Perth Jan 2017
+to Jun 2023.
+
+| *Month* | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
+|-
+| *MJ/m^{2}* | 949.152 | 741.432 | 642.187 | 484.428 | 362.085 | 280.690 | 297.863 | 411.215 | 548.154 | 718.831 | 831.845 | 958.651 |
+
+
+Right now in June, we're at the low point for the year in expected
+yield from a solar panel, with just about a third of the energy being
+generated that we can expect to get in Dec.
diff --git a/posts/R_and_github.mdwn b/posts/R_and_github.mdwn

new file mode 100644 (file)

index 0000000..a732db8
--- /dev/null
+++ b/posts/R_and_github.mdwn
@@ -0,0 +1,7 @@
+[[!meta date="2016-09-23 19:47:35 +0800"]]
+
+Where [my first R package](posts/first_r_package) was more of a proof-of-concept, I have now certainly left the *beginneRs* group by publishing an R package to a private Github repository at work. I have already used that package in some R scripts performing various real-time analyses of our operations, loading it through the [devtools package](https://cran.r-project.org/web/packages/devtools/README.html), specifically its `install_github()` function.
+
+Next, I will have to check the `install_url()` function, as I had not quite figured out at the time of writing my initial R package how I could actually use it in a script without manually installing it first.
+
+The ability to script regular reporting and publish the results as graphs and tables, in emails (opening a gateway into for instance Slack) or in Excel files, is very empowering. To an extent, I used to do this using VBA some years ago. Doing that in an integrated way with various data sources required a lot more work though, certainly given how the MS Windows environment until not so long ago lacked decent support for scripted operations for anyone but in-the-know IT professionals.
diff --git a/posts/R_and_github.org b/posts/R_and_github.org

new file mode 100644 (file)

index 0000000..453ea61
--- /dev/null
+++ b/posts/R_and_github.org
@@ -0,0 +1,23 @@
+#+date: 2016-09-23 19:47:35 +0800
+#+title: R and Github
+
+Where [[file:posts/first_r_package][my first R package]] was more of a
+proof of concept, I have now certainly left the /beginneRs/ group by
+publishing an R package to a private Github repository at work. I have
+already used that package in some R scripts performing various real-time
+analyses of our operations, loading it through the
+[[https://cran.r-project.org/web/packages/devtools/README.html][devtools
+package]], and specifically its =install_github()= function.
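+
+A minimal sketch of what such a call can look like. The repository name
+=my-org/mypackage= and the wrapper function are hypothetical, and it
+assumes a GitHub personal access token with access to the private
+repository is available in the =GITHUB_PAT= environment variable.
+
+#+begin_src R
+# Hypothetical wrapper: installs a package from a private GitHub repository.
+# "my-org/mypackage" is a placeholder, not the actual repository.
+install_private_package <- function(repo = "my-org/mypackage") {
+  devtools::install_github(repo, auth_token = Sys.getenv("GITHUB_PAT"))
+}
+#+end_src
+
+Calling =install_private_package()= at the top of a script and then
+loading the package with =library()= avoids a separate manual install
+step.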
+
+Next, I will have to check the =install_url()= function, as I had not
+quite figured out at the time of writing my initial R package how I
+could actually use it in a script without manually installing it first.
+
+The ability to script regular reporting and publish the results as
+graphs and tables, in emails (opening a gateway into for instance
+Slack) or in Excel files, is very empowering. To an extent, I used to
+do this using VBA some years ago. Doing that in an integrated way with
+various data sources required a lot more work though, certainly given
+how the MS Windows environment until not so long ago lacked decent
+support for scripted operations for anyone but in-the-know IT
+professionals.
diff --git a/posts/Ronald_McDonald_House.mdwn b/posts/Ronald_McDonald_House.mdwn

new file mode 100644 (file)

index 0000000..96b852e
--- /dev/null
+++ b/posts/Ronald_McDonald_House.mdwn
@@ -0,0 +1,9 @@
+[[!meta date="2022-12-22 06:28:00 +0800"]]
+[[!opengraph2 ogimage="https://www.vanrenterghem.biz/blog/pics/group-photo-RMDH.png"]]
+[[!tag volunteering]]
+
+Synergy's Pricing and Portfolio team took the opportunity at the end of the year to volunteer for [Ronald McDonald House Nedlands](https://rmhcwa.org.au/). This is a charity providing a 'home away from home' for families with a child in hospital in Perth. These families often live hundreds if not thousands of kilometers away due to the size of Western Australia, with Perth the only metropolitan area in the state featuring a children's hospital. Families often stay for weeks and even months in the facility while their child undergoes treatment.
+
+[[!img /pics/group-photo-RMDH.png alt="Group picture with the house manager" class="img-fluid"]]
+
+We prepared lunch for the families and volunteers in the house, followed by a tour of the place by its manager. Hearing some of the stories about the children was heartbreaking and uplifting at the same time. During extremely demanding times for a family, both mentally and financially, the volunteers and the team at Ronald McDonald House do a great job in making life slightly easier for them. It was great to be able to contribute a little bit to that this Christmas period.
diff --git a/posts/Ronald_McDonald_House.org b/posts/Ronald_McDonald_House.org

new file mode 100644 (file)

index 0000000..c27355d
--- /dev/null
+++ b/posts/Ronald_McDonald_House.org
@@ -0,0 +1,12 @@
+#+date: 2022-12-22 06:28:00 +0800
+#+opengraph2: ogimage="https://www.vanrenterghem.biz/blog/pics/group-photo-RMDH.png"
+#+filetags: volunteering
+#+title: Ronald McDonald House
+
+Synergy's Pricing and Portfolio team took the opportunity at the end of the year to volunteer for [[https://rmhcwa.org.au/][Ronald McDonald House Nedlands]]. This is a charity providing a 'home away from home' for families with a child in hospital in Perth. These families often live hundreds if not thousands of kilometers away due to the size of Western Australia, with Perth the only metropolitan area in the state featuring a children's hospital. Families often stay for weeks and even months in the facility while their child undergoes treatment.
+
+#+caption: Group picture with the house manager
+#+ATTR_HTML: :class img-fluid :alt Group picture with the house manager
+[[file:assets/group-photo-RMDH.png]]
+
+We prepared lunch for the families and volunteers in the house, followed by a tour of the place by its manager. Hearing some of the stories about the children was heartbreaking and uplifting at the same time. During extremely demanding times for a family, both mentally and financially, the volunteers and the team at Ronald McDonald House do a great job in making life slightly easier for them. It was great to be able to contribute a little bit to that this Christmas period.
diff --git a/posts/Setting_up_an_Analytics_Practice.mdwn b/posts/Setting_up_an_Analytics_Practice.mdwn

new file mode 100644 (file)

index 0000000..648dfc4
--- /dev/null
+++ b/posts/Setting_up_an_Analytics_Practice.mdwn
@@ -0,0 +1,5 @@
+[[!meta date="2019-04-09 07:59:58 +0800"]]
+[[!tag leadership management analytics]]
+[[Mindmap on setting up analytics practice|/pics/Setting_up_an_analytics_practice.png]]
+
+Ideas courtesy of Abhi Seth, Head of Data Science & Analytics at Honeywell Aerospace.
diff --git a/posts/Setting_up_an_Analytics_Practice.org b/posts/Setting_up_an_Analytics_Practice.org

new file mode 100644 (file)

index 0000000..96f6c54
--- /dev/null
+++ b/posts/Setting_up_an_Analytics_Practice.org
@@ -0,0 +1,9 @@
+#+date: 2019-04-09 07:59:58 +0800
+#+filetags: leadership management analytics
+#+title: Setting up an analytics practice.
+
+#+CAPTION: Mindmap on setting up analytics practice
+#+ATTR_HTML: :class img-fluid :alt Mindmap on setting up analytics practice
+[[file:assets/Setting_up_an_analytics_practice.png]]
+
+Ideas courtesy of Abhi Seth, Head of Data Science & Analytics at Honeywell Aerospace.
diff --git a/posts/WADSIH_talk_on_consumer_insights.mdwn b/posts/WADSIH_talk_on_consumer_insights.mdwn

new file mode 100644 (file)

index 0000000..b3ecc0a
--- /dev/null
+++ b/posts/WADSIH_talk_on_consumer_insights.mdwn
@@ -0,0 +1,8 @@
+[[!meta date="2022-11-06 23:14:11 +1100"]]
+
+[[!tag management data_science customer_analytics analytics leadership]]
+
+I recently had the honour of giving a talk about delivering consumer insights at a publicly owned utility during a session organised by the [Western Australian Data Science Innovation Hub (WADSIH)](https://wadsih.org.au/). During the talk, I walked through the data science process the team moves through while delivering customer insights. To make that less theoretical, I did this using an example of work delivered recently to prompt consumers to take mutually beneficial actions to both lower their power bills and help ensure the stability of the electricity grid.
+
+I converted the presentation from Powerpoint to code on the weekend, and made it [available online](http://git.vanrenterghem.biz/gitweb.cgi/WADSIH-presentation.git), as well as a [compiled PDF version](http://git.vanrenterghem.biz/gitweb.cgi/WADSIH-presentation.git/blob/HEAD:/DataScienceDiscovery-ConsumerInsights.pdf).  
+
diff --git a/posts/WADSIH_talk_on_consumer_insights.org b/posts/WADSIH_talk_on_consumer_insights.org

new file mode 100644 (file)

index 0000000..771dc51
--- /dev/null
+++ b/posts/WADSIH_talk_on_consumer_insights.org
@@ -0,0 +1,19 @@
+#+date: 2022-11-06 23:14:11 +1100
+#+filetags: management data_science customer_analytics analytics leadership
+#+title: WADSIH talk on consumer insights
+
+I recently had the honour of giving a talk about delivering consumer
+insights at a publicly owned utility during a session organised by the
+[[https://wadsih.org.au/][Western Australian Data Science Innovation Hub
+(WADSIH)]]. During the talk, I walked through the data science process
+the team moves through while delivering customer insights. To make that
+less theoretical, I did this using an example of work delivered recently
+to prompt consumers to take mutually beneficial actions to both lower
+their power bills and help ensure the stability of the electricity grid.
+
+I converted the presentation from Powerpoint to code on the weekend, and
+made it
+[[http://git.vanrenterghem.biz/gitweb.cgi/WADSIH-presentation.git][available
+online]], as well as a
+[[http://git.vanrenterghem.biz/gitweb.cgi/WADSIH-presentation.git/blob/HEAD:/DataScienceDiscovery-ConsumerInsights.pdf][compiled
+PDF version]].
diff --git a/posts/WA_roads_in_R_using_sf.mdwn b/posts/WA_roads_in_R_using_sf.mdwn

new file mode 100644 (file)

index 0000000..ca1f2a2
--- /dev/null
+++ b/posts/WA_roads_in_R_using_sf.mdwn
@@ -0,0 +1,16 @@
+[[!meta date="2017-10-17 20:35:40 +0800"]]
+
+[[!tag R spatial analysis visualisation]]
+How cool is this? A map of Western Australia with all state roads marked in only 5 lines of R!
+
+[[!format r """
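+# Setup, not counted in the 5 lines: sf provides st_read(), ggplot2 provides geom_sf()
+library(sf)
+library(ggplot2)
+WARoads <- st_read(dsn = "data/", layer = "RoadNetworkMRWA_514", stringsAsFactors = FALSE)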
+WALocalities <- st_read(dsn = "data/", layer = "WA_LOCALITY_POLYGON_shp", stringsAsFactors = FALSE)
+ggplot(WALocalities) +
+  geom_sf() +
+  geom_sf(data = dplyr::filter(WARoads, network_ty == "State Road"), colour = "red")
+"""]]
+
+[[Map of WA state roads|/pics/state-roads.png]]
+
+Courtesy of the development version of ggplot2 - geom_sf is not yet available in the version on CRAN.
diff --git a/posts/WA_roads_in_R_using_sf.org b/posts/WA_roads_in_R_using_sf.org

new file mode 100644 (file)

index 0000000..dd12d7f
--- /dev/null
+++ b/posts/WA_roads_in_R_using_sf.org
@@ -0,0 +1,20 @@
+#+date: 2017-10-17 20:35:40 +0800
+#+title: WA roads in R using SF
+#+filetags: R spatial analysis visualisation
+
+How cool is this? A map of Western Australia with all state roads marked in only 5 lines of R!
+
+#+BEGIN_SRC R
+# Setup, not counted in the 5 lines: sf provides st_read(), ggplot2 provides geom_sf()
+library(sf)
+library(ggplot2)
+WARoads <- st_read(dsn = "data/", layer = "RoadNetworkMRWA_514", stringsAsFactors = FALSE)
+WALocalities <- st_read(dsn = "data/", layer = "WA_LOCALITY_POLYGON_shp", stringsAsFactors = FALSE)
+ggplot(WALocalities) +
+  geom_sf() +
+  geom_sf(data = dplyr::filter(WARoads, network_ty == "State Road"), colour = "red")
+#+END_SRC
+
+#+caption: Map of WA state roads
+#+ATTR_HTML: :class img-fluid :alt Map of WA state roads
+[[file:assets/state-roads.png]]
+
+Courtesy of the development version of ggplot2 - geom_sf is not yet
+available in the version on CRAN.
diff --git a/posts/Wrapping_Confluent_Kafka_REST_Proxy_API_in_R.org b/posts/Wrapping_Confluent_Kafka_REST_Proxy_API_in_R.org

new file mode 100644 (file)

index 0000000..e9d17fb
--- /dev/null
+++ b/posts/Wrapping_Confluent_Kafka_REST_Proxy_API_in_R.org
@@ -0,0 +1,8 @@
+#+date: <2018-09-14 21:53:53 +0800>
+#+filetags: Apache Kafka R bigdata API
+
+It started off as an attempt to analyse some data stored in [[https://kafka.apache.org/][Apache Kafka]] using [[https://www.r-project.org/][R]], and ended up becoming the start of an R package to interact with [[https://docs.confluent.io/current/kafka-rest/docs/index.html][Confluent's REST Proxy API]].
+
+While [[https://cran.r-project.org/package=rkafka][rkafka]] already allows the creation of a producer and a consumer from R, writing some R functions interfacing with its REST API was an interesting way to learn a bit more about Kafka's inner workings, and demonstrate how easy it is to interact with any REST API from R thanks to [[https://cran.r-project.org/package=httr][httr]].
+
+The result is [[http://git.vanrenterghem.biz/R/project-using-kafka-in-R.git][available to clone on my git server]].
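+
+To give an idea of how little code such a wrapper needs, here is a
+rough sketch of one function. It is not taken from the package itself.
+The function name =list_topics= and the default =base_url= (a REST
+Proxy assumed to be listening on localhost port 8082) are assumptions
+for illustration, and the sketch relies on the httr and jsonlite
+packages being installed.
+
+#+begin_src R
+# Hypothetical helper: ask a Kafka REST Proxy for the list of topic names.
+# base_url is an assumption; point it at your own REST Proxy instance.
+list_topics <- function(base_url = "http://localhost:8082") {
+  resp <- httr::GET(paste0(base_url, "/topics"),
+                    httr::accept("application/vnd.kafka.v2+json"))
+  # Turn HTTP errors into R errors instead of silently continuing
+  httr::stop_for_status(resp)
+  # The response body is a JSON array of topic names
+  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
+}
+#+end_src
+
+Other endpoints follow the same pattern: build the URL, set the content
+type the proxy expects, check the status and parse the JSON body.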
diff --git a/posts/agent_based_models_digital_twins.mdwn b/posts/agent_based_models_digital_twins.mdwn

new file mode 100644 (file)

index 0000000..edac69f
--- /dev/null
+++ b/posts/agent_based_models_digital_twins.mdwn
@@ -0,0 +1,18 @@
+[[!meta date="2023-10-29T07:15:34Z"]]
+[[!tag mathematics modeling]]
+
+A new approach to modeling using categories and software, facilitating the build of advanced models like digital twins, is being developed at the moment.
+
+During the 2023 SIAM Conference on Computational Science and Engineering, a group of researchers [presented](https://meetings.siam.org/sess/dsp_talk.cfm?p=124342) their 
+
+> diagrammatic representations that provide an intuitive interface for specifying the relationships between variables in a system of equations, a method for composing systems equations into a multiphysics model using an operad of wiring diagrams, and an algorithm for deriving solvers using directed hypergraphs, yielding a method of generating executable systems from these diagrams using the operators of discrete exterior calculus on a simplicial set. The generated solvers produce numerical solutions consistent with state of the art open source tools.
+
+As pointed out, mathematics can rarely be isomorphic to its software implementation, yet here the researchers go a long way in enabling that.
+
+Using the Julia language, the applied category theorists working on this concept wrote software (StockFlow) that allows users to build stock-flow diagrams and do all sorts of things with them, from drawing them to transforming them into other forms like dynamical systems and system structure diagrams, or solving the underlying differential equations.
+
+The team have also built software (ModelCollab) that hides all the Julia code again, enabling people who are not trained mathematicians or computer scientists to apply this way of modeling in their work.
+
+This fascinates me, as having a way to write and audit complex systems like digital twins using free and open-source approaches can be transformative. It could make them accessible to smaller organisations, and to non-core departments in bigger organisations, which until now have only had enough money or people to develop them for their key operations.
+
+Read more on [John Baez's blog](https://johncarlosbaez.wordpress.com/2023/10/25/software-for-compositional-modeling-in-epidemiology/). 
diff --git a/posts/agent_based_models_digital_twins.org b/posts/agent_based_models_digital_twins.org

new file mode 100644 (file)

index 0000000..3769709
--- /dev/null
+++ b/posts/agent_based_models_digital_twins.org
@@ -0,0 +1,48 @@
+#+date: 2023-10-29T07:15:34Z
+#+filetags: mathematics modeling
+#+title: Agent-based models and digital twins.
+
+A new approach to modeling using categories and software, facilitating
+the build of advanced models like digital twins, is being developed at
+the moment.
+
+During the 2023 SIAM Conference on Computational Science and
+Engineering, a group of researchers
+[[https://meetings.siam.org/sess/dsp_talk.cfm?p=124342][presented]]
+their
+
+#+begin_quote
+diagrammatic representations that provide an intuitive interface for
+specifying the relationships between variables in a system of equations,
+a method for composing systems equations into a multiphysics model using
+an operad of wiring diagrams, and an algorithm for deriving solvers
+using directed hypergraphs, yielding a method of generating executable
+systems from these diagrams using the operators of discrete exterior
+calculus on a simplicial set. The generated solvers produce numerical
+solutions consistent with state of the art open source tools.
+#+end_quote
+
+As pointed out, mathematics can rarely be isomorphic to its software
+implementation, yet here the researchers go a long way in enabling that.
+
+Using the Julia language, the applied category theorists working on
+this concept wrote software (StockFlow) that allows users to build
+stock-flow diagrams and do all sorts of things with them, from drawing
+them to transforming them into other forms like dynamical systems and
+system structure diagrams, or solving the underlying differential
+equations.
+
+The team have also built software (ModelCollab) that hides all the Julia
+code again, enabling people who are not trained mathematicians or
+computer scientists to apply this way of modeling in their work.
+
+This fascinates me, as having a way to write and audit complex systems
+like digital twins using free and open-source approaches can be
+transformative. It could make them accessible to smaller organisations,
+and to non-core departments in bigger organisations, which until now
+have only had enough money or people to develop them for their key
+operations.
+
+Read more on
+[[https://johncarlosbaez.wordpress.com/2023/10/25/software-for-compositional-modeling-in-epidemiology/][John
+Baez's blog]].
diff --git a/posts/azure_file_storage_blobs.mdwn b/posts/azure_file_storage_blobs.mdwn

new file mode 100644 (file)

index 0000000..8841ca9
--- /dev/null
+++ b/posts/azure_file_storage_blobs.mdwn
@@ -0,0 +1,3 @@
+[[!meta date="2016-05-06 20:10:31 +0800"]]
+
+Azure's file storage blobs are currently not mountable from GNU/Linux, except if the box is running in the same Azure region as the blob. Theoretical solution: mount it on a VM, and sshfs that from anywhere. TBC.
diff --git a/posts/azure_file_storage_blobs.org b/posts/azure_file_storage_blobs.org

new file mode 100644 (file)

index 0000000..e691d89
--- /dev/null
+++ b/posts/azure_file_storage_blobs.org
@@ -0,0 +1,8 @@
+#+date: 2016-05-06 20:10:31 +0800
+#+title: Azure file storage blobs
+#+filetags: muzings
+
+Azure's file storage blobs are currently not mountable from GNU/Linux,
+except if the box is running in the same Azure region as the blob.
+Theoretical solution: mount it on a VM, and sshfs that from anywhere.
+TBC.
diff --git a/posts/different_spin_to_competing_on_analytics.mdwn b/posts/different_spin_to_competing_on_analytics.mdwn

new file mode 100644 (file)

index 0000000..a9279f2
--- /dev/null
+++ b/posts/different_spin_to_competing_on_analytics.mdwn
@@ -0,0 +1,30 @@
+[[!meta date="2019-06-11 20:46:22 +0800"]]
+[[!tag analytics leadership military fintech]]
+
+The May/June 2019 issue of Foreign Affairs contains an article by Christian Brose, titled [["The New Revolution in Military Affairs"|https://www.foreignaffairs.com/articles/2019-04-16/new-revolution-military-affairs]].
+
+What struck me while reading the article is how much of an analogy can be drawn between what is happening to businesses worldwide, and what the author writes about the future in military technology and its trailing adoption in the United States of America's military.
+
+The transformation he describes is about the core process concerning militaries, the so-called "kill chain". Thanks to technological advances, including artificial intelligence, that process can be rapidly accelerated, offering a competitive advantage to the owner of the technology.
+
+The following quotes struck me in particular:
+
+> Instead of thinking systematically about buying faster, more effective kill chains that could be built now, Washington poured money into newer versions of old military platforms and prayed for technological miracles to come.
+
+> The question, accordingly, is not how new technologies can improve the U.S. military’s ability to do what it already does but how they can enable it to operate in new ways.
+
+> A military made up of small numbers of large, expensive, heavily manned, and hard-to-replace systems will not survive on future battlefields, where swarms of intelligent machines will deliver violence at a greater volume and higher velocity than ever before. Success will require a different kind of military, one built around large numbers of small, inexpensive, expendable, and highly autonomous systems.
+
+The same could be written about so many companies that haven't taken up the strategy of competing on analytics.
+
+Take the banking sector instead of the U.S. military, for instance. Formerly very profitable and seemingly unbeatable big banks have over the past decade found their banking software to be too rigid. Instead of investing in new products and services, they continued to rely on what they had been doing for the prior hundred years. They invested in upgrading their core systems, often with little payoff. While they were doing that, small fintech firms appeared, excelling at just a small fraction of what a bank considered its playing field. In those areas, these new players innovated much more quickly, resulting in far more efficient and effective service delivery.
+
+At the core of many of these innovations lies data. The author likens China's stockpiling of data to that of oil, but the following quote was particularly relevant in how it describes the use of that stockpile of data to inform decision-making.
+
+> Every autonomous system will be able to process and make sense of the information it gathers on its own, without relying on a command hub.
+
+The analogy is clear - for years, organisations have been trying to ensure they knew the "single source of truth". Tightly coupling all business functions to a central ERP system was usually the answer. Just like in the military, it can now often be better to have many small functions be performed on the periphery of a company's systems, accepting some duplication of data and directional accuracy to deliver quicker, more cost-effective results - using expendable solutions. The challenges of communicating effectively between these semi-autonomous systems are noted.
+
+Not insignificantly, the author posits that "future militaries will be distinguished by the quality of their software, especially their artificial intelligence" - i.e. countries are competing on analytics, also in the military sphere.
+
+The article ends with some advice to government leadership - make the transformation a priority, drive the change forward, recast cultures and ensure correct incentives are in place.
diff --git a/posts/different_spin_to_competing_on_analytics.org b/posts/different_spin_to_competing_on_analytics.org

new file mode 100644 (file)

index 0000000..517d74d
--- /dev/null
+++ b/posts/different_spin_to_competing_on_analytics.org
@@ -0,0 +1,88 @@
+#+date: 2019-06-11 20:46:22 +0800
+#+filetags: analytics leadership military fintech
+#+title: Different spin to competing on analytics.
+
+The May/June 2019 issue of Foreign Affairs contains an article by
+Christian Brose, titled [[https://www.foreignaffairs.com/articles/2019-04-16/new-revolution-military-affairs]["The New Revolution in Military Affairs"]].
+
+What struck me while reading the article is how much of an analogy can
+be drawn between what is happening to businesses worldwide, and what the
+author writes about the future in military technology and its trailing
+adoption in the United States of America's military.
+
+The transformation he describes is about the core process concerning
+militaries, the so called "kill chain". Thanks to technological
+advances, including artificial intelligence, that process can be
+rapidly accelerated, offering a competitive advantage to the owner of
+the technology.
+
+The following quotes struck me in particular:
+
+#+begin_quote
+Instead of thinking systematically about buying faster, more effective
+kill chains that could be built now, Washington poured money into newer
+versions of old military platforms and prayed for technological miracles
+to come.
+
+#+end_quote
+
+#+begin_quote
+The question, accordingly, is not how new technologies can improve the
+U.S. military's ability to do what it already does but how they can
+enable it to operate in new ways.
+
+#+end_quote
+
+#+begin_quote
+A military made up of small numbers of large, expensive, heavily manned,
+and hard-to-replace systems will not survive on future battlefields,
+where swarms of intelligent machines will deliver violence at a greater
+volume and higher velocity than ever before. Success will require a
+different kind of military, one built around large numbers of small,
+inexpensive, expendable, and highly autonomous systems.
+
+#+end_quote
+
+The same could be written about so many companies that haven't taken up
+the strategy of competing on analytics.
+
+Take the banking sector instead of the U.S. military, for instance.
+Formerly very profitable and seemingly unbeatable big banks have over
+the past decade found their banking software to be too rigid. Instead of
+investing in new products and services, they continued to rely on what
+they had been doing for the prior hundred years. They invested in
+upgrading their core systems, often with little payoff. While they were
+doing that, small fintech firms appeared, excelling at just a small
+fraction of what a bank considered its playing field. In those areas,
+these new players innovated much more quickly, resulting in far more
+efficient and effective service delivery.
+
+At the core of many of these innovations lies data. The author likens
+China's stockpiling of data to that of oil, but the following quote
+was particularly relevant in how it describes the use of that stockpile
+of data to inform decision-making.
+
+#+begin_quote
+Every autonomous system will be able to process and make sense of the
+information it gathers on its own, without relying on a command hub.
+
+#+end_quote
+
+The analogy is clear - for years, organisations have been trying to
+ensure they knew the "single source of truth". Tightly coupling all
+business functions to a central ERP system was usually the answer. Just
+like in the military, it can now often be better to have many small
+functions be performed on the periphery of a company's systems, accepting
+some duplication of data and directional accuracy to deliver quicker,
+more cost-effective results - using expendable solutions. The challenges
+of communicating effectively between these semi-autonomous systems are
+noted.
+
+Not insignificantly, the author posits that "future militaries will be
+distinguished by the quality of their software, especially their
+artificial intelligence" - i.e. countries are competing on analytics,
+also in the military sphere.
+
+The article ends with some advice to government leadership - make the
+transformation a priority, drive the change forward, recast cultures and
+ensure correct incentives are in place.
diff --git a/posts/explore-AU-road-fatalities.mdwn b/posts/explore-AU-road-fatalities.mdwn

new file mode 100644 (file)

index 0000000..c0b7265
--- /dev/null
+++ b/posts/explore-AU-road-fatalities.mdwn
@@ -0,0 +1,113 @@
+[[!meta date="2017-10-10 16:56:56 +0800"]]
+[[!tag R analysis]]
+
+Road fatalities in Australia
+----------------------------
+
+Recently inspired to do a little analysis again, I landed on a
+dataset from
+<https://bitre.gov.au/statistics/safety/fatal_road_crash_database.aspx>,
+which I downloaded on 5 Oct 2017. Having open datasets like this is a
+great example of how governments are moving with the times!
+
+Trends
+------
+
+I started by looking at the trends - what is the approximate number of
+road fatalities a year, and how is it evolving over time? Are there any
+differences noticeable between states? Or by gender?
+
+[[Overall trend line|/pics/explore-AU-road-fatalities_files/fatalitiesTrends-1.png]][[Trend lines by Australian state|/pics/explore-AU-road-fatalities_files/fatalitiesTrends-2.png]][[Trend lines by gender|/pics/explore-AU-road-fatalities_files/fatalitiesTrends-3.png]]
+
+What age group is most at risk in city traffic?
+-----------------------------------------------
+
+Next, I wondered if there were any particular ages that were more at
+risk in city traffic. I opted to quickly bin the data to produce a
+histogram.
+
+    fatalities %>%
+      filter(Year != 2017, Speed_Limit <= 50) %>%
+      ggplot(aes(x=Age))+
+      geom_histogram(binwidth = 5) +
+      labs(title = "Australian road fatalities by age group",
+           y = "Fatalities") +
+      theme_economist()
+
+    ## Warning: Removed 2 rows containing non-finite values (stat_bin).
+
+[[histogram|/pics/explore-AU-road-fatalities_files/fatalities.cityTraffic-1.png]]
+
+Hypothesis
+----------
+
+Based on the above, I wondered - are people above 65 more likely to die
+in slow traffic areas? To make this a bit easier, I added two variables
+to the dataset - one splitting people into younger and older than 65,
+and one based on the speed limit in the area of the crash being at most
+or above 50 km per hour - city traffic or faster in Australia.
+
+    fatalities.pensioners <- fatalities %>%
+      filter(Speed_Limit <= 110) %>% # under 2% of records exceed 110 - still to determine why
+      mutate(Pensioner = Age >= 65) %>% # NA when Age is missing
+      mutate(Slow_Traffic = Speed_Limit <= 50) %>%
+      filter(!is.na(Pensioner))
+
+To answer the question, I produce a density plot and a boxplot.
+
+[[density plot|/pics/explore-AU-road-fatalities_files/fatalitiesSegmentation-1.png]][[box plot|/pics/explore-AU-road-fatalities_files/fatalitiesSegmentation-2.png]]
+
+Some further statistical analysis does confirm the hypothesis!
+
+    # Build a contingency table and perform prop test
+    cont.table <- table(select(fatalities.pensioners, Slow_Traffic, Pensioner))
+    cont.table
+
+    ##             Pensioner
+    ## Slow_Traffic FALSE  TRUE
+    ##        FALSE 36706  7245
+    ##        TRUE   1985   690
+
+    prop.test(cont.table)
+
+    ## 
+    ##  2-sample test for equality of proportions with continuity
+    ##  correction
+    ## 
+    ## data:  cont.table
+    ## X-squared = 154.11, df = 1, p-value < 2.2e-16
+    ## alternative hypothesis: two.sided
+    ## 95 percent confidence interval:
+    ##  0.07596463 0.11023789
+    ## sample estimates:
+    ##    prop 1    prop 2 
+    ## 0.8351573 0.7420561
+
+    # Alternative approach to using prop test
+    pensioners <- c(nrow(filter(fatalities.pensioners, Slow_Traffic == TRUE, Pensioner == TRUE)), nrow(filter(fatalities.pensioners, Slow_Traffic == FALSE, Pensioner == TRUE)))
+    everyone <- c(nrow(filter(fatalities.pensioners, Slow_Traffic == TRUE)), nrow(filter(fatalities.pensioners, Slow_Traffic == FALSE)))
+    prop.test(pensioners,everyone)
+
+    ## 
+    ##  2-sample test for equality of proportions with continuity
+    ##  correction
+    ## 
+    ## data:  pensioners out of everyone
+    ## X-squared = 154.11, df = 1, p-value < 2.2e-16
+    ## alternative hypothesis: two.sided
+    ## 95 percent confidence interval:
+    ##  0.07596463 0.11023789
+    ## sample estimates:
+    ##    prop 1    prop 2 
+    ## 0.2579439 0.1648427
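+
+Since the raw dataset is not included here, the same test can also be
+reproduced from the counts alone. The sketch below rebuilds the
+contingency table by hand from the numbers printed above; the object
+names `counts`, `pensioner.share` and `test` are only illustrative.
+
+    # Counts copied from the contingency table printed above
+    # rows: Slow_Traffic FALSE / TRUE, columns: Pensioner FALSE / TRUE
+    counts <- matrix(c(36706, 1985, 7245, 690), nrow = 2,
+                     dimnames = list(Slow_Traffic = c("FALSE", "TRUE"),
+                                     Pensioner = c("FALSE", "TRUE")))
+
+    # Share of pensioners among the fatalities in each speed zone
+    pensioner.share <- counts[, "TRUE"] / rowSums(counts)
+    round(pensioner.share, 3)
+
+    # Same two-sample test for equality of proportions as above
+    test <- prop.test(counts)
+    test$estimate
+
+Pensioners make up roughly 26% of the fatalities in zones of up to 50
+km per hour, against roughly 16% in faster zones, which matches the
+proportions reported by both calls above.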
+
+Conclusion
+----------
+
+It's possible to conclude older people are over-represented in the
+fatalities in lower speed zones. Further ideas for investigation are
+understanding the impact of the driving age limit on the fatalities -
+the position in the car of the fatalities (driver or passenger) was not
+yet considered in this quick look at the contents of the dataset.
+
+[[quantile-quantile plot|/pics/explore-AU-road-fatalities_files/fatalitiesDistComp-1.png]]
diff --git a/posts/explore-AU-road-fatalities.org b/posts/explore-AU-road-fatalities.org

new file mode 100644 (file)

index 0000000..29dfb0a
--- /dev/null
+++ b/posts/explore-AU-road-fatalities.org
@@ -0,0 +1,142 @@
+#+date: 2017-10-10 16:56:56 +0800
+#+filetags: R analysis
+#+title: Explore Australian road fatalities.
+
+** Road fatalities in Australia
+:PROPERTIES:
+:CUSTOM_ID: road-fatalities-in-australia
+:END:
+Recently inspired to do a little analysis again, I landed on a
+dataset from
+[[https://bitre.gov.au/statistics/safety/fatal_road_crash_database.aspx]],
+which I downloaded on 5 Oct 2017. Having open datasets like this is a
+great example of how governments are moving with the times!
+
+** Trends
+:PROPERTIES:
+:CUSTOM_ID: trends
+:END:
+I started by looking at the trends - what is the approximate number of
+road fatalities a year, and how is it evolving over time? Are there any
+differences noticeable between states? Or by gender?
+
+#+CAPTION: Overall trendline
+#+ATTR_HTML: :class img-fluid :alt Overall trendline
+[[file:assets/explore-AU-road-fatalities_files/fatalitiesTrends-1.png]]
+#+CAPTION: Trendlines by Australian state
+#+ATTR_HTML: :class img-fluid :alt Trendline by Australian state
+[[file:assets/explore-AU-road-fatalities_files/fatalitiesTrends-2.png]]
+#+CAPTION: Trendlines by gender
+#+ATTR_HTML: :class img-fluid :alt Trendlines by gender
+[[file:assets/explore-AU-road-fatalities_files/fatalitiesTrends-3.png]]
+
+** What age group is most at risk in city traffic?
+:PROPERTIES:
+:CUSTOM_ID: what-age-group-is-most-at-risk-in-city-traffic
+:END:
+Next, I wondered if there were any particular ages that were more at
+risk in city traffic. I opted to quickly bin the data to produce a
+histogram.
+
+#+begin_example
+fatalities %>%
+  filter(Year != 2017, Speed_Limit <= 50) %>%
+  ggplot(aes(x=Age))+
+  geom_histogram(binwidth = 5) +
+  labs(title = "Australian road fatalities by age group",
+       y = "Fatalities") +
+  theme_economist()
+
+## Warning: Removed 2 rows containing non-finite values (stat_bin).
+#+end_example
+
+#+CAPTION: histogram
+#+ATTR_HTML: :class img-fluid :alt histogram
+[[file:assets/explore-AU-road-fatalities_files/fatalities.cityTraffic-1.png]]
+
+** Hypothesis
+:PROPERTIES:
+:CUSTOM_ID: hypothesis
+:END:
+Based on the above, I wondered - are people aged 65 and over more likely
+to die in slow traffic areas? To make this a bit easier, I added two
+variables to the dataset - one splitting people into younger than 65 and
+65 or older, and one based on whether the speed limit in the area of the
+crash was at most 50 km per hour (city traffic in Australia) or higher.
+
+#+begin_example
+fatalities.pensioners <- fatalities %>%
+  filter(Speed_Limit <= 110) %>% # less than 2% has this - determine why
+  mutate(Pensioner = if_else(Age >= 65, TRUE, FALSE)) %>%
+  mutate(Slow_Traffic = ifelse(Speed_Limit <= 50, TRUE, FALSE)) %>%
+  filter(!is.na(Pensioner))
+#+end_example
+
+To answer the question, I produce a density plot and a boxplot.
+
+#+CAPTION: densityplot
+#+ATTR_HTML: :class img-fluid :alt densityplot
+[[file:assets/explore-AU-road-fatalities_files/fatalitiesSegmentation-1.png]]
+#+CAPTION: boxplot
+#+ATTR_HTML: :class img-fluid :alt boxplot
+[[file:assets/explore-AU-road-fatalities_files/fatalitiesSegmentation-2.png]]
+
+Some further statistical analysis does confirm the hypothesis!
+
+#+begin_example
+# Build a contingency table and perform prop test
+cont.table <- table(select(fatalities.pensioners, Slow_Traffic, Pensioner))
+cont.table
+
+##             Pensioner
+## Slow_Traffic FALSE  TRUE
+##        FALSE 36706  7245
+##        TRUE   1985   690
+
+prop.test(cont.table)
+
+## 
+##  2-sample test for equality of proportions with continuity
+##  correction
+## 
+## data:  cont.table
+## X-squared = 154.11, df = 1, p-value < 2.2e-16
+## alternative hypothesis: two.sided
+## 95 percent confidence interval:
+##  0.07596463 0.11023789
+## sample estimates:
+##    prop 1    prop 2 
+## 0.8351573 0.7420561
+
+# Alternative approach to using prop test
+pensioners <- c(nrow(filter(fatalities.pensioners, Slow_Traffic == TRUE, Pensioner == TRUE)), nrow(filter(fatalities.pensioners, Slow_Traffic == FALSE, Pensioner == TRUE)))
+everyone <- c(nrow(filter(fatalities.pensioners, Slow_Traffic == TRUE)), nrow(filter(fatalities.pensioners, Slow_Traffic == FALSE)))
+prop.test(pensioners,everyone)
+
+## 
+##  2-sample test for equality of proportions with continuity
+##  correction
+## 
+## data:  pensioners out of everyone
+## X-squared = 154.11, df = 1, p-value < 2.2e-16
+## alternative hypothesis: two.sided
+## 95 percent confidence interval:
+##  0.07596463 0.11023789
+## sample estimates:
+##    prop 1    prop 2 
+## 0.2579439 0.1648427
+#+end_example
+
+** Conclusion
+:PROPERTIES:
+:CUSTOM_ID: conclusion
+:END:
+It's possible to conclude older people are over-represented in the
+fatalities in lower speed zones. Further ideas for investigation are
+understanding the impact of the driving age limit on the fatalities -
+the position in the car of the fatalities (driver or passenger) was not
+yet considered in this quick look at the contents of the dataset.
+
+#+CAPTION: quantile-quantile plot
+#+ATTR_HTML: :class img-fluid :alt quantile-quantile plot
+[[file:assets/explore-AU-road-fatalities_files/fatalitiesDistComp-1.png]]
diff --git a/posts/facet_labels_in_R.mdwn b/posts/facet_labels_in_R.mdwn

new file mode 100644 (file)

index 0000000..8f29a49
--- /dev/null
+++ b/posts/facet_labels_in_R.mdwn
@@ -0,0 +1,23 @@
+[[!meta date="2016-10-05 21:48:11 +0800"]]
+
+[[!tag R graph code]]
+Getting used to the grammar of ggplot2 takes some time, but so far it's not been disappointing. Wanting to split a scatterplot by segment, I used `facet_grid`. That by default shows a label on each subplot, using the values in the variable by which the plot is faceted.
+
+As that often isn't very descriptive in itself, there needs to be a way to re-label these subplots. That way is `as_labeller`, as shown in the example code below.
+
+Example:
+[[!format r """
+ggplot(outputs, aes(x=date_var,y=value_var),alpha=0.8) +
+  geom_point(aes(y=value_var, colour=colour_var)) +
+  geom_smooth() +
+  theme(legend.position = "none",axis.text.y = element_blank(),axis.text.x = element_blank()) +
+  scale_x_date(date_breaks = '1 week') +
+  labs(y = "Value",x = "Date", title = "Example") +
+  scale_colour_manual("Legend",values=named_coloring_vector) +
+  scale_fill_manual("",values=c("grey12")) +
+  facet_grid(. ~ Segment, labeller = as_labeller(c("yes" = "Segment A",
+                                                          "no" = "Segment B")))
+"""]]
+
+Output:
+[[!img pics/2016-10-05_R-facet.png size="200x200" alt="Example plot with 2 facets labelled Segment B and Segment A"]]
diff --git a/posts/facet_labels_in_R.org b/posts/facet_labels_in_R.org

new file mode 100644 (file)

index 0000000..f096158
--- /dev/null
+++ b/posts/facet_labels_in_R.org
@@ -0,0 +1,33 @@
+#+date: 2016-10-05 21:48:11 +0800
+#+title: Facet labels in R
+#+filetags: R graph code
+
+Getting used to the grammar of ggplot2 takes some time, but so far it's not been disappointing. Wanting to split a
+scatterplot by segment, I used =facet_grid=. That by default shows a
+label on each subplot, using the values in the variable by which the
+plot is faceted.
+
+As that often isn't very descriptive in itself, there needs to be a way
+to re-label these subplots. That way is =as_labeller=, as shown in the
+example code below.
+
+Example:
+#+BEGIN_SRC R
+ggplot(outputs, aes(x = date_var,y = value_var), alpha = 0.8) + 
+  geom_point(aes(y = value_var, colour = colour_var)) + 
+  geom_smooth() + 
+  theme(legend.position = "none",
+        axis.text.y = element_blank(),
+        axis.text.x = element_blank()) +
+  scale_x_date(date_breaks = '1 week') +
+  labs(y = "Value", x = "Date",
+       title = "Example") +
+  scale_colour_manual("Legend", values = named_coloring_vector) +
+  scale_fill_manual("", values = c("grey12")) +
+  facet_grid(. ~ Segment,
+             labeller = as_labeller(c("yes" = "Segment A", "no" = "Segment B")))
+#+END_SRC
+
+Output: 
+#+CAPTION: Example plot with 2 facets labelled Segment B and Segment A.
+#+ATTR_HTML: :class img-fluid :alt Example plot with 2 facets labelled Segment B and Segment A.
+[[file:assets/2016-10-05_R-facet.png]]
diff --git a/posts/fertile-summers.mdwn b/posts/fertile-summers.mdwn

new file mode 100644 (file)

index 0000000..fb217be
--- /dev/null
+++ b/posts/fertile-summers.mdwn
@@ -0,0 +1,9 @@
+[[!meta date="2017-10-26 13:06:27 +0800"]]
+[[!tag R analysis visualisation births Australia]]
+In the Northern hemisphere, it's commonly said women prefer to give birth around summer. It would appear this does not hold for Australia. The graph below actually suggests most babies are conceived over the summer months (December to February) down under!
+
+[[seasonal subseries plot Australian births by month 1996-2014|/pics/au-births-seasonal-subseries-plot.png]]
+
+In preparing the graph above (a "seasonal subseries plot"), I could not help but notice the spike in the numbers for each month around 2005. It turns out that was real - Australia did experience a temporary increase in its fertility rate. Whether that was thanks to government policy (baby bonus, tax subsidies) or other causes is not known.
+
+[Full R code is on my git server.](http://git.vanrenterghem.biz/?p=R/project-au-births.git;a=summary) Check it out - there are a few more plots in there already. I might write about these later.
diff --git a/posts/fertile-summers.org b/posts/fertile-summers.org

new file mode 100644 (file)

index 0000000..223ebeb
--- /dev/null
+++ b/posts/fertile-summers.org
@@ -0,0 +1,22 @@
+#+date: 2017-10-26 13:06:27 +0800
+#+filetags: R analysis visualisation births Australia
+#+title: Fertile summers.
+
+In the Northern hemisphere, it's commonly said women prefer to give birth around summer. It would appear
+this does not hold for Australia. The graph below actually suggests most
+babies are conceived over the summer months (December to February) down
+under!
+
+#+CAPTION: seasonal subseries plot Australian births by month 1996-2014
+#+ATTR_HTML: :class img-fluid :alt seasonal subseries plot Australian births by month 1996-2014
+[[file:assets/au-births-seasonal-subseries-plot.png]]
+
+In preparing the graph above (a "seasonal subseries plot"), I could not
+help but notice the spike in the numbers for each month around 2005. It
+turns out that was real - Australia did experience a temporary increase
+in its fertility rate. Whether that was thanks to government policy
+(baby bonus, tax subsidies) or other causes is not known.
+
+[[http://git.vanrenterghem.biz/?p=R/project-au-births.git;a=summary][Full
+R code is on my git server.]] Check it out - there are a few more plots
+in there already. I might write about these later.
diff --git a/posts/first_r_package.mdwn b/posts/first_r_package.mdwn

new file mode 100644 (file)

index 0000000..4665e44
--- /dev/null
+++ b/posts/first_r_package.mdwn
@@ -0,0 +1,15 @@
+[[!meta date="2016-05-18 21:09:02 +0800"]]
+
+Continuing a [long tradition](https://web.archive.org/web/20041111125713/http://vanrenterghem.biz/News/index.php?begin=25) of announcing *firsts*, I wrote an [R package](http://git.vanrenterghem.biz/?p=R/operatingdays.git;a=summary) recently, and made it available on [Projects](http://git.vanrenterghem.biz/), a new section of the website. (Talking about my website on said website is also not exactly new.)
+
+It still needs further work, as it really only supports AU public holidays for now, but it's exciting to be able to use freely available public data to make analysis a bit easier. The ability to know how many working days are left in the month is fundamental in forecasting a month-end result. Extensions of the package could include simple checks like whether today's date is a working day, or more complex ones like the number of working days between given dates.
+
+In other news, I received [a new gadget](http://www.rtl-sdr.com) in the mail today.
+
+> RTL-SDR is a very cheap software defined radio that 
+> uses a DVB-T TV tuner dongle based on the RTL2832U 
+> chipset. With the combined efforts of Antti Palosaari,
+> Eric Fry and Osmocom it was found that the signal 
+> I/Q data could be accessed directly, which allowed the 
+> DVB-T TV tuner to be converted into a wideband software 
+> defined radio via a new software driver.
diff --git a/posts/first_r_package.org b/posts/first_r_package.org

new file mode 100644 (file)

index 0000000..345e336
--- /dev/null
+++ b/posts/first_r_package.org
@@ -0,0 +1,28 @@
+#+date: 2016-05-18 21:09:02 +0800
+#+title: First R package.
+#+filetags: R code
+
+Continuing a [[https://web.archive.org/web/20041111125713/http://vanrenterghem.biz/News/index.php?begin=25][long tradition]] of announcing /firsts/, I wrote an
+[[http://git.vanrenterghem.biz/?p=R/operatingdays.git;a=summary][R package]] recently, and made it available on
+[[http://git.vanrenterghem.biz/][Projects]], a new section of the website. (Talking about my website on said website is also not exactly new.)
+
+It still needs further work, as it really only supports AU public
+holidays for now, but it's exciting to be able to use freely available
+public data to make analysis a bit easier. The ability to know how many
+working days are left in the month is fundamental in forecasting a
+month-end result. Extensions of the package could include simple checks
+like whether today's date is a working day, or more complex ones like the
+number of working days between given dates for instance.
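+
+As a rough sketch of what counting the remaining working days could
+look like (this is not the package's actual API - =working_days_left=
+and its =holidays= argument are hypothetical names, and only weekends
+plus a user-supplied vector of holidays are excluded):
+
+#+BEGIN_SRC R
+# Hypothetical helper, for illustration only.
+# Counts weekdays from `from` (inclusive) to the end of that month,
+# skipping any dates listed in `holidays`.
+working_days_left <- function(from, holidays = as.Date(character())) {
+  from <- as.Date(from)
+  # Last day of the month: first day of the next month, minus one day.
+  first_of_month <- as.Date(format(from, "%Y-%m-01"))
+  month_end <- seq(first_of_month, by = "month", length.out = 2)[2] - 1
+  days <- seq(from, month_end, by = "day")
+  # "%u" gives the ISO weekday number: 6 is Saturday, 7 is Sunday.
+  is_weekend <- format(days, "%u") %in% c("6", "7")
+  is_holiday <- format(days) %in% format(as.Date(holidays))
+  sum(!is_weekend & !is_holiday)
+}
+
+working_days_left("2016-05-18")
+working_days_left("2016-05-18", holidays = as.Date("2016-05-30"))
+#+END_SRC
+
+Passing the holidays in as a plain vector keeps the sketch independent
+of any particular source of public holiday data.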
+
+In other news, I received [[http://www.rtl-sdr.com][a new gadget]] in
+the mail today.
+
+#+begin_quote
+RTL-SDR is a very cheap software defined radio that uses a DVB-T TV
+tuner dongle based on the RTL2832U chipset. With the combined efforts of
+Antti Palosaari, Eric Fry and Osmocom it was found that the signal I/Q
+data could be accessed directly, which allowed the DVB-T TV tuner to be
+converted into a wideband software defined radio via a new software
+driver.
+
+#+end_quote
diff --git a/posts/fun_with_RJDBC_and_RODBC.mdwn b/posts/fun_with_RJDBC_and_RODBC.mdwn

new file mode 100644 (file)

index 0000000..65e3005
--- /dev/null
+++ b/posts/fun_with_RJDBC_and_RODBC.mdwn
@@ -0,0 +1,17 @@
+[[!meta date="2016-06-24 14:17:16 +0800"]]
+
+I have learned the hard way it is important to be aware that
+
+> Type-handling is a rather complex issue, especially with JDBC as different databases support different data types. RJDBC attempts to simplify this issue by internally converting all data types to either character or numeric values. 
+
+[Source](https://www.rforge.net/RJDBC/)
+
+This matters because RODBC does not have the same behaviour.
+
+When switching a few R scripts over from using RJDBC to access a MS SQL Server database to RODBC, I ran into some odd problems.
+
+First, I noticed `as.Date(query.output$datecolumn)` resulted in what looked like 2016-06-21 becoming 2016-06-22. That's right, R started adding a day to the date. `as.Date(strptime(query.output$datecolumn, "%Y-%m-%d"))` put a stop to that madness.
+
+Another problem had to do with an XML value being returned by a query. The application generating that XML for some reason opts to not store it as an XML data type but instead uses a varchar. That makes it very hard to use XQuery, so I had opted to do the hard work in R by taking the whole XML value into R - despite this making the retrieval of query results almost impossible. In order to convert that column to an XML data type in R, I was able to do `sapply(response.xml$response, xmlParse)` on the output of a SQL query using RJDBC. Once the output from the RODBC connection had to be processed, this needed to become `sapply(response.xml$response, xmlParse, asText = TRUE)`. It is interesting this wasn't needed for the RJDBC output.
+
+So yes, type-handling is a rather complex issue.
diff --git a/posts/fun_with_RJDBC_and_RODBC.org b/posts/fun_with_RJDBC_and_RODBC.org

new file mode 100644 (file)

index 0000000..a028eb7
--- /dev/null
+++ b/posts/fun_with_RJDBC_and_RODBC.org
@@ -0,0 +1,41 @@
+#+date: 2016-06-24 14:17:16 +0800
+#+title: Fun with RJDBC and RODBC.
+#+filetags: R code
+
+I have learned the hard way it is important to be aware that
+
+#+begin_quote
+Type-handling is a rather complex issue, especially with JDBC as
+different databases support different data types. RJDBC attempts to
+simplify this issue by internally converting all data types to either
+character or numeric values.
+
+#+end_quote
+
+[[https://www.rforge.net/RJDBC/][Source]]
+
+This matters because RODBC does not have the same behaviour.
+
+When switching a few R scripts over from using RJDBC to access a MS SQL
+Server database to RODBC, I ran into some odd problems.
+
+First, I noticed =as.Date(query.output$datecolumn)= resulted in what
+looked like 2016-06-21 becoming 2016-06-22. That's right, R started
+adding a day to the date.
+=as.Date(strptime(query.output$datecolumn, "%Y-%m-%d"))= put a stop to
+that madness.
+
+Another problem had to do with an XML value being returned by a query.
+The application generating that XML for some reason opts to not store it
+as an XML data type but instead uses a varchar. That makes it very
+hard to use XQuery, so I had opted to do the hard work in R by taking
+the whole XML value into R - despite this making the retrieval of query
+results almost impossible. In order to convert that column to an XML
+data type in R, I was able to do
+=sapply(response.xml$response, xmlParse)= on the output of a SQL query
+using RJDBC. Once the output from the RODBC connection had to be
+processed, this needed to become
+=sapply(response.xml$response, xmlParse, asText = TRUE)=. It is
+interesting this wasn't needed for the RJDBC output.
+
+So yes, type-handling is a rather complex issue.
diff --git a/posts/generating_album_art_on_N9.mdwn b/posts/generating_album_art_on_N9.mdwn

new file mode 100644 (file)

index 0000000..2663c4d
--- /dev/null
+++ b/posts/generating_album_art_on_N9.mdwn
@@ -0,0 +1,17 @@
+[[!meta date="2016-10-03 21:33:50 +0800"]]
+
+For unknown reasons, the Music application on my Nokia N9 does not always display the album cover where expected. Instead, it displays the artist name and album title. Reports by other users of this phone suggest this isn't an uncommon issue, but unfortunately offer no *confirmed* insight into the root cause of the problem.
+
+Fortunately, the symptoms of this problem are relatively easy to fix on a one-by-one basis.
+
+In `~/.cache/media-art` on the phone, copy the album art (in a JPEG file) to a file named using the following format:
+
+```
+album-$(echo -n "artist name" | md5sum | cut -d ' ' -f 1)-$(echo -n "album name" | md5sum | cut -d ' ' -f 1).jpeg 
+```
+
+Replace `artist name` and `album name` with the appropriate values for the album, in lowercase.
+
+This follows the [Media Art Storage Spec](https://wiki.gnome.org/action/show/DraftSpecs/MediaArtStorageSpec?action=show&redirect=MediaArtStorageSpec).
+
+Luckily, in most cases the above is not necessary and it suffices to store the cover picture as `cover.jpg` in the album's directory in `~/MyDocs/Music`.
diff --git a/posts/generating_album_art_on_N9.org b/posts/generating_album_art_on_N9.org

new file mode 100644 (file)

index 0000000..151f6bd
--- /dev/null
+++ b/posts/generating_album_art_on_N9.org
@@ -0,0 +1,29 @@
+#+date: 2016-10-03 21:33:50 +0800
+#+title: Generating album art on N9.
+
+For unknown reasons, the Music application on my Nokia N9 does not
+always display the album cover where expected. Instead, it displays the
+artist name and album title. Reports by other users of this phone
+suggest this isn't an uncommon issue, but unfortunately offer no
+/confirmed/ insight into the root cause of the problem.
+
+Fortunately, the symptoms of this problem are relatively easy to fix on
+a one-by-one basis.
+
+In =~/.cache/media-art= on the phone, copy the album art (in a JPEG
+file) to a file named using the following format:
+
+#+begin_example
+album-$(echo -n "artist name" | md5sum | cut -d ' ' -f 1)-$(echo -n "album name" | md5sum | cut -d ' ' -f 1).jpeg 
+#+end_example
+
+Replace =artist name= and =album name= with the appropriate values for
+the album, in lowercase.
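+
+The same file name can also be built from R instead of the shell. The
+following is a minimal sketch using only base R; =md5_of_string= and
+=media_art_name= are names made up for this illustration, and the
+string is written to a temporary file because =tools::md5sum= hashes
+files rather than strings.
+
+#+BEGIN_SRC R
+# Hypothetical helpers, for illustration only.
+md5_of_string <- function(x) {
+  tmp <- tempfile()
+  on.exit(unlink(tmp))
+  # Write the raw bytes without a trailing newline, like `echo -n`.
+  writeBin(charToRaw(x), tmp)
+  unname(tools::md5sum(tmp))
+}
+
+media_art_name <- function(artist, album) {
+  # The names are lowercased before hashing, as described above.
+  paste0("album-", md5_of_string(tolower(artist)),
+         "-", md5_of_string(tolower(album)), ".jpeg")
+}
+
+media_art_name("Artist Name", "Album Name")
+#+END_SRC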
+
+This follows the
+[[https://wiki.gnome.org/action/show/DraftSpecs/MediaArtStorageSpec?action=show&redirect=MediaArtStorageSpec][Media
+Art Storage Spec]].
+
+Luckily, in most cases the above is not necessary and it suffices to
+store the cover picture as =cover.jpg= in the album's directory in
+=~/MyDocs/Music=.
diff --git a/posts/house_price_evolution.mdwn b/posts/house_price_evolution.mdwn

new file mode 100644 (file)

index 0000000..33ada7b
--- /dev/null
+++ b/posts/house_price_evolution.mdwn
@@ -0,0 +1,20 @@
+[[!meta date="2023-09-02 16:01:16+08:00"]]
+[[!opengraph2 ogimage="https://www.vanrenterghem.biz/blog/pics/house-price-evolution-table.png"]]
+[[!tag economics Australia Belgium]]
+
+Back in May 2022, [[I made a bet|monetary policy and mortgage products]] that Australian house prices would decline relative to Belgian ones, and that the Australian cash rate wouldn't grow as high as the Euro-zone one. On that day, the RBA had lifted the Australian cash rate from the historical low of 0.10% to 0.35%. Today, that rate stands at 4.10%, with the latest increase in a series of 12 having happened at the start of June 2023 - an increase of 4%pt.
+
+The European Central Bank in the meantime has raised its main refinancing operations rate 9 times since July 2022 from 0% to 4.25% at the start of August 2023, an increase of 4.25%pt.
+
+Both the Australian and the Belgian government have statistical offices publishing median house prices. The way these are tracked varies slightly between the countries. In Belgium, Statbel publishes median prices for 3 types of residential dwellings every quarter, and the corresponding number of transactions that happened for each of these. The Australian Bureau of Statistics on the other hand publishes a median residential dwelling price every month, which is based on a stratification by dwelling type taken from the census, which happens every 5 years.
+
+[[!img /pics/house-price-evolution-table.png alt="Table with evolution of house prices" class="img-fluid"]]
+
+Of course, the AUD/EUR exchange rate needs to be taken into account as well. I've adjusted the prices using the weighted average monthly exchange rate. This way, we can compare the price evolution in a way that takes into account the evolving difference in purchasing power between the currencies of the 2 nations.
+
+[[!img /pics/house-price-evolution-plot.png alt="Plot with evolution of house prices in EUR" class="img-fluid"]]
+
+Comparing the first 3 months of 2021 to the first 3 months of 2023, the relative price of an Australian residential dwelling has gone to 92% of what it was when compared to its Belgian equivalent. If the starting point is Q1 2022, just before the rates started going up, the difference is an even starker 16% relative decline in price!
+
+So far, both bets seem to have been correct - house prices in Australia have significantly gone down relative to the Belgian ones since the interest rate hikes started, and the cash rate in Europe, which started slightly lower than the one in Australia, has already surpassed it.
+
diff --git a/posts/house_price_evolution.org b/posts/house_price_evolution.org

new file mode 100644 (file)

index 0000000..5d2b0a0
--- /dev/null
+++ b/posts/house_price_evolution.org
@@ -0,0 +1,51 @@
+#+date: 2023-09-02 16:01:16+08:00
+#+opengraph2: ogimage="https://www.vanrenterghem.biz/blog/pics/house-price-evolution-table.png"
+#+filetags: economics Australia Belgium
+#+title: House price evolution.
+
+Back in May 2022, [[file:monetary policy and mortgage products][I made a bet]]
+that Australian house prices would decline relative to Belgian ones, and that the
+Australian cash rate wouldn't grow as high as the Euro-zone one. On that
+day, the RBA had lifted the Australian cash rate from the historical low
+of 0.10% to 0.35%. Today, that rate stands at 4.10%, with the latest
+increase in a series of 12 having happened at the start of June 2023 -
+an increase of 4%pt.
+
+The European Central Bank in the meantime has raised its main
+refinancing operations rate 9 times since July 2022 from 0% to 4.25% at
+the start of August 2023, an increase of 4.25%pt.
+
+Both the Australian and the Belgian government have statistical offices
+publishing median house prices. The way these are tracked varies
+slightly between the countries. In Belgium, Statbel publishes median
+prices for 3 types of residential dwellings every quarter, and the
+corresponding number of transactions that happened for each of these.
+The Australian Bureau of Statistics on the other hand publishes a median
+residential dwelling price every month, which is based on a
+stratification by dwelling type taken from the census, which happens
+every 5 years.
+
+#+CAPTION: Table with evolution of house prices
+#+ATTR_HTML: :class img-fluid :alt Table with evolution of house prices
+[[file:assets/house-price-evolution-table.png]]
+
+Of course, the AUD/EUR exchange rate needs to be taken into account as
+well. I've adjusted the prices using the weighted average monthly
+exchange rate. This way, we can compare the price evolution in a way
+that takes into account the evolving difference in purchasing power
+between the currencies of the 2 nations.
+
+#+CAPTION: Plot with evolution of house prices in EUR
+#+ATTR_HTML: :class img-fluid :alt Plot with evolution of house prices in EUR
+[[file:assets/house-price-evolution-plot.png]]
+
+Comparing the first 3 months of 2021 to the first 3 months of 2023, the
+relative price of an Australian residential dwelling has gone to 92% of
+what it was when compared to its Belgian equivalent. If the starting
+point is Q1 2022, just before the rates started going up, the difference
+is an even starker 16% relative decline in price!
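+
+The relative change can be worked out as in the sketch below. The
+figures are invented for illustration and are not the published
+medians or exchange rates; the column names are assumptions as well.
+
+#+BEGIN_SRC R
+# Hypothetical inputs - NOT the published medians or exchange rates.
+prices <- data.frame(
+  quarter     = c("2021Q1", "2023Q1"),
+  au_aud      = c(700000, 750000),  # Australian median price in AUD
+  eur_per_aud = c(0.64, 0.63),      # average exchange rate in the quarter
+  be_eur      = c(250000, 290000)   # Belgian median price in EUR
+)
+
+# Express the Australian price in EUR, then as a ratio to the Belgian price.
+prices$au_eur   <- prices$au_aud * prices$eur_per_aud
+prices$au_to_be <- prices$au_eur / prices$be_eur
+
+# Below 1 means the Australian dwelling became relatively cheaper
+# between the first and the last quarter.
+relative_change <- prices$au_to_be[2] / prices$au_to_be[1]
+relative_change
+#+END_SRC
+
+Dividing the two ratios rather than comparing raw price changes folds
+the exchange rate movement and the Belgian price evolution into a
+single number.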
+
+So far, both bets seem to have been correct - house prices in Australia
+have significantly gone down relative to the Belgian ones since the
+interest rate hikes started, and the cash rate in Europe, which started
+slightly lower than the one in Australia, has already surpassed it.
diff --git a/posts/loans_and_semi_Markov_chains.mdwn b/posts/loans_and_semi_Markov_chains.mdwn

new file mode 100644 (file)

index 0000000..2652618
--- /dev/null
+++ b/posts/loans_and_semi_Markov_chains.mdwn
@@ -0,0 +1,21 @@
+[[!meta date="2017-02-17 20:07:10 +0800"]]
+[[!tag finance fintech analysis forecasting Markov]]
+Using Markov chains' transition matrices to model the movement of loans from being opened (in a state of "Current") to getting closed can misinform the user at times.
+
+To illustrate the challenge, the graph below plots the evolution, from the original state to the final state, of a group of loans over 6 periods of time.
+
+[[!img pics/rollRateBeware.png size="400x247" alt="Actual vs predicted loan vintage performance."]]
+
+The solid lines are the result of applying an average transition matrix 6 times (the model's predicted outcome). The dashed lines are the actual observed results for a set of loans.
+
+As can be seen, the model does not do a very good job at predicting the accounts that will end up in state "Closed" in each period. They end up in a different state between Current and Closed (i.e. overdue) at a higher than expected rate. Why is that?
+
+The prediction was built using an average of the transition matrix of a number of consecutive period statetables for a book of loans. That book was not homogeneous though. Most obviously, the "Current" accounts were not of the same vintage - some had been in that state for a number of periods before already. The observed set of loans all originated in the same period. Other differences can be related to client demographics, loan characteristics or macro-economic circumstances.
+
+Applying a transition matrix based on a group of loans of various vintages to a group of loans that all were new entrants in the book violates the often implied Markov chain assumption of time-homogeneity.
+
+What that assumption says is that the transition probabilities are the same in every period, regardless of how long a loan has already been on the book. (The Markov property itself says that the future state depends only on the current state, not on earlier ones.)
+
+Loans typically have a varying chance of becoming delinquent depending on how long they have been open already.
+
+Multi-order Markov chains are those that depend on a number (the order) of states in the past. The question becomes - what order is the Markov chain? Otherwise put, how many previous periods need to be taken into account to be able to accurately estimate the next period's statetable? Controlling for the other differences suggested above, if found to be material, may be important as well.
diff --git a/posts/loans_and_semi_Markov_chains.org b/posts/loans_and_semi_Markov_chains.org

new file mode 100644 (file)

index 0000000..08d801d
--- /dev/null
+++ b/posts/loans_and_semi_Markov_chains.org
@@ -0,0 +1,48 @@
+#+date: 2017-02-17 20:07:10 +0800
+#+filetags: finance fintech analysis forecasting Markov
+#+title: Loans and semi-Markov chains
+
+Using Markov chains' transition matrices to model the movement of loans from being opened (in a state of
+"Current") to getting closed can misinform the user at times.
+
+To illustrate the challenge, the graph below plots the evolution, from
+the original state to the final state, of a group of loans over 6
+periods of time.
+
+#+CAPTION: Actual vs predicted loan vintage performance.
+#+ATTR_HTML: :width 400 :class img-fluid :alt Actual vs predicted loan vintage performance.
+[[file:assets/rollRateBeware.png]]
+
+The solid lines are the result of applying an average transition matrix
+6 times (the model's predicted outcome). The dashed lines are the actual
+observed results for a set of loans.
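+
+The predicted lines can be reproduced along the lines of the sketch
+below. The three states and every probability in the matrix are
+invented for illustration; the real model had its own states and an
+averaged matrix estimated from the book.
+
+#+BEGIN_SRC R
+# Hypothetical transition matrix (rows: from-state, columns: to-state).
+states <- c("Current", "Overdue", "Closed")
+P <- matrix(c(0.90, 0.06, 0.04,
+              0.30, 0.55, 0.15,
+              0.00, 0.00, 1.00),
+            nrow = 3, byrow = TRUE,
+            dimnames = list(states, states))
+
+# A new vintage: every loan starts in "Current".
+state <- c(1, 0, 0)
+
+# Apply the same matrix once per period and keep each period's distribution.
+predicted <- matrix(NA_real_, nrow = 6, ncol = 3,
+                    dimnames = list(paste0("period_", 1:6), states))
+for (period in 1:6) {
+  state <- as.vector(state %*% P)
+  predicted[period, ] <- state
+}
+
+round(predicted, 3)
+#+END_SRC
+
+Because the same matrix is reused in every period, the sketch bakes in
+the assumption that a loan's age has no effect on its transition
+probabilities.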
+
+As can be seen, the model does not do a very good job at predicting the
+accounts that will end up in state "Closed" in each period. They end up
+in a different state between Current and Closed (i.e. overdue) at a
+higher than expected rate. Why is that?
+
+The prediction was built using an average of the transition matrix of a
+number of consecutive period statetables for a book of loans. That book
+was not homogeneous though. Most obviously, the "Current" accounts were
+not of the same vintage - some had been in that state for a number of
+periods before already. The observed set of loans all originated in the
+same period. Other differences can be related to client demographics,
+loan characteristics or macro-economic circumstances.
+
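+Applying a transition matrix based on a group of loans of various
+vintages to a group of loans that all were new entrants in the book
+violates the often implied Markov chain assumption of time-homogeneity.
+
+What that assumption says is that the transition probabilities are the
+same in every period, regardless of how long a loan has already been on
+the book. (The Markov property itself says that the future state depends
+only on the current state, not on earlier ones.)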
+
+Loans typically have a varying chance of becoming delinquent depending
+on how long they have been open already.
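+
+One way to check for this dependence is to estimate a separate
+transition matrix per loan age and compare them. The sketch below does
+so on a tiny made-up data set; the =age=, =from= and =to= columns are
+assumed names, not fields of the actual loan book.
+
+#+BEGIN_SRC R
+# Hypothetical observations: one row per loan per period.
+transitions <- data.frame(
+  age  = c(1, 1, 1, 1, 2, 2, 2, 2),
+  from = rep("Current", 8),
+  to   = c("Current", "Current", "Current", "Overdue",
+           "Current", "Current", "Overdue", "Overdue")
+)
+
+# Row-normalised transition table for each loan age.
+by_age <- lapply(split(transitions, transitions$age), function(d) {
+  prop.table(table(d$from, d$to), margin = 1)
+})
+
+by_age
+#+END_SRC
+
+If the matrices differ materially between ages, a single averaged
+matrix will misstate the outcome for a group of loans that all share
+the same age.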
+
+Multi-order Markov chains are those that depend on a number (the order)
+of states in the past. The question becomes - what order is the Markov
+chain? Otherwise put, how many previous periods need to be taken into
+account to be able to accurately estimate the next period's statetable?
+Controlling for the other differences suggested above, if found to be
+material, may be important as well.
diff --git a/posts/managing_data_science_work.mdwn b/posts/managing_data_science_work.mdwn

new file mode 100644 (file)

index 0000000..9e7bea2
--- /dev/null
+++ b/posts/managing_data_science_work.mdwn
@@ -0,0 +1,14 @@
+[[!meta date="2018-06-19 20:51:36 +0800"]]
+[[!tag leadership management analytics military]]
+
+It appears to me the cross-industry standard process for data mining (CRISP-DM) is still, almost a quarter century after first having been formulated, a valuable framework to guide management of a data science team. Start by building business understanding, then understand the data and prepare it, model to solve the problem, evaluate the model, and end by deploying it. The framework is iterative, and allows for back-and-forth between these steps based on what's learned in the later steps.
+
+[[CRISP-DM|/pics/crisp-dm-diagram.png]]
+
+It doesn't put too great an emphasis on scheduling the activities, but focuses on the value creation.
+
+The Observe-Orient-Decide-Act (OODA) loop from John Boyd seems to be an analogous concept. Competing businesses would then be advised to speed up their cycling through the CRISP-DM loop, as that's how Boyd stated advantage is obtained - by cycling through the OODA loops more quickly than one's opponent. Most interestingly, in both loops it's a common pitfall to skip the last step - deploying the model / acting.
+
+[[OODA loop|/pics/OODA-diagram.png]]
+
+([[Image by Patrick Edwin Moran - Own work, CC BY 3.0|https://commons.wikimedia.org/w/index.php?curid=3904554]])
diff --git a/posts/managing_data_science_work.org b/posts/managing_data_science_work.org

new file mode 100644 (file)

index 0000000..c6fd2f1
--- /dev/null
+++ b/posts/managing_data_science_work.org
@@ -0,0 +1,32 @@
+#+date: 2018-06-19 20:51:36 +0800
+#+filetags: leadership management analytics military
+#+title: Managing data science work
+
+It appears to me the cross-industry standard process for data mining
+(CRISP-DM) is still, almost a quarter century after first having been
+formulated, a valuable framework to guide management of a data science
+team. Start with building business understanding, followed by
+understanding the data, preparing it, moving from modeling to solve the
+problem over to evaluating the model and ending by deploying it. The
+framework is iterative, and allows for back-and-forth between these
+steps based on what's learned in the later steps.
+
+#+CAPTION: CRISP-DM
+#+ATTR_HTML: :class img-fluid :alt CRISP-DM
+[[file:assets/crisp-dm-diagram.png]]
+
+It doesn't put too great an emphasis on scheduling the activities, but
+focuses on the value creation.
+
+The Observe-Orient-Decide-Act (OODA) loop from John Boyd seems to be an
+analogous concept. Competing businesses would then be advised to speed up
+their cycling through the CRISP-DM loop, as that's how Boyd stated
+advantage is obtained - by cycling through the OODA loops more quickly
+than one's opponent. Most interestingly, in both loops it's a common
+pitfall to skip the last step - deploying the model / acting.
+
+#+CAPTION: OODA loop ([[https://commons.wikimedia.org/w/index.php?curid=3904554][Image by Patrick Edwin Moran - Own work, CC BY 3.0]])
+#+ATTR_HTML: :class img-fluid :alt OODA loop
+[[file:assets/OODA-diagram.png]]
+
+
diff --git a/posts/monetary_policy_and_mortgage_products.mdwn b/posts/monetary_policy_and_mortgage_products.mdwn

new file mode 100644 (file)

index 0000000..3012e28
--- /dev/null
+++ b/posts/monetary_policy_and_mortgage_products.mdwn
@@ -0,0 +1,11 @@
+[[!meta date="2022-05-09 18:48:00 +0800"]]
+[[!tag economics finance law Belgium Australia]]
+
+In its latest [statement on monetary policy](https://www.rba.gov.au/publications/smp/2022/may/pdf/statement-on-monetary-policy-2022-05.pdf) ([Internet Archive](https://web.archive.org/web/20220506111153/https://www.rba.gov.au/publications/smp/2022/may/pdf/statement-on-monetary-policy-2022-05.pdf)), the Reserve Bank of Australia highlighted that households in Australia have much higher private debt than before. Total private debt is approximately [120% of GDP](https://stats.bis.org/statx/srs/table/f3.1) ([Internet Archive](https://web.archive.org/web/20220505031215/https://stats.bis.org/statx/srs/table/f3.1)), roughly double Belgium's. [Analysts hint this will restrict by how much the central bank will be able to raise the cash rate target](https://www.abc.net.au/news/2022-05-02/rba-will-raise-rates-but-not-to-levels-experts-are-predicting/101029534) ([Internet Archive](https://web.archive.org/web/20220509111508/https://www.abc.net.au/news/2022-05-02/rba-will-raise-rates-but-not-to-levels-experts-are-predicting/101029534)) to battle rising inflation in the near future.
+
+Most mortgages in Australia are contracted on a variable rate, or fixed only for the short term. As such, rising interest rates not only affect people taking out new loans - they also affect the amount all other indebted households need to repay. This contrasts with Belgium for instance, where in the first 3 months of 2022 [93.5% of mortgage contracts had a fixed rate over the entire duration of the contract](https://www.febelfin.be/nl/press-room/ook-eerste-trimester-van-2022-record-aan-hypothecaire-kredietverlening) ([Internet Archive](https://web.archive.org/web/20220509115728/https://www.febelfin.be/nl/press-room/ook-eerste-trimester-van-2022-record-aan-hypothecaire-kredietverlening)). Less than 1% of contracts closed in this quarter have a rate varying periodically. This is despite strong consumer protections built into the Code of Economic Law [see art VII.143, § 2 to 6 in the Flemish publication](https://www.ejustice.just.fgov.be/cgi_loi/change_lg.pl?language=nl&la=N&table_name=wet&cn=2016042201) ([Internet Archive](https://web.archive.org/web/20210617021917/https://www.ejustice.just.fgov.be/cgi_loi/change_lg.pl?language=nl&la=N&table_name=wet&cn=2016042201)): few variable-rate contracts vary immediately upon changes in the cash rate as the minimum term between changes is at least a year, and the updated interest rate can never exceed twice the original rate.
+
+In contrast, Australian mortgages are rarely offered with a fixed interest rate period beyond 2 years - 5 years is practically the most I have seen advertised. Banks can vary the rate at will - they are not mandated to link a change to a variation in the cash rate, nor do they need to stick to that variation. Rates can go as high as the banks want them to go.
+
+This is a remarkable difference in the products banks in these countries make available. I wonder to what extent this will influence how high the cash rate will be allowed to rise before politicians will need to step in, if they will at all, and how it may affect the evolution of house prices in the countries. My current bet: Australian house prices will decline relative to Belgian ones, and the cash rate won't grow as much. (At the time of writing, the RBA has a target cash rate of 0.35% while ECB still has a target cash rate of 0.00% though.)
+
diff --git a/posts/monetary_policy_and_mortgage_products.org b/posts/monetary_policy_and_mortgage_products.org

new file mode 100644 (file)

index 0000000..11b7c2e
--- /dev/null
+++ b/posts/monetary_policy_and_mortgage_products.org
@@ -0,0 +1,54 @@
+#+date: 2022-05-09 18:48:00 +0800
+#+filetags: economics finance law Belgium Australia
+#+title: Monetary policy and mortgage products
+
+In its latest
+[[https://www.rba.gov.au/publications/smp/2022/may/pdf/statement-on-monetary-policy-2022-05.pdf][statement
+on monetary policy]]
+([[https://web.archive.org/web/20220506111153/https://www.rba.gov.au/publications/smp/2022/may/pdf/statement-on-monetary-policy-2022-05.pdf][Internet
+Archive]]), the Reserve Bank of Australia highlighted that households in
+Australia have much higher private debt than before. Total private debt
+is approximately [[https://stats.bis.org/statx/srs/table/f3.1][120% of
+GDP]]
+([[https://web.archive.org/web/20220505031215/https://stats.bis.org/statx/srs/table/f3.1][Internet
+Archive]]), roughly double Belgium's.
+[[https://www.abc.net.au/news/2022-05-02/rba-will-raise-rates-but-not-to-levels-experts-are-predicting/101029534][Analysts
+hint this will restrict by how much the central bank will be able to
+raise the cash rate target]]
+([[https://web.archive.org/web/20220509111508/https://www.abc.net.au/news/2022-05-02/rba-will-raise-rates-but-not-to-levels-experts-are-predicting/101029534][Internet
+Archive]]) to battle rising inflation in the near future.
+
+Most mortgages in Australia are contracted on a variable rate, or fixed
+only for the short term. As such, rising interest rates not only affect
+people taking out new loans - they also affect the amount all other
+indebted households need to repay. This contrasts with Belgium for
+instance, where in the first 3 months of 2022
+[[https://www.febelfin.be/nl/press-room/ook-eerste-trimester-van-2022-record-aan-hypothecaire-kredietverlening][93.5%
+of mortgage contracts had a fixed rate over the entire duration of the
+contract]]
+([[https://web.archive.org/web/20220509115728/https://www.febelfin.be/nl/press-room/ook-eerste-trimester-van-2022-record-aan-hypothecaire-kredietverlening][Internet
+Archive]]). Less than 1% of contracts closed in this quarter have a
+rate varying periodically. This is despite strong consumer protections
+built into the Code of Economic Law
+[[https://www.ejustice.just.fgov.be/cgi_loi/change_lg.pl?language=nl&la=N&table_name=wet&cn=2016042201][see
+art VII.143, § 2 to 6 in the Flemish publication]]
+([[https://web.archive.org/web/20210617021917/https://www.ejustice.just.fgov.be/cgi_loi/change_lg.pl?language=nl&la=N&table_name=wet&cn=2016042201][Internet
+Archive]]): few variable-rate contracts vary immediately upon changes in the cash rate
+as the minimum term between changes is at least a year, and the updated
+interest rate can never exceed twice the original rate.
+
+In contrast, Australian mortgages are rarely offered with a fixed
+interest rate period beyond 2 years - 5 years is practically the most I
+have seen advertised. Banks can vary the rate at will - they are not
+mandated to link a change to a variation in the cash rate, nor do they
+need to stick to that variation. Rates can go as high as the banks want
+them to go.
+
+This is a remarkable difference in the products banks in these countries
+make available. I wonder to what extent this will influence how high the
+cash rate will be allowed to rise before politicians will need to step
+in, if they will at all, and how it may affect the evolution of
+house prices in the countries. My current bet: Australian house prices
+will decline relative to Belgian ones, and the cash rate won't grow as
+much. (At the time of writing, the RBA has a target cash rate of 0.35%
+while ECB still has a target cash rate of 0.00% though.)
diff --git a/posts/my_management_style_mission_control.mdwn b/posts/my_management_style_mission_control.mdwn

new file mode 100644 (file)

index 0000000..8d74311
--- /dev/null
+++ b/posts/my_management_style_mission_control.mdwn
@@ -0,0 +1,19 @@
+[[!meta date="2018-02-14 15:38:33 +0800"]]
+[[!tag management leadership]]
+
+I have been asked a few times recently about my management style. First, while applying for a position myself. Next, less expected, by a member of the org I joined as well as by a candidate I interviewed for a position in the team.
+
+My answer was not very concise, as I lacked the framework knowledge to be so.
+
+Today, I believe I have stumbled on a description of the style I practice (or certainly aim to) most often on [[Adam Drake's blog|https://aadrake.com/]]. Its name? [[Mission Command|http://usacac.army.mil/sites/default/files/misc/doctrine/CDG/adp6_0.html]]. (The key alternative being detailed command.)
+
+Now this is an interesting revelation for more than one reason. I consider it a positive thing I can now more clearly articulate how I naturally tend to work as a team leader. It now becomes clear too what is important to me, by reviewing the key principles:
+
+  * Build cohesive teams through mutual trust.
+  * Create shared understanding.
+  * Provide a clear commander’s intent.
+  * Exercise disciplined initiative.
+  * Use mission orders.
+  * Accept prudent risk.
+  
+Reviewing these principles in detail, this style of leadership should not be mistaken for *laissez-faire*. Providing clear commander's intent, creating shared understanding, using mission orders are very active principles for the leader. For the subordinate, the need to exercise *disciplined* initiative is clearly also not a free-for-all. The need for mutual trust for this to work cannot be emphasised enough.
diff --git a/posts/my_management_style_mission_control.org b/posts/my_management_style_mission_control.org

new file mode 100644 (file)

index 0000000..5335e56
--- /dev/null
+++ b/posts/my_management_style_mission_control.org
@@ -0,0 +1,36 @@
+#+date: 2018-02-14 15:38:33 +0800
+#+filetags: management leadership
+#+title: My management style - mission control.
+
+I have been asked a few times recently about my management style. First,
+while applying for a position myself. Next, less expected, by a member
+of the org I joined as well as by a candidate I interviewed for a
+position in the team.
+
+My answer was not very concise, as I lacked the framework knowledge to
+be so.
+
+Today, I believe I have stumbled on a description of the style I
+practice (or certainly aim to) most often on
+[[https://aadrake.com/][Adam Drake's blog]]. Its name?
+[[http://usacac.army.mil/sites/default/files/misc/doctrine/CDG/adp6_0.html][Mission Command]].
+(The key alternative being detailed command.)
+
+Now this is an interesting revelation for more than one reason. I
+consider it a positive thing I can now more clearly articulate how I
+naturally tend to work as a team leader. It now becomes clear too what
+is important to me, by reviewing the key principles:
+
+- Build cohesive teams through mutual trust.
+- Create shared understanding.
+- Provide a clear commander's intent.
+- Exercise disciplined initiative.
+- Use mission orders.
+- Accept prudent risk.
+
+Reviewing these principles in detail, this style of leadership should
+not be mistaken for /laissez-faire/. Providing clear commander's intent,
+creating shared understanding, using mission orders are very active
+principles for the leader. For the subordinate, the need to exercise
+/disciplined/ initiative is clearly also not a free-for-all. The need
+for mutual trust for this to work cannot be emphasised enough.
diff --git a/posts/new_blog_activated.mdwn b/posts/new_blog_activated.mdwn

new file mode 100644 (file)

index 0000000..632a468
--- /dev/null
+++ b/posts/new_blog_activated.mdwn
@@ -0,0 +1,3 @@
+[[!meta date="2016-05-05 21:54:07 +0800"]]
+
+Going back to the early days of this site, I added a blog section in again, replacing the social one, which was overkill. Still need to import the posts done there though.
diff --git a/posts/new_blog_activated.org b/posts/new_blog_activated.org

new file mode 100644 (file)

index 0000000..5fc5304
--- /dev/null
+++ b/posts/new_blog_activated.org
@@ -0,0 +1,6 @@
+#+date: 2016-05-05 21:54:07 +0800
+#+title: New blog activated
+
+Going back to the early days of this site, I added a blog section in
+again, replacing the social one, which was overkill. Still need to
+import the posts done there though.
diff --git a/posts/obnam_multi_client_encrypted_backups.mdwn b/posts/obnam_multi_client_encrypted_backups.mdwn

new file mode 100644 (file)

index 0000000..4698c3f
--- /dev/null
+++ b/posts/obnam_multi_client_encrypted_backups.mdwn
@@ -0,0 +1,31 @@
+[[!meta date="2016-06-09 20:41:06 +0800"]]
+
+Trying to configure [obnam](http://obnam.org) to use one repository for 3 clients using encryption has been a bit of a search.
+
+Initialising the first client was straightforward. I simply set it up to use a gpg key for encryption per the manual. Since that key is only used for encrypting backups from this client, making it not have a passphrase seemed to be a good option.
+
+For the next client, things got a bit trickier. Since the backup repository is now encrypted, that client couldn't access it. The solution I ended up with was to temporarily ensure client 2 has access to client 1's secret key too.
+
+On client 1: `gpg --export-secret-key -a LONG_KEY > client1.private.key`
+
+That file I had to copy to the other client, and import it using:
+
+On client 2: `gpg --import client1.private.key`
+
+Now I could configure this client with its own gpg key and perform an initial backup.
+
+After this, client 1's secret key can be removed again: `gpg --delete-secret-key LONG_KEY` followed by `gpg --delete-key LONG_KEY`.
+
+(Not removing it defeats the purpose of having a specific key per client - the workaround above doesn't seem entirely sensible from that perspective either, as the secret key needs to be shared temporarily.)
+
+The third client should have been easy, but gpg-agent made it a bit more tricky. Obnam failed to run because it couldn't find gpg-agent. Several workarounds have been documented in the past, but they all ended up not working anymore since version 2.1 of gpg-agent. I ended up [^1] having to modify `~/.bashrc` as follows: 
+
+       function gpg-update() {
+               GPG_PID=$(pidof gpg-agent)
+               GPG_AGENT_INFO=${HOME}/.gnupg/S.gpg-agent:$GPG_PID:1
+               export GPG_AGENT_INFO
+       }
+
+       gpg-update
+
+[^1]: Courtesy of [Brian Lane on RedHat's bugtracker](https://bugzilla.redhat.com/show_bug.cgi?id=1221234#c5)
diff --git a/posts/obnam_multi_client_encrypted_backups.org b/posts/obnam_multi_client_encrypted_backups.org

new file mode 100644 (file)

index 0000000..d79a265
--- /dev/null
+++ b/posts/obnam_multi_client_encrypted_backups.org
@@ -0,0 +1,52 @@
+#+date: 2016-06-09 20:41:06 +0800
+#+title: obnam multi client encrypted backups
+
+Trying to configure [[http://obnam.org][obnam]] to use one repository
+for 3 clients using encryption has been a bit of a search.
+
+Initialising the first client was straightforward. I simply set it up to
+use a gpg key for encryption per the manual. Since that key is only used
+for encrypting backups from this client, making it not have a passphrase
+seemed to be a good option.
+
+For the next client, things got a bit trickier. Since the backup
+repository is now encrypted, that client couldn't access it. The
+solution I ended up with was to temporarily ensure client 2 has access
+to client 1's secret key too.
+
+On client 1: =gpg --export-secret-key -a LONG_KEY > client1.private.key=
+
+That file I had to copy to the other client, and import it using:
+
+On client 2: =gpg --import client1.private.key=
+
+Now I could configure this client with its own gpg key and perform an
+initial backup.
+
+After this, client 1's secret key can be removed again:
+=gpg --delete-secret-key LONG_KEY= followed by
+=gpg --delete-key LONG_KEY=.
+
+(Not removing it defeats the purpose of having a specific key per
+client - the workaround above doesn't seem entirely sensible from that
+perspective either, as the secret key needs to be shared temporarily.)
+
+The third client should have been easy, but gpg-agent made it a bit more
+tricky. Obnam failed to run because it couldn't find gpg-agent. Several
+workarounds have been documented in the past, but they all ended up not
+working anymore since version 2.1 of gpg-agent. I ended up [fn:1] having
+to modify =~/.bashrc= as follows:
+
+#+begin_example
+function gpg-update() {
+    GPG_PID=$(pidof gpg-agent)
+    GPG_AGENT_INFO=${HOME}/.gnupg/S.gpg-agent:$GPG_PID:1
+    export GPG_AGENT_INFO
+}
+
+gpg-update
+#+end_example
+
+[fn:1] Courtesy of
+       [[https://bugzilla.redhat.com/show_bug.cgi?id=1221234#c5][Brian
+       Lane on RedHat's bugtracker]]
diff --git a/posts/obtaining_SLIP_data_using_R.mdwn b/posts/obtaining_SLIP_data_using_R.mdwn

new file mode 100644 (file)

index 0000000..9e7bc01
--- /dev/null
+++ b/posts/obtaining_SLIP_data_using_R.mdwn
@@ -0,0 +1,41 @@
+[[!meta date="2023-10-11T23:41:54+08:00"]]
+[[!opengraph2 ogimage="https://www.vanrenterghem.biz/blog/pics/SLIP_WA_schools.png"]]
+[[!tag R spatial analysis visualisation]]
+
+Six years ago, [[I wrote about Simple Features (sf) in R|using spatial features and openstreetmap]]. I mapped the number of pupils per high school in the Perth metro area. At the time, I didn't include how to obtain the shapefile, provided as open data by Landgate on behalf of the Western Australian government through its Shared Location Information Platform ([SLIP](https://data.wa.gov.au/slip)).
+
+I have now updated the script, [available in my code repository](http://git.vanrenterghem.biz/?p=R/project-wa-schools.git;a=summary), with an R implementation of [the methodology in SLIP's How To Guides](https://toolkit.data.wa.gov.au/hc/en-gb/articles/115000962734) ([Archive](https://web.archive.org/web/20230608184007/https://toolkit.data.wa.gov.au/hc/en-gb/articles/115000962734)). 
+
+The relevant code looks as follows, simplified greatly through the use of the [httr2](https://httr2.r-lib.org/) library - the equivalent of the [Requests](https://docs.python-requests.org/en/latest/) library used in the Python example in the SLIP knowledge base:
+
+[[!format r """
+library(httr2)
+tempdirSHP <- tempdir()
+tempfileSHP <- tempfile()
+# Create the token request
+req <- request("https://sso.slip.wa.gov.au/as/token.oauth2") |>
+    req_headers("Authorization" = "Basic ZGlyZWN0LWRvd25sb2Fk") |>
+    req_body_form(grant_type = "password",
+                  # SLIP username and password stored in
+                  # pass - the standard unix password manager
+                  username = system2("pass", args = "slip.wa.gov.au | grep Username | sed -e 's/Username: //'", stdout = TRUE),
+                  password = system2("pass", args = "slip.wa.gov.au | head -1", stdout = TRUE))
+# Obtain the token response
+tokenResponse <- req_perform(req)
+# Define the SLIP file to download
+slipUrl <-  "https://direct-download.slip.wa.gov.au/datadownload/Education/Current_Active_Schools_Sem_1_2022_Public_DET_020_WA_GDA94_Public_Shapefile.zip"
+# Create the request for the SLIP file using the received token
+req <- request(slipUrl) |>
+    req_headers( 'Authorization' = paste0('Bearer ',resp_body_json(tokenResponse)$access_token))
+# Obtain the SLIP file using the created request
+responseSlip <- req_perform(req)
+
+"""]]
+
+An updated plot of the high school enrollment numbers looks as follows (for clarity, I've only included the names of schools in the top 5% as ranked by student numbers):
+
+[[!img /pics/SLIP_WA_schools.png alt="Pupil density in Western Australian high schools" class="img-fluid"]]
+
+
+
+
diff --git a/posts/obtaining_SLIP_data_using_R.org b/posts/obtaining_SLIP_data_using_R.org

new file mode 100644 (file)

index 0000000..afd8261
--- /dev/null
+++ b/posts/obtaining_SLIP_data_using_R.org
@@ -0,0 +1,51 @@
+#+date: 2023-10-11T23:41:54+08:00 
+#+opengraph2: ogimage="https://www.vanrenterghem.biz/blog/pics/SLIP_WA_schools.png"
+#+filetags: R spatial analysis visualisation
+
+Six years ago, [[file:using spatial features and openstreetmap][I wrote about Simple Features (sf) in R]]. I mapped the number of pupils per high
+school in the Perth metro area. At the time, I didn't include how to
+obtain the shapefile, provided as open data by Landgate on behalf of the
+Western Australian government through its Shared Location Information
+Platform ([[https://data.wa.gov.au/slip][SLIP]]).
+
+I have now updated the script,
+[[http://git.vanrenterghem.biz/?p=R/project-wa-schools.git;a=summary][available
+in my code repository]], with an R implementation of
+[[https://toolkit.data.wa.gov.au/hc/en-gb/articles/115000962734][the
+methodology in SLIP's How To Guides]]
+([[https://web.archive.org/web/20230608184007/https://toolkit.data.wa.gov.au/hc/en-gb/articles/115000962734][Archive]]).
+
+The relevant code looks as follows, simplified greatly through the use
+of the [[https://httr2.r-lib.org/][httr2]] library - the equivalent of
+the [[https://docs.python-requests.org/en/latest/][Requests]] library
+used in the Python example in the SLIP knowledge base:
+
+#+BEGIN_SRC R
+library(httr2) 
+tempdirSHP <- tempdir() 
+tempfileSHP <- tempfile()
+# Create the token request 
+req <- request("https://sso.slip.wa.gov.au/as/token.oauth2") |>
+  req_headers("Authorization" = "Basic ZGlyZWN0LWRvd25sb2Fk") |>
+  req_body_form(grant_type = "password", # SLIP username and password stored in 
+                                         # pass - the standard unix password manager
+  username = system2("pass", args = "slip.wa.gov.au | grep Username | sed -e 's/Username: //'", stdout = TRUE), 
+  password = system2("pass", args = "slip.wa.gov.au | head -1", stdout = TRUE)) 
+# Obtain the token response
+tokenResponse <- req_perform(req) 
+# Define the SLIP file to download
+slipUrl <- "https://direct-download.slip.wa.gov.au/datadownload/Education/Current_Active_Schools_Sem_1_2022_Public_DET_020_WA_GDA94_Public_Shapefile.zip"
+# Create the request for the SLIP file using the received token
+req <- request(slipUrl) |>
+  req_headers('Authorization' = paste0('Bearer ', resp_body_json(tokenResponse)$access_token))
+# Obtain the SLIP file using the created request 
+responseSlip <- req_perform(req)
+#+END_SRC
+
+An updated plot of the high school enrollment numbers looks as follows
+(for clarity, I've only included the names of schools in the top 5% as
+ranked by student numbers):
+
+#+CAPTION: Pupil density in Western Australian high schools
+#+ATTR_HTML: :class img-fluid :alt Pupil density in Western Australian high schools
+[[file:assets/SLIP_WA_schools.png]]
diff --git a/posts/on_social_media.mdwn b/posts/on_social_media.mdwn

new file mode 100644 (file)

index 0000000..1c4a715
--- /dev/null
+++ b/posts/on_social_media.mdwn
@@ -0,0 +1,5 @@
+[[!meta date="2018-01-08 21:04:09 +0800"]]
+[[!tag social_media blogging meta_thinking open_web]]
+Dries Buytaert [wrote last week](https://dri.es/more-blogging-and-less-social-media) about intending to use social media less in 2018. As an entrepreneur developing a CMS, he has a vested interest in preventing the world moving to see the internet as being either Facebook, Instagram or Twitter (or reversing that current-state maybe). Still, I believe he is genuinely concerned about the effect of using social media on our thinking. This partly because I share the observation. Despite having been an early adopter, I disabled my Facebook account a year or two ago already. I'm currently in doubt whether I should not do the same with Twitter. I notice it actually is not as good a source of news as classic news sites - headlines simply get repeated numerous times when major events happen, and other news is equally easily noticed browsing a traditional website. Fringe and mainstream thinkers alike in the space of [management](http://tompeters.com/), [R](http://dirk.eddelbuettel.com/blog/) [stats](http://www.brodrigues.co/), [computing hardware](https://www.olimex.com/) etc are a different matter. While, as Dries notes, their micro-messages are typically not well worked out, they do make me aware of what they have blogged about - for those that actually still blog. So is it a matter of trying to increase my Nextcloud newsreader use, maybe during dedicated reading time, and no longer opening the Twitter homepage on my phone at random times throughout the day, and conceding short statements without a more worked out bit of content behind it are not all that useful?
+
+The above focuses on consuming content of others. To foster conversations, which arguably is the intent of social media too, we might need something like [webmentions](https://www.w3.org/TR/webmention/) to pick up steam too.
diff --git a/posts/on_social_media.org b/posts/on_social_media.org

new file mode 100644 (file)

index 0000000..44f8884
--- /dev/null
+++ b/posts/on_social_media.org
@@ -0,0 +1,33 @@
+#+date: 2018-01-08 21:04:09 +0800
+#+filetags: social_media blogging meta_thinking open_web
+#+title: On social media
+
+Dries Buytaert [[https://dri.es/more-blogging-and-less-social-media][wrote last week]]
+about intending to use social media less in 2018. As an entrepreneur
+developing a CMS, he has a vested interest in preventing the world
+moving to see the internet as being either Facebook, Instagram or
+Twitter (or reversing that current-state maybe). Still, I believe he is
+genuinely concerned about the effect of using social media on our
+thinking. This partly because I share the observation. Despite having
+been an early adopter, I disabled my Facebook account a year or two ago
+already. I'm currently in doubt whether I should not do the same with
+Twitter. I notice it actually is not as good a source of news as classic
+news sites - headlines simply get repeated numerous times when major
+events happen, and other news is equally easily noticed browsing a
+traditional website. Fringe and mainstream thinkers alike in the space
+of [[http://tompeters.com/][management]],
+[[http://dirk.eddelbuettel.com/blog/][R]]
+[[http://www.brodrigues.co/][stats]],
+[[https://www.olimex.com/][computing hardware]] etc are a different
+matter. While, as Dries notes, their micro-messages are typically not
+well worked out, they do make me aware of what they have blogged about -
+for those that actually still blog. So is it a matter of trying to
+increase my Nextcloud newsreader use, maybe during dedicated reading
+time, and no longer opening the Twitter homepage on my phone at random
+times throughout the day, and conceding short statements without a more
+worked out bit of content behind it are not all that useful?
+
+The above focuses on consuming content of others. To foster
+conversations, which arguably is the intent of social media too, we
+might need something like
+[[https://www.w3.org/TR/webmention/][webmentions]] to pick up steam too.
diff --git a/posts/sending_pingback_oldskool_indieweb.mdwn b/posts/sending_pingback_oldskool_indieweb.mdwn

new file mode 100644 (file)

index 0000000..5bda94d
--- /dev/null
+++ b/posts/sending_pingback_oldskool_indieweb.mdwn
@@ -0,0 +1,57 @@
+[[!meta title="About sending pingbacks, webmentions and some thoughts on how to improve on them."]]
+[[!tag indieweb blogging open_web]]
+[[!meta date="2023-10-31T20:41:30+08:00"]]
+ 
+In a 'blast from the past', I sent my first [pingback](https://www.hixie.ch/specs/pingback/pingback) after writing the [[previous post|agent based models digital twins]]. A pingback is a way for a blogger to send a message to another blogger, informing them they've written a post that refers to theirs, e.g. as a reply or an extension of the ideas raised.
+
+The process is a bit more involved than using a [webmention](https://www.w3.org/TR/webmention/), which I've used before and [[implemented support for|Implementing webmention on my blog]] a while back, due to requiring an XML message to be created rather than a simple exchange of URLs.
+
+First, I created a file `pingback.xml` containing the URLs of the blog post I wrote and the one I made reference to within my post. The standard defines the schema, resulting in the following:
+
+[[!format xml """
+<?xml version="1.0" encoding="UTF-8"?>
+<methodCall>
+    <methodName>pingback.ping</methodName>
+    <params>
+        <param>
+            <value><string>https://www.vanrenterghem.biz/blog/posts/agent_based_models_digital_twins/</string></value>
+        </param>
+        <param>
+            <value><string>https://johncarlosbaez.wordpress.com/2023/10/25/software-for-compositional-modeling-in-epidemiology/</string></value>
+        </param>
+    </params>
+</methodCall>
+"""]]
+
+Next, I used `curl` on the command-line to send this file in a POST request to Wordpress's pingback service. I had to use the `-k` option to make this work - bypassing verification of the TLS certificate.
+
+[[!format sh """
+curl -k https://johncarlosbaez.wordpress.com/xmlrpc.php -d @pingback.xml
+"""]]
+
+In a sign things were going well, I saw the following appear in my website's access log:
+
+[[!format txt """
+192.0.112.141 - - [29/Oct/2023:09:35:06 +0100] "GET /blog/posts/agent_based_models_digital_twins/ HTTP/1.1" 200 2676 "https://www.vanrenterghem.biz/blog/posts/agent_based_models_digital_twins/" "WordPress.com; https://johncarlosbaez.wordpress.com; verifying pingback from 139.216.235.49"
+"""]]
+
+Finally, I received the following response to my `curl` request on the command-line:
+
+[[!format xml """
+<?xml version="1.0" encoding="UTF-8"?>
+<methodResponse>
+  <params>
+    <param>
+      <value>
+      <string>Pingback from https://www.vanrenterghem.biz/blog/posts/agent_based_models_digital_twins/ to https://johncarlosbaez.wordpress.com/2023/10/25/software-for-compositional-modeling-in-epidemiology/ registered. Keep the web talking! :-)</string>
+      </value>
+    </param>
+  </params>
+</methodResponse>
+"""]]
+
+That "Keep the web talking! :-)" message made me smile.
+
+In order to understand a bit better how things were being processed, I checked the Wordpress code for its pingback service, and it appears they [take the title of the linked article as the author](https://core.trac.wordpress.org/browser/trunk/src/wp-includes/class-wp-xmlrpc-server.php?rev=56637#L7040), which seems a bit odd. The pingback standard didn't allow for anything but the swapping out of links though. How your reference is summarized on the referred site is entirely left to the recipient - who may process pingbacks manually or use a service automating (parts of) the processing.
+
+WordPress processes pingbacks automatically, turning them into comments on the original post. As the comment text, [WordPress uses the link text in the anchor element](https://core.trac.wordpress.org/browser/trunk/src/wp-includes/class-wp-xmlrpc-server.php?rev=56637#L7036) with a horizontal ellipsis around it, and some filtering to prevent the comment from being too long. It's odd that the standard didn't define further ways to make this a bit easier. A pingback attribute in the anchor element would have been helpful for instance, as we could put some text in there to summarise our page when the pingback is processed automatically. With the benefit of hindsight, it is perhaps more surprising that Webmention, the standard that emerged afterwards, did not implement further enhancements of this kind. [Aaron Parecki](https://aaronparecki.com/), author of the Webmention W3C Recommendation, might know whether that was ever considered, or was simply outside the use case for pingbacks and webmentions. There seems to have been [some thought put into it in 2019](https://aaronparecki.com/2019/10/15/26/) at least.
diff --git a/posts/sending_pingback_oldskool_indieweb.org b/posts/sending_pingback_oldskool_indieweb.org

new file mode 100644 (file)

index 0000000..59d96c9
--- /dev/null
+++ b/posts/sending_pingback_oldskool_indieweb.org
@@ -0,0 +1,98 @@
+#+title: About sending pingbacks, webmentions and some thoughts on how to improve on them.
+#+filetags: indieweb blogging open_web
+#+date: 2023-10-31T20:41:30+08:00
+
+In a 'blast from the past', I sent my first
+[[https://www.hixie.ch/specs/pingback/pingback][pingback]] after writing
+the [[previous post|agent based models digital twins]]. A pingback is a
+way for a blogger to send a message to another blogger, informing them
+they've written a post that refers to theirs, e.g. as a reply or an
+extension of the ideas raised.
+
+The process is a bit more involved than using a
+[[https://www.w3.org/TR/webmention/][webmention]], which I've used
+before and [[implemented support for|Implementing webmention on my
+blog]] a while back, due to requiring an XML message to be created
+rather than a simple exchange of URLs.
+
+First, I created a file =pingback.xml= containing the URLs of the blog
+post I wrote and the one I made reference to within my post. The
+standard defines the schema, resulting in the following:
+
+#+BEGIN_SRC XML
+<?xml version="1.0" encoding="UTF-8"?>
+<methodCall>
+    <methodName>pingback.ping</methodName>
+    <params>
+        <param>
+            <value><string>https://www.vanrenterghem.biz/blog/posts/agent_based_models_digital_twins/</string></value>
+        </param>
+        <param>
+            <value><string>https://johncarlosbaez.wordpress.com/2023/10/25/software-for-compositional-modeling-in-epidemiology/</string></value>
+        </param>
+    </params>
+</methodCall>
+#+END_SRC
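+
+A minimal sketch of how such a file could be generated for any pair of
+URLs, rather than written by hand. The function name =make_pingback= is
+hypothetical and not part of the standard, and the sketch assumes that
+neither URL contains characters that need XML escaping (such as =&=):
+
+#+BEGIN_SRC sh
+# Hypothetical helper: print a pingback.ping XML-RPC request.
+# $1 is the source URL (the post containing the link),
+# $2 is the target URL (the post being linked to).
+# Assumption: neither URL needs XML escaping.
+make_pingback() {
+    source_url=$1
+    target_url=$2
+    cat <<EOF
+<?xml version="1.0" encoding="UTF-8"?>
+<methodCall>
+    <methodName>pingback.ping</methodName>
+    <params>
+        <param>
+            <value><string>${source_url}</string></value>
+        </param>
+        <param>
+            <value><string>${target_url}</string></value>
+        </param>
+    </params>
+</methodCall>
+EOF
+}
+#+END_SRC
+
+Redirecting the output of this function to =pingback.xml= would yield a
+file with the same structure as the one above.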
+
+Next, I used =curl= on the command-line to send this file in a POST
+request to WordPress's pingback service. I had to use the =-k= option to
+make this work, which skips verification of the TLS certificate.
+
+#+BEGIN_SRC sh
+curl -k https://johncarlosbaez.wordpress.com/xmlrpc.php -d @pingback.xml
+#+END_SRC
+
+In a sign things were going well, I saw the following appear in my
+website's access log:
+
+#+BEGIN_SRC txt
+192.0.112.141 - - [29/Oct/2023:09:35:06 +0100] "GET /blog/posts/agent_based_models_digital_twins/ HTTP/1.1" 200 2676 "https://www.vanrenterghem.biz/blog/posts/agent_based_models_digital_twins/" "WordPress.com; https://johncarlosbaez.wordpress.com; verifying pingback from 139.216.235.49"
+#+END_SRC
+
+Finally, I received the following response to my =curl= request on the
+command-line:
+
+#+BEGIN_SRC XML
+<?xml version="1.0" encoding="UTF-8"?>
+<methodResponse>
+  <params>
+    <param>
+      <value>
+      <string>Pingback from https://www.vanrenterghem.biz/blog/posts/agent_based_models_digital_twins/ to https://johncarlosbaez.wordpress.com/2023/10/25/software-for-compositional-modeling-in-epidemiology/ registered. Keep the web talking! :-)</string>
+      </value>
+    </param>
+  </params>
+</methodResponse>
+#+END_SRC
+
+That "Keep the web talking! :-)" message made me smile.
+
+In order to understand a bit better how things were being processed, I
+checked the WordPress code for its pingback service, and it appears they
+[[https://core.trac.wordpress.org/browser/trunk/src/wp-includes/class-wp-xmlrpc-server.php?rev=56637#L7040][take
+the title of the linked article as the author]], which seems a bit odd.
+The pingback standard doesn't allow for anything but the exchange of
+links though. How your reference is summarised on the referred site is
+left entirely to the recipient, who may process pingbacks manually or
+use a service automating (parts of) the processing.
+
+WordPress processes pingbacks automatically, turning them into comments
+on the original post. As the comment text,
+[[https://core.trac.wordpress.org/browser/trunk/src/wp-includes/class-wp-xmlrpc-server.php?rev=56637#L7036][WordPress
+uses the link text in the anchor element]] with a horizontal ellipsis
+around it, and some filtering to prevent the comment from being too
+long. It's odd that the standard didn't define further ways to make
+this a bit easier. A pingback attribute in the anchor element would have
+been helpful for instance, as we could put some text in there to
+summarise our page when the pingback is processed automatically. With
+the benefit of hindsight, it is perhaps more surprising that Webmention,
+the standard that emerged afterwards, did not implement further
+enhancements of this kind.
+[[https://aaronparecki.com/][Aaron Parecki]], author of the Webmention
+W3C Recommendation, might know whether that was ever considered, or was
+simply outside the use case for pingbacks and webmentions. There seems
+to have been [[https://aaronparecki.com/2019/10/15/26/][some thought put
+into it in 2019]] at least.
diff --git a/posts/spatial_indexes_to_plot_income_per_postal_code_in_Australian_cities.mdwn b/posts/spatial_indexes_to_plot_income_per_postal_code_in_Australian_cities.mdwn

new file mode 100644 (file)

index 0000000..5a7b167
--- /dev/null
+++ b/posts/spatial_indexes_to_plot_income_per_postal_code_in_Australian_cities.mdwn
@@ -0,0 +1,12 @@
+[[!meta date="2017-11-16 15:07:59 +0800"]]
+[[!tag R spatial analysis visualisation Australia git]]
+
+Trying to plot the income per capita in Australia on a map, I came across a good reason to use [[a spatial query|http://r-spatial.org/r/2017/06/22/spatial-index.html]] in R.
+
+I had to combine a shapefile of Australian SA3s (Statistical Area Level 3, a concept used under the Australian Statistical Geography Standard) with a dataset of income per postal code. I created a matrix of intersecting postal codes and SA3s, and obtained the desired income per capita by SA3 by performing a matrix multiplication. If the geographical areas had aligned perfectly, a function like `st_contains` would have been preferable. Instead, I fell back on `st_intersects`, which can assign a postal code to 2 different statistical areas. Alternative approaches are welcome in the comments!
+
+As Australia is so vast, and the majority of its people are earning a living in a big city, a full map does not show the difference in income per area at this level of detail. Instead, I opted to map some of the key cities in a single image.
+
+[[Income distribution in major AU cities|/pics/AUCitiesIncomeDistribution.gif]]
+
+The full code is available on my git server for you to clone using `git clone git://git.vanrenterghem.biz/R/project-au-taxstats.git`.
diff --git a/posts/spatial_indexes_to_plot_income_per_postal_code_in_Australian_cities.org b/posts/spatial_indexes_to_plot_income_per_postal_code_in_Australian_cities.org

new file mode 100644 (file)

index 0000000..c12176c
--- /dev/null
+++ b/posts/spatial_indexes_to_plot_income_per_postal_code_in_Australian_cities.org
@@ -0,0 +1,29 @@
+#+date: 2017-11-16 15:07:59 +0800
+#+filetags: R spatial analysis visualisation Australia git
+#+title: Spatial indexes to plot income per postal code in Australian cities.
+
+Trying to plot the income per capita in Australia on a map, I came
+across a good reason to use
+[[http://r-spatial.org/r/2017/06/22/spatial-index.html][a spatial query]] in R.
+
+I had to combine a shapefile of Australian SA3s (Statistical Area Level
+3, a concept used under the Australian Statistical Geography Standard)
+with a dataset of income per postal code. I created a matrix of
+intersecting postal codes and SA3s, and obtained the desired income per
+capita by SA3 by performing a matrix multiplication. If the geographical
+areas had aligned perfectly, a function like =st_contains= would have
+been preferable. Instead, I fell back on =st_intersects=, which can
+assign a postal code to 2 different statistical areas. Alternative
+approaches are welcome in the comments!
+
+As Australia is so vast, and the majority of its people are earning a
+living in a big city, a full map does not show the difference in income
+per area at this level of detail. Instead, I opted to map some of the
+key cities in a single image.
+
+#+CAPTION: Income distribution in major AU cities
+#+ATTR_HTML: :class img-fluid :alt Income distribution in major AU cities
+[[file:assets/AUCitiesIncomeDistribution.gif]]
+
+The full code is available on my git server for you to clone using
+=git clone git://git.vanrenterghem.biz/R/project-au-taxstats.git=.
diff --git a/posts/state-of_SaaS.mdwn b/posts/state-of_SaaS.mdwn

new file mode 100644 (file)

index 0000000..cad484b
--- /dev/null
+++ b/posts/state-of_SaaS.mdwn
@@ -0,0 +1,3 @@
+[[!meta date="2015-11-08 01:09:53 +1300"]]
+Nov 2015. Still no Mailpile 1.0. Roundcube Next MIA. Lest we forget, no Freedom
+Box.
diff --git a/posts/state-of_SaaS.org b/posts/state-of_SaaS.org

new file mode 100644 (file)

index 0000000..35a0527
--- /dev/null
+++ b/posts/state-of_SaaS.org
@@ -0,0 +1,5 @@
+#+date: 2015-11-08 01:09:53 +1300
+#+title: State of SaaS
+#+filetags: musings
+
+Nov 2015. Still no Mailpile 1.0. Roundcube Next MIA. Lest we forget, no Freedom Box.
diff --git a/posts/survival_analysis_in_fintech.mdwn b/posts/survival_analysis_in_fintech.mdwn

new file mode 100644 (file)

index 0000000..6ae99da
--- /dev/null
+++ b/posts/survival_analysis_in_fintech.mdwn
@@ -0,0 +1,42 @@
+[[!meta date="2016-12-10 15:49:02 +0800"]]
+
+[[!tag R fintech analysis]]
+It is useful to apply the concepts from survival data analysis in a fintech environment. After all, there will usually be a substantial amount of time-to-event data to choose from. This can be website visitors leaving the site, loans being repaid early, clients becoming delinquent - the options abound.
+
+A visual analysis of such data can easily be obtained using R.
+
+[[!format r """
+library(survminer)
+library(survival)
+library(KMSurv)
+## Create survival curve from a survival object
+#' Status is 1 if the event was observed at TimeAfterStart
+#' It is set to 0 to mark the right-censored time
+vintage.survival <- survfit(Surv(TimeAfterStart,Status) ~ Vintage, data = my.dataset)
+## Generate cumulative incidence plot
+ci.plot <- ggsurvplot(vintage.survival,
+           fun = function(y) 1-y,
+           censor = FALSE,
+           conf.int = FALSE,
+           ylab = 'Ratio event observed',
+           xlab = 'Time after open',
+           break.time.by = 30,
+           legend = "bottom",
+           legend.title = "",
+           risk.table = TRUE,
+           risk.table.title = 'Number of group',
+           risk.table.col = "black",
+           risk.table.fontsize = 4,
+           risk.table.height = 0.35
+           )
+"""]]
+
+This produces a plot with a survival curve per group, and also includes the risk table. This table shows how many members of the group for whom no event was observed are still being followed at each point in time. Labelling these "at risk" stems of course from the original concept of survival analysis, where the event typically is the passing of the subject.
+
+The `fun = function(y) 1-y` part actually reverses the curve, resulting in what is known as a cumulative incidence curve.
+
+[[!img pics/surv-curve-risk-table.png size="442x393" alt="Survival/incidence curve and risk table"]]
+
+Underneath the plot, a risk table is added with no effort by adding `risk.table = TRUE` as parameter for `ggsurvplot`.
+
+Checking the trajectory of these curves for different groups of customers (with a different treatment plan, to stick to the terminology) is an easy way to verify whether actions are having the expected result.
diff --git a/posts/survival_analysis_in_fintech.org b/posts/survival_analysis_in_fintech.org

new file mode 100644 (file)

index 0000000..18cad1e
--- /dev/null
+++ b/posts/survival_analysis_in_fintech.org
@@ -0,0 +1,56 @@
+#+date: 2016-12-10 15:49:02 +0800
+#+filetags: R fintech analysis
+#+title: Survival analysis in fintech
+
+It is useful to apply the concepts from survival data analysis in a
+fintech environment. After all, there will usually be a substantial
+amount of time-to-event data to choose from. This can be website
+visitors leaving the site, loans being repaid early, clients becoming
+delinquent - the options abound.
+
+A visual analysis of such data can easily be obtained using R.
+
+#+BEGIN_SRC R
+library(survminer) 
+library(survival) 
+library(KMSurv) 
+## Create survival curve from a survival object 
+#' Status is 1 if the event was observed at TimeAfterStart 
+#' It is set to 0 to mark the right-censored time 
+vintage.survival <- survfit(Surv(TimeAfterStart,Status) ~ Vintage, data = my.dataset) 
+## Generate cumulative incidence plot 
+ci.plot <- ggsurvplot(vintage.survival, 
+                      fun = function(y) 1-y, 
+                     censor = FALSE,
+                     conf.int = FALSE, 
+                     ylab = 'Ratio event observed', 
+                     xlab = 'Time after open', 
+                     break.time.by = 30, 
+                     legend = "bottom", 
+                     legend.title = "",
+                     risk.table = TRUE, 
+                     risk.table.title = 'Number of group', 
+                     risk.table.col = "black",
+                     risk.table.fontsize = 4, 
+                     risk.table.height = 0.35 )
+#+END_SRC
+
+This produces a plot with a survival curve per group, and also includes
+the risk table. This table shows how many members of the group for whom
+no event was observed are still being followed at each point in time.
+Labelling these "at risk" stems of course from the original concept of
+survival analysis, where the event typically is the passing of the
+subject.
+
+The =fun = function(y) 1-y= part actually reverses the curve, resulting
+in what is known as a cumulative incidence curve.
+
+#+CAPTION: Survival/incidence curve and risk table
+#+ATTR_HTML: :width 442 :class img-fluid :alt Survival/incidence curve and risk table
+[[file:assets/surv-curve-risk-table.png]]
+
+Underneath the plot, a risk table is added with no effort by adding
+=risk.table = TRUE= as parameter for =ggsurvplot=.
+
+Checking the trajectory of these curves for different groups of
+customers (with a different treatment plan, to stick to the terminology)
+is an easy way to verify whether actions are having the expected result.
diff --git a/posts/using_Apache_Nifi_Kafka_big_data_tools.mdwn b/posts/using_Apache_Nifi_Kafka_big_data_tools.mdwn

new file mode 100644 (file)

index 0000000..75a1943
--- /dev/null
+++ b/posts/using_Apache_Nifi_Kafka_big_data_tools.mdwn
@@ -0,0 +1,33 @@
+[[!meta date="2018-09-11 21:09:06 +0800"]]
+[[!tag Apache Nifi Kafka bigdata streaming]]
+
+For anyone working in analytics these days, the concept of big data is firmly established. Smart engineers have been developing cool technology to work with it for a while now. The [[Apache Software Foundation|https://apache.org]] has emerged as a hub for many of these - Ambari, Hadoop, Hive, Kafka, Nifi, Pig, Zookeeper - the list goes on.
+
+While I'm mostly interested in improving business outcomes by applying analytics, I'm also excited to work with some of these tools to make that easier.
+
+Over the past few weeks, I have been exploring some tools, installing them on my laptop or a server and giving them a spin. Thanks to [[Confluent, the founders of Kafka|https://www.confluent.io]] it is super easy to try out Kafka, Zookeeper, KSQL and their REST API. They all come in a pre-compiled tarball which just works on Arch Linux. (After trying to compile some of these, this is no luxury - these apps are very interestingly built...) Once unpacked, all it takes to get started is:
+
+[[!format sh """
+./bin/confluent start
+"""]]
+
+I also spun up an instance of [[nifi|https://nifi.apache.org/download.html]], which I used to monitor a (json-ised) apache2 webserver log. Every new line added to that log goes as a message to Kafka.
+
+[[Apache Nifi configuration|/pics/ApacheNifi.png]]
+
+A processor monitoring a file (tailing) copies every new line over to another processor publishing it to a Kafka topic. The Tailfile monitor includes options for rolling filenames, and what delineates each message. I set it up to process a custom logfile from my webserver, which was defined to produce JSON messages instead of the somewhat cumbersome to process standard logfile output (defined in apache2.conf, enabled in the webserver conf):
+
+[[!format sh """
+LogFormat "{ \"time\":\"%t\", \"remoteIP\":\"%a\", \"host\":\"%V\", \"request\":\"%U\", \"query\":\"%q\", \"method\":\"%m\", \"status\":\"%>s\", \"userAgent\":\"%{User-agent}i\", \"referer\":\"%{Referer}i\", \"size\":\"%O\" }" leapache
+"""]]
+
+All the hard work is being done by Nifi. (Something like
+
+[[!format sh """
+tail -F /var/log/apache2/access.log | kafka-console-producer.sh --broker-list localhost:9092 --topic accesslogapache
+"""]]
+
+would probably be close to the CLI equivalent on a single-node system like my test setup, with the -F option to ensure the log rotation doesn't break things. Not sure how the message demarcator would need to be configured.)
+
+The above results in a Kafka message stream with every request hitting my webserver in real-time available for further analysis.
+
diff --git a/posts/using_Apache_Nifi_Kafka_big_data_tools.org b/posts/using_Apache_Nifi_Kafka_big_data_tools.org

new file mode 100644 (file)

index 0000000..297c75f
--- /dev/null
+++ b/posts/using_Apache_Nifi_Kafka_big_data_tools.org
@@ -0,0 +1,62 @@
+#+date: 2018-09-11 21:09:06 +0800
+#+filetags: Apache Nifi Kafka bigdata streaming
+#+title: Using Apache Nifi and Kafka - big data tools
+
+For anyone working in analytics these days, the concept of big data is
+firmly established. Smart engineers have been developing cool technology
+to work with it for a while now. The
+[[https://apache.org][Apache Software Foundation]] has emerged as a hub
+for many of these - Ambari, Hadoop, Hive, Kafka, Nifi, Pig, Zookeeper -
+the list goes on.
+
+While I'm mostly interested in improving business outcomes by applying
+analytics, I'm also excited to work with some of these tools to make
+that easier.
+
+Over the past few weeks, I have been exploring some tools, installing
+them on my laptop or a server and giving them a spin. Thanks to
+[[https://www.confluent.io][Confluent, the founders of Kafka]], it is
+super easy to try out Kafka, Zookeeper, KSQL and their REST API. They
+all come in a pre-compiled tarball which just works on Arch Linux.
+(After trying to compile some of these, this is no luxury - these apps
+are very interestingly built...) Once unpacked, all it takes to get
+started is:
+
+#+BEGIN_SRC sh
+./bin/confluent start
+#+END_SRC
+
+I also spun up an instance of
+[[https://nifi.apache.org/download.html][nifi]], which I used to monitor
+a (json-ised) apache2 webserver log. Every new line added to that log
+goes as a message to Kafka.
+
+[[Apache Nifi configuration|/pics/ApacheNifi.png]]
+
+A processor monitoring a file (tailing) copies every new line over to
+another processor publishing it to a Kafka topic. The Tailfile monitor
+includes options for rolling filenames, and what delineates each
+message. I set it up to process a custom logfile from my webserver,
+which was defined to produce JSON messages instead of the somewhat
+cumbersome to process standard logfile output (defined in apache2.conf,
+enabled in the webserver conf):
+
+#+BEGIN_SRC sh
+LogFormat "{ \"time\":\"%t\", \"remoteIP\":\"%a\", \"host\":\"%V\", \"request\":\"%U\", \"query\":\"%q\", \"method\":\"%m\", \"status\":\"%>s\", \"userAgent\":\"%{User-agent}i\", \"referer\":\"%{Referer}i\", \"size\":\"%O\" }" leapache
+#+END_SRC
+
+All the hard work is being done by Nifi. (Something like
+
+#+BEGIN_SRC sh
+tail -F /var/log/apache2/access.log | kafka-console-producer.sh --broker-list localhost:9092 --topic accesslogapache
+#+END_SRC
+
+would probably be close to the CLI equivalent on a single-node system
+like my test setup, with the -F option to ensure the log rotation
+doesn't break things. Not sure how the message demarcator would need to
+be configured.)
+
+The above results in a Kafka message stream with every request hitting
+my webserver in real-time available for further analysis.
diff --git a/posts/using_R_to_automate_reports.mdwn b/posts/using_R_to_automate_reports.mdwn

new file mode 100644 (file)

index 0000000..61bf60f
--- /dev/null
+++ b/posts/using_R_to_automate_reports.mdwn
@@ -0,0 +1,39 @@
+[[!meta date="2016-10-10 21:48:11 +0800"]]
+
+[[!tag R code automation]]
+A lot of information on [knitr](https://cran.r-project.org/web/packages/knitr/index.html) is centered around using it for reproducible research. I've found it to be a nice way to abstract away mundane reporting though. It is as easy as performing the necessary data extraction and manipulation in an R script, including the creation of tables and graphs.
+
+To develop the report template, simply `source` the R script within an Rmd one, per the example template below:
+
+[[!format r """
+---
+title: "My report"
+date: "`r Sys.time()`" 
+output: pdf_document
+---
+
+```{r setup, include=FALSE}
+library(knitr)
+knitr::opts_chunk$set(echo = TRUE)
+source('my-report.R')
+```
+
+Include some text about your report.
+
+## Add a title.
+
+Some further text.
+
+```{r, echo=FALSE}
+plot(my_plot_object)
+kable(my_df_or_table)
+```
+"""]]
+
+When you are ready to create the report, the convenience of [RMarkdown](https://cran.r-project.org/web/packages/rmarkdown/) is hard to beat:
+
+[[!format bash """
+R -e "rmarkdown::render('~/my-report.Rmd',output_file='~/my-report.pdf')"
+"""]]
+
+Thanks to the YAML header at the start of the report template, information like the report's title and target output format doesn't need to be passed on the command line. This command can easily be scripted a bit further to include a date-time stamp in the output filename for instance, and scheduled using `cron`.
diff --git a/posts/using_R_to_automate_reports.org b/posts/using_R_to_automate_reports.org

new file mode 100644 (file)

index 0000000..2f204de
--- /dev/null
+++ b/posts/using_R_to_automate_reports.org
@@ -0,0 +1,43 @@
+#+date: 2016-10-10 21:48:11 +0800
+#+title: Using R to automate reports
+#+filetags: R code automation
+
+A lot of information on [[https://cran.r-project.org/web/packages/knitr/index.html][knitr]] is
+centered around using it for reproducible research. I've found it to be
+a nice way to abstract away mundane reporting though. It is as easy as
+performing the necessary data extraction and manipulation in an R
+script, including the creation of tables and graphs.
+
+To develop the report template, simply =source= the R script within an
+Rmd one, per the example template below:
+
+#+BEGIN_SRC R
+---
+title: "My report"
+date: "`r Sys.time()`"
+output: pdf_document
+---
+
+```{r setup, include=FALSE}
+library(knitr)
+knitr::opts_chunk$set(echo = TRUE)
+source('my-report.R')
+```
+
+Include some text about your report.
+
+## Add a title.
+
+Some further text.
+
+```{r, echo=FALSE}
+plot(my_plot_object)
+kable(my_df_or_table)
+```
+#+END_SRC
+
+When you are ready to create the report, the convenience of
+[[https://cran.r-project.org/web/packages/rmarkdown/][RMarkdown]] is
+hard to beat:
+
+#+BEGIN_SRC bash 
+R -e "rmarkdown::render('~/my-report.Rmd',output_file='~/my-report.pdf')"
+#+END_SRC
+
+Thanks to the YAML header at the start of the report template,
+information like the report's title and target output format doesn't
+need to be passed on the command line. This command can easily be
+scripted a bit further to include a date-time stamp in the output
+filename for instance, and scheduled using =cron=.
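+
+A minimal sketch of such a wrapper, under stated assumptions: the
+function names =report_name= and =render_report=, the script path and
+the schedule are made up for illustration, and the paths mirror the
+command above.
+
+#+BEGIN_SRC sh
+# Build an output filename with a date-time stamp,
+# e.g. my-report-20161010-214811.pdf
+report_name() {
+    printf 'my-report-%s.pdf\n' "$(date +%Y%m%d-%H%M%S)"
+}
+
+# Render the report template to a freshly stamped file in the home directory.
+render_report() {
+    R -e "rmarkdown::render('~/my-report.Rmd',output_file='~/$(report_name)')"
+}
+
+# Hypothetical crontab entry, running a script that calls render_report
+# every Monday at 07:00:
+# 0 7 * * 1 /path/to/render-report.sh
+#+END_SRC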
diff --git a/posts/using_spatial_features_and_openstreetmap.mdwn b/posts/using_spatial_features_and_openstreetmap.mdwn

new file mode 100644 (file)

index 0000000..f99bae0
--- /dev/null
+++ b/posts/using_spatial_features_and_openstreetmap.mdwn
@@ -0,0 +1,11 @@
+[[!meta date="2017-10-12 21:30:51 +0800"]]
+[[!tag R spatial analysis visualisation]] 
+Turns out it is possible, thanks to the good folks at [Stamen Design](https://stamen.com), to get fairly unobtrusive maps based on the OpenStreetMap data.
+
+Combining this with a SLIP dataset from the Western Australian government on active schools in the state provided a good opportunity to check out the recently released [sf (Simple Features) package](https://cran.r-project.org/web/packages/sf/index.html) in R.
+
+Simple features are a standardized way to encode spatial vector data. Support in R was added in November 2016. For users of the tidyverse, this makes manipulating shapefiles and the like easier, as simple features in R are dataframes!
+
+[[Plot of secondary schools by student population|/pics/Western_Australia_metro_secondary_schools.png]]
+
+[Full details in git](http://git.vanrenterghem.biz/?p=R/project-wa-schools.git;a=summary).
diff --git a/posts/using_spatial_features_and_openstreetmap.org b/posts/using_spatial_features_and_openstreetmap.org

new file mode 100644 (file)

index 0000000..690bd4c
--- /dev/null
+++ b/posts/using_spatial_features_and_openstreetmap.org
@@ -0,0 +1,24 @@
+#+date: 2017-10-12 21:30:51 +0800
+#+filetags: R spatial analysis visualisation
+#+title: Using spatial features and openstreetmap
+
+Turns out it is possible, thanks to the good folks at
+[[https://stamen.com][Stamen Design]], to get fairly unobtrusive maps
+based on the OpenStreetMap data.
+
+Combining this with a SLIP dataset from the Western Australian
+government on active schools in the state provided a good opportunity to
+check out the recently released
+[[https://cran.r-project.org/web/packages/sf/index.html][sf (Simple
+Features) package]] in R.
+
+Simple features are a standardized way to encode spatial vector data.
+Support in R was added in November 2016. For users of the tidyverse,
+this makes manipulating shapefiles and the like easier, as simple
+features in R are dataframes!
+
+[[Plot of secondary schools by student
+population|/pics/Western_Australia_metro_secondary_schools.png]]
+
+[[http://git.vanrenterghem.biz/?p=R/project-wa-schools.git;a=summary][Full
+details in git]].
posts/AUC_and_economics_of_predictive_modelling.mdwn [new file with mode: 0644]
posts/AUC_and_economics_of_predictive_modelling.org [new file with mode: 0644]
posts/Bluetooth.mdwn [new file with mode: 0644]
posts/Bluetooth.org [new file with mode: 0644]
posts/Bring_Back_Blogging.mdwn [new file with mode: 0644]
posts/Bring_Back_Blogging.org [new file with mode: 0644]
posts/Debian_on_A20.mdwn [new file with mode: 0644]
posts/Debian_on_A20.org [new file with mode: 0644]
posts/Fearless_analysis.mdwn [new file with mode: 0644]
posts/Fearless_analysis.org [new file with mode: 0644]
posts/FedEx_marries_TNT.mdwn [new file with mode: 0644]
posts/FedEx_marries_TNT.org [new file with mode: 0644]
posts/Fibonacci_golden_spiral.mdwn [new file with mode: 0644]
posts/Fibonacci_golden_spiral.org [new file with mode: 0644]
posts/Half_Time_Oranges.mdwn [new file with mode: 0644]
posts/Half_Time_Oranges.org [new file with mode: 0644]
posts/I_bought_a_balance_board.mdwn [new file with mode: 0644]
posts/I_bought_a_balance_board.org [new file with mode: 0644]
posts/Implementing_Webmention_on_my_blog.mdwn [new file with mode: 0644]
posts/Implementing_Webmention_on_my_blog.org [new file with mode: 0644]
posts/In_the_pines.mdwn [new file with mode: 0644]
posts/In_the_pines.org [new file with mode: 0644]
posts/Innovate_WA_2023.mdwn [new file with mode: 0644]
posts/Innovate_WA_2023.org [new file with mode: 0644]
posts/Magna_Carta.mdwn [new file with mode: 0644]
posts/Magna_Carta.org [new file with mode: 0644]
posts/NYC_taxi_calendar_fun.mdwn [new file with mode: 0644]
posts/NYC_taxi_calendar_fun.org [new file with mode: 0644]
posts/Nobel_prize_winner_having_fun.mdwn [new file with mode: 0644]
posts/Nobel_prize_winner_having_fun.org [new file with mode: 0644]
posts/Perth_solar_exposure_over_year.mdwn [new file with mode: 0644]
posts/Perth_solar_exposure_over_year.org [new file with mode: 0644]
posts/R_and_github.mdwn [new file with mode: 0644]
posts/R_and_github.org [new file with mode: 0644]
posts/Ronald_McDonald_House.mdwn [new file with mode: 0644]
posts/Ronald_McDonald_House.org [new file with mode: 0644]
posts/Setting_up_an_Analytics_Practice.mdwn [new file with mode: 0644]
posts/Setting_up_an_Analytics_Practice.org [new file with mode: 0644]
posts/WADSIH_talk_on_consumer_insights.mdwn [new file with mode: 0644]
posts/WADSIH_talk_on_consumer_insights.org [new file with mode: 0644]
posts/WA_roads_in_R_using_sf.mdwn [new file with mode: 0644]
posts/WA_roads_in_R_using_sf.org [new file with mode: 0644]
posts/Wrapping_Confluent_Kafka_REST_Proxy_API_in_R.org [new file with mode: 0644]
posts/agent_based_models_digital_twins.mdwn [new file with mode: 0644]
posts/agent_based_models_digital_twins.org [new file with mode: 0644]
posts/azure_file_storage_blobs.mdwn [new file with mode: 0644]
posts/azure_file_storage_blobs.org [new file with mode: 0644]
posts/different_spin_to_competing_on_analytics.mdwn [new file with mode: 0644]
posts/different_spin_to_competing_on_analytics.org [new file with mode: 0644]
posts/explore-AU-road-fatalities.mdwn [new file with mode: 0644]
posts/explore-AU-road-fatalities.org [new file with mode: 0644]
posts/facet_labels_in_R.mdwn [new file with mode: 0644]
posts/facet_labels_in_R.org [new file with mode: 0644]
posts/fertile-summers.mdwn [new file with mode: 0644]
posts/fertile-summers.org [new file with mode: 0644]
posts/first_r_package.mdwn [new file with mode: 0644]
posts/first_r_package.org [new file with mode: 0644]
posts/fun_with_RJDBC_and_RODBC.mdwn [new file with mode: 0644]
posts/fun_with_RJDBC_and_RODBC.org [new file with mode: 0644]
posts/generating_album_art_on_N9.mdwn [new file with mode: 0644]
posts/generating_album_art_on_N9.org [new file with mode: 0644]
posts/house_price_evolution.mdwn [new file with mode: 0644]
posts/house_price_evolution.org [new file with mode: 0644]
posts/loans_and_semi_Markov_chains.mdwn [new file with mode: 0644]
posts/loans_and_semi_Markov_chains.org [new file with mode: 0644]
posts/managing_data_science_work.mdwn [new file with mode: 0644]
posts/managing_data_science_work.org [new file with mode: 0644]
posts/monetary_policy_and_mortgage_products.mdwn [new file with mode: 0644]
posts/monetary_policy_and_mortgage_products.org [new file with mode: 0644]
posts/my_management_style_mission_control.mdwn [new file with mode: 0644]
posts/my_management_style_mission_control.org [new file with mode: 0644]
posts/new_blog_activated.mdwn [new file with mode: 0644]
posts/new_blog_activated.org [new file with mode: 0644]
posts/obnam_multi_client_encrypted_backups.mdwn [new file with mode: 0644]
posts/obnam_multi_client_encrypted_backups.org [new file with mode: 0644]
posts/obtaining_SLIP_data_using_R.mdwn [new file with mode: 0644]
posts/obtaining_SLIP_data_using_R.org [new file with mode: 0644]
posts/on_social_media.mdwn [new file with mode: 0644]
posts/on_social_media.org [new file with mode: 0644]
posts/sending_pingback_oldskool_indieweb.mdwn [new file with mode: 0644]
posts/sending_pingback_oldskool_indieweb.org [new file with mode: 0644]
posts/spatial_indexes_to_plot_income_per_postal_code_in_Australian_cities.mdwn [new file with mode: 0644]
posts/spatial_indexes_to_plot_income_per_postal_code_in_Australian_cities.org [new file with mode: 0644]
posts/state-of_SaaS.mdwn [new file with mode: 0644]
posts/state-of_SaaS.org [new file with mode: 0644]
posts/survival_analysis_in_fintech.mdwn	[new file with mode: 0644]	patch \| blob
posts/survival_analysis_in_fintech.org	[new file with mode: 0644]	patch \| blob
posts/using_Apache_Nifi_Kafka_big_data_tools.mdwn	[new file with mode: 0644]	patch \| blob
posts/using_Apache_Nifi_Kafka_big_data_tools.org	[new file with mode: 0644]	patch \| blob
posts/using_R_to_automate_reports.mdwn	[new file with mode: 0644]	patch \| blob
posts/using_R_to_automate_reports.org	[new file with mode: 0644]	patch \| blob
posts/using_spatial_features_and_openstreetmap.mdwn	[new file with mode: 0644]	patch \| blob
posts/using_spatial_features_and_openstreetmap.org	[new file with mode: 0644]	patch \| blob