Left of Right.: statistics

Wednesday, 20 April 2022

New inflation values out today - be careful how you interpret this!

TL;DR: When StatsCan measures changes in prices, it does so for a fixed basket of goods, which includes holding quality at a fixed level. When the quality of goods increases over time this creates problems, especially when consumers can no longer choose to buy the lower-quality good at a cheaper price. The result is that the inflation consumers actually experience for this good is likely significantly higher than the posted rate of inflation.

A common argument is that the average household today is better off than John D. Rockefeller because we now have access to microwaves and the internet. Thus, this drastic increase in the quality and availability of new goods must mean that we live significantly better lives. The real question is, can we honestly compare the standard of living across such an expanse of time given drastically different goods, and the quality of goods available?

Just the same, an attempt at this adjustment is made in calculating the Consumer Price Index (CPI), recognizing that we have access to superior-quality goods today versus what we had 10, 20, or 30 years ago. This improvement in quality must be accounted for in computing the change in the price of a fixed basket of goods.

I talk about this a lot with my students - CPI inflation is intended to be a measure of the change in aggregate consumer price levels for a fixed basket of goods to provide information to policymakers regarding how consumer prices have changed year over year. This provides insight as to what has been happening with prices, but it is not the full story and it is not a good measure of the cost of living.

But first, why is this problematic? This is problematic as the CPI measure of inflation is often used by employers, pensions, unions, and others to determine the change in the cost of living. This leads to the thinking "Oh, CPI inflation was 2% last year, so as long as I get a 2% pay raise I am no worse off". Unfortunately, as we will see - this may not be true. Specifically, your observed rate of inflation may be significantly higher than the reported CPI inflation - basically, the problem comes down to assumed substitutability.

(This helps to explain why even though wage growth has been slightly higher than CPI inflation, the average wage earner today feels as if their purchasing power is less than it used to be)

To evaluate this, let's focus on one component of CPI inflation: rented accommodation. While shelter costs as a whole are weighted at 26.8% of the CPI basket of goods, rented accommodation accounts for only 6.4% of the total CPI weighting (because we assume that most households are owners; however, a problem similar to the one we discuss here exists for owned accommodation).

To begin, we can look at the Canada Mortgage and Housing Corporation (CMHC) historical records showing the median rental price in the Capital Regional District (CRD) from October 2011 through to October 2021 (the latest available published data). These show that rental prices have increased by an average of 4.97% per year over the last ten years (source).

To compare this to the latest CPI inflation information, we see that over the last ten years the rental accommodation component of CPI inflation for the CRD has only increased by an average of 1.92% per year (source).

So on one hand, we have the CMHC saying that rents have increased by 4.97% annually (on average), while we have Statistics Canada through the CPI saying that rents have only increased by 1.92% annually. Where does this large discrepancy come from?
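To get a feel for how far apart these two figures are, here is a rough sketch that compounds each average annual rate over the ten years. It assumes both figures can be treated as compound annual growth rates, which is a simplification made only for this illustration and not something either source states; the function name is also just for illustration.

```python
def cumulative_increase(annual_rate, years):
    """Total proportional increase after compounding annual_rate for `years` years."""
    return (1 + annual_rate) ** years - 1


# Average annual rates quoted above (assumed here to compound yearly).
cmhc_total = cumulative_increase(0.0497, 10)
cpi_total = cumulative_increase(0.0192, 10)

print(f"CMHC: {cmhc_total:.1%} over ten years")
print(f"CPI:  {cpi_total:.1%} over ten years")
```

Under that assumption, rents rise by roughly 62% over the decade on the CMHC figure versus roughly 21% on the CPI figure.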

While these two values are computed through different surveys, and over slightly different time periods, it is easy to presume that as both surveys sample from the same population, we should have approximately the same values given the large sample size - that is to say, sampling error does not explain this difference.

What does then? Partly it is in how Statistics Canada and all OECD countries compute the CPI. By definition, the CPI is measuring the change in the price of a fixed basket of goods. For some goods, this is not problematic. For example, the price of a litre of milk in 2011 was $X, and in 2021 the price of a litre of milk is now $Y. In this case, milk is milk and has not significantly changed over the last ten years.

But what about when measuring expenditure on other items? Cars, computers, cell phones, housing? All of these goods have had quality increases over the last decade (a 2021 computer is not the same as a 2011 computer!). As a result, Stats Canada needs to recognize that quality has increased, and thus needs to determine what the change in price would be for a constant quality (fixed basket) rather than the change in price due to the fact that it is a fundamentally new good being sold (again, it would be tough to argue that a 2021 computer is the same good as a 2011 computer).

Statistics Canada does this by using a matched-model method to measure pure price changes. That is, it attempts to determine what the price of a good would be if we could somehow keep the quality constant.

While the idea behind this fundamentally makes sense and is necessary, there are some major problems. Often as quality increases, the consumer no longer has the option to obtain the cheaper, original-quality option. For example, as cell phone speed and features drastically increase each year, you are stuck paying for these features even if you do not want or need them, because there are few if any phones on the market that do not include these new, improved features. The same can be said for housing or vehicles. In fact, if features (quality) increase substantially while price only increases marginally, this matched-model method may actually report that the price for this good has decreased from year to year!
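As a loose illustration of that last point, suppose the sticker price of a phone rises 3% while its measured quality rises 10%. This is a toy calculation, not Statistics Canada's actual procedure: both numbers and the simple ratio used to adjust for quality are assumptions made up for the example.

```python
def quality_adjusted_change(price_change, quality_change):
    """Price change for a constant-quality good.

    Uses a simple ratio adjustment (an assumption for illustration only):
    the observed price relative is divided by the quality relative.
    """
    return (1 + price_change) / (1 + quality_change) - 1


# Hypothetical inputs: price up 3%, quality up 10%.
adjusted = quality_adjusted_change(0.03, 0.10)
print(f"Quality-adjusted price change: {adjusted:.2%}")
```

Even though the consumer pays 3% more at the till, the quality-adjusted price in this sketch falls by a little over 6%.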

This becomes exceptionally problematic if the consumer does not actually differentiate based on quality. With respect to housing, what if the consumer is primarily concerned with access to shelter rather than access to different qualities of shelter?

If this is the case, then shelter becomes homogeneous irrespective of quality. That is to say, the potential consumer does not see a significant difference between low-quality shelter and high-quality shelter. Given the current tightness of the rental (and housing) market, this makes sense, and we would expect to witness similar market prices irrespective of quality.

If we take the year built to be a proxy for quality (i.e., newer-built homes have newer, better-quality features) we could test this hypothesis by comparing the two possible outcomes:

If we witness little if any price difference based on year built, then the consumer primarily cares about the availability of shelter, irrespective of quality.
If we witness newer places renting for higher amounts, then consumers value quality and thus are willing to pay a premium to access higher quality shelter. (If this is the case, then we should be accounting for this change in quality when computing inflation!)

Likely, we will witness a bit of both happening, so the real question is where are we sitting today Vs. where were we sitting in the past? Well, we find the following (Source)

How do we interpret the above table? Initially, we find a large spread between old builds and new builds. This would signify that in 2011 there was a desire for quality: renters were willing to pay more for a quality (newer) unit over a lower-quality (older) unit.

As we move forward to today (2021) we witness that this spread has flattened out. In 2011 the maximum rental price was 43% higher than the minimum, while in 2021 the maximum was only 9.7% higher.

This signifies that renters care less about quality than they care about access to units, resulting in a reduction in the quality premium. That is to say, while StatsCan, through the CPI, still discounts higher rents to account for increased quality, renters are not caring about the quality so much as they are caring about access to shelter.

To summarize, CPI inflation says that the cost of rental accommodation has only increased by 1.92% per year versus CMHC's reported 4.97%. The reason for the significantly lower CPI figure is that there has (on average) been an increase in the quality of rented accommodation. To compare "apples to apples", the price increase must take this quality increase into account. This "apples to apples" comparison yields only a 1.92% average annual increase in rental prices over the last ten years.

While this method of comparison is preferred to compute changes in prices from an economic and policymaker standpoint, this is problematic when these same metrics are used to compute changes in the cost of living as they will often under-estimate the true change in cost - this is especially true when the consumer does not have the option (or ability) to continue to choose the lower (original) quality option.

The crux of the argument is to use caution when interpreting CPI inflation as there are many assumptions that go into these calculations, and the recently reported annual inflation rate of 6.7% (source) may not be telling you what you think it is.

If you have any comments or questions, please feel free to message me or comment below.

Wednesday, 14 April 2021

COVID False negatives.

Source: BCCDC

I have often been forwarded articles like this one here which report a large probability of receiving a false negative on a COVID-19 test.

This, at first glance, appears to be a huge problem.

From the article, we have the reported probability of receiving a false negative based on how many days you have had the disease:

Day 1 of disease: 100% chance of a false negative

Day 4 of disease: 67% chance

After symptoms begin to show (when you would likely then go to get a test):

Day 5 of disease: 38% chance

Day 8 of disease: 20% chance

As I said, this appears to be frightening. Of those getting the test (who have symptoms) something close to 2 out of 5 tests will say that the person is COVID negative when in fact they have the disease.

So let's work through this. Let's say that you are sick (or someone in your family is sick) and you decide to get a COVID test. You go through the process, get your brain tickled, then get the phone call later that evening. Good News. The result is negative.

But then you read this article. Are you actually COVID free? Or did you just receive a false negative? What is the likelihood that you actually have COVID given that you received a negative test result?

Well, we can actually work this out quite easily.

Let's first use the general multiplication rule for events that are not independent:

P(A and B) = P(A) × P(B|A)

This says that the probability that both events A and B occur is the probability of A occurring multiplied by the probability of event B occurring, given that A has already occurred. To put this in more straightforward terms:

Let's suppose you have a cooler of beverages: 3 colas and 4 root beers. The probability you reach in (without looking) and pull out a cola and then a root beer is the probability you pull out a cola multiplied by the probability you pull out a root beer, given that you have already pulled out a cola.

Now the important piece to remember is that this goes both ways.

The probability P(Cola and Rootbeer) is identical to the opposite, P(Rootbeer and Cola). That is, the order is not important, as long as we finish with one of each.
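The cooler example can be checked with exact fractions. This is just a small sketch of the multiplication rule using the 3 colas and 4 root beers described above; the variable names are made up for the illustration.

```python
from fractions import Fraction

colas, rootbeers = 3, 4
total = colas + rootbeers

# P(cola first) * P(root beer second | cola first)
cola_then_rootbeer = Fraction(colas, total) * Fraction(rootbeers, total - 1)

# P(root beer first) * P(cola second | root beer first)
rootbeer_then_cola = Fraction(rootbeers, total) * Fraction(colas, total - 1)

print(cola_then_rootbeer, rootbeer_then_cola)  # prints: 2/7 2/7
```

Either order gives the same joint probability, which is the symmetry the next step relies on.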

Thus we have that:

P(A) × P(B|A) = P(B) × P(A|B)

Why is this helpful to us?

Almost there - let's open this up, define (A) and (B), and then things should start to become clear.

Where if we set (A) = "Covid", or (C), and (B) = "Negative test", or (N), we obtain:

P(C) × P(N|C) = P(N) × P(C|N)

From here we can work through what we know (or can find out) in order to answer our question.

What was our question again?

What is the likelihood that you actually have COVID given that you received a negative test result?

Right, that is, we are looking for P(C|N) so we need to determine values for all the other variables:

P(C)
P(N)
P(N|C)

So how do we go about solving all this?

Well, let's start out with P(C). Currently, there are just over 100k active cases in BC. Given a population of just over 5 million, that puts you at somewhere around a 2 to 2.5% chance of having COVID (we will use 2.5%). Yes, exposures, or the area in which you live/work/play, are going to impact this, but we don't have that information, so let's keep it simple.

Let's then take a look at P(N). In total, just over 2 million tests have been done in BC (2 336 090), with just over 200K of those being a positive result (217 485). That is, we can work out P(+) to be 9.3%. Using our complement rule, we can then obtain P(N) as 1 - P(+), or 90.7%.
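The same arithmetic as a quick sketch, using the two test counts quoted above (the variable names are only for illustration):

```python
total_tests = 2_336_090
positive_tests = 217_485

# Share of all tests that came back positive, and its complement.
p_positive = positive_tests / total_tests
p_negative = 1 - p_positive

print(f"P(+) = {p_positive:.3f}, P(N) = {p_negative:.3f}")
```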

Finally, we need to determine P(N|C). That is, what is the probability you get a negative test result given that you have COVID?

Hey, that is our false-negative rate as reported above. Let's start by using the 38% false-negative rate. Summarizing all this, we have:

P(C) = 0.025
P(N) = 0.907
P(N|C) = 0.38

We can then re-arrange our above formula to solve for P(C|N) and then make the appropriate substitutions.

P(C|N) = P(C) × P(N|C) / P(N)

Making our substitutions:

P(C|N) = (0.025 × 0.38) / 0.907 ≈ 0.0105

What does this mean? It means that given the low rates of actual COVID occurrence in the province, and despite the relatively high rates of false negatives, the probability that you actually have COVID given that you received a negative test result is only about 1%. That is reasonably low.

What about if you had a test on day 4 (pre-symptoms)? When you think about it, that is extremely quick: notified of an exposure (X) days ago, feeling fine, but booking a test just the same:

P(C|N) = (0.025 × 0.67) / 0.907 ≈ 0.0185

Again, despite the alarmingly high false-negative rate, the probability that you actually have COVID given that you received the negative test result is exceptionally low (less than 2%).
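For anyone who wants to plug in their own numbers, the rearranged formula P(C|N) = P(C) × P(N|C) / P(N) can be sketched as a small function. The inputs below are the values used in this post; the function name is just for illustration.

```python
def p_covid_given_negative(p_covid, p_negative, p_negative_given_covid):
    """P(C|N) from the rearranged multiplication rule."""
    return p_covid * p_negative_given_covid / p_negative


# Day 5 (38% false-negative rate) and day 4 (67% false-negative rate).
day5 = p_covid_given_negative(0.025, 0.907, 0.38)
day4 = p_covid_given_negative(0.025, 0.907, 0.67)

print(f"Day 5: {day5:.2%}")
print(f"Day 4: {day4:.2%}")
```

Both come out at roughly 1% and a little under 2% respectively, matching the values discussed above.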

Okay, so what is the moral of the story?

Articles like this come out - and they end up breeding fear and distrust of the tests - potentially leading to people not taking the tests at all. They may think the test is not worth it, or that the results are meaningless. This of course is problematic in determining caseloads, tracking spread, and combating and turning the tide of this disease.

Despite these high rates of false negatives, this is still an extremely useful tool and an extremely valuable resource in our fight against COVID.

Now - if infection rates were to skyrocket, suppose that instead there was an 80% chance that you had COVID. At that point we would have to question whether or not this test was effective, as the false-negative rate would be extremely problematic. But with our current relatively "low" caseloads and infection rates, a false-negative rate this large is not a huge concern.
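To see roughly how much the picture changes at high prevalence, here is a hedged sketch. Because P(N) itself depends on how many people are infected, this version builds P(N) from the law of total probability rather than from historical test counts. It assumes that someone without COVID always tests negative (a perfect true-negative rate), which is an assumption added for this illustration and not a figure from the article.

```python
def p_covid_given_negative(prevalence, false_negative_rate, true_negative_rate=1.0):
    """P(C|N), with P(N) built from the law of total probability.

    true_negative_rate = P(N | no COVID); the default of 1.0 is an assumption.
    """
    negative_and_covid = prevalence * false_negative_rate
    negative_and_healthy = (1 - prevalence) * true_negative_rate
    return negative_and_covid / (negative_and_covid + negative_and_healthy)


low_prevalence = p_covid_given_negative(0.025, 0.38)
high_prevalence = p_covid_given_negative(0.80, 0.38)

print(f"Prevalence 2.5%: {low_prevalence:.1%}")
print(f"Prevalence 80%:  {high_prevalence:.1%}")
```

Under these assumptions a negative result at 2.5% prevalence still leaves only about a 1% chance of having COVID, while at 80% prevalence it leaves about a 60% chance, which is why the false-negative rate would become a real concern in that scenario.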

Wednesday, 1 August 2018

Earnings by College Major

Source: http://digg.com/2018/college-degrees-highest-salaries-visualized

Trying to figure out what your major should be?

I'll just leave this here.

An interesting follow-up, though, would be to see what the graduate-to-job-opening ratio looks like in each field!

Thursday, 12 July 2018

The Impact of Changing House Prices on GDP in BC

Source: https://www.armstrongeconomics.com/markets-by-sector/real_estate/real-estate-in-decline/

Yesterday (11th of July 2018) the Bank of Canada continued to increase interest rates, as many expected.

Since the increase in the interest rate, media coverage has been flooded with conversation around the impact this will have on homeowners. Specifically, it is well documented that Canadian households currently have a pile of debt and will have trouble continuing to service their debt if their payments or obligations increase. You can read a Bank of Canada article on the subject here.

Building off of these discussions, although quite separate, I began to wonder. Here in BC the Finance, Insurance and Real Estate (FIRE) industry makes up essentially 25% of our provincial GDP.

As governments continue to engage in policies which aim to make housing more affordable (decrease or slow price growth) and as the Bank of Canada continues its upward movement of interest rates (decreasing the demand and supply of real estate), we have some serious headwinds on house price growth. The question of interest then: given the size of the FIRE industry in BC, for some change in the house price, how does this filter through to impact our provincial level of output?

To answer this I conducted a simple time-series analysis which allowed me to jointly model both house prices (Teranet-National Bank Composite House Price Index for BC) and provincial GDP (Statistics Canada).

In order to ensure stationarity, these variables have a log-difference transformation applied to them, giving them the interpretation of the annual percent change. Each can be viewed independently below:
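For readers unfamiliar with the log-difference transformation, here is a minimal sketch of what it does to an annual series. The index values are hypothetical placeholders, not the actual Teranet or Statistics Canada data.

```python
import math


def log_difference(series):
    """Difference of natural logs between consecutive observations.

    For annual data this approximates the annual percent change (as a fraction).
    """
    return [math.log(curr) - math.log(prev) for prev, curr in zip(series, series[1:])]


# Hypothetical annual index levels, for illustration only.
hpi_levels = [100.0, 104.0, 110.0]
hpi_growth = log_difference(hpi_levels)

print([f"{g:.2%}" for g in hpi_growth])
```

The transformed series is one observation shorter than the original, and each value is close to the ordinary percent change when the changes are small.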

With these variables, I then apply a one standard deviation shock to the transformed House Price Index (HPI), which works out to be about a 4.6 percentage point annual change in the index. Observing how this shock filters through both the HPI and GDP over time, we see its impact, presented below.

First, we evaluate the impact of a 4.6 percentage point shock to the House Price Index (HPI) on itself. What we witness is no big surprise: the housing price index jumps in the shocked year (year 1) and then slowly returns to its normal level. At a 95% confidence level, this shock to house prices has been fully absorbed within two and a half years.

Recall we are dealing with growth rates here. Imagine the HPI is doing its thing, then, out of the blue, it jumps by 4.6 percentage points. The effect is an immediate increase in the index, followed by two and a half years of additional (but slowing) growth before returning to its pre-shock level of growth.

Looking at the impact of a shock to the HPI on GDP we witness an impact which was expected. Our shock happens in year one, however, this does not filter through to impact our level of GDP until the second year. At this point, the GDP jumps to an estimated increase of 2 percentage points (fairly large given average growth rates of GDP). However, this impact quickly subsides and is showing no statistical effect 2 and a half years after the shock.

Through this, we can determine the elasticity of GDP with respect to the HPI (for some % change in HPI, what is the impact on the % change in GDP). The elasticity works out to 0.435 (a 2 percentage point response in GDP to a 4.6 percentage point shock in the HPI), meaning that GDP is not overly sensitive to a change in the HPI; that is, GDP is inelastic. Put another way, for a 1 percentage point change in the HPI, we would expect to witness a 0.435 percentage point change in GDP.

So, if we do see a collapse of house prices, this may filter into a bad few years for the BC economy. Keep in mind, in 2014 when oil prices collapsed, causing Alberta's GDP to collapse, oil and gas (with support services) accounted for approximately 8% of Alberta's GDP. Given BC's reliance on the FIRE industry (25% of GDP), a collapse in the price of real estate could very well have a major impact on our provincial economy.

What are your thoughts on this? Feel free to comment below.
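The elasticity figure can be reproduced from the two numbers reported above, assuming it is simply the peak GDP response divided by the size of the HPI shock (that reading is an assumption, though it matches the 0.435 quoted):

```python
hpi_shock_pp = 4.6      # one standard deviation shock to HPI growth, in percentage points
gdp_response_pp = 2.0   # estimated peak response of GDP growth, in percentage points

elasticity = gdp_response_pp / hpi_shock_pp
print(f"Elasticity of GDP with respect to HPI: {elasticity:.3f}")

# Implied GDP response to a hypothetical 10 percentage point fall in HPI growth.
print(f"GDP response to a -10 pp HPI change: {elasticity * -10:.2f} pp")
```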

Thursday, 25 January 2018

Cost of red lights on Vancouver Island

I, like many residents of Victoria, frequently make the trip up island to visit the rugged beauty and outdoors of the north island. Like many, I find myself increasingly frustrated with the traffic lights on the highway.

For those not familiar with Vancouver Island, there is one main artery, a highway, running from Victoria north to Nanaimo. At Nanaimo, this highway (Hwy 1) enters the city and runs to the ferry terminal before continuing across the strait to Vancouver. Those of us who want to continue further north than Nanaimo more or less stay on the same highway, although it changes names to become Hwy 19.

Although this is a highway, stretching approximately 130 km from the CRD to Parksville and is the main route to travel North/South along the island, it is littered with traffic lights continually stopping traffic and creating congestion along the route.

Finally, on a recent trip, I watched to my horror as I was stopped an astounding 23 times during this stretch for a red light - I figured (as many people say) that I hit every single red light along the route.

Well, in fact, I did not. There are 42 traffic lights along the route between the Goldstream turn off (start of true Hwy at the edge of the city) and the Parksville turn off (where oddly enough, fewer people drive but they have done away with lights in favour of overpasses...). That is, I was only stopped by about 55% of the lights.

Regardless this had me thinking about the social cost of these traffic lights due to idling and additional fuel usage through acceleration.

The first thing to determine was the average idling cost. Now there are many different types of vehicle on the road, so utilizing information from Natural Resources Canada to determine the % of vehicles by class (here) - as well as idling information from the US (here) - I attempted to link idling information up to vehicle class and determine a weighted average of fuel used while idling. This worked out to a low 0.01884 L/minute, or 0.000314 L/second.

With a loose estimate of idling usage, at a cost of gas currently at $1.36/L this works out to a cost of:

$0.0256/minute or $0.000427/second ... does not seem too horrendous.

The next task was to determine how much fuel is used every time we need to accelerate back up to speed. Casually googling this information yielded that acceleration can increase fuel consumption by 10-30% ... to pick the middle path, I chose to utilize an increase of 20%.

If acceleration causes an increase in fuel usage of 20%, I needed to figure out what base fuel usage is. Utilizing the above data, I worked out that the average fuel usage could be expected to be around 11.34 L/100km (remember we have everything from small sub-compacts to semi trucks driving the road). This yields an average cost of $15.42/100km.

Assuming this 20% increase in fuel usage, and further assuming that it takes us a full km to get back up to highway speeds from a full stop, this gives us a fuel cost of $0.1851 for that kilometre of acceleration each time we have to stop.

So we have our cost of idling, and we have our cost of accelerating. Anecdotally I find that on average if I am stopped, I am stopped for at least 20 seconds. Extrapolating our previously calculated amounts, this gives me an average cost per red light of $0.19364 - just under $0.20 each time we have to stop.

At this point, the whole task seems rather trivial. Even if I was stopped at every red light (42) that would only be an extra fuel cost to me of $8.13.
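The per-stop arithmetic above can be retraced in a few lines. This sketch simply follows the steps as described (idling cost for 20 seconds, plus one kilometre driven at 120% of base fuel use); the variable names are illustrative, and small differences from the figures in the text come from rounding at intermediate steps.

```python
gas_price = 1.36             # $/L
idle_use_l_per_min = 0.01884 # weighted-average idling fuel use, L/minute
base_use_l_per_100km = 11.34 # weighted-average fuel use, L/100km
accel_penalty = 0.20         # assumed 20% extra fuel while accelerating
idle_seconds = 20            # anecdotal time stopped per red light
accel_distance_km = 1.0      # assumed distance to get back up to speed

idle_cost_per_second = idle_use_l_per_min * gas_price / 60
cost_per_km = base_use_l_per_100km * gas_price / 100

# As in the text: the full cost of the acceleration kilometre at 120% fuel use.
accel_cost = cost_per_km * (1 + accel_penalty) * accel_distance_km
cost_per_stop = accel_cost + idle_seconds * idle_cost_per_second

print(f"Idle cost: ${idle_cost_per_second:.6f}/second")
print(f"Acceleration cost: ${accel_cost:.4f} per stop")
print(f"Total cost per red light: ${cost_per_stop:.4f}")
print(f"All 42 lights: ${42 * cost_per_stop:.2f}")
```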

So turning to the ministry of transportation I collected their traffic volume data for this highway over the stretch of interest (here). I found that on average 26,728.39 vehicles are traveling this stretch on any given day. Now things begin to add up ... but clearly, they are not all stopped at all red lights!

If we suppose that a driver has a 20% chance of being stopped by any given traffic light, we can use a binomial distribution to determine how many times these 26,728.39 cars are stopped over their travels.

We find based off this that on an average trip we will be stopped at 10.53 lights (which seems about right from my experience). Further - my event of being stopped 23 times does not even register as a likely outcome!
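To illustrate just how unlikely 23 stops is under a binomial model, here is a small sketch using only the standard library. It takes the 42 lights and the 20% stop probability stated above and treats each light as independent, which is the assumption the binomial model itself makes.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k stops out of n lights, each with stop probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_lights = 42
p_stop = 0.20

# Probability of being stopped at 23 or more of the 42 lights.
p_23_or_more = sum(binomial_pmf(k, n_lights, p_stop) for k in range(23, n_lights + 1))

print(f"P(23 or more stops) = {p_23_or_more:.2e}")
```

The probability comes out far below one in ten thousand, so a trip like that one really is an outlier under this model.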

From this distribution, we determine that on any given day vehicles as a whole stop, on average, 281,478 times over this distance.

That is, at an average cost of $0.19 per stop, we have a daily fuel cost of $55,102.04 ... no longer a trivial amount! Extrapolating this out for a full year (365 days), we have an annual fuel cost resulting from these traffic lights of $20,112,244.60 - that is $20.11 million a year at present fuel prices.

Add on to this environmental impacts (from burning all this extra fuel) as well as extra transportation time for shipping companies etc. and these red lights turn out to be quite an expensive toll on society!

What are your thoughts on these traffic lights? Feel free to comment below.

Update May 2018: the gas price has increased from $1.36 to $1.55. Due to this spike in gas prices, the estimated annual cost due to traffic lights has increased to $22,922,044.

Thursday, 20 July 2017

Changes to enrollment priorities within SD 61.

Image Source: https://www.sd61.bc.ca/

A while back now, in early June 2017, the Greater Victoria School District (SD 61) issued a press release on the results of a survey they had sent out, as well as the resulting policy change: a change in enrollment priorities within SD 61.

At the time I had a few parents bring this report to my attention, asking me to look into the survey results, as they felt that the released results were misleading, or rather that the proposed policy changes did not appear to line up with the opinions of most parents (in their view).

My first thought was "Everything is probably legitimate and it's great to see some evidence-based decision making on the part of the school board" ... but within minutes of reading the press release, I began to wonder if it was the result of a rushed ad hoc job on the part of the school district, or if they were purposely trying to twist the evidence to support a pre-determined policy. Let's hope that it was the former and nothing sinister is at work here.

The full press release can be found here for those interested.

Early in the results, SD 61 claims that their survey had a 70% response rate. (starting page 12 of this document)

3450 to the parent survey
418 to the student survey

Yielding a total of 3868 respondents. A 70% response rate, pretty good! (Mind you, there is no background on the methodology for how these surveys were distributed.)

The next bit is the geographical distribution of the respondents to the parent survey, I have created a little table (below) to demonstrate the breakdown including the relative frequencies as well as comparing these relative frequencies to the relative frequency of the population in each municipality and school aged children. Keep in mind, given limited access to data (and time on my part ... this is only a blog!) we are looking at different years for each of these points -- but we should expect the relative frequencies to stay relatively constant over this short time period:

The first thing we should notice is that despite the earlier claim of having 3450 parent respondents, we only have a geographical breakdown of 3168 surveys, meaning 282 (8.2%) of the respondents were not included or were dropped for whatever reason.

The second thing we should notice is the large discrepancy between the relative frequencies of the municipalities - specifically, Saanich is grossly underrepresented while Oak Bay is grossly overrepresented. (The two yellow rows are identified as such because there are no true relative frequencies to compare these to.)

Here is a bar chart of the above table -- because pictures are nice too.

So, page one (truthfully page 13 if you're following along) of the report so far - some questions raised, but perhaps nothing too misleading so far -- Let's explore the reported results of the survey.

The first thing to note is that these reported results are only from the parent survey, the details of the student survey are not presented. Again pay attention to the numbers here.

The first question is along the lines of enrollment priorities. Specifically, should siblings have priority (in order to keep siblings together at the same school), or should catchment students have priority, meaning siblings may be split between schools?

The results (as presented in the report) are as follows:

Based off of this - it seems as if a strong majority of respondents (almost 61%) support catchment school enrollment over keeping siblings together. But pay attention to the total respondents ... only 2971. Turns out 469 respondents "skipped" this question for one reason or another.

The big question then - is it important that 469 respondents skipped this question? Should these 469 respondents be dropped from the results? Or should we include these 469 as perhaps a "no opinion" category? Let's include these skipped responses and see how our relative frequencies change.

If we include these skipped responses -- now only 52.65% of respondents support catchment over siblings ... not such a loud statement anymore! Additionally, with the 469 skipped included, we only have 3440 respondents. What happened to the other 10 respondents?

Sadly, as we move through the other responses we see a similar trend. We have 469 respondents who decided to skip a question, and were just completely omitted as a result.
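The effect of the denominator can be sketched as follows. The count of 1811 respondents favouring catchment priority is back-calculated from the roughly 61% and 52.65% figures above and is an assumption for this illustration; it is not copied from the report's table.

```python
def share(count, total):
    """Relative frequency of `count` out of `total` responses."""
    return count / total


answered = 2971
skipped = 469
support_catchment = 1811  # assumed: back-calculated from the reported percentages

excluding_skipped = share(support_catchment, answered)
including_skipped = share(support_catchment, answered + skipped)

print(f"Skipped excluded: {excluding_skipped:.2%}")
print(f"Skipped included: {including_skipped:.2%}")
```

The same count of supporters drops from almost 61% to under 53% purely because of which respondents are kept in the denominator.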

We see the same situation here - The question being, who should have class priority each year, non-catchment returning students or new catchment students -- the reported results (omitting the skipped) say that almost 58% support returning non-catchment students. If we include the 469 who skipped the question, less than 50% support returning non-catchment students having priority, a sudden change in results if we are interested in majority support.

Again, total responses are only 3440, not the 3450 stated as total respondents. 10 are still missing.

The final reported question asks what should happen when students finish at their current school and are set to transfer to their middle or high school. Specifically, should out-of-catchment students simply follow the school path (certain elementary schools feed into specific middle schools etc.), or should these students be required to go to their catchment school and have to apply to follow the school path?

The interesting part of this question is that we only have 2914 responses, with no information about skipped responses or any hint about the other 536 omitted respondents. Do we assume that 536 (15.5%) of respondents skipped this question? Or did these 536 spoil their response, circling both answers? Unfortunately, we have no insight, only that 2914 answered this question as opposed to the 2971 who answered the prior two.

Sadly, it is reports and lazy statistics like this which are supporting government policy. The school board has already met and revised enrollment priorities based off of these results.

The biggest question I have is this: if there are so many errors in this 'polished' press release, can we trust any of the methodology or process used in determining any of these results? Remember, all that is being presented in the above is summary statistics - we have to trust that the individuals who put this report together properly sorted, compiled and calculated these statistics. Given the issues I have just raised, I have my doubts.
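The respondent counts quoted throughout this post can be tallied in one place. This sketch uses only the numbers reported above; the variable names are mine.

```python
parent_respondents = 3450
student_respondents = 418
geographic_breakdown = 3168
answered_first_two = 2971
skipped_first_two = 469
answered_final = 2914

total_respondents = parent_respondents + student_respondents
missing_from_geography = parent_respondents - geographic_breakdown
missing_from_first_two = parent_respondents - (answered_first_two + skipped_first_two)
missing_from_final = parent_respondents - answered_final

print(f"Total respondents: {total_respondents}")
print(f"Missing from geographic breakdown: {missing_from_geography} "
      f"({missing_from_geography / parent_respondents:.1%})")
print(f"Unaccounted for in first two questions: {missing_from_first_two}")
print(f"Missing from final question: {missing_from_final} "
      f"({missing_from_final / parent_respondents:.1%})")
```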

What are your thoughts on this? Feel free to comment below.

Tuesday, 9 May 2017

A short TED talk I found interesting

First, you may have noticed that posts have slowed down as of late. For those who have been regularly reading and checking back - this is not a new long-run trend. I have found myself in a busy semester, lacking the time to regularly develop and post new articles. That being said, expect a few short posts, like this, and then a full ramp up again once we hit mid-June or July.

Onto the video:
Mona Chalabi: 3 ways to spot a bad statistic
https://go.ted.com/CyJb

I found this particularly relevant given all the media attention and criticism (even rejection) of statistics and data/evidence-based decision making.

Some alarming discussion along these lines can be found with a quick google search, and with the recent deletion of some EPA data sets in the United States (Climate change can't be happening if we can't prove it?):

http://www.latimes.com/business/hiltzik/la-fi-hiltzik-epa-climate-20170501-story.html

What are your thoughts on these? Feel free to comment below.
