Wednesday, February 8, 2012

2011 Exam Results


If you sat for an upper-level CAS exam in 2011, the probability that you’ll be sitting for that exam again in 2012 is 73.1%. Out of 3,195 exams administered in 2011, only 858 met the Exam Committee’s standards for a minimum qualified candidate. With the exception of the half exams 5A and 5B, every exam’s pass ratio was below the 12-year average (2000-2011). Any way you slice it, these results are frustrating, but are they as surprising as they appear at first glance?
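The 73.1% figure is simply the complement of the 2011 pass ratio. Here is a quick check using the two totals quoted above. It treats each sitting as one exam, so a candidate who sat two exams counts twice:

```python
# 2011 upper-level CAS totals quoted above
exams_administered = 3195
exams_passed = 858

pass_ratio = exams_passed / exams_administered  # share of sittings that passed
repeat_probability = 1 - pass_ratio             # share that will need to sit again

print(f"Pass ratio: {pass_ratio:.1%}")               # → Pass ratio: 26.9%
print(f"Repeat probability: {repeat_probability:.1%}")  # → Repeat probability: 73.1%
```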

The Exam Committee recently published commentary on the unusually poor 2011 pass ratios (linked in the references at the end of this post). I was somewhat surprised by what the Exam Committee chose to say. In my view, there are two major questions to be answered:

1. Are 2011 pass ratios really as unusual as they appear? Are they consistent with prior years, or are they outliers?
2. If the 2011 pass ratios are significantly lower than usual, what is driving the change?

The article by the Exam Committee doesn’t really address either of these questions, except insofar as they declare that they consider all of the 2011 exam results “outliers” and discuss grading and pass-mark-setting methods as possible drivers. In other words, they’re on the defensive. Their article is mostly a justification of their grading policy rather than a considered analysis of these seemingly unusual results.

The article strikes me as statistically a bit sketchy, as well. For instance, why even bother to address the pass rate on Exam 7 when only nine people sat for it? Compared to a typical year with around 1,000 exam-takers, it’s hardly a surprise that results from a population of nine appear anomalous. Throughout the article, quantification is generally lacking.

To satisfy my own curiosity, then, I crunched some numbers. I looked at the number of exams taken, number of exams passed, raw pass ratio, and effective pass ratio from 2000 through 2011 for each exam. I mapped new Exam 8 to old Exam 9, new Exam 9 to old Exam 8, new Exam 6-US/CAN to old Exam 7-US/CAN, new Exam 7 to old Exam 6, and new Exam 5 to old Exam 5. I also mapped new Exam 5A (the ratemaking half of Exam 5) to old Exam 5 for comparison purposes. Unless otherwise specified, I included all years 2000 through 2011 in the results discussed below, with the exception of new Exam 7 for which credible data is not available due to the extremely low number of test-takers in 2011.
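The exam mapping above can be kept as a simple lookup so that each 2011 result is lined up against the right historical series. This is a minimal sketch: `NEW_TO_OLD` and `comparison_series` are hypothetical names, and the history values in the usage line are placeholders, not CAS data.

```python
# New-structure exam -> old-structure exam whose history it is compared against,
# following the mapping described in the text.
NEW_TO_OLD = {
    "5": "5",
    "5A": "5",        # ratemaking half, compared with old Exam 5
    "6-US": "7-US",
    "6-CAN": "7-CAN",
    "7": "6",         # mapped, but its 2011 data (nine candidates) is not credible
    "8": "9",
    "9": "8",
}


def comparison_series(new_exam, old_history):
    """Return the historical pass ratios of the old exam a new exam maps to."""
    return old_history[NEW_TO_OLD[new_exam]]


# Hypothetical history (NOT actual CAS figures): old exam -> pass ratios by year.
history = {"9": [0.45, 0.41, 0.48], "8": [0.39, 0.44, 0.42]}
print(comparison_series("8", history))  # new Exam 8 is compared against old Exam 9
```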

Eventually I will put all of the metrics I looked into on this site, but for now here are the most interesting highlights:



All full-length exams had pass ratios below the average.

2011 was objectively a below-average year, with the notable exception of the two half-exams, 5A (ratemaking) and 5B (reserving), which, compared to the old Exam 5 and the old Exam 6, respectively, showed extraordinarily high pass rates.

Three of the full-length exam pass ratios are statistical outliers.

Among the full-length exams, the pass ratios for new Exams 8, 6-US, and 5 are the only ones that fall more than two standard deviations from the mean. Using Chauvenet’s criterion, these three also qualify as statistical outliers. Notably, the results for the half-exams 5A and 5B are also statistical outliers, but on the high end.
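Both tests can be sketched in a few lines. This assumes the textbook form of Chauvenet’s criterion (pass ratios roughly normal; mean and sample standard deviation taken over the whole series, 2011 included). The post doesn’t say exactly how the mean and deviation were computed, and the ratios below are made up for illustration:

```python
import math
import statistics


def outlier_checks(values, candidate):
    """Flag `candidate` with a two-sigma rule and with Chauvenet's criterion.

    `values` is the whole series (candidate included); mean and sample
    standard deviation are taken over all of it, as in the textbook form.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    z = (candidate - mean) / sd
    # Two-sided normal tail probability of a deviation at least this large
    tail = math.erfc(abs(z) / math.sqrt(2))
    return {
        "z": z,
        "beyond_two_sd": abs(z) > 2,
        # Chauvenet: reject if fewer than half a point this extreme is expected
        "chauvenet_outlier": n * tail < 0.5,
    }


# Hypothetical pass ratios for 2000-2011 (NOT actual CAS figures); last value plays 2011.
ratios = [0.40, 0.42, 0.38, 0.41, 0.39, 0.40, 0.43, 0.37, 0.41, 0.39, 0.40, 0.25]
print(outlier_checks(ratios, ratios[-1]))
```

With these made-up numbers the last year sits roughly three standard deviations below the mean, the same kind of result reported for Exam 6-US, and fails both tests. A typical year such as 0.40 passes both.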

New Exam 6-US results were the worst.

Excluding Exam 7, the pass ratio for Exam 6-US was the worst, nearly 3 standard deviations below the mean.

Other Considerations

While the general education structure did not radically change from 2000 to 2010, the CAS did make a significant change to their grading policy starting in 2006. From 2000 to 2005, the Exam Committee targeted a specific pass ratio when setting exam pass marks. From 2006 on, however, they revised this policy in an effort to pass all candidates who demonstrated sufficient knowledge of the material and fail all those who did not. One would expect, therefore, to see greater variability in pass ratios after 2005.

In comparing the actual standard deviations for each exam for years 2000-2005 and 2006-2010 (excluding 2011 due to the radical change in exam material and structure), I found that variability did in fact increase in most cases. Old Exams 5 and 8, however, have actually shown less variation in pass ratio since the grading policy change. Old Exam 9 showed the greatest change, with its standard deviation increasing 120.8%.
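The before-and-after comparison comes down to a percent change in sample standard deviation. Below is a minimal sketch with hypothetical pass ratios for a single exam; `sd_percent_change` is an illustrative name, not anything published by the CAS:

```python
import statistics


def sd_percent_change(before, after):
    """Percent change in the sample standard deviation between two periods."""
    sd_before = statistics.stdev(before)
    sd_after = statistics.stdev(after)
    return (sd_after - sd_before) / sd_before * 100


# Hypothetical pass ratios (NOT actual CAS figures) for one exam.
pre_2006 = [0.40, 0.42, 0.38, 0.41, 0.39, 0.40]  # 2000-2005: targeted pass ratio
post_2005 = [0.35, 0.45, 0.33, 0.47, 0.40]       # 2006-2010: criterion-based pass marks
print(f"{sd_percent_change(pre_2006, post_2005):+.1f}%")
```

For scale, an increase of 120.8% means the later standard deviation is about 2.2 times the earlier one. With only five or six years per period, each standard deviation is itself fairly noisy.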

Since there is a significant difference in variability after 2005, I applied Chauvenet’s criterion to the years 2006-2011 only, to test whether 2011 would still be considered an outlier. The 2011 results for Exam 6-US remained an outlier, as did the results for the full Exam 5. All other results, however, no longer qualify as outliers.
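Chauvenet’s cutoff depends on the sample size, so narrowing the window changes the bar as well as the data. Below is a small sketch, assuming the standard normal-tail form of the criterion, that solves for the cutoff by bisection:

```python
import math


def chauvenet_critical_z(n, tol=1e-10):
    """Smallest |z| that Chauvenet's criterion flags in a sample of size n.

    Solves n * erfc(z / sqrt(2)) = 0.5 for z by bisection.
    """
    target = 0.5 / n  # two-sided tail probability at the cutoff
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.erfc(mid / math.sqrt(2)) > target:
            lo = mid  # tail still too heavy: the cutoff lies further out
        else:
            hi = mid
    return (lo + hi) / 2


for n in (6, 12):  # 2006-2011 versus 2000-2011
    print(n, round(chauvenet_critical_z(n), 2))
```

The cutoff drops from about 2.04 standard deviations for twelve years to about 1.73 for six. The six-year test is therefore slightly easier to fail, so exams dropping out of the outlier column has to come from the wider post-2005 spread (or a shifted mean), not from a more lenient test.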

Data Analysis Summary

While the results of the 2011 sittings are irregular, once we account for non-credible data (new Exam 7) and the change in grading policy in 2006, the label “outlier” becomes questionable for most exams. 2011 had the lowest number and percentage of passing candidates in the last twelve years, but that in and of itself is not enough to make it a statistical outlier. After all, some year has to be the lowest (and let’s hope 2011 was it). In my opinion, the only exam that looks like a real outlier is Exam 6-US.

Primary Drivers

Now for the second key question: what factors drove the unusual exam results in 2011? First we need to consider, a priori, what we expected to see in 2011. Three factors stand out in my mind as most relevant: who is taking the exam, how much the exam has changed from prior exams, and how long the exam is.

In the section on Exam 5, the Exam Committee’s article uses the big difference in candidate performance between the full exam and the two half exams to support the thesis that overall performance varies widely between groups of candidates. Personally I don’t see why a belief that candidate quality would vary significantly over time is controversial. Consider, for instance, that the number of candidates has increased over time, and the average quality of an elite group nearly always decreases as the group grows.

We could test this theory if we could assign candidates to “classes.” Unfortunately, I don’t have sufficient data to explore this question properly. I did, however, look at the pass ratios by exam by year to see whether performance tended to move in the same direction for all exams in a given year. The pass ratios do appear to be correlated.
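One way to put a number on “moving in the same direction” is to average the year-over-year correlation across every pair of exams. The post doesn’t say which measure was used, so Pearson correlation here is an assumption, and the series below are hypothetical:

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation between two equal-length yearly series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_pairwise_correlation(series_by_exam):
    """Average correlation over every pair of exams' pass-ratio series."""
    pairs = list(combinations(series_by_exam.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical pass ratios by year (NOT actual CAS figures), same years for each exam.
by_exam = {
    "5": [0.40, 0.36, 0.44, 0.38],
    "6": [0.42, 0.37, 0.45, 0.39],
    "8": [0.39, 0.35, 0.41, 0.40],
}
print(round(mean_pairwise_correlation(by_exam), 2))
```

A clearly positive average suggests that good and bad years tend to hit all exams at once. With about a dozen years of data, though, a single correlation can swing quite a bit.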

I’ve heard varying opinions as to whether having previously taken a particular exam makes you more or less likely to succeed on subsequent attempts. I’ve never seen data to support one view or the other, but my guess is that a candidate who fails on her first attempt at one exam is likely to fail on her first attempt at another, and that her probability of passing on the second attempt is about the same as the general population.

Finally, while I don’t have data by “attempt number,” we can deduce some useful information about the 2011 exam-takers due to the change in examination structure. Here’s what we can guess:

Exam 5 (full exam) – This is either your first time taking an upper-level exam, or you previously failed old Exam 6 (and never took old Exam 5), or you previously failed both old Exam 6 and old Exam 5. In any case, your performance is likely to be below average, especially since the repeat-sitting advantage is diminished when the exam changes significantly, as Exam 5 did. In addition, this exam is four hours long. Expect poorer than average results.

Exam 5A (ratemaking half) – You previously passed old Exam 6, and either previously failed old Exam 5, or never took it before. Either way, you’ve previously succeeded on an upper-level exam. The exam has changed from the prior, but it’s only two hours long. Expect better than average results.

Exam 5B (reserving half) – You previously passed old Exam 5, but either failed or didn’t sit for old Exam 6. Since 5B is essentially a much easier version of old Exam 6, even failing 6 before would likely be an advantage. It’s also only two hours long. Expect better than average results.

Exam 6-US/CAN – This could be your first upper-level exam, or you failed old Exam 7-US/CAN. The syllabus changed significantly, as it has several times in the past, but otherwise there is no particular reason to consider it more difficult than usual or the candidates any less prepared. The exam is four hours long. Expect average results.

Exam 7 – The nine candidates who sat for this exam must have previously failed (or not taken) old Exam 6, and they are either skipping Exam 5B or attempting to take both Exam 7 and Exam 5B in the same sitting. The latter is much more likely. Tackling six hours of exams in a single sitting (four hours for Exam 7 plus two for Exam 5B) is probably not the best idea. Additionally, the syllabus changed significantly. Expect poorer than average results.

Exam 8 – Either this is your first attempt or it’s been 1.5 years or more since you sat for old Exam 9, reducing the advantage of a repeat-sitting. The exam is shorter than in the past at three hours long and contains mostly the same material, which should make it easier than some of the others. [CORRECTION: I was mistaken about the order of the exams under the old structure. It would have been just one year between sittings, not 1.5, and therefore you would expect average results, not poor results.] Expect poorer than average or average results.

Exam 9 – Either this is your first attempt or it’s been just six months since you sat for old Exam 8. This could be an advantage, and in any case isn’t likely to be a detriment. The exam is also shorter now at three hours long and contains roughly the same material. [CORRECTION: I was mistaken about the order of the exams under the old structure. It would have been one year between sittings, not six months, and therefore you would expect average results, not better-than-average results.] Expect average or better results.

Expectation versus Reality

Comparing the actual results to our predictions makes some things clear and others more mysterious. The unusually high pass ratios for Exams 5A and 5B make intuitive sense, as do the low pass ratios for Exam 5 and Exam 7. But Exam 6, the biggest outlier with the lowest pass ratio, wasn’t expected to change all that much. Exams 8 and 9 also had lower pass ratios than one might have expected, even if they aren’t clear outliers. What’s going on?

I can only offer three theories:

1. The CAS’s efforts to reduce travel time by offering lower exams more frequently have resulted in candidates sitting for upper exams with less work experience under their belts. Personally I’ve often wondered how I could possibly learn everything I need to know for an upper-level exam if the material were 100% new to me. More work experience not only gives you more exposure to the concepts that turn up on exams, it also gives you anecdotes to draw upon when answering exam questions. This is a significant, perhaps indispensable, advantage.

2. Candidates have come to rely almost exclusively on secondary materials created by exam prep courses rather than the actual exam papers. Because there are relatively few courses to choose from, many candidates are taking the same course and will therefore have similar strengths and weaknesses. If a popular prep course’s curriculum turns out to be inadequate, more candidates will fail than would if everyone studied independently. This could explain the poor results for Exam 6, as I considered the accounting materials from TIA, a popular prep course, woefully inadequate.

3. The upper exams are getting harder. They’ve been telling us this was coming for a long time. Maybe it’s finally here. The days of straightforward questions and regurgitation of lists are behind us. Only that doesn’t mean you get to do less memorizing. On the contrary, you have to do everything you did before and then a little more to answer today’s more conceptually challenging, application-based exam questions.

The only other significant change I'm aware of in terms of the exams themselves is the new method used to determine exam length and difficulty. An article in the most recent issue of Future Fellows describes the new practice. In short, recent Fellows who performed extremely well on their exams are being used to test-drive exams. Given the considerable and consistent complaints about exam length these past two sittings, it's possible that these former candidates are harsher judges than the subject matter experts who used to be the primary determiners of exam length.

What to Do about It

If you moved quickly through the lower exams, it will help to recognize that you’re at a disadvantage in terms of work experience. It should also help to remember that the upper exams are not like the lower exams. They require more independent work. I can’t speak for Exams 8 and 9, but for 5 through 7 I found that reading the actual papers – several times – was key. You can’t rely on someone else’s interpretation of the syllabus or the readings, not when the stakes are so high and the exams keep changing. Unfortunately, though, that’s really it. There’s no magic trick that will guarantee you pass every time.

Take advantage of what influence you do have with the Exam Committee, whether it’s through your friendly neighborhood Candidate Representative or exam surveys or other means. Be sure to be professional, however. Even a valid point will be ignored if presented inappropriately. The CAS keeps a close eye on Actuarial Outpost and other forums, so be mindful of that as well.

If you find this conclusion unsatisfying, we’re in perfect agreement. It really amounts to “Well, that’s how it is,” “Just keep swimming,” etc. Still, knowledge is power, even when it’s knowledge that you’re powerless, right? Right.

Best of luck to everyone! Here’s to a better 2012.

References:
Data on historical pass ratios by exam: http://casact.org/admissions/examstatsum.pdf
Exam Committee commentary on 2011 pass ratios: http://casact.org/admissions/roth_PassRatio.pdf
Exam Committee commentary on pass ratios for full versus partial exams: http://www.casact.org/cms/index.cfm?fa=viewArticle&articleID=1611

Also a note of caution when looking for answers on the CAS web site: I came across a number of articles related to exam policies on the site that were clearly outdated. Pay attention to the date material was published. When it comes to exam questions, I wouldn't give too much credence to anything published before 2006. Note also that the idea of a target pass ratio of 40% and a target pass mark of 70% comes from a wish expressed by the Board at one point, not from the Exam Committee. It is not the official practice of the Exam Committee to target 40% of exam-takers scoring 70% or higher on each exam.

2 comments:

  1. Very insightful! I gotta say wow didn't know one can do so much with some boring passmark data.
    At the same time I'd better start studying very hard for exam 6,7,8.

  2. I'm convinced it's less about studying hard than about studying smart. There was always something different about the upper-level exams; Bloom's has only served to augment that difference. Hopefully I'll soon write down my thoughts on exam-taking in a post-Bloom's Society.
