Monday, August 5, 2013

Geometry in Motion

On a visit to the Exploratorium this weekend, we were hanging out near the harmonograph, watching kids make really cool pictures, when the Cinema Arts Saturday Cinema program started just across the way. Since the program for this week was titled Geometry in Motion, I told the people I was with that I was going to watch. We all enjoyed the program, and each had a different favorite film.  Since most of the five films shown are available to watch for free online, I thought I'd share them with you. 

The first was Symmetry from the 1961 IBM Mathematics Peep Shows by Ray and Charles Eames. This one is available in a free iPad app from IBM, but unfortunately I haven't been able to find it elsewhere. It is a very short introduction to formal definitions of symmetry, defining the degree of symmetries of an object as the number of different ways it can be placed in a box that fits it exactly. This made me really want a dodecahedron box, and a dodecahedron to fit in it exactly! I'm not sure I have the construction skills and equipment to make that happen, though. I was also entertained by the cameo appearance of group multiplication tables at the end of this very short film; the reference without explanation felt similar to the jokes in kids' movies that are really aimed at parents. 

The second film, Aesthetic Species Maps, is available to watch online from its creator, David C. Montgomery. This film set the tone for the rest of the program, which consisted of basically wordless art films with strong ties to geometry. Each segment was assembled from still images of many specimens of the same type of plant or animal, photographed in approximately the same orientation. For me the first segment was the most interesting, because you could clearly see the variation in the degree of symmetry between specimens. 

The next film we saw I can't find anywhere online. It was Eights by Seth Olitzky, and it used computer animation to play with symmetrical figures in a way reminiscent of kaleidoscopes or screensavers. I stared at a lot of screensavers around the time the film was made, though, and it was more interesting than any of them. 

After that we watched a more modern take on computer animated geometry shorts, Nature by Numbers. Etérea Estudios, the creator of this film, has a webpage for it with not just the movie, but also still images from it and a page explaining the mathematics behind it. I'd seen a lot of the material towards the beginning of the video before, whether as the countless Fibonacci spirals that had filled my notebooks since we learned to make them in 6th grade, or more recently in this series of videos about being a plant by Vi Hart. At the end, though, I had a big surprise, as a Voronoi tessellation turned a grid of sunflower seeds into a dragonfly's wing. This was my favorite film, so I'll embed it here, too, though if it makes you curious about things, Etérea's website is a great place to start.



The final film, Rectangle & Rectangles by René Jodoin, is available from the National Film Board of Canada. This one didn't directly explore a mathematical concept as much as some of the others, but it was one of the more fascinating films to watch and is even better to watch online.  It uses strobe effects and a variety of growing and shrinking rectangles to create an interesting and somewhat confusing visual experience. The Exploratorium loves to create a "Why is this happening?" moment with its exhibits, so I can definitely see why they would show this film! In the theater, I did a lot of fast blinking to try to understand the technique behind the visual effects. I'd love to have a frame-by-frame replay option for this, but the freedom to pause and rewind at will makes the streaming version nearly as good. 

Thus ended the film series, and thus ends this post, although we saw plenty of other fascinating things at the museum. I hope you enjoyed these films, and I hope you get a chance to visit the Exploratorium and see some of the other stuff for yourself.


Friday, July 19, 2013

More on statistical significance and small samples

In yesterday's post, I pointed out that for small sample sizes and cases where success is unlikely, standard tests of statistical significance can use even one observation of "success" to reject the null hypothesis. This is jarring, since statistical significance sounds important and official, like it should be more rigorous than our intuitions about what the data says. So what's going on here, and when do we have to worry about it?

We'll work through this with an example.  Suppose we conduct a poll of five wizards and five muggles, and all the wizards and four of the muggles eat at least one piece of chocolate per day, while one of the muggles is on a diet and eats no chocolate at all.

Some Definitions, Applied 

The null hypothesis is the hypothesis that every observation is being drawn from the same distribution, or that the treatment group has the same distribution as the control group. In this case, the null hypothesis might say "Wizards and muggles eat the same amount of chocolate,"  or perhaps "The same proportion of wizards and muggles eat (or don't eat) chocolate." It's good to be precise about what you are measuring; we'll use the second formulation this time.

In the case of our survey about chocolate, we might be tempted to conclude that the null hypothesis is wrong, because more of the wizards eat chocolate.  But first we should decide whether we've just come to the conclusion by chance-- after all, we've only talked to ten people total.  This is where statistical significance comes in.

To determine whether our results are statistically significant, we first must decide how willing we are to reject the null hypothesis when it is actually true. That is, suppose that the Actual Real Truth is that the same percentage of wizards and muggles abstain entirely from chocolate. How willing are we to conclude that they don't? It's common to accept a 5% or 1% chance of rejecting the null hypothesis wrongly, though some disciplines are okay with even a 10% chance. Whatever chance we accept, we'll be looking for statistical significance "at that level", for instance, statistical significance at the 5% level, otherwise known as statistical significance with p < .05.

Checking for Statistical Significance (Through Simulation)

To show that our results are statistically significant at the 5% level, we have to show that if the null hypothesis is true, we would expect to get results as extreme as ours less than 5% of the time, if we repeated our experiment many times.  To do this, we don't actually repeat the experiment many times-- remember, we don't know if the null hypothesis is true in the real world. Instead, we can simulate repeating it many times in a computer world where the null hypothesis is true, or we can use analytical methods to find out what would be the results of carrying out such simulations.

We use what we know in order to make the null hypothesis more precise so that we can carry out our simulations. The null hypothesis that we started with in our example just said that wizards and muggles are equally likely to eat chocolate, but now we have data saying that 9 of our 10 survey respondents consume chocolate. So, assuming the null hypothesis is true, our best estimate is that 90% of people eat at least one piece of chocolate per day and 10% eat no chocolate at all. Since the null hypothesis says there is no consumption difference between wizards and muggles, in our simulation these figures are the same for both groups.

One way to simulate this situation is simply to erase the wizard/muggle labels from our observations and reassign them randomly, so that a random five observations are from "wizards" and the other five are from "muggles". If we simulate this way (also known as sampling without replacement), then in every repetition either the wizards or the muggles will appear to eat more chocolate, to an extent exactly as extreme as in our original sample, because the one abstainer is always either a wizard or a muggle.  We would conclude that our findings are not statistically significant.
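The label-shuffling simulation might look something like this sketch in Python (the group sizes and the 9-eaters-1-abstainer data come from our example; this is illustrative code, not the code linked at the end of the post):

```python
import random

random.seed(0)

# Our pooled observations: 1 = eats chocolate daily, 0 = abstains.
observations = [1] * 9 + [0]

def shuffle_labels(obs):
    """Randomly relabel five observations as 'wizards' and five as 'muggles'."""
    shuffled = obs[:]
    random.shuffle(shuffled)
    return sum(shuffled[:5]), sum(shuffled[5:])  # (wizard eaters, muggle eaters)

trials = 10_000
as_extreme = sum(
    abs(wizards - muggles) >= 1
    for wizards, muggles in (shuffle_labels(observations) for _ in range(trials))
)
print(as_extreme / trials)  # always 1.0: the lone abstainer must land in one group or the other
```

Every shuffle produces a difference at least as extreme as the original sample, which is exactly why the result can't be significant.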

Another way to simulate it is to sample with replacement. This time, we will make five "wizard" observations and five "muggle" observations, each with a 90% chance of eating chocolate and a 10% chance of abstaining.  We will get some cases where exactly one of our observations is an abstainer, as in the original sample, and others where none are, or where several people abstain.  The situation is more complicated than the previous case; much of the time we do get samples showing exactly the same chocolate consumption for wizards and muggles, but over half the time, we don't. Again, our single-observation difference is not considered statistically significant. This agrees with our intuition that we need a larger sample size if we want to find a difference between the populations that was not immediately obvious.
Randomly sampling with replacement 10,000 times, the most common situation is for wizards and muggles to have the same number of abstainers in our sample, but this case makes up fewer than half of our samples. 
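The with-replacement version is a short variation on the same idea, again just a sketch using the 90%/10% split estimated from the pooled data:

```python
import random

random.seed(0)

def count_abstainers(n=5, p_abstain=0.1):
    """Draw n observations under the null hypothesis; count the abstainers."""
    return sum(random.random() < p_abstain for _ in range(n))

trials = 10_000
# How often do the two simulated groups show different numbers of abstainers?
differ = sum(count_abstainers() != count_abstainers() for _ in range(trials))
print(differ / trials)  # a bit over one half
```

Your exact fraction will vary with the random seed, but it should come out just over 50%, matching the "over half the time, we don't" observation above.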

Tweaking the Example

Things get more complicated if we have much more data about one population than the other. For instance, suppose the five wizards we surveyed were all the wizards in the world. Then it is inappropriate to suppose that we could have sampled from a population that includes wizards who do not eat chocolate; all the wizards do eat chocolate. And one of the muggles doesn't. There is no way the null hypothesis could be true now.  This is not just statistical significance, which says that if the null hypothesis is true our results are unlikely. Our study actually disproved the null hypothesis, which is much stronger. (And, practically speaking, almost never the case.)

The example above is a special case of a more general situation in which much more is known about the distribution of one population than about the other. Another example of such a situation is the case in which our study was carried out by wizard researchers who knew very few muggles, so that they hadn't surveyed 5 wizards and 5 muggles, but 5000 wizards and 5 muggles. Let's say they found 50 wizards who did not eat chocolate and, as before, 1 muggle who did not eat chocolate. They can do either of our tests above; let's see what happens.

If they sample without replacement, they're looking at 5005 total observations, of which 51 are chocolate-abstainers. Randomly assigning five of these observations to be "muggles" 10,000 times, in 9,511 cases I got no muggles who were chocolate abstainers. This is (just) over 95% of my samples, so in fewer than 5% of cases, I got a result as extreme as the wizard researchers' original result.[1] From one muggle chocolate-abstainer in their sample, they could conclude statistical significance at the 5% level.
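A sketch of that without-replacement calculation, using the counts above (your random counts will differ slightly from mine):

```python
import random

random.seed(0)

# 5005 observations: 51 chocolate abstainers (0) and 4954 eaters (1).
pool = [0] * 51 + [1] * (5005 - 51)

trials = 10_000
# Randomly pick which five observations are the "muggles" each time.
no_muggle_abstainers = sum(0 not in random.sample(pool, 5) for _ in range(trials))
print(no_muggle_abstainers / trials)  # close to 0.95
```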

Sampling with replacement is technically easier and gives similar results. In 9,526 of my 10,000 repetitions, no muggles were chocolate abstainers. Once again, this would allow the researchers to conclude a statistically significant difference at the 5% level based on the one muggle chocolate abstainer in their original sample.  This doesn't agree so well with our intuitions; although the researchers now have a sample of wizards that seems large enough to determine their chocolate consumption habits with some precision, the sample of muggles still feels awfully small for most purposes.  

A Broader View

By having a lot of information about the proportion of wizards who eat chocolate, the researchers in the last example are able to use very little information about the proportion of muggles who do to conclude that the two populations are different. This isn't all bad, and sometimes such techniques are necessary. But their results, even though statistically significant by both methods described, aren't as solid as that makes them sound. The small size of their sample of muggles makes it especially easy for them to accidentally get a sample that is not representative of the total muggle population by bad sampling methodology or for some other reason.  Imagine polling the next five people you see about some issue, and then imagine polling the next 1000 people you see. Both these polls use the same method to get a non-random sample of the population of your area, but the first poll will likely have respondents that are much less diverse, and a much less representative sample, simply because there are so few of them.  

This example of researchers who have an easier time finding subjects in one group they're studying than in the other is not purely fictional; there are many reasons it may occur in reality. An intervention might be much more expensive or difficult than the corresponding control procedure and follow-up data collection, so that it is easier to fund or conduct a study with a large control group and a small treatment group than with two groups of equal size. Researchers might be trying to compare a group that makes up a small fraction of the population with one that makes up a much larger fraction.  In the case of the study from yesterday's post, our original design anticipated trouble finding the targets of our intervention for follow-up, so we expected to survey many more people in the control group (in that case, the population from which we had originally drawn our intervention targets) than in the treatment group.  

In any case, if a study design allows for statistical inferences to be drawn from a small number of surprising observations in the treatment group, caution is warranted.  Use common sense as a back-up check on the meaningfulness of statistical significance, just as you use statistical significance as a back-up check on the meaningfulness of your intuitive reaction to the data.

Code used in this post is available on github.

[1]: In this case, any result but the most likely result. Normally we'd need to check both for cases where the muggle group has too many abstainers and for cases where it has too few, but since zero is the most likely number for it to have under the null hypothesis, in this case we do not have to be concerned with there being too few.



Thursday, July 18, 2013

Statistical power at small sample sizes

Recently I was working on a team to design a study where we expected to find relatively few examples of the phenomenon we were studying. While it wasn't out of the question that we'd find an example in the control population, we expected to find very few examples there, and only a few more in the treatment group. I made a chart to help us understand how large a sample size we'd need to decide our intervention was making a difference, and it looked something like this:
Each line on this chart shows the probability that we would find our treatment had an effect significant at a given level, depending on the size of our treatment group. But why are there sudden drops in the probability that we'd find a statistically significant effect? And why, for so many sample sizes, does it not appear to matter whether we're seeking an effect significant at the 5% or 10% level? 

The answer is that this is the wrong chart to draw, in this situation. If we expected only 1 in 2000 observations from the control group to be a success, and 1 in 500 observations from the treatment group to be a success, a simple application of the definition of significance levels gives the graph above. But besides looking weird, it is misleading. Notice where it implies that we'd have about a 30% chance of concluding our treatment works, with a sample size of about 225? Well, that's the probability we'd get at least one success in 225 trials.
This chart, where each line shows the probability of getting at minimum a given number of successes, depending on the size of the treatment group, is much easier to read, and much more illuminating. Even if a single success in the treatment group would technically be statistically significant, our study would be much more persuasive if we chose a sample size that allowed us to expect to find two or three successes, at a minimum. 
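Both charts boil down to binomial tail probabilities. Here's a sketch of the underlying calculation, assuming the treatment-group success rate of 1 in 500 quoted below (and exact binomial coefficients rather than any approximation):

```python
from math import comb

def p_at_least(k, n, p=1/500):
    """Probability of seeing k or more successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# One value per line of the chart: the chance of at least k successes
# in a treatment group of 500.
for k in (1, 2, 3):
    print(k, round(p_at_least(k, 500), 3))
```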

Overlaid, the charts are even more useful. They show the probability of getting a certain number of successes at a given sample size, together with the level of statistical significance that would imply. And they answer our questions about the first chart-- the drops come when a particular level of statistical significance requires us to find an additional success, and two levels of significance are equally likely exactly when they require us to find the same number of successes.
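The jumps in the required success count can be computed directly from the null hypothesis (a success rate of 1 in 2000 in the control population). This sketch, assuming those rates, finds the smallest number of treatment-group successes that would be significant at a given level:

```python
from math import comb

def tail(k, n, p):
    """Probability of k or more successes in n Bernoulli(p) trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def min_significant_successes(n, p_null=1/2000, alpha=0.05):
    """Smallest success count whose tail probability under the null is below alpha."""
    k = 1
    while tail(k, n, p_null) >= alpha:
        k += 1
    return k

# The required count climbs as the sample grows, which is what produces
# the sudden drops in the probability of finding a significant effect.
for n in (100, 225, 500, 1000):
    print(n, min_significant_successes(n))
```

Note that at the smallest sample size a single success is already significant at the 5% level, which is exactly the jarring situation from yesterday's post.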


Code used in this post is available on github.

Friday, June 21, 2013

Why Calculus?

A much-delayed sequel to the previous post...


My investigation of the prominence of calculus in mathematics education starts in the American Mathematical Monthly in the 1950s. I learned from Klein[1] that in the 1950s the mathematics curriculum in the U.S. changed to include calculus in the high school curriculum, at least for some students. I chose to start looking for primary sources with the Monthly because I am familiar with the journal, and because it seemed likely to have some articles dealing with mathematics teaching at the college level, and possibly at the high school level.

Benjamin Finkel founded the American Mathematical Monthly in 1894 as a journal aimed primarily at teachers of mathematics. The Monthly has been in constant (though not quite monthly) operation ever since, but it soon shifted focus to collegiate mathematics.[2] Most articles in it, both now and in the 1950s, discuss mathematics itself, either by way of exposition or as presentations of original research and novel proofs. However, articles discussing mathematics education, the role of mathematics in society or the liberal arts, biographical notes, and descriptions of conferences and meetings are not uncommon, particularly in earlier volumes. In the 1950s, the Notes section of the Monthly had a subsection for Mathematical Notes and another for Classroom Notes, with the latter often discussing approaches to teaching particular topics but sometimes giving a more general description of a particular program or course.

I looked at volumes 57-63 of the Monthly, which were published between 1950 and 1956, inclusive. The articles on mathematics education mostly focused on the college level, with particular interest in the education of future primary and secondary school teachers and in whether colleges ought to run mathematics courses for first-year students not planning further work in mathematics or science (and what such a course should include). Several articles addressed perceived defects in mathematics teaching in primary and secondary school, mostly as a problem which colleges should address by better preparing teachers of mathematics at those levels. Two articles in particular called for specific changes in the high school curriculum.

"Mathematics in School and College," excerpted from the 1952 book General Education in School and College, appeared in the Monthly in June-July 1953. The book was the result of a study conducted by Andover, Exeter, Harvard, Lawrenceville, Princeton, and Yale, and provided both a survey of the present curricula of the institutions involved and recommendations for modifications that would allow the preparatory schools involved to better prepare their students to take advantage of college instruction.

The study concluded that the school mathematics curriculum was ripe for change. The contemporary curriculum consisted of, for most students, two years of algebra, one of plane geometry, and one of trigonometry and solid geometry. The fourth year was optional, and some students in some schools ("less than one in five") progressed more quickly and finished high school with a year of calculus. The committee conducting the study believed that the basic direction of this curriculum was appropriate, but that it included several inessential topics, and some topics which were studied at too great a length. After a few snarky remarks[3], they recommended for "condensation or omission" the topics of solid geometry (the "greatest single offender"), complex numbers, determinants, the geometry of the circle, and logarithmic solution of triangles. With the time saved, the committee suggested that students should learn calculus, statistics, or both.

The committee had many reasons to recommend that schools expand their calculus program. Compared to solid geometry and the logarithmic solution of triangles, calculus is an inspiring subject; the study claims that "the student who has once grasped the meaning of differentiation and integration sees the world after in a larger and more significant way." For scientists and engineers, calculus is a fundamental tool, and even some students who would not become scientists were expected to take physics courses in college, which could be improved if the students had some prior exposure to calculus. The argument also had the weight of tradition; calculus was "the standard freshman course in the best of our colleges" and so the logical course to follow the rest of the high school mathematics program. Finally, some schools already had small calculus programs, and so certainly had qualified teachers.

The committee found the case for teaching statistics in some respects stronger than the case for teaching calculus, citing that statistical notions "are among the fundamentals of modern social measurements" and that "an awareness of the real meaning of these notions is a protection to the consumer as well as a necessity for the producer of information." Furthermore, statistical reasoning teaches students "that mathematics has uncertainty and that uncertainty can be mathematically treated." However, time constraints and concerns about teacher preparation led the study to conclude that most schools "should move toward a curriculum in which the basic 12th grade course is the introduction to the calculus."

"Mathematics in the Secondary Schools for the Exceptional Student" appeared in the Monthly in May 1954. This article discussed the results of what appears to have been a follow-up study to the one discussed in the previous article. The new study was conducted by twelve colleges and twenty-seven high schools. The study was the School and College Study of Admission with Advanced Standing, intended to provide a curriculum which schools could offer for advanced students who would then be able to receive college credit for courses which followed the curriculum.

This second study suggested an integrated math curriculum for the first three years of high school, after which the exceptional students for whom the program was designed would be ready for "a substantial introduction to differential and integral calculus, with enough applications to bring out the meaning and to illustrate the fundamental importance of this subject." By passing a standardized exam on this course, they would then be able to receive college credit. The program for the last year is still familiar, and for good reason-- this study was the pilot for what is now the Advanced Placement program.

Although the curriculum proposed in the study was designed for especially bright or driven students, the committee noted that the material covered in the first three years would be suitable for most high school students. They suggested that schools either run courses covering the same material more slowly for those students not aiming to take the calculus course or provide them with alternative courses for the fourth year of study, such as statistics or solid geometry. The committee did not recommend any means of granting college credit for these alternative courses, however. In the case of solid geometry, this was probably because the course was already common at the high school level and not necessary to progress through the college curriculum, and in the case of statistics the concerns from the previous study about the number of qualified teachers presumably indicated that demand for such a mechanism would be low.[4]

From here, it looks like following the emphasis on calculus as a final math course in high schools would be much like tracking the development of the Advanced Placement program as a whole. I won't be doing that study any time soon, though, since I am no longer regularly in a research library. Expect my next post to be on something completely different!

[1]: http://www.csun.edu/~vcmth00m/AHistory.html
[2]: http://www.maa.org/pubs/ammhistory.html
[3]: My favorite is, "Of course it is possible to design problems of bewildering complexity in every subject from long division to trigonometry; it is also a waste of time."
[4]: Wikipedia indicates that the AP Statistics exam was first administered in 1996.

Euclidean construction paintings

I've enjoyed straightedge-and-compass constructions since high school—I remember that high school geometry was my favorite math class up...