Change the data

Patient: "Doctor, it hurts when I do this."

Doctor: "So, don't do that."

We've been spending time on State Space Search. It's a great topic. We deal with or at least introduce:

  • Recursion
  • Blind search
  • Heuristic search
  • foreshadowing things like A* and Dijkstra's algorithm.

and more. Today, however. I want to talk about something else.

We started by developing a maze solver. It reads a text file representing the maze and then proceeds to find an exit. One version of the source code can be found here.

It's really cool to see how such a short program, about 10 lines of work code, can solve such an open sounding problem. From there we talk about state spaces, graphs, etc. We then moved on to the Knight's tour. By viewing it as a state space problem we can look at it just like the maze.

We represented a state as a board with the knight's current position and where it's been. An easy way to do this is to use an array of ints. So we have an empty 5x5 board:

0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

Or a board after a few moves:

1 0 0 0 0
4 0 2 0 0
0 0 5 0 0
0 3 0 0 6
0 0 0 0 0

The kids saw three base cases:

  1. When our count got up to n^2 (and in fact, we're done)
  2. When we land on a non-zero space (when we just return or backtrack)
  3. When we try to move off the board, for an index out of bounds error.

I wanted to look at that third one. We talked for a bit about using an if or a try/catch but I pointed out that I didn't like either. Looking at our maze code, we never checked bounds there. Why not. Well it turns out that our maze had wall all around. It was stored in a 2D array but the entire outer edge was wall. Why not do the same for the chess board:

-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1  0  0  0  0  0 -1 -1 
-1 -1  0  0  0  0  0 -1 -1 
-1 -1  0  0  0  0  0 -1 -1 
-1 -1  0  0  0  0  0 -1 -1
-1 -1  0  0  0  0  0 -1 -1 
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1

Now, as long as we start on the board, if the Knight jumps off the edge, it will end on a -1 square and backtrack. By modifying our data structure and data to contain a border, we've eliminated the special case of index out of bounds.

I always like doing that.

Some Links

Source code for Knight's tour

Fibonacci by the tail

We're ramping up for recursion in my junior classes - state space search, nlg(n) sorts, etc. As a refresher, we took a quick look at the Fibonacci numbers.

Now, some people seem to think that it's a tired problem. It's mathy, it's played out, it's boring etc.. They just might be missing the point.

The beauty isn't in the problem itself, but rather, that it's a platform on which you can look at many problem solving techniques.

We can look at the basic, straightforward , imperative solution:

{% highlight java %} public int fib1(int n) { int a=1,b=1; for (int i=0;i<n;i++){ int c=a+b; a=b; b=c; } return a; }

It's straightforward and fast - no recursion needed.

Next, we can look at the basic recursive version:

{% highlight java %} public int fib2(int n) { if (n<=1) return 1; else return fib2(n-1)+fib2(n-2); }

The advantages (of recursive solutions in general):

  • It's a direct translation from the recursive mathematical formula.
  • It's elegant, clean, and concise.
  • It can make hard problems much easier (see: Towers, Hanoi).
  • We can use same thought process that led to this solution to solve problems like finding our way out of a maze.

The downside:

  • It can be VERY slow.

So, how do we address this?

One way is via memoization - when we find a value, store it in a table, then we can use the look up table instead of recalculating over and over:

java public int[] fibnums = new int[100000]; public int fib3(int n) { if (n<=1) return 1; else if (fibnums[n] != 0) return fibnums[n]; else { fibnums[n] fib3(n-1)+fib3(n-2); return fibnums[n]; } }

This is a terrific thing to show a class since it's easy for students to wrap their heads around, it really speeds things up, and it's a precursor to lots of neat algorithms.

Finally, we can look at re-writing Fibonacci using tail recursion. This one can be a little hard for students to grasp. I like building it up from the iterative solution. In that solution, we use a, and b to "walk down" the list of Fibonacci numbers. At any point in time, a and b represent where we are in the sequence. We also use c but that's really just a temporary place to add a and b together.

The problem with doing this in a recursive solution is that we can't have a and b as local variables as each recursive call will have new a and bs and no information will be transferred.

Since we're working in Java, it doesn't take long for some students to come up with the idea of using instance variables to store a and b and just use the recursion for the "loop.":

{% highlight java %} public int a=1, b=1 public int fib4(int n) { if (n==1) return a; else { int c=a+b; a=b; b=c; return fib4(n-1) } }

Great, but using instance variables in this way is very inelegant and messy. Better, use extra parameters to store the values from call to call:

{% highlight java %} public int fib5(int n,int a, int b) { if (n==1) return a; else return fib4(n-1,b,a+b) }

Looking at Fib5(5) we get for n, a, and b:

  • 5,1,1
  • 4 1,2
  • 3,2,3
  • 2,3,5
  • 1,5,8

At which point we just return the 8

Clean, elegant, fast, and easy to understand.

Each of these four techniques are important and will be used time and time again and here we have one simple problem that allows us to explore them all.

Some Links

Project Euler: Problem #2 - Even Fibonacci numbers

Memoized Fibonacci Numbers with Java 8

The quadratic formula and Fibonacci numbers

Monte Carlo Simulations, Fibonacci Numbers, and Other Number Tests: Why Developers Still Need The Basics

TED: Arthur Benjamin: The magic of Fibonacci numbers - Arthur Benjamin (2013)

Fibonacci Numbers in the Real World

StuyCS family from coast to coast

I think my one regret over the years is that I haven't done much travel. So, when I had the opportunity to go to the 2014 Tapia conference, I jumped at the chance. I didn't get to see too much of Seattle, but that's OK. Now I've more incentive to go back.

In addition to seeing new sites, it also gave me an opportunity to see friends that I don't get to see too often.

I frequently talk about the StuyCS family. That it's all about people and all about community. I'm proud that I started this family and I'm always blown away by the guys. It's one thing for alums to swing by the school a couple of years after they graduate but it's another when they want to see me as much as I them even after ten or twenty years.

It was a little tricky to coordinate between conference commitments and their schedules but we ended up having two dinners.

First I got together with Mike (StuyCS '97), his wife Linsday, and Helene. I'll always remember Mike for being the police officer when he, William, Emil (I think) and Paul were the Village People for Halloween. It was amazing to see him, meet his wife talk about old times and see what he's doing today.

Helene isn't StuyCS, but she is an educator of a kindred spirit, over the years, I've met many CS educators. Some "get it" many don't. Helene clearly does. We clicked right off the bat. As with my alums outside of the city, I regret that we only get to see each other once every few years.

Two nights later, I got together with Sam (StuyCS '96), Daemon (StuyCS '06) and Matt (StuyCS '07) along with Boback, Alan, and Ephraim, three other high school teachers who attended the conference.

I hope we didn't bore the teachers with our high school stories, but I really had a blast. We talked about old times, we talked of current issues, good food and good friends. We spanned 27 years of Stuy but that didn't matter. Get a bunch of bright interesting like minded people together and good things happen.

I feel amazingly blessed that I can travel across the country and students from 10 and 20 years ago want to get together to catch up. I guess I should really say friends, not students.

I've been thinking a lot about my career as I close in on 25 years of teaching. What impact I've had, what I could have done better, what I did right.

I think I can live with the StuyCS family as my legacy.

Tapia 2014

Spent the past few days at the Tapia Conference in Seattle.

The Tapia conference is billed as a "Celebration of Diversity in Computing." The bulk of the attendees seemed to be students. Undergrads and grads. The undergrads were mostly juniors and seniors, but a bunch of freshmen and sophomores were there as well. Of course there were faculty members to join them as well as people from industry.

What was I doing there? Well, 20 high school teachers were given scholarships to attend a workshop run by Dan Garcia. Dan is the driving force behind one of the implementations of CS Principals course - a new course being rolled out by the College Board. As you're all well aware, I'm not a fan of the College Board but I'm interested in Dan's work as common friends speak well of him.

I'll talk about his course next time, after I've gone through his workshop. For now, let's talk about the conference.

I wasn't sure what to expect but let me tell you, I was blown away. Many of the break out sessions didn't "speak" to me as a high school teacher, but that's fine - the conference was for the youngsters. I can't say for sure, but it seemed to me that they really hit the breakout sessions on the head.

Then there were the plenaries. Great speakers and a great range of topics.

They had:

  • Cieko Asakawa talking about assistive technologies
  • Dan Garcia on CS Education
  • Latanya Sweeney - transparency and privacy
  • James McLurkin - robotics (and some computational geometry)
  • Kathryn McKinley - Systems
  • Marcus Mitchel - A general talk on hard problems, winding paths, and more

An incredible rage of topics. All the speakers were terrific - they all talked about their life and career paths and were all inspirational

They also presented a smorgasbord of computer science possibilities for the students.

Wow.

I wish this could be bottled up and given to all aspiring technologists.

I only got to attend because I was on scholarship as a high school CS teacher. It was an amazing few days and I can't recommend the Tapia conference highly enough for young computer scientists out there.

Shell Games - an introduction

A few weeks ago, I noticed this Twitter conversation between Alfred Thompson and Steve Keinath

I briefly considered proposing a session for the conference but it was just a day or two before the deadline, I don't know if I'm going to be able to attend the conference, and besides, who said anything I proposed would be accepted.

Still, I liked the idea - I've been an educator for 23 years, a Linux user for most of that time and an Unix user for longer. I'm a firm believer in operating system as toolkit and so I think I'll take Steve and Alfred's suggestion and try to put together a series of posts on using Linux from a CS educators point of view.

So, before we begin - a little background.

I can proudly say that I've been Windows free since about 2000. That's when I decided to wipe the lat traces of Microsoft from my hard drives. Prior to that I just booted up MS-DOS or Windows to play games or to use a Excel or Word.

Since the early days of Linux - back before Slackware, I dual booted. Before Linux, I dialed into public Unix systems such as Panix or The Big Electric Cat. At home, I tried to make MS-DOS as Unix like as I could. I ran the MKS Toolkti, and used my own shell (a project every young programmer should attempt).

Why am I posting this now? It's a new semester and I find myself, as usual, leveraging the Linux shell. It was time to set up a mailing list for the class.

I'm able to go to our school's data system and grab a tab delimited file that looks something like this:

Code    Section Period  Last    First   ID  Official    Advisor OSIS    Email
grY22tBs    01  6   Hxk Blu GFy 9272    7rr gEs 274989649   zlu3lxk@QylKR.oqy
grY22tBs    01  6   HiQqvlRu    Blku    9918    7PP YHZHm   200878353   zzl8@yu.oqy
grY22tBs    01  6   plxk    ClSKv   9226    7II PHXrNY  274661826   olxkvl@QylKR.oqy
grY22tBs    01  6   pxKk    BqVxFl  9026    7II PHXrNY  224608174   zo6461@lqR.oqy
grY22tBs    01  6   pqxuk   NRK 9234    7dd gHAMmNd 270217219   uRKo90@QylKR.oqy
  

It's tab delimited but I scrambled the letters so as to not reveal any student info.

Oh, how did I do that scrambling? Easy. First, I combined some basic utilities to make a random permutation of the upper and lower case letters and stored them in a shell variable. Don't worry, I'll explain these commands in upcoming posts:

perm=`echo "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ | sed "s/\(.\)/\1\n/g" | sort -R | tr -t "\\n" ":" | sed "s/[^a-zA-Z0-9,@]//g"`

Then I used tr (translate) to exchange the real letters for the matching letter in the random permutation:

cat students.tsv | tr a-zA-Z $perm > students.scrambled

So back to the real work. I needed to isolate the students email addresses. The process:

  1. convert the tabs to commas
  2. Pull out the students in my AP class (code MKX22X) from the list of all students
  3. Pull out the 10th column
  4. These are the emails, save them to a file
  5. So, I typed:

    cat students.tsv | grep MKS22X | sed "s/\t/,/g" | cut -d, -f10 > emails

    grep filters out lines that have MKS22X in them and sed replaces the tabs (\t) with commas and cut pulls out the email addresses. It's all stored in a file named emails.

    Now, I just have to import these into my maillist software (mailman).

    add_members -r emails myclasslist

    So, that's it, easy peasy.

    I'll be away for most of this week at the Tapia conference and then I'll be playing catch up, but I'm hoping to do a series of posts talking about my Linux toolset and how I use it.

    I hope you all find it interesting and maybe even useful.

Summing up at SumAll

Our hackers have been coming down to SumAll every Saturday for a while now. It's a great space and we've been given run of the office. That said, the hackers really had no idea what SumAll is all about.

Today, we had our first guest speaker. Davin Chew, founder and CTO of SumAll. Davin's also a member of the StuyCS family, he was my student back in the nineties. When Davin heard we needed space for our Saturday program, he and the SumAll team graciously offered to host.

Davin spoke to the hackers about what SumAll does, how it's trying to, in Davin's words "Democratize data." How they're building a platform that lets a small business or even an individual really study their data. They've got some great visualizations and tools and you all should check them out (and yes, according to their web site, there is and always will be a free tier).

Davin talked a little about how he and his co-founder sat in the room we were all in and how they came up with SumAll. He talked about passions, starting out a new compnay and a variety of other topics.

It's always great to connect young hackers with people doing great work in the real world so they can get a glimpse of the awesome things they will be able to accomplish in the coming years.

Some Links:

Dane Atkinson of SumAll: A Big Data Strategy That Goes Beyond Cutting and Pasting

My Top 5 Social Media Analytics Platforms

What I learned from Pete Seeger

This past Monday I, like may woke to hear about the passing of Pete Seeger. Ten days earlier we were up at Symphony Space for the "Woody's Children" 45th Anniversary Concert. Pete was supposed to perform but was ill. At 94 years of age, I wondered if the end was near. I was at the concert with my family as well as with my buddy Ben. Fitting since Ben and I grew up on Pete's music while many of our peers did not.

If you're not familiar with Pete Seeger, his life and his work, I really encourage you to check out the 2007 documentary The Power of Song. He was truly a remarkable man. One of the most influential singer, songwriter, activists of our times yet remarkably humble and approachable.

I remember seeing Pete when I was a schoolboy. He would have been in his fifties at the time. Thirty years later, he was singing for my son's second grade class in the same public school in New York City.

Pete's been eulogized by far better writers than I so let me just share some of Pete's influences on me.

Something that Pete said, highlighted in the PBS documentary - "Think global, act local." Peter attributed it to Rene Dubos. As a teacher and someone trying to make a difference through education, I think about this a lot.

As teachers, we have to act locally. We work with our kids every day and can make a difference in individual lives. While we're doing this, the global picture is a mess. Teachers are vilified, schools are closed, and policy is set by people who have never been in the classroom.

There is hope, however. Just like Pete and the Clearwater movement local action, spearheaded by the likes of Diane Ravitch seems to be taking hold. We're starting to see common core exposed and the types of evaluation systems pushed by the Gates foundation debunked. Think global, act local.

Now, as you know, I'm a computer science educator and there are many similarities between CS education and general education. Most of the noise you hear is from non educators, or at least non K-12 educators with actual real experience and a real track record.

These voices range from well informed, to well meaning, to misdirected.

Last year the CSTA held a twitter conversation with K12 CS Education thought leaders and there wasn't a k-12 teacher amongst the listed participants.

Code.org, a well meaning organization that has done a great job exciting people abotu CS Ed has a quotes page. The closest thing to a teacher there is Randi Weingarten and she was just in the classroom for a cup of coffee. The only other person on the list with "teach" in their title is Wendy Kopp who believes that poorly trained temps make better teachers than experienced pros.

So, how will the conversation turn? Will groups and individuals with influence engage those of us on the ground with actual results, or will will receive dictates from on high? Time will tell.

Meanwhile, all I can do is continue to act local. I can't say what will happen in the greater CS education landscape but I know that every day I will see 150 kids and 20 more every Saturday. My kids, who will be part of the Stuy CS and now CSTUY family. They might not know now what they're getting, but I do and I know I've made a difference.

Show me the data

Show me the data

Last time, I shared my thoughts on the recent coverage of AP CS statistics. Compiled and presented by Barbara Ericson and later reported on all over the place.

To summarize my point of view - yes there is a problem but some of the highlighted data points can easily be shown to be irrelevant and therefore can hurt the greater goal of access to great CS education for all.

This felt like the launching pad for a great class discussion.

I presented this to my classes last week:

and asked "what do you make of this?" I was curious what the classes would say - they know I'm concerned with gender equity and know that at Stuy we've done better than most.

I can say that I was very happy with their answers.

We got things like:

How many took the test?

What states? How many African-Americans in ....

and

can we see the numbers?

So, we started talking about the actual numbers. It's amazing when you actually look at the facts, truths come out.

I was extremely pleased that my students saw right past the spin and searched for the actual numbers.

In my last post I explained why the stated claims - no girls in three states, no blacks / Hispanics on 11 were horrible examples. Why focus on these when it's so easy to find better examples:

The overall 20% female test takers is an easy stat to point to, but why point out Alaska's lack of African-American test takers when it's statistically irrelevant. Why not point out that in NY, only 68 African-American test takers and 150 Hispanic (out of 1858 test). Given that NY is about 16% black ad 17% Hispanic, we would have expected around 300 from each group.

In any event, I'm delighted that my students are perceptive enough to see past headlines, dig down to data, and make their own conclusions.

At Last - CS Gender Equity in Multiple States!!!!!

Some buzz seems to have circulated on the data compiled by Barbara Ericson on AP Computer Science test takers in 2013. In addition to going to the source there was a piece in The Atlantic and in Education Week.

Some exciting results:

  • In two states just as many girls passed the AP CS exam as guys!!!!!
  • In two other states, just 6 more guys passed than girls!!!!!
  • In yet another state, 100% of the girls passed, doing far better than the approximately 75% of the boys that passed!!!!

Wow, we're doing terrific.

Ok, maybe not so much.

But then, I'm not saying anything false, just misleading.

And that's the problem with the articles I've been reading on Barbara's report. Take a look and you'll see. The Education Week headline is:

No Girls, Blacks, or Hispanics Take AP Computer Science Exam in Some States

Both articles talk about how no girls took APCS in three states. Later in the article, it clarifies the situation - Wyoming had no test takers male or female, Mississippi had only 1, which wasn't a passing grade, and Montanna only 11 test takers total.

I find the headline disingenuous.

There are other tidbits like the fact that there were no hispanic test takers from Alaska. No mention of the fact there were only 21 test takers overall and that Alaska only has 6% of it's population identifying itself as hispanic (compared to a national number of 16%).

Clearly we have problems -- not enough students are exposed to good computer science teaching and there certainly are under-represented groups, but I feel that this type of reporting detracts from the cause. All you have to do is look at the data and the credibility of the reporter is shot.

I see the same type of misleading remarks made by drop in programs that claim to solve the problem -- "your kid will learn to make an app in N weeks," or "our kids are better prepared for the AP exam than kids who take the AP class." That's also all nonsense but these folks are selling a product so at least it's a little more understandable.

Then there's the question of using the AP exam as a measure in the first place - I for one don't feel that AP CS is a particularly good course and I'm not sold on CS Principles as the answer either.

In any event, there are a lot of important inferences we can make from the report and I encourage you to download the spreadsheet and take a look.

Not enough students are being exposed to computer science and not all groups are getting the same exposure, but when we spin the data rather than illuminate it, we don't really help our cause.

Rot13 - Gateway <s>Drugs</s> Techniques

I've written before about That One Inspirational Curriculum - the idea that it's not the topic in the syllabus but rather what the teacher does with it.

Some times a simple problem can lead to some really neat concepts.

Take what we did in my AP classes over the past couple of days.

I wanted a nice little warm up after the break so we wrote a simple rotation cipher. We started with a little encode routine - take a string and rotate the letters by some offset. For example if we use an offset of 3, 'hello' becomes 'khoor' - each letter being shifted over thee places.

Pretty easy for the class but even a simple problem like this lets us talk about a few things, including:

  • we can exploit the ASCII ordering but have to remember to deal with the offsets. That is in ASCII, an 'a' is 97, we can't just calculate c = (c+offset)%26. We have to first shift the letter down to 0, add and mod, and then shift back c = ((c-'a') +offset)%26 + 'a'
  • We can talk about neat Internet history such as how (rot13)[http://en.wikipedia.org/wiki/ROT13] was used to hide spoilers and offensive material on the internet.

I then ran a program that broke the encryption. I also showed the students how it didn't work on single words but did on full sentences.

By hand, decodingn even a simple cipher is a pain.

With computer assist, it's easy - just print out all 26 possible rotations and pull out the right one.

Our question was how do we have the computer do it all on its own?

I asked them to think about how they might write a program to accomplish this -- and that's when the magic starts.

They came up with a few interesting ideas:

  1. For each of the 26 rotations, choose the one with the most vowels.
  2. For each word in each rotation, give a point for each word with a vowel and choose the rotation with the highest score.
  3. Look for common short words in each rotation and then check other words against a dictionary.
  4. Do the letters appear with the same frequencies as in the English language

We noticed that all of these suggestions are based on our knowledge of English. What if the message was in a different language or even a different alphabet?

We decided to use choice 4 - the letter frequencies.

Magic part 1 - using known data to figure out a pattern for unknown data

Even if we don't know the frequencies, if we can get a sample document in our language, we can figure them out. We downloaded a text from Project Gutenberg and used it to build an array of 26 elements called CorpusFreqs. CorpusFreqs[0] would hold the frequency of the letter 'a' in our sample document (that is, how many times 'a' appears over the total number of letters in our document), CorpusFreqs[1] the frequency of 'b' etc.

Now we have a model that we can use to compare our rotations to.

Magic part 2 - wow, that 9th grade math is actually useful

At this point, it usually isn't clear how to compare our rotations to the model frequencies we calculated. Time to simplify,

We can look at another problem: Selecting a house.

Me: if we can only have one criteria, what would it be?

Class: Neighborhood

Me: Ok, let's rate each neighborhood between 0 and 100

Me: We can draw two houses on a line, which is better?

Class: The one with the larger value!

Me: What if we add a third house? Which is it more similar to?

Class: The one it's closer to.

Next:

Me: Well, what if we add another feature? Cost - let's map low cost to 100 and high cost to 0 and all the other costs in between.

Me: If we want to visualize a house, how can we do it?

Class: We can use a graph - like x,y points use location,cost points.

So we do it.

Me: This is the least desirable house (0,0) and this is the best one (1,1). I plot two houses and ask

Me: Which is more desirable?

Class: That one (they indicate the one closer to 1,1).

Me: How can you tell

Class: It's closer

Me: How do we figure it out?

Class: The Distance Formula!!!!!

So now we add a third feature - size. It's pretty easy to show that the distance formula extends to three-space and in fact to even higher dimensions.

Magic part 3 - from 3 dimensions to 26

So now, we bring it back to our cipher. For the house example, we used points in 1,2, and 3 dimensions (and we actually talked about it in 4D) so we used 1 through 4-space vectors to represent the points and used Euclidean distance formula to see what houses were similar to each other, or what points where near to each other

From there, it's easy to see that the frequency array we built from ou sample text is a 26-space vector and that we can build a similar vector for each rotation.

From there we can use the distance formula to see how close each rotation is to the sample document vector. The rotation with the closest vector is pobably our solution.

So from a simple warm up problem we're at the gateway to some serious techniques that will come back time and time again as the students move through their CS education and careers.

Fun!




Enter your email address:

Delivered by FeedBurner

Google Analytics Alternative