Posts about introhttp://cestlaz.github.io/categories/intro.atom2018-09-19T23:47:52ZMike ZamanskyNikolaInverted Index Projecthttp://cestlaz.github.io/posts/inverted-index/2016-11-26T16:52:56-05:002016-11-26T16:52:56-05:00Mike Zamansky<div id="outline-container-org5455010" class="outline-2">
<h2 id="org5455010"></h2>
<div class="outline-text-2" id="text-org5455010">
<p>
I haven't spoken much about the class I've been teaching this
semester. It's an intro CS course - a programming heavy intro. I
decided to use Python with a transition at the end to C++. The
transition is to mirror Hunter's normal first CS course that ends with
a C++ intro to prepare the students for next semester's CS course
which is a more intense OOP class using C++ - the language we use in
our core courses.
</p>
<p>
Throughout the semester I've tried to use a variety of interesting
application areas so as to try to give the students some idea of the
possibilities that studying CS will open up for them.
</p>
<p>
After covering Python dictionaries and lists I thought we'd play by
building an inverted Index.
</p>
<p>
The basic idea is to map a set of words back to source files. For
example, given the following four one line files:
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left">
<col class="org-left">
<col class="org-left">
<col class="org-left">
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">files</th>
<th scope="col" class="org-left"> </th>
<th scope="col" class="org-left"> </th>
<th scope="col" class="org-left">contents</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">file.01</td>
<td class="org-left"> </td>
<td class="org-left"> </td>
<td class="org-left">if you prick us do we not bleed</td>
</tr>
<tr>
<td class="org-left">file.02</td>
<td class="org-left"> </td>
<td class="org-left"> </td>
<td class="org-left">if you tickle us do we not laugh</td>
</tr>
<tr>
<td class="org-left">file.03</td>
<td class="org-left"> </td>
<td class="org-left"> </td>
<td class="org-left">if you poison us do we not die and</td>
</tr>
<tr>
<td class="org-left">file.04</td>
<td class="org-left"> </td>
<td class="org-left"> </td>
<td class="org-left">if you wrong us shall we not revenge</td>
</tr>
</tbody>
</table>
<p>
You could build a data structure mapping each word back to the file(s)
that contain it (partially shown here),
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left">
<col class="org-left">
<col class="org-left">
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Word</th>
<th scope="col" class="org-left"> </th>
<th scope="col" class="org-left">Files containing It's</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">if</td>
<td class="org-left"> </td>
<td class="org-left">file.01 file.02 file.03 file.04</td>
</tr>
<tr>
<td class="org-left">you</td>
<td class="org-left"> </td>
<td class="org-left">file.01 file.02 file.03 file.04</td>
</tr>
<tr>
<td class="org-left">prick</td>
<td class="org-left"> </td>
<td class="org-left">file.01</td>
</tr>
<tr>
<td class="org-left">us</td>
<td class="org-left"> </td>
<td class="org-left">file.01 file.02 file.03 file.04</td>
</tr>
<tr>
<td class="org-left">do</td>
<td class="org-left"> </td>
<td class="org-left">file.01 file.02 file.03</td>
</tr>
</tbody>
</table>
<p>
You can, of course, store more information - how many times a word
appears in a file, where it appears, etc.
</p>
<p>
This is a fairly easy structure to build. A dictionary where the keys
are the words in the file and the values are lists of the documents
containing the words.
</p>
<div class="highlight"><pre><span></span> <span class="n">inverted_index</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'if'</span> <span class="p">:</span> <span class="p">[</span><span class="s1">'file.01'</span><span class="p">,</span><span class="s1">'file.02'</span><span class="p">,</span><span class="s1">'file.03'</span><span class="p">,</span><span class="s1">'file.04'</span><span class="p">],</span>
<span class="s1">'you'</span> <span class="p">:</span> <span class="p">[</span><span class="s1">'file.01'</span><span class="p">,</span><span class="s1">'file.02'</span><span class="p">,</span><span class="s1">'file.03'</span><span class="p">,</span><span class="s1">'file.04'</span><span class="p">],</span>
<span class="s1">'prick'</span> <span class="p">:</span> <span class="p">[</span><span class="s1">'file.01'</span><span class="p">],</span>
<span class="s1">'us'</span> <span class="p">:</span> <span class="p">[</span><span class="s1">'file.01'</span><span class="p">,</span><span class="s1">'file.02'</span><span class="p">,</span><span class="s1">'file.03'</span><span class="p">,</span><span class="s1">'file.04'</span><span class="p">],</span>
<span class="s1">'do'</span> <span class="p">:</span> <span class="p">[</span><span class="s1">'file.01'</span><span class="p">,</span><span class="s1">'file.02'</span><span class="p">,</span><span class="s1">'file.03'</span><span class="p">],</span>
<span class="o">...</span>
<span class="p">}</span>
</pre></div>
<p>
In addition to letting us work with dictionaries and lists, we can
also review file access and even the python CSV module if we want.
</p>
<p>
We can immediately write simple queries – "what document(s) contain
the word 'prick,' but things get more interesting if you write
functions to perform <b><b>and</b></b> and <b><b>or</b></b> queries - "what document(s)
contain the words 'prick' <b><b>or</b></b> 'do'" for instance.
</p>
<p>
Why are we building this (besides as a data structure and programming
exercise)? I've seen a number of references to using an inverted index
when building a web search engine. In fact, I think that's something
you do early on in the Udacity Mooc. I just wanted to play with
information retrieval.
</p>
<p>
I remembered that there was a collection of information, including
last statements from <a href="https://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html">executed offenders in Texas</a>. Someone conveniently
converted it into a <a href="https://docs.google.com/spreadsheets/d/1HAjZTtPriClY-X3n9whTkA4i5D7bn3bmtHnLoaVICvI/edit#gid=1">Google Spreadsheet</a>. The format's a little
different from our simple four file example but then there's more
data. It's straightforward enough to download the spreadsheet as a CSV
file and then read it with a Python program that builds it into an
inverted index.
</p>
<p>
Now we have some interesting data to play with.
</p>
<p>
How many offenders used words like "sorry" or "apologize?" How about
references to religion? We can do all sorts of <b><b>and</b></b> and <b><b>or</b></b>
queries.
</p>
<p>
We just played with this a bit but I could see all sorts of
explorations. What about taking some great work of literature and
turning it into an inverted index by chapter. You could query
characters or certain words and see where and when they appear in the
book. A new and different way of exploring literature.
</p>
<p>
So, there you have it - an interesting little project we played with
this past semester. We did it in an intro Python course but I could
see it as an interesting project in AP CS A using hashmaps and lists.
</p>
</div>
</div>Madlib Madnesshttp://cestlaz.github.io/posts/2013-04-30-Madlib_Madness.md/2013-04-30T00:00:00-04:002013-04-30T00:00:00-04:00Mike Zamansky<div><p>Earlier in the term, our intro classes spent a little time learning
some basic HTML. We don't spend a lot of time on it, just enough so
that the students can present their work in a static web site. The end
goal, though, was to programatically generate the web sites - there's
nothing quite as empowering to a student as when they can present their
work to the world.</p>
<p>Finally, it's all coming together.</p>
<p>Now that the classes are comfortable with Python, we can have some
fun. We all remember Mad Libs - that wacky word game where you select
unknowingly select words to substitute into a basic story and
hilarity ensues.</p>
<p>We did our own versions using Python files, lists and
dictionaries. </p>
<p>Here are some of the results:
1. <a href="http://homer.stuy.edu/~richard.zhan/19-Madlibs.py">
http://homer.stuy.edu/~richard.zhan/19-Madlibs.py
</a>
2. <a href="http://homer.stuy.edu/~veronika.azzara/madlibifystory.py">
http://homer.stuy.edu/~veronika.azzara/madlibifystory.py
</a>
3. <a href="http://homer.stuy.edu/~belinda.liang/18-MadLibsMiniProject.py">
http://homer.stuy.edu/~belinda.liang/18-MadLibsMiniProject.py
</a>
4. <a href="http://homer.stuy.edu/~kyle.oleksiuk/MadlibifyProject5.py">
http://homer.stuy.edu/~kyle.oleksiuk/MadlibifyProject5.py
</a>
5. <a href="http://homer.stuy.edu/~phillip.huynh/story.py">
http://homer.stuy.edu/~phillip.huynh/story.py
</a></p>
<p>The students wrote a basic story with substitution points. Their
programs then randomly replaced these points with words from an
assortment of categorized lists. </p>
<p>Enjoy!!!!!</p></div>Who won the election -- Quadratic to Linear Time!!!!!http://cestlaz.github.io/posts/2013-03-23-Who_won_the_election-Quadratic_to_Linear_Time.md/2013-03-23T00:00:00-04:002013-03-23T00:00:00-04:00Mike Zamansky<div><p>Last week was crazy. Busy, stressful, late night after late night. It
ended, though, on a great note.</p>
<p>A young lady in my intro class found me in my office near the end of the day:</p>
<blockquote>
<p>Student: Mr. Z, I wanted to make sure to catch you before vacation!</p>
<p>Me: What's up?</p>
<p>Student: I wanted to tell you that today's lesson was AWESOME!!!!!!</p>
</blockquote>
<p>Wow. I've been teaching 23 years and that's never happened before!!!!</p>
<p>So, what was the hubbub about?</p>
<p>We've been doing list processing in Python over the past few days. We already did the basics, such as finding the largest element in a list:</p>
<p _="%" endhighlight>{% highlight python linenos %}
def find_max(L):
maxval = L[0]
i=0
while i<len if l>maxval:
maxval=L[i]
i += 1
return maxVal</len></p>
<p>We've also done basic searching, counting elements, removing elements, etc.</p>
<p>Today we started with finding the mode of a list of grades.</p>
<p>Most students approached the problem as a maximum problem. Assume the
first item is the mode and find it's frequency, then proceed through
the list each time seeing if the current node occurs more fequently
than the "mode so far." Pretty much the same idea as find_max (but in this case, returning a list of all the modes).</p>
<p _="%" endhighlight>{% highlight python linenos %}
def mode(L):
modecount = L.count( L[0] )
modes = [ L[0] ]
i = 1
while i < len(L):
c = L.count(L[i])
if c > modecount:
modecount = c
modes = [ L[i] ]
elif c==modecount and L[i] not in modes:
modes.append( L[i] )
i += 1
return modes</p>
<p>Pretty cool. The kids are doing something pretty sophisticated here. </p>
<p>Time to look deeper. We started running this on larger and larger data
sets. Things started really slowing down at about 20K. We then timed
things to get some numbers (thanks
<a href="http://stackoverflow.com/questions/5998245/get-current-time-in-milliseconds-in-python">StackOverflow</a>). </p>
<p>What was going on. The students pretty quickly honed in on the line
that called L.count(L[i]) -- <strong>Hidden Complexity</strong>. </p>
<p>We haven't done big-O notation but the class easily saw that count had
to go through the entire data set and we ended up with an N^2
algorithm. For example, if we have 10 items, the main loop executes 10
times and each time, count goes through the entire list (10 items) as
well. If we go to 100 items, it becomes 100x100.</p>
<p>What to do????</p>
<p>Time to talk about what's probably the most discussed instance of mode
finding - elections. The winner is "the mode of the ballots."</p>
<p>Of course we don't use the above algorithm. We usually tally or count the ballots. We go through the ballots once, each time adding one to the appropriate candidates "bucket." </p>
<div align="center">
<a href="http://cestlaz.github.io/img/tally.png" rel="lightbox">
<img width="50%" src="http://cestlaz.github.io/img/tally.png" class="" alt="">
</a>
</div>
<p>From here, it's a short step to see that we can use a list. It's
indices represent the grade values and the data in the list the counts
or tallies:</p>
<p _="%" endhighlight>{% highlight python linenos %}
def fastmode(L):
i=0
counts = []
while i<max(L)+1:
counts.append(0)
i+=1
i=0
while i < len(L):
counts[ L[i] ] += 1
i += 1
modecount = max(counts)
modes = []
i=0
while i < len(counts):
if counts[i]==modecount:
modes.append(i)
i=i+1
return modes</p>
<p>We go through the list once to build the tallies and then the "tally"
list once to get the modes. Simple, straightforward, and linear
time!!!!!!!!!</p>
<p>The original routine started to hit a roadblock at about 20K items,
here we got to one million without breaking a sweat.</p>
<p>The take away:</p>
<ul>
<li>Get it working first.</li>
<li>Then profile to find your bottleneck</li>
<li>Look at the problem in a different way</li>
<li>Using data structures in a clever way can really improve performance.</li>
</ul></div>Layers of a lessonhttp://cestlaz.github.io/posts/2012-12-17-layers-of-a-lesson.html/2012-12-17T00:00:00-05:002012-12-17T00:00:00-05:00Mike Zamansky<div align="center">
<img width="50%" src="http://cestlaz.github.io/img/turtle-anim.gif" class="" alt="">
</div>
<p>
</p><p>
My last post I was talking about the fact that as teachers, our
knowledge and experience is frequently trivialized. The tenor of the
times is that anyone can design a course, anyone can teach, and in
fact, we don't even need teachers, just videos or computer based
systems. If you've ever tutored a friend, you're more than qualified.
</p><p>
That might be a strong statement but everywhere you look you see
"education" programs designed and implemented by non teachers. It
seems that it's believed that teaching only involves the most
superficial of transfers of information.
</p><p>
Today, I thought I'd look at a lesson I taught the other week. How
I've seen similar material presented and how my colleagues and I might treat
the subject.
</p><p>
We use <a href="http://ccl.northwestern.edu/netlogo/">NetLogo</a> in
our Sophomore level intro course. It's a highly parallel version of
logo. It's very visual, it's great for modeling and you can introduce
deep, meaningful concepts such as parallel processing in a gentle
manner.
</p><p>
Early on the kids have to learn how to manipulate the turtles. In
NetLogo you write a single program and it's run by all the turtles "at
once." The image above is one of their early "experiments." Have the
turtles wiggle out of the center, but when they get to an invisible
border, start spinning. They do a number of variations on this theme.
</p><p>
A solution might look like this:
</p><pre>
; asked in a turtle context
to gospin
ifelse abs xcor < 8 or abs ycor < 8
[ wiggle ] ; wiggle implementation not shown
[ left 5 ]
end
</pre>
<p>
Let's call level one just talking about the solution by looking at the
program as a sequence of instructions. Specifically relating the
instructions to the problem, showing how it solves it, and that's it.
</p><p>
This is the simplest level. A book, video, or online courseware can
approach teaching at this level. A non computer scientist teacher or a
non teacher computer scientist could do so as well. Students might
learn a bit but I wouldn't hope for much inspiration or
creativity to come out of it.
</p><p>
Let's move to level two.
</p><p>
Here we might talk about "what the turtles are doing." They're always
doing something, either wiggling or spinning. This is a step in the
right direction. When done right, the students start thinking about
the problem in a more general sense but they're still looking at the
problem as something that exists only in the world of NetLogo. They
are more likely develop patterns than in level one, but it's still
limited.
</p><p>
Level three is where things get interesting. On the surface, the
problem is just a nice introduction to programming turtles in
NetLogo. At a deeper level, it's an opportunity to introduce the kids
to State Machines. A new way of thinking about problems and problem
solving.
</p><p>
Students understand the idea of a "state." For example, in class,
they're in a "seated state," maybe in a "note taking state," etc. It's
easy to see that they don't know what their day will bring but they
constantly make decisions based on their "state." Likewise, they can
think about the turtle as in a state. It's either in a wiggling state
or a spinning state and based on their situation they can either
continue in their current state or transition to the other one:
</p><div align="center">
<img width="50%" src="http://cestlaz.github.io/img/spin-state.png" class="" alt="">
</div>
This opens up a new way of thinking and it's easy to see how this
extends to other problems, for example, a ghost from pacman:
<div align="center">
<img width="50%" src="http://cestlaz.github.io/img/pacman-ghost-state.png" class="" alt="">
</div>
A good teacher thinks about working across these levels. He adjusts
based on the class and looks for opportunities to develop these deeper
concepts.Pair Programming Tag Team Shootouthttp://cestlaz.github.io/posts/2012-03-01-pair-programming-tag-team-shootout.html/2012-03-01T00:00:00-05:002012-03-01T00:00:00-05:00Mike ZamanskySo today we changed things up a bit.<br><br>Instead of having a typical lab type periods, we tried the Pair Programming Tag Team Shootout.<br><br>We aren't annualized so while the kids that have been with me since September have been working in pairs for a while, the other half of the class is just getting used to how we do it. I also wanted to get the kids to mix a little more.<br><br>Hence the shootout.<br><br>Everyone got a sheet with a bunch of problems on it:<br><br><a href="http://www.scribd.com/doc/83400730/Shootout" style="-x-system-font: none; display: block; font-family: Helvetica,Arial,Sans-serif; font-size-adjust: none; font-size: 14px; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal; margin: 12px auto 6px auto; text-decoration: underline;" title="View Shootout on Scribd">Shootout</a><iframe class="scribd_iframe_embed" data-aspect-ratio="0.772727272727273" data-auto-height="true" frameborder="0" height="600" id="doc_39108" scrolling="no" src="http://www.scribd.com/embeds/83400730/content?start_page=1&view_mode=list&access_key=key-2kfbc4856l652aq1j5ik" width="100%"></iframe><br><br><br>I then paired them off randomly.<br><br>The idea is complete the first problem, find a new partner, repeat.<br><br>By the end of the period each student worked with between five and seven partners.<br><br>I'm having them send me their solutions and partners tonight.<br><br>The early response was good -- it's speeding up them getting to know each other and it was a nice change of pace. We had some problems coordinating switching problems, but we'll do better next time.<br><br>All in all a good day.<br><br><br><div class="blogger-post-footer"><img width="1" height="1" src="https://blogger.googleusercontent.com/tracker/468689896075458340-8898870113233862529?l=cestlaz.blogspot.com" alt=""></div>Let me Google that for youhttp://cestlaz.github.io/posts/2012-02-08-let-me-google-that-for-you.html/2012-02-08T00:00:00-05:002012-02-08T00:00:00-05:00Mike ZamanskyPiloting a new course this semester - Intro to Computer Science part 2. Between the existing Intro part 1 and this, we should be able to do a pretty thorough job in preparing our kids for the future.<br><br>We decided that we wanted the kids to make deliverables in the form of web pages - plain old html written by hand. Part of the idea was to demystify things, part was to let the kids show off their work, part was to have something that they can generate programatically as the course progressed, and part was to give them a tool they might find valuable beyond their computer science classes.<br><br>We also wanted to help teach the kids how to find information and how to learn things on their own. Despite the fact that our students use computers all the time, they possess a widely varying skill set. With that in mind, here's what we tried to do:<br><br>After a brief introduction to what a web page is (just a text file with markup) and showing them the bare<br>minimum of markup:<br><br><br><script src="https://gist.github.com/1775745.js?file=simple.html"></script><br><br>I recommended a simple editor - gedit - while resisting all my inner urges for all things emacs, and then showed them an image of a web page:<br><br><div class="separator" style="clear: both; text-align: center;"><a href="http://www.stuycs.org/courses/ml2/zamansky/work/hw-1/SampleHTMLWebPagePicture.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="177" src="http://www.stuycs.org/courses/ml2/zamansky/work/hw-1/SampleHTMLWebPagePicture.gif" width="320"></a></div><br>The end goal was to make a page that had all of the elements in the above image but I also asked:<br><br><ul><li>How did they go about finding out how to make the page?</li><li>Where did they search?</li><li>what turned up bad results (and what were they)?</li><li>what turned up good results (and what were they)?</li></ul><div>I was very pleased with the results. Just about all the kids are now able to make a web page with the components in the image above. More importantly, this is what came out of our discussion:</div><div><br></div><div><ul><li>Everyone used Google exclusively as a search engine.</li><li>The range of queries ranged from things like "html tutorial," "making a web page," and just plain "html" to maybe not so good things like "gedit web page."</li><li>No one used social search or used facebook.</li><li>They mostly all found sites such as w3schools. </li></ul><div>I'm hoping this is a good first step in having the students find things on their own and not be afraid to try things. I think it's an encouraging start.</div></div><div><br></div><div><br></div><div><br></div><div><br></div><div class="blogger-post-footer"><img width="1" height="1" src="https://blogger.googleusercontent.com/tracker/468689896075458340-984256400226014120?l=cestlaz.blogspot.com" alt=""></div>